If you are a SysOps person, you probably have to deal with a lot of data generated by your servers, and in order to run an efficient IT department, you must be able to receive near-real-time notifications when something goes wrong. While there are already a lot of tools out there, sooner or later we all end up writing our own scripts to tie everything together.
When I started experimenting with Apache NiFi, I came to realize that there are better ways to manage your servers' data flow...
Apache NiFi is a dataflow tool that is quickly becoming quite popular in the Big Data world. According to the website, NiFi is:
...an easy to use, powerful, and reliable system to process and distribute data.
I think the Apache NiFi guys are being a bit too modest here :-) The way I would describe NiFi is:
Apache NiFi is a web-based tool that allows you to get data from almost any source, and transform/route it to almost any destination using an intuitive WYSIWYG workflow designer.
At the moment, you can receive/send data from/to the following data sources:
As a system operator, you probably already deal with a lot of data that needs to be processed and evaluated. Over the years, you have probably developed your own solutions to deal with this data. Have you ever written scripts for one or more of these tasks:
If the answer is yes to any of the questions, then NiFi might be an asset for your IT environment. True, writing your own scripts to solve those issues can give you a high sense of satisfaction, but the most important issue with this approach is this:
System operators should be focused on the data their environment generates, and not on the code that processes that data.
Okay, some readers are probably rolling their eyes right now, but allow me to elaborate. First, let me ask you a few questions about the integration scripts you developed yourself:
As someone in system operations, you probably don't want to deal with all the "details" mentioned above. You just want to get your data, transform it into what you want it to be, and send it to where it needs to go.
Maybe you have a team of coders who could handle the issues mentioned above, but they are probably busy developing your company's product and don't have the time to assist you whenever you need something. You might consider a proprietary solution, but most of the time you will be stuck with what the vendor offers. You want tools that adapt to your workflow, not the other way around. Apache NiFi is free, and allows you to create any workflow you want, with any data you want.
If you are already using an Elasticsearch-Logstash-Kibana (ELK) stack, you might wonder how Apache NiFi fits in. In my opinion, they are two different systems that complement each other:
I admit, I'm a big fan of ChatOps. Having a chat-room as the primary hub of communication for your operations team encourages teamwork, and makes it a lot easier to work with remote teams in different time zones as they have access to all the conversations that happened when they were still asleep :-)
One of the things I wanted was a chat room that acts as a live feed of all the syslog messages generated by my servers. This is the first workflow I built in NiFi, and I was surprised that I had everything up and running in less than 3 hours. Mind you, I had zero experience with NiFi when I built this, so I still needed to get the hang of it. If I had had to develop this in a programming language I had no prior experience with, I think it would have taken longer than 3 hours.
I use HipChat for team chat rooms, so I need to convert the data into the format HipChat expects before posting it to HipChat's HTTP API.
Here is what I ended up with:
Take a good look at the picture. Even without any NiFi experience, it's quite easy to figure out what's going on:
The only thing left to do was to reconfigure my servers so that their syslog messages get forwarded to my NiFi server.
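Before touching the syslog configuration on every server, it can be handy to fire a single test message at the NiFi box and see whether it makes it through the flow. Here is a minimal Python sketch that does this with the standard library's `SysLogHandler`, which sends over UDP by default. The host name, the port, and the function name `build_syslog_logger` are placeholders I made up for illustration; use whatever address your own NiFi syslog listener is configured on.

```python
import logging
import logging.handlers

# Hypothetical values: replace them with your NiFi host and the port
# your syslog listener in NiFi is configured to use.
NIFI_HOST = "nifi.example.com"
NIFI_PORT = 5140


def build_syslog_logger(host, port, name="nifi-test"):
    """Return a logger that forwards its records to host:port as syslog over UDP."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # With an (address, port) tuple, SysLogHandler uses a UDP socket by default.
    handler = logging.handlers.SysLogHandler(address=(host, port))
    # Prefix each message with the logger name, similar to a syslog tag.
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger
```

Calling `build_syslog_logger(NIFI_HOST, NIFI_PORT).info("hello from web01")` sends one message. For real traffic you would of course configure forwarding in the syslog daemon of each server instead; this is only meant as a quick smoke test.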
The output as shown in HipChat:
The formatting could be improved, but it ain't bad for a first attempt :-)
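For readers who prefer to think in code, the formatting step (turning a raw syslog line into something HipChat accepts) boils down to roughly the following. This is not how NiFi does it internally, and it is not the exact flow I built; it is only a sketch of the transformation. I am assuming classic BSD-style syslog lines (`<PRI>timestamp host message`) and a HipChat room-notification payload with `message`, `color`, `message_format` and `notify` fields, so verify those field names against the HipChat API documentation. The function name and the color mapping are my own invention.

```python
import json
import re

# Assumed input: BSD-style syslog, e.g. "<34>Oct 11 22:14:15 host tag: text"
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s"
    r"(?P<host>\S+)\s"
    r"(?P<body>.*)$"
)


def syslog_to_hipchat(line):
    """Convert one syslog line into a JSON string for a chat notification."""
    match = SYSLOG_RE.match(line)
    if match is None:
        raise ValueError("not a recognizable syslog line: %r" % line)

    # The priority value encodes facility * 8 + severity.
    severity = int(match.group("pri")) % 8

    # Hypothetical mapping: errors and worse are red, warnings yellow, rest gray.
    if severity <= 3:
        color = "red"
    elif severity == 4:
        color = "yellow"
    else:
        color = "gray"

    payload = {
        "message": "[%s] %s" % (match.group("host"), match.group("body")),
        "color": color,
        "message_format": "text",
        # Only ping people in the room for errors and worse.
        "notify": severity <= 3,
    }
    return json.dumps(payload)
```

The returned JSON string is what you would then POST to the chat room's notification endpoint. In NiFi, all of this is done by configuring processors instead of writing and maintaining a script like this one.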
Here is another example that shows how you can easily build an HTTP-to-FTP gateway with NiFi:
Once again, the flow is quite easy to follow:
Time to implement: 30 minutes more or less. Once again, no coding required.
So, is NiFi optimized for real-time processing or batch processing? The answer is simple: it depends on how you configure it. Every box in the diagram is called a "processor", and its throughput can be configured and tuned to your needs:
I believe that Apache NiFi is a valuable asset for managing the data flow of your IT environment. I have a simple test to determine whether a tool is worthwhile to me or not: if I can come up with more than 3 scenarios where this particular tool can help me, I consider it a winner. Apache NiFi passes that test without any doubt.
While the examples shown here are quite simple, NiFi can handle very complex workflows and lets you arrange flows into separate process groups. The NiFi server also supports clustering.
I have only scratched the surface of what Apache NiFi can do. There is a great introduction video from OSCON 2015, given by Joe Witt of Hortonworks. I recommend you check it out.