Back Up Your Whole NiFi Flow (Template) Using Python
NiFi is a super flexible tool that makes stream manipulation easy using flows, without the need for bespoke code. When you’re working with NiFi in an enterprise, it’s good to be able to take regular backups of the flows you’ve created, and also switch them between environments without too much hassle. Normally this is a point-and-click job, but there are API calls that can be made directly. This script automates those calls and downloads the root flow, saving it in a compressed format.
Flows and Templates
In NiFi, the canvas you work on to manipulate data (and the processors you use to do that) is known as a flow. Like a file system, there is a top-level (root) flow, which is where you’ll typically start when you hit the web interface at /nifi. It’s possible to have sub-flows contained within the root, using a process group.
Since the main aim of this script is to safely back up an entire NiFi installation’s flows, the root flow is used to obtain information on all the processors and process groups available. A NiFi flow is just XML, so it compresses really well, and it can also be tracked in Git, just like any other text file!
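As a rough illustration of the "compresses really well" point, here is a minimal sketch of writing XML text to a gzip file and reading it back. The function names and the sample XML are hypothetical and are not taken from the script itself, which may use a different compression format.

```python
import gzip
from pathlib import Path


def save_compressed(xml_text: str, path: Path) -> int:
    """Write xml_text to path as gzip and return the compressed size in bytes."""
    with gzip.open(path, "wb") as handle:
        handle.write(xml_text.encode("utf-8"))
    return path.stat().st_size


def load_compressed(path: Path) -> str:
    """Read a gzip file written by save_compressed back into a string."""
    with gzip.open(path, "rb") as handle:
        return handle.read().decode("utf-8")
```

Because flow XML repeats the same element names many times, the file on disk is usually far smaller than the raw text. If you want readable diffs in Git, commit the uncompressed XML instead.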
Backing Up a NiFi Flow
If you’ve installed NiFi, then you’ll notice there are a couple of places you can directly pick up a flow.xml file. Downloading one of these is easily automated, but comes with a pretty hefty disadvantage in that they are cluster specific. A portion of the GUID for each NiFi component actually contains the NiFi cluster ID. This means you can use your downloaded flow.xml in your own cluster easily, but moving it to another will cause problems (also the case if you’ve reinstalled). The structure of the flow will be available in the new cluster, but I found the processors deactivated and sort of crossed out.
The solution is to create a template, as you can using the web interface, and then download that. The templating process creates a version of the XML where the GUID of the cluster is set to all zeros, then the uploading process replaces these with the correct cluster ID. You’re still left with some work to do regarding sensitive fields (which won’t be in the template, sensibly) and configuration that may be different per environment, but the bulk of the work is done for you. On the subject of environment-specific configuration, I recommend using an environment variable in the systemd .service file to provide these values. Ansible can then easily update things like IP addresses when they differ between environments.
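The create-then-download sequence can be sketched as a handful of request builders. This is not the script's actual code: the endpoint paths and JSON field names below are my understanding of the NiFi 1.x REST API and should be checked against the API documentation for your version, and every function name is hypothetical.

```python
from typing import Any, Dict, Tuple

# Component collections assumed to appear in the root flow response
# and to be accepted in a snippet (assumption based on NiFi 1.x).
COMPONENT_KEYS = (
    "processGroups", "remoteProcessGroups", "processors",
    "inputPorts", "outputPorts", "connections", "labels", "funnels",
)


def root_flow_url(base: str) -> str:
    """URL to GET the root flow (assumed path)."""
    return f"{base}/nifi-api/flow/process-groups/root"


def snippets_url(base: str) -> str:
    """URL to POST the snippet payload to (assumed path)."""
    return f"{base}/nifi-api/snippets"


def snippet_payload(flow_response: Dict[str, Any]) -> Dict[str, Any]:
    """Build a snippet covering every component on the root canvas."""
    group = flow_response["processGroupFlow"]
    snippet: Dict[str, Any] = {"parentGroupId": group["id"]}
    for key in COMPONENT_KEYS:
        # Map each component id to its revision object.
        snippet[key] = {
            item["id"]: item["revision"] for item in group["flow"].get(key, [])
        }
    return {"snippet": snippet}


def create_template_request(
    base: str, group_id: str, snippet_id: str, name: str
) -> Tuple[str, Dict[str, str]]:
    """URL and JSON body to POST in order to turn a snippet into a template."""
    url = f"{base}/nifi-api/process-groups/{group_id}/templates"
    return url, {"name": name, "description": "", "snippetId": snippet_id}


def template_download_url(base: str, template_id: str) -> str:
    """URL to GET the template XML from (assumed path)."""
    return f"{base}/nifi-api/templates/{template_id}/download"
```

Under these assumptions the order is: GET the root flow, POST the snippet payload, POST the template request using the returned snippet ID, then GET the download URL and save the XML. Each request would be sent over HTTPS with the client certificate and CA bundle.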
The Code
If you’ve understood the above, then the code should be pretty easy to comprehend as it’s really just API calls. Be aware that this code was written for an environment using HTTPS and client certificates to secure the install. From this point, it would probably be trivial to write an API call to upload a flow to another NiFi cluster, but that was beyond scope here.
The script takes arguments to define the --save-path, --nifi-address (include port), --client-cert (that you use to access the GUI/API), --client-pass (your cert should have a password!), and --ca-bundle (pem file containing the certificate chain of the NiFi server back to the CA).
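For orientation, a command-line interface with those five options could be declared with argparse roughly as follows. This is a sketch rather than the script's real parser: the help text, the choice to make every option required, and the build_parser name are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the five options described above (all assumed to be required)."""
    parser = argparse.ArgumentParser(
        description="Back up the root NiFi flow as a template."
    )
    parser.add_argument("--save-path", required=True,
                        help="Where to write the compressed template")
    parser.add_argument("--nifi-address", required=True,
                        help="NiFi host, including port")
    parser.add_argument("--client-cert", required=True,
                        help="Client certificate used for the GUI/API")
    parser.add_argument("--client-pass", required=True,
                        help="Password for the client certificate")
    parser.add_argument("--ca-bundle", required=True,
                        help="PEM file with the NiFi server's certificate chain")
    return parser
```

argparse converts the dashes in option names to underscores, so the values are read as args.save_path, args.nifi_address, and so on. Passing the certificate password on the command line leaves it in shell history, so an environment variable or prompt may be safer in practice.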
Find the code on GitHub, and be sure to check and install from the requirements.txt.
Credit should be given to Sunile Manjee, whose posts here and here were critical for putting this together. Thanks also to Erik Bern, who posted a great gist on using pfx with requests.