Google Analytics to Logstash and Elasticsearch - The Quick and Dirty Way

Customer-focused Google Analytics data can add some interesting variety to more operational dashboards. You can use a modified version of Google’s own Python example to quickly get the data into Logstash and then Elasticsearch, ready for display in your chosen dashboard package.

The Problem

Google Analytics (GA) is great, but its native interface isn’t well suited to displaying on an office monitor. I’ve recently been working on a Logstash –> Elasticsearch –> Grafana solution for public dashboards. Since keeping a consistent look and feel to these things is important, the hack of tab-rotating to an open Google Analytics page wasn’t going to last! I won’t pretend this solution scales particularly well, but as a proof of concept or for low-volume use it works adequately. If you’re after something more fully featured, there are open source plugins available (but I have an aversion to gem build etc. when it’s only a simple use case).

The Solution

Example Python client here, example Logstash configuration here.

First, follow all of Google’s steps to get your GA account ready for use via the API. Run the vanilla HelloAnalytics.py too, to make sure it all works. You’ll also need to obtain the correct view ID for your GA account and environment.
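For reference, the core of HelloAnalytics.py builds an Analytics Reporting API v4 service object from a service account key file; something like the sketch below, where the key file path and view ID are placeholders you’ll need to fill in with your own values:

```python
from apiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials

SCOPES = ['https://www.googleapis.com/auth/analytics.readonly']
KEY_FILE_LOCATION = 'client_secrets.json'  # your service account key file
VIEW_ID = 'XXXXXXXX'                       # your GA view ID

def initialize_analyticsreporting():
    """Build an authorised Analytics Reporting API v4 service object."""
    credentials = ServiceAccountCredentials.from_json_keyfile_name(
        KEY_FILE_LOCATION, SCOPES)
    return build('analyticsreporting', 'v4', credentials=credentials)
```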

Next, scrap the print_response function and use json.dumps to print the response straight to the console as JSON. Make sure the JSON GA response is the only thing your Python client prints, otherwise Logstash won’t be happy.
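A minimal sketch of that change, assuming the initialize_analyticsreporting and get_report functions from Google’s example are still in place:

```python
import json

def main():
    analytics = initialize_analyticsreporting()
    response = get_report(analytics)
    # Print nothing but the raw JSON response; any extra output
    # would end up in the events Logstash creates.
    print(json.dumps(response))

if __name__ == '__main__':
    main()
```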

If you’re using the ga:nthDay dimension, you’ll also want to convert the 0000 codes into actual timestamps. The example Python client does this by grabbing the day code and calculating an offset from now. Remember the nthDay dimension starts at 0000 for your date range, so if you have a date range of 30daysAgo - today, you need to subtract the nthDay code from 30, then subtract that many days from today. That way 0030 = today. In the example I’ve hard-coded a time of 0600 out of laziness, because I know this data will be consumed in day-long buckets (so the hour doesn’t matter).
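That conversion might look something like the sketch below; the function name and the 30-day constant are illustrative, and the constant should mirror whatever date range your report actually requests:

```python
from datetime import datetime, timedelta

RANGE_DAYS = 30  # matches a '30daysAgo' - 'today' date range

def nth_day_to_timestamp(nth_day):
    """Convert a ga:nthDay code such as '0007' into a timestamp.

    With a 30daysAgo - today range, code 0000 is 30 days ago and
    0030 is today, so the offset from today is RANGE_DAYS minus
    the code.
    """
    offset = RANGE_DAYS - int(nth_day)
    day = datetime.now() - timedelta(days=offset)
    # Hard-code 06:00 since the data is consumed in day-long buckets.
    return day.replace(hour=6, minute=0, second=0, microsecond=0)
```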

Now it’s just a case of getting Logstash to run the code and consume the output. The example configuration uses the exec input plugin to run the Python script, reads the output JSON and creates documents based on it. Fields are pulled from the JSON purely by their position in the array, which is a tripping point if you change the Python client. The date filter would also need updating if you decided to use a different timestamp format (but that data is so short-lived, I would wonder why!).
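As a rough outline, the configuration follows the shape below; the paths, field names and index name here are assumptions, so adjust them to your own setup and to the JSON your client emits:

```
input {
  exec {
    # Run the Python client every hour; its stdout becomes the event.
    command => "/usr/bin/python /opt/scripts/ga_client.py"
    interval => 3600
  }
}
filter {
  json {
    source => "message"
  }
  date {
    # Must match the timestamp format the Python client emits.
    match => [ "timestamp", "yyyy-MM-dd HH:mm:ss" ]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "google-analytics"
    # Keying documents on the day means each hourly run updates
    # today's document rather than creating a new one.
    document_id => "%{day}"
  }
}
```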

In Summary

Using the example Python client and Logstash configuration, a single document is created in Elasticsearch for each day, showing the total number of sessions. It even updates today’s document with the latest figure each hour. Taking care to timestamp the documents with their related day also makes graphing in Grafana or Kibana a lot easier. I would encourage you to use the Dimensions & Metrics Explorer to see what interesting data you could pull.