I've been playing around with data exports from my Equilab app. Equilab helps track my horse rides and provides a pretty good user interface for exploring the data, but I obviously needed an excuse to build a data pipeline out of some Jupyter Notebooks.

Before we get started, here's a photo of the pretty boys Brandi and I ride:

Now that that is out of the way, here is the problem I'm running into. When I view the rides in Maps in Elastic, it truncates the data to 10,000 data points. Most rides have fewer than 10,000 points, but as we take longer and longer rides I see this becoming more and more of a problem.

No, we didn't stop before completing a loop; my data is truncated!

As is expected with Elastic, I have multiple options for fixing this problem, but the approach I am focusing on today leverages the relatively new Time Series Data Streams (TSDS). My understanding is that if I implement this same data and dashboard with a TSDS, the resolution of the data will automatically scale to keep the number of data points displayed under 10,000.

I'm following the instructions here on setting up a TSDS. We'll see how it goes.

My entire data pipeline for the Equilab data is in Jupyter, so it only feels right that my Elasticsearch setup should also be in Jupyter. Here's the notebook that does the TSDS setup for me. Specifically, this notebook does several things:

  • creates the index lifecycle policy
  • creates the Equilab component template mappings
  • creates the Equilab component template settings
  • finally, puts it all together in an index template.
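To make those four steps concrete, here is a minimal sketch of what the request bodies might look like. This is not the contents of my notebook: the policy name, template names, field names (`ride_id`, `location`), and retention values are all placeholders I made up for illustration. It assumes the 8.x `elasticsearch` Python client, and the calls in `apply_tsds_setup` are how I understand that client's API, so check them against the version you have installed.

```python
# All names and values here (policy name, field names, retention periods)
# are illustrative assumptions, not copied from the actual notebook.
POLICY_NAME = "equilab-policy"
DATA_STREAM_PATTERN = "equilab-rides*"

# 1. Index lifecycle policy: roll over periodically, delete old data.
ilm_policy = {
    "phases": {
        "hot": {"actions": {"rollover": {"max_age": "30d"}}},
        "delete": {"min_age": "365d", "actions": {"delete": {}}},
    }
}

# 2. Component template with mappings. A TSDS needs at least one
#    dimension field; a hypothetical ride_id plays that role here.
mappings_template = {
    "mappings": {
        "properties": {
            "@timestamp": {"type": "date"},
            "ride_id": {"type": "keyword", "time_series_dimension": True},
            "location": {"type": "geo_point"},
        }
    }
}

# 3. Component template with settings. index.mode = time_series is
#    what turns a regular data stream into a TSDS.
settings_template = {
    "settings": {
        "index.mode": "time_series",
        "index.routing_path": ["ride_id"],
        "index.lifecycle.name": POLICY_NAME,
    }
}

# 4. Index template that ties the pieces together.
index_template = {
    "index_patterns": [DATA_STREAM_PATTERN],
    "data_stream": {},
    "composed_of": ["equilab-mappings", "equilab-settings"],
    "priority": 500,
}


def apply_tsds_setup(client):
    """Send the four requests using an elasticsearch-py 8.x style client."""
    client.ilm.put_lifecycle(name=POLICY_NAME, policy=ilm_policy)
    client.cluster.put_component_template(
        name="equilab-mappings", template=mappings_template
    )
    client.cluster.put_component_template(
        name="equilab-settings", template=settings_template
    )
    client.indices.put_index_template(name="equilab-rides", **index_template)
```

Keeping the bodies as plain dictionaries makes them easy to inspect in a notebook cell before anything is sent to the cluster.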

In order to write to a data stream, I had to make one small modification in the notebook that uploads data to Elastic. Specifically, I had to change the _op_type from index to create. The notebook that does the actual inserting can be found here.
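Here is a rough sketch of what that change looks like when building bulk actions. The function name, the data stream name `equilab-rides`, and the document fields are placeholders for illustration, not the code from my notebook.

```python
from typing import Any, Dict, Iterable, Iterator


def to_create_actions(
    points: Iterable[Dict[str, Any]],
    data_stream: str = "equilab-rides",  # hypothetical data stream name
) -> Iterator[Dict[str, Any]]:
    """Wrap each ride data point in a bulk action that uses op type 'create'.

    Data streams are append-only, so bulk actions must use 'create'
    rather than the default 'index'.
    """
    for point in points:
        if "@timestamp" not in point:
            # Every document in a data stream needs an @timestamp field.
            raise ValueError("each data point needs an '@timestamp' field")
        yield {
            "_op_type": "create",
            "_index": data_stream,
            "_source": point,
        }


# Example data point with made-up values.
sample_points = [
    {
        "@timestamp": "2023-05-01T10:00:00Z",
        "ride_id": "ride-001",
        "location": {"lat": 40.0, "lon": -105.0},
    }
]
actions = list(to_create_actions(sample_points))

# With a connected client, the actions could then be sent with the
# bulk helper from the elasticsearch package:
#   from elasticsearch.helpers import bulk
#   bulk(client, to_create_actions(sample_points))
```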

And here is the final result. In this first view I am zoomed out, so some of the finer details of the ride are missing. Elastic is simplifying the path to keep the number of data points under 10,000. In fact, in the screenshot below I have it set to simplify down to 100 points per track.

But I always have the ability to turn that number up to 10,000:

To me, this is a very impressive feature of Time Series Data Streams. Although this is a fun example from one of my hobbies, the same method could be applied to many different types of map data. Imagine tracking fleet vehicles, flights, and more!

Have you worked with Time Series Data Streams in Elastic yet? I'd love to hear about your experience. Feel free to reach out!