I have quite the history working with chatbots and data. The growth around AI and NLP is amazing. Elastic, where I spend my 9-5, is all abuzz with GenAI, LLMs, Vector Search, and so much more!

I have a lot of familiarity, dated by now, with building chatbots and intelligent agents. When I did this many years ago, it was on platforms like Rasa NLU. I was even part of a team creating an intelligent agent platform called Articulate.

I have always dreamed of chatting with my data. All data consumers are looking for answers, and queries and dashboards are an inefficient means to that end. That level of functionality is more within reach than ever before.

I spent some time building a fragile agent that leverages OpenAI's gpt-3.5-turbo model. My goal: answer questions on top of 3 sample datasets in Elasticsearch. With how I have it set up, it is capable of answering questions on more datasets, but I had to choose a starting point.

The End Product

Starting with the end (the end product, that is):

My solution is a Streamlit app. Streamlit provides a great starting point for building a conversational UI. You'll also notice I am using expanders to hide and show pertinent information. Each chat bubble includes the full prompt sent to the OpenAI API and the agent's "thoughts".
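
If you haven't tried it, Streamlit's chat primitives make this shape of UI almost trivial. Here is a minimal sketch of the layout; the message keys and expander labels are my own for illustration, not the app's actual code:

import streamlit as st

st.title("Chat with your Elasticsearch data")

# Keep the conversation in session state so it survives Streamlit reruns.
if "messages" not in st.session_state:
    st.session_state.messages = []

# Render the history; assistant bubbles get expanders for the full
# prompt sent to the OpenAI API and the agent's "thoughts".
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.write(msg["content"])
        if msg["role"] == "assistant":
            with st.expander("Full prompt"):
                st.code(msg["prompt"])
            with st.expander("Thoughts"):
                st.write(msg["thought"])

if user_input := st.chat_input("Ask a question about your data"):
    st.session_state.messages.append({"role": "user", "content": user_input})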

I chose not to leverage a framework like LangChain, more from a masochistic need to experience what it was like to build it myself than anything else. But I do borrow some concepts from it.

For example, when the user asks, "How many 404 errors were there in the last year?", that is not the entirety of what gets passed to the OpenAI API. In fact, it is quite a bit longer. Here is the full prompt:


You are a Data Engineer working to answer questions from data stored in Elasticsearch. The query you are trying to answer is:

USER: How many 404 errors were there in the last year?

To answer this query, you may use two tools:

  • general - This is the tool you will use when replying to a user with your general knowledge and not needing to query Elasticsearch. It can be used for greetings and general questions.
  • query - This is the tool you should use when you need to query Elastic before replying to the user.

When querying Elastic you have access to the following datasets:

  • flights - This dataset contains flight information between two airports.
  • logs - This dataset can be used to answer questions like "How many 401 errors have there been in the last year?" or "Which operating system had the most errors?"
  • ecommerce - This dataset can be used to answer business related ecommerce questions such as "Which customer has purchased the most products?" or "Which day of the week do we sell the most on?"

If you do not know the answer to a question with your general knowledge and don't believe any dataset will answer the user's query, that is okay! Just tell the user you don't believe you can answer their query; this is better than giving a fake or wrong answer.

Your reply must be in the following format with no deviations. Your reply is parsed and must be consistent. Here are some examples:

THOUGHT | To answer the user I can use my general knowledge.

TOOL | general

DATASET | none

RESPONSE | Hello, how can I help you today?

Here is another example where we must first query Elasticsearch.

THOUGHT | I need to query the flights dataset in Elastic before replying to the user.

TOOL | query

DATASET | flights

RESPONSE | Let me query the flights dataset to get that information for you.


The tools and datasets are dynamically inserted, as is the user's query. This is one of three "prompts" used over the course of responding to a user.
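
To make that concrete, here is roughly how that assembly might look. This is an illustrative sketch, not my actual implementation; the template is abbreviated and the names are mine:

TOOLS = {
    "general": "This is the tool you will use when replying to a user with your general knowledge...",
    "query": "This is the tool you should use when you need to query Elastic before replying...",
}
DATASETS = {
    "flights": "This dataset contains flight information between two airports.",
    "logs": "This dataset can be used to answer questions like...",
    "ecommerce": "This dataset can be used to answer business related ecommerce questions...",
}
PROMPT_TEMPLATE = (
    "You are a Data Engineer working to answer questions from data stored in Elasticsearch. "
    "The query you are trying to answer is:\n\nUSER: {user_query}\n\n"
    "To answer this query, you may use two tools:\n\n{tools}\n\n"
    "When querying Elastic you have access to the following datasets:\n\n{datasets}\n..."
)

def build_router_prompt(user_query: str) -> str:
    # Render the tool and dataset descriptions as bullet lines,
    # then drop them and the user's query into the template.
    tools = "\n".join(f"  * {name} - {desc}" for name, desc in TOOLS.items())
    datasets = "\n".join(f"  * {name} - {desc}" for name, desc in DATASETS.items())
    return PROMPT_TEMPLATE.format(user_query=user_query, tools=tools, datasets=datasets)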

For the most part, answering a question requires three steps:

  1. Identify that we need to query Elastic
  2. Generate an Elasticsearch query that answers the user's question
  3. Interpret the result to generate a response
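
Stitched together, a single turn ends up looking something like the sketch below. Everything here is simplified and assumes the pipe-delimited format above; build_router_prompt comes from the earlier sketch, and the other prompt builders, the model call, and the Elasticsearch setup are stand-ins rather than my actual code:

import json
from elasticsearch import Elasticsearch
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
es = Elasticsearch("http://localhost:9200")

def chat(prompt: str) -> str:
    # One round trip to the OpenAI API.
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "system", "content": prompt}],
    )
    return resp.choices[0].message.content

def parse_reply(text: str) -> dict:
    # Parse the pipe-delimited THOUGHT / TOOL / DATASET / RESPONSE reply.
    fields = {}
    for line in text.splitlines():
        if " | " in line:
            key, value = line.split(" | ", 1)
            fields[key.strip()] = value.strip()
    return fields

def build_query_prompt(user_query: str, dataset: str) -> str:
    return f"Write an Elasticsearch query (JSON only) against the '{dataset}' index to answer: {user_query}"

def build_interpret_prompt(user_query: str, result) -> str:
    return f"Given this Elasticsearch result:\n{result}\n\nAnswer the user's question: {user_query}"

def answer(user_query: str) -> str:
    # Step 1: decide whether we need to query Elastic at all.
    reply = parse_reply(chat(build_router_prompt(user_query)))
    if reply.get("TOOL") == "general":
        return reply.get("RESPONSE", "")
    # Step 2: have the model generate a query for the chosen dataset.
    es_query = chat(build_query_prompt(user_query, reply["DATASET"]))
    result = es.search(index=reply["DATASET"], body=json.loads(es_query))
    # Step 3: have the model interpret the raw result into an answer.
    return chat(build_interpret_prompt(user_query, result))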

In the end my agent is simple, fragile, and in need of a lot of love. Building this example by hand helped me appreciate the level of thought behind LangChain. It gave me a better understanding of what it takes to get it to the level it is at, and even more so of what it would take to build it to completion.

Here are a few observations:

  • There is no "forcing" these models to respond in a certain way. To an extent it's like asking a child to clean their room: they may do it the first time you ask, and the second time you ask, then forget their manners the third time. You may have to supervise them; you may have to do it for them!
  • Error handling is an interesting problem in and of itself. This first pass has almost no error handling in it at all! But I do see a potential path forward: catch errors (likely caused by response parsing), reply to the model in the system context to reinforce some piece of the prompt, and try again (see the sketch after this list). I will likely explore this piece more in the future.
  • Speaking in traditional chatbot language: intent and entity recognition may be less of a concern, handled by the "magic" of the model. But the same problems around conversational flow, circular loops, etc. are all still problems.
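
For what it's worth, here is the rough shape of that retry idea, building on the chat client and parse_reply helper sketched above. This is entirely hypothetical; nothing like it exists in the agent today:

def chat_with_retry(messages: list, max_attempts: int = 3) -> dict:
    for _ in range(max_attempts):
        resp = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
        text = resp.choices[0].message.content
        fields = parse_reply(text)
        # A reply is usable only if every expected field parsed out.
        if {"THOUGHT", "TOOL", "DATASET", "RESPONSE"} <= fields.keys():
            return fields
        # Feed the failure back in the system context and try again.
        messages = messages + [
            {"role": "assistant", "content": text},
            {"role": "system", "content": (
                "Your last reply did not follow the required "
                "THOUGHT | TOOL | DATASET | RESPONSE format. "
                "Reply again, following the format exactly."
            )},
        ]
    raise ValueError("Model never produced a parseable reply")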

Have you built something with Streamlit or OpenAI recently? I'd love to hear more about it and share experiences!