Guide: Adding RAG to your Slack-bot
If you watched or read the previous part of this series, you'll have built your first Slack-bot. Unfortunately, it was pretty basic: it simply channelled Slack messages into OpenAI, which isn't very useful unless you just want to internalise OpenAI access into Slack for your team.
Today, we're going to make our bot a little smarter by having it automatically use something called Retrieval Augmented Generation (RAG) in its responses, and by having it show us the references it used to derive each response.
To quickly recap: Retrieval Augmented Generation is the practice of finding textual content relevant to a user's input query, and then adding that content to the prompt.
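If that sounds a little abstract, here's a minimal sketch of the idea in Python. The `retrieve_passages` function is a hypothetical stand-in for whatever does the actual search (in our case, a vector database); here it just returns canned text so the example runs:

```python
def retrieve_passages(query: str, top_k: int = 3) -> list[str]:
    # Stand-in for a real vector-database search; returns canned passages
    # so the sketch runs end to end.
    passages = [
        "Wednesday is our grumpy Slack-bot.",
        "RAG adds retrieved text to the prompt before the LLM sees it.",
    ]
    return passages[:top_k]

def build_rag_prompt(query: str) -> str:
    # The core RAG move: prepend the retrieved content to the user's query.
    context = "\n\n".join(retrieve_passages(query))
    return (
        "Answer the question using the context below, and say which "
        "passage you used.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

print(build_rag_prompt("Who is Wednesday?"))
```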
One thing LLMs really struggle with is context length, which constrains the amount of input an LLM can handle and the amount of output it can produce. Many strategies exist to manage this, and it's why we need to be careful about how much retrieved content we put into the context window when building our bots with RAG.
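To make that concrete, here's one common strategy sketched in Python: trim the retrieved passages to a fixed token budget before they go into the prompt. The budget figure is invented for the example, and we're using OpenAI's tiktoken library to count tokens:

```python
import tiktoken  # pip install tiktoken

MAX_CONTEXT_TOKENS = 2000  # invented budget for retrieved text

def trim_to_budget(passages: list[str]) -> list[str]:
    # cl100k_base is the tokeniser used by most current OpenAI models.
    enc = tiktoken.get_encoding("cl100k_base")
    kept, used = [], 0
    for passage in passages:  # passages arrive best-match first
        cost = len(enc.encode(passage))
        if used + cost > MAX_CONTEXT_TOKENS:
            break  # anything past this point won't fit in the budget
        kept.append(passage)
        used += cost
    return kept
```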
So let's get started. The first step is to go to the bot we've already created in the previous part of this series: our grumpy Wednesday bot.
We're also going to assume you've watched or read our other guide on how to create a Text Collection, because you'll need one to make this work. If you haven't yet, go and do so, then come back…
Step 1: Configure the RAG settings
To enable RAG, browse to the Bots list, select our previously created bot, and click "Edit".
In the Edit view, scroll down to the “RAG Settings” card, where we need to do three things:
- Assign the vector database to use:
Local ChromaDB
- Assign the embed configuration to encode the query for this database:
OpenAI Default
- Set the namespace we want to search when we perform the RAG operation:
bertrand-001
To explain these settings in more detail: the Embed Settings determine the embedding model used to encode our query before it is submitted to the vector database's query endpoint.
Embeddings are just vectors; how those vectors are valued and arranged, and how many dimensions they have, differs from model to model. To get accurate results, the model used to encode the query must match the one used to build the Text Collection.
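To make it concrete, this is roughly the query Montag runs on our behalf with the settings above, sketched with ChromaDB's Python client. Note the assumptions: the storage path is made up, and we're guessing that “OpenAI Default” maps to the text-embedding-ada-002 model:

```python
import chromadb
from chromadb.utils import embedding_functions

# The query must be encoded with the same model that built the collection,
# otherwise the vectors live in different spaces and results are meaningless.
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="sk-...",                     # your OpenAI API key
    model_name="text-embedding-ada-002",  # our guess at "OpenAI Default"
)

client = chromadb.PersistentClient(path="./chroma")  # "Local ChromaDB"
collection = client.get_collection(
    name="bertrand-001",                  # the namespace we configured
    embedding_function=openai_ef,
)

results = collection.query(
    query_texts=["What did Bertrand say about logic?"],
    n_results=3,
)
print(results["documents"])
```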
Why not just let me select a Text Collection?
In production scenarios you may have pre-existing text collections that were not created via Montag and are not intended to be managed by it. To allow searching of those pre-existing collections, we let you specify the embedding model to use when encoding the query.
The Assigned Namespace is the data set that should be searched in the database; you can have multiple namespaces in the same database. More importantly, namespaces are pre-configured with a privacy tier, and that tier can constrain which LLMs your bot can work with.
For example, Wednesday uses OpenAI as its model vendor, and so is considered suitable only for sending public-tier information. If we selected a namespace containing confidential or PII data, we'd get an error when we tried to update Wednesday.
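Conceptually, the check looks something like the sketch below. The tier names and their ordering are invented for illustration; Montag's actual schema will differ:

```python
# Hypothetical tiers, lowest to highest sensitivity.
TIER_ORDER = {"public": 0, "internal": 1, "confidential": 2, "pii": 3}

def validate_bot(llm_clearance: str, namespace_tiers: list[str]) -> None:
    # Every data source's tier must sit at or below the LLM's clearance.
    for tier in namespace_tiers:
        if TIER_ORDER[tier] > TIER_ORDER[llm_clearance]:
            raise ValueError(
                f"namespace tier '{tier}' exceeds LLM clearance '{llm_clearance}'"
            )

validate_bot("public", ["public"])  # fine: Wednesday with public-tier data
try:
    validate_bot("public", ["confidential"])
except ValueError as err:
    print(err)  # the kind of error you'd hit when saving the bot
```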
There is a fourth option in the RAG Settings: Resource Expanders. These are regex triggers that run a script to fetch additional data for the prompt. For example, if you want your bot to be able to access your internal wiki, you can write an expander that triggers when it sees a wiki link, extracts the page contents, and embeds them into the prompt.
Resource expanders also come with privacy tiers attached, so depending on the privacy-tier mix of the data sources your bot uses, you'll need to select a back-end LLM whose privacy clearance covers the highest tier in that mix.
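To give you a feel for the shape of an expander, here's a hedged sketch in Python. The wiki URL pattern, the fetch logic, and the truncation are all assumptions for the example, not Montag's actual expander API:

```python
import re
import requests  # pip install requests

# Trigger: any link to our (hypothetical) internal wiki.
WIKI_LINK = re.compile(r"https://wiki\.example\.com/\S+")

def expand_wiki_links(message: str) -> str:
    """If the message contains wiki links, pull their contents into the prompt."""
    extra = []
    for url in WIKI_LINK.findall(message):
        page = requests.get(url, timeout=10)
        extra.append(f"Contents of {url}:\n{page.text[:2000]}")  # crude truncation
    if not extra:
        return message
    return message + "\n\nAdditional context:\n" + "\n\n".join(extra)
```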
Step 2: Save and restart
Click Save, and then start (or restart) the bot in the list view.
Step 3: Ask your query
When you next run a query against Wednesday, you'll see that the response now includes references to whatever data source you selected.