Vector Databases

What is a vector database?

A vector database is a type of database that stores data as high-dimensional vectors, which are mathematical representations of features or attributes. Each vector has a certain number of dimensions, which can range from tens to thousands, depending on the complexity and granularity of the data.

Vector databases are useful for contextual search: given an input query, they find the stored content whose vectors are most similar to it.
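To make "highly related" concrete, here is a minimal sketch of similarity search in plain Python. It assumes cosine similarity as the measure of relatedness, which is a common choice but not necessarily the one your index is configured with. The document ids and the three-dimensional vectors are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)

def nearest(query, stored, top_k=2):
    """Return the top_k (id, score) pairs, most similar first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy 3-dimensional vectors; real embeddings have hundreds or thousands of dimensions.
stored = {
    "doc-cats": [0.9, 0.1, 0.0],
    "doc-dogs": [0.8, 0.3, 0.1],
    "doc-tax": [0.0, 0.1, 0.9],
}
results = nearest([1.0, 0.2, 0.0], stored)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

A real vector database does the same ranking, but uses an index structure so that it does not have to compare the query against every stored vector.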

How does Montag use them?

Montag will use vector databases in two cases:

  1. To “learn” a corpus of content to be used by Bots or AI Functions
  2. To query for related content when a prompt is created

Both use cases enable Retrieval Augmented Generation (RAG), a way to increase the effectiveness of your LLM prompts by injecting relevant data alongside the initial prompt, giving the bot one-shot, domain-specific context.
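The injection step of RAG can be sketched as follows. This is an illustration only: the function name build_rag_prompt, the prompt wording, and the sample chunks are hypothetical and are not taken from Montag. In practice the chunks would come from a vector database query.

```python
def build_rag_prompt(question, retrieved_chunks):
    """Place retrieved text alongside the user's initial prompt."""
    # Number each chunk so the model (and a human reader) can tell them apart.
    context = "\n\n".join(
        f"[{i + 1}] {chunk}" for i, chunk in enumerate(retrieved_chunks)
    )
    return (
        "Use the context below to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# Stand-in for the results of a vector database query (hypothetical content).
chunks = [
    "Passwords can be reset from the account settings page.",
    "Password reset links expire after 24 hours.",
]
prompt = build_rag_prompt("How do I reset my password?", chunks)
print(prompt)
```

The model never sees the whole corpus, only the few chunks that the similarity search judged most relevant to the question.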

There are many vector databases, but Montag currently supports only Pinecone and Chroma DB.

What does “dimensions” mean in the context of vector indexes?

You will come across dimensionality when setting up a new index in Pinecone. Each embedding vectoriser generates embedding vectors with a fixed number of dimensions; check the embedding model's documentation for the exact number. The number of dimensions must match between the embedding model and the vector database in order to store and query content.
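A small guard like the one below can catch a dimension mismatch before anything is sent to the vector database. It is a sketch under assumptions: the value 1536 and the name validate_embedding are illustrative, so substitute the number from your embedding model's documentation.

```python
# Illustrative value; use the number your embedding model actually produces.
INDEX_DIMENSIONS = 1536

def validate_embedding(vector, index_dimensions=INDEX_DIMENSIONS):
    """Return the vector unchanged, or raise if its length does not match the index."""
    if len(vector) != index_dimensions:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, index expects {index_dimensions}"
        )
    return vector

# A vector of the right length passes through.
ok = validate_embedding([0.0] * 1536)

# A vector from a different model (here, 768 dimensions) is rejected.
try:
    validate_embedding([0.0] * 768)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)

print(mismatch_error)
```

The same rule applies if you switch embedding models later: a model with a different number of dimensions needs a new index.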