LLM Prompts

What are Prompts?

A prompt is what is ultimately fed into an LLM completion request. Prompts can vary in length and usually comprise multiple sections (an assembled example follows the list):

  1. The instruction: This sets the “tone” of the bot. A simple instruction would be something like: “You are a helpful AI assistant that provides helpful, friendly and concise responses”.
  2. History (optional): If this is a chat-bot, you may want to include the conversation history between the user and the bot for additional context; this can make the bot more effective in conversational tasks.
  3. Context (optional): To better inform the LLM, you may want to include information specific to the user's main input or question. This context comes from a vector query of the user input against a vector database and is then embedded into the prompt.
  4. User input: The actual question the user is asking
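
Putting these sections together, a fully assembled prompt might look something like this (the contents are purely illustrative):

You are a helpful AI assistant that provides helpful, friendly and concise responses.

user: What can you do?
assistant: I can answer questions about your documents.

Use the following context to help with your response:
Montag runs a prompt pipeline with pre- and post-processing scripts.

user: How do I change the prompt template?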

How does Montag use Prompts?

Montag runs a pipeline of prompt-processing steps whenever user input comes into the system. The process looks something like this (a condensed Go sketch follows the list):

  1. User provides initial prompt

  2. Prompt is “expanded”: Montag will check for commands, such as “help” or “reset”, which interrupt the pipeline and perform a specific action on the bot itself

  3. Prompt is scanned for hyperlinks: if one is found, it will be queried, extracted, compressed and included in the prompt

  4. Prompt pre-processing script is executed: this script, specified at setup, can modify the raw prompt in some way before it is passed to the model

  5. Prompt pre-generation:

    1. Context retrieval
    2. Length checking
    3. History storage
  6. Prompt rendering: the prompt template is rendered with all the variables it requires:

    1. Instructions
    2. History
    3. Context
    4. Prompt
  7. The prompt is sent to the model

  8. Post-processing scripts are run on the response

  9. The response is recorded to the history

  10. The response is sent to the user
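
The sketch below condenses the pipeline into a single Go function. Every name in it (expandHyperlinks, runPreProcessScript and so on) is an illustrative assumption, not Montag's actual API:

package main

import (
	"fmt"
	"strings"
)

// Stubbed-out stages: assumptions for illustration only.
func expandHyperlinks(s string) string     { return s }              // 3. fetch, extract, compress links
func runPreProcessScript(s string) string  { return s }              // 4. setup-time script hook
func retrieveContexts(s string) []string   { return nil }            // 5a. vector-store query
func callModel(prompt string) string       { return "model output" } // 7. completion request
func runPostProcessScript(s string) string { return s }              // 8. post-processing

var history []string // rolling conversation window

func handlePrompt(input string) string {
	// 2. Expansion: commands interrupt the pipeline entirely.
	switch strings.TrimSpace(input) {
	case "help":
		return "help text"
	case "reset":
		history = nil
		return "conversation reset"
	}

	// 3-4. Hyperlink scan, then the pre-processing script.
	input = runPreProcessScript(expandHyperlinks(input))

	// 5. Pre-generation: context retrieval and history storage
	// (length checking omitted for brevity).
	contexts := retrieveContexts(input)
	history = append(history, "user: "+input)

	// 6-7. Render the prompt template and send it to the model.
	prompt := fmt.Sprintf("INSTRUCTIONS\n...\ncontexts: %v\n\nuser: %s", contexts, input)
	response := callModel(prompt)

	// 8-10. Post-process, record to history, return to the user.
	response = runPostProcessScript(response)
	history = append(history, "assistant: "+response)
	return response
}

func main() {
	fmt.Println(handlePrompt("What is Montag?"))
}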

What are the various Prompt options?

Instructions: The instructions to the bot

Help Text: The help text to return when a user sends the @bot help command

Conversation Window: The number of conversation entries to record. This is a rolling window, so think of it as the bot's short-term memory. Note that history is automatically embedded into OpenAI prompts, as OpenAI's API request has a data structure to hold this data; for self-hosted LLMs, however, the {{.History}} key/value map should be iterated in the prompt template.
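
The exact shape of the history map is not specified here; assuming it maps a role or turn to a message, the iteration could look like this:

{{ range $role, $msg := .History }}
{{$role}}: {{$msg}}
{{ end }}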

Prompt Template: The template to execute and send to the model, for example:

INSTRUCTIONS
{{.Instructions}}
{{ if .ContextToRender }}Use the following context to help with your response:
{{ range $ctx := .ContextToRender }}
{{$ctx}}
{{ end }}{{ end }}
====

{{.ModelRoles.User}}: {{.Body}}

Variables available to the prompt template:

  • .Instructions: The instructions to the bot
  • .ContextToRender: The contexts to render (if any)
  • .ModelRoles.User: The user role
  • .Body: The user input

The example template above uses all of the available variables.
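
As a self-contained illustration, the following Go program renders the template above with the standard text/template package. The struct shapes are assumptions made for this example, not Montag's actual types:

package main

import (
	"os"
	"text/template"
)

type ModelRoles struct{ User, Assistant string }

// PromptData carries the variables the template expects.
type PromptData struct {
	Instructions    string
	ContextToRender []string
	ModelRoles      ModelRoles
	Body            string
}

const promptTmpl = `INSTRUCTIONS
{{.Instructions}}
{{ if .ContextToRender }}Use the following context to help with your response:
{{ range $ctx := .ContextToRender }}
{{$ctx}}
{{ end }}{{ end }}
====

{{.ModelRoles.User}}: {{.Body}}`

func main() {
	data := PromptData{
		Instructions:    "You are a helpful AI assistant.",
		ContextToRender: []string{"Montag supports pre- and post-processing scripts."},
		ModelRoles:      ModelRoles{User: "user"},
		Body:            "How do I configure the prompt template?",
	}
	tmpl := template.Must(template.New("prompt").Parse(promptTmpl))
	if err := tmpl.Execute(os.Stdout, data); err != nil {
		panic(err)
	}
}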

Response Template: The template to structure the response to the user, for example:

<@{{.User}}> {{.Response}}


*References:*{{ range .Titles }}
> {{.}}{{end}}
> (contexts: {{.Contexts}}, history: {{.History}})

Variables available to a response template (the example above uses all of them; a rendered example follows the list):

  • .Response: The response from the model
  • .Titles: A list of the contexts embedded into the prompt
  • .Contexts: The number of contexts returned
  • .History: The current size of the conversation window
  • .User: The user name (if available; with the Slack interface this is @username)
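
Filled in with illustrative values, the rendered response might look like this:

<@alice> Montag renders the prompt template and sends it to the model.

*References:*
> Montag pipeline documentation
> Prompt options
> (contexts: 2, history: 6)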

Number of Context Injections: The number of context chunks to embed in the prompt.

Context Min Score: The minimum relevance score a context must have to be included; this is a value between 0 and 1.
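
A sketch of how these two options could interact when selecting chunks (assumed logic, not Montag's actual implementation):

package main

import "fmt"

// ScoredChunk pairs a context chunk with its relevance score (0 to 1).
type ScoredChunk struct {
	Text  string
	Score float64
}

// selectContexts keeps at most maxInjections chunks whose score is at
// least minScore, assuming chunks arrive sorted by descending score,
// as vector stores typically return them.
func selectContexts(chunks []ScoredChunk, minScore float64, maxInjections int) []string {
	var out []string
	for _, c := range chunks {
		if len(out) == maxInjections {
			break
		}
		if c.Score >= minScore {
			out = append(out, c.Text)
		}
	}
	return out
}

func main() {
	chunks := []ScoredChunk{{"pipeline docs", 0.91}, {"prompt options", 0.82}, {"unrelated", 0.40}}
	fmt.Println(selectContexts(chunks, 0.75, 3)) // [pipeline docs prompt options]
}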