History Manager

📘 HistoryManager Documentation

The HistoryManager is a critical component responsible for managing the conversation history, tool call results, and references during the orchestration process. It ensures that the history provided to the LLM is optimized to fit within the token window constraints while maintaining a coherent and complete context for the conversation.

🔑 Key Responsibilities (Expanded)

1. Conversation History Management

Tracking and Storing History:
The HistoryManager maintains a detailed record of all user messages, assistant responses, and tool call results. This includes both the current conversation loop and any prior interactions stored in the database.
- User Messages: Captures the original user input and any rendered versions (e.g., processed with Jinja templates).
- Assistant Responses: Tracks the assistant's replies, including system-generated messages and tool call results.
- Tool Call Results: Logs the outputs of tools invoked during the conversation.
Combining Uploaded Content:
Uploaded files, such as documents or images, are integrated into the conversation history. This ensures that the LLM has access to all relevant context when generating responses.
- Uploaded content is processed and merged with the conversation history.
- A portion of the token window is reserved for this content, as configured in the UploadedContentConfig.
Unified View:
The HistoryManager creates a cohesive history by merging uploaded content, user messages, and assistant responses. This unified view is essential for providing the LLM with a complete context for generating accurate and relevant answers.

2. Token Window Optimization

Token Limit Awareness:
Each LLM has a fixed token limit for the input it can process in a single API call. The HistoryManager ensures that the conversation history fits within this limit by dynamically adjusting its size.
Dynamic Reduction with Loop Token Reducer:
The Loop Token Reducer is a specialized component used to reduce the size of the history dynamically. It prioritizes the most relevant references and messages while discarding less critical information.
- Reference Reduction: Limits the number of references included in the history to avoid exceeding the token window.
- Message Prioritization: Ensures that the most recent and relevant messages are retained.
Balancing Uploaded Content and History:
The HistoryManager allocates a portion of the token window for uploaded content and the remaining portion for conversation history. This balance is configurable and ensures that both types of information are represented effectively.
Optimization Goals:
- Maximize the amount of relevant information provided to the LLM.
- Ensure that the history remains coherent and complete, even after reduction.
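
The budgeting described above can be sketched roughly as follows. This is a minimal illustration, not the Loop Token Reducer's actual algorithm: `split_budget`, `trim_to_budget`, the `uploaded_share` parameter, and the whitespace-based token count are all assumptions made for the example. A real implementation would use the model's tokenizer and the values configured in UploadedContentConfig.

```python
def count_tokens(text: str) -> int:
    # Assumption: a whitespace word count stands in for a real tokenizer.
    return len(text.split())


def split_budget(token_limit: int, uploaded_share: float) -> tuple[int, int]:
    """Split the token window into (uploaded content, conversation history)."""
    uploaded_budget = int(token_limit * uploaded_share)
    return uploaded_budget, token_limit - uploaded_budget


def trim_to_budget(messages: list[str], budget: int) -> list[str]:
    """Keep the most recent messages whose combined size fits the budget."""
    kept: list[str] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


uploaded_budget, history_budget = split_budget(token_limit=12, uploaded_share=0.25)
history = [
    "first question here",
    "a long answer with many words",
    "latest question",
]
trimmed = trim_to_budget(history, history_budget)
print(uploaded_budget, history_budget, trimmed)
```

Dropping the oldest messages first is only one possible prioritization; it is used here because the text above states that the most recent messages are retained.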

3. Tool Call Integration

Appending Tool Call Queries:
When the orchestrator invokes tools, the HistoryManager appends the tool call queries to the history. This ensures that the LLM has a record of the tools it requested and their purposes.
Appending Tool Call Results:
After the tools return their results, the HistoryManager appends these outputs to the history. This includes:
- Successful Results: The content or references generated by the tool.
- Failed Results: Error messages indicating why the tool call failed.
Temporary and Persistent Storage:
- Tool call queries and results are temporarily stored during the current conversation loop.
- Persistent storage of tool call data is not yet implemented but is a planned improvement.
Context for Subsequent Interactions:
By integrating tool call queries and results into the history, the HistoryManager ensures that the LLM has the necessary context for follow-up interactions.
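
To illustrate the ordering this produces, the sketch below records one round of tool calls in a plain list. The `Message` dataclass and `record_tool_round` are hypothetical stand-ins for this example only and are not the library's message classes; the failure text mirrors the format used by `add_tool_call_results`.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Message:
    # Hypothetical stand-in for the real message classes.
    role: str
    content: str
    tool_call_id: Optional[str] = None


def record_tool_round(
    loop_history: list[Message],
    tool_calls: dict[str, str],
    results: dict[str, Union[str, Exception]],
) -> None:
    """Append the tool call queries first, then one tool message per result."""
    requested = ", ".join(f"{name}({call_id})" for call_id, name in tool_calls.items())
    loop_history.append(Message(role="assistant", content=f"tool calls: {requested}"))
    for call_id, name in tool_calls.items():
        outcome = results[call_id]
        if isinstance(outcome, Exception):
            # Failed calls are recorded as an error message, not dropped.
            content = f"Tool call {name} failed with error: {outcome}"
        else:
            content = outcome
        loop_history.append(Message(role="tool", content=content, tool_call_id=call_id))


loop_history: list[Message] = []
record_tool_round(
    loop_history,
    tool_calls={"call_1": "search", "call_2": "weather"},
    results={"call_1": "3 documents found", "call_2": TimeoutError("timed out")},
)
for message in loop_history:
    print(message.role, "|", message.content)
```

Keeping the query and every result (including failures) in sequence lets a follow-up model call see both what was requested and what came back.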

4. Post-Processing and Cleanup

Removing Unnecessary Content:
The HistoryManager handles post-processing steps to clean up the history. This includes removing content that is not directly relevant to the LLM's understanding of the conversation. Examples include:
- Follow-Up Questions: Generated by post-processors but not part of the LLM's output.
- Stock Tickers or Graphs: Added for user display but irrelevant to the LLM.
Using remove_from_text:
The remove_from_text function is used to strip out post-processed content from the history. This ensures that the LLM is not confused by content it did not generate.
Maintaining Coherence:
Post-processing ensures that the history remains coherent and focused on the conversation's core context. This improves the LLM's ability to generate accurate and relevant responses.
Dynamic Adjustments:
Post-processing is applied dynamically during each loop of the orchestrator, ensuring that the history is always optimized for the current interaction.
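
A cleanup function of this kind might look like the sketch below. The `<follow-up>` markers and the name `strip_follow_ups` are assumptions for illustration; the only detail taken from this document is that the function is async and maps a string to a string.

```python
import asyncio
import re

# Hypothetical markers that a post-processor might wrap around appended content.
FOLLOW_UP_BLOCK = re.compile(r"<follow-up>.*?</follow-up>", re.DOTALL)


async def strip_follow_ups(text: str) -> str:
    """Remove follow-up question blocks that the LLM did not write."""
    return FOLLOW_UP_BLOCK.sub("", text).strip()


answer = "Revenue grew 12%.\n<follow-up>\n- What drove the growth?\n</follow-up>"
cleaned = asyncio.run(strip_follow_ups(answer))
print(cleaned)
```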

🛠️ Key Functionalities

1. Adding Tool Call Results

add_tool_call_results(tool_call_results: list[ToolCallResponse])
Appends the results of tool calls to the history. If a tool call fails, an error message is added instead.

def add_tool_call_results(self, tool_call_results: list[ToolCallResponse]):
    for tool_response in tool_call_results:
        if not tool_response.successful:
            # Record the failure as a tool message so the LLM can see why the call failed.
            self._loop_history.append(
                LanguageModelToolMessage(
                    name=tool_response.name,
                    tool_call_id=tool_response.id,
                    content=f"Tool call {tool_response.name} failed with error: {tool_response.error_message}",
                )
            )
            continue
        # Successful results go through the regular history-append path.
        self._append_tool_call_result_to_history(tool_response)

_append_tool_call_result_to_history(tool_response: ToolCallResponse)
Adds a successful tool call result to the history.

def _append_tool_call_result_to_history(
    self,
    tool_response: ToolCallResponse,
) -> None:
    tool_call_result_for_history = self._get_tool_call_result_for_loop_history(
        tool_response=tool_response
    )
    self._loop_history.append(tool_call_result_for_history)

2. Retrieving History for Model Calls

get_history_for_model_call(...)
Retrieves the conversation history formatted for the LLM, ensuring it fits within the token window. This includes:

- The original user message.
- The rendered user message (processed via Jinja templates).
- The rendered system message.

The remove_from_text function is passed in to strip post-processing output and other artifacts that the LLM did not produce (for example, follow-up questions) from the messages, so that the LLM is not confused by text it did not write.

async def get_history_for_model_call(
    self,
    original_user_message: str,
    rendered_user_message_string: str,
    rendered_system_message_string: str,
    remove_from_text: Callable[[str], Awaitable[str]],
) -> LanguageModelMessages:
    messages = await self._token_reducer.get_history_for_model_call(
        original_user_message=original_user_message,
        rendered_user_message_string=rendered_user_message_string,
        rendered_system_message_string=rendered_system_message_string,
        loop_history=self._loop_history,
        remove_from_text=remove_from_text,
    )
    return messages
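
Because `remove_from_text` is a single `Callable[[str], Awaitable[str]]`, several cleanup steps have to be combined into one callable before they are handed over. The sketch below shows one way to do that; `compose_cleaners`, `drop_ticker_lines`, `collapse_blank_lines`, and the `[ticker]` prefix are hypothetical names invented for this example.

```python
import asyncio
from typing import Awaitable, Callable

TextCleaner = Callable[[str], Awaitable[str]]


def compose_cleaners(*cleaners: TextCleaner) -> TextCleaner:
    """Chain several async cleaners into one, applied left to right."""

    async def run_all(text: str) -> str:
        for cleaner in cleaners:
            text = await cleaner(text)
        return text

    return run_all


async def drop_ticker_lines(text: str) -> str:
    # Hypothetical cleaner: drops lines a post-processor prefixed with "[ticker]".
    return "\n".join(
        line for line in text.splitlines() if not line.startswith("[ticker]")
    )


async def collapse_blank_lines(text: str) -> str:
    # Hypothetical cleaner: removes empty lines left behind by earlier cleaners.
    return "\n".join(line for line in text.splitlines() if line.strip())


remove_from_text = compose_cleaners(drop_ticker_lines, collapse_blank_lines)
cleaned = asyncio.run(
    remove_from_text("Shares rose.\n\n[ticker] ACME 12.3\nOutlook is stable.")
)
print(cleaned)
```

The composed function has the same shape as the `remove_from_text` parameter, so it could be passed to `get_history_for_model_call` unchanged.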

3. Token Window Management

The Loop Token Reducer is used to dynamically adjust the size of the history to fit within the LLM's token limit. This involves:
- Reducing the number of references included in the history.
- Prioritizing the most relevant chunks and messages.
- Ensuring that the history remains coherent and complete.
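
As a rough illustration of reference reduction, the sketch below trims chunks until the total fits a budget. It is an assumed strategy, not the Loop Token Reducer's actual logic: `reduce_references`, the whitespace token count, and the rule of trimming the source with the most chunks first are all choices made for this example.

```python
def count_tokens(text: str) -> int:
    # Assumption: a whitespace word count stands in for the model's tokenizer.
    return len(text.split())


def reduce_references(sources: list[list[str]], budget: int) -> list[list[str]]:
    """Drop the lowest-ranked chunk of the largest source until the total fits.

    Each inner list holds one tool result's chunks, ordered most relevant first.
    """
    reduced = [list(chunks) for chunks in sources]  # leave the input untouched

    def total() -> int:
        return sum(count_tokens(chunk) for chunks in reduced for chunk in chunks)

    while total() > budget and any(reduced):
        largest = max(reduced, key=len)
        largest.pop()  # remove the least relevant remaining chunk
    return reduced


sources = [
    ["alpha beta gamma", "delta epsilon", "zeta"],
    ["eta theta", "iota"],
]
fitted = reduce_references(sources, budget=6)
print(fitted)
```

Trimming from the tail of each ranked list keeps the most relevant chunk of every source for as long as possible, which helps the remaining history stay coherent.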

4. Appending Tool Calls

_append_tool_calls_to_history(tool_calls: list[LanguageModelFunction])
Adds tool call queries to the history.

def _append_tool_calls_to_history(
    self, tool_calls: list[LanguageModelFunction]
) -> None:
    self._loop_history.append(
        LanguageModelAssistantMessage.from_functions(tool_calls=tool_calls)
    )

5. Assistant Message Management

add_assistant_message(message: LanguageModelAssistantMessage)
Appends an assistant message to the history.

def add_assistant_message(self, message: LanguageModelAssistantMessage) -> None:
    self._loop_history.append(message)

🛠️ Areas for Improvement

Tool Call and Tool Message Persistence
Currently, tool calls and tool messages are not saved in the database. This limits the ability to reconstruct past interactions fully.
Uploaded Content Correlation
Uploaded images and files are not directly linked to user messages in the history. This makes it difficult to reconstruct the context of uploaded content.
Code Cleanup
The history construction logic, especially for database interactions, requires refactoring for better maintainability and clarity.
Enhanced Reference Management
Improve the integration with the ReferenceManager to better handle references across multiple iterations and tools.