
🧭 UniqueAI Orchestrator — Technical Documentation

This document explains the architecture and control flow of the UniqueAI agent orchestrator, including how it plans, executes, streams, and coordinates tools, references, thinking steps, evaluations, and post-processing. It includes full function code snippets for clarity and easy copy/paste.


📌 Summary

  • UniqueAI is an iterative, tool-using AI agent orchestrator.
  • It receives an incoming chat event and space-scoped configuration, and coordinates:
      • Planning (an LLM call that decides whether to stream an answer or call tools)
      • Tool execution (with history updates and reference extraction)
      • Streaming messages to the frontend
      • Evaluation and post-processing of final messages
      • Thinking/Progress updates for user visibility
  • The loop is bounded by max_loop_iterations (see the config sketch after this list) and supports forced tool calls on the first iteration only.
  • Prompts (system and user) are rendered via Jinja templates, enriched with tool metadata and MCP server instructions.
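
The loop constraints referenced throughout this document come from the space-scoped configuration. Below is a minimal, hypothetical sketch of the relevant shape; apart from agent.max_loop_iterations (which is used verbatim in the run() code later), all field names and defaults are assumptions, not the actual UniqueAIConfig definition.

    # Hypothetical sketch of the config shape; only agent.max_loop_iterations is
    # taken verbatim from the orchestrator code shown later in this document.
    from pydantic import BaseModel

    class AgentConfig(BaseModel):
        max_loop_iterations: int = 5          # hard cap on plan-and-act cycles (assumed default)
        system_prompt_template: str = ""      # Jinja template source (assumed field name)
        user_prompt_template: str = ""        # Jinja template source (assumed field name)

    class UniqueAIConfig(BaseModel):
        agent: AgentConfig = AgentConfig()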

🧱 Construction & Dependencies

Constructor

The orchestrator is initialized with:

  • Event: contains the user message and streaming context
  • Config: space-scoped config including model, prompts, and loop constraints
  • Managers:
      • HistoryManager: builds LLM call history and appends tool calls/results
      • ToolManager: tool registry, forced tools, execution, and control-handoff checks
      • ReferenceManager: collects referenceable chunks from LLM and tool outputs
      • ThinkingManager: progress reporter for visible “thinking steps”
      • DebugInfoManager: aggregates debug metadata and tool traces
      • EvaluationManager: runs evaluations when a final message is produced (e.g., a hallucination check or other assessments)
      • PostprocessorManager: transforms LLM outputs (e.g., follow-up questions, stock ticker information)
  • Services:
      • ChatService: LLM interactions and streaming to the frontend

Code:

    def __init__(
        self,
        logger: Logger,
        event: ChatEvent,
        config: UniqueAIConfig,
        chat_service: ChatService,
        content_service: ContentService,
        debug_info_manager: DebugInfoManager,
        reference_manager: ReferenceManager,
        thinking_manager: ThinkingManager,
        tool_manager: ToolManager,
        history_manager: HistoryManager,
        evaluation_manager: EvaluationManager,
        postprocessor_manager: PostprocessorManager,
        mcp_servers: list[McpServer],
    ):
        self._logger = logger
        self._event = event
        self._config = config
        self._chat_service = chat_service
        self._content_service = content_service

        self._debug_info_manager = debug_info_manager
        self._reference_manager = reference_manager
        self._thinking_manager = thinking_manager
        self._tool_manager = tool_manager

        self._history_manager = history_manager

        self._evaluation_manager = evaluation_manager
        self._postprocessor_manager = postprocessor_manager
        self._latest_assistant_id: str = event.payload.assistant_message.id
        self._mcp_servers = mcp_servers


🔁 Main Control Flow (run method) — Expanded, Readable Walkthrough

The orchestrator runs a bounded, iterative loop that plans, executes, and streams results while coordinating tools, references, evaluations, and post-processing. Here’s a clearer, fuller picture of what happens in each phase.

Initialization and kickoff

  • Fresh session check: Before the loop begins, the agent asks the HistoryManager whether there are any “loop messages” already present. If this is a fresh interaction (no loop messages), it proactively updates the frontend with a short status message like “Starting agentic loop…”. This gives users immediate feedback that the system is alive and preparing to work.
  • Context setup: Internally, the agent also sets bookkeeping values such as the current iteration index and any existing “start_text” used for progressive streaming (this may be used to guide how text appears or is removed from later outputs).

    flowchart LR
        A[Start: Initialize Orchestrator] --> B{Fresh Session?}
        B -- Yes --> B1[Update Frontend: Starting agentic loop...]
        B -- No --> C[Enter Main Loop]
        B1 --> C
        C --> E[Continue to Planning Phase]

Iterative planning and execution

Loop boundary: The agent runs a for-loop capped at config.agent.max_loop_iterations. This hard limit prevents runaway tool use or infinite deliberation. Each iteration represents a plan-and-act cycle.

Step 1 — Compose Prompts and Answer or Request Tools (_plan_or_execute):

Part A: Compose Prompts

Compose prompts and history: The agent uses _compose_message_plan_execution() to construct a clean message stack for the model call (a minimal sketch follows the diagram below). This includes:

  • Original user query (verbatim)
  • A Jinja-rendered user prompt (enriched with tool metadata and MCP hints)
  • A Jinja-rendered system prompt (which includes tool descriptions, project settings, loop constraints, and any custom instructions)
  • Prior conversation and tool results assembled by the HistoryManager

    flowchart LR
        B[_compose_message_plan_execution] --> C[Render Prompts System/User using Jinja Templates]
        B --> D[Get History from HistoryManager]
        D --> F
        C --> F[Merge Prompts and History to LLM Message]
        F --> H[Continue to Part B]
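
In code, the composition reduces to rendering the two Jinja templates and sandwiching the prior history between the system prompt and the enriched user prompt. The snippet below is a standalone sketch only: the template strings, the role/content message shape, and the compose_messages helper are illustrative assumptions, not the actual _compose_message_plan_execution implementation.

    # Minimal sketch of prompt/history merging with jinja2; all names are illustrative.
    from jinja2 import Template

    SYSTEM_TEMPLATE = (
        "You are an agent with access to the following tools:\n"
        "{% for tool in tools %}- {{ tool.name }}: {{ tool.description }}\n{% endfor %}"
        "You may iterate at most {{ max_loop_iterations }} times."
    )
    USER_TEMPLATE = "{{ user_query }}\n\nAvailable MCP servers: {{ mcp_servers | join(', ') }}"

    def compose_messages(
        user_query: str,
        tools: list[dict],
        mcp_servers: list[str],
        history: list[dict],
        max_loop_iterations: int,
    ) -> list[dict]:
        system_prompt = Template(SYSTEM_TEMPLATE).render(
            tools=tools, max_loop_iterations=max_loop_iterations
        )
        user_prompt = Template(USER_TEMPLATE).render(
            user_query=user_query, mcp_servers=mcp_servers
        )
        # System prompt first, then prior conversation/tool results, then the enriched user prompt.
        return (
            [{"role": "system", "content": system_prompt}]
            + history
            + [{"role": "user", "content": user_prompt}]
        )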

Part B: Answer to User or request Tools

  • First iteration with forced tools: If the ToolManager has "forced" tools (tools the user explicitly requested to be executed) and we are at iteration 0, the agent calls the model once per forced toolChoice. It merges all tool calls and references across those calls into a single stream response. This ensures the model must use specific tools at least once.
  • Last iteration: If we've reached the final allowed iteration, the agent disables tools entirely and asks the model to produce a best-effort final answer. This guarantees termination with a user-facing result.
  • Default case: The model is allowed to decide whether to call tools or just stream an answer. Tools, references, debug info, and loop parameters are all provided so the model can choose wisely. (A branching sketch follows the diagram below.)

    flowchart LR
        A[Part B: Answer or Request Tools] --> B{Iteration Type?}

        B -- First + Forced Tools --> C[Call LLM per forced toolChoice<br/>Merge responses]
        B -- Last Iteration --> D[Call LLM without tools<br/>Force final answer]
        B -- Default --> E[Call LLM with tools<br/>Let model decide]

        C --> F[Return LanguageModelStreamResponse]
        D --> F
        E --> F
        F --> G[Continue to Step 2]
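
The branching above amounts to a three-way decision on the iteration index and the presence of forced tools. The sketch below captures that logic under stated assumptions: call_llm, merge_responses, and forced_tools are stand-ins for the real orchestrator internals and are not the actual _plan_or_execute API.

    # Illustrative sketch of the Part B decision; the callables are stand-ins.
    from typing import Any, Awaitable, Callable

    async def plan_or_execute(
        iteration: int,
        max_iterations: int,
        forced_tools: list[str],
        call_llm: Callable[..., Awaitable[Any]],
        merge_responses: Callable[[list[Any]], Any],
    ) -> Any:
        if iteration == 0 and forced_tools:
            # One LLM call per forced tool choice; merge tool calls and references
            # into a single stream response.
            responses = [await call_llm(tools=True, tool_choice=t) for t in forced_tools]
            return merge_responses(responses)
        if iteration == max_iterations - 1:
            # Last allowed iteration: disable tools to force a final, user-facing answer.
            return await call_llm(tools=False)
        # Default: provide tools and let the model decide whether to call them.
        return await call_llm(tools=True)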

Step 2 — Incorporate References and Thinking Steps

  • The agent pulls any references that the model produced (loop_response.message.references) and feeds them into the ReferenceManager. These references are the building blocks for citations or UI panels showing sources, tool outputs, or other supporting material.

  • The ThinkingManager receives the loop_response to update visible "thinking" or "progress" indicators. This can show users what the agent is doing (e.g., choosing tools, summarizing findings, or drafting an answer) and helps with transparency during longer operations.

Step 3 — Process the Plan (_process_plan):

    flowchart LR
        A[Step 3: Process Plan] --> B{Response Type?}

        B -- Empty Response --> C[Warn User & Exit Loop]
        B -- Tool Calls Present --> D[Case A: Handle Tool Calls]
        B -- No Tool Calls --> E[Case B: Handle Finalization]

        C --> F[Continue to Exit Phase]
        D --> G[Continue to next iteration or exit if last iteration]
        E --> F
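
Read as code, the dispatch above is a short series of guards: an empty response warns and exits, tool calls are delegated (and may hand off control), and anything else is treated as a final answer to evaluate and postprocess. The following is a simplified sketch with an assumed message shape and assumed handler callables, not the real _process_plan; a return value of True means "exit the loop".

    # Illustrative sketch of the Step 3 dispatch; message shape and handlers are assumed.
    from dataclasses import dataclass, field
    from typing import Any, Awaitable, Callable

    @dataclass
    class PlannedMessage:
        text: str = ""
        tool_calls: list[Any] = field(default_factory=list)

    async def process_plan(
        message: PlannedMessage,
        handle_tool_calls: Callable[[list[Any]], Awaitable[bool]],
        finalize: Callable[[PlannedMessage], Awaitable[None]],
    ) -> bool:
        if not message.text and not message.tool_calls:
            # Empty response: warn the user upstream and stop iterating.
            return True
        if message.tool_calls:
            # Case A: run the tools; exit only if one of them takes control.
            return await handle_tool_calls(message.tool_calls)
        # Case B: a final answer was streamed; run evaluations/postprocessors, then exit.
        await finalize(message)
        return True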

Case A: Model requests Tool Calls:

If the model requested tool calls, _handle_tool_calls() is invoked:

  • Tool calls are appended to history so the full chain of calls (without revealing private reasoning) remains consistent for future model calls.
  • Tools can include MCP tools or A2A tools.
  • ToolManager executes the tools and returns structured results.
  • ReferenceManager extracts referenceable chunks from tool outputs. How to construct a ToolResponse so that its references can be read is described in the ToolResponse documentation.
  • DebugInfoManager captures any useful traces or diagnostics. How to include debug information in a ToolResponse is also described in the ToolResponse documentation.
  • HistoryManager records tool results so the next iteration can build on them.
  • Control handoff: If any tool signals that it "takes control" (e.g., a deep research agent that will stream independently), the method returns True to exit the base loop. The orchestrator "hands off" responsibility for the user interaction to the control-taking tool. Otherwise, the loop continues to the next iteration. (A simplified sketch follows the diagram below.)

    flowchart LR
        A[Case A: Tool Calls Detected] --> B[1. Append tool calls to history]
        B --> C[2. Execute tools via ToolManager]
        C --> D[3. Extract referenceable chunks]
        D --> E[4. Collect debug info]
        E --> F[5. Add results to history]
        F --> G{6. Tool takes control?}

        G -- Yes --> H[Exit Loop - Hand off control]
        G -- No --> I[Continue to Housekeeping Phase]
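
Put together, Case A is a fixed sequence: record the calls, run the tools, harvest references and debug traces, record the results, and check for a control handoff. The sketch below is a simplified rendering of that sequence; the manager method names are assumptions rather than the real interfaces.

    # Illustrative sketch of Case A; manager method names are assumed.
    from typing import Any

    async def handle_tool_calls(
        tool_calls: list[Any],
        tool_manager: Any,
        history_manager: Any,
        reference_manager: Any,
        debug_info_manager: Any,
    ) -> bool:
        # 1. Append the requested calls to history so later iterations see them.
        history_manager.append_tool_calls(tool_calls)

        # 2. Execute the tools (including MCP and A2A tools) and collect structured results.
        results = await tool_manager.execute(tool_calls)

        # 3. / 4. Harvest referenceable chunks and debug traces from the ToolResponses.
        reference_manager.add_references_from_tool_results(results)
        debug_info_manager.add_tool_traces(results)

        # 5. Record the results so the next iteration can build on them.
        history_manager.append_tool_results(results)

        # 6. If any tool takes control (e.g., deep research), signal the loop to exit.
        return tool_manager.takes_control(results)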

Case B: Model does not request Tool Calls

If no tools were requested, the model likely produced a user-facing response. The agent:

  1. Runs evaluations selected by ToolManager to spot issues (e.g., Hallucination check).
  2. Applies postprocessors (e.g., generation of follow-up questions or display of stock information).

If any evaluation fails, a warning is logged. A future enhancement could inject corrective instructions and retry; for now, the loop cleanly exits after postprocessing the final answer. (A sketch of this path follows the diagram below.)

    flowchart LR
        A[Case B: No Tool Calls - Final Answer] --> B[Run Evaluations]
        B --> C[Apply Postprocessors]
        C --> D{Evaluations Pass?}

        D -- Yes --> E[Continue to Exit Phase]
        D -- No --> F[Log Warning]
        F --> E
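
This path is intentionally forgiving: failed evaluations are logged rather than blocking the answer. Below is a minimal sketch of that behaviour, assuming hypothetical manager interfaces and result shape (run_evaluations, apply, and result.passed are assumptions, not the real API).

    # Illustrative sketch of Case B; evaluation/postprocessor interfaces are assumed.
    import logging
    from typing import Any

    logger = logging.getLogger("unique_ai.orchestrator")

    async def finalize_answer(
        loop_response: Any,
        evaluation_manager: Any,
        postprocessor_manager: Any,
    ) -> None:
        # Run the evaluations selected for this answer (e.g., a hallucination check).
        results = await evaluation_manager.run_evaluations(loop_response)
        # Apply postprocessors (e.g., follow-up questions, stock ticker information).
        await postprocessor_manager.apply(loop_response)

        for result in results:
            if not result.passed:
                # Today: log a warning and exit the loop after postprocessing.
                # A future enhancement could inject corrective instructions and retry.
                logger.warning("Evaluation failed: %s", result)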

Step 4 — Exit the Loop or Continue?

The following conditions lead to an exit of the loop:

  • A final answer has been streamed and no more tool calls are requested.
  • A tool has taken over control of communicating with the user (e.g., the deep research tool or possibly another agent).

Case A: _process_plan() indicates we’re done.

  1. The answer has already been streamed to the user
  2. The ThinkingManager is asked to close its visible steps

Case B: Continue in the Loop

The requested tool calls have been executed and integrated into the history. The agent continues with the next iteration and again decides whether to call tools or answer.

Housekeeping

  1. Continue iteration: If we're still in the loop (e.g., tools just ran), the agent updates its bookkeeping (such as start_text) and begins the next iteration.
  2. If the model requests tool calls together with a meaningful partial answer, that answer can be displayed via the ThinkingManager. Alternatively, if the ThinkingManager is deactivated, it could be appended to the last assistant message or added as another assistant message.

On Loop Exit: Finalization and completion

  • Mark message as complete: Once the loop ends (for any reason), the agent signals completion to the frontend by calling modify_assistant_message_async with set_completed_at=True. This allows the UI to remove loading indicators and present the final state of the message. It also signals to external processes (such as benchmarking) that the agent has finished its work. The mechanism is non-blocking, so long-running jobs of 10 or more minutes can be checked and polled by those external processes.
  • Outcome: At this point, the conversation history contains user prompts, rendered system/user prompts, any tool calls and results, and the final assistant message. References and debug info are stored, and any evaluation/postprocessing steps have already been applied if we ended on a no-tools final message.

    flowchart LR
        A[Exit Conditions Met] --> B[Close thinking steps]
        B --> C[Mark message as complete<br/>set_completed_at=True]
        C --> D[End: Orchestrator Complete]

Why this design works well:

  • Predictable Termination: max_loop_iterations plus last-iteration “no tools” ensures the user always gets an answer or a clear explanation.
  • Transparent Progress: ThinkingManager and streaming updates keep users informed during multi-step reasoning or tool use.
  • High-Quality Outputs: Evaluations and postprocessors refine the final answer before completion.
  • Extensible: Forced tools, tool control handoff, and template-driven prompts (with MCP integration) make it easy to add capabilities without changing the core loop.

Code:

    # @track(name="loop_agent_run")  # Group traces together
    async def run(self):
        """
        Main loop of the agent. The agent iterates through the loop, runs the plan, and
        processes tool calls if any are returned.
        """
        self._logger.info("Start LoopAgent...")

        if self._history_manager.has_no_loop_messages():  # TODO: why do we even need to check its always no loop messages on this when its called.
            self._chat_service.modify_assistant_message(
                  content="Starting agentic loop..."  # TODO: this must be more informative
            )

        ## Loop iteration
        for i in range(self._config.agent.max_loop_iterations):
            self.current_iteration_index = i
            self._logger.info(f"Starting iteration {i + 1}...")

            # Plan execution
            loop_response = await self._plan_or_execute()
            self._logger.info("Done with _plan_or_execute")

            self._reference_manager.add_references(
                  loop_response.message.references
            )
            self._logger.info("Done with adding references")

            # Update tool progress reporter
            self._thinking_manager.update_tool_progress_reporter(loop_response)

            # Execute the plan
            exit_loop = await self._process_plan(loop_response)
            self._logger.info("Done with _process_plan")

            if exit_loop:
                self._thinking_manager.close_thinking_steps(loop_response)
                self._logger.info("Exiting loop.")
                break

            if i == self._config.agent.max_loop_iterations - 1:
                self._logger.error("Max iterations reached.")
                await self._chat_service.modify_assistant_message_async(
                      content="I have reached the maximum number of self-reflection iterations. Please clarify your request and try again...",
                )
                break

            self.start_text = self._thinking_manager.update_start_text(
                  self.start_text, loop_response
            )

        await self._chat_service.modify_assistant_message_async(
              set_completed_at=True,
        )

Key behaviors:

  • Forced tools are only applied on the first iteration, allowing the LLM to choose freely from the available tools in subsequent iterations.
  • The last iteration forbids tool usage to force an answer.
  • If a tool takes control (e.g., deep research), the base agent stops the loop and marks the message as complete so the frontend and other consumers know it is done.

Further Details can be found here: