Code Execution

Code execution lets the model run Python in a sandbox for tasks like data analysis, plotting, or file processing. It is available through the Responses API (client.responses.create or ChatService.complete_responses_with_references), not Chat Completions.

What You'll Learn

  • Using an auto-managed container for code execution (quick start)
  • Including code outputs (stdout, stderr, results) in the response
  • Creating and using a custom Azure container for per-chat isolation and lifecycle control
  • Uploading and downloading files from containers
  • Downloading files generated by the model during code execution
  • Calling the Responses API through ChatService instead of the raw OpenAI client
  • Working with the ResponsesLanguageModelStreamResponse output object and its convenience properties
  • Persisting container and file state across stateless turns using ShortTermMemoryService

Prerequisites

  • unique_toolkit and the OpenAI SDK
  • A model that supports code execution (e.g. LanguageModelName.AZURE_GPT_5_2025_0807)

For general client setup, see OpenAI Client.


1. Auto-managed container

Get a client, define the code interpreter tool with container={"type": "auto"}, and call the Responses API. The API manages the container for you.

from openai.types.responses.tool_param import CodeInterpreter
from unique_toolkit.framework_utilities.openai.client import get_openai_client
from unique_toolkit.language_model import LanguageModelName

model_name = LanguageModelName.AZURE_GPT_5_2025_0807
client = get_openai_client()

code_interpreter_tool = CodeInterpreter(type="code_interpreter", container={"type": "auto"})

response = client.responses.create(
    model=model_name,
    tools=[code_interpreter_tool],
    input="Use code to print hello world.",
)

print(response.output)

Including code outputs

To get stdout, stderr, or generated images from the code, pass include=["code_interpreter_call.outputs"]. The response.output list will contain both the text block and the code interpreter call with .outputs.

response_with_output = client.responses.create(
    model=model_name,
    tools=[code_interpreter_tool],
    input="Use code to print hello world.",
    include=["code_interpreter_call.outputs"],
)
print(response_with_output.output[1].outputs)  # the code interpreter call's position in output may vary

Note: include=["code_interpreter_call.outputs"] returns only images and console output (stdout/stderr). Other files written by the model (e.g. CSVs, Word files) are not included there. To get those, use the approach described in Downloading model-generated files.


2. Custom Azure container

Use a custom container when you need to reuse a container across requests (typically within the same chat session), or when you want to upload files for the model to operate on.
The client's x-model header must match the model used for code execution.

Client: Use get_openai_client(additional_headers={"x-model": model_name}) and use the same model in responses.create.

Create container: Call client.containers.create with a name (include something like chat_id to separate chats) and expires_after (e.g. {"anchor": "last_active_at", "minutes": 20}).

Tool: Use CodeInterpreter(type="code_interpreter", container=container.id).

from openai.types.responses.tool_param import CodeInterpreter
from unique_toolkit.framework_utilities.openai.client import get_openai_client
from unique_toolkit.language_model import LanguageModelName

model_name = LanguageModelName.AZURE_GPT_5_2025_0807
# Client header should match the model used for code execution
client = get_openai_client(additional_headers={"x-model": model_name})

container = client.containers.create(
    name="code_execution_container",
    expires_after={"anchor": "last_active_at", "minutes": 20},
)

code_interpreter_tool = CodeInterpreter(type="code_interpreter", container=container.id)

response = client.responses.create(
    model=model_name,
    tools=[code_interpreter_tool],
    input="Use code to print hello world.",
    include=["code_interpreter_call.outputs"],
)

Uploading and downloading files

File upload and download apply to custom containers (you need a container_id). With auto containers, the API manages storage differently.

Upload

Use client.containers.files.create(container.id, file=(filename, file_content)), where file_content is, for example, bytes (other formats are supported; see the OpenAI documentation). The call returns a file object with .id; store it for later (e.g. to avoid re-uploading, or to download the file again).

# Example: upload a small CSV as bytes
csv_content = b"name,value\na,1\nb,2\n"
openai_file = client.containers.files.create(
    container.id,
    file=("data.csv", csv_content),
)
file_id = openai_file.id  # store for later

Download

  • Metadata: client.containers.files.retrieve(file_id, container_id=container.id)
  • Content (bytes): client.containers.files.content.retrieve(file_id, container_id=container.id)

Container lifecycle (e.g. expires_after) applies to these files as well.
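
Putting the two calls together, a minimal sketch (reusing the client, container, and file_id from the upload snippet above):

```python
# Fetch metadata, then the raw bytes, for a file in a custom container.
meta = client.containers.files.retrieve(file_id, container_id=container.id)
print(meta.path)  # path of the file inside the container, e.g. under /mnt/data

content = client.containers.files.content.retrieve(file_id, container_id=container.id)
downloaded_bytes = content.read()  # binary response content exposes .read()
with open("downloaded_data.csv", "wb") as f:
    f.write(downloaded_bytes)
```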

List files

Use client.containers.files.list(container_id) to iterate over all files currently in the container:

for file in client.containers.files.list(container.id):
    print(f"{file.id}  {file.path}")

Checking if a file exists

Call client.containers.files.retrieve(file_id, container_id=...). It raises openai.NotFoundError if the file does not exist. Use try/except to decide whether to upload or skip.

from openai import NotFoundError

try:
    _ = client.containers.files.retrieve(file_id, container_id=container.id)
    # file exists, skip upload
except NotFoundError:
    # upload the file
    openai_file = client.containers.files.create(...)

Checking if a container exists (and is usable)

Call client.containers.retrieve(container_id). It raises openai.NotFoundError if the container does not exist. If it exists, check container.status — only treat as usable when status in ["active", "running"]; otherwise create a new container.

from openai import NotFoundError

try:
    container = client.containers.retrieve(container_id)
    if container.status not in ["active", "running"]:
        # create a new container
        container = client.containers.create(...)
except NotFoundError:
    container = client.containers.create(...)

3. Downloading model-generated files

When the model writes a file during code execution (e.g. a CSV or plot), the response references those files as container_file_citation annotations on output_text content items. This works with both auto and custom containers.

Iterate over response.output, find ResponseOutputMessage items, and read the annotations to get the file_id and filename. Then download the content with files.content.retrieve.

from openai.types.responses import ResponseOutputMessage

generated_file_id = None
for item in response.output:
    if isinstance(item, ResponseOutputMessage):
        for content in item.content:
            if content.type == "output_text":
                for annotation in content.annotations:
                    if annotation.type == "container_file_citation":
                        generated_file_id = annotation.file_id
                        container_id = annotation.container_id
                        print(f"Generated file: {annotation.filename}  ({generated_file_id})")

if generated_file_id:
    file_content = client.containers.files.content.retrieve(
        generated_file_id,
        container_id=container_id,
    )
    generated_bytes = file_content.read()
    print(f"Downloaded {len(generated_bytes)} bytes")

Note: include=["code_interpreter_call.outputs"] returns inline stdout/stderr/images in the response. Files saved to disk by the model (e.g. with df.to_csv(...)) are not included there — use the annotation pattern above to download them.


4. Calling the Responses API via ChatService

Use chat_service.complete_responses_with_references() (sync) or complete_responses_with_references_async() (async) instead of calling client.responses.create directly. These methods handle authentication, streaming, and message writing (with references) to the chat automatically.

The signature accepts the same tools, include, and messages arguments as the raw API:

from openai.types.responses.tool_param import CodeInterpreter
from unique_toolkit.language_model import LanguageModelName

code_interpreter_tool = CodeInterpreter(type="code_interpreter", container=container_id)

response = chat_service.complete_responses_with_references(
    model_name=LanguageModelName.AZURE_GPT_5_2025_0807,
    messages="Read data.csv and plot a histogram. Save the plot as histogram.png.",
    tools=[code_interpreter_tool],
    include=["code_interpreter_call.outputs"],
)

For the async variant:

response = await chat_service.complete_responses_with_references_async(
    model_name=LanguageModelName.AZURE_GPT_5_2025_0807,
    messages="Read data.csv and plot a histogram. Save the plot as histogram.png.",
    tools=[code_interpreter_tool],
    include=["code_interpreter_call.outputs"],
)

The ResponsesLanguageModelStreamResponse output object

complete_responses_with_references returns a ResponsesLanguageModelStreamResponse. Its .output field is the raw list[ResponseOutputItem] — identical in structure to what you get from client.responses.create. In addition, the object exposes convenience properties that save you from iterating manually:

  • .output (list[ResponseOutputItem]): raw output items (text, code calls, etc.)
  • .container_files (list[AnnotationContainerFileCitation]): all container_file_citation annotations across all output messages
  • .code_interpreter_calls (list[ResponseCodeInterpreterToolCall]): all code interpreter call items

Instead of manually walking .output to find annotations (as shown in section 3), use .container_files directly:

for citation in response.container_files:
    file_content = client.containers.files.content.retrieve(
        citation.file_id,
        container_id=citation.container_id,
    )
    print(f"{citation.filename}: {len(file_content.read())} bytes")

Each citation has .file_id, .filename, and .container_id.
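
Likewise, .code_interpreter_calls spares you from filtering .output for the call items yourself. A small sketch (each item is a ResponseCodeInterpreterToolCall; .code holds the source the model executed and is typed as optional):

```python
for call in response.code_interpreter_calls:
    print(f"--- call {call.id} ---")
    print(call.code)  # the Python source the model ran in the container
```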


5. Persisting state with ShortTermMemoryService

The assistant is stateless — a new handler instance is created for every incoming message. Without persistence, a new container would be created on every turn and previously uploaded files would be lost. Use PersistentShortMemoryManager to save the container_id and uploaded file IDs to chat-scoped short-term memory, so they can be reused on the next turn.

Define a memory schema

from pydantic import BaseModel

class CodeExecutionMemory(BaseModel):
    container_id: str | None = None
    file_ids: dict[str, str] = {}  # Unique file id -> OpenAI container file id

Set up the manager

Instantiate at chat scope (using chat_id, not message_id) so memory persists across turns:

from unique_toolkit.short_term_memory.service import ShortTermMemoryService
from unique_toolkit.agentic.short_term_memory_manager.persistent_short_term_memory_manager import (
    PersistentShortMemoryManager,
)

stm_service = ShortTermMemoryService(
    company_id=event.company_id,
    user_id=event.user_id,
    chat_id=event.payload.chat_id,
    message_id=None,  # chat-level scope, not message-level
)
memory_manager = PersistentShortMemoryManager(
    short_term_memory_service=stm_service,
    short_term_memory_schema=CodeExecutionMemory,
    short_term_memory_name="code_execution", # Ideally include a chat_id in the name
)

Per-turn pattern

Load at the start of each turn, update in place, save at the end:

from openai import NotFoundError

# This sketch assumes an async OpenAI client; with a sync client, drop the `await` on client calls.

# 1. Load (returns None if no memory saved yet)
memory = await memory_manager.load_async() or CodeExecutionMemory()

# 2. Create or reuse container
if memory.container_id is not None:
    try:
        container = await client.containers.retrieve(memory.container_id)
        # The status field is loosely typed in the openai sdk; these values were found by trial and error
        if container.status not in ["active", "running"]:
            memory = CodeExecutionMemory()  # reset: stale container
    except NotFoundError:
        memory = CodeExecutionMemory()  # reset: container gone

if memory.container_id is None:
    container = await client.containers.create(
        name=f"code_execution_{event.payload.chat_id}",
        expires_after={"anchor": "last_active_at", "minutes": 20},
    )
    memory.container_id = container.id

# 3. Upload files (skip if already uploaded)
for file in files_to_upload:
    if file.id in memory.file_ids:
        try:
            await client.containers.files.retrieve(
                memory.file_ids[file.id], container_id=memory.container_id
            )
            continue  # already there
        except NotFoundError:
            pass  # fall through to re-upload

    openai_file = await client.containers.files.create(
        memory.container_id,
        file=(file.name, file.content_bytes),
    )
    memory.file_ids[file.id] = openai_file.id

# 4. Run inference
code_interpreter_tool = CodeInterpreter(
    type="code_interpreter", container=memory.container_id
)
response = await chat_service.complete_responses_with_references_async(
    model_name=LanguageModelName.AZURE_GPT_5_2025_0807,
    messages=user_message,
    tools=[code_interpreter_tool],
    include=["code_interpreter_call.outputs"],
)

# 5. Save updated memory
await memory_manager.save_async(memory)

Example scripts

Full Example — Quick start with auto container (Click to expand)
# %%
# Code execution with auto-managed container (Responses API)

from openai.types.responses.tool_param import CodeInterpreter
from unique_toolkit.framework_utilities.openai.client import get_openai_client
from unique_toolkit.language_model import LanguageModelName

model_name = LanguageModelName.AZURE_GPT_5_2025_0807
client = get_openai_client()

# %%
# Define tool and call Responses API
code_interpreter_tool = CodeInterpreter(type="code_interpreter", container={"type": "auto"})
messages = "Use code to print hello world."

response_with_output = client.responses.create(
    model=model_name,
    tools=[code_interpreter_tool],
    input=messages,
    include=["code_interpreter_call.outputs"],
)

# %%
# response.output is a list (e.g. text block, then code_interpreter_call)
print(response_with_output.output)
print(response_with_output.output[1].outputs)  # type: ignore[union-attr]
Full Example — Custom Azure container (Click to expand)
# %%
# Code execution with custom Azure container (Responses API)
from unique_toolkit.framework_utilities.openai.client import get_openai_client
from unique_toolkit.language_model import LanguageModelName

model_name = LanguageModelName.AZURE_GPT_5_2025_0807

client = get_openai_client(additional_headers={"x-model": model_name})

# %%
# Create a custom Azure container
# Recommended to use chat_id in the name to differentiate containers across chats

container = client.containers.create(
    name="code_execution_container",
    expires_after={"anchor": "last_active_at", "minutes": 20},
)
print(f"Created container: {container.id}")

# %%
# Upload a file to the container

csv_bytes = b"name,value\nfoo,1\nbar,2"
uploaded_file = client.containers.files.create(
    container.id,
    file=("data.csv", csv_bytes),
)
print(f"Uploaded file: {uploaded_file.id}")

# %%
# Download a file from the container
# Files produced by code execution are accessible the same way via file_id

file_id = uploaded_file.id  # or any file_id from a code interpreter output
file_content = client.containers.files.content.retrieve(
    file_id,
    container_id=container.id,
)
downloaded_bytes = file_content.read()
assert downloaded_bytes == csv_bytes

# %%
# List all files in the container

for file in client.containers.files.list(container.id):
    print(f"  {file.id}  {file.path}")

# %%
# Define tools and call Responses API

from openai.types.responses.tool_param import CodeInterpreter

code_interpreter_tool = CodeInterpreter(type="code_interpreter", container=container.id)
messages = "Read data.csv and add a random column. Save the result to a new file called data_with_random_column.csv."

response_with_output = client.responses.create(
    model=model_name,
    tools=[code_interpreter_tool],
    input=messages,
    include=["code_interpreter_call.outputs"],
)

# %%
# Download a file generated by the model during code execution
# File citations appear as annotations on output text items

from openai.types.responses import ResponseOutputMessage

generated_file_id = None
for item in response_with_output.output:
    if isinstance(item, ResponseOutputMessage):
        for content in item.content:
            if content.type == "output_text":
                for annotation in content.annotations:
                    if annotation.type == "container_file_citation":
                        generated_file_id = annotation.file_id
                        print(f"Generated file: {annotation.filename}  ({generated_file_id})")

# %%
if generated_file_id:
    generated_content = client.containers.files.content.retrieve(
        generated_file_id,
        container_id=container.id,
    )
    generated_bytes = generated_content.read()
    print(f"Downloaded generated file: {len(generated_bytes)} bytes")
    print(generated_bytes)

# %%
# Check if a container exists and is usable before reusing it
# Useful when storing container.id across sessions (e.g. in chat state)

from openai import NotFoundError

stored_container_id = container.id  # e.g. loaded from persistent state

try:
    existing_container = client.containers.retrieve(stored_container_id)
    if existing_container.status not in ["active", "running"]:
        container = client.containers.create(
            name="code_execution_container",
            expires_after={"anchor": "last_active_at", "minutes": 20},
        )
    else:
        container = existing_container
except NotFoundError:
    container = client.containers.create(
        name="code_execution_container",
        expires_after={"anchor": "last_active_at", "minutes": 20},
    )

# %%
# Check if a file already exists in the container before re-uploading

stored_file_id = uploaded_file.id  # e.g. loaded from persistent state

try:
    client.containers.files.retrieve(stored_file_id, container_id=container.id)
    print("File already exists, skipping upload")
except NotFoundError:
    uploaded_file = client.containers.files.create(
        container.id,
        file=("data.csv", csv_bytes),
    )
    print(f"Re-uploaded file: {uploaded_file.id}")
Full Example — Unique platform patterns (Click to expand)
# %%
# Code execution — Unique platform patterns
# Covers: ChatService responses API, ResponsesLanguageModelStreamResponse, ShortTermMemory

from openai import NotFoundError
from openai.types.responses.tool_param import CodeInterpreter
from pydantic import BaseModel

from unique_toolkit.app.dev_util import get_event_generator
from unique_toolkit.app.schemas import ChatEvent
from unique_toolkit.app.unique_settings import UniqueSettings
from unique_toolkit.agentic.short_term_memory_manager.persistent_short_term_memory_manager import (
    PersistentShortMemoryManager,
)
from unique_toolkit.chat.service import Content
from unique_toolkit.framework_utilities.openai.client import get_openai_client
from unique_toolkit.language_model import LanguageModelName
from unique_toolkit.short_term_memory.service import ShortTermMemoryService
from unique_toolkit import ChatService
from unique_toolkit.services.knowledge_base import KnowledgeBaseService

settings = UniqueSettings.from_env_auto_with_sdk_init("qa.env")

# %%
# Memory schema — persists container_id and uploaded file_ids across turns

class CodeExecutionMemory(BaseModel):
    container_id: str | None = None
    file_ids: dict[str, str] = {}  # internal_file_id -> OpenAI container file id


# %%
# Per-turn handler

model_name = LanguageModelName.AZURE_GPT_5_2025_0807

for event in get_event_generator(unique_settings=settings, event_type=ChatEvent):
    chat_service = ChatService(event)
    kb_service = KnowledgeBaseService.from_event(event)
    client = get_openai_client(
        additional_headers={"x-model": model_name}
    )

    # %%
    # Set up short-term memory manager at chat scope (message_id=None)

    stm_service = ShortTermMemoryService(
        company_id=event.company_id,
        user_id=event.user_id,
        chat_id=event.payload.chat_id,
        message_id=None,
    )
    memory_manager: PersistentShortMemoryManager[CodeExecutionMemory] = (
        PersistentShortMemoryManager(
            short_term_memory_service=stm_service,
            short_term_memory_schema=CodeExecutionMemory,
            short_term_memory_name=f"code_execution_{event.payload.chat_id}", 
        )
    )

    # %%
    # Load memory from previous turn (None if first turn)

    memory = memory_manager.load_sync() or CodeExecutionMemory()
    print(f"Loaded memory: container_id={memory.container_id}, files={list(memory.file_ids)}")

    # %%
    # Create or reuse the container

    if memory.container_id is not None:
        try:
            container = client.containers.retrieve(memory.container_id)
            # The status field is loosely typed in the openai sdk; these values were found by trial and error
            if container.status not in ["active", "running"]:
                print(f"Container status is '{container.status}', recreating")
                memory = CodeExecutionMemory()
        except NotFoundError:
            print("Container not found, recreating")
            memory = CodeExecutionMemory()

    if memory.container_id is None:
        container = client.containers.create(
            name=f"code_execution_{event.payload.chat_id}",
            expires_after={"anchor": "last_active_at", "minutes": 20},
        )
        memory.container_id = container.id
        print(f"Created container: {memory.container_id}")
    else:
        print(f"Reusing container: {memory.container_id}")

    # %%
    # Upload files to the container, skipping any already present
    # Replace `files_to_upload` with actual Content objects (e.g. fetched from KnowledgeBaseService)

    files_to_upload: list[Content] = []  # e.g. fetched from KnowledgeBaseService

    for file in files_to_upload:
        if file.id in memory.file_ids:
            try:
                client.containers.files.retrieve(
                    memory.file_ids[file.id],
                    container_id=memory.container_id,
                )
                print(f"File {file.id} already in container, skipping")
                continue
            except NotFoundError:
                pass  # file disappeared — re-upload below

        file_content = kb_service.download_content_to_bytes(
            content_id=file.id
        )
        openai_file = client.containers.files.create(
            memory.container_id,
            file=(file.key, file_content),
        )
        memory.file_ids[file.id] = openai_file.id
        print(f"Uploaded {file.key} -> {openai_file.id}")

    # %%
    # Call the Responses API via ChatService
    # complete_responses_with_references handles auth, streaming, and message writing

    code_interpreter_tool = CodeInterpreter(
        type="code_interpreter",
        container=memory.container_id,
    )

    response = chat_service.complete_responses_with_references(
        model_name=model_name,
        messages=event.payload.user_message.text,
        tools=[code_interpreter_tool],
        include=["code_interpreter_call.outputs"],
    )

    # %%
    # Inspect code interpreter calls from the response
    # response.code_interpreter_calls is a convenience property on ResponsesLanguageModelStreamResponse

    for call in response.code_interpreter_calls:
        print(f"Code interpreter call: {call.id}")

    # %%
    # Download files generated by the model during code execution
    # response.container_files parses all container_file_citation annotations automatically

    for citation in response.container_files:
        file_content = client.containers.files.content.retrieve(
            citation.file_id,
            container_id=citation.container_id,
        )
        generated_bytes = file_content.read()
        print(f"Generated file: {citation.filename}  ({len(generated_bytes)} bytes)")

    # %%
    # Save updated memory (new container_id and/or file_ids) for next turn

    memory_manager.save_sync(memory)
    print(f"Saved memory: container_id={memory.container_id}, files={list(memory.file_ids)}")