Code Execution¶
Code execution lets the model run Python in a sandbox for tasks like data analysis, plotting, or file processing. It is available through the Responses API (client.responses.create or ChatService.complete_responses_with_references), not Chat Completions.
What You'll Learn¶
- Using an auto-managed container for code execution (quick start)
- Including code outputs (stdout, stderr, results) in the response
- Creating and using a custom Azure container for per-chat isolation and lifecycle control
- Uploading and downloading files from containers
- Downloading files generated by the model during code execution
- Calling the Responses API through ChatService instead of the raw OpenAI client
- Working with the ResponsesLanguageModelStreamResponse output object and its convenience properties
- Persisting container and file state across stateless turns using ShortTermMemoryService
Prerequisites¶
- unique_toolkit and the OpenAI SDK
- A model that supports code execution (e.g. LanguageModelName.AZURE_GPT_5_2025_0807)
For general client setup, see OpenAI Client.
1. Auto-managed container¶
Get a client, define the code interpreter tool with container={"type": "auto"}, and call the Responses API. The API manages the container for you.
from openai.types.responses.tool_param import CodeInterpreter
from unique_toolkit.framework_utilities.openai.client import get_openai_client
from unique_toolkit.language_model import LanguageModelName
model_name = LanguageModelName.AZURE_GPT_5_2025_0807
client = get_openai_client()
code_interpreter_tool = CodeInterpreter(type="code_interpreter", container={"type": "auto"})
response = client.responses.create(
model=model_name,
tools=[code_interpreter_tool],
input="Use code to print hello world.",
)
print(response.output)
Including code outputs¶
To get stdout, stderr, or generated images from the code, pass include=["code_interpreter_call.outputs"]. The response.output list will contain both the text block and the code interpreter call with .outputs.
response_with_output = client.responses.create(
model=model_name,
tools=[code_interpreter_tool],
input="Use code to print hello world.",
include=["code_interpreter_call.outputs"],
)
print(response_with_output.output[1].outputs)
Note: include=["code_interpreter_call.outputs"] returns only images and console output (stdout/stderr). Other files written by the model (e.g. CSVs, Word files) are not included there. To get those, use the approach described in Downloading model-generated files.
2. Custom Azure container¶
Use a custom container when you need to reuse a container across calls (typically within the same chat session), or when you want to upload files for the model to operate on.
The client x-model header must match the model used for code execution.
- Client: Use get_openai_client(additional_headers={"x-model": model_name}) and use the same model in responses.create.
- Create container: Call client.containers.create with a name (include something like chat_id to separate chats) and expires_after (e.g. {"anchor": "last_active_at", "minutes": 20}).
- Tool: Use CodeInterpreter(type="code_interpreter", container=container.id).
from openai.types.responses.tool_param import CodeInterpreter
from unique_toolkit.framework_utilities.openai.client import get_openai_client
from unique_toolkit.language_model import LanguageModelName
model_name = LanguageModelName.AZURE_GPT_5_2025_0807
# Client header should match the model used for code execution
client = get_openai_client(additional_headers={"x-model": model_name})
container = client.containers.create(
name="code_execution_container",
expires_after={"anchor": "last_active_at", "minutes": 20},
)
code_interpreter_tool = CodeInterpreter(type="code_interpreter", container=container.id)
response = client.responses.create(
model=model_name,
tools=[code_interpreter_tool],
input="Use code to print hello world.",
include=["code_interpreter_call.outputs"],
)
Uploading and downloading files¶
File upload and download apply to custom containers (you need a container_id). With auto containers, the API manages storage differently.
Upload¶
Use client.containers.files.create(container.id, file=(filename, file_content)), where file_content can be, for example, bytes (other formats are supported; see the OpenAI documentation). The call returns a file object with .id; store it for later (e.g. to avoid re-uploading, or to download the file again).
# Example: upload a small CSV as bytes
csv_content = b"name,value\na,1\nb,2\n"
openai_file = client.containers.files.create(
container.id,
file=("data.csv", csv_content),
)
file_id = openai_file.id # store for later
Download¶
- Metadata: client.containers.files.retrieve(file_id, container_id=container.id)
- Content (bytes): client.containers.files.content.retrieve(file_id, container_id=container.id)
Container lifecycle (e.g. expires_after) applies to these files as well.
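For repeated downloads it can be handy to wrap the content call in a small helper. A minimal sketch (download_container_file is a hypothetical helper name, not part of unique_toolkit or the OpenAI SDK):

```python
# Hypothetical convenience wrapper around the content retrieval call above.
def download_container_file(client, container_id: str, file_id: str) -> bytes:
    """Return the raw bytes of a file stored in a code execution container."""
    file_content = client.containers.files.content.retrieve(
        file_id,
        container_id=container_id,
    )
    return file_content.read()
```

The client is passed in explicitly, so the same helper works regardless of how the client was constructed (plain or with an x-model header).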
List files¶
Use client.containers.files.list(container_id) to iterate over all files currently in the container:
for file in client.containers.files.list(container.id):
    print(f"{file.id} {file.path}")
Checking if a file exists¶
Call client.containers.files.retrieve(file_id, container_id=...). It raises openai.NotFoundError if the file does not exist. Use try/except to decide whether to upload or skip.
from openai import NotFoundError
try:
    _ = client.containers.files.retrieve(file_id, container_id=container.id)
    # file exists, skip upload
except NotFoundError:
    # upload the file
    openai_file = client.containers.files.create(...)
Checking if a container exists (and is usable)¶
Call client.containers.retrieve(container_id). It raises openai.NotFoundError if the container does not exist. If it exists, check container.status — only treat as usable when status in ["active", "running"]; otherwise create a new container.
from openai import NotFoundError
try:
    container = client.containers.retrieve(container_id)
    if container.status not in ["active", "running"]:
        # create a new container
        container = client.containers.create(...)
except NotFoundError:
    container = client.containers.create(...)
3. Downloading model-generated files¶
When the model writes files during code execution (e.g. a CSV or plot), it references them as container_file_citation annotations on output_text content items. This works with both auto and custom containers.
Iterate over response.output, find ResponseOutputMessage items, and read the annotations to get the file_id and filename. Then download the content with files.content.retrieve.
from openai.types.responses import ResponseOutputMessage
generated_file_id = None
for item in response.output:
    if isinstance(item, ResponseOutputMessage):
        for content in item.content:
            if content.type == "output_text":
                for annotation in content.annotations:
                    if annotation.type == "container_file_citation":
                        generated_file_id = annotation.file_id
                        container_id = annotation.container_id
                        print(f"Generated file: {annotation.filename} ({generated_file_id})")

if generated_file_id:
    file_content = client.containers.files.content.retrieve(
        generated_file_id,
        container_id=container_id,
    )
    generated_bytes = file_content.read()
    print(f"Downloaded {len(generated_bytes)} bytes")
Note: include=["code_interpreter_call.outputs"] returns inline stdout/stderr/images in the response. Files saved to disk by the model (e.g. with df.to_csv(...)) are not included there; use the annotation pattern above to download them.
4. Calling the Responses API via ChatService¶
Use chat_service.complete_responses_with_references() (sync) or complete_responses_with_references_async() (async) instead of calling client.responses.create directly. These methods handle authentication, streaming, and message writing (with references) to the chat automatically.
The signature accepts the same tools, include, and messages arguments as the raw API:
from openai.types.responses.tool_param import CodeInterpreter
from unique_toolkit.language_model import LanguageModelName
code_interpreter_tool = CodeInterpreter(type="code_interpreter", container=container_id)
response = chat_service.complete_responses_with_references(
model_name=LanguageModelName.AZURE_GPT_5_2025_0807,
messages="Read data.csv and plot a histogram. Save the plot as histogram.png.",
tools=[code_interpreter_tool],
include=["code_interpreter_call.outputs"],
)
For the async variant:
response = await chat_service.complete_responses_with_references_async(
model_name=LanguageModelName.AZURE_GPT_5_2025_0807,
messages="Read data.csv and plot a histogram. Save the plot as histogram.png.",
tools=[code_interpreter_tool],
include=["code_interpreter_call.outputs"],
)
The ResponsesLanguageModelStreamResponse output object¶
complete_responses_with_references returns a ResponsesLanguageModelStreamResponse. Its .output field is the raw list[ResponseOutputItem] — identical in structure to what you get from client.responses.create. In addition, the object exposes convenience properties that save you from iterating manually:
| Property | Type | Description |
|---|---|---|
| .output | list[ResponseOutputItem] | Raw output items (text, code calls, etc.) |
| .container_files | list[AnnotationContainerFileCitation] | All container_file_citation annotations across all output messages |
| .code_interpreter_calls | list[ResponseCodeInterpreterToolCall] | All code interpreter call items |
Instead of manually walking .output to find annotations (as shown in section 3), use .container_files directly:
for citation in response.container_files:
    file_content = client.containers.files.content.retrieve(
        citation.file_id,
        container_id=citation.container_id,
    )
    print(f"{citation.filename}: {len(file_content.read())} bytes")
Each citation has .file_id, .filename, and .container_id.
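If the model cites the same file more than once (e.g. across several output messages), .container_files may contain duplicate citations, so deduplicating by file_id before downloading avoids redundant fetches. A minimal sketch, using a stand-in dataclass rather than the real AnnotationContainerFileCitation type:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Citation:
    # Stand-in for AnnotationContainerFileCitation, for illustration only.
    file_id: str
    filename: str
    container_id: str

def dedupe_citations(citations: list[Citation]) -> list[Citation]:
    """Keep only the first citation per file_id, preserving order."""
    seen: set[str] = set()
    unique: list[Citation] = []
    for citation in citations:
        if citation.file_id not in seen:
            seen.add(citation.file_id)
            unique.append(citation)
    return unique
```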
5. Persisting state with ShortTermMemoryService¶
The assistant is stateless — a new handler instance is created for every incoming message. Without persistence, a new container would be created on every turn and previously uploaded files would be lost. Use PersistentShortMemoryManager to save the container_id and uploaded file IDs to chat-scoped short-term memory, so they can be reused on the next turn.
Define a memory schema¶
from pydantic import BaseModel


class CodeExecutionMemory(BaseModel):
    container_id: str | None = None
    file_ids: dict[str, str] = {}  # Unique file id -> OpenAI container file id
Set up the manager¶
Instantiate at chat scope (using chat_id, not message_id) so memory persists across turns:
from unique_toolkit.short_term_memory.service import ShortTermMemoryService
from unique_toolkit.agentic.short_term_memory_manager.persistent_short_term_memory_manager import (
PersistentShortMemoryManager,
)
stm_service = ShortTermMemoryService(
company_id=event.company_id,
user_id=event.user_id,
chat_id=event.payload.chat_id,
message_id=None, # chat-level scope, not message-level
)
memory_manager = PersistentShortMemoryManager(
short_term_memory_service=stm_service,
short_term_memory_schema=CodeExecutionMemory,
short_term_memory_name="code_execution", # Ideally include a chat_id in the name
)
Per-turn pattern¶
Load at the start of each turn, update in place, save at the end:
from openai import NotFoundError

# 1. Load (returns None if no memory saved yet)
memory = await memory_manager.load_async() or CodeExecutionMemory()

# 2. Create or reuse container
if memory.container_id is not None:
    try:
        container = await client.containers.retrieve(memory.container_id)
        # container.status is not well-typed in the openai sdk; these values
        # were found by trial and error
        if container.status not in ["active", "running"]:
            memory = CodeExecutionMemory()  # reset: stale container
    except NotFoundError:
        memory = CodeExecutionMemory()  # reset: container gone
if memory.container_id is None:
    container = await client.containers.create(
        name=f"code_execution_{event.payload.chat_id}",
        expires_after={"anchor": "last_active_at", "minutes": 20},
    )
    memory.container_id = container.id

# 3. Upload files (skip if already uploaded)
for file in files_to_upload:
    if file.id in memory.file_ids:
        try:
            await client.containers.files.retrieve(
                memory.file_ids[file.id], container_id=memory.container_id
            )
            continue  # already there
        except NotFoundError:
            pass  # fall through to re-upload
    openai_file = await client.containers.files.create(
        memory.container_id,
        file=(file.name, file.content_bytes),
    )
    memory.file_ids[file.id] = openai_file.id

# 4. Run inference
code_interpreter_tool = CodeInterpreter(
    type="code_interpreter", container=memory.container_id
)
response = await chat_service.complete_responses_with_references_async(
    model_name=LanguageModelName.AZURE_GPT_5_2025_0807,
    messages=user_message,
    tools=[code_interpreter_tool],
    include=["code_interpreter_call.outputs"],
)

# 5. Save updated memory
await memory_manager.save_async(memory)
Example scripts¶
Full Example — Quick start with auto container (Click to expand)
# %%
# Code execution with auto-managed container (Responses API)
from openai.types.responses.tool_param import CodeInterpreter
from unique_toolkit.framework_utilities.openai.client import get_openai_client
from unique_toolkit.language_model import LanguageModelName
model_name = LanguageModelName.AZURE_GPT_5_2025_0807
client = get_openai_client()
# %%
# Define tool and call Responses API
code_interpreter_tool = CodeInterpreter(type="code_interpreter", container={"type": "auto"})
messages = "Use code to print hello world."
response_with_output = client.responses.create(
model=model_name,
tools=[code_interpreter_tool],
input=messages,
include=["code_interpreter_call.outputs"],
)
# %%
# response.output is a list (e.g. text block, then code_interpreter_call)
print(response_with_output.output)
print(response_with_output.output[1].outputs) # type: ignore[union-attr]
Full Example — Custom Azure container (Click to expand)
# %%
# Code execution with custom Azure container (Responses API)
from unique_toolkit.framework_utilities.openai.client import get_openai_client
from unique_toolkit.language_model import LanguageModelName
model_name = LanguageModelName.AZURE_GPT_5_2025_0807
client = get_openai_client(additional_headers={"x-model": model_name})
# %%
# Create a custom Azure container
# Recommended to use chat_id in the name to differentiate containers across chats
container = client.containers.create(
name="code_execution_container",
expires_after={"anchor": "last_active_at", "minutes": 20},
)
print(f"Created container: {container.id}")
# %%
# Upload a file to the container
csv_bytes = b"name,value\nfoo,1\nbar,2"
uploaded_file = client.containers.files.create(
container.id,
file=("data.csv", csv_bytes),
)
print(f"Uploaded file: {uploaded_file.id}")
# %%
# Download a file from the container
# Files produced by code execution are accessible the same way via file_id
file_id = uploaded_file.id # or any file_id from a code interpreter output
file_content = client.containers.files.content.retrieve(
file_id,
container_id=container.id,
)
downloaded_bytes = file_content.read()
assert downloaded_bytes == csv_bytes
# %%
# List all files in the container
for file in client.containers.files.list(container.id):
    print(f" {file.id} {file.path}")
# %%
# Define tools and call Responses API
from openai.types.responses.tool_param import CodeInterpreter
code_interpreter_tool = CodeInterpreter(type="code_interpreter", container=container.id)
messages = "Read data.csv and add a random column. Save the result to a new file called data_with_random_column.csv."
response_with_output = client.responses.create(
model=model_name,
tools=[code_interpreter_tool],
input=messages,
include=["code_interpreter_call.outputs"],
)
# %%
# Download a file generated by the model during code execution
# File citations appear as annotations on output text items
from openai.types.responses import ResponseOutputMessage
generated_file_id = None
for item in response_with_output.output:
    if isinstance(item, ResponseOutputMessage):
        for content in item.content:
            if content.type == "output_text":
                for annotation in content.annotations:
                    if annotation.type == "container_file_citation":
                        generated_file_id = annotation.file_id
                        print(f"Generated file: {annotation.filename} ({generated_file_id})")
# %%
if generated_file_id:
    generated_content = client.containers.files.content.retrieve(
        generated_file_id,
        container_id=container.id,
    )
    generated_bytes = generated_content.read()
    print(f"Downloaded generated file: {len(generated_bytes)} bytes")
    print(generated_bytes)
# %%
# Check if a container exists and is usable before reusing it
# Useful when storing container.id across sessions (e.g. in chat state)
from openai import NotFoundError
stored_container_id = container.id # e.g. loaded from persistent state
try:
    existing_container = client.containers.retrieve(stored_container_id)
    if existing_container.status not in ["active", "running"]:
        container = client.containers.create(
            name="code_execution_container",
            expires_after={"anchor": "last_active_at", "minutes": 20},
        )
    else:
        container = existing_container
except NotFoundError:
    container = client.containers.create(
        name="code_execution_container",
        expires_after={"anchor": "last_active_at", "minutes": 20},
    )
# %%
# Check if a file already exists in the container before re-uploading
stored_file_id = uploaded_file.id # e.g. loaded from persistent state
try:
    client.containers.files.retrieve(stored_file_id, container_id=container.id)
    print("File already exists, skipping upload")
except NotFoundError:
    uploaded_file = client.containers.files.create(
        container.id,
        file=("data.csv", csv_bytes),
    )
    print(f"Re-uploaded file: {uploaded_file.id}")
Full Example — Unique platform patterns (Click to expand)
# %%
# Code execution — Unique platform patterns
# Covers: ChatService responses API, ResponsesLanguageModelStreamResponse, ShortTermMemory
from openai import NotFoundError
from openai.types.responses.tool_param import CodeInterpreter
from pydantic import BaseModel
from unique_toolkit.app.dev_util import get_event_generator
from unique_toolkit.app.schemas import ChatEvent
from unique_toolkit.app.unique_settings import UniqueSettings
from unique_toolkit.agentic.short_term_memory_manager.persistent_short_term_memory_manager import (
PersistentShortMemoryManager,
)
from unique_toolkit.chat.service import Content
from unique_toolkit.framework_utilities.openai.client import get_openai_client
from unique_toolkit.language_model import LanguageModelName
from unique_toolkit.short_term_memory.service import ShortTermMemoryService
from unique_toolkit import ChatService
from unique_toolkit.services.knowledge_base import KnowledgeBaseService
settings = UniqueSettings.from_env_auto_with_sdk_init("qa.env")
# %%
# Memory schema — persists container_id and uploaded file_ids across turns
class CodeExecutionMemory(BaseModel):
    container_id: str | None = None
    file_ids: dict[str, str] = {}  # internal_file_id -> OpenAI container file id
# %%
# Per-turn handler
model_name = LanguageModelName.AZURE_GPT_5_2025_0807
for event in get_event_generator(unique_settings=settings, event_type=ChatEvent):
    chat_service = ChatService(event)
    kb_service = KnowledgeBaseService.from_event(event)
    client = get_openai_client(
        additional_headers={"x-model": model_name}
    )

    # %%
    # Set up short-term memory manager at chat scope (message_id=None)
    stm_service = ShortTermMemoryService(
        company_id=event.company_id,
        user_id=event.user_id,
        chat_id=event.payload.chat_id,
        message_id=None,
    )
    memory_manager: PersistentShortMemoryManager[CodeExecutionMemory] = (
        PersistentShortMemoryManager(
            short_term_memory_service=stm_service,
            short_term_memory_schema=CodeExecutionMemory,
            short_term_memory_name=f"code_execution_{event.payload.chat_id}",
        )
    )

    # %%
    # Load memory from previous turn (None if first turn)
    memory = memory_manager.load_sync() or CodeExecutionMemory()
    print(f"Loaded memory: container_id={memory.container_id}, files={list(memory.file_ids)}")

    # %%
    # Create or reuse the container
    if memory.container_id is not None:
        try:
            container = client.containers.retrieve(memory.container_id)
            # container.status is not well-typed in the openai sdk; these values
            # were found by trial and error
            if container.status not in ["active", "running"]:
                print(f"Container status is '{container.status}', recreating")
                memory = CodeExecutionMemory()
        except NotFoundError:
            print("Container not found, recreating")
            memory = CodeExecutionMemory()
    if memory.container_id is None:
        container = client.containers.create(
            name=f"code_execution_{event.payload.chat_id}",
            expires_after={"anchor": "last_active_at", "minutes": 20},
        )
        memory.container_id = container.id
        print(f"Created container: {memory.container_id}")
    else:
        print(f"Reusing container: {memory.container_id}")

    # %%
    # Upload files to the container, skipping any already present
    # Replace `files_to_upload` with actual Content objects,
    # e.g. fetched from KnowledgeBaseService
    files_to_upload: list[Content] = []
    for file in files_to_upload:
        if file.id in memory.file_ids:
            try:
                client.containers.files.retrieve(
                    memory.file_ids[file.id],
                    container_id=memory.container_id,
                )
                print(f"File {file.id} already in container, skipping")
                continue
            except NotFoundError:
                pass  # file disappeared; re-upload below
        file_content = kb_service.download_content_to_bytes(
            content_id=file.id
        )
        openai_file = client.containers.files.create(
            memory.container_id,
            file=(file.key, file_content),
        )
        memory.file_ids[file.id] = openai_file.id
        print(f"Uploaded {file.key} -> {openai_file.id}")

    # %%
    # Call the Responses API via ChatService
    # complete_responses_with_references handles auth, streaming, and message writing
    code_interpreter_tool = CodeInterpreter(
        type="code_interpreter",
        container=memory.container_id,
    )
    response = chat_service.complete_responses_with_references(
        model_name=model_name,
        messages=event.payload.user_message.text,
        tools=[code_interpreter_tool],
        include=["code_interpreter_call.outputs"],
    )

    # %%
    # Inspect code interpreter calls from the response
    # response.code_interpreter_calls is a convenience property on ResponsesLanguageModelStreamResponse
    for call in response.code_interpreter_calls:
        print(f"Code interpreter call: {call.id}")

    # %%
    # Download files generated by the model during code execution
    # response.container_files parses all container_file_citation annotations automatically
    for citation in response.container_files:
        file_content = client.containers.files.content.retrieve(
            citation.file_id,
            container_id=citation.container_id,
        )
        generated_bytes = file_content.read()
        print(f"Generated file: {citation.filename} ({len(generated_bytes)} bytes)")

    # %%
    # Save updated memory (new container_id and/or file_ids) for next turn
    memory_manager.save_sync(memory)
    print(f"Saved memory: container_id={memory.container_id}, files={list(memory.file_ids)}")