Chat Service - Image and Document Handling¶
This tutorial demonstrates how to handle images and documents uploaded to a chat session. The Chat Service provides convenient methods to retrieve and process various types of content that users upload during conversations.
Overview¶
When users upload images or documents through the chat interface, you can: - Download and retrieve uploaded content - Process images to include them in model prompts - Handle documents for analysis or context - Build messages that combine text and visual content
Downloading Images and Documents¶
The download_chat_images_and_documents() method retrieves all images and documents that have been uploaded to the current chat session. This returns two separate lists: one for images and one for other document types.
| #chat_service_document_and_image_download | |
|---|---|
Once you have the list of uploaded content, you can download the actual bytes of any specific file using download_chat_content_to_bytes(). This is useful for processing documents or passing images to vision-capable language models.
Building Messages with Images¶
To send images to vision-capable models, you need to construct multi-part messages that include both text and image content. The OpenAIUserMessageBuilder makes this easy by providing methods to append different content types.
In this example:
1. We download the image bytes and determine the MIME type (e.g., image/png, image/jpeg)
2. We create an OpenAIMessageBuilder to construct the message sequence
3. We use OpenAIUserMessageBuilder to create a multi-part user message containing both text and the image
4. The .iterable_content property provides the properly formatted content for the API
Sending the Message¶
Once your message is built with all the necessary content, you can send it to the language model and stream the response back to the user.
| #chat_service_send_message | |
|---|---|
The complete_with_references() method:
- Sends the messages to the specified language model (in this case, GPT-4o with vision capabilities)
- Automatically streams the response back to the chat interface
- Handles reference management if the model returns any citations
Finally, free_user_input() re-enables the chat input field, allowing the user to send another message. This should be called after the model completes its response to restore interactivity.
Key Considerations¶
- Vision Models: Use vision-capable models like GPT-4o when processing images
- MIME Types: Ensure you provide the correct MIME type for images
- Error Handling: Always check if images/documents exist before processing
- Memory Usage: Large images and documents consume memory; consider processing strategies for multiple files