Knowledge Base Service¶
The KnowledgeBaseService lets you interact with the knowledge base: file upload and download, content search, and metadata filtering. A Content represents a file of any type stored in the knowledge base.
Initialization¶
| #initialize_kb_service_standalone | |
|---|---|
Core Capabilities¶
- Upload & Download: Store and retrieve files securely
- Search: Find content using semantic (vector), keyword, or hybrid search
- Metadata Filtering: Use smart rules to narrow search results
- Chat Integration: Attach files to chat messages for user access
Content Upload¶
Upload from Memory (Recommended)¶
For security, prefer uploading from memory to avoid disk-based information leakage. This method takes raw bytes and uploads them directly to the knowledge base without creating intermediate files on disk.
For implementation examples, see the Content Upload Examples.
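As a minimal sketch of the in-memory pattern, the snippet below builds a CSV payload with io.StringIO and hands the resulting bytes to an uploader without writing anything to disk. FakeKnowledgeBase is a hypothetical stand-in so the snippet runs offline, and the parameter names content_name and mime_type are assumptions; check the real upload_content_from_bytes() signature before relying on them.

```python
import csv
import io


def build_csv_bytes(rows):
    """Serialize rows to CSV entirely in memory and return UTF-8 bytes."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue().encode("utf-8")


class FakeKnowledgeBase:
    """Hypothetical stand-in for the real service so the sketch runs offline."""

    def __init__(self):
        self.stored = {}

    def upload_content_from_bytes(self, content, content_name, mime_type):
        # Assumed parameter names; the real method may differ.
        self.stored[content_name] = (mime_type, content)
        return content_name


kb = FakeKnowledgeBase()
payload = build_csv_bytes([["name", "score"], ["alice", 3]])
content_id = kb.upload_content_from_bytes(
    content=payload, content_name="scores.csv", mime_type="text/csv"
)
```

The point of the pattern is that the generated data only ever exists as a bytes object, so there is no intermediate file to clean up or leak.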
Upload from File¶
Upload from disk only when you must (e.g., when working with large files or when the content is already saved locally). The key parameter is:
- skip_ingestion: Controls whether ingestion is skipped. Set it to False to have the content processed and indexed, which makes it searchable via vector/keyword search, or to True if you only need to store the file without indexing it.
Use cases:
- Files generated by external libraries that write to disk
- Batch uploads of existing files
Make Uploaded Document Available to User¶
When you generate or process a file that should be shown to the user in the chat interface, you need to:
1. Upload the content to the knowledge base
2. Create a ContentReference linking to the uploaded content
3. Attach the reference to an assistant message
This makes the file appear as a downloadable attachment in the chat.
Common use cases:
- Returning generated reports, summaries, or translations
- Providing processed/converted files (e.g., PDF to Word)
- Making analysis results available for download
Content Download¶
Download to Memory (Recommended)¶
Prefer downloading to memory for security. This approach avoids leaving sensitive data on disk and suits most use cases where you can process the content directly in memory.
How it works:
- download_content_to_bytes() retrieves the file content as raw bytes
- Use io.BytesIO() to create a file-like object in memory that many libraries can read from
- Process the content directly without touching the filesystem
Common use cases:
- Reading text files
- Processing images with PIL/Pillow
- Parsing JSON/XML/CSV data
- Any operation where the library supports file-like objects or byte streams
For implementation examples, see the Content Download Examples.
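The in-memory processing described above can be sketched with the standard library alone. The byte literals below stand in for what download_content_to_bytes() would return; the helper names parse_json_bytes and parse_csv_bytes are illustrative and not part of the service.

```python
import csv
import io
import json


def parse_json_bytes(data: bytes) -> dict:
    """Decode downloaded bytes as UTF-8 JSON without touching the filesystem."""
    return json.loads(data.decode("utf-8"))


def parse_csv_bytes(data: bytes) -> list:
    """Wrap the bytes in a file-like object and read them as CSV rows."""
    # BytesIO gives a binary stream; TextIOWrapper decodes it for the csv module.
    text_stream = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", newline="")
    return list(csv.DictReader(text_stream))


# In real use these bytes would come from download_content_to_bytes().
json_bytes = b'{"title": "Report", "pages": 12}'
csv_bytes = b"title,pages\nReport,12\nSummary,3\n"

document = parse_json_bytes(json_bytes)
rows = parse_csv_bytes(csv_bytes)
```

The same io.BytesIO(data) object can be passed to any library that accepts a binary file-like object in place of a path.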
Download to Temporary File¶
When you need a file on disk, use secure temporary directories. This is necessary when:
- A library requires a file path and cannot work with file-like objects or bytes
- You need to pass the file to an external command-line tool
- The file format requires random access (seeking) not available with streams
Important security practices:
- Always use tempfile.mkdtemp() to create a secure, random temporary directory
- Use a try/finally block to ensure cleanup happens even if an error occurs
- Delete both the file and the temporary directory when done
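A minimal sketch of these practices, using only the standard library. The helper name with_temp_copy and the use_path callback are illustrative; in real use the bytes would come from the knowledge base and the callback would invoke the library or command-line tool that needs a file path.

```python
import os
import secrets
import shutil
import tempfile


def with_temp_copy(data: bytes, use_path, suffix: str = ".bin"):
    """Write data to a file in a private temp directory, run use_path, then clean up."""
    temp_dir = tempfile.mkdtemp()  # randomly named directory, accessible only to the creating user
    try:
        # A random file name avoids leaking information through the name itself.
        file_path = os.path.join(temp_dir, secrets.token_hex(8) + suffix)
        with open(file_path, "wb") as handle:
            handle.write(data)
        return use_path(file_path)
    finally:
        # Removes the file and the directory, even if use_path raised.
        shutil.rmtree(temp_dir, ignore_errors=True)


seen_paths = []


def measure(path):
    seen_paths.append(path)
    return os.path.getsize(path)


size = with_temp_copy(b"hello", measure)
```

Keeping the cleanup in the finally block means the caller never has to remember it, and shutil.rmtree covers both the file and its directory in one step.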
Content Deletion¶
Permanently removes content from the knowledge base. This operation:
- Deletes the file from storage
- Removes all indexed chunks from the vector database
For implementation examples, see the Content Deletion Examples.
Content Search¶
Semantic Search (Vector-Based)¶
Use vector search for semantic similarity matching. This search method understands the meaning of your query and finds conceptually similar content, even if the exact words don't match.
How it works:
- Your search string is converted to a vector embedding
- The system finds content chunks with similar embeddings
- Results are ranked by semantic similarity
Parameters:
- search_string: Your natural language query
- search_type: Set to ContentSearchType.VECTOR for semantic search
- limit: Maximum number of chunks to return
- score_threshold: Minimum similarity score (0.0 to 1.0). Higher values = stricter matching
- scope_ids: Optional list of folder IDs to restrict search scope
Best for:
- Natural language queries
- Finding conceptually related content
- When exact keyword matching isn't necessary
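To make the roles of limit and score_threshold concrete, here is a toy illustration of ranking by embedding similarity. It is not the service's implementation: the two-dimensional vectors are hand-made stand-ins for real embeddings, and cosine similarity is an assumed scoring function.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(query_vec, chunks, limit, score_threshold):
    """Score every chunk, drop those below the threshold, return the top `limit`."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks.items()]
    kept = [item for item in scored if item[0] >= score_threshold]
    kept.sort(key=lambda item: item[0], reverse=True)
    return kept[:limit]


# Hand-made toy embeddings (assumption: real embeddings have hundreds of dimensions).
chunks = {
    "refund policy": [1.0, 0.0],
    "return an item": [0.8, 0.6],
    "office hours": [0.0, 1.0],
}
query = [1.0, 0.0]

loose = [text for _, text in vector_search(query, chunks, limit=5, score_threshold=0.5)]
strict = [text for _, text in vector_search(query, chunks, limit=5, score_threshold=0.9)]
```

Raising score_threshold from 0.5 to 0.9 drops the related but not identical chunk, which is the "stricter matching" effect described in the parameter list.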
Combined Search (Hybrid)¶
Combine semantic and keyword search. This approach gives the most comprehensive results because it draws on both search methods.
How it works:
- Performs both vector (semantic) and keyword (full-text) search in parallel
- Merges and ranks results using a hybrid scoring algorithm
- Returns the most relevant matches from both search types
Recommended as the default search type for most use cases.
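The source does not say which hybrid scoring algorithm the service uses. As an illustration of how two ranked result lists can be merged, the sketch below uses reciprocal rank fusion, one common choice; the chunk IDs and the constant k=60 are assumptions for the example.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: each appearance adds 1 / (k + rank) to an item's score."""
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest combined score first.
    return sorted(scores, key=lambda chunk_id: scores[chunk_id], reverse=True)


# Hypothetical result lists, best match first.
vector_hits = ["c2", "c1", "c3"]
keyword_hits = ["c1", "c4"]

merged = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

A chunk that appears in both lists (c1 here) outranks chunks that appear in only one, which is the intuition behind combining the two search types.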
Content File Search¶
Search for complete content files (not chunks) by metadata. This is useful when you want to find whole files rather than text snippets.
Difference from chunk search:
- search_content_chunks(): Returns text snippets from within files
- search_contents(): Returns complete file metadata objects
Use cases:
- Listing all files in a folder
- Finding files by title, creation date, or custom metadata
- Getting files uploaded to a specific chat
For implementation examples, see the Content Search Examples.
Best Practices¶
Security Considerations¶
- Prefer Memory Operations: Always prefer download_content_to_bytes() and upload_content_from_bytes() to avoid disk-based information leakage.
- Temporary File Cleanup: When using temporary files, always clean them up, even if an error occurs.
- Secure File Names: Use random names for temporary files to prevent information leakage through file names.
Full Examples¶
For complete, runnable examples of all knowledge base features, see the Knowledge Base Examples section.