Knowledge Base Service¶

The KnowledgeBaseService provides comprehensive capabilities to interact with the knowledge base, including file upload/download, content search, and metadata filtering. A Content represents a file of any type stored in the knowledge base.

Initialization¶

# Initialize the knowledge base service standalone, reading configuration from settings
kb_service = KnowledgeBaseService.from_settings()

Core Capabilities¶

Upload & Download: Store and retrieve files securely
Search: Find content using semantic (vector), keyword, or hybrid search
Metadata Filtering: Use smart rules to narrow search results
Chat Integration: Attach files to chat messages for user access

Content Upload¶

Upload from Memory (Recommended)¶

For security, prefer uploading from memory to avoid disk-based information leakage. This method takes raw bytes and uploads them directly to the knowledge base without creating intermediate files on disk.

For implementation examples, see the Content Upload Examples.
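As a rough sketch of the in-memory approach, the snippet below serializes a small report straight into a bytes object, so nothing is written to disk before upload. `InMemoryKnowledgeBase` is a hypothetical stand-in that only exists to make the snippet runnable, and the parameter names on `upload_content_from_bytes` (`content`, `content_name`, `mime_type`, `scope_id`) are assumptions; the real signature may differ.

```python
import csv
import io


def build_csv_bytes(rows: list[dict[str, str]]) -> bytes:
    """Serialize rows to CSV entirely in memory and return UTF-8 bytes."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


class InMemoryKnowledgeBase:
    """Hypothetical stand-in for the service, used only to make this runnable."""

    def __init__(self) -> None:
        self.stored: dict[str, bytes] = {}

    def upload_content_from_bytes(
        self, content: bytes, content_name: str, mime_type: str, scope_id: str
    ) -> str:
        # Assumed parameters; the real method's signature may differ.
        self.stored[content_name] = content
        return content_name


rows = [{"quarter": "Q1", "revenue": "100"}, {"quarter": "Q2", "revenue": "120"}]
payload = build_csv_bytes(rows)  # the report exists only as bytes in memory

kb = InMemoryKnowledgeBase()
content_id = kb.upload_content_from_bytes(
    content=payload,
    content_name="revenue.csv",
    mime_type="text/csv",
    scope_id="scope_123",  # hypothetical folder ID
)
```

The point of the pattern is that the payload is built with `io.StringIO` and handed over as bytes, so no intermediate file is left behind on disk.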

Upload from File¶

When you must upload from disk (e.g., when working with large files or when the content is already saved locally), one parameter controls indexing:

skip_ingestion: Controls whether ingestion is skipped. Set to False to have the content processed and made searchable via vector/keyword search, or True if you only need to store the file without indexing it.

Use cases:

Files generated by external libraries that write to disk
Batch uploads of existing files

Make Uploaded Document Available to User¶

When you generate or process a file that should be shown to the user in the chat interface, you need to:

1. Upload the content to the knowledge base
2. Create a ContentReference linking to the uploaded content
3. Attach the reference to an assistant message

This makes the file appear as a downloadable attachment in the chat.

Common use cases:

Returning generated reports, summaries, or translations
Providing processed/converted files (e.g., PDF to Word)
Making analysis results available for download

Content Download¶

Download to Memory (Recommended)¶

Prefer downloading to memory for security. This approach avoids leaving sensitive data on disk and suits most use cases, because the content can be processed directly in memory.

How it works:

download_content_to_bytes() retrieves the file content as raw bytes
Use io.BytesIO() to create a file-like object in memory that many libraries can read from
Process the content directly without touching the filesystem
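The steps above can be sketched with the standard library alone. The byte strings below stand in for what `download_content_to_bytes()` would return; the call itself is left out because its arguments are not specified here.

```python
import io
import json
import zipfile

# Stand-in for bytes returned by download_content_to_bytes().
json_bytes = b'{"title": "Q1 report", "pages": 12}'

# Small text payloads can be decoded directly from bytes.
document = json.loads(json_bytes.decode("utf-8"))

# Build a small ZIP archive in memory to play the role of a downloaded file.
archive_buffer = io.BytesIO()
with zipfile.ZipFile(archive_buffer, "w") as archive:
    archive.writestr("notes.txt", "processed in memory")
zip_bytes = archive_buffer.getvalue()

# Libraries that expect a file object can read from io.BytesIO instead of a path.
with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
    names = archive.namelist()
    notes = archive.read("notes.txt").decode("utf-8")
```

The same wrapping works for any library that accepts a binary file-like object, since `io.BytesIO` supports `read()` and `seek()`.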

Common use cases:

Reading text files
Processing images with PIL/Pillow
Parsing JSON/XML/CSV data
Any operation where the library supports file-like objects or byte streams

For implementation examples, see the Content Download Examples.

Download to Temporary File¶

When you need a file on disk, use secure temporary directories. This is necessary when:

A library requires a file path and cannot work with file-like objects or bytes
You need to pass the file to an external command-line tool
The file format requires random access (seeking) that a non-seekable stream cannot provide

Important security practices:

Always use tempfile.mkdtemp() to create a secure, random temporary directory
Use a try/finally block to ensure cleanup happens even if an error occurs
Delete both the file and the temporary directory when done
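A minimal sketch of these practices follows. `count_lines_via_path` is a hypothetical stand-in for any library or tool that only accepts a file path, and `content_bytes` stands in for bytes already downloaded from the knowledge base.

```python
import os
import secrets
import shutil
import tempfile


def count_lines_via_path(path: str) -> int:
    """Stand-in for a library or CLI tool that only accepts a file path."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


content_bytes = b"line one\nline two\nline three\n"  # stand-in for downloaded bytes

temp_dir = tempfile.mkdtemp()  # random directory, accessible only to the creating user
try:
    # A random file name avoids leaking the document title through the path.
    temp_path = os.path.join(temp_dir, secrets.token_hex(16))
    with open(temp_path, "wb") as handle:
        handle.write(content_bytes)
    line_count = count_lines_via_path(temp_path)
finally:
    # Removes the file and the directory, even if processing raised an error.
    shutil.rmtree(temp_dir, ignore_errors=True)
```

Using `shutil.rmtree` in the `finally` block deletes the file and its directory in one step, so there is no separate cleanup path to forget.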

Content Deletion¶

Permanently removes content from the knowledge base. This operation:

Deletes the file from storage
Removes all indexed chunks from the vector database

For implementation examples, see the Content Deletion Examples.

Content Search¶

Semantic Search (Vector-Based)¶

Use vector search for semantic similarity matching. This search method understands the meaning of your query and finds conceptually similar content, even if the exact words don't match.

How it works:

Your search string is converted to a vector embedding
The system finds content chunks with similar embeddings
Results are ranked by semantic similarity

Parameters:

search_string: Your natural language query
search_type: Set to ContentSearchType.VECTOR for semantic search
limit: Maximum number of chunks to return
score_threshold: Minimum similarity score (0.0 to 1.0). Chunks scoring below it are dropped, so higher values mean stricter matching
scope_ids: Optional list of folder IDs to restrict search scope
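To make the roles of `limit` and `score_threshold` concrete, here is a toy sketch of similarity ranking. It assumes cosine similarity and hand-written three-dimensional vectors in place of real embeddings; the service's actual embedding model and scoring function are not described here and may differ.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def vector_search(
    query_vec: list[float],
    chunks: list[tuple[str, list[float]]],
    limit: int,
    score_threshold: float,
) -> list[tuple[float, str]]:
    """Score every chunk, drop those below the threshold, return the top `limit`."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    kept = [item for item in scored if item[0] >= score_threshold]
    kept.sort(key=lambda item: item[0], reverse=True)
    return kept[:limit]


# Toy "embeddings"; a real system would compute these with an embedding model.
chunks = [
    ("Refund policy for annual plans", [0.9, 0.1, 0.0]),
    ("How to reset your password", [0.0, 0.2, 0.9]),
    ("Cancelling a subscription", [0.8, 0.3, 0.1]),
]
query_vec = [1.0, 0.2, 0.0]  # pretend embedding of a query about getting money back

results = vector_search(query_vec, chunks, limit=2, score_threshold=0.5)
strict_results = vector_search(query_vec, chunks, limit=2, score_threshold=0.99)
```

With the looser threshold both billing-related chunks are returned, while the stricter one keeps only the closest match; the password chunk is excluded in both cases even though it shares no exact words with the others.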

Best for:

Natural language queries
Finding conceptually related content
When exact keyword matching isn't necessary

Combined Search (Hybrid)¶

Hybrid search combines semantic and keyword search, so it surfaces both conceptually related chunks and exact term matches in a single result list.

How it works:

Performs both vector (semantic) and keyword (full-text) search in parallel
Merges and ranks results using a hybrid scoring algorithm
Returns the most relevant matches from both search types
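The hybrid scoring algorithm itself is not specified here. As an illustration of how two ranked lists can be merged, the sketch below uses reciprocal rank fusion (RRF), a common technique for this purpose; it is an assumption for demonstration, not a description of what the service does. The chunk IDs are made up.

```python
def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Merge ranked ID lists: each appearance adds 1 / (k + rank) to an ID's score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical results from the two searches, best match first.
vector_ranking = ["chunk_a", "chunk_b", "chunk_c"]
keyword_ranking = ["chunk_b", "chunk_d", "chunk_a"]

fused = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
fused_ids = [chunk_id for chunk_id, _ in fused]
```

Chunks that appear in both lists accumulate two contributions, so they rise above chunks found by only one search method.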

Recommended as the default search type for most use cases.

Content File Search¶

Search for complete content files (not chunks) by metadata. This is useful when you want to find whole files rather than text snippets.

Difference from chunk search:

search_content_chunks(): Returns text snippets from within files
search_contents(): Returns complete file metadata objects

Use cases:

Listing all files in a folder
Finding files by title, creation date, or custom metadata
Getting files uploaded to a specific chat

For implementation examples, see the Content Search Examples.

Best Practices¶

Security Considerations¶

Prefer Memory Operations: Always prefer download_content_to_bytes() and upload_content_from_bytes() to avoid disk-based information leakage.

Temporary File Cleanup: When using temporary files, always clean them up:

import shutil
import tempfile

temp_dir = tempfile.mkdtemp()
try:
    # Your file operations go here, using paths inside temp_dir
    pass
finally:
    # Remove the temporary directory and everything in it
    shutil.rmtree(temp_dir)

Secure File Names: Use random names for temporary files to prevent information leakage through file names.

Full Examples¶

For complete, runnable examples of all knowledge base features, see the Knowledge Base Examples section.