The KnowledgeBaseService provides comprehensive capabilities for interacting with the knowledge base, including file upload/download, content search, and metadata filtering. A Content represents a file of any type stored in the knowledge base.
Core Capabilities:
- Upload & Download: Store and retrieve files securely
- Search: Find content using semantic (vector), keyword, or hybrid search
- Metadata Filtering: Use smart rules to narrow search results
- Chat Integration: Attach files to chat messages for user access
For security, prefer uploading from memory to avoid disk-based information leakage. Uploading from memory takes raw bytes and sends them directly to the knowledge base without creating intermediate files on disk.
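A minimal sketch of an in-memory upload, assuming the service exposes an `upload_content_from_bytes` method whose parameters mirror `upload_content` (the method name and exact signature are assumptions, not confirmed by this section):

```python
# Upload raw bytes directly; no intermediate file is written to disk.
# NOTE: upload_content_from_bytes() and its parameters are assumed to mirror upload_content().
report_bytes = b"Quarterly summary: revenue up 12% ..."

content = kb_service.upload_content_from_bytes(
    content=report_bytes,
    content_name="quarterly_summary.txt",
    mime_type="text/plain",
    scope_id=scope_id,
    skip_ingestion=False,  # Ingest so the content becomes searchable
)
```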
When you must upload from disk (e.g., when working with large files or when the content is already saved locally):
- skip_ingestion: Controls whether the content is processed (ingested) for search. Set to False to index the content so it becomes searchable via vector/keyword search, or True if you only need to store the file without indexing it.
Use cases:
- Files generated by external libraries that write to disk
- Batch uploads of existing files
```python
# Configure ingestion settings
content = kb_service.upload_content(
    path_to_content=str(file_path),
    content_name=Path(file_path).name,
    mime_type="text/plain",
    scope_id=scope_id,
    skip_ingestion=False,  # Process the content for search
    metadata={"department": "legal", "classification": "confidential"},
)
```
When you generate or process a file that should be shown to the user in the chat interface, you need to:
1. Upload the content to the knowledge base
2. Create a ContentReference linking to the uploaded content
3. Attach the reference to an assistant message
This makes the file appear as a downloadable attachment in the chat.
```python
uploaded_content = kb_service.upload_content(
    path_to_content=str(output_filepath),
    content_name=output_filepath.name,
    mime_type=str(mimetypes.guess_type(output_filepath)[0]),
    chat_id=payload.chat_id,
    skip_ingestion=skip_ingestion,  # Usually True for generated files
)

reference = ContentReference(
    id=uploaded_content.id,
    sequence_number=1,
    message_id=message_id,
    name=output_filepath.name,
    source=payload.name,
    source_id=payload.chat_id,
    url=f"unique://content/{uploaded_content.id}",  # Special URL format for content
)

self.chat_service.modify_assistant_message(
    content="Please find the translated document below in the references.",
    references=[reference],
    set_completed_at=True,
)
```
Common use cases:
- Returning generated reports, summaries, or translations
- Providing processed/converted files (e.g., PDF to Word)
- Making analysis results available for download
```python
# %%
from pathlib import Path

from dotenv import dotenv_values

from unique_toolkit import KnowledgeBaseService

kb_service = KnowledgeBaseService.from_settings()

demo_env_vars = dotenv_values(Path(__file__).parent / "demo.env")
scope_id = demo_env_vars.get("UNIQUE_SCOPE_ID") or "unknown"

file_path = Path(__file__).parent / "test.txt"

# Configure ingestion settings
content = kb_service.upload_content(
    path_to_content=str(file_path),
    content_name=Path(file_path).name,
    mime_type="text/plain",
    scope_id=scope_id,
    skip_ingestion=False,  # Process the content for search
    metadata={"department": "legal", "classification": "confidential"},
)
```
For security, prefer downloading to memory: this avoids leaving sensitive data on disk and suits most use cases where you can process the content directly in memory.
How it works:
1. download_content_to_bytes() retrieves the file content as raw bytes
2. Use io.BytesIO() to create a file-like object in memory that many libraries can read from
3. Process the content directly without touching the filesystem
Common use cases:
- Reading text files
- Processing images with PIL/Pillow
- Parsing JSON/XML/CSV data
- Any operation where the library supports file-like objects or byte streams
```python
# Download content as bytes
content_bytes = kb_service.download_content_to_bytes(
    content_id=content_id or "unknown",
)

# Process in memory
text = ""
with io.BytesIO(content_bytes) as file_like:
    text = file_like.read().decode("utf-8")

print(text)
```
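The same file-like pattern works for structured formats. As an illustrative sketch (the content ID and file contents are placeholders), a JSON file can be parsed entirely in memory because json.load accepts any file-like object:

```python
import io
import json

# Download a JSON document and parse it without touching the filesystem
json_bytes = kb_service.download_content_to_bytes(content_id=content_id)

with io.BytesIO(json_bytes) as file_like:
    data = json.load(file_like)  # json.load reads directly from the byte stream

print(data)
```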
When you need a file on disk, use secure temporary directories. This is necessary when:
- A library requires a file path and cannot work with file-like objects or bytes
- You need to pass the file to an external command-line tool
- The file format requires random access (seeking) not available with streams
Important security practices:
1. Always use tempfile.mkdtemp() to create a secure, random temporary directory
2. Use a try/finally block to ensure cleanup happens even if an error occurs
3. Delete both the file and the temporary directory when done
```python
# Download to secure temporary file
filename = "my_testfile.txt"
temp_file_path = kb_service.download_content_to_file(
    content_id=content_id,
    output_filename=filename,
    output_dir_path=Path(tempfile.mkdtemp()),  # Use secure temp directory
)

try:
    # Process the file
    with open(temp_file_path, "rb") as file:
        text = file.read().decode("utf-8")
    print(text)
finally:
    # Always clean up temporary files
    if temp_file_path.exists():
        temp_file_path.unlink()
    # Clean up the temporary directory
    temp_file_path.parent.rmdir()
```
```python
# %%
import io
from pathlib import Path

from dotenv import dotenv_values

from unique_toolkit import KnowledgeBaseService

kb_service = KnowledgeBaseService.from_settings()

demo_env_vars = dotenv_values(Path(__file__).parent / "demo.env")
content_id = demo_env_vars.get("UNIQUE_CONTENT_ID") or "unknown"

# Download content as bytes
content_bytes = kb_service.download_content_to_bytes(
    content_id=content_id or "unknown",
)

# Process in memory
text = ""
with io.BytesIO(content_bytes) as file_like:
    text = file_like.read().decode("utf-8")

print(text)
```
```python
# %%
import tempfile
from pathlib import Path

from dotenv import dotenv_values

from unique_toolkit import KnowledgeBaseService

kb_service = KnowledgeBaseService.from_settings()

demo_env_vars = dotenv_values(Path(__file__).parent / "demo.env")
content_id = demo_env_vars.get("UNIQUE_CONTENT_ID") or "unknown"

# Download to secure temporary file
filename = "my_testfile.txt"
temp_file_path = kb_service.download_content_to_file(
    content_id=content_id,
    output_filename=filename,
    output_dir_path=Path(tempfile.mkdtemp()),  # Use secure temp directory
)

try:
    # Process the file
    with open(temp_file_path, "rb") as file:
        text = file.read().decode("utf-8")
    print(text)
finally:
    # Always clean up temporary files
    if temp_file_path.exists():
        temp_file_path.unlink()
    # Clean up the temporary directory
    temp_file_path.parent.rmdir()
```
Deleting content permanently removes it from the knowledge base. This operation:
- Deletes the file from storage
- Removes all indexed chunks from the vector database
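A minimal sketch of a deletion call, assuming the service exposes a `delete_content` method that takes the content ID (the method name and signature are assumptions based on the description above, not confirmed):

```python
# Permanently remove the file and all of its indexed chunks.
# NOTE: delete_content() and its parameter are assumed, not confirmed by this section.
kb_service.delete_content(content_id=content_id)
```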
Use vector search for semantic similarity matching. This search method understands the meaning of your query and finds conceptually similar content, even if the exact words don't match.
How it works:
- Your search string is converted to a vector embedding
- The system finds content chunks with similar embeddings
- Results are ranked by semantic similarity
Parameters:
- search_string: Your natural language query
- search_type: Set to ContentSearchType.VECTOR for semantic search
- limit: Maximum number of chunks to return
- score_threshold: Minimum similarity score (0.0 to 1.0). Higher values = stricter matching
- scope_ids: Optional list of folder IDs to restrict search scope
Best for:
- Natural language queries
- Finding conceptually related content
- When exact keyword matching isn't necessary
```python
# Search for content using vector similarity
content_chunks = kb_service.search_content_chunks(
    search_string="Harry Potter",
    search_type=ContentSearchType.VECTOR,
    limit=10,
    score_threshold=0.7,  # Only return results with high similarity
    scope_ids=[scope_id],
)

print(f"Found {len(content_chunks)} relevant chunks")
for i, chunk in enumerate(content_chunks[:3]):
    print(f"  {i + 1}. {chunk.text[:100]}...")
```
Combine semantic and keyword search for best results. This approach provides the most comprehensive results by leveraging both search methods.
How it works:
- Performs both vector (semantic) and keyword (full-text) search in parallel
- Merges and ranks results using a hybrid scoring algorithm
- Returns the most relevant matches from both search types
Recommended as the default search type for most use cases.
```python
# Combined semantic and keyword search for best results
content_chunks = kb_service.search_content_chunks(
    search_string="Harry Potter",
    search_type=ContentSearchType.COMBINED,
    limit=15,
    search_language="english",
    scope_ids=[scope_id],  # Limit to specific scopes if configured
)

print(f"Combined search found {len(content_chunks)} chunks")
```
Search for complete content files (not chunks) by metadata. This is useful when you want to find whole files rather than text snippets.
Difference from chunk search:
- search_content_chunks(): Returns text snippets from within files
- search_contents(): Returns complete file metadata objects
Use cases:
- Listing all files in a folder
- Finding files by title, creation date, or custom metadata
- Getting files uploaded to a specific chat
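As a usage sketch, the returned objects can be iterated to list matching files; the attribute names below (id, key) are assumptions about the Content schema rather than confirmed fields:

```python
# Find whole files whose title contains "manual" and list them
contents = kb_service.search_contents(
    where={"title": {"contains": "manual"}},
)

for item in contents:
    # NOTE: attribute names are assumed; adjust to the actual Content schema
    print(f"{item.id}: {item.key}")
```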
```python
# %%
from pathlib import Path

from dotenv import dotenv_values

from unique_toolkit import KnowledgeBaseService
from unique_toolkit.content.schemas import ContentSearchType

kb_service = KnowledgeBaseService.from_settings()

demo_env_vars = dotenv_values(Path(__file__).parent / "demo.env")
scope_id = demo_env_vars.get("UNIQUE_SCOPE_ID") or "unknown"

# Search for content using vector similarity
content_chunks = kb_service.search_content_chunks(
    search_string="Harry Potter",
    search_type=ContentSearchType.VECTOR,
    limit=10,
    score_threshold=0.7,  # Only return results with high similarity
    scope_ids=[scope_id],
)

print(f"Found {len(content_chunks)} relevant chunks")
for i, chunk in enumerate(content_chunks[:3]):
    print(f"  {i + 1}. {chunk.text[:100]}...")
```
```python
# %%
from pathlib import Path

from dotenv import dotenv_values

from unique_toolkit import KnowledgeBaseService
from unique_toolkit.content.schemas import ContentSearchType

kb_service = KnowledgeBaseService.from_settings()

demo_env_vars = dotenv_values(Path(__file__).parent / "demo.env")
scope_id = demo_env_vars.get("UNIQUE_SCOPE_ID") or "unknown"

# Combined semantic and keyword search for best results
content_chunks = kb_service.search_content_chunks(
    search_string="Harry Potter",
    search_type=ContentSearchType.COMBINED,
    limit=15,
    search_language="english",
    scope_ids=[scope_id],  # Limit to specific scopes if configured
)

print(f"Combined search found {len(content_chunks)} chunks")
```
```python
# %%
from pathlib import Path

from dotenv import dotenv_values

from unique_toolkit import KnowledgeBaseService

kb_service = KnowledgeBaseService.from_settings()

demo_env_vars = dotenv_values(Path(__file__).parent / "demo.env")
scope_id = demo_env_vars.get("UNIQUE_SCOPE_ID") or "unknown"

# Search for specific content files
contents = kb_service.search_contents(
    where={"title": {"contains": "manual"}},
)
```
```python
import shutil
import tempfile

temp_dir = tempfile.mkdtemp()
try:
    # Your file operations
    pass
finally:
    # Clean up all files in temp directory
    shutil.rmtree(temp_dir)
```
Secure File Names: Use random names for temporary files to prevent information leakage through file names.
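As an illustrative sketch using only the standard library, tempfile can generate the random name itself so nothing meaningful appears in the path:

```python
import tempfile
from pathlib import Path

# mkstemp() creates a file with a random, unpredictable name in a secure temp directory
fd, raw_path = tempfile.mkstemp(suffix=".txt")
temp_path = Path(raw_path)

try:
    with open(fd, "wb") as f:  # reuse the open descriptor returned by mkstemp
        f.write(b"sensitive data")
    # ... process temp_path ...
finally:
    if temp_path.exists():
        temp_path.unlink()  # remove the file once processing is done
```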