Hybrid Retrieval
Semantic + keyword search with Reciprocal Rank Fusion.
HyperSaaS uses a hybrid retrieval strategy combining vector similarity search and PostgreSQL full-text search, fused with Reciprocal Rank Fusion (RRF).
Search Function
def search_documents(
    query: str,
    session_id: str,
    workspace_id: str,
    top_k: int = 5,
    semantic_candidates: int = 20,
    keyword_candidates: int = 20,
) -> list[dict]:
    ...
Called by the search_knowledge_base agent tool during chat conversations.
Retrieval Flow
User query
    │
    ▼
1. Resolve active documents
    │   - Get KBs attached to ChatSession
    │   - Get Documents in those KBs with status="ready"
    │   - Scope to workspace
    │
    ├──────────────────────────┐
    │                          │
    ▼                          ▼
2a. Semantic Search        2b. Keyword Search
    (pgvector cosine)          (PostgreSQL FTS)
    → top 20 candidates        → top 20 candidates
    │                          │
    └──────────┬───────────────┘
               │
               ▼
3. Reciprocal Rank Fusion
    → top 5 results
    │
    ▼
4. Return results with citations
Semantic Search
Embeds the query and finds the closest document chunks using pgvector's cosine distance:
from pgvector.django import CosineDistance

query_embedding = embeddings.embed_query(query)
chunks = (
    DocumentChunk.objects
    .filter(document_id__in=document_ids)
    .annotate(distance=CosineDistance("embedding", query_embedding))
    .order_by("distance")
    [:semantic_candidates]
)
Distance is converted to similarity: score = 1.0 - distance.
The HNSW index (m=16, ef_construction=64, cosine operator class) enables approximate nearest neighbor search, so queries stay fast even with millions of chunks.
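To make the distance-to-similarity conversion concrete, here is a minimal pure-Python sketch. It assumes the usual definition of cosine distance (1 minus cosine similarity), which is what pgvector's cosine operator computes; the helper names and sample vectors are illustrative and not part of HyperSaaS.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance: 1 - (a . b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def similarity_from_distance(distance: float) -> float:
    """Same conversion as in the text: score = 1.0 - distance."""
    return 1.0 - distance


# Illustrative 2-D vectors standing in for real embeddings.
query = [1.0, 0.0]
same_direction = [2.0, 0.0]   # distance 0, similarity 1
orthogonal = [0.0, 3.0]       # distance 1, similarity 0
opposite = [-1.0, 0.0]        # distance 2, similarity -1

sims = [
    similarity_from_distance(cosine_distance(query, v))
    for v in (same_direction, orthogonal, opposite)
]
```

Because cosine distance ranges from 0 to 2, the resulting score ranges from 1 down to -1, and ordering by ascending distance is the same as ordering by descending score.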
Keyword Search
Uses PostgreSQL's built-in full-text search with websearch mode:
from django.contrib.postgres.search import SearchQuery, SearchVector, SearchRank

search_query = SearchQuery(query, search_type="websearch")
search_vector = SearchVector("content")
chunks = (
    DocumentChunk.objects
    .filter(document_id__in=document_ids)
    .annotate(rank=SearchRank(search_vector, search_query))
    .filter(rank__gt=0.0)
    .order_by("-rank")
    [:keyword_candidates]
)
The websearch mode accepts natural-language queries: unquoted terms are combined with AND by default, "quoted phrases" match as phrases, OR is supported, and a leading - excludes a term.
Reciprocal Rank Fusion
RRF merges the two ranked lists without a learned fusion model:
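def _reciprocal_rank_fusion(
    semantic_results: list,
    keyword_results: list,
    k: int = 60,
    top_k: int = 10,
) -> list:
    # Assumes each result is a dict with a "chunk_id" key (see Result Format).
    scores = {}
    for rank, result in enumerate(semantic_results):
        chunk_id = result["chunk_id"]
        scores[chunk_id] = 1 / (k + rank + 1)
    for rank, result in enumerate(keyword_results):
        chunk_id = result["chunk_id"]
        # Keyword-only chunks have no semantic score yet, so start from 0.
        scores[chunk_id] = scores.get(chunk_id, 0.0) + 1 / (k + rank + 1)
    # Sort chunk IDs by fused score, highest first.
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
The constant k=60 flattens the difference between adjacent ranks, so a single first-place hit in one list does not dominate the fused ranking. Chunks appearing in both lists get higher fused scores.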
Example scores:
| Chunk | Semantic Rank | Keyword Rank | RRF Score |
|---|---|---|---|
| A | #0 | #0 | 1/61 + 1/61 = 0.0328 |
| B | #2 | — | 1/63 = 0.0159 |
| C | — | #1 | 1/62 = 0.0161 |
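The table values can be reproduced with a short standalone sketch. It works on plain chunk IDs rather than result dicts, and the ranked lists (including the filler chunk X) are hypothetical inputs chosen to put A, B, and C at the ranks shown in the table.

```python
def rrf_scores(semantic_ids: list[str], keyword_ids: list[str], k: int = 60) -> dict[str, float]:
    """Sum 1 / (k + rank + 1) over every list a chunk appears in (0-based ranks)."""
    scores: dict[str, float] = {}
    for ranked_ids in (semantic_ids, keyword_ids):
        for rank, chunk_id in enumerate(ranked_ids):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1 / (k + rank + 1)
    return scores


# Hypothetical inputs: X is filler so that B lands at semantic rank 2.
semantic_ids = ["A", "X", "B"]   # A at rank 0, B at rank 2
keyword_ids = ["A", "C"]         # A at rank 0, C at rank 1

scores = rrf_scores(semantic_ids, keyword_ids)
fused = sorted(scores, key=scores.get, reverse=True)

for chunk_id in ("A", "B", "C"):
    print(chunk_id, round(scores[chunk_id], 4))
```

Note that C (keyword rank 1) scores slightly higher than B (semantic rank 2): RRF looks only at rank positions, not at the raw similarity or FTS rank values.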
Result Format
Each result includes full citation metadata:
{
  "chunk_id": "uuid",
  "content": "The chunk text content...",
  "document_id": "uuid",
  "document_name": "Product Guide.pdf",
  "source_type": "file",
  "source_url": "",
  "chunk_index": 5,
  "page_number": 12,
  "section_heading": "Installation",
  "chunk_metadata": {},
  "score": 0.0328
}
For YouTube sources, source_url contains the video URL and chunk metadata includes timestamps.
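As an illustration of how a consumer might use this metadata, the sketch below builds a short citation string from one result. The format_citation helper is hypothetical and not part of HyperSaaS; it only assumes the field names shown in the result format.

```python
def format_citation(result: dict) -> str:
    """Build a short human-readable citation from one search result (hypothetical helper)."""
    parts = [result["document_name"]]
    if result.get("page_number") is not None:
        parts.append(f"p. {result['page_number']}")
    if result.get("section_heading"):
        parts.append(result["section_heading"])
    if result.get("source_url"):
        # Empty for file uploads; set for URL-based sources such as YouTube.
        parts.append(result["source_url"])
    return ", ".join(parts)


example = {
    "chunk_id": "uuid",
    "content": "The chunk text content...",
    "document_id": "uuid",
    "document_name": "Product Guide.pdf",
    "source_type": "file",
    "source_url": "",
    "chunk_index": 5,
    "page_number": 12,
    "section_heading": "Installation",
    "chunk_metadata": {},
    "score": 0.0328,
}

citation = format_citation(example)
```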
Agent Tool Integration
The RAG tool wraps search_documents as a plain function, then each agent framework adds its own decorator:
# documents/rag_tool.py (framework-agnostic)
import json

def search_knowledge_base_impl(query: str, session) -> str:
    results = search_documents(
        query=query,
        session_id=str(session.id),
        workspace_id=str(session.workspace_id),
    )
    return json.dumps(results)
The agent decides when to call the search tool based on the user's question. Results are returned as context for the LLM to synthesize a response with citations.
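The wrapping pattern described in this section might look roughly like the sketch below. The tool decorator and TOOLS registry are hypothetical stand-ins for whatever a real agent framework provides, search_documents is stubbed with canned data so the example runs on its own, and the session is a simple namespace rather than a real ChatSession.

```python
import json
from types import SimpleNamespace
from typing import Callable


def search_documents(query: str, session_id: str, workspace_id: str) -> list[dict]:
    """Stub standing in for the real search function; returns canned data."""
    return [{"chunk_id": "c1", "content": f"match for {query}", "score": 0.0328}]


def search_knowledge_base_impl(query: str, session) -> str:
    """Framework-agnostic implementation, as in the snippet above."""
    results = search_documents(
        query=query,
        session_id=str(session.id),
        workspace_id=str(session.workspace_id),
    )
    return json.dumps(results)


# Hypothetical framework side: a registry plus a decorator that fills it.
TOOLS: dict[str, Callable[..., str]] = {}


def tool(func: Callable[..., str]) -> Callable[..., str]:
    TOOLS[func.__name__] = func
    return func


def make_search_tool(session) -> Callable[[str], str]:
    """Bind the session in a closure so the agent only supplies the query."""

    @tool
    def search_knowledge_base(query: str) -> str:
        """Search the knowledge bases attached to the current chat session."""
        return search_knowledge_base_impl(query, session)

    return search_knowledge_base


session = SimpleNamespace(id="s-1", workspace_id="w-1")
search_tool = make_search_tool(session)
payload = json.loads(search_tool("install steps"))
```

Keeping the implementation free of framework imports means the same function can be registered with several agent frameworks, each using its own thin wrapper.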
URL Ingest Tool
The ingest_url tool allows the agent to add new content during conversation:
def ingest_url_impl(url: str, session) -> str:
    # Auto-detects YouTube vs web_url
    # Creates Document + auto-creates session KB
    # Dispatches Celery ingestion task
    # str() keeps the ID JSON-serializable if it is a UUID.
    return json.dumps({"status": "processing", "document_id": str(doc.id)})
This enables users to say "read this article" or "watch this video" and have it ingested into the session's knowledge base in real time.
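The auto-detection step could be sketched as a hostname check. The detect_source_type helper, the YOUTUBE_HOSTS set, and the "youtube" and "web_url" return values are assumptions made for illustration; the actual detection logic in HyperSaaS is not shown in this document.

```python
from urllib.parse import urlparse

# Assumed set of hostnames treated as YouTube; a real implementation may differ.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def detect_source_type(url: str) -> str:
    """Return "youtube" for YouTube links and "web_url" for everything else."""
    host = (urlparse(url).hostname or "").lower()
    return "youtube" if host in YOUTUBE_HOSTS else "web_url"


video_type = detect_source_type("https://www.youtube.com/watch?v=abc123")
short_type = detect_source_type("https://youtu.be/abc123")
article_type = detect_source_type("https://example.com/blog/post")
```

Matching on the parsed hostname rather than on a substring avoids misclassifying pages that merely mention a YouTube URL in their path or query string.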