
Hybrid Retrieval

Semantic + keyword search with Reciprocal Rank Fusion.

HyperSaaS uses a hybrid retrieval strategy combining vector similarity search and PostgreSQL full-text search, fused with Reciprocal Rank Fusion (RRF).

Search Function

def search_documents(
    query: str,
    session_id: str,
    workspace_id: str,
    top_k: int = 5,
    semantic_candidates: int = 20,
    keyword_candidates: int = 20,
) -> list[dict]:

Called by the search_knowledge_base agent tool during chat conversations.

Retrieval Flow

User query


1. Resolve active documents
    │  - Get KBs attached to ChatSession
    │  - Get Documents in those KBs with status="ready"
    │  - Scope to workspace

    ├──────────────────────────┐
    │                          │
    ▼                          ▼
2a. Semantic Search        2b. Keyword Search
    (pgvector cosine)          (PostgreSQL FTS)
    → top 20 candidates        → top 20 candidates
    │                          │
    └──────────┬───────────────┘


3. Reciprocal Rank Fusion
    → top 5 results


4. Return results with citations

Semantic Search

Embeds the query and finds the closest document chunks using pgvector's cosine distance:

from pgvector.django import CosineDistance

query_embedding = embeddings.embed_query(query)

chunks = (
    DocumentChunk.objects
    .filter(document_id__in=document_ids)
    .annotate(distance=CosineDistance("embedding", query_embedding))
    .order_by("distance")
    [:semantic_candidates]
)

Distance is converted to similarity: score = 1.0 - distance.
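The conversion holds because pgvector's cosine distance is defined as 1 minus cosine similarity, so subtracting the distance from 1.0 recovers the similarity. A standalone sketch in plain Python (no pgvector required):

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance as pgvector computes it: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# Identical vectors: distance 0.0, so score = 1.0 - 0.0 = 1.0
distance = cosine_distance([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
score = 1.0 - distance
```

Identical vectors yield similarity 1.0, orthogonal vectors yield 0.0.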

The HNSW index (m=16, ef_construction=64, cosine ops) enables approximate nearest neighbor search — fast even with millions of chunks.
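For reference, the DDL equivalent of such an index looks roughly like this (the table and index names here are assumptions; the actual names come from the Django migration):

```sql
-- Hypothetical DDL for the HNSW index described above
CREATE INDEX documentchunk_embedding_hnsw
    ON documents_documentchunk
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
```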

Keyword Search

Uses PostgreSQL's built-in full-text search with websearch mode:

from django.contrib.postgres.search import SearchQuery, SearchVector, SearchRank

search_query = SearchQuery(query, search_type="websearch")
search_vector = SearchVector("content")

chunks = (
    DocumentChunk.objects
    .filter(document_id__in=document_ids)
    .annotate(rank=SearchRank(search_vector, search_query))
    .filter(rank__gt=0.0)
    .order_by("-rank")
    [:keyword_candidates]
)

The websearch search type accepts natural-language queries: terms are ANDed implicitly, "or" between terms becomes OR, a leading - excludes a term, and quoted text matches as a phrase. For example, install guide -windows matches chunks containing both "install" and "guide" but not "windows".

Reciprocal Rank Fusion

RRF merges the two ranked lists without a learned fusion model:

def _reciprocal_rank_fusion(
    semantic_results: list[dict],
    keyword_results: list[dict],
    k: int = 60,
    top_k: int = 10,
) -> list[dict]:
    scores: dict[str, float] = {}
    results_by_id: dict[str, dict] = {}
    for rank, result in enumerate(semantic_results):
        chunk_id = result["chunk_id"]
        results_by_id[chunk_id] = result
        scores[chunk_id] = scores.get(chunk_id, 0.0) + 1 / (k + rank + 1)
    for rank, result in enumerate(keyword_results):
        chunk_id = result["chunk_id"]
        results_by_id.setdefault(chunk_id, result)
        scores[chunk_id] = scores.get(chunk_id, 0.0) + 1 / (k + rank + 1)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [results_by_id[chunk_id] for chunk_id in ranked[:top_k]]

The constant k=60 (the value recommended in the original RRF paper) dampens the gap between adjacent ranks, so a single top placement in one list cannot dominate the fused score. Chunks appearing in both lists accumulate a contribution from each and therefore rank higher.

Example scores:

Chunk   Semantic Rank   Keyword Rank   RRF Score
A       #0              #0             1/61 + 1/61 = 0.0328
B       #2              —              1/63 = 0.0159
C       —               #1             1/62 = 0.0161
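The table values can be reproduced with a standalone sketch (plain Python, independent of the production code):

```python
def rrf_score(ranks: list[int], k: int = 60) -> float:
    """Sum the reciprocal-rank contributions for one chunk.

    `ranks` holds the chunk's zero-based rank in each list it appears in.
    """
    return sum(1 / (k + rank + 1) for rank in ranks)

print(round(rrf_score([0, 0]), 4))  # chunk A: 0.0328
print(round(rrf_score([2]), 4))     # chunk B: 0.0159
print(round(rrf_score([1]), 4))     # chunk C: 0.0161
```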

Result Format

Each result includes full citation metadata:

{
  "chunk_id": "uuid",
  "content": "The chunk text content...",
  "document_id": "uuid",
  "document_name": "Product Guide.pdf",
  "source_type": "file",
  "source_url": "",
  "chunk_index": 5,
  "page_number": 12,
  "section_heading": "Installation",
  "chunk_metadata": {},
  "score": 0.0328
}

For YouTube sources, source_url contains the video URL and chunk metadata includes timestamps.

Agent Tool Integration

The RAG tool wraps search_documents as a plain function, then each agent framework adds its own decorator:

# documents/rag_tool.py — framework-agnostic
def search_knowledge_base_impl(query: str, session) -> str:
    results = search_documents(
        query=query,
        session_id=str(session.id),
        workspace_id=str(session.workspace_id),
    )
    return json.dumps(results)

The agent decides when to call the search tool based on the user's question. Results are returned as context for the LLM to synthesize a response with citations.
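A minimal sketch of the binding pattern, with a stubbed session and search function standing in for the real objects (all names below are hypothetical, for illustration only):

```python
import json

# Hypothetical stand-ins for the real ChatSession and search_documents.
class FakeSession:
    id = "sess-1"
    workspace_id = "ws-1"

def fake_search_documents(query: str, session_id: str, workspace_id: str) -> list[dict]:
    # The real search_documents queries Postgres; this stub returns canned results.
    return [{"chunk_id": "abc", "content": "...", "score": 0.03}]

def make_search_tool(session, search_fn=fake_search_documents):
    """Bind a session into the framework-agnostic impl; each agent
    framework would wrap the returned callable with its own tool decorator."""
    def search_knowledge_base(query: str) -> str:
        results = search_fn(
            query=query,
            session_id=str(session.id),
            workspace_id=str(session.workspace_id),
        )
        return json.dumps(results)
    return search_knowledge_base

tool = make_search_tool(FakeSession())
```

Closing over the session keeps the tool signature down to the single `query` argument the LLM controls.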

URL Ingest Tool

The ingest_url tool allows the agent to add new content during conversation:

def ingest_url_impl(url: str, session) -> str:
    # Auto-detects YouTube vs web_url
    # Creates Document + auto-creates session KB
    # Dispatches Celery ingestion task
    return json.dumps({"status": "processing", "document_id": str(doc.id)})

This enables users to say "read this article" or "watch this video" and have it ingested into the session's knowledge base in real time.
