IntelliSearch

IntelliSearch is a production-grade hybrid search system over a dataset of millions of company profiles. It combines classical BM25 keyword retrieval with dense vector search and an agentic reasoning layer — all unified behind a single POST /search/intelligent endpoint that auto-selects the right strategy per query.

The core insight is that no single retrieval method is universally best. Keyword search dominates on precise terms; embeddings dominate on meaning. An intent classifier decides which to use — or escalates to an LLM-orchestrated agent for queries that need live data or multi-step reasoning. Reciprocal Rank Fusion merges the signals without manual weight tuning.

Three execution modes, one entry point

Regular · BM25

Fast lexical match over indexed fields. Wins on exact terms, names, and domain-specific jargon that embeddings can dilute.

Semantic · kNN · 384-dim HNSW

Dense vector search over sentence-transformer embeddings. Catches synonyms, paraphrases, and conceptual intent that keyword search misses.

Agentic · LangGraph + GPT-4o

Orchestrated pipeline that issues structured sub-queries, fetches live data via external tools, and re-ranks with an LLM. Used for reasoning-heavy or time-sensitive queries.

How it works

Intent Classification

Every query passes through a GPT-4o-mini intent classifier before touching the index. The classifier decides which execution mode (regular, semantic, or agentic) fits the query, then routes accordingly. Queries it classifies with confidence go straight to the index; ambiguous ones default to semantic mode.
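The fallback rule can be sketched in a few lines of Python. This is an illustration only: the `Classification` shape, the numeric confidence score, and the 0.7 threshold are assumptions, since the text only says that ambiguous queries default to semantic mode.

```python
from dataclasses import dataclass

# The three execution modes named in this document.
MODES = ("regular", "semantic", "agentic")


@dataclass
class Classification:
    """Hypothetical classifier output: a mode label plus a confidence score."""
    mode: str
    confidence: float


def route(result: Classification, threshold: float = 0.7) -> str:
    """Return the execution mode to run; fall back to semantic when unsure.

    The threshold value is an assumption chosen for illustration.
    """
    if result.mode not in MODES:
        # Unknown label from the classifier: treat it as ambiguous.
        return "semantic"
    if result.confidence < threshold:
        # Low confidence: semantic mode is the documented default.
        return "semantic"
    return result.mode
```

With this sketch, `route(Classification("agentic", 0.4))` falls back to `"semantic"`, while a confident `"regular"` classification is passed through unchanged.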

Reciprocal Rank Fusion

BM25 and kNN return separate ranked lists. RRF merges them by position rather than raw score, which means you don't need to hand-tune weighting constants. Empirically, this outperforms any single-signal ranking on recall-at-10 for mixed query types.
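As a rough sketch of the fusion step, each document scores the sum of 1 / (k + rank) over every list it appears in. The constant k = 60 is the value commonly used in the RRF literature, not one confirmed for this system, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge ranked lists of document IDs by position, ignoring raw scores.

    k dampens the influence of top ranks; 60 is a conventional default
    and an assumption here.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = ["a", "b", "c"]   # illustrative BM25 ranking
knn_hits = ["b", "d", "a"]    # illustrative kNN ranking
fused = reciprocal_rank_fusion([bm25_hits, knn_hits])
# fused == ["b", "a", "d", "c"]
```

Documents that appear in both lists ("a" and "b") rise above those found by only one retriever, without any weighting between BM25 and kNN scores.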

Pre-warmed HNSW Index

Seven million company profiles, each carrying a 384-dimensional embedding, are indexed into an OpenSearch HNSW graph of roughly 5–7 GB. A startup hook calls the OpenSearch k-NN warmup API before the first request, so cold-start latency never reaches users.
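A minimal sketch of such a warmup call, using only the standard library. The base URL and the index name `companies` are assumptions; the `/_plugins/_knn/warmup/<index>` path is the k-NN plugin's warmup endpoint. Authentication and TLS settings are omitted.

```python
import json
import urllib.request
from typing import Iterable


def warmup_url(base_url: str, indices: Iterable[str]) -> str:
    """Build the k-NN warmup URL for one or more index names."""
    return f"{base_url.rstrip('/')}/_plugins/_knn/warmup/{','.join(indices)}"


def warm_knn_indices(base_url: str, indices: Iterable[str], timeout: float = 300.0) -> dict:
    """Ask OpenSearch to load the HNSW graphs into memory and return its JSON reply.

    Intended to run once at application startup, before traffic arrives.
    The generous timeout is an assumption: loading a multi-GB graph can be slow.
    """
    with urllib.request.urlopen(warmup_url(base_url, indices), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

In a FastAPI application, a function like `warm_knn_indices("http://localhost:9200", ["companies"])` could be called from the startup (lifespan) handler so the graph is resident before the first search request.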

Streaming Results via SSE

The agentic mode streams intermediate progress — classification, embedding, vector search, tool calls — as Server-Sent Events. The UI renders a live thinking panel so users see what the system is doing instead of staring at a spinner.
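The wire format is simple enough to sketch: each Server-Sent Event is an `event:` line, a `data:` line, and a blank line. The event names and payload fields below are hypothetical; only the stage list comes from the description above.

```python
import json
from typing import Iterator


def sse_event(event: str, data: dict) -> str:
    """Serialize one Server-Sent Event: an event name, a JSON payload, a blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def progress_stream(query: str) -> Iterator[str]:
    """Yield one progress event per pipeline stage, then a final 'done' event.

    A real pipeline would yield each event as the stage completes;
    here the stages are emitted back to back for illustration.
    """
    for stage in ("classification", "embedding", "vector_search", "tool_call"):
        yield sse_event("progress", {"stage": stage, "query": query})
    yield sse_event("done", {"query": query})
```

In FastAPI, a generator like this can be returned through `StreamingResponse(progress_stream(q), media_type="text/event-stream")`, and the browser can consume it with `EventSource` or a streaming `fetch`.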

Cascading Filters

Country → state → city facets load progressively from OpenSearch keyword aggregations. Industry, company size, and founded-year ranges layer on top. All filters are hard constraints sent alongside the query; the classifier cannot override them.
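One way to express this in the OpenSearch query DSL is to put every filter in the `filter` clause of a `bool` query, where it restricts results without affecting scoring, and to load the next facet level with a `terms` aggregation. The field names (`country`, `state`, `city`, `industry`, `founded_year`, `description`) are assumptions about the index mapping, not taken from the source.

```python
from typing import Any, Dict, List


def build_search_body(text: str, filters: Dict[str, Any], size: int = 10) -> Dict[str, Any]:
    """Build a search body in which every filter is a hard constraint."""
    clauses: List[Dict[str, Any]] = []
    for field in ("country", "state", "city", "industry"):
        if field in filters:
            # Exact match on a keyword field (hypothetical field names).
            clauses.append({"term": {field: filters[field]}})
    if "founded_year" in filters:
        low, high = filters["founded_year"]
        clauses.append({"range": {"founded_year": {"gte": low, "lte": high}}})
    return {
        "size": size,
        "query": {
            "bool": {
                "must": [{"match": {"description": text}}],
                "filter": clauses,  # filter context: no effect on relevance scores
            }
        },
    }


def state_facet_body(country: str, max_buckets: int = 100) -> Dict[str, Any]:
    """Build the aggregation that lists states once a country is selected."""
    return {
        "size": 0,  # buckets only, no hits
        "query": {"bool": {"filter": [{"term": {"country": country}}]}},
        "aggs": {"states": {"terms": {"field": "state", "size": max_buckets}}},
    }
```

Because the filters are attached by the API layer after classification, whichever mode the classifier picks runs inside the same constraints.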

Data Ingestion Pipeline

A standalone Python pipeline reads raw company data in chunks, cleans and enriches each record, generates dense embeddings in batch, then bulk-indexes into OpenSearch with parallel workers. Fully re-runnable — re-index by re-running the pipeline.
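The shape of such a pipeline can be sketched as a generator of bulk actions. Everything here is illustrative: the record fields (`id`, `description`), the index name, the batch size, and the cleaning step are assumptions, and `embed_batch` is a placeholder that returns zero vectors where the real pipeline would call the embedding model.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def chunks(records: Iterable[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield successive lists of at most `size` records."""
    iterator = iter(records)
    while batch := list(islice(iterator, size)):
        yield batch


def clean(record: Dict[str, Any]) -> Dict[str, Any]:
    """Minimal cleaning step: strip whitespace from string fields."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def embed_batch(texts: List[str]) -> List[List[float]]:
    """Placeholder embedder: returns 384-dim zero vectors, not real embeddings."""
    return [[0.0] * 384 for _ in texts]


def bulk_actions(records: Iterable[Dict[str, Any]], index: str = "companies",
                 batch_size: int = 1000) -> Iterator[Dict[str, Any]]:
    """Turn raw records into bulk-index actions, one batch at a time."""
    for batch in chunks(records, batch_size):
        cleaned = [clean(r) for r in batch]
        vectors = embed_batch([r["description"] for r in cleaned])
        for record, vector in zip(cleaned, vectors):
            # A stable _id means a re-run overwrites documents instead of duplicating them.
            yield {"_index": index, "_id": record["id"],
                   "_source": {**record, "embedding": vector}}
```

Actions in this `_index` / `_id` / `_source` form can be fed to the bulk helpers in the `opensearch-py` client, such as `helpers.parallel_bulk`, to index with parallel workers.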

Stack

FastAPI · OpenSearch (kNN + BM25) · Redis (facet cache) · Tavily (web + LinkedIn tools) · SSE streaming · GPT-4o-mini (classifier) · GPT-4o (agentic extractor) · all-MiniLM-L6-v2 (embeddings)
