Why Your Search Returns Nothing — And How MongoDB Vector Search Fixes It
Keyword search can only find what's literally there. When users search for 'laptop bag' and your documents say 'notebook carrying case,' regex won't help. Vector search understands meaning — and MongoDB Atlas supports it natively.
The hidden results problem
Your search works. Or at least, it appears to. Users type a query, results come back, nobody complains too loudly. But there's a class of failure that's almost invisible: the results that should have appeared but didn't.
Consider a product catalog for an online store. A user types "laptop bag" into the search bar. The system returns the two products whose title or description contains the phrase "laptop bag." Seems fine.
But there are a dozen other relevant products in the catalog. A "notebook carrying case," a "padded sleeve for 15-inch computers," a "tech commuter backpack with device compartment." None of them contain the literal string "laptop bag." So none of them appear.
The user sees two results and assumes that's all you carry. The system silently hid the most relevant products because the search mechanism is structurally incapable of understanding what the user meant.
How keyword search actually works
Most internal search implementations I've encountered use some form of regex or substring matching. The query string is scanned against a set of indexed fields — productName, description, category, specifications — and any document containing that exact character sequence is returned.
```javascript
db.products.find({
  $or: [
    { productName: { $regex: "laptop bag", $options: "i" } },
    { description: { $regex: "laptop bag", $options: "i" } },
    { category: { $regex: "laptop bag", $options: "i" } },
    { specifications: { $regex: "laptop bag", $options: "i" } },
  ]
})
```
This works when the user's language exactly matches the document's language. Searching "laptop" finds documents that say "laptop." Searching "bag" also works because it's a substring of "bags."
But the approach is purely lexical. It has no concept of meaning.
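The gap is easy to reproduce. A minimal sketch of the same substring logic in Python, using invented product descriptions:

```python
import re

# Hypothetical catalog entries -- invented for illustration
descriptions = [
    "Padded notebook carrying case for 15-inch computers",
    "Tech commuter backpack with device compartment",
    "Canvas laptop bag with shoulder strap",
]

def keyword_match(query: str, text: str) -> bool:
    """Case-insensitive substring match, like the $regex query above."""
    return re.search(re.escape(query), text, re.IGNORECASE) is not None

hits = [d for d in descriptions if keyword_match("laptop bag", d)]
# Only the product that literally says "laptop bag" matches; the
# semantically identical carrying case and backpack are invisible.
```

Three relevant products, one result returned.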
Where it breaks
The failure modes are systematic, not edge cases:
| User searches for | Expects to find | Why keyword search fails |
|---|---|---|
| "laptop bag" | Notebook sleeves, tech backpacks | The product says "carrying case," not "bag" |
| "winter jacket" | Parkas, puffer coats, insulated shells | The product says "thermal outerwear" |
| "kids tablet" | Educational devices, learning pads | The product says "children's interactive screen" |
| "gift for a runner" | Running shoes, fitness trackers, hydration gear | No field contains the concept of "gift for a runner" |
| "something for a road trip" | Coolers, car chargers, travel pillows | Conceptual queries have no literal match |
No amount of field indexing can anticipate every way a user might express their intent. The limitation isn't in the implementation — it's in the paradigm.
The denormalization band-aid
One common reaction is to denormalize: pull related data from other collections into the searchable document. Say your catalog has a products collection with basic metadata, but the rich keyword-friendly descriptions live in a separate productDetails collection linked by SKU.
```javascript
// Before: lean product document with references
{
  "_id": "prod_2241",
  "productName": "TechShield Commuter Pack",
  "brand": "TechShield",
  "skus": ["TS-441", "TS-442", "TS-443"]
}

// After: enriched with detail metadata
{
  "_id": "prod_2241",
  "productName": "TechShield Commuter Pack",
  "brand": "TechShield",
  "skus": ["TS-441", "TS-442", "TS-443"],
  "variantNames": [
    "TechShield Padded Laptop Bag 15-inch, Black",
    "TechShield Padded Laptop Bag 15-inch, Navy",
    "TechShield Padded Laptop Sleeve 13-inch, Gray"
  ]
}
```
Now a search for "laptop bag" will match this product because the string appears in variantNames. This works as a tactical fix. But it introduces a trade-off: every product document must be updated whenever variant data changes, the redundancy must be maintained over time, and you're still playing catch-up with user vocabulary.
A user who searches for "backpack for my MacBook" still won't match "Padded Laptop Bag" unless you keep expanding the denormalized fields. You're patching a fundamentally lexical system one synonym at a time.
Vector search: matching by meaning
Vector search takes a completely different approach. Instead of comparing character sequences, it compares meaning.
The core idea: convert text into high-dimensional numerical representations called embeddings. These are generated by machine-learning models (Voyage AI, OpenAI's text-embedding-3-small, open-source models like nomic-embed-text) trained on massive text corpora. The models learn semantic relationships between words and concepts.
In embedding space:
- Words with similar meanings cluster close together (small vector distance)
- Words with different meanings are far apart (large vector distance)
```
"laptop bag"      → [0.021, -0.187, 0.443, 0.078, ..., 0.312]   (768 dimensions)
"notebook sleeve" → [0.019, -0.174, 0.451, 0.065, ..., 0.298]   (nearby)
"refrigerator"    → [-0.342, 0.501, -0.113, 0.227, ..., -0.089] (distant)
```
When a user searches for "laptop bag," the query is converted into an embedding and compared against the pre-computed embeddings of all documents. Results are ranked by cosine similarity. The "notebook carrying case" appears — not because of a string match, but because the model understands that carrying cases and bags for laptops inhabit the same semantic neighborhood.
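Cosine similarity is just the dot product of the two vectors divided by the product of their magnitudes. A toy illustration in three dimensions (real embeddings have hundreds; these values are invented, not model output):

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b, normalized by their magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-d "embeddings" -- invented values for illustration only
laptop_bag      = [0.8, 0.1, 0.1]
notebook_sleeve = [0.7, 0.2, 0.1]
refrigerator    = [0.1, 0.1, 0.9]

sim_related   = cosine_similarity(laptop_bag, notebook_sleeve)
sim_unrelated = cosine_similarity(laptop_bag, refrigerator)
# Semantically close concepts score near 1; unrelated ones score much lower.
```

Ranking candidates by this score is, conceptually, all a vector search engine does; the engineering lies in doing it efficiently over millions of high-dimensional vectors.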
MongoDB Atlas Vector Search: implementation
MongoDB Atlas supports vector search natively. No separate search infrastructure, no Elasticsearch sidecar, no data synchronization pipeline. It runs on your existing cluster.
Step 1: Generate embeddings
For each document, concatenate the semantically meaningful fields and pass them through an embedding model:
```python
def build_embedding_text(product):
    parts = [
        product.get("productName", ""),
        product.get("brand", ""),
        product.get("description", ""),
        product.get("category", ""),
        product.get("specifications", ""),
    ]
    return " | ".join(part for part in parts if part)
```
For the commuter pack, this produces:
```
"TechShield Commuter Pack | TechShield | Durable backpack with padded
device compartment and organizer pockets | Bags & Accessories | Water-
resistant nylon, fits up to 15-inch devices"
```
The resulting embedding captures the concept — "a bag for carrying tech devices, backpack form factor, protective padding." Store it as a new field on the document:
```javascript
{
  "_id": "prod_2241",
  "productName": "TechShield Commuter Pack",
  "embedding": [0.019, -0.174, 0.451, 0.065, "...", 0.298]
}
```
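The write path can be sketched as one function that builds the source text and attaches the vector, independent of which model you call. `attach_embedding` and `fake_embed` are illustrative names, not library APIs; in production, `embed_fn` would wrap a real embedding client (Voyage AI, OpenAI, or a local model):

```python
def build_embedding_text(product):
    parts = [
        product.get("productName", ""),
        product.get("brand", ""),
        product.get("description", ""),
        product.get("category", ""),
        product.get("specifications", ""),
    ]
    return " | ".join(part for part in parts if part)

def attach_embedding(product, embed_fn):
    """Return a copy of the document with an 'embedding' field added."""
    return {**product, "embedding": embed_fn(build_embedding_text(product))}

# Stub model for demonstration -- replace with a real embedding API call
def fake_embed(text):
    return [0.0] * 768  # must match the index's numDimensions

doc = attach_embedding(
    {"_id": "prod_2241", "productName": "TechShield Commuter Pack"},
    fake_embed,
)
```

Keeping the embedding call behind a function parameter also makes it easy to swap models later, though a model swap means re-embedding every document.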
Step 2: Create a vector search index
Define the index in MongoDB Atlas:
```json
{
  "type": "vectorSearch",
  "fields": [
    {
      "path": "embedding",
      "type": "vector",
      "numDimensions": 768,
      "similarity": "cosine"
    }
  ]
}
```
The numDimensions must match your embedding model's output size. Cosine similarity is the standard choice for text embeddings.
Step 3: Query with $vectorSearch
At search time, embed the user's query with the same model and pass it to the $vectorSearch aggregation stage:
```javascript
db.products.aggregate([
  {
    $vectorSearch: {
      index: "product_vector_index",
      path: "embedding",
      queryVector: embedQuery("laptop bag"),
      numCandidates: 100,
      limit: 20
    }
  },
  {
    $project: {
      productName: 1,
      brand: 1,
      category: 1,
      score: { $meta: "vectorSearchScore" }
    }
  }
])
```
A search for "laptop bag" now returns:
| Rank | Product | Score |
|---|---|---|
| 1 | TechShield Commuter Pack | 0.92 |
| 2 | SlimGuard Notebook Sleeve 15" | 0.89 |
| 3 | UrbanGear Padded Carrying Case | 0.86 |
| 4 | ProTravel Tech Backpack | 0.81 |
The TechShield Commuter Pack ranks first even though its listing never contains the phrase "laptop bag": the embedding places a "commuter pack with padded device compartment" squarely in the semantic neighborhood of the query.
Why this is fundamentally better
| Dimension | Keyword / Regex Search | Vector Search |
|---|---|---|
| Matching mechanism | Exact substring match | Semantic similarity |
| Handles synonyms | No ("bag" ≠ "case" ≠ "sleeve") | Yes (understands equivalence) |
| Handles paraphrasing | No | Yes ("something to carry my laptop in" → bags) |
| Requires denormalization | Yes — must copy data into searchable fields | No — meaning is captured in the embedding |
| Maintenance burden | High — keep redundant fields in sync | Low — re-embed only when source text changes |
| Typo tolerance | No ("laptpo bag" fails) | Partial (embeddings are robust to minor variations) |
| Conceptual queries | Impossible | Yes ("gear for tech commuters" surfaces relevant products) |
| Ranking quality | Binary (match or no match) | Continuous relevance score |
The most significant advantage is the last one. Keyword search is binary — either a document contains the string or it doesn't. Vector search produces a relevance score, which means results can be ranked by how closely they match the user's intent.
Hybrid search: the pragmatic choice
Pure vector search has one weakness: exact matches. If a user types the precise product name — "TechShield Commuter Pack 15-inch Black" — keyword search will nail it immediately, while vector search might rank it highly but not necessarily first.
MongoDB Atlas supports hybrid search — combining full-text search scores with vector similarity scores using Reciprocal Rank Fusion (RRF):
```javascript
db.products.aggregate([
  {
    $vectorSearch: {
      index: "product_vector_index",
      path: "embedding",
      queryVector: embedQuery("laptop bag"),
      numCandidates: 100,
      limit: 50
    }
  },
  {
    $unionWith: {
      coll: "products",
      pipeline: [
        {
          $search: {
            index: "product_text_index",
            text: {
              query: "laptop bag",
              path: ["productName", "brand", "description", "category"]
            }
          }
        },
        { $limit: 50 }
      ]
    }
  }
  // Reciprocal Rank Fusion to merge and re-rank results
])
```
This gives you the best of both worlds:
- Exact product name searches are handled crisply by keyword matching
- Exploratory or conceptual queries ("something waterproof for hiking with my laptop") are handled by vector similarity
- Both signals are fused into a single ranked result set
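Reciprocal Rank Fusion itself is simple: each document earns 1 / (k + rank) from every result list it appears in, and the contributions are summed (k is a smoothing constant, commonly 60). A standalone sketch of the merge step, assuming each pipeline returns an ordered list of document IDs (the IDs below are invented):

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Merge ranked ID lists; each appearance contributes 1 / (k + rank)."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

vector_hits  = ["prod_2241", "prod_1187", "prod_0042"]  # semantic ranking
keyword_hits = ["prod_2241", "prod_1187"]               # exact-match ranking

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
# Documents found by both pipelines outrank those found by only one.
```

Because only ranks are used, RRF needs no score normalization between the two pipelines, whose raw scores live on incompatible scales.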
Enriching embeddings for better results
While vector search solves the hidden results problem without denormalization, you can further improve quality by including related data in the embedding source text. This is a lighter-weight cousin of the denormalization approach — instead of restructuring documents for keyword scanning, you append context to the text that gets embedded:
```python
def build_enriched_embedding_text(product, variant_names):
    base = build_embedding_text(product)
    variants = " | ".join(variant_names)
    return f"{base} | Variants: {variants}"
```
This gives the embedding model richer context, strengthening the semantic signal for terms that appear in variant details but not in the main product listing. The document structure remains unchanged — only the embedding benefits.
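A quick check of what the model actually sees, re-stating the earlier helper so the snippet stands alone (product values invented for illustration):

```python
def build_embedding_text(product):
    parts = [
        product.get("productName", ""),
        product.get("brand", ""),
        product.get("description", ""),
    ]
    return " | ".join(part for part in parts if part)

def build_enriched_embedding_text(product, variant_names):
    base = build_embedding_text(product)
    variants = " | ".join(variant_names)
    return f"{base} | Variants: {variants}"

text = build_enriched_embedding_text(
    {"productName": "TechShield Commuter Pack", "brand": "TechShield"},
    ["TechShield Padded Laptop Bag 15-inch, Black"],
)
# The phrase "Laptop Bag" now reaches the embedding model even though
# it appears nowhere in the product document itself.
```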
When to adopt this
If your search is backed by regex or basic $text queries in MongoDB, the path forward is clear:
- Immediate: Audit your highest-traffic search queries. Identify which ones return fewer results than they should. This quantifies the hidden results problem in your system.
- Short-term: If vector search adoption needs time, consider targeted denormalization for the worst-performing queries. This buys time without architectural change.
- Medium-term: Implement MongoDB Atlas Vector Search. Generate embeddings from your document metadata, create the index, and validate with A/B testing against the current search.
- Long-term: Adopt hybrid search combining keyword and vector signals. Extend to additional surfaces — product discovery, recommendations, conversational search.
The result is a search experience where users find what they're looking for, even when they don't use the exact words that appear in your data. That's not a nice-to-have. For any system where search drives engagement or revenue, it's the difference between a product that feels smart and one that feels broken.
This article is part of a series on databases and data infrastructure.
Related Posts

- Building a Production Embedding Pipeline with MongoDB Atlas and Voyage AI: Generating embeddings is the easy part. Keeping them in sync as your data changes — at scale, without downtime — is where the real engineering lives. Here's how to build the full pipeline on MongoDB Atlas.
- Database Patterns You Should Know Before Choosing Your Next Database: The choice between Postgres and MongoDB isn't about which is 'better.' It's about understanding the access patterns, consistency requirements, and operational constraints of your system.
- Building for Scale: Architecture Patterns That Actually Work: Most scaling advice is generic. Here are the patterns that have consistently worked across real systems handling millions of requests — and the ones that sound good but fail in practice.