Beyond Chunks: Why Faceted Context Is the Future of RAG

Context with structure: counts, categories, and what lies beyond the results
We shipped a clinical decision support tool in 2023 that I was proud of. The retrieval pipeline was clean. The embeddings were fine-tuned on clinical notes. We had metadata filtering, recency weighting, a careful chunking strategy that respected document boundaries. By every retrieval metric I cared about, it was good.
It failed in the field anyway. Not catastrophically — it did not hallucinate dangerous recommendations. It failed quietly. Physicians stopped using it. When I asked why, the feedback was consistent: "It misses context." A query about a patient's antibiotic tolerance would return relevant text from the allergy section, but the physician still had to go dig through the chart manually. They felt like the tool was handing them puzzle pieces instead of a picture.
I spent a long time thinking that was a retrieval quality problem. More chunks. Better reranking. Hybrid search with BM25. I tried all of it.
The real problem was different. The agent was returning results, but it had no understanding of the information landscape it was searching.
What Chunk-Based RAG Actually Gets Wrong
The standard RAG mental model: embed the query, find similar chunks, inject the top-k into context. This works fine when the task is "find the answer." It breaks when the task is "understand enough about this patient to make a good decision."
Those are not the same task. The second one requires the agent to know not just what it found, but what it did not find — and whether the absence of something is meaningful.
Consider a physician asking about a patient's response to a specific medication. A chunk-based system might surface two notes that mention the drug. It might also silently fail to surface that there are eighteen more mentions in the chart, several flagged as adverse events, spanning four years. The agent got a result. The physician got an incomplete picture. The tool gave an answer that was technically grounded and practically dangerous.
The issue is not retrieval quality in the narrow sense. The issue is that the agent has no peripheral vision. It cannot see what it is not returning. It cannot reason about the shape of the information space it just searched.
What Agents Actually Need
In healthcare, the information landscape for any patient query is not flat. It is a structure of types: labs, medications, clinical notes, problem lists, procedures, diagnoses, allergies, social history. Each type has its own temporal dimension. Lab trends matter differently than allergy records. A problem flagged three months ago matters differently than one flagged three years ago. Medication start and stop dates define a completely different picture than medication mentions in notes.
When I think about what a competent clinician does when they open a chart, they are not reading sequentially. They are scanning the structure. How many active medications? When was the last relevant lab panel? Are there recent specialist notes? What is the temporal density of encounters in the last ninety days? That scan gives them a map before they read a word. It tells them what kind of chart this is, where to focus, and what patterns to look for.
A chunk-based RAG agent has none of this. It gets results and acts on them. It cannot tell the difference between a chart with two notes and a chart with two hundred — unless you explicitly tell it.
That is the gap faceted context fills.
Faceted Context Design: What It Is
Faceted context is borrowed from search UI design, where a faceted interface shows you categories, counts, and distributions alongside your results. You search for "antibiotics" and the sidebar tells you: 847 results, 600 from the last year, broken down by drug class, with a filter for adverse events. That sidebar is what turns a list of results into a navigable space.
The insight is that agents need the same thing. Not just the results, but the metadata aggregations, the counts, the category breakdowns, the signal about what exists beyond what was returned. You are not building a better retrieval pipeline. You are teaching the agent to have a mental model of the information landscape.
In practice, this means structuring every tool response so that it contains two distinct layers:
The results layer. What you found. The actual chunks, records, or documents that match the query. This is what every RAG system returns today.
The landscape layer. What exists around what you found. Total record counts per data type. Temporal distribution — how many records exist in the last 30 days, last 90 days, last year. Category breakdowns. Flags for high-density clusters the current query did not surface. Signal about what adjacent queries might be worth making.
Together, these give the agent peripheral vision. It can see what it found and it can see the shape of what it did not find.
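The two layers can be sketched as a typed response shape. This is illustrative, not a fixed schema; the field names are assumptions that should be adapted to your own record model:

```python
# A sketch of the two-layer tool response as Python types.
# Field names are illustrative, not a standard.
from typing import TypedDict

class ResultItem(TypedDict):
    text: str
    source: str
    date: str
    type: str

class Landscape(TypedDict):
    total_matching_records: int
    temporal_distribution: dict[str, int]   # e.g. {"last_30_days": 2, ...}
    data_types_present: dict[str, int]      # counts per record type
    result_coverage: float                  # returned / total matching
    suggested_facets: list[str]             # adjacent queries worth making

class ToolResponse(TypedDict):
    results: list[ResultItem]      # the layer every RAG system returns today
    landscape: Landscape           # the layer that gives peripheral vision
```

The point of making this a single response object, rather than a separate "stats" endpoint, is that the agent sees the landscape in the same turn it sees the results.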
A Concrete Healthcare Implementation
Here is what this looked like when we rebuilt the clinical decision support tool.
The old tool response for a medication query looked roughly like this:
```json
{
  "results": [
    { "text": "Patient tolerated amoxicillin well in 2021...", "source": "note_482" },
    { "text": "No documented penicillin allergy on file...", "source": "allergy_record_12" }
  ]
}
```
The new tool response for the same query looks like this:
```json
{
  "results": [
    { "text": "Patient tolerated amoxicillin well in 2021...", "source": "note_482", "date": "2021-03-14", "type": "clinical_note" },
    { "text": "No documented penicillin allergy on file...", "source": "allergy_record_12", "date": "2019-08-22", "type": "allergy_record" }
  ],
  "landscape": {
    "total_medication_mentions": 47,
    "adverse_event_flags": 3,
    "temporal_distribution": {
      "last_30_days": 2,
      "last_90_days": 8,
      "last_365_days": 21,
      "older": 26
    },
    "data_types_present": {
      "clinical_notes": 31,
      "allergy_records": 4,
      "pharmacy_records": 8,
      "procedure_notes": 4
    },
    "result_coverage": 0.04,
    "suggested_facets": ["adverse_events", "pharmacy_dispensing", "specialist_notes"]
  }
}
```
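Computing the landscape layer is cheap relative to retrieval, since it only needs metadata aggregations over the matching record set. Here is a minimal sketch, assuming records are dicts with "type", "date", and an optional "adverse_event" flag (a hypothetical shape, not our production schema):

```python
# Minimal sketch of landscape-layer computation over matching records.
# Record shape is an assumption: {"type": str, "date": date, "adverse_event": bool?}
from collections import Counter
from datetime import date

def build_landscape(all_records, returned_records, today=None):
    today = today or date.today()
    type_counts = Counter(r["type"] for r in all_records)
    returned_types = {r["type"] for r in returned_records}

    # Windows are cumulative, matching the example response above:
    # a record from last week counts in all three trailing windows.
    buckets = {"last_30_days": 0, "last_90_days": 0, "last_365_days": 0, "older": 0}
    for r in all_records:
        age = (today - r["date"]).days
        if age <= 30:
            buckets["last_30_days"] += 1
        if age <= 90:
            buckets["last_90_days"] += 1
        if age <= 365:
            buckets["last_365_days"] += 1
        else:
            buckets["older"] += 1

    return {
        "total_matching_records": len(all_records),
        "adverse_event_flags": sum(1 for r in all_records if r.get("adverse_event")),
        "temporal_distribution": buckets,
        "data_types_present": dict(type_counts),
        "result_coverage": round(len(returned_records) / len(all_records), 2)
                           if all_records else 1.0,
        # Facets present in the full set but absent from the returned slice.
        "suggested_facets": sorted(t for t in type_counts if t not in returned_types),
    }
```

Note that nothing here touches the embeddings or the ranker; the landscape is built from the same metadata a faceted search UI would use.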
That result_coverage field — 0.04 — tells the agent it is looking at four percent of the relevant records. The adverse_event_flags: 3 tells it something clinically significant may be outside its current view. The suggested_facets field gives it explicit navigation hints: here are the adjacent areas worth exploring before drawing a conclusion.
The agent no longer acts on what it found. It uses the landscape to decide whether it has found enough.
What Changes in Practice
The behavioral difference is significant. With chunk-based retrieval, the agent's reasoning pattern was: I found some relevant records, I will synthesize them. With faceted context, the reasoning pattern becomes: I found some relevant records, but I can see there are 45 more I have not looked at, including three adverse event flags — I should query the adverse events explicitly before concluding.
That is not a smarter model. That is the same model with better peripheral vision.
A few specific changes we made that mattered most:
Per-type temporal density signals. For labs especially, "47 results" is meaningless without knowing when they occurred. A patient with 47 lab records spread over ten years is a different clinical picture than one with 47 records in the last six months. The temporal distribution metadata changed how the agent weighted urgency.
Explicit coverage signals. Every search response now includes the percentage of matching records that were returned. Agents treat low coverage as a prompt to refine the query, not a green light to proceed.
Suggested facets as navigation hints. When the landscape layer identifies data types that are present in the full record set but not represented in the returned results, it surfaces them explicitly. The agent treats these as a checklist of things to check before committing to a response.
Absence signaling. Some absence is meaningful. If an agent is reasoning about a patient's cardiac history and the landscape layer reports zero cardiology notes in five years of records, that absence is itself clinical signal. We added explicit fields for expected-but-absent data types based on the query category.
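The absence-signaling idea reduces to a set difference between what a query category should surface and what the record set actually contains. A minimal sketch, where the category-to-expected-types mapping is entirely hypothetical:

```python
# Hypothetical mapping from query category to data types a clinician
# would expect to see; categories and type names are illustrative.
EXPECTED_TYPES = {
    "cardiac": {"cardiology_notes", "ecg_results", "echo_reports"},
    "medication": {"pharmacy_records", "allergy_records"},
}

def absent_but_expected(query_category, data_types_present):
    """Return expected-but-absent data types, so the agent can treat
    the absence itself as clinical signal."""
    expected = EXPECTED_TYPES.get(query_category, set())
    return sorted(expected - set(data_types_present))
```

A nonempty return value here is not an error; it is a field the landscape layer surfaces so the agent can say "no cardiology notes on file" instead of staying silent about it.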
The Broader Principle
What this gets at is a reframe of what RAG is actually for. The goal is not to retrieve relevant chunks. The goal is to give an agent enough understanding of an information space that it can navigate it competently.
Navigation requires maps. A map is not a list of what is at your current location. A map tells you what else is nearby, what the terrain looks like, where the edges are, and what you should explore next. Faceted context is how you build a map for an agent.
In healthcare, the cost of bad navigation is obvious — a wrong conclusion acted on by a clinician. But the principle holds anywhere the information landscape is complex and heterogeneous. Legal document review. Financial due diligence. Enterprise knowledge bases. Any domain where the gap between "the ten results I surfaced" and "the full record set" is meaningful.
Most RAG systems are optimizing retrieval. The engineers who are ahead of this are optimizing navigation. They are asking not "did we find the right chunk?" but "did we give the agent enough visibility into the information landscape to make a good decision about what to look for?"
That question leads to different architecture. Different tool response shapes. Different evaluation criteria.
And in my experience, to agents that physicians actually use.
Where to Start
If you are building a RAG system today and want to move toward faceted context, the minimum viable change is this: add a total_matching_records count and a result_coverage percentage to every tool response. That single change — the agent knowing what fraction of the relevant data it is seeing — will shift its reasoning behavior measurably.
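That minimum viable change can be a thin wrapper around whatever your tool already returns. A sketch, with field names following the examples in this post (adapt them to your schema):

```python
# Thin wrapper adding the two minimum landscape fields to an existing
# tool response. Field names follow this post's examples, not a standard.
def with_coverage(results, total_matching_records):
    coverage = (len(results) / total_matching_records) if total_matching_records else 1.0
    return {
        "results": results,
        "total_matching_records": total_matching_records,
        "result_coverage": round(coverage, 2),
    }
```

The only new requirement on the retrieval side is a count of total matching records, which most vector and keyword stores can return alongside the top-k.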
From there, add per-category breakdowns and temporal distributions. Add explicit absence signals for data types you would expect to see but are not present. Add suggested adjacent queries the agent should consider before concluding.
The underlying retrieval can stay the same. You are not rebuilding the search engine. You are building the map on top of it.
The agent will do the rest. It just needs to be able to see.
