Feb 17, 2026
PageIndex vs Vector RAG: A New Approach to Document Retrieval
I spent the last week comparing two approaches to AI retrieval. One has been the industry default for three years. The other launched in September 2025 and claims to make the first one obsolete — at least for a specific class of problems.

The short version: they solve different problems, and most teams are using the wrong one for their use case.
Here's what I found.
The Problem With "Semantic Similarity"
Vector RAG — the dominant approach for the last three years — works like this:
Take your document. Chop it into 500-token pieces. Convert each piece into a list of numbers (an embedding). When someone asks a question, convert that question into numbers too. Find the pieces with the closest numbers. Feed those to your LLM. Done.
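The five steps above fit in a few lines. This is a minimal sketch, not a production pipeline: the bag-of-words `embed` is a stand-in for a real embedding model, and the 40-word chunk size is scaled down for the toy document (real systems use ~500 tokens and an actual vector store).

```python
import math
from collections import Counter

def chunk(text, size=50):
    # Step 1-2: naive fixed-size chunking by word count.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    # Step 3: stand-in for a real embedding model -- a bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a, b):
    # Step 4: cosine similarity between two sparse vectors.
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    # Step 5: rank chunks by similarity; the top k get fed to the LLM.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

doc = "Revenue grew in Q3. " * 30 + "Expenses fell in Q4. " * 30
top = retrieve("How did revenue change?", chunk(doc, size=40))
```

Swap in a real embedding model and a vector database and this is, structurally, the whole architecture.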
It's elegant. And it fails in a specific, infuriating way.
Semantic similarity ≠ relevance.
Ask a vector RAG system "what's the Q3 revenue?" and it might return Q2 and Q4 figures, because quarterly revenue sentences all look nearly identical to an embedding model. They're close in embedding space, but they're different answers to your question. The math doesn't know the difference.
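You can see the failure in miniature. The vectors below are hand-set toy values, not real embeddings, chosen to mimic how embedding models place quarterly-revenue sentences close together regardless of which quarter they describe:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hand-set toy vectors standing in for real embeddings: all three chunks
# say "quarterly revenue", so they land in nearly the same spot.
chunks = {
    "Q2 2024 revenue was $10.1M": [0.82, 0.51, 0.10, 0.23],
    "Q3 2024 revenue was $12.4M": [0.80, 0.53, 0.12, 0.24],
    "Q4 2024 revenue was $9.7M":  [0.81, 0.52, 0.09, 0.25],
}
query_vec = [0.80, 0.52, 0.11, 0.24]  # embedding of "what's the Q3 revenue?"

for text, vec in chunks.items():
    print(f"{text}: similarity {cosine(query_vec, vec):.4f}")
```

All three similarities come out above 0.99, separated by fractions of a point. Retrieval has essentially no signal to prefer the correct quarter over the wrong ones.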
A RAG practitioner put it bluntly: "Even after optimizing the chunking + embedding + vector store pipeline, accuracy is usually below 60%."
Below 60%. For enterprise use cases. That's not a bug in your implementation. It's a structural ceiling.
What Chunking Actually Does to Your Document
The real damage isn't the retrieval step. It's what happens before it.
When you chunk a 50-page financial report into 500-token pieces, you destroy its structure. You cut through the middle of a section. You separate the table from the caption that explains it. You lose the cross-reference that says "see Appendix G." You turn a structured, hierarchical document into a bag of text fragments.
Then you ask an LLM to answer precise questions from those fragments.
No wonder it gets the wrong answer. The right answer often requires navigating across sections — from the high-level summary to the specific footnote to the referenced table. You can't do that from a fragment.
The Alternative: Navigate the Document Like a Human
In September 2025, a team at VectifyAI released PageIndex. The premise: instead of embedding and chunking, build a hierarchical tree from the document — essentially an intelligent table of contents with summaries at every level — and have an LLM reason its way through it.
Traditional RAG asks: "Find chunks with high cosine similarity to this query."
PageIndex asks: "If I need Q3 2024 revenue, I'd navigate to Financial Statements → Revenue Analysis → Q3 subsection."
No vector database. No chunking. The LLM navigates the document the same way you would: read the table of contents, identify the relevant section, drill down.
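The navigation loop can be sketched in a few lines. Everything here is illustrative rather than PageIndex's actual API: the `Node` shape, the summaries, and especially `pick`, which stands in for an LLM call ("given this query and these section summaries, which section should we open next?") with crude keyword overlap so the sketch runs on its own:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    title: str
    summary: str
    children: list = field(default_factory=list)
    text: str = ""  # leaf nodes hold the actual page content

def pick(query, nodes):
    # Stand-in for an LLM call that reasons over section summaries.
    def score(n):
        q = set(query.lower().split())
        s = set((n.title + " " + n.summary).lower().split())
        return len(q & s)
    return max(nodes, key=score)

def navigate(query, root):
    # Walk the tree top-down: table of contents -> section -> subsection.
    node, path = root, [root.title]
    while node.children:
        node = pick(query, node.children)
        path.append(node.title)
    return path, node.text

doc = Node("Annual Report 2024", "Full-year financials", [
    Node("Financial Statements", "Revenue expenses and cash flow", [
        Node("Revenue Analysis", "Revenue by quarter", [
            Node("Q3", "Q3 revenue detail", text="Q3 2024 revenue: $12.4M"),
            Node("Q4", "Q4 revenue detail", text="Q4 2024 revenue: $9.7M"),
        ]),
    ]),
    Node("Risk Factors", "Market and operational risks"),
])

path, answer = navigate("Q3 2024 revenue", doc)
```

The returned `path` is also the explanation: you can show exactly which sections the system walked through to reach the answer, which is why this approach scores well on explainability.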
The results on FinanceBench — a benchmark for precise financial document questions — were striking. PageIndex hit 98.7% accuracy versus ~50% for traditional vector RAG. That's not a marginal improvement. That's a different class of system.
Why This Works (And When It Doesn't)
PageIndex works because it preserves what chunking destroys: structure.
Financial reports, legal contracts, technical specs, PRDs — these documents are written to be navigated. They have hierarchies. They have cross-references. They have summaries that point to detail. When you chunk them, you throw all of that away. When you build a tree, you keep it.
The other thing it fixes: multi-hop reasoning. Vector RAG treats every query independently. If answering a question requires connecting information from three different sections — "what's the exception handling when the vendor is flagged AND the PO doesn't match?" — vector RAG struggles. It doesn't know what it already retrieved. PageIndex can follow the chain.
But there are real trade-offs.
Speed and cost: Vector search returns results in under a second. PageIndex makes multiple LLM calls per query to navigate the tree. That's slower and more expensive per query.
Scale: Vector RAG handles large corpora of many documents well. PageIndex is designed for deep retrieval over individual long documents — not fuzzy search across thousands of short ones.
Production readiness: PageIndex was published in September 2025. The benchmarks are impressive. The production deployments are not yet documented. It's a research direction becoming a product — not a drop-in replacement for Pinecone.
The Right Framework for Picking
It's not "PageIndex vs Vector RAG." It's: what kind of retrieval problem do you have?
| | Vector RAG | PageIndex |
|---|---|---|
| Document type | Short, many, unstructured | Long, structured, hierarchical |
| Query type | Exploratory, fuzzy | Precise, multi-hop |
| Stakes | Medium | High (finance, legal, compliance) |
| Speed requirement | Fast (sub-second) | Slower (LLM-in-the-loop) |
| Explainability needed | Low | High |
| Production maturity | Battle-tested | Early-stage |
If you're building customer support search across 10,000 knowledge base articles: vector RAG.
If you're building a system that answers CFO questions over 200-page audit reports: PageIndex is the right architecture, even if it's not yet the right product.
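The framework above can be folded into a back-of-napkin heuristic. The trait names and the precedence of the rules are my own illustrative assumptions, not part of either system:

```python
def pick_retrieval(doc_is_long_and_structured: bool,
                   queries_are_precise_or_multihop: bool,
                   needs_subsecond_latency: bool,
                   high_stakes: bool) -> str:
    # Hard latency requirements on low-stakes queries rule out
    # LLM-in-the-loop navigation.
    if needs_subsecond_latency and not high_stakes:
        return "vector_rag"
    # Long structured documents plus precise questions is the
    # sweet spot for tree-based retrieval.
    if doc_is_long_and_structured and queries_are_precise_or_multihop:
        return "pageindex"
    return "vector_rag"

# Support search over 10,000 short KB articles:
pick_retrieval(False, False, True, False)   # -> "vector_rag"
# CFO questions over a 200-page audit report:
pick_retrieval(True, True, False, True)     # -> "pageindex"
```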
The Meta-Lesson
Most AI builders I know hit the retrieval accuracy ceiling and start debugging chunks. Smaller chunks. Larger chunks. Better embeddings. Reranking. Hybrid search. Metadata filtering.
These are all optimizations on top of a flawed assumption: that semantic similarity is a reliable proxy for relevance.
Sometimes the better move is to question the assumption.
PageIndex's contribution isn't just a new tool. It's a proof that reasoning-based retrieval can dramatically outperform similarity-based retrieval for structured, precision-sensitive documents. That mental model shift is worth more than any benchmark number.
The question isn't "which RAG system should I use?"
The question is: does your retrieval problem require finding similar content — or correct content?
Those are different problems. They need different solutions.