Been playing around with Gemini 1.5 Pro, which packs a 2M token context window. At first, I was like, “Who needs RAG or vector databases when I can just throw everything in there?” Yeah… turns out I was very wrong.
✅ Fed it highly relevant info (200k-400k tokens) → Amazing results
❌ Mixed in random irrelevant stuff (roughly a 50/50 split with the relevant info) → Quality tanked
It’s like trying to have a convo in a noisy room—sure, you can hear everything, but filtering out the noise is a whole other challenge.
Just because an LLM can handle 2M tokens doesn’t mean you should dump everything in. Focused, high-quality context always beats sheer quantity.
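For reference, the "throw everything in" approach is basically a one-liner with the google-generativeai SDK. Minimal sketch only; the file paths and the question are placeholders, not my actual setup:

```python
# Naive "dump everything in" approach: no filtering, no retrieval.
# Assumes the google-generativeai SDK; paths/question are illustrative.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")

# Concatenate every document into one giant blob.
docs = [open(p, encoding="utf-8").read() for p in ["doc1.txt", "doc2.txt"]]
context = "\n\n".join(docs)

# Sanity check: how much of the 2M-token window are we actually using?
print(model.count_tokens(context).total_tokens)

response = model.generate_content(
    "Answer using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    "Question: What does the design doc say about caching?"
)
print(response.text)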
When I can fit all the relevant info into the large context window, the results seem better than with smaller windows (e.g., Claude at 100k, GPT-4o at 128k).
However, when the context gets messy or irrelevant, RAG with a good chunking strategy works better and leads to noticeably fewer hallucinations.
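By "RAG with a good chunking strategy" I mean something like this: split the docs into overlapping chunks, embed them, and only send the top-k most similar chunks to the model. Rough sketch with sentence-transformers; chunk size, overlap, model choice, and k are illustrative, not tuned values:

```python
# Minimal RAG sketch: overlapping chunks + embedding similarity retrieval.
import numpy as np
from sentence_transformers import SentenceTransformer

def chunk(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    # Fixed-size character chunks with overlap so facts aren't cut in half.
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

embedder = SentenceTransformer("all-MiniLM-L6-v2")

corpus = open("docs.txt", encoding="utf-8").read()  # illustrative path
chunks = chunk(corpus)

# Normalized embeddings make dot product equal cosine similarity.
chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

query = "What does the report conclude about latency?"  # illustrative
query_vec = embedder.encode([query], normalize_embeddings=True)[0]

# Keep only the 5 most similar chunks as context.
top_k = np.argsort(chunk_vecs @ query_vec)[::-1][:5]
context = "\n\n".join(chunks[i] for i in top_k)
```

The point: the model only ever sees a few thousand focused tokens instead of the whole noisy pile.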
Sometimes less (but more relevant) is more!