We ❤️ Open Source
A community education resource
The unstructured data challenge: Why context is key for scaling AI agents
How to engineer context for AI agents that actually work by turning chaotic data into clarity.
As enterprises race to adopt agentic AI across their business, demand for context engineering will surge: the relevance and reliability of AI agents hinge on their ability to access and use the right data, structured and unstructured, at the right time. In agentic AI, unstructured data is messy but an equal partner to relational databases and data warehouses. Whether it's Slack messages, PDF contracts, issue tracker notes, system logs, or the notes column of the sales team's CRM, unstructured data holds critical insights, but it is inherently difficult for AI to interpret without deliberate preparation.
Developers and engineers face overwhelming volumes of such data: a single Kubernetes cluster can generate 30 to 50 gigabytes of logs daily. Hunting for meaningful signals in that noise leaves human operators drowning and AI systems blind alike. Historically, many companies have resorted to costly, brittle pipelines, dropping valuable data or "logging and forgetting" rather than harnessing it effectively.
The consequence? Many AI projects fail not because the models lack intelligence, but because they lack relevant context for their enterprise. Without precise access to the right pieces of data, AI agents produce inaccurate, incomplete, or inconsistent outputs.
Why context engineering is the solution
An ideal agentic AI platform must be engineered to overcome the shortcomings that plague most enterprise AI deployments: hallucinations, context confusion, and stale data. These issues arise when agents operate without sufficient context, relying on generic or outdated knowledge rather than domain-specific, up-to-date insights.
The solution lies in a set of practices collectively known as context engineering, which tailors LLMs to the unique needs of each organization. Key components include:
- Retrieval-augmented generation (RAG): RAG enables agents to pull in relevant, real-time information from enterprise knowledge bases, documents, and databases, grounding their responses in accurate, up-to-date context. This mitigates the risk of hallucinations and ensures that agents can reason over domain-specific data.
- Prompt refinement: By refining prompts and instructions, context engineering ensures that agents understand the nuances of each task, reducing ambiguity and improving the quality of outputs.
- Memory management: Effective memory management allows agents to maintain context across interactions, remembering past conversations and user preferences to deliver more personalized and coherent responses.
- Better binary quantization (BBQ): BBQ compresses vectors and clusters them into compact partitions for selective disk reads. This reduces memory usage, avoids spikes in data retrieval time, and improves system performance for data ingestion and organization.
These practices enable AI agents to retrieve data from unstructured sources quickly and accurately, while ensuring that data is governed and orchestrated in accordance with organizational policies.
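The core trick behind binary quantization can be illustrated with a toy sketch. This is not Elastic's BBQ implementation (which adds clustering and disk-aware layout); it only shows the underlying idea: keep one sign bit per vector dimension, pack the bits into bytes for a 32x size reduction over float32, and rank candidates by Hamming distance.

```python
import numpy as np

def binary_quantize(vectors: np.ndarray) -> np.ndarray:
    """Quantize float vectors to 1 bit per dimension (the sign of each
    component), packed into bytes: 128 floats become 16 bytes."""
    bits = (vectors > 0).astype(np.uint8)
    return np.packbits(bits, axis=1)

def hamming_search(query: np.ndarray, index: np.ndarray, k: int = 3) -> np.ndarray:
    """Rank stored codes by Hamming distance to the quantized query."""
    q = binary_quantize(query.reshape(1, -1))
    # XOR the packed bytes, then count differing bits per candidate.
    dists = np.unpackbits(np.bitwise_xor(index, q), axis=1).sum(axis=1)
    return np.argsort(dists)[:k]

rng = np.random.default_rng(0)
embeddings = rng.standard_normal((1000, 128)).astype(np.float32)
index = binary_quantize(embeddings)   # 16 bytes per vector instead of 512
neighbors = hamming_search(embeddings[42], index, k=3)
# Vector 42's code matches its own query exactly, so it ranks first.
assert neighbors[0] == 42
```

Production systems typically use codes like these as a cheap first-pass filter and then rescore the top candidates with the original float vectors to recover accuracy.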
At Elastic, we’ve built our platform to address the shortcomings enterprises face with agentic AI by embedding context engineering at its core.
Streams: Proactively use logs to problem solve
For example, Elastic recently announced Streams, an agentic AI-powered solution that rethinks how teams work with logs to enable much faster incident investigation and resolution.
From raw, voluminous, messy data, Streams automatically creates structure, putting logs into a usable form, alerting you to issues, and helping you remediate them.
Streams uses AI to automatically partition and parse raw logs to extract relevant fields, greatly reducing the effort required of Site Reliability Engineers (SREs) to make logs usable. It uncovers significant events such as critical errors and anomalies from context-rich logs, giving SREs early warnings and a clear understanding of their workloads, enabling them to investigate and resolve issues faster.
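The field-extraction step can be sketched in a few lines. The pattern, log shape, and service name below are hypothetical, and Streams infers such patterns automatically rather than requiring them up front; this only shows what "parsing raw logs into relevant fields" means in practice.

```python
import re

# Assumed log shape: "<timestamp> <LEVEL> <service>: <message>"
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<service>[\w-]+): (?P<message>.*)"
)

def parse_line(line: str) -> dict:
    """Extract structured fields from a raw log line, falling back to
    the raw text when the line doesn't match the expected shape."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else {"message": line}

raw = [
    "2025-11-03T10:14:07Z ERROR checkout-svc: payment gateway timeout after 30s",
    "2025-11-03T10:14:09Z INFO checkout-svc: retrying payment request",
]
events = [parse_line(line) for line in raw]
# Once lines are structured, filtering for significant events is trivial.
errors = [e for e in events if e.get("level") == "ERROR"]
```

With fields extracted, alerting on error spikes or grouping by service becomes a simple query instead of a full-text search over raw lines.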
Logs, the richest information source, shouldn’t be a last-resort tool. They should be your primary signal for automation and rapid remediation.
The open source imperative
At its core, context engineering thrives in an open source ecosystem that encourages community collaboration, shared innovation, and extensibility across diverse use cases. That’s why Elastic embraces open source principles, from contributing to observability standards through the Elastic Common Schema, to acquiring open source pioneer Jina AI to advance retrieval, embeddings, and context engineering to power agentic AI.
The future of agentic AI depends on mastering unstructured data through context engineering. By turning chaotic data sources into precise, governed, and orchestrated knowledge ecosystems, context engineering enables AI agents to operate intelligently, reliably, and at scale.
Developers and enterprises that embrace these practices, anchored in open-source collaboration and powered by AI-driven platforms, will unlock unprecedented value from their data in 2026 and beyond.
The opinions expressed on this website are those of each author, not of the author's employer or All Things Open/We Love Open Source.