We ❤️ Open Source

A community education resource


The unstructured data challenge: Why context is key for scaling AI agents

How to engineer context for AI agents that actually work by turning chaotic data into clarity.

As enterprises race to adopt agentic AI across their businesses, demand for context engineering will surge: the relevance and reliability of AI agents hinge on their ability to access and utilize the right data — structured and unstructured — at the right time. In agentic AI, unstructured data is messy, but it is an equal partner to relational databases and data warehouses. Whether it’s Slack messages, PDF contracts, issue tracker notes, system logs, or the notes column of the sales team’s CRM, unstructured data holds the key to critical insights — but it’s inherently difficult for AI to interpret without deliberate preparation.

Developers and engineers face overwhelming volumes of such data. A single Kubernetes cluster can generate 30 to 50 gigabytes of logs daily. Trying to find meaningful signals in that noise leaves human operators drowning and AI systems blind. Historically, many companies have resorted to costly, brittle pipelines, dropping valuable data or “logging and forgetting” rather than harnessing it effectively.

The consequence? Many AI projects fail not because the models lack intelligence, but because they lack relevant context for that enterprise. Without precise access to the right pieces of data, AI agents produce inaccurate, incomplete, or inconsistent outputs.

Why context engineering is the solution

An ideal agentic AI platform must be engineered to overcome the shortcomings that plague most enterprise AI deployments: hallucinations, context confusion, and stale data. These issues arise when agents operate without sufficient context, relying on generic or outdated knowledge rather than domain-specific, up-to-date insights.

The solution lies in a set of practices collectively known as context engineering, which tailors LLMs to the unique needs of each organization. Key components include:

  • Retrieval-augmented generation (RAG): RAG enables agents to pull in relevant, real-time information from enterprise knowledge bases, documents, and databases, grounding their responses in accurate, up-to-date context. This mitigates the risk of hallucinations and ensures that agents can reason over domain-specific data.
  • Prompt refinement: By refining prompts and instructions, context engineering ensures that agents understand the nuances of each task, reducing ambiguity and improving the quality of outputs.
  • Memory management: Effective memory management allows agents to maintain context across interactions, remembering past conversations and user preferences to deliver more personalized and coherent responses.
  • Better binary quantization (BBQ): BBQ compresses vectors and clusters them into compact partitions for selective disk reads. This reduces memory usage, avoids spikes in data retrieval time, and improves system performance for data ingestion and organization.
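To make the quantization idea concrete, here is a minimal, illustrative sketch — not Elastic's BBQ implementation — of the core trick: each float dimension of a vector collapses to one sign bit, so storage shrinks by roughly 32x and similarity becomes a cheap Hamming distance over packed bits.

```python
# Illustrative binary quantization sketch (assumed toy data, not Elastic's BBQ).

def quantize(vector):
    """Pack a float vector into an int whose bits are the dimension signs."""
    bits = 0
    for i, value in enumerate(vector):
        if value > 0:
            bits |= 1 << i
    return bits

def hamming(a, b):
    """Number of differing bits -- a cheap proxy for vector distance."""
    return bin(a ^ b).count("1")

docs = {
    "doc-a": [0.9, -0.2, 0.4, -0.7],
    "doc-b": [0.8, -0.1, 0.5, -0.6],   # similar direction to doc-a
    "doc-c": [-0.9, 0.3, -0.4, 0.8],   # roughly opposite
}
codes = {name: quantize(v) for name, v in docs.items()}

query = quantize([0.7, -0.3, 0.6, -0.5])
ranked = sorted(codes, key=lambda name: hamming(query, codes[name]))
print(ranked[0])  # doc-a: identical sign pattern, Hamming distance 0
```

Production systems layer re-ranking with the original floats on top of this coarse filter; the point here is only why bit-packed vectors make retrieval cheap.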

These practices enable AI agents to retrieve data from unstructured sources quickly and accurately, while ensuring that data is governed and orchestrated in accordance with organizational policies.
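The retrieval step can be sketched in a few lines. The corpus, keyword-overlap scoring, and prompt template below are illustrative stand-ins — real RAG systems use embeddings and a vector index — but the shape is the same: retrieve the most relevant snippets, then ground the prompt in them.

```python
# Minimal RAG-style retrieval sketch (assumed toy corpus and scoring).
import re

CORPUS = {
    "slack":    "Deploy of checkout-service failed twice on Tuesday",
    "contract": "Renewal for Acme Corp is due on 2026-03-01",
    "runbook":  "If checkout-service fails, roll back and page the on-call SRE",
}

def tokens(text):
    """Lowercased word set, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9-]+", text.lower()))

def retrieve(query, k=2):
    """Return the top-k snippets by crude keyword overlap."""
    ranked = sorted(
        CORPUS.values(),
        key=lambda text: len(tokens(query) & tokens(text)),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query):
    """Ground the model's instructions in retrieved context."""
    context = "\n".join(f"- {snippet}" for snippet in retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("What should I do when checkout-service fails?"))
```

Swapping the keyword overlap for embedding similarity changes the scoring, not the structure: retrieval narrows the world to what is relevant before the model ever sees the question.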

At Elastic, we’ve built our platform to address the shortcomings enterprises face with agentic AI by embedding context engineering at its core.

Streams: Proactively use logs to solve problems

For example, Elastic recently announced Streams, an agentic AI-powered solution that rethinks how teams work with logs to enable much faster incident investigation and resolution.

Streams automatically creates structure from raw, voluminous, messy data, putting it into a usable form, alerting you to issues, and helping you remediate them.

Streams uses AI to automatically partition and parse raw logs to extract relevant fields, greatly reducing the effort required of Site Reliability Engineers (SREs) to make logs usable. It uncovers significant events such as critical errors and anomalies from context-rich logs, giving SREs early warnings and a clear understanding of their workloads, enabling them to investigate and resolve issues faster.
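To illustrate the kind of work this automates, here is a hand-rolled sketch of parsing raw log lines into structured fields and surfacing significant events. The log format, field names, and severity rules are assumptions for illustration, not how Streams itself works.

```python
# Illustrative log-structuring sketch (assumed log format; not Streams itself).
import re

LINE = re.compile(
    r"(?P<ts>\S+) (?P<level>[A-Z]+) (?P<service>[\w-]+): (?P<message>.*)"
)

raw_logs = [
    "2026-01-08T10:01:22Z INFO checkout-service: request served in 42ms",
    "2026-01-08T10:01:23Z ERROR payment-gateway: connection refused",
    "2026-01-08T10:01:24Z WARN checkout-service: retrying payment call",
]

def parse(line):
    """Extract structured fields; keep unparsable lines as raw messages."""
    match = LINE.match(line)
    return match.groupdict() if match else {"message": line}

events = [parse(line) for line in raw_logs]

# Surface significant events instead of making SREs grep for them.
alerts = [e for e in events if e.get("level") in {"ERROR", "FATAL"}]
for alert in alerts:
    print(f"{alert['service']}: {alert['message']}")
```

The manual version breaks the moment a log format changes; the value of an AI-driven approach is inferring the partitioning and field extraction instead of maintaining regexes by hand.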

Logs, the richest information source, shouldn’t be a last-resort tool. They should be your primary signal for automation and rapid remediation.

The open source imperative

At its core, context engineering thrives in an open source ecosystem that encourages community collaboration, shared innovation, and extensibility across diverse use cases. That’s why Elastic embraces open source principles, from contributing to observability standards through the Elastic Common Schema, to acquiring open source pioneer Jina AI to advance retrieval, embeddings, and context engineering to power agentic AI.

The future of agentic AI depends on mastering unstructured data through context engineering. By turning chaotic data sources into precise, governed, and orchestrated knowledge ecosystems, context engineering enables AI agents to operate intelligently, reliably, and at scale.

Developers and enterprises that embrace these practices, anchored in open source collaboration and powered by AI-driven platforms, will unlock unprecedented value from their data in 2026 and beyond.


About the Author

Ken Exner, Chief Product Officer, Elastic

The opinions expressed on this website are those of each author, not of the author's employer or All Things Open/We Love Open Source.
