<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Dumebi Okolo</title>
    <description>The latest articles on DEV Community by Dumebi Okolo (@dumebii).</description>
    <link>https://dev.to/dumebii</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F941720%2Ff316bf93-ef0b-4bc5-aee2-5e062255d5f0.jpg</url>
      <title>DEV Community: Dumebi Okolo</title>
      <link>https://dev.to/dumebii</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dumebii"/>
    <language>en</language>
    <item>
      <title>Demystifying RAG Architecture for Enterprise Data: A Technical Blueprint</title>
      <dc:creator>Dumebi Okolo</dc:creator>
      <pubDate>Fri, 10 Apr 2026 11:00:47 +0000</pubDate>
      <link>https://dev.to/dumebii/demystifying-rag-architecture-for-enterprise-data-a-technical-blueprint-393</link>
      <guid>https://dev.to/dumebii/demystifying-rag-architecture-for-enterprise-data-a-technical-blueprint-393</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This article shows how to engineer a robust Retrieval-Augmented Generation (RAG) pipeline that unlocks LLM potential with proprietary information.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The advent of Large Language Models (LLMs) has ushered in a new era of AI-powered applications, promising to revolutionize how enterprises interact with information, automate tasks, and generate insights. From crafting marketing copy to summarizing complex legal documents, the capabilities of models like OpenAI's GPT series, Anthropic's Claude, and Meta's Llama have captured the imagination of developers and business leaders alike.&lt;/p&gt;

&lt;p&gt;However, the path from impressive public demos to practical, production-ready enterprise solutions is fraught with challenges. While LLMs excel at general knowledge tasks, their utility often diminishes when confronted with an organization's most valuable asset: its proprietary data.&lt;/p&gt;

&lt;p&gt;This is where Retrieval-Augmented Generation (RAG) architecture emerges as a critical enabler. RAG provides a robust, scalable, and cost-effective framework for connecting the immense generative power of LLMs with the specific, dynamic, and often sensitive knowledge locked within an enterprise's data silos. It addresses the inherent limitations of standalone LLMs, transforming them from general-purpose conversationalists into domain-specific experts.&lt;/p&gt;

&lt;p&gt;This article serves as a comprehensive technical blueprint for software engineers, data engineers, and technical product managers looking to build sophisticated AI features leveraging LLMs with private enterprise data. We will dissect the core problems LLMs face in an enterprise context, introduce the RAG paradigm, and meticulously walk through its three-step pipeline: ingestion and chunking, storage and semantic search, and context-aware generation. We'll also explore common pitfalls and provide actionable insights to ensure your RAG implementation is not just functional, but performant and reliable. By the end, you'll have a clear understanding of how to engineer a RAG solution that empowers your LLMs to speak with authority, accuracy, and relevance on your enterprise's terms.&lt;/p&gt;

&lt;h2&gt;The Problem with Standalone LLMs&lt;/h2&gt;

&lt;p&gt;Before diving into the solution, it's crucial to understand the fundamental limitations that prevent standard, off-the-shelf LLMs from being directly applicable to most enterprise use cases without significant augmentation.&lt;/p&gt;

&lt;h3&gt;The Knowledge Cutoff Problem&lt;/h3&gt;

&lt;p&gt;Large Language Models are trained on vast datasets of publicly available text and code. This training process is computationally intensive and takes a significant amount of time, meaning that once a model is released, its knowledge base is inherently static. This creates what's known as a knowledge cutoff. For example, an LLM released in early 2023 would have no inherent knowledge of events, products, or company policies that emerged later that year or in 2024.&lt;/p&gt;

&lt;p&gt;For enterprise applications, this limitation is critical. Organizations operate in dynamic environments where information changes constantly. An LLM relying solely on its pre-trained knowledge cannot answer questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  "What was our Q2 revenue performance for the current fiscal year?"&lt;/li&gt;
&lt;li&gt;  "What is the latest iteration of our employee expense policy?"&lt;/li&gt;
&lt;li&gt;  "Which customer accounts are currently in our new pilot program?"&lt;/li&gt;
&lt;li&gt;  "What are the technical specifications of our newly released product version 3.1?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are questions that demand real-time, proprietary, and often granular data. A standalone LLM, without external context, simply doesn't have access to this information, rendering it largely ineffective for internal business intelligence or operational support.&lt;/p&gt;

&lt;h3&gt;The Hallucination Risk&lt;/h3&gt;

&lt;p&gt;Perhaps even more concerning than a lack of knowledge is the phenomenon of hallucination. LLMs are sophisticated pattern-matching machines, not factual databases. They are designed to predict the most statistically probable next token based on their training data. When an LLM encounters a query about information it doesn't possess, especially if the query's structure is similar to questions it can answer, it doesn't respond with "I don't know." Instead, it confidently generates plausible-sounding but entirely fabricated information.&lt;/p&gt;

&lt;p&gt;In an enterprise context, hallucinations are not merely an inconvenience; they pose significant risks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Misinformation and Bad Decisions:&lt;/strong&gt; An LLM providing incorrect financial figures, outdated compliance advice, or non-existent product features can lead to flawed business strategies, operational errors, and reputational damage.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Erosion of Trust:&lt;/strong&gt; If users repeatedly receive inaccurate information, their trust in the AI system, and by extension, the underlying business process, will quickly diminish.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Legal and Compliance Exposure:&lt;/strong&gt; In regulated industries, incorrect AI-generated responses could lead to severe compliance violations, legal liabilities, and financial penalties.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Security Risks:&lt;/strong&gt; While less direct, a hallucinating LLM might inadvertently reveal sensitive patterns or generate seemingly innocuous but misleading data that could be exploited.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The core issue is that LLMs are trained to be generative, not necessarily truthful. They prioritize fluency and coherence over factual accuracy when lacking concrete information. This fundamental characteristic makes them unsuitable for direct deployment on proprietary tasks without a mechanism to ground their responses in verifiable, up-to-date data. This mechanism is precisely what Retrieval-Augmented Generation provides.&lt;/p&gt;

&lt;h2&gt;What is Retrieval-Augmented Generation (RAG)?&lt;/h2&gt;

&lt;p&gt;Retrieval-Augmented Generation (RAG) is an architectural pattern designed to bridge the gap between the powerful generative capabilities of LLMs and the need for factual accuracy, recency, and domain-specificity in enterprise applications. At its heart, RAG is about providing an LLM with external, relevant, and verifiable information &lt;em&gt;at the time of inference&lt;/em&gt;, allowing it to generate responses that are grounded in truth rather than relying solely on its pre-trained, potentially outdated, or irrelevant knowledge.&lt;/p&gt;

&lt;p&gt;Think of RAG as giving an LLM an "open-book test." Instead of expecting the AI to answer purely from memory (its training data), we equip it with the ability to quickly look up the exact right documents or data snippets before formulating its answer. This fundamentally changes the LLM's role from a knowledge memorizer to a sophisticated knowledge synthesizer.&lt;/p&gt;

&lt;h3&gt;The Core Principle: Separate Retrieval from Generation&lt;/h3&gt;

&lt;p&gt;The genius of RAG lies in its modular approach. It separates the challenge of &lt;em&gt;finding&lt;/em&gt; relevant information from the challenge of &lt;em&gt;generating&lt;/em&gt; a coherent, human-like response. This separation offers several key advantages:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Factuality:&lt;/strong&gt; By providing specific, up-to-date context, RAG significantly reduces the likelihood of hallucinations, as the LLM is instructed to base its answer &lt;em&gt;only&lt;/em&gt; on the provided information.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Recency:&lt;/strong&gt; New information can be added to the external knowledge base in real-time, without needing to retrain or fine-tune the LLM. This makes RAG highly agile for dynamic enterprise data.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Domain Specificity:&lt;/strong&gt; The external knowledge base can be tailored precisely to an organization's proprietary data, enabling LLMs to become experts in niche domains where they previously had no knowledge.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Cost-Effectiveness:&lt;/strong&gt; RAG is generally far more cost-effective than repeatedly fine-tuning LLMs for new or updated information. Fine-tuning is expensive, time-consuming, and can lead to 'catastrophic forgetting' of general knowledge. RAG simply updates the knowledge base.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Interpretability/Attribution:&lt;/strong&gt; Because the LLM's response is grounded in retrieved documents, it's often possible to cite the sources, improving trust and auditability.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In essence, RAG transforms an LLM from a general-purpose oracle into a highly specialized, context-aware agent capable of interacting intelligently with an organization's most critical information assets. It allows enterprises to leverage the cutting-edge of generative AI without compromising on accuracy, relevance, or control over their data.&lt;/p&gt;

&lt;h2&gt;The Core RAG Architecture (The 3-Step Pipeline)&lt;/h2&gt;

&lt;p&gt;Building a robust RAG system involves a sequential, multi-component pipeline. While implementations can vary in complexity, the core architecture typically comprises three distinct, yet interconnected, stages:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Ingestion &amp;amp; Chunking:&lt;/strong&gt; Preparing your enterprise data for retrieval.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Storage &amp;amp; Semantic Search:&lt;/strong&gt; Efficiently storing and retrieving relevant data.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Generation (The Prompt Context):&lt;/strong&gt; Using retrieved data to inform the LLM's response.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let's visualize this flow: A user submits a query. This query is used to search a specialized knowledge base (often a vector database) for relevant information. The retrieved information, alongside the original query, is then sent to the LLM, which synthesizes a grounded answer. This process ensures the LLM is always operating with the most relevant and up-to-date context available.&lt;/p&gt;

&lt;h3&gt;Step 1: Ingestion &amp;amp; Chunking&lt;/h3&gt;

&lt;p&gt;This initial phase is critical for preparing your raw enterprise data for efficient retrieval. It involves extracting information from various sources, processing it, and transforming it into a format suitable for semantic search.&lt;/p&gt;

&lt;h4&gt;Data Sources &amp;amp; Preprocessing&lt;/h4&gt;

&lt;p&gt;Your enterprise data can reside in a multitude of formats and locations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Documents:&lt;/strong&gt; PDFs, Word documents (.docx), Markdown files, HTML pages (e.g., Confluence, SharePoint).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Databases:&lt;/strong&gt; SQL databases, NoSQL databases (e.g., customer records, product catalogs).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Communication Platforms:&lt;/strong&gt; Slack archives, email threads, CRM notes.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Code Repositories:&lt;/strong&gt; Git repositories (for code documentation, internal libraries).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The first step is to extract the raw text content from these diverse sources. This often involves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Parsing:&lt;/strong&gt; Using libraries (e.g., &lt;code&gt;PyPDF2&lt;/code&gt;, &lt;code&gt;python-docx&lt;/code&gt;, &lt;code&gt;BeautifulSoup&lt;/code&gt;) to extract text from structured and semi-structured documents.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Optical Character Recognition (OCR):&lt;/strong&gt; For scanned PDFs or image-based documents, OCR tools are essential to convert images of text into machine-readable text.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Cleaning:&lt;/strong&gt; Removing boilerplate text (headers, footers, navigation), irrelevant metadata, excessive whitespace, or corrupted characters.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Standardization:&lt;/strong&gt; Converting all text to a consistent encoding (e.g., UTF-8) and potentially normalizing capitalization or punctuation.&lt;/li&gt;
&lt;/ul&gt;
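&lt;p&gt;A minimal sketch of the cleaning and standardization steps, assuming plain text has already been extracted (the boilerplate patterns here are hypothetical; real pipelines need source-specific rules):&lt;/p&gt;

```python
import re
import unicodedata

# Hypothetical boilerplate patterns; tailor these to your actual sources.
BOILERPLATE_PATTERNS = [
    re.compile(r"^Page \d+ of \d+$", re.MULTILINE),
    re.compile(r"^Confidential - Internal Use Only$", re.MULTILINE),
]

def clean_text(raw):
    """Normalize encoding, strip boilerplate lines, and collapse whitespace."""
    # Standardize to a consistent Unicode form (assumes input is already a str)
    text = unicodedata.normalize("NFKC", raw)
    # Remove known boilerplate lines
    for pattern in BOILERPLATE_PATTERNS:
        text = pattern.sub("", text)
    # Collapse runs of spaces/tabs and excess blank lines
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()

print(clean_text("Policy  overview\n\n\n\nPage 1 of 9\nRemote work rules."))
```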

&lt;h4&gt;Chunking Strategy: Breaking Down Knowledge&lt;/h4&gt;

&lt;p&gt;LLMs have a finite context window – the maximum number of tokens they can process in a single prompt. Enterprise documents can be lengthy, far exceeding these limits. Moreover, sending an entire document for every query is inefficient and often introduces noise. Therefore, the extracted text needs to be broken down into smaller, manageable units called chunks.&lt;/p&gt;

&lt;p&gt;Effective chunking is an art and a science. Poor chunking can lead to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Lost Context:&lt;/strong&gt; If chunks are too small, essential information might be split across multiple chunks, making it difficult for the LLM to understand the complete picture.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Irrelevant Information:&lt;/strong&gt; If chunks are too large, they might contain a lot of irrelevant text, diluting the signal and potentially confusing the LLM.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Common chunking strategies include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Fixed-Size Chunking:&lt;/strong&gt; Splitting text into chunks of a predefined character or token count (e.g., 500 characters) with a specified overlap (e.g., 50 characters). Overlap helps maintain context across chunk boundaries.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Sentence/Paragraph Chunking:&lt;/strong&gt; Splitting text at natural linguistic breaks (sentences, paragraphs). This often results in more semantically coherent chunks than fixed-size methods.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Recursive Character Text Splitter:&lt;/strong&gt; A common approach (found in libraries like LangChain) that attempts to split by paragraphs, then sentences, then words, until chunks fit a specified size, ensuring semantic boundaries are prioritized.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Semantic Chunking:&lt;/strong&gt; A more advanced technique where chunks are created based on semantic similarity. Text is embedded, and then a clustering algorithm or other method identifies natural breaks where the meaning shifts significantly.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best Practice:&lt;/strong&gt; Experiment with different chunk sizes and overlap values. A chunk size of 200-1000 tokens with 10-20% overlap is a common starting point, but the optimal values depend heavily on your specific data and use case.&lt;/p&gt;
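&lt;p&gt;The first strategy above, fixed-size chunking with overlap, can be sketched in a few lines of Python (the sizes here are illustrative):&lt;/p&gt;

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into fixed-size character chunks; consecutive chunks
    share `overlap` characters so context survives the boundary."""
    step = chunk_size - overlap  # assumes overlap is smaller than chunk_size
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

doc = "A" * 120
print([len(c) for c in chunk_text(doc, chunk_size=50, overlap=10)])  # [50, 50, 40]
```

In practice you would apply the same idea at the token level (or use a library splitter), but the overlap mechanic is identical.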

&lt;h4&gt;Embedding Generation: The Language of Similarity&lt;/h4&gt;

&lt;p&gt;Once your data is chunked, the next crucial step is to transform each text chunk into a numerical representation called an embedding.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;What are Embeddings?&lt;/strong&gt; Embeddings are high-dimensional vectors (lists of numbers, e.g., 1536 dimensions for models like OpenAI's text-embedding-3-small or open-source alternatives) that capture the semantic meaning of text. Texts with similar meanings will have vectors that are numerically 'close' to each other in this high-dimensional space.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;How they are Generated:&lt;/strong&gt; An embedding model (e.g., OpenAI's text-embedding-3-small, various Sentence Transformers models from Hugging Face, Cohere Embed) takes a piece of text as input and outputs its corresponding vector.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Importance:&lt;/strong&gt; Embeddings are the backbone of semantic search. They allow us to move beyond keyword matching and find information based on conceptual similarity. For instance, a query about "remote work policy" could retrieve documents mentioning "telecommuting guidelines" because their embeddings are semantically close.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each chunk of text from your enterprise data is processed by an embedding model, and its resulting vector is stored. This collection of vectors, along with references to their original text chunks, forms the core of your searchable knowledge base.&lt;/p&gt;
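&lt;p&gt;To make "numerically close" concrete, here is cosine similarity computed over tiny hand-written vectors. Real embeddings come from an embedding model and have hundreds or thousands of dimensions; these 4-dimensional vectors are purely illustrative:&lt;/p&gt;

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" (hand-written for illustration only)
remote_work   = [0.9, 0.1, 0.8, 0.0]
telecommuting = [0.85, 0.15, 0.75, 0.05]
pizza_recipe  = [0.0, 0.9, 0.1, 0.8]

print(round(cosine_similarity(remote_work, telecommuting), 3))  # close to 1.0
print(round(cosine_similarity(remote_work, pizza_recipe), 3))   # much lower
```

This is why a query about "remote work policy" retrieves "telecommuting guidelines": their vectors point in nearly the same direction.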

&lt;h3&gt;Step 2: Storage &amp;amp; Semantic Search (The Vector DB)&lt;/h3&gt;

&lt;p&gt;With your enterprise data processed into chunks and vectorized, the next step is to store these embeddings efficiently and enable rapid, accurate semantic search. This is the domain of the Vector Database.&lt;/p&gt;

&lt;h4&gt;The Role of a Vector Database&lt;/h4&gt;

&lt;p&gt;A vector database is purpose-built for storing, indexing, and querying high-dimensional vectors. Unlike traditional relational databases that excel at structured queries (e.g., &lt;code&gt;SELECT * FROM users WHERE age &amp;gt; 30&lt;/code&gt;), vector databases specialize in 'similarity search' – finding vectors that are numerically closest to a given query vector.&lt;/p&gt;

&lt;h4&gt;How Semantic Search Works&lt;/h4&gt;

&lt;p&gt;When a user submits a query (e.g., "How do I request time off?"):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Query Embedding:&lt;/strong&gt; The user's query is first sent to the &lt;em&gt;same embedding model&lt;/em&gt; that was used to embed your enterprise data chunks. This transforms the natural language query into a query vector.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Vector Similarity Search:&lt;/strong&gt; The query vector is then sent to the vector database. The database's indexing algorithms (e.g., Hierarchical Navigable Small World (HNSW), Inverted File Index (IVF), Locality-Sensitive Hashing (LSH)) efficiently compare the query vector to all stored document chunk vectors.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Distance Metrics:&lt;/strong&gt; This comparison typically uses distance metrics like:

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Cosine Similarity:&lt;/strong&gt; Measures the cosine of the angle between two vectors. A value of 1 indicates identical direction (perfect similarity), 0 indicates orthogonality (no similarity), and -1 indicates opposite direction.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Euclidean Distance:&lt;/strong&gt; Measures the straight-line distance between two points in Euclidean space. A smaller distance implies greater similarity.&lt;/li&gt;
&lt;/ul&gt;
The vector database returns the 'top-K' most similar document chunk vectors, where 'K' is a configurable parameter (e.g., retrieve the 5 most relevant chunks).
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Retrieval of Original Text:&lt;/strong&gt; Along with the similar vectors, the vector database also retrieves the original text content of the corresponding chunks.&lt;/li&gt;
&lt;/ol&gt;
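&lt;p&gt;The retrieval steps above can be sketched end to end with an in-memory stand-in for the vector database (the chunks and their embeddings are hand-written for illustration; a real system stores model-generated vectors):&lt;/p&gt;

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Stand-in for a vector DB: (chunk_text, embedding) pairs.
# Real embeddings must come from the same model used at ingestion time.
index = [
    ("Time-off requests are filed in the HR portal.", [0.9, 0.1, 0.2]),
    ("Expense reports are due by the 15th.",          [0.1, 0.9, 0.3]),
    ("Vacation days accrue monthly.",                 [0.8, 0.2, 0.1]),
]

def top_k(query_embedding, k=2):
    """Return the k chunks whose embeddings are most similar to the query."""
    scored = [(cosine(query_embedding, emb), text) for text, emb in index]
    scored.sort(reverse=True)
    return [text for score, text in scored[:k]]

# Pretend this vector is the embedded form of "How do I request time off?"
print(top_k([0.85, 0.15, 0.15]))
```

A production vector database replaces the linear scan with an approximate index such as HNSW, but the contract is the same: query vector in, top-K chunks out.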

&lt;h4&gt;Popular Vector Database Options&lt;/h4&gt;

&lt;p&gt;The choice of vector database depends on factors like scale, latency requirements, deployment model (managed vs. self-hosted), and ecosystem integration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Managed Services:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Pinecone:&lt;/strong&gt; A cloud-native, fully managed vector database known for its scalability and ease of use.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Weaviate:&lt;/strong&gt; An open-source, cloud-native vector database that also offers a managed service, supporting GraphQL and semantic search.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Qdrant:&lt;/strong&gt; Another open-source vector search engine, available as self-hosted or managed, known for its speed and advanced filtering capabilities.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;  &lt;strong&gt;Self-Hosted/Open Source:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Milvus:&lt;/strong&gt; A widely adopted open-source vector database designed for massive-scale vector similarity search.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Chroma:&lt;/strong&gt; A lightweight, easy-to-use open-source embedding database, great for local development and smaller-scale applications.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;pgvector:&lt;/strong&gt; An extension for PostgreSQL that enables efficient vector similarity search directly within a relational database. Excellent for scenarios where you want to keep your vector data alongside your existing structured data.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h4&gt;Advanced Retrieval Strategies&lt;/h4&gt;

&lt;p&gt;Simple top-K retrieval is a good start, but for complex enterprise data, more sophisticated strategies can enhance relevance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Re-ranking:&lt;/strong&gt; After an initial retrieval of, say, 20 chunks, a smaller, more powerful re-ranking model (often a cross-encoder or a specialized LLM) can evaluate the relevance of these chunks more deeply against the query and re-order them, selecting the absolute best 'K' for the LLM.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Hybrid Search:&lt;/strong&gt; Combining semantic (vector) search with traditional keyword-based search (e.g., BM25) can provide a more robust retrieval system. Keyword search excels at finding exact matches or rare terms, while semantic search handles conceptual understanding.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Multi-query Retrieval:&lt;/strong&gt; Generating multiple slightly different queries from the original user query (e.g., using an LLM) and running parallel searches to broaden the retrieval scope.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Contextual Compression:&lt;/strong&gt; Filtering or summarizing retrieved documents to only include the most relevant sentences or paragraphs, reducing noise and optimizing token usage for the LLM.&lt;/li&gt;
&lt;/ul&gt;
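&lt;p&gt;One common way to fuse the keyword and vector result lists in hybrid search is Reciprocal Rank Fusion (RRF), sketched here over two hypothetical ranked lists of document IDs:&lt;/p&gt;

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked result lists: each document scores 1/(k + rank)
    per list, so items ranked highly in multiple lists rise to the top."""
    scores = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical top results from each retriever
keyword_hits = ["doc_7", "doc_2", "doc_9"]   # e.g., from BM25
vector_hits  = ["doc_2", "doc_4", "doc_7"]   # e.g., from cosine similarity

print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
```

RRF needs no score normalization across the two retrievers, which is why it is a popular default for fusing rankings produced on incompatible scales.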

&lt;h3&gt;Step 3: Generation (The Prompt Context)&lt;/h3&gt;

&lt;p&gt;This is the final stage where the LLM synthesizes an answer, critically informed by the context retrieved from your vector database.&lt;/p&gt;

&lt;h4&gt;Constructing the Augmented Prompt&lt;/h4&gt;

&lt;p&gt;The core idea here is to inject the retrieved document chunks directly into the LLM's prompt. This creates an 'augmented prompt' that provides the LLM with all the necessary information to answer the user's question accurately and without hallucination.&lt;/p&gt;

&lt;p&gt;A typical augmented prompt structure, embedded in a minimal retrieval chain, looks like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Placeholder for a simplified LangChain-like RAG snippet
&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.prompts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.runnables&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RunnablePassthrough&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.output_parsers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StrOutputParser&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.documents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Document&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize the LLM (using a sample configuration)
# from langchain_openai import ChatOpenAI
# llm = ChatOpenAI(model="gpt-4-turbo", temperature=0)
&lt;/span&gt;
&lt;span class="c1"&gt;# A simple retriever mock for demonstration. In a real RAG system, this would
# embed the question, query a vector DB, and return Document objects.
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MockRetriever&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_relevant_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="c1"&gt;# In a real scenario, this would query the vector DB
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;remote work expenses&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The company&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s remote work expense policy allows reimbursement for internet and utilities up to $50/month.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Employees must submit expense reports by the 15th of the following month for remote work related costs.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No specific information found on that topic in the internal knowledge base.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

&lt;span class="n"&gt;mock_retriever&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MockRetriever&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Define the prompt template
# This template instructs the LLM on its role and how to use the provided context.
&lt;/span&gt;&lt;span class="n"&gt;template&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are an expert assistant for a large enterprise.
Answer the user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s question based *only* on the provided context.
If the answer cannot be found in the context, politely state that you do not have enough information.

Context:
{context}

Question:
{question}
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Format retrieved documents into a single context string
# This is crucial: the retriever returns Document objects, but the prompt expects a formatted string.
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;format_docs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Serialize retrieved documents into a single context string.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Define the RAG chain (using LangChain's Runnable interface for clarity)
# The 'context' key is populated by the retriever and formatted into a string, 
# and 'question' by the user's input.
&lt;/span&gt;&lt;span class="n"&gt;rag_chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;context&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;format_docs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mock_retriever&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_relevant_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;question&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])),&lt;/span&gt; 
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;question&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;RunnablePassthrough&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;
    &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;  &lt;span class="c1"&gt;# Your initialized LLM instance goes here (e.g., ChatOpenAI model above)
&lt;/span&gt;    &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nc"&gt;StrOutputParser&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 4. Invoke the chain with a user query
# from langchain_openai import ChatOpenAI # Example LLM initialization
# llm = ChatOpenAI(model="gpt-4-turbo", temperature=0)
# response = rag_chain.invoke({"question": "What is the policy for remote work expenses?"})
# print(response)
# This would print: "The company's remote work expense policy allows reimbursement for internet and utilities up to $50/month. Employees must submit expense reports by the 15th of the following month for remote work related costs."
&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key elements of the prompt template:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;System Message/Role:&lt;/strong&gt; Sets the persona and instructions for the LLM (e.g., "You are an expert assistant...").&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Context Placeholder (&lt;code&gt;{context}&lt;/code&gt;):&lt;/strong&gt; This is where the retrieved document chunks are inserted. It's crucial to clearly delineate the context from the actual question.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Instruction for Context Usage:&lt;/strong&gt; Explicitly telling the LLM to &lt;em&gt;only&lt;/em&gt; use the provided context and to state if the answer is not found is vital to prevent hallucination.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Question Placeholder (&lt;code&gt;{question}&lt;/code&gt;):&lt;/strong&gt; The user's original query.&lt;/li&gt;
&lt;/ul&gt;
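
&lt;p&gt;Put together, these elements yield a template like the minimal sketch below. The wording is illustrative rather than canonical, and plain &lt;code&gt;str.format&lt;/code&gt; is used to keep it framework-agnostic:&lt;/p&gt;

```python
# A minimal RAG prompt template combining the four elements above.
# The wording is illustrative -- tune it for your domain and LLM.
RAG_TEMPLATE = """You are an expert assistant for internal company documentation.

Answer the question using ONLY the context below. If the answer is not
contained in the context, reply: "I don't know based on the provided documents."

Context:
---------
{context}
---------

Question: {question}
Answer:"""


def build_prompt(context: str, question: str) -> str:
    """Fill the placeholders; plain str.format keeps this framework-agnostic."""
    return RAG_TEMPLATE.format(context=context, question=question)


prompt = build_prompt("Remote work expenses are capped at $50/month.",
                      "What is the expense cap?")
```

&lt;p&gt;In a LangChain pipeline the same text would typically become a &lt;code&gt;ChatPromptTemplate&lt;/code&gt;, but the structure is identical.&lt;/p&gt;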

&lt;h4&gt;
  
  
  LLM Interaction and Synthesis
&lt;/h4&gt;

&lt;p&gt;Once the augmented prompt is constructed, it is sent to the chosen LLM (e.g., GPT-4 Turbo, Claude 3.5 Sonnet, or open-source alternatives like Llama 3). The LLM then processes this entire prompt, using the provided context to formulate a relevant and accurate answer. Because the context is explicitly given, the LLM acts more like a sophisticated summarizer and question-answering system over the provided text, rather than generating from its internal, general knowledge.&lt;/p&gt;

&lt;p&gt;This final step ensures that the LLM's response is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Grounded:&lt;/strong&gt; Directly supported by the retrieved enterprise data.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Relevant:&lt;/strong&gt; Addresses the user's specific query.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Accurate:&lt;/strong&gt; Minimizes hallucination by constraining the LLM's generation to the facts presented in the context.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By following this three-step pipeline, enterprises can transform generic LLMs into powerful, domain-specific AI assistants that deliver reliable and actionable intelligence from their most valuable data assets.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common Pitfalls in RAG Engineering
&lt;/h2&gt;

&lt;p&gt;While RAG offers a powerful solution, its effective implementation requires careful consideration and engineering rigor. Several common pitfalls can undermine the performance and reliability of a RAG system if not addressed proactively.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Suboptimal Chunking Strategies
&lt;/h3&gt;

&lt;p&gt;As discussed, chunking is foundational, and mistakes here cascade through the entire pipeline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Chunks that are too small:&lt;/strong&gt; If chunks are excessively granular (e.g., single sentences), they might lack sufficient context to be meaningful on their own. The semantic meaning required to answer a complex question could be fragmented across multiple disparate chunks, making retrieval difficult or incomplete.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Chunks that are too large:&lt;/strong&gt; Conversely, chunks that are too long introduce noise. They might contain a lot of irrelevant information alongside the relevant bits, diluting the signal for the embedding model and increasing the chances of retrieving less precise context. Large chunks also consume more tokens in the LLM's context window, increasing inference cost and potentially hitting context limits prematurely.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Poor Overlap:&lt;/strong&gt; Insufficient overlap between sequential chunks can lead to critical information being split precisely at the boundary, making it hard for retrieval to capture the complete idea.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Mitigation:&lt;/strong&gt; Experimentation is key. Build an evaluation pipeline that tests different chunk sizes, overlap values, and chunking methods (e.g., fixed-size vs. recursive vs. semantic) against a diverse set of representative queries. Where documents have inherent structure, chunk along it (e.g., by headings or sections in a PDF). For highly structured data, 'parent-child' or 'summary' chunking links smaller chunks to larger, more contextual parent chunks or summaries used at different retrieval stages.&lt;/p&gt;
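
&lt;p&gt;As a concrete starting point for such experiments, here is a fixed-size character chunker with overlap. It is a baseline sketch to compare recursive or semantic splitters against, not a production implementation:&lt;/p&gt;

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Fixed-size character chunking with overlap -- the simplest baseline
    for chunking experiments. Overlap repeats the tail of each chunk at the
    head of the next, so an idea spanning a boundary survives intact in
    at least one chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

&lt;p&gt;Sweeping &lt;code&gt;chunk_size&lt;/code&gt; and &lt;code&gt;overlap&lt;/code&gt; over a grid and measuring retrieval quality on representative queries is usually the fastest way to find a good operating point for a given corpus.&lt;/p&gt;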

&lt;h3&gt;
  
  
  2. Irrelevant or Insufficient Retrieval
&lt;/h3&gt;

&lt;p&gt;Even with good chunking, the retriever component can fail to provide the LLM with the optimal context:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Poor Embedding Model Choice:&lt;/strong&gt; Not all embedding models are created equal, and some perform better on specific domains or languages. Using a generic embedding model for highly specialized enterprise terminology might lead to embeddings that don't accurately capture semantic similarity, resulting in irrelevant retrievals.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Noisy or Low-Quality Data in Vector DB:&lt;/strong&gt; If your ingested data contains outdated, contradictory, or simply poorly written information, the vector database will retrieve it, and the LLM will struggle to synthesize a coherent, accurate answer. 'Garbage in, garbage out' applies acutely here.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Suboptimal &lt;code&gt;k&lt;/code&gt; Value:&lt;/strong&gt; Retrieving too few chunks (&lt;code&gt;k&lt;/code&gt; is too low) might mean missing critical pieces of information. Retrieving too many chunks (&lt;code&gt;k&lt;/code&gt; is too high) introduces irrelevant information into the LLM's context, potentially confusing it or causing it to misinterpret the core question.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Mitigation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Embedding Model Evaluation:&lt;/strong&gt; Test different embedding models for your specific domain. Consider fine-tuning an open-source embedding model on your proprietary data if off-the-shelf options underperform.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Data Quality Management:&lt;/strong&gt; Implement robust data cleansing, deduplication, and versioning strategies for your source documents. Only ingest high-quality, current, and relevant data into your RAG knowledge base.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Advanced Retrieval Techniques:&lt;/strong&gt; Employ re-ranking models to refine the initial top-K results. Utilize hybrid search (keyword + vector) to capture both exact matches and semantic similarity. Explore multi-query strategies to generate a more comprehensive set of retrieved documents.&lt;/li&gt;
&lt;/ul&gt;
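
&lt;p&gt;One simple way to combine keyword and vector results is Reciprocal Rank Fusion (RRF), which merges ranked lists without needing to normalize their scores. A minimal sketch:&lt;/p&gt;

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of document IDs (e.g. one from keyword search,
    one from vector search) into a single ranking. Each document scores
    sum(1 / (k + rank)) over the lists it appears in; k=60 is the
    conventional default from the original RRF formulation."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Documents surfaced by both retrievers rise to the top:
fused = reciprocal_rank_fusion([
    ["doc_a", "doc_b", "doc_c"],   # keyword (e.g. BM25) ranking
    ["doc_b", "doc_d", "doc_a"],   # vector-similarity ranking
])
```

&lt;p&gt;A cross-encoder re-ranker can then be applied to the fused top-K for a final precision pass.&lt;/p&gt;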

&lt;h3&gt;
  
  
  3. Latency Issues
&lt;/h3&gt;

&lt;p&gt;RAG introduces additional steps in the query processing pipeline, which can impact response times:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Slow Query Embedding:&lt;/strong&gt; Converting the user's query into a vector can take time, especially if the embedding model is large or running on under-provisioned hardware.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Slow Vector Database Lookups:&lt;/strong&gt; As the size of your vector database grows (millions or billions of vectors), similarity search can become a bottleneck if indexing is inefficient or the database is not properly scaled.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;LLM Inference Latency:&lt;/strong&gt; Even with optimized context, the LLM's generation step can be slow, especially for larger, more capable models (e.g., GPT-4) or for very long responses.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Mitigation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Optimize Embedding Models:&lt;/strong&gt; Choose embedding models that balance speed and accuracy. For query embedding, a smaller, faster model may be acceptable. Cache query embeddings (and, where possible, full responses) for frequently asked questions.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Vector DB Optimization:&lt;/strong&gt; Ensure your vector database is correctly indexed (e.g., using HNSW or IVF) and adequately resourced. Explore cloud-native managed vector databases that handle scalability automatically. Consider sharding your vector index for very large datasets.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;LLM Choice and Optimization:&lt;/strong&gt; Select an LLM that meets your latency and quality requirements. For internal applications where cost and speed are paramount, smaller open-source models might be preferable to larger, more expensive cloud models. Implement streaming responses from the LLM where possible to improve perceived latency.&lt;/li&gt;
&lt;/ul&gt;
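
&lt;p&gt;The embedding cache is one line with &lt;code&gt;functools.lru_cache&lt;/code&gt;. In this sketch, &lt;code&gt;embed_query&lt;/code&gt; is a deterministic stand-in for a real embedding API call; swap in your provider's client:&lt;/p&gt;

```python
from functools import lru_cache
import hashlib


def embed_query(query: str) -> list[float]:
    """Stand-in for a real embedding model call (replace with your
    provider's API). Deterministic so this caching demo is runnable."""
    digest = hashlib.sha256(query.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:8]]


@lru_cache(maxsize=10_000)
def embed_query_cached(query: str) -> tuple[float, ...]:
    """Repeated questions skip the embedding call entirely. Returns a
    tuple because cached values should be immutable and hashable."""
    return tuple(embed_query(query))
```

&lt;p&gt;For production traffic, a shared cache (e.g. Redis keyed on a hash of the normalized query) serves the same purpose across instances.&lt;/p&gt;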

&lt;h3&gt;
  
  
  4. Prompt Engineering Failures
&lt;/h3&gt;

&lt;p&gt;Even with perfect retrieval, a poorly constructed prompt can lead to suboptimal LLM responses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Vague or Ambiguous Instructions:&lt;/strong&gt; If the prompt doesn't clearly define the LLM's role, desired output format, or constraints, the LLM might deviate from expectations.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Failure to Constrain to Context:&lt;/strong&gt; Forgetting to explicitly instruct the LLM to &lt;em&gt;only&lt;/em&gt; use the provided context (e.g., "Answer only from the context provided. If the answer is not in the context, state that you don't know.") is a common mistake that reintroduces hallucination risk.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Context Window Overflow:&lt;/strong&gt; If the combined length of the prompt, retrieved chunks, and the expected response exceeds the LLM's maximum context window, the model will truncate the input, leading to incomplete or erroneous answers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Mitigation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Clear and Concise System Prompts:&lt;/strong&gt; Define the LLM's persona and task unambiguously. Use clear delimiters for context and questions.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Explicit Guardrails:&lt;/strong&gt; Always include instructions to strictly adhere to the provided context and to admit when information is not available.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Dynamic Context Management:&lt;/strong&gt; Implement logic to truncate or summarize retrieved chunks if their combined length approaches the LLM's context window limit. Prioritize the most relevant chunks in such scenarios. Evaluate the impact of different context lengths on LLM performance.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Few-Shot Examples:&lt;/strong&gt; For specific response formats or nuanced tasks, providing one or two examples within the prompt can guide the LLM more effectively.&lt;/li&gt;
&lt;/ul&gt;
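
&lt;p&gt;Dynamic context management can be as simple as greedily packing chunks into a token budget. In this sketch the chunks are assumed to be pre-sorted by relevance, and splitting on whitespace is a rough stand-in for a real tokenizer such as tiktoken:&lt;/p&gt;

```python
def pack_context(chunks: list[str], max_tokens: int) -> str:
    """Greedily keep the highest-ranked chunks (input assumed sorted by
    relevance, best first) until the token budget is spent. Whitespace
    splitting approximates token counts; use a real tokenizer in practice."""
    selected, used = [], 0
    for chunk in chunks:
        cost = len(chunk.split())
        if used + cost > max_tokens:
            continue  # this chunk doesn't fit; a later, smaller one still might
        selected.append(chunk)
        used += cost
    return "\n\n".join(selected)
```

&lt;p&gt;Because the most relevant chunks are considered first, a too-small budget degrades gracefully: the least relevant context is what gets dropped.&lt;/p&gt;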

&lt;p&gt;Addressing these common pitfalls requires a holistic approach, combining careful data engineering, robust infrastructure, and iterative prompt design. Continuous monitoring and evaluation are essential to ensure your RAG system consistently delivers accurate and performant results.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion &amp;amp; Next Steps
&lt;/h2&gt;

&lt;p&gt;Retrieval-Augmented Generation paves the path from generic LLMs to powerful, domain-specific AI applications for enterprise data. RAG architecture is not merely an enhancement; it is a transformative paradigm that addresses the core limitations of pre-trained LLMs, namely their knowledge cutoff and propensity for hallucination, making them truly viable for critical business functions.&lt;/p&gt;

&lt;p&gt;By systematically ingesting and chunking proprietary data, transforming it into semantically rich embeddings, storing it in high-performance vector databases, and then intelligently augmenting LLM prompts with retrieved context, enterprises can unlock unprecedented capabilities. RAG offers a cost-effective, agile, and scalable alternative to expensive model fine-tuning, allowing organizations to keep their AI systems current with rapidly evolving internal knowledge.&lt;/p&gt;

&lt;p&gt;This article has provided a comprehensive technical blueprint, detailing the motivations, core components, and common challenges in engineering a robust RAG pipeline. The principles outlined here – from meticulous data preparation and strategic chunking to efficient vector search and precise prompt engineering – are the bedrock of successful RAG implementations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ready to Build Your First RAG Application?
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Explore Frameworks:&lt;/strong&gt; Dive into open-source frameworks like &lt;a href="https://www.langchain.com/" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt; and &lt;a href="https://www.llamaindex.ai/" rel="noopener noreferrer"&gt;LlamaIndex&lt;/a&gt;. These libraries provide high-level abstractions for building RAG pipelines, simplifying integration with various LLMs, embedding models, and vector databases.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Experiment with Vector Databases:&lt;/strong&gt; Set up a local instance of &lt;a href="https://www.trychroma.com/" rel="noopener noreferrer"&gt;Chroma&lt;/a&gt; or &lt;a href="https://github.com/pgvector/pgvector" rel="noopener noreferrer"&gt;pgvector&lt;/a&gt; to get hands-on experience, or explore managed services like &lt;a href="https://www.pinecone.io/" rel="noopener noreferrer"&gt;Pinecone&lt;/a&gt; for scalability.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Start Small, Iterate Fast:&lt;/strong&gt; Begin with a small, manageable dataset from your enterprise. Focus on getting a basic RAG pipeline operational, then iteratively refine your chunking, retrieval, and prompt strategies based on real-world queries and evaluation metrics.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Continuous Learning:&lt;/strong&gt; The RAG landscape is evolving rapidly. Stay updated with the latest research in retrieval techniques, embedding models, and multi-modal RAG. Consider exploring advanced topics like agentic RAG, where LLMs can dynamically decide when and how to retrieve information.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;RAG empowers you to transform LLMs from generalists into trusted, domain-expert collaborators, enabling your enterprise to harness the full potential of generative AI with confidence and accuracy. The future of enterprise AI is augmented, and RAG is your blueprint for building it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Feedback &amp;amp; Community
&lt;/h2&gt;

&lt;p&gt;We believe in transparent, community-driven content creation. This article was generated using the &lt;a href="https://ozigi.app" rel="noopener noreferrer"&gt;Ozigi Dashboard&lt;/a&gt; – our advanced longform content generation platform – and has been thoroughly reviewed and refined by our engineering team.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Have feedback on this article?&lt;/strong&gt; We'd love to hear your thoughts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Leave a comment below or email us at &lt;a href="mailto:hello@ozigi.app"&gt;hello@ozigi.app&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Share your RAG architecture experiences and learnings with our community&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Interested in building your own enterprise AI content?&lt;/strong&gt; Longform article generation is available to users on the Organization tier, limited to 5 articles per day. &lt;a href="https://ozigi.app/pricing" rel="noopener noreferrer"&gt;Check our pricing details&lt;/a&gt; to learn more about what Ozigi can do for your content strategy.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>javascript</category>
    </item>
    <item>
      <title>Building a Robust Webhook Handler in Node.js: Validation, Queuing, and Retry Logic</title>
      <dc:creator>Dumebi Okolo</dc:creator>
      <pubDate>Tue, 07 Apr 2026 11:50:28 +0000</pubDate>
      <link>https://dev.to/dumebii/building-a-robust-webhook-handler-in-nodejs-validation-queuing-and-retry-logic-2fb6</link>
      <guid>https://dev.to/dumebii/building-a-robust-webhook-handler-in-nodejs-validation-queuing-and-retry-logic-2fb6</guid>
      <description>&lt;p&gt;Webhooks are everywhere. &lt;a href="https://stripe.com/docs/webhooks" rel="noopener noreferrer"&gt;Stripe&lt;/a&gt; fires one when a payment succeeds. &lt;a href="https://docs.github.com/en/webhooks" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; fires one when a PR is merged. &lt;a href="https://www.twilio.com/docs/usage/webhooks" rel="noopener noreferrer"&gt;Twilio&lt;/a&gt; fires one when an SMS lands. And when your handler is flaky — when it misses events, fails silently, or chokes under load — you lose data and trust.&lt;/p&gt;

&lt;p&gt;Most tutorials show you how to receive a webhook. Few show you how to handle it &lt;em&gt;properly&lt;/em&gt;. This article covers the full picture: signature validation, idempotency, async queuing, and retry logic with exponential backoff.&lt;/p&gt;

&lt;p&gt;We'll use Node.js and Express throughout, with no external queue infrastructure required. &lt;strong&gt;One important caveat up front:&lt;/strong&gt; the queuing approach in this article is designed for a single, long-lived Node.js process. If you're running on serverless functions (Lambda, Cloud Run) or horizontally scaled deployments with multiple instances, in-memory queues are not reliable — skip ahead to the When to Upgrade section for the right tool in those cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concern&lt;/th&gt;
&lt;th&gt;Solution&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Fake webhook senders&lt;/td&gt;
&lt;td&gt;HMAC-SHA256 signature verification with &lt;code&gt;timingSafeEqual&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Slow handlers timing out&lt;/td&gt;
&lt;td&gt;Acknowledge &lt;code&gt;200&lt;/code&gt; immediately, process async&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cascading failures&lt;/td&gt;
&lt;td&gt;In-process queue with concurrency limit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Transient errors&lt;/td&gt;
&lt;td&gt;Exponential backoff with jitter&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Duplicate events&lt;/td&gt;
&lt;td&gt;Idempotency keys via Set or Redis&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  What We're Building
&lt;/h2&gt;

&lt;p&gt;A webhook handler that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Validates&lt;/strong&gt; the request signature (so only legitimate senders get through)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Acknowledges fast&lt;/strong&gt; (returns &lt;code&gt;200&lt;/code&gt; immediately, does the work async)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Queues events&lt;/strong&gt; in-process so the work doesn't block the HTTP layer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retries failures&lt;/strong&gt; with exponential backoff&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Handles duplicates&lt;/strong&gt; with idempotency keys&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Step 1: Signature Validation
&lt;/h2&gt;

&lt;p&gt;Never trust an incoming webhook without verifying it came from who you think it came from. Most webhook providers (&lt;a href="https://stripe.com/docs/webhooks/signature-verification" rel="noopener noreferrer"&gt;Stripe&lt;/a&gt;, &lt;a href="https://docs.github.com/en/webhooks/using-webhooks/validating-webhook-deliveries" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;, &lt;a href="https://shopify.dev/docs/apps/build/webhooks/secure/validate-webhooks" rel="noopener noreferrer"&gt;Shopify&lt;/a&gt;) sign their payloads using &lt;a href="https://en.wikipedia.org/wiki/HMAC" rel="noopener noreferrer"&gt;HMAC-SHA256&lt;/a&gt; with a shared secret.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy99q3gf1iy1xx022bw5n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy99q3gf1iy1xx022bw5n.png" alt="Webhook pipeline flow — from incoming request through validation, queuing, handling, retry and dead letter" width="800" height="462"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;crypto&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;crypto&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;verifySignature&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;signature&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;secret&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;expected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;crypto&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createHmac&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;sha256&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;secret&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;utf8&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;hex&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Use timingSafeEqual to prevent timing attacks&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;expectedBuffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`sha256=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;utf8&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;signatureBuffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;signature&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;utf8&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;expectedBuffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="nx"&gt;signatureBuffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;crypto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;timingSafeEqual&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;expectedBuffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;signatureBuffer&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why &lt;a href="https://nodejs.org/api/crypto.html#cryptotimingsafeequala-b" rel="noopener noreferrer"&gt;&lt;code&gt;timingSafeEqual&lt;/code&gt;&lt;/a&gt;?&lt;/strong&gt; A simple &lt;code&gt;===&lt;/code&gt; check leaks timing information — an attacker can brute-force signatures by measuring how long the comparison takes. &lt;code&gt;timingSafeEqual&lt;/code&gt; always takes the same amount of time regardless of where the strings differ.&lt;/p&gt;

&lt;p&gt;Now wire it into &lt;a href="https://expressjs.com" rel="noopener noreferrer"&gt;Express&lt;/a&gt;. A critical detail: you need the &lt;strong&gt;raw body&lt;/strong&gt; for HMAC validation, not the parsed JSON. Express's &lt;a href="https://expressjs.com/en/api.html#express.json" rel="noopener noreferrer"&gt;&lt;code&gt;json()&lt;/code&gt; middleware&lt;/a&gt; strips the raw body by default — use &lt;a href="https://expressjs.com/en/api.html#express.raw" rel="noopener noreferrer"&gt;&lt;code&gt;express.raw()&lt;/code&gt;&lt;/a&gt; on the webhook route instead.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;express&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;express&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;express&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// Store raw body before parsing&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/webhook&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;express&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}));&lt;/span&gt;

&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/webhook&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;signature&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;x-hub-signature-256&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt; &lt;span class="c1"&gt;// GitHub format&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rawBody&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// Buffer, because of express.raw()&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nf"&gt;verifySignature&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;rawBody&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;signature&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;WEBHOOK_SECRET&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;401&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Invalid signature&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;rawBody&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Acknowledge immediately — do the work async&lt;/span&gt;
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;OK&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key discipline here: &lt;strong&gt;acknowledge before you process&lt;/strong&gt;. If your business logic takes 2 seconds and the sender times out after 1 second, the sender will mark the delivery as failed, retry it, and you'll receive duplicate events.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 2: An In-Process Job Queue
&lt;/h2&gt;

&lt;p&gt;You don't always need &lt;a href="https://redis.io" rel="noopener noreferrer"&gt;Redis&lt;/a&gt; or &lt;a href="https://bullmq.io" rel="noopener noreferrer"&gt;BullMQ&lt;/a&gt; for a job queue. For a &lt;strong&gt;single, persistent Node.js process&lt;/strong&gt;, an in-process queue with controlled concurrency is enough — and it's simpler to reason about.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Limitations to understand before using this pattern:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Jobs are lost on restart.&lt;/strong&gt; If your process crashes or is redeployed while events are queued, those jobs disappear silently. There is no persistence.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not shared across instances.&lt;/strong&gt; If you run multiple server instances (behind a load balancer, in a cluster, or in any horizontally scaled setup), each instance has its own queue. Events are not distributed or deduplicated across them.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If either of those constraints is a problem for your use case, go straight to a real queue like &lt;a href="https://bullmq.io" rel="noopener noreferrer"&gt;BullMQ&lt;/a&gt; or &lt;a href="https://aws.amazon.com/sqs/" rel="noopener noreferrer"&gt;AWS SQS&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;WebhookQueue&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;concurrency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;maxRetries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;queue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;running&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;concurrency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;concurrency&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;maxRetries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;maxRetries&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;attempts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;drain&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nf"&gt;drain&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;running&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;concurrency&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;job&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;shift&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
      &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;running&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;running&lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;drain&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// pick up the next job&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;handleEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;attempts&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;attempts&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;maxRetries&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;delay&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;backoff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;attempts&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Retrying event &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; in &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;delay&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;ms (attempt &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;attempts&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;)`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nf"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
          &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;drain&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="nx"&gt;delay&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Event &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; failed after &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;maxRetries&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; attempts`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="c1"&gt;// Send to dead-letter store, alert, etc.&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nf"&gt;backoff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;attempt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Exponential backoff with jitter&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;base&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;jitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;base&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;jitter&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;queue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;WebhookQueue&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;concurrency&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;maxRetries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
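&lt;p&gt;The first limitation — jobs lost on restart — can be softened for &lt;em&gt;orderly&lt;/em&gt; shutdowns by draining in-flight work before exiting. This is only a sketch under assumptions: &lt;code&gt;waitForIdle&lt;/code&gt; is a hypothetical helper that polls the queue's &lt;code&gt;running&lt;/code&gt; and &lt;code&gt;queue&lt;/code&gt; fields, and it does nothing for a hard crash.&lt;/p&gt;

```javascript
// Sketch: wait for in-flight and queued jobs to finish before shutdown.
// Softens (but does not eliminate) the lost-on-restart limitation —
// a hard crash still drops jobs. waitForIdle is a hypothetical helper
// that polls the WebhookQueue's counters until idle or timeout.
function waitForIdle(queue, timeoutMs = 10000) {
  return new Promise((resolve) => {
    const start = Date.now();
    const tick = () => {
      const idle = queue.running === 0 && queue.queue.length === 0;
      if (idle || Date.now() - start > timeoutMs) return resolve(idle);
      setTimeout(tick, 100); // poll again shortly
    };
    tick();
  });
}

// Usage with the WebhookQueue instance:
// process.on('SIGTERM', async () => {
//   const drained = await waitForIdle(queue);
//   if (!drained) console.warn('Exiting with jobs still pending');
//   process.exit(0);
// });
```

&lt;p&gt;Pair this with a deploy strategy that sends &lt;code&gt;SIGTERM&lt;/code&gt; and waits before killing the process, or the drain window never runs.&lt;/p&gt;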



&lt;p&gt;The &lt;code&gt;backoff&lt;/code&gt; method uses &lt;strong&gt;exponential backoff with jitter&lt;/strong&gt;. Without jitter, all retrying jobs fire at the same moment and create a &lt;a href="https://en.wikipedia.org/wiki/Thundering_herd_problem" rel="noopener noreferrer"&gt;thundering herd&lt;/a&gt;. Adding a random jitter spreads the load. See &lt;a href="https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/" rel="noopener noreferrer"&gt;AWS's writeup on backoff and jitter&lt;/a&gt; for a deeper look at why this matters at scale.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5s98kqkanagh7up76s08.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5s98kqkanagh7up76s08.png" alt="Exponential backoff with jitter — delay per retry attempt" width="800" height="409"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 3: The Event Handler
&lt;/h2&gt;

&lt;p&gt;This is where your actual business logic lives. Keep it focused — one function per event type.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;handleEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;switch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;payment.succeeded&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;handlePaymentSucceeded&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;user.created&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;handleUserCreated&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nl"&gt;default&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Unhandled event type: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;handlePaymentSucceeded&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// e.g., upgrade account, send receipt, update DB&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;orderId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;paid&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;emailService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sendReceipt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;customerEmail&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
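&lt;p&gt;As the number of event types grows, the &lt;code&gt;switch&lt;/code&gt; can be swapped for a lookup table, so adding a type becomes a one-line registration. A sketch with stub handlers — wire in the real functions from above:&lt;/p&gt;

```javascript
// Sketch: a handler map as an alternative to the switch statement.
// The handler bodies here are placeholders for the real logic.
const handlers = {
  'payment.succeeded': async (data) => { /* upgrade account, send receipt */ },
  'user.created': async (data) => { /* provision workspace, send welcome */ },
};

async function handleEvent(event) {
  const handler = handlers[event.type];
  if (!handler) {
    // Unknown types are logged and ignored, matching the switch's default case
    console.log(`Unhandled event type: ${event.type}`);
    return;
  }
  await handler(event.data);
}
```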






&lt;h2&gt;
  
  
  Step 4: Idempotency
&lt;/h2&gt;

&lt;p&gt;Webhook senders &lt;em&gt;will&lt;/em&gt; send duplicates. Network timeouts, retries on their end, and at-least-once delivery guarantees mean you'll see the same event ID more than once.&lt;/p&gt;

&lt;p&gt;Your handler needs to be &lt;strong&gt;idempotent&lt;/strong&gt; — processing the same event twice should have the same effect as processing it once.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;processedEvents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// Use Redis in production&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;handleEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;processedEvents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;has&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Skipping duplicate event: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;processedEvents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;switch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// ... your handlers&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In production, replace the in-memory &lt;code&gt;Set&lt;/code&gt; with a &lt;a href="https://redis.io" rel="noopener noreferrer"&gt;Redis&lt;/a&gt; &lt;code&gt;SET NX EX&lt;/code&gt; call via &lt;a href="https://github.com/redis/ioredis" rel="noopener noreferrer"&gt;ioredis&lt;/a&gt; so idempotency survives process restarts. Note that this claims the event &lt;em&gt;before&lt;/em&gt; the handler runs, so a crash mid-handler means the sender's retry will be skipped; if that's unacceptable for an event type, record completion after processing instead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ioredis&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;isAlreadyProcessed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;eventId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// SET key value NX EX seconds&lt;/span&gt;
  &lt;span class="c1"&gt;// NX = only set if not exists; EX = expire after 24h&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`event:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;eventId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;NX&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;EX&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;86400&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// null means the key already existed&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;handleEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;isAlreadyProcessed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="c1"&gt;// process...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
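&lt;p&gt;If Redis isn't wired up yet, an intermediate step is a TTL-capped in-memory store: unlike the plain &lt;code&gt;Set&lt;/code&gt;, it doesn't grow without bound, though it is still per-process and lost on restart. A sketch — the class name is illustrative:&lt;/p&gt;

```javascript
// Sketch: TTL-capped in-memory dedupe. Avoids the unbounded growth of a
// plain Set, but still per-process and not crash-safe.
class SeenEvents {
  constructor(ttlMs = 24 * 60 * 60 * 1000) {
    this.ttlMs = ttlMs;
    this.seen = new Map(); // eventId -> first-seen timestamp
  }

  // Returns true if eventId was already seen within the TTL window;
  // otherwise records it and returns false.
  check(eventId) {
    const now = Date.now();
    // Lazily evict expired entries. Map iterates in insertion order,
    // so we can stop at the first entry that hasn't expired yet.
    for (const [id, ts] of this.seen) {
      if (now - ts > this.ttlMs) this.seen.delete(id);
      else break;
    }
    if (this.seen.has(eventId)) return true;
    this.seen.set(eventId, now);
    return false;
  }
}
```

&lt;p&gt;Usage mirrors the &lt;code&gt;Set&lt;/code&gt; version: &lt;code&gt;if (seen.check(event.id)) return;&lt;/code&gt; at the top of the handler.&lt;/p&gt;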






&lt;h2&gt;
  
  
  Step 5: Putting It All Together
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;express&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;express&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;crypto&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;crypto&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;express&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/webhook&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;express&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}));&lt;/span&gt;

&lt;span class="c1"&gt;// --- Signature verification ---&lt;/span&gt;
&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;verifySignature&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;signature&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;secret&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;expected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;crypto&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createHmac&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;sha256&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;secret&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;hex&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;expectedBuffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`sha256=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sigBuffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;signature&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;expectedBuffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="nx"&gt;sigBuffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;crypto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;timingSafeEqual&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;expectedBuffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sigBuffer&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// --- Queue ---&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;WebhookQueue&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;concurrency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;maxRetries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;queue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;running&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;concurrency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;concurrency&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;maxRetries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;maxRetries&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;attempts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;drain&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nf"&gt;drain&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;running&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;concurrency&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;job&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;shift&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
      &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;running&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;running&lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;drain&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;handleEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;attempts&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;attempts&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;maxRetries&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;delay&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;attempts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="nf"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;drain&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="nx"&gt;delay&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Dead letter: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;queue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;WebhookQueue&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// --- Idempotency ---&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;processed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// --- Handler ---&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;handleEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;processed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;has&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;processed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Processing event: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; (&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;)`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="c1"&gt;// your business logic here&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// --- Route ---&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/webhook&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;x-hub-signature-256&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nf"&gt;verifySignature&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;WEBHOOK_SECRET&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;401&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Unauthorized&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;OK&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// acknowledge immediately&lt;/span&gt;
  &lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Webhook server listening on :3000&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  When to Upgrade to a Real Queue
&lt;/h2&gt;

&lt;p&gt;The in-process queue above is acceptable for &lt;strong&gt;a single persistent process with moderate throughput&lt;/strong&gt; — think a low-traffic internal tool or a side project where restarts are rare and you run one instance. You'll want to graduate to &lt;a href="https://bullmq.io" rel="noopener noreferrer"&gt;BullMQ&lt;/a&gt; (Redis-backed) or &lt;a href="https://aws.amazon.com/sqs/" rel="noopener noreferrer"&gt;AWS SQS&lt;/a&gt; when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're running &lt;strong&gt;multiple server instances&lt;/strong&gt; (in-process state won't be shared)&lt;/li&gt;
&lt;li&gt;You need &lt;strong&gt;event history&lt;/strong&gt; and visibility into failed jobs&lt;/li&gt;
&lt;li&gt;Your event volume exceeds a few hundred per minute consistently&lt;/li&gt;
&lt;li&gt;You need &lt;strong&gt;scheduled retries&lt;/strong&gt; that survive process restarts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The good news: the handler logic above (&lt;code&gt;handleEvent&lt;/code&gt;, idempotency, backoff) carries over directly. You're just swapping the queue substrate.&lt;/p&gt;
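&lt;p&gt;As a rough sketch of that swap (assuming the &lt;code&gt;bullmq&lt;/code&gt; package and a Redis instance on localhost; the option names follow BullMQ's documented API, but treat this as a starting point rather than drop-in code):&lt;/p&gt;

```javascript
// Sketch only: the same handleEvent from above, moved onto BullMQ.
// Assumes `npm install bullmq` and Redis running on localhost:6379.
const { Queue, Worker } = require('bullmq');

const connection = { host: 'localhost', port: 6379 };
const webhookQueue = new Queue('webhooks', { connection });

// Producer side: what queue.enqueue(event) becomes in the route handler.
async function enqueue(event) {
  await webhookQueue.add(event.type, event, {
    jobId: event.id,   // duplicate deliveries with the same id are skipped while queued
    attempts: 5,       // replaces maxRetries
    backoff: { type: 'exponential', delay: 1000 }, // replaces the manual setTimeout
  });
}

// Consumer side: concurrency moves into worker options; retries are automatic.
const worker = new Worker(
  'webhooks',
  async function (job) { await handleEvent(job.data); },
  { connection, concurrency: 3 }
);
```

&lt;p&gt;Retries, backoff, and concurrency become queue configuration instead of hand-rolled logic, and the &lt;code&gt;jobId&lt;/code&gt; option adds a layer of de-duplication at the queue level on top of your idempotency check.&lt;/p&gt;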

&lt;p&gt;Webhooks are one of those things that look simple until they aren't. Getting these five concerns right means you can receive events reliably at scale — without losing data, without duplicating side effects, and without taking down your server under a burst of retries.&lt;/p&gt;

&lt;p&gt;If you're building something that relies on real-time event delivery, these patterns are worth getting right from the start.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What's your webhook setup look like? Drop a comment — especially if you've found a gotcha I haven't covered.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>node</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Your Social Media Content Marketing is Failing. Here's Why</title>
      <dc:creator>Dumebi Okolo</dc:creator>
      <pubDate>Wed, 01 Apr 2026 11:30:00 +0000</pubDate>
      <link>https://dev.to/dumebii/your-launch-post-got-4-likes-your-product-deserved-better-hmb</link>
      <guid>https://dev.to/dumebii/your-launch-post-got-4-likes-your-product-deserved-better-hmb</guid>
<description>&lt;p&gt;I'll open this article with my own experience, retold. &lt;/p&gt;

&lt;p&gt;You've spent six weeks building something real. You merged the final PR at 11pm on a Thursday. You pushed to production. You watched the deployment logs scroll clean. And then you did what every builder does: you opened Twitter, typed something like &lt;em&gt;"Just shipped [thing]. Super excited to share this with everyone 🚀"&lt;/em&gt;, hit post, and went to bed.&lt;/p&gt;

&lt;p&gt;You woke up to four likes. Two of them were your teammates.&lt;/p&gt;

&lt;p&gt;The product was solid. The problem it solved was real. But the post? The post was invisible.&lt;/p&gt;

&lt;p&gt;Here's the thing nobody tells you when you're deep in the build: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;shipping is only half the work.&lt;/strong&gt; &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The other half is making people care. And most technical founders, developers, and DevRel professionals are running that half on empty.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F08fshagxk47lhwnzn2rx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F08fshagxk47lhwnzn2rx.png" alt="chat vs ozigi" width="800" height="419"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Gap Between Building and Being Seen
&lt;/h2&gt;

&lt;p&gt;There's a particular kind of frustration that lives in technical communities: the frustration of people who are genuinely doing interesting things and can't seem to get traction on any of it.&lt;/p&gt;

&lt;p&gt;It's not imposter syndrome. It's just a distribution problem.&lt;/p&gt;

&lt;p&gt;The builders who get seen aren't always the ones building better things, sadly. They're just the ones better at translating what they build into content that lands. Content that makes someone stop mid-scroll and think "wait, this is exactly my problem," or "this is a pain point I have."&lt;/p&gt;

&lt;p&gt;That translation layer is what most technical people skip, rush, or outsource badly.&lt;/p&gt;

&lt;p&gt;A &lt;a href="https://stateofdeveloperrelations.com" rel="noopener noreferrer"&gt;2024 State of DevRel report&lt;/a&gt; found that content creation consistently ranks as one of the top three time drains for developer advocates. This is not because they don't know what to write, but because the gap between "having something worth saying" and "saying it in a way that resonates" is a lot wider than most people expect.&lt;/p&gt;

&lt;p&gt;For founders, it's worse. You're building, selling, hiring, and doing customer calls, and somewhere in that schedule, you're supposed to be producing thought leadership content that grows your personal brand and drives top-of-funnel awareness. It rarely happens at the level it should.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Your Regular AI Doesn't Work
&lt;/h2&gt;

&lt;p&gt;The obvious answer is AI. You paste your notes into ChatGPT, ask it to write a LinkedIn post, and get something back that technically covers the topic. You post it, and nothing happens. No traction.&lt;/p&gt;

&lt;p&gt;It wasn't that the output was wrong. It was just generic. And generic content in technical communities doesn't just underperform; it actively damages credibility.&lt;/p&gt;

&lt;p&gt;Developers, content folks, and DevRel professionals are some of the most discerning readers on the internet. They can spot templated, buzzword-heavy content in seconds. The moment a post opens with &lt;em&gt;"In today's fast-paced digital landscape"&lt;/em&gt; or promises to &lt;em&gt;"delve into the nuances"&lt;/em&gt; of anything, it's already dead on arrival.&lt;/p&gt;

&lt;p&gt;The problem isn't that AI tools can't write. It's just that most of them default to the statistical mean of their training data, which is saturated with corporate documentation, SEO copy, and marketing fluff. The output sounds like everybody. It sounds like nobody in particular.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkolocxej7ycug3jx36vq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkolocxej7ycug3jx36vq.png" alt="statiscal mean" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What you need isn't just generated content. You need generated content that sounds like you: content written with your specific technical depth, your actual voice, your real opinion.&lt;/p&gt;

&lt;p&gt;Tools like &lt;a href="https://ozigi.app" rel="noopener noreferrer"&gt;Ozigi&lt;/a&gt; approach this differently. Instead of asking the AI to "write professionally" (a soft suggestion it ignores), Ozigi enforces a hard blocklist of AI-default vocabulary at the API level (words like &lt;em&gt;delve, robust, seamlessly, tapestry&lt;/em&gt;), forcing the model to construct sentences from your actual content rather than padding with filler. The output reads less like a press release and more like a Slack message from someone who actually built the thing. You can read exactly how that system works in the &lt;a href="https://ozigi.app/docs/the-banned-lexicon" rel="noopener noreferrer"&gt;Banned Lexicon deep dive&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But the tool is only part of the answer. The bigger problem is structural.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Reason Your Content Isn't Working
&lt;/h2&gt;

&lt;p&gt;Most builders (myself included, until recently) treat content like a release: something that happens once, at the end, when the thing is done.&lt;/p&gt;

&lt;p&gt;That mental model is the root cause of most distribution failure.&lt;/p&gt;

&lt;p&gt;Content that builds an audience doesn't work like a product launch. It works like compounding interest. A single post doesn't build a following; a consistent body of work does. A steady posting habit, over time, signals to your audience that you're a reliable source of something worth reading.&lt;/p&gt;

&lt;p&gt;The builders who seem to "go viral" on X or LinkedIn aren't getting lucky. They've usually been shipping content consistently for long enough that when one post breaks through, there's a body of work behind it that converts interest into followers, followers into readers, and readers into users.&lt;/p&gt;

&lt;p&gt;So the real question isn't &lt;em&gt;"how do I write a better launch post?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It's &lt;em&gt;"how do I build a content system I can actually sustain?"&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What a Sustainable Technical Content System Looks Like
&lt;/h2&gt;

&lt;p&gt;Here's the framework. It's not complicated, but it requires treating content like an engineering problem — which, if you're reading this, is probably how you think best anyway.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Raw material is everywhere. Stop waiting for inspiration.
&lt;/h3&gt;

&lt;p&gt;Every week you're producing more content-worthy material than you realize:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PRs you merged and the decisions behind them&lt;/li&gt;
&lt;li&gt;A bug that took you three hours to track down&lt;/li&gt;
&lt;li&gt;A meeting where a customer said something that reframed how you think about the product&lt;/li&gt;
&lt;li&gt;A library you tried that didn't work the way the docs said it would&lt;/li&gt;
&lt;li&gt;An architectural decision you almost made and didn't&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of this requires you to sit down and think of something to write about. It requires you to notice that what's already happening in your work is interesting to other people.&lt;/p&gt;

&lt;p&gt;The shift is from treating content creation as a separate creative task to treating it as a documentation habit. You're already doing the work. You just need a system to capture it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ozigi.app" rel="noopener noreferrer"&gt;Ozigi&lt;/a&gt; is built around this principle. You drop in a URL, a block of raw notes, even a PDF, an audio, transcript, basically any piece of information you have at your disposal, and the engine extracts the narrative structure without you needing to summarize or clean it first. That's what the &lt;a href="https://ozigi.app/docs/multimodal-pipeline" rel="noopener noreferrer"&gt;multimodal ingestion pipeline&lt;/a&gt; is built to do: collapse the friction between "I have something worth saying" and "I have a draft worth editing" down to seconds.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Platform matters more than most people think.
&lt;/h3&gt;

&lt;p&gt;A LinkedIn post and an X thread about the same topic are not the same content. They're different formats, different reader expectations, different hooks, different lengths.&lt;/p&gt;

&lt;p&gt;LinkedIn readers expect context and narrative. They'll read three paragraphs before deciding if they care. X readers decide in one sentence, often the first one. Discord announcements need to be skimmable. Newsletters can go long, but they need a reason to exist beyond "here's what I built."&lt;/p&gt;

&lt;p&gt;Most people write one thing and paste it across platforms unchanged. The format stays the same but engagement falls because the content doesn't match where it's landing.&lt;/p&gt;

&lt;p&gt;A proper content system produces platform-native output from the same source material. Your one insight (the rate-limiting decision, the architecture tradeoff, the customer-discovery finding) becomes a thread on X, a narrative on LinkedIn, a community update in Discord or Slack, and a newsletter deep-dive. Each piece is formatted for the expectations of its audience, not copy-pasted from the others.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Your voice is the most important part of your content.
&lt;/h3&gt;

&lt;p&gt;Anyone can write about Next.js caching. Anyone can explain what a webhook is. But only you can explain those things with your specific perspective, your specific context, the way you'd describe it to a colleague over lunch.&lt;/p&gt;

&lt;p&gt;That voice — built over hundreds of posts — is what makes people follow &lt;em&gt;you&lt;/em&gt; and not just &lt;em&gt;the topic.&lt;/em&gt; It's what turns a reader into someone who shows up every time you post because they trust it'll be worth their time.&lt;/p&gt;

&lt;p&gt;That voice is also what AI strips out by default. The generic output problem isn't just an aesthetics issue. Every time you publish something that sounds like it came from a template, you're forfeiting the one thing that can't be replicated: the specific way you think about something.&lt;/p&gt;

&lt;p&gt;This is why &lt;a href="https://ozigi.app/docs/system-personas" rel="noopener noreferrer"&gt;Ozigi's System Personas&lt;/a&gt; go beyond setting a "tone." Instead of prompting "write professionally," you define a character: your technical depth, your sentence rhythm, the phrases you actually use, the things you'd never say. That brief gets applied to every piece of content the engine generates, which means every draft is already shaped like you before you touch the edit button.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. The 10% rule: the tool gets you 90, you own the rest.
&lt;/h3&gt;

&lt;p&gt;The honest truth about AI-assisted content is that any decent engine can get you 90% of the way there. The last 10% is yours, and it's the part that actually matters.&lt;/p&gt;

&lt;p&gt;That 90% is structure, platform formatting, tone calibration, cutting the filler. Generative AI can handle that by default.&lt;/p&gt;

&lt;p&gt;The 10% is "the specific number from your metrics dashboard" or that inside joke the AI doesn't know about, the anecdote from your last customer call, or the offhand observation that only makes sense if you know your history with this problem. The exact phrasing you'd use if you were explaining this to a friend at 11pm.&lt;/p&gt;

&lt;p&gt;That 10% is what makes content trustworthy. It's what makes someone share it instead of just scrolling past it. And it's irreplaceable because it comes from actually having done the thing.&lt;/p&gt;

&lt;p&gt;The mistake most people make with AI writing tools is expecting the full 100%. When the output is 90% of the way there, they feel cheated. &lt;/p&gt;

&lt;p&gt;The better mental model is this: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;you're not outsourcing the writing. You're outsourcing the blank page.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Ozigi's editing layer is built around exactly this split. Every campaign lands in a staging area — nothing goes live until you've reviewed it. &lt;a href="https://ozigi.app/docs/human-in-the-loop" rel="noopener noreferrer"&gt;The human-in-the-loop architecture&lt;/a&gt; keeps generation and publishing strictly separate, so you're always the last step before your content reaches your audience.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs2c8ajezdod67inkwfng.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs2c8ajezdod67inkwfng.png" alt="ozigi's edit area" width="800" height="507"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Compounding Effect Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;Here's what happens when you run a consistent content system for six months:&lt;/p&gt;

&lt;p&gt;Your posts start referencing each other. Your audience starts anticipating what you'll say next. When you ship something new, you have enough readers that the launch post gets signal on day one, which means it gets distributed further, which means more people see it.&lt;/p&gt;

&lt;p&gt;So getting four likes on your launch post isn't a content-quality problem. It's a consistency problem: you posted into a vacuum because you hadn't been posting consistently enough to have an audience ready when it mattered.&lt;/p&gt;

&lt;p&gt;The builders who seem to "have an audience already" when they ship something new didn't get lucky. &lt;br&gt;
I know a founder on X who ran a 100-day posting challenge before his product launch. He hit $500 in sales in the first week. He already had an audience.&lt;br&gt;
He paid the consistency debt early. He posted about the messy in-progress version, the failed experiments, the decisions he made and unmade. By the time he shipped, the audience was already there.&lt;/p&gt;

&lt;p&gt;Content marketing for technical audiences is a long game. The best time to start was six months ago. The second-best time is right now with a system that makes it sustainable enough to actually keep going.&lt;/p&gt;




&lt;h2&gt;
  
  
  Start Small. Ship Consistently.
&lt;/h2&gt;

&lt;p&gt;You don't need to produce ten pieces of content a week. You don't need a content calendar with color-coded categories and quarterly themes.&lt;/p&gt;

&lt;p&gt;You need one piece of content per week that comes from something you actually did, written in a voice that sounds like you, distributed to the platforms where your audience actually is.&lt;/p&gt;

&lt;p&gt;That's the whole system.&lt;/p&gt;

&lt;p&gt;The tools exist to make it easier. The only thing without a shortcut is starting.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;If you're a technical founder, developer, or DevRel professional trying to build a consistent content presence without it eating your calendar — &lt;a href="https://ozigi.app" rel="noopener noreferrer"&gt;Ozigi&lt;/a&gt; is worth trying.&lt;/strong&gt; The free tier gives you 5 campaigns a month. Drop in your raw notes from last week, see what comes out, and decide from there. Get one week of Pro free when you sign up today!&lt;/p&gt;

&lt;p&gt;→ &lt;a href="https://ozigi.app" rel="noopener noreferrer"&gt;Try Ozigi free&lt;/a&gt; · &lt;a href="https://ozigi.app/docs" rel="noopener noreferrer"&gt;Read the platform docs&lt;/a&gt; · &lt;a href="https://ozigi.app/docs/deep-dives" rel="noopener noreferrer"&gt;See the architecture deep dives&lt;/a&gt; · &lt;a href="https://github.com/Ozigi-app/OziGi" rel="noopener noreferrer"&gt;Star on GitHub&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Have a content system that's actually working for you? Or a launch post that flopped spectacularly and taught you something? Drop it in the comments — genuinely curious what patterns people are seeing.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devrel</category>
      <category>contentwriting</category>
      <category>ai</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Gemini 2.5 Flash vs Claude 3.7 Sonnet: 4 Production Constraints That Made the Decision for Me</title>
      <dc:creator>Dumebi Okolo</dc:creator>
      <pubDate>Tue, 10 Mar 2026 13:00:34 +0000</pubDate>
      <link>https://dev.to/dumebii/gemini-25-flash-vs-claude-37-sonnet-4-production-constraints-that-made-the-decision-for-me-bib</link>
      <guid>https://dev.to/dumebii/gemini-25-flash-vs-claude-37-sonnet-4-production-constraints-that-made-the-decision-for-me-bib</guid>
<description>&lt;p&gt;An evaluation of the Gemini 2.5 Flash and Claude 3.7 Sonnet models for an agentic engine.&lt;/p&gt;

&lt;p&gt;I had a simple rule when choosing an LLM for &lt;a href="https://ozigi.app" rel="noopener noreferrer"&gt;Ozigi&lt;/a&gt;: don't pick based on benchmark leaderboards. While gathering feedback after my v2 launch, a user suggested I use the Claude models, arguing they were better for content generation than Gemini. The suggestion sounded tempting, but I had to pick a model based on the four constraints my production pipeline couldn't negotiate around.&lt;/p&gt;

&lt;p&gt;Most "Gemini vs Claude" comparisons evaluate general-purpose capabilities like coding, reasoning, and creative writing. That's useful if you're building a general-purpose product. &lt;br&gt;
I wasn't. &lt;br&gt;
Ozigi is a content engine. You feed it a URL, a PDF, or raw notes. It returns a structured 3-day social media campaign as a JSON payload that the frontend maps directly into UI cards.&lt;/p&gt;

&lt;p&gt;That specificity made the evaluation easier than I expected: two models, four constraints, one clear winner on three of them.&lt;/p&gt;

&lt;p&gt;This is the third post in the &lt;a href="https://dev.to/dumebii/series/36170"&gt;Ozigi Changelog Series&lt;/a&gt;. If you want the backstory on why Ozigi exists, start with &lt;a href="https://dev.to/dumebii/i-vibe-coded-an-internal-tool-that-slashed-my-content-workflow-by-4-hours-310f"&gt;how I vibe-coded the internal tool&lt;/a&gt; that became it, and the &lt;a href="https://dev.to/dumebii/ozigi-v2-changelog-building-a-modular-agentic-content-engine-with-nextjs-supabase-and-playwright-59mo"&gt;v2 changelog&lt;/a&gt; that introduced the modular architecture this decision was built on.&lt;/p&gt;

&lt;p&gt;Here's the full Architecture Decision Record.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Setup: What the Pipeline Actually Does
&lt;/h2&gt;

&lt;p&gt;The core API route in Ozigi does this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Accepts a &lt;code&gt;multipart/form-data&lt;/code&gt; payload containing a URL, raw text, and/or a file (PDF or image)&lt;/li&gt;
&lt;li&gt;Constructs a prompt with strict editorial constraints injected at the system level&lt;/li&gt;
&lt;li&gt;Sends everything to the LLM via the &lt;a href="https://cloud.google.com/vertex-ai/docs/start/client-libraries" rel="noopener noreferrer"&gt;Vertex AI Node.js SDK&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Returns the raw text response directly to the client&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The frontend then does this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;parsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;responseText&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nf"&gt;setCampaign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;campaign&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No middleware. No schema validation. No error recovery in the happy path. Raw parse, straight into React state.&lt;/p&gt;

&lt;p&gt;That single line is why model selection mattered.&lt;/p&gt;




&lt;h2&gt;
  
  
  Constraint 1: Comparing Gemini vs Claude Models for JSON Output Stability
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The requirement:&lt;/strong&gt; The model must return a valid JSON object — every time, without wrapping it in markdown code fences, without adding a conversational preamble, and without hallucinating a trailing comma that breaks &lt;code&gt;JSON.parse()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The target schema looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"campaign"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"day"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"linkedin"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"discord"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"day"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"linkedin"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"discord"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"day"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"linkedin"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"discord"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's nine posts across three platforms over three days, with every field required. &lt;br&gt;
The UI renders each field into a separate card with edit, copy, and publish actions. A missing key doesn't throw a visible error — it silently renders an empty card.&lt;br&gt;
This comparison is specifically between Gemini with &lt;code&gt;responseSchema&lt;/code&gt; enforcement and Claude with prompted JSON, not between each model's structural output ceiling. Claude's tool use with &lt;code&gt;tool_choice: {type: "tool"}&lt;/code&gt; enforces schema at the decoding layer and can reach equivalent reliability. The relevant constraint here was which enforcement mechanism was available and practical within my existing stack. More on that below.&lt;br&gt;
I ran 500 automated test generations against both models targeting this schema, measuring the percentage of responses that &lt;code&gt;JSON.parse()&lt;/code&gt; accepted without exceptions.&lt;/p&gt;
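&lt;p&gt;The per-response check in that harness reduces to a strict parse-and-validate step. Here's a minimal sketch (the function name is illustrative, not Ozigi's code): a response counts as adherent only if &lt;code&gt;JSON.parse()&lt;/code&gt; succeeds and every required key is present on all three days.&lt;/p&gt;

```typescript
// Sketch of the per-generation adherence check from the 500-run harness.
// A response passes only if it parses cleanly and matches the campaign
// schema exactly. (Illustrative; the real harness lives outside the repo.)
type CampaignDay = { day: number; x: string; linkedin: string; discord: string };

function isFormatAdherent(raw: string): boolean {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw); // markdown fences or preambles fail right here
  } catch {
    return false;
  }
  const campaign = (parsed as { campaign?: unknown }).campaign;
  if (!Array.isArray(campaign)) return false;
  if (campaign.length !== 3) return false;
  // Every day must carry all four required fields with the right types.
  return campaign.every((d: CampaignDay, i: number) => {
    if (d?.day !== i + 1) return false;
    if (typeof d.x !== "string") return false;
    if (typeof d.linkedin !== "string") return false;
    return typeof d.discord === "string";
  });
}
```

&lt;p&gt;A response wrapped in a markdown code fence fails at the very first step, which is exactly the failure mode behind the prompted-JSON numbers below.&lt;/p&gt;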

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Format Adherence Rate&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 2.5 Flash&lt;/td&gt;
&lt;td&gt;99.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude 3.7 Sonnet (prompted)&lt;/td&gt;
&lt;td&gt;~88.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5sbgjoan2io2r0usee0f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5sbgjoan2io2r0usee0f.png" alt="Bar chart: Gemini 2.5 Flash 99.9% vs Claude 3.7 Sonnet 88.5% JSON parse success rate across 500 test generations." width="800" height="419"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The 11.5% gap maps directly to broken UI states for real users. That was not acceptable to me for a core feature.&lt;/p&gt;

&lt;p&gt;Using Gemini's &lt;code&gt;responseSchema&lt;/code&gt; closes this entirely. According to &lt;a href="https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/control-generated-output" rel="noopener noreferrer"&gt;Google's controlled generation documentation&lt;/a&gt;, the feature physically prevents the model from returning output that doesn't conform to your schema. It's not prompt-level guidance; it's enforced at the decoding layer. Here's what the production implementation looks like for Ozigi: the schema is defined once at the top of the route and attached directly to the model config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;distributionSchema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;OBJECT&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;campaign&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ARRAY&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;A list of 3 daily social media posts.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;OBJECT&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;day&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;      &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;INTEGER&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Day number (1, 2, or 3)&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
          &lt;span class="na"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;        &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;STRING&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;  &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content for X/Twitter.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
          &lt;span class="na"&gt;linkedin&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;STRING&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;  &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content for LinkedIn.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
          &lt;span class="na"&gt;discord&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;STRING&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;  &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content for Discord.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;day&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;x&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;linkedin&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;discord&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;campaign&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;vertex_ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getGenerativeModel&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gemini-2.5-flash&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;generationConfig&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;responseMimeType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;responseSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;distributionSchema&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;response.text()&lt;/code&gt; is now structurally guaranteed to be valid JSON. &lt;code&gt;JSON.parse()&lt;/code&gt; cannot fail on a missing field, trailing comma, or conversational preamble — the model is physically prevented from producing them. &lt;br&gt;
Claude's tool use and function calling can achieve similar guarantees, but it requires a meaningfully different integration architecture. With the Vertex SDK, this is one config block.&lt;/p&gt;
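&lt;p&gt;For completeness, here is roughly what that alternative looks like. This is a sketch of an Anthropic Messages API request body with a forced tool call; the tool name and model alias are illustrative, and the &lt;code&gt;input_schema&lt;/code&gt; mirrors &lt;code&gt;distributionSchema&lt;/code&gt; in standard JSON Schema:&lt;/p&gt;

```typescript
// Sketch only: what schema enforcement looks like on the Anthropic side.
// tool_choice: { type: "tool", ... } forces Claude to emit arguments that
// conform to input_schema, enforced at the decoding layer like responseSchema.
// Names ("emit_campaign", the model alias) are illustrative.
const claudeRequest = {
  model: "claude-3-7-sonnet-latest",
  max_tokens: 4096,
  tools: [
    {
      name: "emit_campaign",
      description: "Return a 3-day, 3-platform social campaign.",
      input_schema: {
        type: "object",
        properties: {
          campaign: {
            type: "array",
            items: {
              type: "object",
              properties: {
                day: { type: "integer" },
                x: { type: "string" },
                linkedin: { type: "string" },
                discord: { type: "string" },
              },
              required: ["day", "x", "linkedin", "discord"],
            },
          },
        },
        required: ["campaign"],
      },
    },
  ],
  tool_choice: { type: "tool", name: "emit_campaign" }, // forces the tool call
  messages: [{ role: "user", content: "..." }],
};
// This object would be passed to anthropic.messages.create(claudeRequest),
// and the structured output read back from the tool_use block in the response.
```

&lt;p&gt;Equivalent reliability is reachable this way, but it means a different provider, SDK, and response-parsing path than the single config block Vertex needed.&lt;/p&gt;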

&lt;p&gt;&lt;strong&gt;Winner: Gemini.&lt;/strong&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Constraint 2: Comparing Gemini vs Claude Latency on a Live Public Sandbox
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The requirement:&lt;/strong&gt; Ozigi has a free, unauthenticated sandbox. Anyone can generate a full 3-day campaign without signing up.&lt;/p&gt;

&lt;p&gt;That changes the economics of model selection completely. A paying user on a premium plan will tolerate a 20-second wait if the output quality justifies it. An anonymous user who found the product via my wacky marketing efforts will not. They'll close the tab at 10 seconds and probably not come back, sadly.&lt;/p&gt;

&lt;p&gt;I benchmarked both models against a standard 10,000-token input payload via Vercel serverless functions (my production environment):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Avg Response Time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 2.5 Flash&lt;/td&gt;
&lt;td&gt;~6.2s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude 3.7 Sonnet&lt;/td&gt;
&lt;td&gt;~21.5s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd88o8by58f78rzbzxqdl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd88o8by58f78rzbzxqdl.png" alt="Bar chart: Gemini 2.5 Flash 6.2s vs Claude 3.7 Sonnet 21.5s average response latency from Vercel serverless, with 10s tab-close threshold marked" width="800" height="522"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Methodology: N=100 requests per model, measured end-to-end from Vercel function invocation to full response. Results are environment-dependent and intended for directional comparison, not as absolute benchmarks.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The gap holds across payload sizes. Gemini Flash consistently stays within the 10-15 second range or below; Claude 3.7 Sonnet consistently exceeds 20 seconds on the same inputs, in the same environment.&lt;/p&gt;

&lt;p&gt;This gap would narrow significantly with streaming: getting first tokens in front of the user within 2-3 seconds. Streaming changes the perceived wait time for a user entirely. This is, however, a v4 architecture item that is being worked on. For a non-streaming pipeline with a public sandbox, the 3.5x latency difference is a product decision, not just an engineering one.&lt;/p&gt;
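&lt;p&gt;The streaming shape itself is small. A minimal sketch, assuming the SDK's chunked response is exposed as an async iterable of text: wrap it in a web &lt;code&gt;ReadableStream&lt;/code&gt; so the first tokens reach the browser while the rest is still generating.&lt;/p&gt;

```typescript
// Sketch of the planned streaming path (not shipped yet; a v4 item).
// Wraps any async iterable of text chunks into a ReadableStream that a
// Next.js route handler can return directly as the Response body.
function toReadableStream(chunks: AsyncIterable<string>): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream({
    async start(controller) {
      for await (const text of chunks) {
        controller.enqueue(encoder.encode(text)); // flush each chunk immediately
      }
      controller.close();
    },
  });
}
// In the route: return new Response(toReadableStream(textChunks));
```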

&lt;p&gt;&lt;strong&gt;Winner: Gemini Flash&lt;/strong&gt; — and it's not close for non-streaming public sandboxes.&lt;/p&gt;


&lt;h2&gt;
  
  
  Constraint 3: Comparing Gemini vs Claude on Native Multimodal Ingestion
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The requirement:&lt;/strong&gt; Users can upload PDFs and images directly as context. The pipeline needs to process them without an external preprocessing step.&lt;/p&gt;

&lt;p&gt;With Gemini via the &lt;a href="https://cloud.google.com/vertex-ai/docs/start/client-libraries" rel="noopener noreferrer"&gt;Vertex AI Node.js SDK&lt;/a&gt;, the entire PDF pipeline is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// /app/api/generate/route.ts&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;file&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;size&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;arrayBuffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;arrayBuffer&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;base64Data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;arrayBuffer&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;base64&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="nx"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;inlineData&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;base64Data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;mimeType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// "application/pdf", "image/jpeg", etc.&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generateContent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;parts&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can see that the SDK handles the buffer natively. Gemini reads the PDF directly as part of the multipart request alongside the text prompt — no OCR step, no preprocessing, no separate service call. &lt;a href="https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/overview" rel="noopener noreferrer"&gt;Google's multimodal documentation&lt;/a&gt; confirms that Gemini was designed from the ground up to handle PDF and image buffers natively via &lt;code&gt;inlineData&lt;/code&gt;.&lt;/p&gt;




&lt;p&gt;An earlier version of this article claimed that Claude required an external OCR step for PDF ingestion. That was wrong. Claude's Messages API does support native base64 PDF ingestion directly via a document content block — no OCR preprocessing, no external service. The pattern is structurally similar to Vertex AI's inlineData, just different field names.&lt;br&gt;
The real constraint here was ecosystem, not capability. I evaluated Claude 3.7 Sonnet as available in the Google Model Garden within my existing Vertex AI setup. Switching to Claude's native PDF ingestion would have meant moving to the Anthropic Messages API entirely — a different provider, different SDK, different billing. The Vertex AI path was simpler for the stack I was already running.&lt;br&gt;
&lt;strong&gt;Winner: Gemini, for this stack.&lt;/strong&gt; Both models support native multimodal ingestion without external OCR. The advantage here was ecosystem fit, not a fundamental capability difference.&lt;/p&gt;
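&lt;p&gt;For readers weighing the two paths, here is what Claude's native PDF ingestion looks like as a request body. This is a sketch of the Anthropic document content block format, shown for comparison only; the model alias, placeholder bytes, and prompt text are illustrative.&lt;/p&gt;

```typescript
// Sketch: Claude's native base64 PDF path, structurally parallel to the
// Vertex inlineData block above, just with different field names.
// (Comparison only; Ozigi ships the Vertex AI path.)
const pdfBase64 = Buffer.from("%PDF-1.4 (placeholder bytes)").toString("base64");

const claudePdfRequest = {
  model: "claude-3-7-sonnet-latest",
  max_tokens: 4096,
  messages: [
    {
      role: "user",
      content: [
        {
          type: "document",
          source: { type: "base64", media_type: "application/pdf", data: pdfBase64 },
        },
        { type: "text", text: "Turn this PDF into a 3-day campaign." },
      ],
    },
  ],
};
// Passed to anthropic.messages.create(claudePdfRequest) on the Anthropic SDK.
```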


&lt;h2&gt;
  
  
  Constraint 4: Comparing Google Gemini vs Claude on Tone Engineering
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The requirement:&lt;/strong&gt; Generated social media posts must sound like a human wrote them. Specifically, they must pass AI content detection and avoid the predictable cadence patterns that make AI-generated copy immediately identifiable.&lt;/p&gt;

&lt;p&gt;This is the constraint where Claude wins cleanly on base performance. &lt;br&gt;
Our internal blind A/B evaluations of 50 technical posts (scored on pragmatic sentence structure and absence of AI terminology) gave Claude 3.7 Sonnet a "human cadence quality score" of 9.5/10. Gemini Flash's base score was 5.5/10.&lt;/p&gt;

&lt;p&gt;That's a significant gap. And it's for the feature that is Ozigi's core value proposition.&lt;/p&gt;
&lt;h3&gt;
  
  
  Why use Gemini for Tone Engineering?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Because the gap is engineerable.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We built the Banned Lexicon — a programmatic constraint injected at the system prompt level that explicitly penalizes the vocabulary patterns that make AI copy detectable. You can read the full implementation in the &lt;a href="https://ozigi.app/docs" rel="noopener noreferrer"&gt;Ozigi documentation&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;THE BANNED LEXICON: You are strictly forbidden from using the 
following words or their variations: delve, testament, tapestry, 
crucial, vital, landscape, realm, unlock, supercharge, revolutionize, 
paradigm, seamlessly, navigate, robust, cutting-edge, game-changer.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Combined with explicit cadence engineering:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;BURSTINESS (CADENCE): Write with high burstiness. Do not use 
perfectly balanced, medium-length sentences. Mix extremely short, 
punchy sentences (2-4 words) with longer, detailed explanations.

PERPLEXITY: Avoid predictable adjectives. Use strong, active verbs 
and concrete nouns. Talk like a pragmatic subject matter expert 
explaining a concept to people, not a marketer selling a product.

FORMATTING RESTRAINT: You are limited to a MAXIMUM of 1 emoji per 
post. Use a maximum of 2 highly relevant hashtags per post.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
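&lt;p&gt;Both blocks are injected at the system level rather than appended to the user prompt. A minimal sketch of the wiring, assuming the Vertex AI SDK's &lt;code&gt;systemInstruction&lt;/code&gt; field; the constant names are illustrative and the strings abbreviate the blocks above.&lt;/p&gt;

```typescript
// Sketch: how the tone constraints attach to the model config.
// Constant names are illustrative; the text mirrors the prompt blocks above.
const BANNED_LEXICON =
  "THE BANNED LEXICON: You are strictly forbidden from using the " +
  "following words or their variations: delve, testament, tapestry, ...";
const CADENCE_RULES =
  "BURSTINESS (CADENCE): Write with high burstiness. ... " +
  "PERPLEXITY: Avoid predictable adjectives. ... " +
  "FORMATTING RESTRAINT: Maximum of 1 emoji and 2 hashtags per post.";

const systemInstruction = {
  role: "system",
  parts: [{ text: [BANNED_LEXICON, CADENCE_RULES].join("\n\n") }],
};
// Attached alongside generationConfig in vertex_ai.getGenerativeModel({...}),
// so every generation carries the constraints without inflating the user prompt.
```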



&lt;p&gt;With these constraints active, Gemini's human cadence score jumps from 5.5 to 9.2 — within acceptable range of Claude's base 9.5.&lt;/p&gt;

&lt;p&gt;The key insight: Claude's tone advantage is a &lt;em&gt;default&lt;/em&gt; advantage, not an &lt;em&gt;absolute&lt;/em&gt; one. Gemini's outputs are more malleable under prompt constraints. For a use case where tone control is the entire product, that malleability is worth more than a higher baseline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Winner: Gemini + engineering constraints.&lt;/strong&gt; The tone gap is closeable. The latency and JSON stability gaps on the other constraints are not.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F15wn2uuvacy1ws7bldib.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F15wn2uuvacy1ws7bldib.png" alt="Horizontal bar chart: Gemini base 5.5/10 vs Gemini with Banned Lexicon 9.2/10 vs Claude base 9.5/10 human cadence score." width="800" height="571"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Gemini vs Claude Models: The Cost Reality
&lt;/h2&gt;

&lt;p&gt;While Ozigi is a public sandbox, every anonymous page load that triggers a generation is a billable API call absorbed by the product. Ozigi is pre-revenue, so this matters a lot.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Input Cost (per 1M tokens)&lt;/th&gt;
&lt;th&gt;Output Cost (per 1M tokens)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 2.5 Flash&lt;/td&gt;
&lt;td&gt;~$0.075&lt;/td&gt;
&lt;td&gt;~$0.30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude 3.7 Sonnet&lt;/td&gt;
&lt;td&gt;~$3.00&lt;/td&gt;
&lt;td&gt;~$15.00&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpxydm83hlu2nh0r7t4ny.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpxydm83hlu2nh0r7t4ny.png" alt="Cost comparison: Gemini $0.075 input / $0.30 output vs Claude $3.00 input / $15.00 output per 1M tokens. 40x to 50x difference." width="800" height="671"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Pricing sourced from &lt;a href="https://cloud.google.com/vertex-ai/generative-ai/pricing" rel="noopener noreferrer"&gt;Google Cloud Vertex AI pricing&lt;/a&gt; and &lt;a href="https://platform.claude.com/docs/en/about-claude/pricing" rel="noopener noreferrer"&gt;Anthropic API pricing&lt;/a&gt;. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Pro tip: Verify current rates before making production decisions; both have changed multiple times in the past year.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The input cost difference is 40x. The output cost difference is 50x. For a free-tier product with no revenue, the ability to run a public sandbox sustainably is the difference between having a conversion funnel and not having one.&lt;/p&gt;
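To make the 40x/50x gap concrete, here is a minimal cost sketch using the rates from the table above. The 2,000-input / 1,500-output token counts per generation are illustrative assumptions, not measured Ozigi numbers, and the rates should be re-verified against current pricing.

```typescript
// Per-1M-token rates from the table above; verify current pricing before relying on them.
const RATES = {
  gemini25Flash: { input: 0.075, output: 0.3 },
  claude37Sonnet: { input: 3.0, output: 15.0 },
};

// Cost in USD of a single generation, given token counts and a rate card.
function generationCost(
  rate: { input: number; output: number },
  inputTokens: number,
  outputTokens: number,
): number {
  return (inputTokens / 1e6) * rate.input + (outputTokens / 1e6) * rate.output;
}

// Hypothetical campaign generation: 2,000 input tokens, 1,500 output tokens.
const gemini = generationCost(RATES.gemini25Flash, 2_000, 1_500);
const claude = generationCost(RATES.claude37Sonnet, 2_000, 1_500);

console.log(`Gemini: $${gemini.toFixed(4)} per call`);   // $0.0006
console.log(`Claude: $${claude.toFixed(4)} per call`);   // $0.0285
console.log(`Ratio: ~${(claude / gemini).toFixed(1)}x`); // ~47.5x
```

At 10,000 anonymous sandbox generations a month, that works out to roughly $6 on Gemini versus $285 on Claude, which is exactly the free-tier sustainability gap described above.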




&lt;h2&gt;
  
  
  Where Ozigi Is Going and How It Would Change My Choice of Model Moving Forward
&lt;/h2&gt;

&lt;p&gt;This is an honest &lt;a href="https://adr.github.io/" rel="noopener noreferrer"&gt;ADR&lt;/a&gt;. Here's what would change my answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When Ozigi finally moves behind a paywall&lt;/strong&gt;, latency and cost become secondary concerns. A signed-in user on a paid plan waiting 20 seconds for premium output is a different UX calculation than an anonymous user on a free demo. In that context, Claude's base tone quality becomes much more compelling. I'd be trading economics for a better output baseline, and the trade might be worth it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When streaming gets implemented&lt;/strong&gt;, the latency argument against Claude weakens significantly. Claude 3.7 Sonnet's time-to-first-token via streaming is competitive. A user seeing the first post appear in 2-3 seconds experiences the product very differently than a user staring at a progress bar for 21 seconds. Streaming is on the roadmap.&lt;/p&gt;

&lt;p&gt;For an in-depth look at how we tested the pipeline that informs these decisions, see &lt;a href="https://dev.to/dumebii/how-to-e2e-test-ai-agents-mocking-api-responses-with-playwright-in-nextjs-nic"&gt;how we E2E test AI agents with Playwright in Next.js&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Decision Matrix
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Constraint&lt;/th&gt;
&lt;th&gt;Gemini 2.5 Flash&lt;/th&gt;
&lt;th&gt;Claude 3.7 Sonnet&lt;/th&gt;
&lt;th&gt;Winner&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;JSON Stability (responseSchema)&lt;/td&gt;
&lt;td&gt;99.9% → guaranteed&lt;/td&gt;
&lt;td&gt;~88.5% (prompted)&lt;/td&gt;
&lt;td&gt;Gemini&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Latency (non-streaming)&lt;/td&gt;
&lt;td&gt;~6.2s&lt;/td&gt;
&lt;td&gt;~21.5s&lt;/td&gt;
&lt;td&gt;Gemini&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Native PDF/Image ingestion&lt;/td&gt;
&lt;td&gt;Native via Vertex SDK&lt;/td&gt;
&lt;td&gt;Native via Messages API&lt;/td&gt;
&lt;td&gt;Gemini (Ecosystem fit)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Base tone quality&lt;/td&gt;
&lt;td&gt;5.5/10&lt;/td&gt;
&lt;td&gt;9.5/10&lt;/td&gt;
&lt;td&gt;Claude&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tone quality (+ constraints)&lt;/td&gt;
&lt;td&gt;9.2/10&lt;/td&gt;
&lt;td&gt;9.5/10&lt;/td&gt;
&lt;td&gt;Near tie&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost per 1M input tokens&lt;/td&gt;
&lt;td&gt;$0.075&lt;/td&gt;
&lt;td&gt;$3.00&lt;/td&gt;
&lt;td&gt;Gemini&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Gemini won outright on four of the six dimensions, with a near tie on constrained tone. Claude won on one, base tone, and that gap was closeable through prompt engineering.&lt;/p&gt;




&lt;h2&gt;
  
  
  Four Questions to Ask Before Choosing an LLM for Your Agentic Project or App
&lt;/h2&gt;

&lt;p&gt;If you're building something similar to Ozigi, these are the constraints worth working through before you pick an API and start building:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Does your UI depend on structured output?&lt;/strong&gt; If your frontend calls &lt;code&gt;JSON.parse()&lt;/code&gt; on a raw model response, you need API-level schema enforcement, not prompt instructions asking nicely. &lt;a href="https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/control-generated-output" rel="noopener noreferrer"&gt;&lt;code&gt;responseSchema&lt;/code&gt; via Vertex AI&lt;/a&gt;, Claude's tool use with forced &lt;code&gt;tool_choice&lt;/code&gt;, or &lt;a href="https://platform.openai.com/docs/guides/structured-outputs" rel="noopener noreferrer"&gt;structured outputs via OpenAI&lt;/a&gt; all enforce at the decoding layer. The question isn't which model supports it (most do); it's which enforcement path fits your existing stack.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Do you have a free tier or public sandbox?&lt;/strong&gt; If yes, latency and cost are product decisions that affect conversion, not just infrastructure decisions that affect margins.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Does your use case require multimodal inputs?&lt;/strong&gt; Most major models now support native PDF and image ingestion without external preprocessing. Map out what the integration looks like within your existing API provider before assuming you need to switch or add infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Where is the base model weakest, and is that gap engineerable?&lt;/strong&gt; Claude's tone advantage is real. It's also not the only path to human-sounding copy. Engineering constraints at the prompt level can close gaps that feel insurmountable when you're just looking at base benchmarks.&lt;/p&gt;

&lt;p&gt;The best model for your product is rarely the one with the highest aggregate score. It's the one that fails least on the constraints you actually can't work around.&lt;/p&gt;
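To make question 1 concrete, here is a sketch of the shape a Vertex AI-style generation config with a &lt;code&gt;responseSchema&lt;/code&gt; might take for a campaign payload. The schema fields (day, x, linkedin, discord) are illustrative and not Ozigi's actual production schema; check the Vertex AI docs linked above for the exact SDK types.

```typescript
// Illustrative generationConfig in the shape Vertex AI's controlled output accepts.
// The schema constrains decoding itself, so the model cannot emit a payload
// that fails to parse as this shape. Field names here are hypothetical.
const generationConfig = {
  responseMimeType: "application/json",
  responseSchema: {
    type: "OBJECT",
    properties: {
      campaign: {
        type: "ARRAY",
        items: {
          type: "OBJECT",
          properties: {
            day: { type: "INTEGER" },
            x: { type: "STRING" },
            linkedin: { type: "STRING" },
            discord: { type: "STRING" },
          },
          required: ["day", "x", "linkedin", "discord"],
        },
      },
    },
    required: ["campaign"],
  },
};

// With enforcement at the decoding layer, the frontend can JSON.parse()
// the response without a defensive retry/repair chain.
console.log(JSON.stringify(generationConfig.responseSchema.required)); // ["campaign"]
```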




&lt;ul&gt;
&lt;li&gt;The full Ozigi architecture — including the generate API route, the Banned Lexicon implementation, and the Vertex AI configuration — is open source on &lt;a href="https://github.com/Dumebii/OziGi" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. &lt;/li&gt;
&lt;li&gt;The live context engine is at &lt;a href="https://ozigi.app" rel="noopener noreferrer"&gt;ozigi.app&lt;/a&gt;. &lt;/li&gt;
&lt;li&gt;The interactive version of this ADR, &lt;a href="https://ozigi.app/architecture" rel="noopener noreferrer"&gt;with Chart.js visualisations of each benchmark&lt;/a&gt;, is also live.&lt;/li&gt;
&lt;li&gt;Ozigi is currently looking for user experience testers to give honest feedback on their experience using the product and areas for improvement.&lt;/li&gt;
&lt;li&gt;We have some &lt;a href="https://github.com/Dumebii/OziGi/issues" rel="noopener noreferrer"&gt;open issues&lt;/a&gt; on GitHub that are open to contributions from the community. 
&lt;em&gt;PS: this app has been entirely vibe coded so far, so we welcome vibe-coded contributions too!&lt;/em&gt; &lt;/li&gt;
&lt;li&gt;Connect With Me On &lt;a href="//www.linkedin.com/in/dumebi-okolo"&gt;LinkedIn&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Send me an email on &lt;a href="mailto:okolodumebi@gmail.com"&gt;okolodumebi@gmail.com&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Building something cool? Talk about it in the comments!&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>showdev</category>
      <category>nextjs</category>
    </item>
    <item>
      <title>How to End-to-end (E2E) Test AI Agents: Mocking API Responses with Playwright in Next.js</title>
      <dc:creator>Dumebi Okolo</dc:creator>
      <pubDate>Fri, 06 Mar 2026 12:50:33 +0000</pubDate>
      <link>https://dev.to/dumebii/how-to-e2e-test-ai-agents-mocking-api-responses-with-playwright-in-nextjs-nic</link>
      <guid>https://dev.to/dumebii/how-to-e2e-test-ai-agents-mocking-api-responses-with-playwright-in-nextjs-nic</guid>
      <description>&lt;p&gt;Building an AI agent is fun. At least, I have had so much fun building out &lt;a href="//ozigi.app"&gt;Ozigi&lt;/a&gt;, a social media content manager agent (ps, we are in need of user experience testers!).&lt;/p&gt;

&lt;p&gt;But!&lt;br&gt;
Testing it in a CI/CD pipeline is a nightmare.&lt;/p&gt;

&lt;p&gt;If you are building an application that relies on an LLM (like OpenAI, Anthropic, or Google's Vertex AI), you quickly run into these three challenges when writing End-to-End (E2E) tests:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Cost:&lt;/strong&gt; Every time your test suite runs, you are burning API credits.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Speed:&lt;/strong&gt; LLMs are slow. Waiting 10-15 seconds per test will grind your deployment pipeline to a halt.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Non-Determinism:&lt;/strong&gt; LLMs never return the &lt;em&gt;exact&lt;/em&gt; same string twice. If your Playwright test relies on &lt;code&gt;expect(page.getByText('exact phrase')).toBeVisible()&lt;/code&gt;, your tests will randomly fail.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;While building &lt;a href="//ozigi.app"&gt;Ozigi&lt;/a&gt;, an agentic content engine designed to turn raw technical research into structured social campaigns, I needed a way to test the complex UI state transitions (like custom loaders and dynamic grids) without actually hitting the Vertex AI API, especially since I am managing my $300 in credits very conservatively!&lt;/p&gt;
&lt;h2&gt;
  
  
  Playwright Network Interception
&lt;/h2&gt;

&lt;p&gt;Here is how to completely decouple your frontend E2E tests from your LLM backend using Next.js and Playwright.&lt;/p&gt;

&lt;p&gt;In Ozigi, the user flow looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The user selects a custom persona and inputs raw context (a URL or text dump).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0kvlkn8yd38avujkcbub.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0kvlkn8yd38avujkcbub.png" alt="create persona" width="800" height="642"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;They click "Generate Campaign."&lt;/li&gt;
&lt;li&gt;The UI swaps to a &lt;code&gt;&amp;lt;DynamicLoader /&amp;gt;&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxm0lgwr41z9esjp8qq7a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxm0lgwr41z9esjp8qq7a.png" alt="dynamic loader" width="800" height="399"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="4"&gt;
&lt;li&gt;The Next.js API route (&lt;code&gt;/api/generate&lt;/code&gt;) sends the context to Gemini 2.5 Pro.&lt;/li&gt;
&lt;li&gt;The LLM returns a strictly formatted JSON object.&lt;/li&gt;
&lt;li&gt;The UI renders the multi-platform campaign grid.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft68x94rga0zppqyomm40.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft68x94rga0zppqyomm40.png" alt="distribution grid" width="800" height="406"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Testing this live would introduce latency and flakiness. &lt;br&gt;
Instead, I intercept the API call and instantly return a fake JSON payload.&lt;/p&gt;
&lt;h2&gt;
  
  
  Network Mocking (Interception with &lt;code&gt;page.route&lt;/code&gt;)
&lt;/h2&gt;

&lt;p&gt;Playwright allows us to hijack outbound network requests directly from the browser. When the frontend tries to call our Next.js API route, Playwright intercepts the &lt;code&gt;POST&lt;/code&gt; request, blocks it from ever hitting the server, and fulfills it with our own static data.&lt;/p&gt;

&lt;p&gt;Here is the exact test script I use to validate the Ozigi content engine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;expect&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@playwright/test&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nx"&gt;test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Ozigi Context Engine &amp;amp; AI Mocking&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

  &lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;should generate a campaign by intercepting the LLM response&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// 1. Navigate to the dashboard&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/dashboard&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// 2. Fill out the Context fields&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getByPlaceholder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Paste a URL or raw notes&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;fill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://ozigi.app/docs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getByPlaceholder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Additional directives...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;fill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Keep it technical.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// 🚀 THE MAGIC: Intercept the AI generation API route&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;**/api/generate&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;route&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

      &lt;span class="c1"&gt;// Define the exact JSON structure your frontend expects from the LLM&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;mockedAIResponse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
          &lt;span class="na"&gt;campaign&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
              &lt;span class="na"&gt;day&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="na"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Day 1 Thread: Ozigi is tested and working! 1/2&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;[The content engine is officially alive.]&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="na"&gt;linkedin&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;LinkedIn Post: Ozigi testing complete.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="na"&gt;discord&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Discord Update: Systems green.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
          &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
      &lt;span class="p"&gt;};&lt;/span&gt;

      &lt;span class="c1"&gt;// Fulfill the route instantly with the mocked data&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;route&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fulfill&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;contentType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;mockedAIResponse&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// 3. Trigger the generation&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getByRole&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;button&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sr"&gt;/Generate Campaign/i&lt;/span&gt; &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;click&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="c1"&gt;// 4. Assert the UI state transitions correctly&lt;/span&gt;
    &lt;span class="c1"&gt;// Verify the loader appears while the "network" request is happening&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;loaderContainer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;locator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.animate-in.fade-in&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;loaderContainer&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBeVisible&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="c1"&gt;// 5. Assert the final UI renders our mocked data perfectly&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getByText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Ozigi is tested and working!&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;toBeVisible&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getByText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;[The content engine is officially alive.]&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;toBeVisible&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why You Should Mock LLM/API Responses In Playwright
&lt;/h2&gt;

&lt;p&gt;By using this testing pattern, I achieved three of my engineering goals:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Zero Cost:&lt;/strong&gt; The test suite can run 1,000 times a day on GitHub Actions without costing a single cent in Vertex AI compute.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Lightning Fast:&lt;/strong&gt; The entire E2E test finishes in seconds, as I bypass the LLM's generation latency entirely.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Absolute Determinism:&lt;/strong&gt; Because I injected a static JSON payload, my text assertions (&lt;code&gt;toBeVisible&lt;/code&gt;) will never fail due to an AI hallucination or a slightly altered adjective.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When building AI wrappers or agentic workflows, your testing strategy must isolate the LLM from the UI. Let the LLM be unpredictable in production, but demand strict predictability in your test suite.&lt;/p&gt;
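That isolation can be factored into a reusable helper so every spec registers the same mock the same way. This is a sketch with a minimal structural page type so it stays self-contained; in a real suite you would import &lt;code&gt;Page&lt;/code&gt; from &lt;code&gt;@playwright/test&lt;/code&gt; and pass the test fixture straight through.

```typescript
// Minimal structural types so the sketch is self-contained; in a real suite
// these come from @playwright/test.
interface MockRoute {
  fulfill(opts: { status: number; contentType: string; body: string }): Promise<void>;
}
interface MockablePage {
  route(url: string, handler: (route: MockRoute) => Promise<void>): Promise<void>;
}

// Register a deterministic payload for the generation endpoint. The envelope
// ({ output: "<stringified JSON>" }) mirrors the mock used in the test above.
async function mockGenerateRoute(page: MockablePage, payload: unknown): Promise<void> {
  await page.route("**/api/generate", async (route) => {
    await route.fulfill({
      status: 200,
      contentType: "application/json",
      body: JSON.stringify({ output: JSON.stringify(payload) }),
    });
  });
}

// Exercise the helper with an in-memory fake page to show the wire format;
// in the real test, Playwright's own page fixture is passed instead.
const fulfilledBodies: string[] = [];
const fakePage: MockablePage = {
  async route(_url, handler) {
    await handler({
      async fulfill(opts) {
        fulfilledBodies.push(opts.body);
      },
    });
  },
};

void mockGenerateRoute(fakePage, { campaign: [{ day: 1, x: "Hello" }] });
console.log(fulfilledBodies.length); // 1
```

Centralising the mock this way means a schema change to the campaign payload is a one-file edit rather than a hunt through every spec.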




&lt;p&gt;&lt;em&gt;I built this network mocking (interception) pattern into &lt;a href="//ozigi.app"&gt;Ozigi&lt;/a&gt;, an agentic content engine that helps pretty much anyone turn their raw notes/ideas into structured, multi-platform campaigns without dealing with cheesy AI buzzwords. You can check it out at &lt;a href="https://ozigi.app" rel="noopener noreferrer"&gt;ozigi.app&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Let's connect on &lt;a href="//www.linkedin.com/in/dumebi-okolo"&gt;LinkedIn&lt;/a&gt;!&lt;br&gt;
You can find my spaghetti code &lt;a href="https://github.com/Dumebii/OziGi" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Consider this the unofficial v3 changelog of Ozigi. As always, we welcome your feedback and can't wait to hear from you!&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>webdev</category>
      <category>playwright</category>
      <category>nextjs</category>
      <category>api</category>
    </item>
    <item>
      <title>Ozigi v2 Changelog: Building a Modular Agentic Content Engine with Next.js, Supabase, and Playwright</title>
      <dc:creator>Dumebi Okolo</dc:creator>
      <pubDate>Mon, 02 Mar 2026 11:37:31 +0000</pubDate>
      <link>https://dev.to/dumebii/ozigi-v2-changelog-building-a-modular-agentic-content-engine-with-nextjs-supabase-and-playwright-59mo</link>
      <guid>https://dev.to/dumebii/ozigi-v2-changelog-building-a-modular-agentic-content-engine-with-nextjs-supabase-and-playwright-59mo</guid>
      <description>&lt;p&gt;When I first built &lt;a href="https://blogger-helper-tau.vercel.app/" rel="noopener noreferrer"&gt;Ozigi&lt;/a&gt; (initially WriterHelper), the goal was simple: give content professionals in my team a way to break down their articles into high-signal social media campaigns.&lt;/p&gt;

&lt;p&gt;Ozigi has now evolved into an open-source SaaS product, open to the public to use and improve.&lt;/p&gt;

&lt;p&gt;Here is the complete technical changelog of how I turned Ozigi from a monolithic v1 MVP into a production-ready v2 SaaS.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Modular Refactoring of app/page.tsx (Separation of Concerns)
&lt;/h2&gt;

&lt;p&gt;In v1, my entire application (auth, API calls, and UI) lived inside one long &lt;code&gt;app/page.tsx&lt;/code&gt; file. The more changes I made, the harder it became to manage.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Modular Component Library:&lt;/strong&gt; I stripped down the monolith and broke the UI into pure, single-responsibility React components (&lt;code&gt;Header&lt;/code&gt;, &lt;code&gt;Hero&lt;/code&gt;, &lt;code&gt;Distillery&lt;/code&gt;, etc.).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F96cypkydgo446zrjcfwt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F96cypkydgo446zrjcfwt.png" alt="modular architecture" width="800" height="902"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Centralized Type Safety:&lt;/strong&gt; I created a global &lt;code&gt;lib/types.ts&lt;/code&gt; file with a strict &lt;code&gt;CampaignDay&lt;/code&gt; interface (complete with index signatures) to finally eliminate the TypeScript "shadow type" build errors I was fighting.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;State Persistence:&lt;/strong&gt; Implemented &lt;code&gt;localStorage&lt;/code&gt; syncing so the app "remembers" if a user is in the dashboard or the landing page, preventing frustrating resets on browser refresh.&lt;/li&gt;
&lt;/ul&gt;
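The state-persistence piece above can be sketched as a small typed wrapper. The key name and the &lt;code&gt;AppView&lt;/code&gt; union here are hypothetical, and a Storage-compatible object is injected so the sketch runs outside a browser; the real app would pass &lt;code&gt;window.localStorage&lt;/code&gt;.

```typescript
// Hypothetical view-state union and storage key; the real app's names may differ.
type AppView = "landing" | "dashboard";
const VIEW_KEY = "ozigi:view";

// Structural type so any Storage-like object works: window.localStorage in the
// browser, an in-memory fake in tests or on the server.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

function saveView(storage: StorageLike, view: AppView): void {
  storage.setItem(VIEW_KEY, view);
}

// Fall back to the landing page when the stored value is missing or unrecognised.
function loadView(storage: StorageLike): AppView {
  return storage.getItem(VIEW_KEY) === "dashboard" ? "dashboard" : "landing";
}

// In-memory fake standing in for window.localStorage.
const backing = new Map<string, string>();
const fakeStorage: StorageLike = {
  getItem: (k) => backing.get(k) ?? null,
  setItem: (k, v) => {
    backing.set(k, v);
  },
};

saveView(fakeStorage, "dashboard");
console.log(loadView(fakeStorage)); // prints "dashboard"
```

The defensive fallback in &lt;code&gt;loadView&lt;/code&gt; is the important part: a corrupted or stale value degrades to the landing page instead of crashing the render.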

&lt;h2&gt;
  
  
  2. Using Supabase as the Database and Tightening the Backend
&lt;/h2&gt;

&lt;p&gt;A major UX flaw in v1 was that refreshing the page wiped the user's progress.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Relational Database &amp;amp; OAuth:&lt;/strong&gt; I replaced anonymous access with secure GitHub OAuth via Supabase.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automated Context History:&lt;/strong&gt; I engineered a system that auto-saves every generated campaign to a PostgreSQL database. Users can now restore past URLs, notes, and outputs with a single click.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fodnvsgusnc26sy44u8sw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fodnvsgusnc26sy44u8sw.png" alt="strategy history" width="800" height="453"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Identity Storage:&lt;/strong&gt; Built a settings flow to permanently save a user's custom "Persona Voice" and Discord Webhook URLs directly to their profile.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj8b09ue6myvb4a0jxhzx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj8b09ue6myvb4a0jxhzx.png" alt="discord webhook upload and added context" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Core Feature Additions
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Modal Ingestion:&lt;/strong&gt; Upgraded the input engine to accept both a live URL &lt;em&gt;and&lt;/em&gt; raw custom text simultaneously.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flkdo3urlx7xmhx3pu9th.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flkdo3urlx7xmhx3pu9th.png" alt="context engine dashboard" width="800" height="369"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Native Discord Deployment:&lt;/strong&gt; Built a dedicated API route and UI webhook integration to push generated content directly to Discord servers with one click.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  4. Updated UI/UX &amp;amp; Professional Branding
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Rebrand:&lt;/strong&gt; Pivoted the app's messaging to focus entirely on content professionals, positioning it as an engine to generate social media content with ease and in your own voice.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Open-First Onboarding:&lt;/strong&gt; Designed a "Try Before You Buy" workflow. Unauthenticated users can test the AI generation seamlessly, but are gated from premium features (History, Personas, Discord) via an Upgrade Banner.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foyaxoafb4dsuh3dqtt89.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foyaxoafb4dsuh3dqtt89.png" alt="guest mode" width="800" height="464"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pixel-Perfect Layouts &amp;amp; SEO:&lt;/strong&gt; Eliminated rogue whitespace and &lt;code&gt;z-index&lt;/code&gt; issues using precise CSS Flexbox rules. Upgraded &lt;code&gt;app/layout.tsx&lt;/code&gt; with professional OpenGraph and Twitter Card metadata.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhqbf31p44p1p5e5sjyj4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhqbf31p44p1p5e5sjyj4.png" alt="ozigi homepage" width="800" height="466"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Quality Assurance &amp;amp; DevOps (Automated Playwright E2E Tests)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Automated E2E Testing:&lt;/strong&gt; Completely rewrote the Playwright test suite (&lt;code&gt;engine.spec.ts&lt;/code&gt;) to verify the new landing page copy, test the navigation flow, and confirm security rules apply correctly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Linux Dependency Fixes:&lt;/strong&gt; Patched my CI/CD pipeline by ensuring underlying Linux browser dependencies (&lt;code&gt;--with-deps&lt;/code&gt;) are installed so headless Chromium tests pass flawlessly.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
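&lt;p&gt;For anyone hitting the same CI failure: the fix is usually a single install step before the test run (shown here for a Chromium-only pipeline; adjust the browser list to match yours).&lt;/p&gt;

```shell
# Installs Chromium plus the Linux system libraries headless runs need
npx playwright install --with-deps chromium
```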

&lt;h2&gt;
  
  
  What's Next? (v3 Roadmap)
&lt;/h2&gt;

&lt;p&gt;With the Context Engine now stable, the foundation is set. &lt;br&gt;
My plan for V3 is to build out the deployment pipeline: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;integrating the native X (Twitter) API&lt;/li&gt;
&lt;li&gt;integrating the LinkedIn API so users can publish directly from the Ozigi dashboard&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;What has been your biggest challenge scaling a Next.js MVP? Let me know in the comments!&lt;/em&gt;&lt;br&gt;
Try out &lt;a href="https://blogger-helper-tau.vercel.app/" rel="noopener noreferrer"&gt;Ozigi&lt;/a&gt;, and if you have any feature suggestions, let me know!&lt;br&gt;
Want to see my poorly written code? Find &lt;a href="https://github.com/Dumebii/OziGi" rel="noopener noreferrer"&gt;OziGi on GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Connect with me on &lt;a href="//www.linkedin.com/in/dumebi-okolo"&gt;LinkedIn&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;What came next:&lt;br&gt;
After shipping v2, the next hard question was model selection. A reader suggested switching to Claude for better content quality. I ran the benchmarks instead of just taking the advice. The results across JSON stability, latency, multimodal ingestion, and tone were clearer than I expected: &lt;a href="https://dev.to/dumebii/gemini-25-flash-vs-claude-37-sonnet-4-production-constraints-that-made-the-decision-for-me-bib"&gt;Gemini 2.5 Flash vs Claude 3.7 Sonnet: 4 Production Constraints That Made the Decision for Me&lt;/a&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>showdev</category>
      <category>nextjs</category>
      <category>playwright</category>
    </item>
    <item>
      <title>I vibe-coded an internal tool that slashed my content workflow by 4 hours</title>
      <dc:creator>Dumebi Okolo</dc:creator>
      <pubDate>Fri, 27 Feb 2026 14:52:17 +0000</pubDate>
      <link>https://dev.to/dumebii/i-vibe-coded-an-internal-tool-that-slashed-my-content-workflow-by-4-hours-310f</link>
      <guid>https://dev.to/dumebii/i-vibe-coded-an-internal-tool-that-slashed-my-content-workflow-by-4-hours-310f</guid>
      <description>&lt;p&gt;One of the biggest challenges I face as a content expert is repurposing my written blogs for social media. Before now, I had to ask AI for summaries or try to get them myself. I became very busy recently, and I don't have time for that anymore. &lt;br&gt;
The best solution for me was building a tool that helps me generate social media content from my blog and posts on my behalf. &lt;br&gt;
I was in a meeting of content professionals recently. A key point that was hammered on regarding the use of AI in content creation is the need to maintain a strict Human-in-the-Loop (HITL) workflow. &lt;br&gt;
This resonated well with me. &lt;br&gt;
I had initially planned to build an agent to automate and schedule social media posts. This, however, leaves out the HITL factor, so I restrategized. &lt;/p&gt;

&lt;p&gt;Here is the technical breakdown of how I built an Agentic Content Engine using Next.js 15, Gemini 3.1 Pro, and Discord Webhooks.&lt;/p&gt;
&lt;h2&gt;
  
  
  Agentic Human-in-the-Loop (HITL) architecture
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Problem: The "Context Gap"&lt;/strong&gt;&lt;br&gt;
Most AI social media tools are just wrappers for generic prompts. They don't know my research, they don't know my voice, and they definitely don't know the technical nuances of my articles.&lt;br&gt;
So,&lt;br&gt;
I needed a tool that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Reads my actual dev.to articles.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Strategizes a 3-day multi-platform campaign.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Displays it in a way that I can audit, edit, and then—with one click—Deploy.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even though this app was "vibe coded" (shoutout to the AI for keeping up with my pivots 😂😂), the architecture is solid.&lt;/p&gt;

&lt;p&gt;The core philosophy of this build is Agency over Automation. The agent doesn't just act; it reasons, structures, and then waits for human approval before posting.&lt;/p&gt;
&lt;h3&gt;
  
  
  The AI Stack
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reasoning Engine:&lt;/strong&gt; Gemini 3.1 Pro (Tier 1 Billing). I opted for Pro over Flash to handle complex instruction following and strict JSON schema enforcement.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frontend:&lt;/strong&gt; Next.js 15 (App Router) for server-side rendering and SEO efficiency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Styling:&lt;/strong&gt; Tailwind CSS with &lt;code&gt;@tailwindcss/typography&lt;/code&gt; for professional markdown rendering.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment:&lt;/strong&gt; Discord Webhooks for an immediate, zero-auth execution pipeline.&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Handling AI Hallucinations in Next.js
&lt;/h2&gt;

&lt;p&gt;A common failure in vibe coding, I have found, is the LLM returning "chatty" text when the UI expects structured data. &lt;br&gt;
To solve this, I implemented a Strict JSON Enforcement pattern in the API route.&lt;/p&gt;

&lt;p&gt;Gemini often wraps its JSON output in markdown code blocks. If you pass this directly to &lt;code&gt;JSON.parse()&lt;/code&gt;, the app crashes.&lt;/p&gt;

&lt;p&gt;To solve this, I used &lt;em&gt;Sanitization Middleware.&lt;/em&gt;&lt;br&gt;
I built a regex-based sanitization layer to strip the noise and ensure the frontend receives a clean array.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// app/api/generate/route.ts&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rawOutput&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// The raw string from Gemini&lt;/span&gt;

&lt;span class="c1"&gt;// Regex to extract only the JSON content&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cleanJson&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;rawOutput&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/``&lt;/span&gt;&lt;span class="err"&gt;`
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="nx"&gt;endraw&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nx"&gt;json&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="s2"&gt;```/g, "").trim();

try {
  const campaignData = JSON.parse(cleanJson);
  return NextResponse.json({ campaign: campaignData.campaign });
} catch (error) {
  console.error("JSON Parsing failed:", rawOutput);
  return NextResponse.json({ error: "Failed to parse Agent strategy" }, { status: 500 });
}

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
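&lt;p&gt;A quick way to sanity-check that sanitisation layer in isolation is to pull the fence-stripping into a pure function and feed it a fenced sample (&lt;code&gt;stripFences&lt;/code&gt; is an illustrative name, not the app's actual export):&lt;/p&gt;

```typescript
// Removes markdown code fences (three backticks, optionally tagged "json")
// from model output so JSON.parse receives clean JSON
function stripFences(raw: string): string {
  // `{3} matches a run of three backticks without embedding a literal fence here
  return raw.replace(/`{3}json|`{3}/g, "").trim();
}
```

&lt;p&gt;Feeding it a typical fenced Gemini response returns just the JSON body, ready for &lt;code&gt;JSON.parse&lt;/code&gt;.&lt;/p&gt;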






&lt;h2&gt;
  
  
  UI/UX Strategy: The Kanban "Board" Approach
&lt;/h2&gt;

&lt;p&gt;The v1 of the UI was so messy. The tool worked, but you'd have to dig through mountains of text to even understand what was going on. &lt;br&gt;
I tried formatting it into a table for some structure. Somehow, that was worse! &lt;br&gt;
Finally, to optimize for a &lt;strong&gt;"Human-in-the-Loop"&lt;/strong&gt; workflow, I moved to a columnar dashboard.&lt;br&gt;
Social posts, especially threads on X, can be long, and that would have made even the boards clumsy and unkempt. &lt;br&gt;
To keep the UI clean, I built a &lt;code&gt;PostCard&lt;/code&gt; component that caps content at &lt;strong&gt;250 characters&lt;/strong&gt; with a state-managed "Read More" toggle.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;isExpanded&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;setIsExpanded&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;displayContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;isExpanded&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;250&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;...&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
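&lt;p&gt;One caveat with the snippet above: &lt;code&gt;substring(0, 250) + "..."&lt;/code&gt; appends an ellipsis even when the post is already under the cap. A small guard (a hypothetical helper, not the component's actual code) avoids that:&lt;/p&gt;

```typescript
// Truncate a post for card display, adding "..." only when text was actually cut
function truncatePost(content: string, limit: number): string {
  const clipped = content.slice(0, limit);
  // If slicing changed nothing, the post already fits
  return clipped === content ? content : clipped + "...";
}
```

&lt;p&gt;In the card, &lt;code&gt;displayContent&lt;/code&gt; then becomes &lt;code&gt;isExpanded ? content : truncatePost(content, 250)&lt;/code&gt;.&lt;/p&gt;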



&lt;p&gt;This ensures the user can audit the text without scrolling for "miles."&lt;/p&gt;




&lt;h2&gt;
  
  
  Photo dump: Agentic Content Flow in Action
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;The Starting Point
Here’s the clean, minimal dashboard before the magic happens. I wanted it to feel like a professional "Command Centre," not a messy chatbot window.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvz667kmrpd5nboj21qwt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvz667kmrpd5nboj21qwt.png" alt="homepage" width="800" height="443"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;The 3-Day Campaign Map
Once I paste my URL, the Agent goes to work. It returns a structured 3x3 grid. I added a 250-character truncation with a "Read More" toggle because, let's face it, nobody wants a wall of text when they're trying to strategise.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjjmvcttzzxt193z17459.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjjmvcttzzxt193z17459.png" alt="content generation" width="800" height="445"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;The Deployment
Here is the best part. I hit "Post to Discord," and boom—success. No manual copy-pasting, no switching tabs. It’s live.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3hrm5rdokebrklk6a8lr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3hrm5rdokebrklk6a8lr.png" alt="posted to discord success message" width="800" height="493"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fysle2meczj9ykxpthcgh.png" alt="discord success" width="800" height="403"&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;This is what I have built so far. I am calling it BloggerHelper v1.&lt;br&gt;
My next updates are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Integrating the X and LinkedIn publishing features. &lt;/li&gt;
&lt;li&gt;Putting more work into the context tank. So far, the agent's context has come from the article and some instructions in the &lt;code&gt;agents_instruction.md&lt;/code&gt; file. I will work more on this.&lt;/li&gt;
&lt;li&gt;Adding an edit feature, so I can edit a post before it goes out.&lt;/li&gt;
&lt;li&gt;Making it take in more context than just my blog posts.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Conclusion: The Engineering of Presence
&lt;/h2&gt;

&lt;p&gt;Even though this tool was designed to help me cut down on work hours, it was also meant to take me from technical writer to content engineer/architect: someone whose primary goal isn't just to create content, but to build solutions that make for easy content flow.&lt;br&gt;
Also, as I position myself as an AI influencer, I want to show myself building more with AI and evangelising its adoption.&lt;/p&gt;

&lt;p&gt;Let's connect on &lt;a href="//www.linkedin.com/in/dumebi-okolo"&gt;LinkedIn&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What’s your take on Agentic Workflows? Are you building for full automation, or are you keeping the human in the loop?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let’s discuss below. 👇&lt;/p&gt;

&lt;h3&gt;
  
  
  UPDATE!!!!
&lt;/h3&gt;

&lt;p&gt;I just used my tool to get my social media caption/content for this post. See below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3sjw1nvyg84389ofynqq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3sjw1nvyg84389ofynqq.png" alt="am content generator" width="800" height="455"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can try it out &lt;a href="https://3nz2kx-3000.csb.app/" rel="noopener noreferrer"&gt;here&lt;/a&gt;, but have mercy on my API credits!! &lt;/p&gt;

&lt;p&gt;UPDATE 2 — March 2026:&lt;br&gt;
Several people in the comments asked about forcing structured JSON output without the regex sanitisation layer. I ended up going deep on this for Ozigi v3. The answer is responseSchema via the Vertex AI SDK — it enforces structure at the decoding layer, not the prompt level. I benchmarked it alongside Claude 3.7 Sonnet across four production constraints. The full write-up, with numbers, is here: &lt;a href="https://dev.to/dumebii/gemini-25-flash-vs-claude-37-sonnet-4-production-constraints-that-made-the-decision-for-me-bib"&gt;Gemini 2.5 Flash vs Claude 3.7 Sonnet: 4 Production Constraints That Made the Decision for Me&lt;/a&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>nextjs</category>
      <category>tooling</category>
    </item>
    <item>
      <title>Using Perplexity AI and Gemini 3 (Pro) for Academic Research and Writing</title>
      <dc:creator>Dumebi Okolo</dc:creator>
      <pubDate>Thu, 19 Feb 2026 15:07:41 +0000</pubDate>
      <link>https://dev.to/dumebii/using-perplexity-ai-and-gemini-3-pro-for-academic-research-cji</link>
      <guid>https://dev.to/dumebii/using-perplexity-ai-and-gemini-3-pro-for-academic-research-cji</guid>
      <description>&lt;p&gt;I’m currently in the trenches of my Master’s thesis, focusing on &lt;strong&gt;5G Anomaly Detection using TensorFlow Lite at the Edge&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;I wrote a paper on &lt;a href="https://www.globalscientificjournal.com/researchpaper/EDGE_DEPLOYABLE_TENSORFLOW_LITE_AUTOENCODER_FOR_REAL_TIME_5G_ANOMALY_DETECTION_AND_COST_AWARE_OPTIMIZATION.pdf" rel="noopener noreferrer"&gt;EDGE-DEPLOYABLE TENSORFLOW LITE AUTOENCODER FOR REAL-TIME 5G ANOMALY DETECTION AND COST-AWARE OPTIMIZATION&lt;/a&gt; that you can check out. &lt;/p&gt;




&lt;p&gt;&lt;em&gt;This blog post is part of my short-form content series, where I write straight-to-the-point blog posts of less than 1,000 words.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Before building my AI research workflow, I used to spend hours just "pre-reading," trying to build the literature review section of my thesis.&lt;/p&gt;

&lt;p&gt;Not anymore!&lt;br&gt;
I built my own "Research Stack" with already existing AI tools that do all the heavy lifting for me in a matter of minutes. &lt;/p&gt;

&lt;p&gt;I don’t use just one tool. I use an &lt;a href="https://graygrids.com/blog/ai-aggregators-multiple-models-platform" rel="noopener noreferrer"&gt;AI aggregator&lt;/a&gt; and an &lt;a href="//gemini.google.com"&gt;AI-native Pro model&lt;/a&gt; together.&lt;/p&gt;


&lt;h2&gt;
  
  
  Perplexity is the AI Aggregator
&lt;/h2&gt;

&lt;p&gt;Many people, like me before making this discovery, think of &lt;a href="//perplexity.ai"&gt;Perplexity&lt;/a&gt; as just a model; it’s actually more of a "librarian." &lt;br&gt;
It doesn't just rely on its own model; it uses some of the best in the industry—Claude 4, GPT-5, and Gemini 3—to scour the web and find citations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh74p2vo5d4un556ni8g5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh74p2vo5d4un556ni8g5.png" alt="perplexity models" width="448" height="599"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.perplexity.ai/hub/blog/meet-new-sonar" rel="noopener noreferrer"&gt;Sonar&lt;/a&gt; is Perplexity's own model.&lt;/p&gt;

&lt;p&gt;I've come to learn that Perplexity is the "king" of finding where the information is. &lt;/p&gt;

&lt;p&gt;However, when it comes to understanding/making sense of the 20 or so PDFs I just found? That’s where the "Aggregator" model hits a wall.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6x1s394ghc1ebvjfauj2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6x1s394ghc1ebvjfauj2.png" alt="perplexity ai interface" width="800" height="387"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  The Native Pro Advantage (Gemini Advanced)
&lt;/h2&gt;

&lt;p&gt;Because I have a &lt;strong&gt;Gemini Pro&lt;/strong&gt; subscription, I have access to something Perplexity’s implementation can’t match: Gemini's &lt;a href="https://developers.googleblog.com/en/new-features-for-the-gemini-api-and-google-ai-studio/" rel="noopener noreferrer"&gt;2-million-token context window&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;While Perplexity gives me snippets and links, I can feed those entire PDFs or papers it gives me into Gemini Pro. &lt;br&gt;
This way, Gemini doesn't just look up the research papers; it "lives" in them. &lt;br&gt;
That is, it remembers a conflict in data on page 4 and compares it to a conclusion on page 48.&lt;/p&gt;
&lt;h2&gt;
  
  
  My Research Workflow
&lt;/h2&gt;

&lt;p&gt;Here is exactly how I use Perplexity AI and Google Gemini to speed up my thesis research:&lt;/p&gt;
&lt;h3&gt;
  
  
  Phase 1: Using Perplexity to find research papers and material
&lt;/h3&gt;

&lt;p&gt;I ask Perplexity to find the most recent 2026 papers on Federated Learning in 5G. It gives me URLs and citations.&lt;br&gt;
Here's an example of my prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Find the top 5 most cited research papers from late 2025 and 2026 regarding 'Anomaly Detection in 5G Core Networks using Federated Learning.' Provide the direct URLs and a 2-sentence summary of their core methodology

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Phase 2: Using Gemini Pro to go through research materials
&lt;/h3&gt;

&lt;p&gt;I download those papers and upload them to Gemini and use it for things like comparing, reasoning, or critiquing. &lt;br&gt;
Here's an example prompt I've used:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I have these 5 research papers [Paste links/sources]. Using your 2M token context, analyze how these papers address the 'latency vs. accuracy' trade-off in Edge computing. Then, draft a 1,000-word skeleton for my literature review that explains why AI automation is the solution to 5G network failures.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Phase 3: Direct editing in the Google Docs Workspace
&lt;/h3&gt;

&lt;p&gt;Since Gemini is integrated with my Google Workspace, I edit the literature review draft directly into a Google Doc.&lt;/p&gt;




&lt;h2&gt;
  
  
  📊 Comparison: Perplexity AI vs Google Gemini for Research
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Perplexity (The Librarian)&lt;/th&gt;
&lt;th&gt;Gemini Pro (The Architect)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Primary Strength&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Real-time search &amp;amp; citations.&lt;/td&gt;
&lt;td&gt;Massive context &amp;amp; reasoning.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model Source&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Aggregator (Claude, GPT, Gemini).&lt;/td&gt;
&lt;td&gt;Native (Google's best).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Context Window&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Small (Snippet-based).&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2M+ Tokens&lt;/strong&gt; (Entire libraries).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best For...&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Finding "The What" &amp;amp; URLs.&lt;/td&gt;
&lt;td&gt;Analyzing "The How" &amp;amp; Drafting.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Integration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Web-only.&lt;/td&gt;
&lt;td&gt;Google Workspace (Docs/Gmail).&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;What I have learned in my AI use is that looking for the one tool that does everything leads to failure or inaccuracies. &lt;br&gt;
I prefer a "separation of concerns" type of workflow, which leads to better accuracy.&lt;br&gt;
This only works, though, when you know how to build the right stack for your workflow and how to get around it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Are you still using a single LLM for your research, or have you started "stacking" your tools? Let's discuss in the comments!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can find me on &lt;a href="//www.linkedin.com/in/dumebi-okolo"&gt;LinkedIn!&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>beginners</category>
      <category>productivity</category>
      <category>tooling</category>
    </item>
    <item>
      <title>Top 5 Headless CMS to Build a Blog in 2026</title>
      <dc:creator>Dumebi Okolo</dc:creator>
      <pubDate>Mon, 26 Jan 2026 10:00:00 +0000</pubDate>
      <link>https://dev.to/dumebii/top-5-headless-cms-to-build-a-blog-in-2026-382f</link>
      <guid>https://dev.to/dumebii/top-5-headless-cms-to-build-a-blog-in-2026-382f</guid>
      <description>&lt;p&gt;After doing a lot of research as a technical writer, I have found the top 5 CMS for blogging. Whether you are trying to build a personalwriting repository or you are a full-fledged publication with more complex subscription needs, this article is for you!&lt;/p&gt;

&lt;p&gt;When I first got into personal blogging, creating The Handy Developer's Guide, I used &lt;a href="//wordpress.com"&gt;WordPress&lt;/a&gt;: I chose a theme, did some customizations as best as my plan let me, and that was it.&lt;br&gt;
Though my experience might be different from someone else's, I didn't have the best time using WordPress. One of the problems I had was the lack of broader customization even as a premium-tier subscriber. If you ever visited my blog back then, you'd know that the frontend was a cry for help. The backend was solid, though.&lt;br&gt;
I am revisiting blogging in 2026, and after a lot of research, I curated this list. It is a product of research and personal preference.  &lt;/p&gt;

&lt;h2&gt;
  
  
  What is a Headless CMS?
&lt;/h2&gt;

&lt;p&gt;A headless Content Management System (CMS) is a backend-only system where the content repository (the "body") is separated from the presentation layer/frontend (the "head").&lt;/p&gt;

&lt;p&gt;Unlike a traditional CMS like WordPress, which dictates how content looks on a website through built-in templates, a headless CMS only stores and manages raw content. This content is then delivered to any device ( a website, mobile app, or smartwatch) via an Application Programming Interface (API).&lt;/p&gt;

&lt;h2&gt;
  
  
  Difference Between a Headed (Monolithic) CMS and a Headless CMS
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Headed (Monolithic) CMS&lt;/th&gt;
&lt;th&gt;Headless CMS&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;User Interface&lt;/td&gt;
&lt;td&gt;Pre-built templates and themes&lt;/td&gt;
&lt;td&gt;Build your own using React, Vue, Swift, etc.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Control&lt;/td&gt;
&lt;td&gt;Marketers can drag-and-drop easily&lt;/td&gt;
&lt;td&gt;Developers have full control over the codebase&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Channels&lt;/td&gt;
&lt;td&gt;Mostly limited to websites&lt;/td&gt;
&lt;td&gt;Omnichannel delivery: web, mobile apps, IoT, digital displays&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scalability&lt;/td&gt;
&lt;td&gt;Harder; frontend and backend scale together&lt;/td&gt;
&lt;td&gt;Easier; frontend and backend scale independently&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security&lt;/td&gt;
&lt;td&gt;Larger attack surface due to direct database exposure&lt;/td&gt;
&lt;td&gt;Smaller attack surface with API-only access&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F48jzu7axh10xsvtxcsin.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F48jzu7axh10xsvtxcsin.png" alt="comparison between headed and headless CMs" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What Makes a Headless CMS Great for Blogging in 2026?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Flexible Content Modeling:&lt;/strong&gt; You define your own post types, relationships, and reusable blocks. Schema evolves with your needs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;API Power:&lt;/strong&gt; REST, GraphQL, GROQ, and so on. The API should let you fetch exactly the data you need, without over-fetching or under-fetching. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;TypeScript and SDK Quality:&lt;/strong&gt; SDKs with type safety, codegen, and auto-completion.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Editor Experience:&lt;/strong&gt; Intuitive UI for non-developers. Preview changes, collaborate in real time, and avoid constraints.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Performance and Caching:&lt;/strong&gt; The CMS should play nicely with CDNs, static site generators, and incremental static regeneration, with fast content propagation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Authentication and Access Control:&lt;/strong&gt; Lock down drafts, manage roles, and integrate with SSO or OAuth.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Migration and Portability:&lt;/strong&gt; Import/export data effortlessly.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With that in mind, let’s meet my top five.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Sanity: The Developer-First Content Operating System
&lt;/h2&gt;

&lt;p&gt;I am currently using &lt;a href="//sanity.io"&gt;Sanity&lt;/a&gt; to build out my blog in 2026. &lt;br&gt;
&lt;a href="//sanity.io"&gt;Sanity&lt;/a&gt; is the CMS for teams who treat content as data, not just text. It’s opinionated. If you want to model your content in code, build custom editorial interfaces, and automate workflows with AI, Sanity positions itself as a content operating system, not just a CMS.&lt;/p&gt;

&lt;p&gt;For marketers, Sanity offers a customizable Studio UI, real-time collaboration, and visual editing with live previews. For developers, it’s a schema-as-code playground with TypeScript support, and fast APIs. It has a plugin system that lets you build exactly what your team needs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fueboe301pcgw1er9zpdz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fueboe301pcgw1er9zpdz.png" alt="Sanity studio dashboard with real-time editing and preview" width="800" height="442"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;PS: this isn't my dashboard. Mine is not nearly as full, but I needed something to show the full effect.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Features of Sanity CMS
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Schema-as-Code and Content Lake&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Sanity’s core innovation is its “Content Lake”: a real-time, globally distributed datastore where content is stored as structured JSON documents. You define your content schema in JavaScript or TypeScript, version it in Git, and deploy changes like any other codebase. This means you can evolve your content model without downtime or risky migrations.&lt;/p&gt;
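&lt;p&gt;As a rough sketch, a document in Sanity’s schema-as-code style looks something like this. The field names here are my own illustration, not pulled from a real project; in an actual Studio you would pass this shape to &lt;code&gt;defineType()&lt;/code&gt; from the &lt;code&gt;sanity&lt;/code&gt; package:&lt;/p&gt;

```typescript
// Illustrative Sanity-style schema for a blog post document.
// Field names are assumptions for the example, not a real project's schema.
const post = {
  name: 'post',
  type: 'document',
  title: 'Post',
  fields: [
    { name: 'title', type: 'string' },
    { name: 'slug', type: 'slug', options: { source: 'title' } },
    { name: 'publishedAt', type: 'datetime' },
    // A reference field links this post to a separate author document
    { name: 'author', type: 'reference', to: [{ type: 'author' }] },
    // Portable Text: rich text stored as an array of structured blocks
    { name: 'body', type: 'array', of: [{ type: 'block' }] },
  ],
};

console.log(post.fields.length); // 5
```

&lt;p&gt;Because this lives in your repo, a schema change is just a pull request: reviewable, versioned, and revertible.&lt;/p&gt;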

&lt;p&gt;&lt;strong&gt;Query Language: GROQ (and GraphQL)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Sanity’s primary query language is &lt;a href="https://www.sanity.io/docs/content-lake/how-queries-work" rel="noopener noreferrer"&gt;GROQ&lt;/a&gt; (Graph-Relational Object Queries), a powerful, declarative language designed for content trees and references. GROQ lets you fetch exactly the shape of data you need, with projections, filters, and joins, all in a single request. For teams who prefer GraphQL, Sanity can auto-generate a GraphQL API from your schema, though some advanced features (like inline objects) may require tweaks.&lt;/p&gt;
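&lt;p&gt;To give a feel for the syntax, here is the kind of GROQ query you might write for a blog index page. The schema field names are assumptions, but &lt;code&gt;_type&lt;/code&gt;, the pipe-to-&lt;code&gt;order()&lt;/code&gt; form, slices, and projections are standard GROQ:&lt;/p&gt;

```typescript
// A GROQ query: filter, order, slice, and project in a single request.
// "post", "publishedAt", and "author" are illustrative schema names.
const query = `
  *[_type == "post"] | order(publishedAt desc)[0...10]{
    title,
    publishedAt,
    "authorName": author->name
  }
`;

// With @sanity/client this string would be passed to client.fetch(query).
console.log(query.includes('author->name')); // true
```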

&lt;p&gt;&lt;strong&gt;API Performance and Caching&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Sanity’s APIs are fast—think 32ms response times and 500+ concurrent queries per project. The &lt;strong&gt;Live CDN&lt;/strong&gt; ensures content updates propagate globally within 60 seconds, and when paired with frameworks like Next.js or Gatsby, you get near-instant cache invalidation and incremental static regeneration (ISR) for static-speed performance with real-time freshness.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TypeScript and SDK Quality&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Sanity’s SDKs are robust, with TypeScript support, codegen for schemas, and CLI tools for migrations and local development. You can generate types from your schema, get auto-completion in your IDE, and even use AI agents to scaffold new content models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Studio Customization and Plugins&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Sanity Studio is a fully customizable React app. Developers can build custom input components, document views, and tools (like SEO analyzers or campaign trackers) that integrate natively into the editor UI. Visual editing lets marketers preview and edit content directly on the live site, with drag-and-drop layouts and click-to-edit functionality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Authentication, Roles, and Access Control&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Sanity supports granular roles (Viewer, Contributor, Editor, Developer, Admin) and custom roles on enterprise plans. SSO integration (Okta, Azure AD, Google Workspace) is available for larger teams, and access can be scoped down to datasets or even individual documents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Migration and Import/Export&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Sanity provides CLI tools and migration guides for moving from legacy CMSs (like WordPress or Drupal) to structured content. You can script migrations, map fields, and transform HTML blobs into Portable Text (Sanity’s rich text format).&lt;/p&gt;
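&lt;p&gt;For a sense of what “Portable Text” means in practice, here is a simplified sketch of the JSON a migrated HTML paragraph becomes (illustrative, not the full spec; real documents also carry &lt;code&gt;_key&lt;/code&gt; values and mark definitions for links):&lt;/p&gt;

```typescript
// A minimal Portable Text block: one paragraph containing a bold span.
const block = {
  _type: 'block',
  style: 'normal',
  markDefs: [],
  children: [
    { _type: 'span', text: 'Hello from ', marks: [] as string[] },
    { _type: 'span', text: 'WordPress', marks: ['strong'] },
  ],
};

// Renderers walk the children and apply each span's marks
const plainText = block.children
  .map(function (span) { return span.text; })
  .join('');
console.log(plainText); // "Hello from WordPress"
```

&lt;p&gt;Storing rich text as structured data like this, rather than as an HTML blob, is what makes the same content renderable on the web, in apps, or anywhere else.&lt;/p&gt;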

&lt;p&gt;&lt;strong&gt;Security and Compliance&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Sanity is SOC2 Type 2 certified, GDPR compliant, and offers EU data residency. Daily backups and audit logs are available on enterprise plans. Note: HIPAA and ISO certifications are inherited from Google Cloud Platform, not held directly.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Contentful: The Enterprise Digital Experience Platform
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.contentful.com/" rel="noopener noreferrer"&gt;Contentful&lt;/a&gt; is the “safe” choice for enterprises. It’s a mature, cloud-based platform with robust APIs, a polished UI, and a massive ecosystem of integrations. You get structured content, localization, analytics, and personalization tools out of the box. &lt;a href="https://www.contentful.com/" rel="noopener noreferrer"&gt;Contentful&lt;/a&gt; is designed for global teams managing complex, multi-language, multi-channel content.&lt;/p&gt;

&lt;p&gt;For marketers, Contentful offers a user-friendly editor, localization workflows, and built-in analytics. For developers, it provides REST and GraphQL APIs, SDKs for every major language, and a stable, scalable infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fis2artgfhlf2e6lnikmn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fis2artgfhlf2e6lnikmn.png" alt="Screenshot of Contentful’s web app showing content modeling and localization features" width="800" height="368"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Features of Contentful
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Content Modeling and API-First Design&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Contentful lets you define custom content types (e.g., BlogPost, Author, Category) via a web UI or API. Each type has fields (text, media, references), and you can model relationships between entries. The API-first approach means every piece of content is accessible via REST or GraphQL endpoints, making it easy to power websites, apps, and more.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query Languages: REST and GraphQL&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Contentful supports both REST and GraphQL APIs. The REST API is stable, cache-friendly, and widely supported. The GraphQL API allows you to fetch exactly the fields you need, reducing over-fetching and improving performance for complex frontends.&lt;/p&gt;
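&lt;p&gt;A sketch of what that looks like in practice, assuming a hypothetical &lt;code&gt;BlogPost&lt;/code&gt; content type (the collection/items shape is how Contentful’s GraphQL API exposes entries):&lt;/p&gt;

```typescript
// A Contentful GraphQL query requesting only the fields a page needs.
// "blogPostCollection" assumes a content type named BlogPost.
const query = `
  query {
    blogPostCollection(limit: 10) {
      items {
        title
        slug
        author { name }
      }
    }
  }
`;

// This string is POSTed to the GraphQL endpoint for your space,
// authenticated with a Content Delivery API token; the JSON response
// mirrors the query shape, so there is no over-fetching.
console.log(query.includes('blogPostCollection')); // true
```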

&lt;p&gt;&lt;strong&gt;TypeScript and SDK Quality&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Contentful’s JavaScript SDK is now written in TypeScript, providing strong type safety, auto-completion, and codegen for your content models. You can generate TypeScript types from your schema, get autosuggestions for queries, and chain client modifiers for localization and link resolution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Localization and Personalization&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Contentful excels at localization: you can manage content in dozens of languages, with field-level translations and fallback strategies. The platform also supports personalization experiments, letting you A/B test content variants and track performance with built-in analytics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Caching, CDN, and Performance&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Contentful uses a global CDN to deliver content with low latency. API responses are cacheable, and you can use webhooks to trigger static site rebuilds or incremental static regeneration (ISR) in frameworks like Next.js. Asset delivery (images, files) is optimized via a dedicated asset API.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Authentication, Roles, and Access Control&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Contentful offers granular roles and permissions, SSO integration, and API tokens for secure access. You can control who can edit, publish, or view content at the space or environment level.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ecosystem and Integrations&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Contentful’s App Framework and Marketplace provide hundreds of integrations: e-commerce, analytics, marketing automation, and more. Webhooks and APIs make it easy to connect to CI/CD pipelines, static site generators, or custom workflows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Migration and Import/Export&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Contentful provides CLI tools for importing/exporting content, migrating schemas, and syncing environments. However, some users report friction when evolving content models at scale, due to rigid entry-reference structures and API rate limits.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security and Compliance&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Contentful is enterprise-ready: SOC2, GDPR, and ISO 27001 certified, with audit logs, SSO, and data residency options. Enterprise plans offer dedicated infrastructure and SLAs.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Strapi: The Open-Source, Self-Hosted Powerhouse
&lt;/h2&gt;

&lt;p&gt;&lt;a href="//strapi.io"&gt;Strapi&lt;/a&gt; is the CMS for developers who want total control. It’s open-source, built on Node.js, and can be self-hosted anywhere: from your laptop to AWS, DigitalOcean, or Strapi Cloud. You get a slick admin UI, auto-generated REST and GraphQL APIs, and a thriving plugin ecosystem.&lt;/p&gt;

&lt;p&gt;For marketers, &lt;a href="//strapi.io"&gt;Strapi&lt;/a&gt; offers a user-friendly editor and customizable workflows. For developers, it’s a playground for custom APIs, plugins, and integrations, with no vendor lock-in.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffg9q7zeqi8v1uyzea004.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffg9q7zeqi8v1uyzea004.png" alt="Screenshot of Strapi’s admin panel showing content types and API endpoints" width="800" height="507"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Features of Strapi
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Content Modeling and API Generation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Strapi lets you define content types (e.g., Article, Author, Tag) via a visual builder or code. Each type becomes an auto-generated REST and/or GraphQL endpoint, complete with CRUD operations. You can customize controllers, services, and routes as needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query Languages: REST and GraphQL&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;By default, Strapi exposes REST APIs for every content type. The optional GraphQL plugin adds a powerful Apollo-based endpoint, with schema auto-generation, custom resolvers, and a built-in playground for testing queries and mutations.&lt;/p&gt;
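&lt;p&gt;For example, a typical Strapi REST request for a blog listing might be assembled like this. The host and content type names are placeholders; the bracketed &lt;code&gt;populate&lt;/code&gt;/&lt;code&gt;filters&lt;/code&gt;/&lt;code&gt;pagination&lt;/code&gt; parameters follow Strapi’s REST conventions:&lt;/p&gt;

```typescript
// Building a Strapi REST query string: populate a relation, filter,
// and paginate. "articles" and "author" are illustrative names.
const params = new URLSearchParams({
  'populate': 'author',
  'filters[title][$containsi]': 'cms',
  'pagination[pageSize]': '10',
});

// The host is a placeholder for wherever you deployed Strapi.
const url = 'https://my-strapi-host/api/articles?' + params.toString();
console.log(url.includes('populate=author')); // true
```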

&lt;p&gt;&lt;strong&gt;TypeScript and SDK Quality&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Strapi’s core is now TypeScript-friendly, with type definitions, codegen, and strong typing for plugins and customizations. The community maintains SDKs for JavaScript, TypeScript, and popular frontend frameworks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Authentication, Roles, and Access Control&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Strapi ships with robust role-based access control (RBAC), JWT authentication, and support for social logins (Google, Facebook, etc.). You can define granular permissions for public, authenticated, and custom roles, and integrate with external identity providers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Plugin Ecosystem and Extensibility&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Strapi’s plugin system is a major strength: over 350 plugins cover everything from SEO and image optimization to custom fields, webhooks, and integrations. You can build your own plugins or extend the admin UI with React components.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deployment and Hosting&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Strapi can be self-hosted on any Node.js-compatible environment, or deployed to Strapi Cloud for managed hosting. You control the database (PostgreSQL, MySQL, MongoDB, SQLite), asset storage, and infrastructure. This means you’re responsible for scaling, backups, and security, but also free from SaaS constraints.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Caching, CDN, and Performance&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Performance depends on your hosting setup. Strapi supports CDN integration for assets, reverse proxies (NGINX, Traefik), and Redis or in-memory caching for APIs. Strapi Cloud includes a global CDN and DDoS protection on paid plans.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Migration and Import/Export&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Strapi provides CLI tools for migrating content, syncing environments, and exporting/importing data. You can script migrations, seed databases, and automate schema changes as part of your CI/CD pipeline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security and Compliance&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Strapi is SOC2 and GDPR compliant, with regular security audits and a transparent open-source codebase. You’re responsible for patching, SSL, and compliance when self-hosting.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. Ghost: The Writer’s Blogging Platform, Now Headless
&lt;/h2&gt;

&lt;p&gt;I was unable to get far with exploring &lt;a href="//ghost.org"&gt;Ghost&lt;/a&gt;. &lt;br&gt;
To "Get Started" on Ghost, you have to create an account and verify you are human by adding your bank details.&lt;br&gt;
In this &lt;a href="https://ghost.org/vs/substack/" rel="noopener noreferrer"&gt;Comparison article with substack&lt;/a&gt;, the Ghost said it supported 135 global currencies and accepted all international payment methods. Unfortunately (?), the Nigerian Naira isn't on the list. (we meuveee).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqaaba1rftosdqjdwtwfh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqaaba1rftosdqjdwtwfh.png" alt="ghost.org rejecting naira cards" width="800" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is a purely research-based piece on Ghost.&lt;/p&gt;

&lt;p&gt;&lt;a href="//ghost.org"&gt;Ghost&lt;/a&gt; started as a minimalist, open-source blogging platform, and in 2026, it’s evolved into a modern, headless CMS with a focus on publishing, newsletters, and paid memberships. Ghost is beloved by writers, journalists, and indie publishers who want speed, SEO, and full control without the plugin sprawl of WordPress or the lock-in of Substack (there's an entire article about this!).&lt;/p&gt;

&lt;p&gt;For marketers, &lt;a href="//ghost.org"&gt;Ghost&lt;/a&gt; offers a clean editor, built-in SEO, and audience growth tools. For developers, it provides a RESTful Content API, webhooks, and the option to self-host or use Ghost(Pro) for managed hosting.&lt;/p&gt;

&lt;h3&gt;
  
  
  Features of Ghost CMS
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Content Modeling and API&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Ghost’s content model is opinionated: you get Posts, Pages, Tags, Authors, and Members (for subscriptions). It’s not as flexible as Sanity or Strapi, but it’s perfect for blogs, newsletters, and publications. The RESTful Content API delivers published content in JSON, with endpoints for posts, pages, tags, authors, and settings.&lt;/p&gt;
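&lt;p&gt;A quick sketch of what fetching posts looks like. The site URL and key are placeholders, while the &lt;code&gt;/ghost/api/content/&lt;/code&gt; path and key-based authentication are how the Content API works:&lt;/p&gt;

```typescript
// The Ghost Content API is a read-only, key-authenticated REST API.
// Host and key below are placeholders.
const params = new URLSearchParams({
  key: 'CONTENT_API_KEY',
  include: 'tags,authors',
  limit: '5',
});

const url = 'https://demo.ghost.io/ghost/api/content/posts/?' + params.toString();
// A GET to this URL returns published posts as JSON under a "posts" key.
console.log(url.includes('/ghost/api/content/posts/')); // true
```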

&lt;p&gt;&lt;strong&gt;Editor Experience&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Ghost’s Koenig editor is Markdown-first, with support for rich embeds, images, and custom cards. The UI is distraction-free, fast, and optimized for writing flow. Memberships, comments, and newsletters are built in, no plugins required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TypeScript and SDK Quality&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Ghost’s JavaScript SDK wraps the REST API, making it easy to fetch content from any frontend (Next.js, SvelteKit, etc.). TypeScript support is solid, with type definitions and codegen for API responses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Authentication, Roles, and Access Control&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Ghost supports staff roles (Author, Editor, Admin, Owner) and member roles (free, paid, custom tiers). Access control is simple: staff manage content, members access gated posts and newsletters. SSO and advanced RBAC are available on higher plans.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Caching, CDN, and Performance&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Ghost(Pro) includes a global CDN, DDoS protection, and automatic caching for assets and API responses. Self-hosted Ghost can be paired with NGINX, Cloudflare, or any CDN for optimal performance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deployment and Hosting&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Ghost can be self-hosted (Node.js, MySQL, NGINX) or run on Ghost(Pro) for managed hosting. Ghost(Pro) handles updates, backups, SSL, and scaling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Migration and Import/Export&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Ghost provides import/export tools for posts, members, and settings. You can migrate from WordPress, Substack, or other platforms with minimal friction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security and Compliance&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Ghost(Pro) is GDPR compliant, with enterprise-grade security, two-factor authentication, and SSO on business plans. Self-hosted users are responsible for patching and compliance.&lt;/p&gt;




&lt;h2&gt;
  
  
  5. Hygraph: The GraphQL-Native, API-First CMS
&lt;/h2&gt;

&lt;p&gt;Hygraph (formerly GraphCMS) is the CMS for developers who love GraphQL. It’s API-first, SaaS-based, and designed for omnichannel content delivery.  Hygraph is popular with teams building complex apps, e-commerce sites, and multi-platform experiences.&lt;/p&gt;

&lt;p&gt;For marketers, Hygraph offers a visual editor, localization, and workflow tools. For developers, it’s a GraphQL playground with flexible schema modeling, content federation, and strong TypeScript support.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffjrs4qe9ah07j44vzlnh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffjrs4qe9ah07j44vzlnh.png" alt="Screenshot of hygraph's content dashboard" width="800" height="416"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Features of Hygraph
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Content Modeling and GraphQL APIs&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Hygraph lets you define content models (types, fields, relationships) via a visual builder or API. Every model is exposed as a GraphQL endpoint, with auto-generated queries, mutations, and filtering. You can federate content from remote sources, enabling unified querying across multiple backends.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query Language: GraphQL (Only)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Hygraph is GraphQL-native—there’s no REST API. This means you get precise, efficient queries, strong typing, and compatibility with modern frontend frameworks (Next.js, SvelteKit, etc.). The Management SDK allows programmatic schema changes and migrations, with full TypeScript support.&lt;/p&gt;
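&lt;p&gt;As a small illustration, everything, including pagination, happens inside the query itself. The &lt;code&gt;Post&lt;/code&gt; model and its fields here are assumptions for the example:&lt;/p&gt;

```typescript
// A Hygraph-style GraphQL query; "posts" and its fields assume an
// illustrative "Post" model defined in the project's schema.
const query = `
  query RecentPosts {
    posts(first: 10) {
      title
      slug
      coverImage { url }
    }
  }
`;

// POSTed to the project's content endpoint; because the schema is
// typed, tooling can validate this query before it ever runs.
console.log(query.includes('query RecentPosts')); // true
```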

&lt;p&gt;&lt;strong&gt;TypeScript and SDK Quality&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Hygraph provides developer-friendly SDKs, codegen for TypeScript types, and CLI tools for migrations and environment management. The API playground makes it easy to test queries and mutations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Authentication, Roles, and Access Control&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Hygraph supports role-based access control (RBAC), OAuth authentication, SSO, audit logs, and custom roles. You can manage permissions at the model, field, or environment level.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Caching, CDN, and Performance&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Hygraph delivers content via a global CDN, with middle-layer caching and predictable payloads. Performance is strong, with low latency and high concurrency limits on enterprise plans. You can use static site generators or ISR for optimal speed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ecosystem and Integrations&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Hygraph integrates with CRM, analytics, personalization, commerce, and marketing automation tools. The Marketplace offers ready-made apps and UI extensions, and webhooks enable custom workflows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Migration and Import/Export&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Hygraph supports GraphQL mutations for bulk imports, content federation for remote data, and CLI tools for schema migrations. Migration guides and support are available for onboarding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security and Compliance&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Hygraph is GDPR, CCPA, SOC2, and ISO 27001 compliant, with enterprise-grade security, data encryption, and manual/off-site backups.&lt;/p&gt;




&lt;h2&gt;
  
  
  Comparison Table: The Top 5 Headless CMS for Blogging in 2026
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;CMS&lt;/th&gt;
&lt;th&gt;Free Tier Limits&lt;/th&gt;
&lt;th&gt;Query Language(s)&lt;/th&gt;
&lt;th&gt;Best Use-Case&lt;/th&gt;
&lt;th&gt;Developer Experience Score (1–10)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Sanity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Unlimited admin users, free content updates, pay-as-you-go for API overages&lt;/td&gt;
&lt;td&gt;GROQ, GraphQL, REST&lt;/td&gt;
&lt;td&gt;Structured content, automation, multi-channel&lt;/td&gt;
&lt;td&gt;9.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Contentful&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;5 users, 100,000 API calls/mo, 50GB asset bandwidth&lt;/td&gt;
&lt;td&gt;REST, GraphQL&lt;/td&gt;
&lt;td&gt;Enterprise DXP, localization, integrations&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Strapi&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Free (self-hosted); Cloud: 2,500 API req/mo, 10GB storage&lt;/td&gt;
&lt;td&gt;REST, GraphQL&lt;/td&gt;
&lt;td&gt;Open-source, self-hosting, plugins&lt;/td&gt;
&lt;td&gt;8.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ghost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No free tier; from $15/mo for 1,000 members, unlimited posts/emails&lt;/td&gt;
&lt;td&gt;REST&lt;/td&gt;
&lt;td&gt;Blogging, newsletters, memberships&lt;/td&gt;
&lt;td&gt;7.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hygraph&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2 locales, 3 users, unlimited assets, 10 KB query size limit&lt;/td&gt;
&lt;td&gt;GraphQL&lt;/td&gt;
&lt;td&gt;GraphQL-native, content federation, omnichannel&lt;/td&gt;
&lt;td&gt;8.5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Developer Experience Score&lt;/strong&gt; is a subjective rating based on TypeScript support, SDK quality, local dev workflows, and extensibility.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts: Which Headless CMS Should You Choose in 2026?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Choose Sanity&lt;/strong&gt; if you want a developer-first, future-proof CMS with real-time collaboration, schema-as-code, and deep customization. It’s the best choice for teams who treat content as data and want to automate, scale, and innovate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Choose Contentful&lt;/strong&gt; if you’re an enterprise with global teams, complex localization, and a need for stability, integrations, and analytics. It’s the “safe” choice, but be prepared for higher costs and some rigidity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Choose Strapi&lt;/strong&gt; if you want open-source freedom, self-hosting, and total control over your backend. It’s ideal for developers who want to own their stack and avoid SaaS lock-in.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Choose Ghost&lt;/strong&gt; if you’re a writer, blogger, or indie publisher who wants a fast, SEO-friendly, and distraction-free platform with built-in memberships and newsletters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Choose Hygraph&lt;/strong&gt; if you’re building complex, API-driven apps with heavy GraphQL usage, content federation, and global scale. It’s the top pick for GraphQL-first teams.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flrgyopm32t5t68g7gz9w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flrgyopm32t5t68g7gz9w.png" alt="Flowchart guiding users to the right CMS based on team size, technical skills, and content needs" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's connect on &lt;a href="//www.linkedin.com/in/dumebi-okolo"&gt;LinkedIn&lt;/a&gt;!&lt;/p&gt;

</description>
      <category>typescript</category>
      <category>beginners</category>
      <category>webdev</category>
      <category>writing</category>
    </item>
    <item>
      <title>From Vibe Coding to Engineering: Building a Production-Ready Next.js 15 Blog with AI</title>
      <dc:creator>Dumebi Okolo</dc:creator>
      <pubDate>Wed, 21 Jan 2026 10:00:00 +0000</pubDate>
      <link>https://dev.to/dumebii/the-ultimate-prompt-strategy-how-to-vibe-code-production-ready-websites-4e9</link>
      <guid>https://dev.to/dumebii/the-ultimate-prompt-strategy-how-to-vibe-code-production-ready-websites-4e9</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkqcp9m770lpfyas2dy6y.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkqcp9m770lpfyas2dy6y.gif" alt="Snow white dusting a cupboard" width="498" height="284"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Dusting off the room because it has &lt;em&gt;been a minute or two since I was in here last!&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Throughout last year, I ran a lot of vibe-coded projects. Most were for writing demos; others were simply for the fun of it. &lt;br&gt;
However, with each new vibe-coded project, I kept getting super frustrated and super stuck with debugging AI's badly written (spaghetti) code.&lt;/p&gt;

&lt;p&gt;"Vibe Coding" has been the trend of the moment. The idea to me was basically, "Describe your app in plain English, and the AI handles the syntax." This was the approach I kept using that kept failing until now.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Vibe Coding is Ineffective
&lt;/h2&gt;

&lt;p&gt;Vibe coding is ineffective because most people treat AI like it's magic. They ask it for a feature, paste the code, and hope for the best. Usually, they get a messy file structure, insecure code, and a maintenance nightmare. The application might work on &lt;code&gt;localhost&lt;/code&gt;, but it lacks the rigor required for the real world.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Vibe-coded My Blog
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Goal&lt;/strong&gt;&lt;br&gt;
I wanted a technical blog that was "up to standard and safe." Coming from WordPress, where I built my blog (The Handy Developer's Guide) and lived for the better part of a year and a half, I wanted a platform I could own completely, built with modern engineering standards.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Solution&lt;/strong&gt;&lt;br&gt;
I didn't just ask the AI for code; I managed it. I adopted the mindset of a &lt;strong&gt;Senior Architect&lt;/strong&gt; and treated the AI as my junior developer. &lt;br&gt;
By enforcing strict constraints and architectural patterns, I used vibe coding to build a secure, production-ready application.&lt;br&gt;
The image below is where I started with Gemini. But it gets better down the line. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ldruwt9no5g41kwg5i2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ldruwt9no5g41kwg5i2.png" alt="good AI prompt for vibe coding" width="800" height="394"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Steps to Vibe Code a Production-Ready App
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Defining the Architecture of Your Project
&lt;/h3&gt;

&lt;p&gt;Before writing a single line of code, I had to define the stack. A standard AI prompt might suggest a generic React app or a rigid site builder. That was not enough.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Decision&lt;/strong&gt;&lt;br&gt;
I chose a &lt;strong&gt;Headless Architecture&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Frontend:&lt;/strong&gt; Next.js 15 (App Router)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo9zpnyax179asqhufovi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo9zpnyax179asqhufovi.png" alt="frontend choice" width="800" height="388"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Backend:&lt;/strong&gt; &lt;a href="//sanity.io"&gt;Sanity&lt;/a&gt; (Headless CMS)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Styling:&lt;/strong&gt; Tailwind CSS (v4)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why I Used Sanity as a Headless CMS to Build My Blog
&lt;/h3&gt;

&lt;p&gt;Separation of concerns is critical for long-term survival. With this architecture, I own the code, and I own the content.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Portability:&lt;/strong&gt; If I want to change the design next year, I don't lose my posts. They live safely in Sanity's database.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security:&lt;/strong&gt; There is no exposed database or admin panel for hackers to target on the frontend.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance:&lt;/strong&gt; Paired with Sanity, Next.js allows for Static Site Generation (SSG), meaning my pages load instantly.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key Takeaway&lt;/strong&gt;&lt;br&gt;
I did not let the AI pick the stack; I picked the stack, then told the AI how to build it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Fixing AI Hallucinations in Next.js 15
&lt;/h3&gt;

&lt;p&gt;The quality of the output depends entirely on the constraints of the input. I didn't just say, "Make a blog." I assigned a role and a standard.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Trick&lt;/strong&gt;&lt;br&gt;
I used a "System Prompt" strategy to set the ground rules before any code was written.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Prompt&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqggqgaje6t2ycx2xoj62.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqggqgaje6t2ycx2xoj62.png" alt="good prompt engineering for vibe coding" width="800" height="392"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The idea was to have one tab of Gemini 3 acting as the senior developer/project manager, while another tab acted as the engineer/dev on the ground. &lt;br&gt;
So, I had tab A give me the high-level prompts after I had already explained its role to it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Result&lt;/strong&gt;&lt;br&gt;
The AI didn't dump files in the root directory. It set up a professional folder structure (&lt;code&gt;lib/&lt;/code&gt;, &lt;code&gt;components/&lt;/code&gt;, &lt;code&gt;types/&lt;/code&gt;) and automatically created a &lt;code&gt;.env.local&lt;/code&gt; file for credentials. By explicitly banning &lt;code&gt;any&lt;/code&gt; types, the AI was forced to write interface definitions for my Post and Author schemas, preventing runtime crashes later.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F586m48zjgcqs10e1oz19.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F586m48zjgcqs10e1oz19.png" alt="AI acting like a project manager" width="800" height="532"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: How to Stop AI From Hardcoding API Keys
&lt;/h3&gt;

&lt;p&gt;Initially, I spun up a standalone Sanity Studio. I quickly realized this created redundancy—I didn't want to manage two separate projects. I directed the AI to refactor the architecture, merging the CMS directly into the Next.js application using an &lt;strong&gt;Embedded Studio&lt;/strong&gt;. &lt;br&gt;
This is how we managed it. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fva8rshbqrnnx31fz6joe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fva8rshbqrnnx31fz6joe.png" alt="I tell AI my mistake" width="800" height="613"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5zp89jg5h8n7sc99zfex.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5zp89jg5h8n7sc99zfex.png" alt="AI fixes coding error" width="800" height="689"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Result&lt;/strong&gt;&lt;br&gt;
I had a working CMS living independently at &lt;code&gt;/studio&lt;/code&gt; before I even had a homepage. This allowed me to write and structure content immediately, giving the frontend real data to fetch during development.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Using AI to Fix the Errors it Generated
&lt;/h3&gt;

&lt;p&gt;AI is not perfect. Even with a great prompt (I'd know), "hallucinations" happen. I had to do my fair share of debugging, but the errors were more minor than I remember vibe-coded errors being.&lt;br&gt;
We hit two major roadblocks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bug 1: The Route Group Conflict&lt;/strong&gt;&lt;br&gt;
I moved my layout files into a &lt;code&gt;(blog)&lt;/code&gt; route group to organize the code (this was totally my choice, by the way; even though the Project Manager tab suggested it, it said it was optional). Suddenly, "the internet broke." In my terminal, I got error messages about missing tags.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Issue:&lt;/strong&gt; The AI had created a layout hierarchy where the root &lt;code&gt;layout.tsx&lt;/code&gt; was missing the essential &lt;code&gt;&amp;lt;html&amp;gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;body&amp;gt;&lt;/code&gt; tags because I had moved them into the child group.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Fix:&lt;/strong&gt; We refactored the hierarchy. I established a "Root Layout" for the HTML shell and a "Blog Layout" for the Navbar and Footer.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbap876jdk1x9qb9o9oxi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbap876jdk1x9qb9o9oxi.png" alt="AI fixes" width="800" height="521"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bug 2: The "Broken Image" Saga&lt;/strong&gt;&lt;br&gt;
The homepage rendered, but every image was a broken icon. The URL looked correct, but the browser refused to load it.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Issue:&lt;/strong&gt; I already knew this was a security feature, not a bug. Next.js blocks external images by default to prevent malicious injection.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Fix:&lt;/strong&gt; I didn't panic. I just checked the configuration. I prompted the project manager tab to update &lt;code&gt;next.config.ts&lt;/code&gt; to explicitly whitelist &lt;code&gt;cdn.sanity.io&lt;/code&gt;. One server restart later, the images appeared.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Lesson&lt;/strong&gt;&lt;br&gt;
AI writes the code, but you have to check the config. And sometimes, you just have to turn it off and on again.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: Refining the UI in a Vibe Coding Project (current phase)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Design&lt;/strong&gt;&lt;br&gt;
We moved from a sort of skeleton UI to a professional UI. We implemented a "Glassmorphism" navbar with a blur effect and switched to a high-quality typography pairing (Inter for UI, Playfair Display for headings).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzezgyha1e1z1qwvn12mq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzezgyha1e1z1qwvn12mq.png" alt="AI UI change prompt" width="800" height="630"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Check If Your Blog is Up To Standard
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;SEO&lt;/strong&gt;&lt;br&gt;
"A blog that doesn't rank is a diary," said someone really famous. &lt;br&gt;
I had the AI implement &lt;strong&gt;Dynamic Metadata&lt;/strong&gt;. &lt;br&gt;
We used the &lt;code&gt;generateMetadata&lt;/code&gt; function to automatically pull the SEO title, description, and OpenGraph images from Sanity. Now, every link shared on social media looks professional.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Analytics&lt;/strong&gt;&lt;br&gt;
I wanted to know if people were reading, but I didn't want to invade their privacy, so we integrated Vercel Analytics, a privacy-friendly tracker that gives me the data I need without the cookie banners users hate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Proof&lt;/strong&gt;&lt;br&gt;
I ran a Google Lighthouse audit on the production build to verify our "Senior Architect" standards. The results spoke for themselves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Accessibility:&lt;/strong&gt; 100&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best Practices:&lt;/strong&gt; 96&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SEO:&lt;/strong&gt; 100&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F979bhtsf3o85d9cassp2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F979bhtsf3o85d9cassp2.png" alt="Google lighthouse score" width="800" height="249"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;My project manager assured me that this was a good score, especially seeing as my blog is not yet live. Getting it live will increase the score.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;I haven't launched the blog yet because I still have some work to do on it, and I haven't properly tested it. &lt;br&gt;
Having been writing articles on Playwright recently, I have learnt how to run extensive tests, simulating different browser and network conditions. &lt;br&gt;
In due time, though, the blog will be launched. &lt;br&gt;
I wrote this article because I wanted to share an update on one of the things I have been working on so far and how AI has helped me.&lt;/p&gt;

&lt;p&gt;Let me know what you think of my journey so far. &lt;br&gt;
Do you have any Vibe coding best practices? &lt;br&gt;
Do you think I am wasting my time and should learn actual programming skills?&lt;/p&gt;

&lt;p&gt;Whatever your opinions, I want to hear them!&lt;/p&gt;

&lt;p&gt;Find me on &lt;a href="//www.linkedin.com/in/dumebi-okolo"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>vibecoding</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Introduction to AI Agents: A Technical Overview for Beginners</title>
      <dc:creator>Dumebi Okolo</dc:creator>
      <pubDate>Thu, 04 Dec 2025 10:53:29 +0000</pubDate>
      <link>https://dev.to/dumebii/introduction-to-ai-agents-a-technical-overview-for-developers-4ih4</link>
      <guid>https://dev.to/dumebii/introduction-to-ai-agents-a-technical-overview-for-developers-4ih4</guid>
      <description>&lt;p&gt;Artificial intelligence has shifted from static prompt–response patterns to systems capable of taking structured actions. These systems are known as &lt;a href="https://cloud.google.com/discover/what-are-ai-agents" rel="noopener noreferrer"&gt;&lt;strong&gt;AI agents&lt;/strong&gt;&lt;/a&gt;. Although the term is often stretched in marketing, the underlying architecture is practical and grounded in well-understood software principles.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I took the 5-day AI agents intensive course with Google and Kaggle, and I promised myself to document what I learned each day. &lt;br&gt;
It's been a while since I took the course, but I have been putting off writing this article.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This article is part of a five-part series in which I go through each day of the course and share what I learned with you.&lt;/p&gt;

&lt;p&gt;Now, this article outlines the foundational concepts needed to build an AI agent. It also sets the stage for subsequent posts that will explore implementation details, tool integration, orchestration, governance, and evaluation. This is Day One of a multi-part technical series.&lt;/p&gt;


&lt;h2&gt;
  
  
  What Is an AI Agent?
&lt;/h2&gt;

&lt;p&gt;Technically, an AI agent is a &lt;strong&gt;software system that uses a language model, tools, and state management to complete a defined objective&lt;/strong&gt;.&lt;br&gt;
It operates through a controlled cycle of reasoning and action, instead of remaining a passive text generator.&lt;/p&gt;
&lt;h4&gt;
  
  
  A typical AI agent includes:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;model&lt;/strong&gt; for reasoning&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;set of tools&lt;/strong&gt; for retrieving information or executing operations&lt;/li&gt;
&lt;li&gt;An &lt;strong&gt;orchestrator&lt;/strong&gt; that manages the interaction between the model and those tools&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;deployment layer&lt;/strong&gt; for running the system at scale&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This structure turns a model from a text interface into an operational component that can support business processes or technical workflows.&lt;/p&gt;
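&lt;p&gt;As a rough mental model, the four components above can be wired together as plain data. This is a hedged sketch, not any real framework's API; the &lt;code&gt;AgentSpec&lt;/code&gt; name and its fields are illustrative assumptions.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Illustrative sketch only: the class name and fields are assumptions,
# not an actual agent-framework API.
@dataclass
class AgentSpec:
    model: str                                  # reasoning model identifier
    tools: Dict[str, Callable] = field(default_factory=dict)  # action surface
    orchestrator: str = "think-act-observe"     # control-loop strategy
    deployment: str = "local"                   # where the system runs

spec = AgentSpec(
    model="gemini-2.5-flash-lite",
    tools={"search": lambda query: f"results for {query}"},
)
print(spec.model, list(spec.tools))
```

&lt;p&gt;Each of the sections below maps onto one of these fields.&lt;/p&gt;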


&lt;h2&gt;
  
  
  The AI Agent Workflow: The Think–Act–Observe Cycle
&lt;/h2&gt;

&lt;p&gt;All agent systems follow a predictable control loop.&lt;br&gt;
This loop is essential because it governs correctness, safety, and resource usage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Mission Acquisition&lt;/strong&gt;&lt;br&gt;
The system receives a task, either from a user request or an automated trigger.&lt;br&gt;
Example: “Retrieve the status of order #12345.”&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Context Assessment&lt;/strong&gt;&lt;br&gt;
The agent evaluates available information:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prior messages&lt;/li&gt;
&lt;li&gt;Stored state&lt;/li&gt;
&lt;li&gt;Tool definitions&lt;/li&gt;
&lt;li&gt;Policy rules&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Reasoning Step&lt;/strong&gt;&lt;br&gt;
The model generates a plan.&lt;br&gt;
Example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Identify the correct tool for order lookup&lt;/li&gt;
&lt;li&gt;Identify the tool for shipping data retrieval&lt;/li&gt;
&lt;li&gt;Determine response structure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. Action Execution&lt;/strong&gt;&lt;br&gt;
The orchestrator calls the selected tool with validated parameters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Observation and Iteration&lt;/strong&gt;&lt;br&gt;
The agent incorporates tool output back into its context, reassesses the task, and continues until completion or termination.&lt;/p&gt;

&lt;p&gt;This controlled loop prevents uncontrolled behavior and supports predictable outcomes in production systems.&lt;/p&gt;
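&lt;p&gt;The five steps above can be compressed into a small, framework-free Python sketch. The toy tool, the keyword-based "reasoning", and the &lt;code&gt;max_steps&lt;/code&gt; cap are all illustrative assumptions standing in for a real model and real tools.&lt;/p&gt;

```python
# Framework-free sketch of the Think–Act–Observe loop. The toy tool,
# the keyword-based "reasoning", and max_steps are illustrative
# assumptions standing in for a real model and real tools.

def lookup_order(order_id: str) -> str:
    """Toy data-retrieval tool with a narrow, predictable contract."""
    return f"Order {order_id}: shipped"

TOOLS = {"lookup_order": lookup_order}

def run_agent(task: str, max_steps: int = 5) -> str:
    context = [f"task: {task}"]                 # 1. mission acquisition
    for _ in range(max_steps):                  # step limit regulates cost
        # 2-3. context assessment + reasoning (a stand-in for the model):
        needs_lookup = "order" in task and not any(
            c.startswith("Order") for c in context
        )
        if needs_lookup:
            observation = TOOLS["lookup_order"]("12345")  # 4. action execution
            context.append(observation)                   # 5. observation
        else:
            return context[-1]                  # objective met: final answer
    return "step limit reached"

print(run_agent("Retrieve the status of order #12345"))
```

&lt;p&gt;A production orchestrator replaces the keyword check with a model call, but the control flow, including the hard step limit, keeps this same shape.&lt;/p&gt;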


&lt;h2&gt;
  
  
  Core Architecture of an AI Agent System
&lt;/h2&gt;
&lt;h3&gt;
  
  
  1. Model Layer
&lt;/h3&gt;

&lt;p&gt;The model performs all reasoning.&lt;br&gt;
Selection depends on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Latency requirements&lt;/li&gt;
&lt;li&gt;Cost boundaries&lt;/li&gt;
&lt;li&gt;Task complexity&lt;/li&gt;
&lt;li&gt;Input/output formats&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Multiple models may be used for routing, classification, or staging tasks.&lt;br&gt;
However, initial implementations usually rely on a single model for simplicity.&lt;/p&gt;
&lt;h3&gt;
  
  
  2. Tool Layer
&lt;/h3&gt;

&lt;p&gt;Tools provide operational capability.&lt;br&gt;
A tool is a &lt;strong&gt;function with strict input/output schemas&lt;/strong&gt; and clear documentation.&lt;br&gt;
They fall into categories such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data retrieval (APIs, search functions, database operations)&lt;/li&gt;
&lt;li&gt;Data manipulation (formatting, filtering, transformation)&lt;/li&gt;
&lt;li&gt;Operational actions (ticket creation, notifications, calculations)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Effective tool design keeps actions narrow, predictable, and well-documented.&lt;br&gt;
Tools form the “action surface” of the agent and determine how reliably the system can complete assigned objectives.&lt;/p&gt;
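&lt;p&gt;To make "strict input/output schemas" concrete, here is a hedged sketch of a single tool. The function name, its parameters, and the validation rules are assumptions for illustration; the point is the narrow contract and the docstring a model would rely on when choosing the tool.&lt;/p&gt;

```python
# Hedged sketch of one tool on the agent's "action surface". The name,
# parameters, and validation rules are illustrative assumptions.

def create_ticket(title: str, priority: str) -> dict:
    """Create a support ticket.

    Args:
        title: Short, non-empty summary of the issue.
        priority: One of "low", "medium", "high".

    Returns:
        dict with keys "id", "title", "priority".
    """
    if not title:
        raise ValueError("title must be non-empty")
    if priority not in {"low", "medium", "high"}:
        raise ValueError('priority must be "low", "medium", or "high"')
    return {"id": 1, "title": title, "priority": priority}

print(create_ticket("Printer offline", "high"))
```

&lt;p&gt;The strict signature and docstring are what the model actually "sees" when deciding whether and how to invoke the tool, which is why narrow, well-documented contracts determine reliability.&lt;/p&gt;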
&lt;h3&gt;
  
  
  3. Orchestration Layer
&lt;/h3&gt;

&lt;p&gt;This layer supervises the system. It is responsible for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Running the reasoning loop&lt;/li&gt;
&lt;li&gt;Applying system rules&lt;/li&gt;
&lt;li&gt;Tracking state&lt;/li&gt;
&lt;li&gt;Managing tool invocation&lt;/li&gt;
&lt;li&gt;Handling errors&lt;/li&gt;
&lt;li&gt;Regulating cost and step limits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is also the layer where developers define the agent’s operational scope and boundaries.&lt;/p&gt;
&lt;h3&gt;
  
  
  4. Deployment Layer
&lt;/h3&gt;

&lt;p&gt;An agent becomes useful only when deployed as a service.&lt;br&gt;
A typical deployment includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An API interface&lt;/li&gt;
&lt;li&gt;Logging and observability&lt;/li&gt;
&lt;li&gt;Access controls&lt;/li&gt;
&lt;li&gt;Storage for session data or long-term records&lt;/li&gt;
&lt;li&gt;Continuous integration workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This layer ensures the agent behaves as a reliable software component rather than a prototype.&lt;/p&gt;


&lt;h2&gt;
  
  
  Capability Levels in AI Agents
&lt;/h2&gt;

&lt;p&gt;Understanding agent capability levels helps to set realistic expectations.&lt;/p&gt;
&lt;h3&gt;
  
  
  Level 0: Model-Only Systems
&lt;/h3&gt;

&lt;p&gt;The model answers queries without tools or memory.&lt;br&gt;
Suitable for text generation or explanation tasks.&lt;/p&gt;
&lt;h3&gt;
  
  
  Level 1: Tool-Connected Systems
&lt;/h3&gt;

&lt;p&gt;The model uses a small set of tools to complete direct tasks.&lt;br&gt;
Example: Querying external APIs for factual information.&lt;/p&gt;
&lt;h3&gt;
  
  
  Level 2: Multi-Step Systems
&lt;/h3&gt;

&lt;p&gt;The agent performs planning and executes sequences of tool calls.&lt;br&gt;
This level supports tasks that require intermediate decisions.&lt;/p&gt;
&lt;h3&gt;
  
  
  Level 3: Multi-Agent Systems
&lt;/h3&gt;

&lt;p&gt;Two or more agents collaborate.&lt;br&gt;
A coordinator routes tasks to specialized agents based on capability or domain.&lt;/p&gt;
&lt;h3&gt;
  
  
  Level 4: Self-Improving Systems
&lt;/h3&gt;

&lt;p&gt;Agents that can create new tools or reconfigure workflows based on observed gaps.&lt;br&gt;
Primarily research-grade today.&lt;/p&gt;


&lt;h2&gt;
  
  
  Building Your First Practical Agent
&lt;/h2&gt;

&lt;p&gt;Developers do not need a complex system to get a simple agent running.&lt;br&gt;
A small, well-defined project is enough to understand the architecture.&lt;/p&gt;

&lt;p&gt;Keep in mind that I ran all this code in &lt;a href="https://www.kaggle.com/code" rel="noopener noreferrer"&gt;Kaggle's Notebook&lt;/a&gt;, and we used Google's Gemini for the project. The screenshots accompanying the code blocks are from my own runs.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 1. Configure Your Gemini API Key
&lt;/h3&gt;

&lt;p&gt;Every ADK project must expose your Gemini API key to the runtime. This block sets the key as an environment variable, which the ADK automatically detects.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="c1"&gt;# Replace with your actual key or load it from your environment manager
&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GOOGLE_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_API_KEY_HERE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;API key configured.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgm47vwl6398gpdhfkxfk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgm47vwl6398gpdhfkxfk.png" alt="step one" width="800" height="256"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Step 2. Import ADK Core Components
&lt;/h3&gt;

&lt;p&gt;These are the foundational ADK modules we'll interact with: agent definitions, model bindings, runtimes, and built-in tools. This is the minimum import set required to stand up a functional agent.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.models.google_llm&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Gemini&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.runners&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;InMemoryRunner&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;google_search&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.genai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Step 3. Optional: Retry Settings
&lt;/h3&gt;

&lt;p&gt;LLM APIs occasionally return transient errors under heavy load. The retry configuration defines a standard exponential backoff strategy so your agent can recover automatically without failing user tasks.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;retry_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;HttpRetryOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;attempts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;exp_base&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;initial_delay&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;http_status_codes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;429&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;503&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;504&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
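&lt;p&gt;To see what these numbers imply, the &lt;em&gt;n&lt;/em&gt;th retry under plain exponential backoff waits roughly &lt;code&gt;initial_delay * exp_base ** n&lt;/code&gt; seconds. The formula below is an assumption about how these options combine (real HTTP clients typically add jitter and caps); it is only meant to show how quickly a base of 7 grows.&lt;/p&gt;

```python
# Hedged sketch: assumes delay_n = initial_delay * exp_base ** n, without
# the jitter or caps a real HTTP client may add on top.
def backoff_delays(attempts: int, exp_base: int, initial_delay: int) -> list:
    return [initial_delay * exp_base ** n for n in range(attempts)]

print(backoff_delays(5, 7, 1))  # [1, 7, 49, 343, 2401] seconds
```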






&lt;h3&gt;
  
  
  Step 4. Define Your First Agent
&lt;/h3&gt;

&lt;p&gt;This is the most important construct. An agent is defined by its behavior (instruction), identity (name/description), model, and available tools. The structure below is portable across any environment.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;root_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;helpful_assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A simple agent that can answer general questions.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;Gemini&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5-flash-lite&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;retry_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;retry_config&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant. Use web search for current information.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;google_search&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F47qdawy5yhgvid4wlg9n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F47qdawy5yhgvid4wlg9n.png" alt="Step 4" width="800" height="266"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Step 5. Create a Runner
&lt;/h3&gt;

&lt;p&gt;The Runner orchestrates conversations, tool calls, and message history. For prototyping, &lt;code&gt;InMemoryRunner&lt;/code&gt; is the simplest option because it requires no infrastructure or persistent storage.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;runner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;InMemoryRunner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;root_agent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Step 6. Run Your Agent
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;run_debug()&lt;/code&gt; executes a complete agent cycle—thought generation, tool selection, action execution, and final synthesis. This is the quickest way to validate that your agent is correctly wired.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;runner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_debug&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is Google&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s Agent Development Kit? What languages are supported?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwuhk6nrn6244pf8ytfcd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwuhk6nrn6244pf8ytfcd.png" alt="Step 6" width="800" height="429"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Step 7. Try a Query That Requires Live Information
&lt;/h3&gt;

&lt;p&gt;This example demonstrates that the agent will automatically invoke the Google Search tool when the prompt requires real-time information not contained in the model’s training data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;runner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_debug&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s the weather in London right now?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F19je34k9tzxulgl26hfj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F19je34k9tzxulgl26hfj.png" alt="Step 7" width="800" height="277"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Step 8. Scaffold an ADK Project Folder (Optional)
&lt;/h3&gt;

&lt;p&gt;ADK includes a CLI for generating full project scaffolds. This is useful when you're ready to move from experimentation into an actual multi-file agent application.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;adk create sample-agent &lt;span class="nt"&gt;--model&lt;/span&gt; gemini-2.5-flash-lite &lt;span class="nt"&gt;--api_key&lt;/span&gt; &lt;span class="nv"&gt;$GOOGLE_API_KEY&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Step 9. Launch the ADK Web UI (Optional)
&lt;/h3&gt;

&lt;p&gt;The ADK Web UI is a local development interface for inspecting agent traces, debugging tool calls, and testing messages. Start it from any terminal—no Kaggle or notebook integration required.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;adk web
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After launching, the UI becomes available at:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://localhost:8000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faov44s9s8m0ss02mm0d0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faov44s9s8m0ss02mm0d0.png" alt="agent UI" width="800" height="490"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;Moving forward, my subsequent articles will cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Designing reliable tool schemas&lt;/li&gt;
&lt;li&gt;Structuring agent instructions&lt;/li&gt;
&lt;li&gt;Using Model Context Protocol (MCP) in real applications&lt;/li&gt;
&lt;li&gt;Implementing human-in-the-loop workflows&lt;/li&gt;
&lt;li&gt;Tracking performance and diagnosing failures&lt;/li&gt;
&lt;li&gt;Hardening agents against incorrect tool usage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's all for Day 1! Can't wait to get back here for Day 2!&lt;br&gt;
Did you know that the 5-Day AI Agents Intensive Course is now publicly available to learn from? Head over &lt;a href="https://www.kaggle.com/learn-guide/5-day-agents" rel="noopener noreferrer"&gt;here&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;Let's connect:&lt;br&gt;
&lt;a href="//www.linkedin.com/in/dumebi-okolo"&gt;LinkedIn&lt;/a&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>agents</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Migraine Awareness Week 2025: Living With New Daily Persistent Headache (NDPH)</title>
      <dc:creator>Dumebi Okolo</dc:creator>
      <pubDate>Wed, 24 Sep 2025 15:23:36 +0000</pubDate>
      <link>https://dev.to/dumebii/migraine-awareness-week-2025-living-with-new-daily-persistent-headache-ndph-2jip</link>
      <guid>https://dev.to/dumebii/migraine-awareness-week-2025-living-with-new-daily-persistent-headache-ndph-2jip</guid>
      <description>&lt;p&gt;September 22nd to September 28th marks Migraine Awareness Week 2025, a week dedicated to raising awareness about chronic headache conditions. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Dear Dev Community, I know that this isn't developer-focused content, but as the name of this app implies, this is a community. And one of the benefits of a community is sharing your highs and lows. &lt;br&gt;
Today, I'm sharing a very big low of mine, hoping it reaches the right people and brings comfort to those who need it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;While migraines are well-known, there's another, lesser-discussed disorder that deserves attention: New Daily Persistent Headache (NDPH).&lt;br&gt;
I've lived with NDPH since 2018 (meaning my head hasn't stopped aching for 7+ years), and this is my story.&lt;/p&gt;


&lt;h2&gt;
  
  
  How The Headaches Began
&lt;/h2&gt;

&lt;p&gt;In 2018, my life changed suddenly. One day, my head began to ache - and the pain never went away. It wasn't just the constant headache that puzzled me. I was also overwhelmed by intense hunger pangs that seemed impossible to satisfy.&lt;br&gt;
I thought I had an endocrine problem, so I kept visiting specialists. Instead of answers, I got brushed aside. Some doctors accused me of exaggerating, others said I was just seeking attention. Those words hurt deeply, and the lack of support made the condition even harder to bear.&lt;br&gt;
Meanwhile, the hunger took a toll. I gained weight year after year, carrying not only the burden of constant pain but also the visible effects of a body I couldn't control.&lt;br&gt;
The hunger is still there, but I feel that I have grown a lot since 2018, and I am better able to handle and control myself. By my estimate, I have gained about 50–60kg (100–120lbs) since the onset of NDPH and the hunger pangs in 2018.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq76sum3fpq31dvorlyqx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq76sum3fpq31dvorlyqx.png" alt="talking about NDPH"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  The Long Journey to Diagnosis
&lt;/h2&gt;

&lt;p&gt;Over the years, I tried almost everything - painkillers, anti-seizure drugs, blood pressure medication, endless consultations. Nothing helped.&lt;br&gt;
Two years ago, I finally met &lt;strong&gt;Professor Enoch&lt;/strong&gt;, a neurosurgeon who diagnosed me with New Daily Persistent Headache (NDPH). His verdict was blunt: there's no cure, only management.&lt;br&gt;
It was both crushing and liberating. Crushing because there was no solution, liberating because I finally had a name for my condition. I wasn't imagining it.&lt;/p&gt;


&lt;h2&gt;
  
  
  What Is New Daily Persistent Headache (NDPH)?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff8vb8a58rjaz6tve4i0d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff8vb8a58rjaz6tve4i0d.png" alt="What is NDPH?"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;NDPH is a rare and stubborn primary headache disorder. Unlike migraines or tension headaches, it starts suddenly and without warning - many people remember the exact day it began - and then it becomes constant.&lt;/p&gt;

&lt;p&gt;Key features of NDPH include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Sudden onset:&lt;/strong&gt; A headache that begins one day and never goes away.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Duration:&lt;/strong&gt; Lasts for more than three months with no break.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Symptoms:&lt;/strong&gt; Can resemble migraines (throbbing pain, sensitivity to light and sound, nausea) or tension headaches (tight, pressing pain).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Causes:&lt;/strong&gt; Still unclear. It may follow infections, stressful life events, or appear spontaneously.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Unfortunately, NDPH is known for being difficult to treat, and many patients - myself included - go through years of trial and error with little relief.&lt;/p&gt;


&lt;h2&gt;
  
  
  Misdiagnosis and the Emotional Toll
&lt;/h2&gt;

&lt;p&gt;One of the toughest parts of this journey has been not being believed. Because NDPH is rare, most doctors don't recognise it right away. Instead, patients are bounced from one clinic to another, sometimes treated as if they're making it all up.&lt;br&gt;
This experience leaves scars. The pain itself is exhausting, but the dismissal from medical professionals adds another layer of suffering. For me, that rejection was almost as heavy as the headaches themselves.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft6q3ed73bgrnh6w18k5l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft6q3ed73bgrnh6w18k5l.png" alt="mis-diagnosing NDPH"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Learning to Manage Life With NDPH
&lt;/h2&gt;

&lt;p&gt;Today, I've stopped chasing miracle cures. I'm learning how to live with NDPH, and that shift in mindset has given me some peace.&lt;/p&gt;

&lt;p&gt;Here's what's helped me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Acceptance:&lt;/strong&gt; Realizing the headache may never fully go away.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Managing hunger:&lt;/strong&gt; Finding strategies to control the relentless food cravings that come with my condition. I often go on fasting periods to prove to my body that I, too, can win!&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Lifestyle adjustments:&lt;/strong&gt; Pacing myself, managing stress, and prioritizing rest.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Education and advocacy:&lt;/strong&gt; Speaking openly about NDPH to raise awareness.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's not a perfect solution, but it's how I take back some control.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why Awareness Matters During Migraine Awareness Week
&lt;/h2&gt;

&lt;p&gt;NDPH isn't the same as migraine, but both are life-altering headache disorders that deserve compassion, understanding, and research. By sharing my story during Migraine Awareness Week 2025, I want to remind people:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Chronic headaches are real, and they change lives.&lt;/li&gt;
&lt;li&gt;Patients deserve to be listened to, not dismissed.&lt;/li&gt;
&lt;li&gt;More research is urgently needed for conditions like NDPH.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkdtb4ju1bq3a6fmi8n6n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkdtb4ju1bq3a6fmi8n6n.png" alt="symptoms of NDPH"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Awareness won't cure me, but it can help shift how society responds to people living with invisible pain.&lt;/p&gt;


&lt;h2&gt;
  
  
  Does NDPH Count as a Disability?
&lt;/h2&gt;

&lt;p&gt;This is a question I often ask myself. On one hand, NDPH doesn't always show up on the outside, so people assume you're fine. But living with daily pain absolutely affects work, social life, and mental health in ways that can be disabling.&lt;br&gt;
Some organisations and countries recognise chronic headache disorders as disabilities, while others don't. For me, the label matters less than the recognition that this condition does make daily living harder. Still, it's an important question for society to wrestle with.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What do you think? Should NDPH be recognised as a disability?&lt;/strong&gt;&lt;/p&gt;



&lt;p&gt;✍️ Written in honour of Migraine Awareness Week 2025, and in hope of sparking conversation around New Daily Persistent Headache.&lt;/p&gt;



&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/x41yiKp7c1E"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Watching this explainer video will help. You might also spot my comment among the others. 😂😂 Ignore it, please!&lt;/p&gt;




&lt;p&gt;All the screenshots shared in this article are from &lt;a href="https://my.clevelandclinic.org/health/diseases/24098-new-daily-persistent-headache-ndph" rel="noopener noreferrer"&gt;Cleveland Clinic's article on New Daily Persistent Headache (NDPH)&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This is a one-off article. &lt;/p&gt;

</description>
      <category>watercooler</category>
      <category>mentalhealth</category>
      <category>wellbeing</category>
      <category>devhealth</category>
    </item>
  </channel>
</rss>
