EdgeComet

Open-source SEO infrastructure engine: caching, pre-caching, and JavaScript rendering for search engine bots (Googlebot, Bingbot) and AI crawlers (GPTBot, ClaudeBot, PerplexityBot).

Overview

EdgeComet sits in the bot request path as a layer between your site and crawler traffic. Caching is its core job: it serves prepared HTML in milliseconds, pre-renders JavaScript only for crawlers that cannot execute it, and refreshes that cache automatically so bots get complete, fast content without your origin rendering every page on every visit.

This approach allows bots to see your content, metadata, and structured data as intended, without requiring changes to your frontend code. It works with React, Vue, Angular, or any JavaScript framework that renders client-side. The engine is open source under Apache-2.0, so you can self-host it or use the managed EdgeComet service.

Features

  • Caching: Redis for distributed coordination and metadata, filesystem for HTML content. Cached pages served in milliseconds.
  • Pre-caching: Cache Daemon automatically recaches frequently crawled pages on idle capacity, keeping popular URLs fresh without re-rendering the whole site.
  • Rendering: Managed Chrome instance pool with automatic lifecycle handling, error recovery, and distributed locking to prevent duplicate renders.
  • Configuration: Rules defined globally, per-host, or per-URL pattern using exact matches, wildcards, or regex. Query parameter matching supported.
  • Dimensions: Separate cache entries for desktop, mobile, and tablet user agents.
  • Monitoring: Prometheus metrics, structured logging (Zap), stale-while-revalidate support.
  • Open source: Full engine published under Apache-2.0. Inspect, self-host, and extend it.

Use cases

Fast bot responses and crawl budget optimization: Cache serves prepared HTML from the filesystem in milliseconds regardless of origin speed, on any stack. Bypass mode caches origin responses without rendering for pages that don't need JS execution. Reducing response time lets search and AI bots crawl more of your pages in the same window.

Automatic pre-caching: The Cache Daemon keeps frequently crawled pages fresh in the background using only idle capacity, so popular URLs stay up to date without harming real-time performance.

JavaScript rendering: Bots receive fully rendered HTML while users get the client-side app. A first render takes 2-5s; cached responses return in under 10ms. Device-specific dimensions (desktop, mobile, tablet) are supported.
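
A minimal Go sketch of how per-dimension cache entries might be keyed. The substring-based user agent classification and the key layout are assumptions made for illustration, not the engine's actual rules, which are defined in configuration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// dimensionFor is a hypothetical classifier: it maps a user agent string
// to one of the three dimensions named above using simple substring checks.
func dimensionFor(userAgent string) string {
	ua := strings.ToLower(userAgent)
	switch {
	case strings.Contains(ua, "ipad") || strings.Contains(ua, "tablet"):
		return "tablet"
	case strings.Contains(ua, "mobile"):
		return "mobile"
	default:
		return "desktop"
	}
}

// cacheKey builds a hypothetical key so the same URL gets a separate
// cache entry per dimension. The URL is hashed to keep keys short.
func cacheKey(host, dimension, url string) string {
	sum := sha256.Sum256([]byte(url))
	return host + ":" + dimension + ":" + hex.EncodeToString(sum[:8])
}

func main() {
	desktopUA := "Mozilla/5.0 (compatible; Googlebot/2.1)"
	mobileUA := "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X) Mobile Safari/537.36 (compatible; Googlebot/2.1)"
	url := "https://example.com/your-page"

	for _, ua := range []string{desktopUA, mobileUA} {
		dim := dimensionFor(ua)
		fmt.Println(dim, cacheKey("example.com", dim, url))
	}
}
```

The point of the sketch is only that the dimension is part of the key, so a mobile render never overwrites the desktop entry for the same URL.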

AI crawler support: GPTBot, ClaudeBot, and PerplexityBot receive complete HTML with structured data, through the same rendering path used for search engine bots.

Quick start

Prerequisites

  • Go 1.24.2 or higher
  • Chrome or Chromium browser (headless mode)
  • Redis 6.0 or higher

Build

git clone https://github.com/EdgeComet/edgecomet
cd edgecomet

go build -o bin/edge-gateway ./cmd/edge-gateway
go build -o bin/render-service ./cmd/render-service
go build -o bin/cache-daemon ./cmd/cache-daemon

Run

Copy the sample configurations and edit them for your domain:

cp configs/sample/edge-gateway.yaml configs/my_edge-gateway.yaml
cp configs/sample/render-service.yaml configs/my_render-service.yaml

Start the services, each in its own terminal:

./bin/render-service -c configs/my_render-service.yaml
./bin/edge-gateway -c configs/my_edge-gateway.yaml

Test

curl -H "X-Render-Key: your-render-key" \
     -H "User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1)" \
     "http://localhost:10070/render?url=https://example.com/your-page"

The EC-Source response header shows how the response was produced:

  • EC-Source: render - Freshly rendered by Chrome
  • EC-Source: render_cache - Served from cache
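
The same check can be scripted. The Go sketch below repeats the curl request and reports the EC-Source value. It assumes EC-Source is returned as an HTTP response header and that the gateway accepts a percent-encoded url parameter; the helper names are illustrative only.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"time"
)

// buildRenderRequest mirrors the curl command: a GET to /render with the
// target page as the url query parameter, a render key, and a bot user agent.
func buildRenderRequest(gateway, renderKey, target string) (*http.Request, error) {
	endpoint := gateway + "/render?url=" + url.QueryEscape(target)
	req, err := http.NewRequest(http.MethodGet, endpoint, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("X-Render-Key", renderKey)
	req.Header.Set("User-Agent", "Mozilla/5.0 (compatible; Googlebot/2.1)")
	return req, nil
}

// describeSource translates the two EC-Source values listed above.
func describeSource(value string) string {
	switch value {
	case "render":
		return "freshly rendered by Chrome"
	case "render_cache":
		return "served from cache"
	default:
		return "unrecognized source: " + value
	}
}

func main() {
	req, err := buildRenderRequest("http://localhost:10070", "your-render-key", "https://example.com/your-page")
	if err != nil {
		fmt.Println("could not build request:", err)
		return
	}
	client := &http.Client{Timeout: 30 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("request failed (is the gateway running?):", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status, "-", describeSource(resp.Header.Get("EC-Source")))
}
```

Running it twice against the same URL should report a fresh render first and a cache hit second.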

For a step-by-step walkthrough, see the Quick Start guide.

Architecture

EdgeComet uses a multi-service architecture with clear separation of concerns:

Client Request
    ↓
Edge Gateway
    ↓
├─→ Cache Check (Redis + Filesystem)
│   └─→ Cache Hit → Return HTML
│
└─→ Cache Miss
    ↓
    Render Service (Chrome Pool)
    ↓
    ├─→ Acquire Chrome Instance
    ├─→ Execute JavaScript
    ├─→ Capture Rendered HTML
    └─→ Store in Cache
    ↓
Return HTML
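
The flow above can be sketched in Go as a cache-aside lookup. This is an in-memory illustration, not the engine's code: the map stands in for Redis plus the filesystem, the Renderer type stands in for the Render Service, and a single mutex stands in for the distributed lock that prevents duplicate renders.

```go
package main

import (
	"fmt"
	"sync"
)

// Renderer is a hypothetical stand-in for a call to the Render Service.
type Renderer func(url string) (string, error)

// Gateway is an in-memory stand-in for the Edge Gateway's cache path.
type Gateway struct {
	mu     sync.Mutex
	cache  map[string]string // url -> rendered HTML
	render Renderer
}

func NewGateway(r Renderer) *Gateway {
	return &Gateway{cache: make(map[string]string), render: r}
}

// Handle returns the HTML for url and the source it came from.
// The mutex is held across the render so two concurrent requests for the
// same URL cannot both trigger a render. It also serializes unrelated
// URLs, which keeps the sketch short but is not how a real gateway scales.
func (g *Gateway) Handle(url string) (string, string, error) {
	g.mu.Lock()
	defer g.mu.Unlock()

	if cached, ok := g.cache[url]; ok {
		return cached, "render_cache", nil // cache hit
	}
	html, err := g.render(url) // cache miss: render, then store
	if err != nil {
		return "", "", err
	}
	g.cache[url] = html
	return html, "render", nil
}

func main() {
	renders := 0
	g := NewGateway(func(url string) (string, error) {
		renders++
		return "<html><!-- rendered " + url + " --></html>", nil
	})

	for i := 0; i < 3; i++ {
		_, source, _ := g.Handle("https://example.com/your-page")
		fmt.Println(source)
	}
	fmt.Println("renders:", renders)
	// Prints: render, render_cache, render_cache, then "renders: 1".
}
```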

Components

Edge Gateway

Handles incoming requests, manages cache lookups, and coordinates with render services. Uses FastHTTP for high performance and supports bot detection, authentication, and flexible URL pattern matching.

Render Service

Manages a pool of headless Chrome instances, renders JavaScript pages, and registers availability in Redis. Restarts Chrome instances automatically so leaked memory does not accumulate, and handles graceful shutdown.

Cache Daemon (optional)

Monitors bot traffic and automatically triggers recaching for frequently accessed pages. Schedules recaching based on bot traffic patterns to keep popular content fresh.
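
One way to picture the selection step is a hit counter that ranks URLs by bot traffic. The Go sketch below is an assumption about the general idea only; the daemon's actual scheduling policy, thresholds, and storage are not described here, and the HitCounter type is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// HitCounter is a hypothetical in-memory tally of bot requests per URL.
type HitCounter struct {
	hits map[string]int
}

func NewHitCounter() *HitCounter {
	return &HitCounter{hits: make(map[string]int)}
}

// Record notes one bot request for url.
func (c *HitCounter) Record(url string) {
	c.hits[url]++
}

// TopN returns up to n URLs ordered by hit count, highest first.
// Ties are broken alphabetically so the result is deterministic.
func (c *HitCounter) TopN(n int) []string {
	if n < 0 {
		n = 0
	}
	urls := make([]string, 0, len(c.hits))
	for u := range c.hits {
		urls = append(urls, u)
	}
	sort.Slice(urls, func(i, j int) bool {
		if c.hits[urls[i]] != c.hits[urls[j]] {
			return c.hits[urls[i]] > c.hits[urls[j]]
		}
		return urls[i] < urls[j]
	})
	if n < len(urls) {
		urls = urls[:n]
	}
	return urls
}

func main() {
	c := NewHitCounter()
	for _, u := range []string{"/a", "/b", "/a", "/c", "/a", "/c"} {
		c.Record(u)
	}
	// The two most crawled pages are the recache candidates.
	fmt.Println(c.TopN(2)) // prints [/a /c]
}
```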

High availability and sharding

EdgeComet supports multi-instance deployments with distributed cache sharing across Edge Gateway instances. Cache sharding reduces cache misses in horizontal scaling scenarios and provides redundancy for high availability.

When you deploy multiple Edge Gateway instances with sharding enabled, the system automatically distributes cache entries across instances using consistent hashing. You configure a replication factor to control how many instances store each cache entry. For example, with replication factor 2, every rendered page is stored on two different Edge Gateway instances.

The distributed architecture tolerates instance failures gracefully. If an Edge Gateway goes offline, requests automatically route to remaining instances that hold the cache replicas. When you add new instances to the cluster, they automatically discover existing instances via Redis and begin participating in cache distribution.
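
A minimal Go sketch of consistent hashing with a replication factor, to make the placement and failover behavior concrete. The FNV hash, the virtual node count, and the instance IDs are assumptions chosen for the example; the engine's actual ring implementation may differ.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// Ring is a hash ring of virtual points, each owned by one instance.
type Ring struct {
	points []uint32          // sorted positions on the ring
	owner  map[uint32]string // position -> instance ID
}

func hashKey(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// NewRing places vnodes virtual points on the ring for every instance.
func NewRing(instances []string, vnodes int) *Ring {
	r := &Ring{owner: make(map[uint32]string)}
	for _, inst := range instances {
		for v := 0; v < vnodes; v++ {
			p := hashKey(fmt.Sprintf("%s#%d", inst, v))
			if _, taken := r.owner[p]; taken {
				continue // skip the rare hash collision
			}
			r.owner[p] = inst
			r.points = append(r.points, p)
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// Replicas walks clockwise from the key's position and returns the first
// `factor` distinct instances. Those instances store the cache entry.
func (r *Ring) Replicas(key string, factor int) []string {
	if len(r.points) == 0 {
		return nil
	}
	pos := hashKey(key)
	start := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= pos })
	seen := make(map[string]bool)
	var out []string
	for i := 0; i < len(r.points) && len(out) < factor; i++ {
		inst := r.owner[r.points[(start+i)%len(r.points)]]
		if !seen[inst] {
			seen[inst] = true
			out = append(out, inst)
		}
	}
	return out
}

func main() {
	key := "https://example.com/your-page"
	ring := NewRing([]string{"eg-1", "eg-2", "eg-3"}, 64)
	fmt.Println("replicas:", ring.Replicas(key, 2))
}
```

With replication factor 2, removing the first replica from the ring makes the second replica the new first choice for that key, which is why a single instance failure does not turn into a cache miss.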

Configuration

EdgeComet uses a three-level configuration hierarchy: global settings in edge-gateway.yaml, per-domain settings in hosts.d/, and per-route URL rules within each host. Settings merge at request time, with more specific levels overriding broader ones.

Sample configurations are in configs/sample/. For full details, see the Edge Gateway configuration reference.

Testing

The codebase carries 62k lines of tests against 26k lines of source (~2.4x ratio). Unit tests cover individual packages; acceptance tests spin up full service stacks with embedded Redis and real Chrome instances to verify rendering, caching, sharding, and invalidation end-to-end.

All test commands run from the tests/ directory:

cd tests

Unit tests

  • make unit - run tests for all internal packages
  • make unit-verbose - run with verbose output and coverage report

Acceptance tests

  • make test - run basic acceptance tests (single EG-RS configuration)
  • make basic FOCUS="..." - run specific test suites by name
  • make basic-verbose - run with verbose output for debugging
  • make basic-single FILE=... - run a single test file
  • make sharding - run sharding tests (multi-EG distributed cache)
  • make recache - run recache tests (automatic bot-triggered recaching)
  • make help - view all available test commands

Load testing

The load testing tool evaluates system performance under realistic traffic conditions. It requires a CSV file with URLs, sends concurrent requests to your Edge Gateway, and provides detailed metrics on response times, cache efficiency, and throughput.

Quick example:

cd tests/loadtest
go run main.go \
  -urls test.csv \
  -gateway http://localhost:10070 \
  -key your-api-key \
  -concurrency 10 \
  -duration 5m

Documentation

Monitoring

Metrics

EdgeComet exposes Prometheus metrics on a dedicated port. Configure it in your service YAML:

metrics:
  enabled: true
  listen: ":10079"  # Must differ from server.listen
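
The port constraint in the comment above can be checked before starting a service. The Go sketch below is a hypothetical pre-flight check, not part of the engine. It compares ports only, which is stricter than necessary when the two listeners bind to different hosts, and it assumes the gateway listens on ":10070" as in the Quick start.

```go
package main

import (
	"fmt"
	"net"
)

// validateListen returns an error when the metrics listener would
// use the same port as the main server listener.
func validateListen(serverListen, metricsListen string) error {
	_, serverPort, err := net.SplitHostPort(serverListen)
	if err != nil {
		return fmt.Errorf("server.listen: %w", err)
	}
	_, metricsPort, err := net.SplitHostPort(metricsListen)
	if err != nil {
		return fmt.Errorf("metrics.listen: %w", err)
	}
	if serverPort == metricsPort {
		return fmt.Errorf("metrics.listen port %s must differ from server.listen", metricsPort)
	}
	return nil
}

func main() {
	fmt.Println(validateListen(":10070", ":10079")) // prints <nil>
	fmt.Println(validateListen(":10070", ":10070")) // prints the conflict error
}
```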

Available metrics include:

  • Request rates and response times
  • Cache hit/miss ratios
  • Chrome pool utilization
  • Render success/failure rates
  • Service registry status

Logging

Structured logging with Zap provides detailed operational insights:

{
  "level": "info",
  "ts": "2025-01-08T12:34:56.789Z",
  "msg": "Request rendered successfully",
  "request_id": "abc123",
  "url": "https://example.com/page",
  "render_time_ms": 1234,
  "cache_stored": true
}

Log levels: DEBUG, INFO, WARN, ERROR
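
Because each log entry is a JSON object, it can be decoded with the standard library. The Go sketch below parses the sample entry shown above; the RenderLog struct is an illustration covering only the fields in that sample, and other messages may carry different fields.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// RenderLog covers the fields that appear in the sample entry above.
type RenderLog struct {
	Level        string    `json:"level"`
	Timestamp    time.Time `json:"ts"`
	Message      string    `json:"msg"`
	RequestID    string    `json:"request_id"`
	URL          string    `json:"url"`
	RenderTimeMS int       `json:"render_time_ms"`
	CacheStored  bool      `json:"cache_stored"`
}

// parseLogLine decodes one JSON log line.
func parseLogLine(line []byte) (RenderLog, error) {
	var entry RenderLog
	err := json.Unmarshal(line, &entry)
	return entry, err
}

func main() {
	line := []byte(`{
	  "level": "info",
	  "ts": "2025-01-08T12:34:56.789Z",
	  "msg": "Request rendered successfully",
	  "request_id": "abc123",
	  "url": "https://example.com/page",
	  "render_time_ms": 1234,
	  "cache_stored": true
	}`)

	entry, err := parseLogLine(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%s %s took %dms (cached=%t)\n",
		entry.Level, entry.URL, entry.RenderTimeMS, entry.CacheStored)
	// prints: info https://example.com/page took 1234ms (cached=true)
}
```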

Part of the broader EdgeComet platform

This repository is the open-source core of EdgeComet: the Edge Gateway, Render Service, and Cache Daemon, which handle caching, pre-caching, and rendering in the bot request path. The managed platform builds on the same in-path layer and adds Edge SEO, Log Analyzer, Evergreen Crawl, Alerting, and Search Analytics. Run the engine yourself, or use the managed service when you want these modules without operating the infrastructure.

License

Apache-2.0
