Add crawl4ai docs functionality and update related files #85
Added some missing code, refined the prompt, and added a small example.
Final issues resolved. You may have done this differently than I have. I was not sure whether to add this as a separate AI coder, but implementation would be difficult because we need to use both code sources.

I asked the agent a very random query: "can you provide me an advanced website agent using crawl4ai and pydantic.ai using async, headless browser etc". The output (abridged here to its top-level lines) was:

- `from __future__ import annotations as _annotations`
- `import os`
- `from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig`
- `logfire.configure(send_to_logfire='if-token-present')`
- `@dataclass`
- `advanced_crawler_agent = Agent(`
- `@advanced_crawler_agent.tool`
- `async def main():`
- `if __name__ == '__main__':`

Then I asked: "refine this so we can use crawler run config and cache mode for crawl4ai". The outputted code (again abridged) was:

- `from __future__ import annotations`
- `import os`
- `from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode`
- `@dataclass`
- `advanced_crawl_agent = Agent(`
- `@advanced_crawl_agent.tool`
- `async def main():`
- `if __name__ == '__main__':`
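For reference, here is a minimal sketch of how `CrawlerRunConfig` and `CacheMode` can be wired into an async headless crawl. It is not the agent's output quoted above. The function names `crawl_markdown` and `crawl_markdown_sync` and the choice of `CacheMode.BYPASS` are assumptions made for illustration, and the crawl4ai import sits inside the function so the module still loads where crawl4ai is not installed.

```python
import asyncio


async def crawl_markdown(url: str) -> str:
    """Fetch one page with a headless browser and return its markdown."""
    # Lazy import (an illustrative choice): importing this module does not
    # require crawl4ai, only calling the function does.
    from crawl4ai import AsyncWebCrawler, BrowserConfig, CacheMode, CrawlerRunConfig

    browser_config = BrowserConfig(headless=True)
    # BYPASS skips the local cache so each run fetches fresh content;
    # CacheMode.ENABLED would reuse previously cached pages instead.
    run_config = CrawlerRunConfig(cache_mode=CacheMode.BYPASS)

    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(url=url, config=run_config)
        if not result.success:
            raise RuntimeError(f"crawl failed for {url}: {result.error_message}")
        return str(result.markdown)


def crawl_markdown_sync(url: str) -> str:
    """Blocking convenience wrapper (hypothetical helper) for scripts."""
    return asyncio.run(crawl_markdown(url))
```

Calling `crawl_markdown_sync("https://example.com")` from a script would launch the browser, fetch the page, and return its markdown. Inside an agent tool the coroutine would be awaited directly instead.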
Then I asked: "make me an advanced Crawl4AI and Pydantic AI agent implementation incorporating asynchronous crawling, headless browser, comprehensive site map crawling, and markdown conversion, using the Google Vertex AI Gemini model properly via the documentation provided". The output was actually quite good, even though it is still not complete. An abridged outline of what it returned:

Folder & File Structure: your_agent_project/

requirements.txt

- pydantic-ai

.env.example

- Google Vertex credentials, set up following Vertex AI guidelines.
- `GOOGLE_APPLICATION_CREDENTIALS=path/to/credentials.json`

agent.py

- `from pydantic_ai import Agent, RunContext`
- `load_dotenv()`
- `@dataclass`
- `agent = Agent(`
- `@agent.tool`

agent_tools.py

- `from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode`
- `async def fetch_sitemap(url):`
- `async def parse_sitemap(xml_content):`
- `async def crawl_page(crawler, url):`
- `async def full_site_crawl_and_markdown(base_url):`

agent_prompts.py

- `SYSTEM_PROMPT = """`

Quick Run Example

Create a new file main_async.py:

- `import asyncio`
- `load_dotenv()`
- `deps = Deps(google_credentials=os.getenv("GOOGLE_APPLICATION_CREDENTIALS"))`
- `async def main():`
- `if __name__ == "__main__":`

Explanation and Setup Instructions

To run the agent:

- `pip install -r requirements.txt`
- `python main_async.py`

Agent Features:

Would you like me to further refine this implementation autonomously, or would you prefer to review it and provide feedback for specific updates?
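The outline names a `parse_sitemap(xml_content)` helper but its body is not shown. As a rough sketch of what such a step might do, the following extracts every `<loc>` URL from a sitemap using only the standard library. It assumes the standard sitemaps.org namespace, and it is written as a plain function rather than the `async def` in the outline, since parsing does no I/O. The `SAMPLE` document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for <urlset> and <sitemapindex>.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_content: str) -> list[str]:
    """Return every <loc> URL found in a sitemap or sitemap index document."""
    root = ET.fromstring(xml_content)
    urls = []
    for loc in root.findall(".//sm:loc", SITEMAP_NS):
        # Skip empty elements and trim whitespace around the URL.
        if loc.text and loc.text.strip():
            urls.append(loc.text.strip())
    return urls


# Hypothetical sample input, not taken from the PR.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc> https://example.com/docs </loc></url>
</urlset>"""

print(parse_sitemap(SAMPLE))
```

Because a sitemap index also uses `<loc>` elements, the same function returns the child sitemap URLs when given an index, and a caller could fetch and parse those in turn.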
…ublish-method-and-remove-encodes fix: publish events with string NATS subjects
* Migrate from Node.js/npm/Jest to Bun runtime

  - Replace npm with bun for package management (bun.lock)
  - Replace Jest with bun:test for testing
  - Update tsconfig for Bun (ESNext module, bundler resolution)
  - Update Dockerfile to use oven/bun:1-slim
  - Update CI workflow to use oven-sh/setup-bun@v2
  - Remove dynamic import hack from codex.ts (direct ESM imports)
  - Fix test mocking for Bun (export execFileAsync, use spyOn)
  - Update all documentation (CLAUDE.md, README.md, CONTRIBUTING.md)

  All 395 tests pass, type-check passes, E2E validated with curl.

* ci: retrigger CI build
* fix: make execFileAsync a function for better Bun mockability
* fix: ensure execFileAsync returns string not Buffer
* fix: rename _execFileAsync to comply with naming convention
* fix: make mkdirAsync mockable for Bun tests
* fix: update engines to bun>=1.0.0 and add mkdirAsync mock
* fix: pin Bun to 1.3.4 in CI to fix mock.module test failures

  Newer Bun versions have different mock.module() behavior that causes cross-test module pollution, resulting in 71 test failures in CI while tests pass locally. Pinning to 1.3.4 ensures consistent behavior.

* fix: run orchestrator tests last to avoid mock.module pollution

  Bun's mock.module() pollutes the global module cache, causing tests to fail when orchestrator.test.ts (which mocks command-handler and factory) runs before those modules' own test files. Fix by running tests in two batches:

  1. All tests except orchestrator
  2. Orchestrator tests last

  This ensures orchestrator's mocks don't affect other test files.
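The two-batch ordering described in the last commit can be sketched as a small partition step. This is an illustration in Python, not the project's actual CI script. The function name `split_test_batches` and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def split_test_batches(test_files, isolated_name="orchestrator.test.ts"):
    """Split test files so the module-mocking suite lands in a final batch."""
    first, last = [], []
    for path in test_files:
        # Match on the file name only, so the directory layout does not matter.
        target = last if PurePosixPath(path).name == isolated_name else first
        target.append(path)
    return first, last


# Hypothetical file layout used only to demonstrate the split.
files = [
    "src/orchestrator/orchestrator.test.ts",
    "src/handlers/command-handler.test.ts",
    "src/clients/factory.test.ts",
]
first_batch, last_batch = split_test_batches(files)
print(first_batch)
print(last_batch)
```

Each batch would then be handed to its own test-runner invocation, with the orchestrator batch second, so that its module mocks are registered only after the other files have finished.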
#1483)

* chore(deps): remove stale package-lock.json to clear Dependabot noise

  This file was deleted in #85 (Bun migration) but accidentally re-committed in #89, unrelated to that PR's actual fix. It hasn't been touched since April and isn't used by anything (CI runs `bun install`), but Dependabot keeps scanning it: every one of the 21 open alerts triaged in #1353 is against this file, not bun.lock. Removing it closes all 21 alerts.

  The axios `^1.15.0` override in package.json stays. It is doing real work for the bun tree because @slack/bolt pulls in a vulnerable axios transitively (CVE-2025-62718).

  Add package-lock.json (and yarn/pnpm lockfiles) to .gitignore so this can't silently slip back in.

  Closes #1353

* chore(deps): patch four runtime CVEs in bun.lock via overrides

  Targets #1353 alerts that resolve in the actual runtime tree (bun.lock), not just the stale package-lock.json removed in the previous commit.

  Added overrides:

  - follow-redirects ^1.16.0: auth-header leak on cross-domain redirect (GHSA-r4q5-vmmm-2653); via @slack/bolt
  - path-to-regexp ^8.4.2: DoS via sequential optional groups (CVE-2026-4926, CVE-2026-4923); via @slack/bolt + claude-agent-sdk
  - qs ^6.15.1: arrayLimit bypass DoS (CVE-2025-15284, CVE-2026-2391); via @slack/bolt
  - flatted ^3.4.2: prototype pollution in parse() (CVE-2026-33228); dev-only via eslint chain

  bun audit confirms each resolves to a single non-vulnerable version across the tree. bun run validate green. No code changes, purely transitive bumps; we don't import any of these directly.

  Skipped (require deeper triage): undici, lodash, picomatch. Each has multiple major versions resolved in the bun tree, so a single override would force-downgrade other consumers.