chore: update crawl4ai from 0.6.2 to 0.7.4 #769

Merged
leex279 merged 3 commits into main from crawl4ai-update
Oct 9, 2025

Conversation

@leex279 leex279 commented Oct 8, 2025

Collaborator

Summary

Updates Crawl4AI dependency from version 0.6.2 to 0.7.4, bringing performance improvements and stability fixes.

Key Improvements in v0.7.4

  • 🚀 LLM Table Extraction: Revolutionary table extraction with intelligent chunking for massive tables
  • Performance: Fixed dispatcher bug for better concurrent processing (arun_many)
  • 🔧 Browser Management: Resolved race conditions in concurrent page creation
  • 🔗 URL Processing: Enhanced handling of raw:// URLs and base tag link resolution
  • 🛡️ Proxy Support: Flexible proxy configuration supporting both dict and string formats

Changes

  • Updated crawl4ai==0.6.2 → crawl4ai==0.7.4 in python/pyproject.toml
  • Updated python/uv.lock with new dependencies
  • Added comprehensive update documentation in CRAWL4AI_UPDATE.md
  • Created test script for URL resolution verification

Testing Status

Completed:

  • All imports verified working
  • All 18 crawl orchestration tests passing
  • No breaking changes detected
  • API remains backward compatible
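
One way to confirm that an environment actually picked up the new pin is to query the installed package metadata. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and the version parsing simply ignores anything after the first non-numeric component (for example a `.post1` suffix).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def as_tuple(version_string):
    """Convert the leading numeric parts of 'X.Y.Z...' into a tuple of ints."""
    parts = []
    for part in version_string.split("."):
        if not part.isdigit():
            break  # stop at the first non-numeric component
        parts.append(int(part))
    return tuple(parts)


found = installed_version("crawl4ai")
if found is None:
    print("crawl4ai is not installed in this environment")
elif as_tuple(found) >= as_tuple("0.7.4"):
    print(f"crawl4ai {found} is at least 0.7.4")
else:
    print(f"crawl4ai {found} is older than 0.7.4")
```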

⚠️ CRITICAL: URL Resolution Bug Testing Required

Background

A critical bug was documented in v0.6.2 (see crawler-test branch):

  • Bug: ../../ relative paths only go up ONE directory instead of TWO
  • Impact: ~80% URL failure rate on documentation sites with deep nesting
  • Example: ../../guide/page.html incorrectly becomes .../reference/guide/page.html
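
For reference, standard relative-URL resolution of the example above can be checked with Python's `urllib.parse.urljoin`. This is a minimal sketch: the base URL is a made-up placeholder rather than a page from an affected site, and it says nothing about how Crawl4AI resolves links internally.

```python
from urllib.parse import urljoin

# Hypothetical base page, nested two directories below /api/latest/.
base = "https://example.com/api/latest/reference/services/page.html"
href = "../../guide/page.html"

# Correct behaviour: each "../" removes one directory from the base path.
expected = urljoin(base, href)
print(expected)  # https://example.com/api/latest/guide/page.html

# The bug described above behaves as if only one "../" were applied.
buggy = urljoin(base, "../guide/page.html")
print(buggy)  # https://example.com/api/latest/reference/guide/page.html
```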

Status in v0.7.4

Unknown. Testing is required before production deployment.

Required Testing Before Merge

cd python
uv run python test_url_resolution_fix.py

This test will:

  1. Crawl AWS Boto3 documentation (known affected site)
  2. Check if ../../guide/paginators.html resolves correctly
  3. Verify if bug is fixed or still present
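
The comparison in step 2 could be sketched as below. This is not the contents of test_url_resolution_fix.py: `find_misresolved` and the sample data are hypothetical, and the crawl itself is left out so the check can run offline. It compares each URL the crawler reported against what `urljoin` yields for the raw href.

```python
from urllib.parse import urljoin


def find_misresolved(page_url, links):
    """Return (href, got, expected) for every link resolved incorrectly.

    `links` is a list of (raw_href, resolved_url) pairs; how they are
    obtained from the crawler is outside the scope of this sketch.
    """
    problems = []
    for href, got in links:
        expected = urljoin(page_url, href)
        if got != expected:
            problems.append((href, got, expected))
    return problems


# Hypothetical sample: the second pair shows the "one level up" symptom.
page = "https://example.com/api/latest/reference/services/s3.html"
sample = [
    ("../../guide/paginators.html",
     "https://example.com/api/latest/guide/paginators.html"),
    ("../../guide/paginators.html",
     "https://example.com/api/latest/reference/guide/paginators.html"),
]

for href, got, expected in find_misresolved(page, sample):
    print(f"{href}: got {got}, expected {expected}")
```

An empty result means every reported link matches standard resolution; any entry points at a link worth inspecting by hand.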

Do not merge until this test is run and results are documented.

Deployment Checklist

  • Run URL resolution bug test and document results
  • Rebuild Docker images
  • Test crawling in development environment
  • Verify no regressions on documentation sites (AWS Boto3, etc.)
  • Monitor initial production crawls

Related Issues

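Crawl4AI issues fixed in v0.7.x:

  • #570: General relative URL handling
  • #1268: URLs after redirects
  • #1323: Trailing slash base URL handling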

References

Files Changed

  • python/pyproject.toml - Updated crawl4ai version
  • python/uv.lock - Updated dependency lock file
  • CRAWL4AI_UPDATE.md - Comprehensive update documentation
  • python/test_url_resolution_fix.py - Test script for URL bug verification

Summary by CodeRabbit

  • Chores
    • Updated the crawl4ai dependency to a newer version to keep the platform current and compatible with upstream changes.
    • No user-facing functionality changes; existing features continue to work as before.
    • Minor stability and compatibility improvements may be observed in environments leveraging crawling capabilities.

Updates crawl4ai dependency to latest stable version with performance
and stability improvements.

Key improvements in 0.7.4:
- LLM-powered table extraction with intelligent chunking
- Fixed dispatcher bug for better concurrent processing
- Resolved browser manager race conditions
- Enhanced URL processing and proxy support

All existing tests pass (18/18). No breaking changes identified.
API remains backward compatible.

⚠️ IMPORTANT: URL Resolution Bug Status
A critical bug in v0.6.2 where ../../ paths only go up ONE directory
instead of TWO has been documented (see crawler-test branch). Status
in v0.7.4 is UNKNOWN - testing required before production deployment.

Test script provided: python/test_url_resolution_fix.py

Related issues fixed in v0.7.x:
- #570: General relative URL handling
- #1268: URLs after redirects
- #1323: Trailing slash base URL handling

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

coderabbitai Bot commented Oct 8, 2025

Walkthrough

Updated the crawl4ai dependency version from 0.6.2 to 0.7.4 in python/pyproject.toml for the server and all dependency groups. No other files or logic changed.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Dependency version bump (python/pyproject.toml) | Updated crawl4ai from 0.6.2 to 0.7.4 in [server] and [all] dependency groups. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Poem

I nuzzle the deps with a gentle tap,
Bumping the version—no tricky trap.
From 0.6 to 0.7 I hop and play,
Fresh carrots for builds today. 🥕
Tiny change, tidy toes—hip-hop, hooray!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description Check | ⚠️ Warning | The description provides a thorough summary, lists changes, and details testing status, but it does not follow the repository’s prescribed PR template since it omits the Type of Change section, Affected Services checklist, the standardized Testing section with checkboxes and evidence, and other required template headings. | Please update the PR description to include all required template sections, including Type of Change with the appropriate option selected, Affected Services checklist, the Testing section with checkboxes and test evidence, the general Checklist, and any Breaking Changes information. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title Check | ✅ Passed | The title succinctly describes the primary change by specifying the updated dependency and its version bump, directly reflecting the PR’s main purpose without extraneous information. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a580fdf and 00fe259.

⛔ Files ignored due to path filters (1)
  • python/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (1)
  • python/pyproject.toml (2 hunks)

@leex279 leex279 marked this pull request as ready for review October 9, 2025 14:04
@leex279 leex279 merged commit e6d538f into main Oct 9, 2025
8 checks passed
leonj1 pushed a commit to leonj1/Archon that referenced this pull request Oct 13, 2025
chore: update crawl4ai from 0.6.2 to 0.7.4
@Wirasm Wirasm deleted the crawl4ai-update branch April 6, 2026 07:37