Add caddl plugin for 3D/CAD asset extraction #1727
base: dev
Implements a new plugin to download 3D and CAD assets from web pages.

Features:
- Detects and downloads files with 3D/CAD extensions (.blend, .stl, .obj, .gltf, .glb, .fbx, .vrm, .usdz, etc.)
- Parses HTML from singlefile/dom extractors to find asset URLs
- Configurable timeout, max file size, SSL verification, and user agent
- Uses curl for downloads (already available in most systems)
- Isolated plugin that doesn't depend on ArchiveBox core
- Includes tests and UI templates

Addresses issue #668

Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>
3 issues found across 8 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="archivebox/plugins/caddl/templates/thumbnail.html">
<violation number="1" location="archivebox/plugins/caddl/templates/thumbnail.html:3">
P2: Missing `display: flex;` on the inner div. The `flex-direction: column` property has no effect without it, so the emoji and '3D Asset' text won't stack vertically as intended.</violation>
</file>
<file name="archivebox/plugins/caddl/config.json">
<violation number="1" location="archivebox/plugins/caddl/config.json:6">
P2: Missing `CADDL_BINARY` configuration for curl path. All other plugins that use external binaries (wget, ytdlp, gallerydl, singlefile) define a `*_BINARY` property to allow configuring the binary path. For consistency, consider adding:
```json
"CADDL_BINARY": {
  "type": "string",
  "default": "curl",
  "description": "Path to curl binary"
}
```</violation>
</file>
<file name="archivebox/plugins/caddl/on_Snapshot__65_caddl.bg.py">
<violation number="1" location="archivebox/plugins/caddl/on_Snapshot__65_caddl.bg.py:170">
P1: Path traversal vulnerability: filenames consisting only of dots (e.g., `..`) pass through sanitization and can escape the output directory. Add validation to reject or replace filenames that are `.` or `..`, or use `resolve()` to verify the final path stays within `output_dir`.</violation>
</file>
Reply to cubic to teach it or ask questions. Tag @cubic-dev-ai to re-run a review.
@claude unfortunately I think this basic approach won't work that well in practice. Most STL sites like Thangs, Thingiverse, etc. require solving a captcha to download STLs. At a minimum I think we need to convert this to use the Chrome tab and Puppeteer for everything instead of singlefile and curl, so it uses the logged-in user session. See the infiniscroll plugin for an example.
Claude finished @pirate's task in 3m 36s

Converting caddl plugin to use Chrome + Puppeteer ✓

Understanding the request: Most STL sites require captcha solving, so we need to use Chrome + Puppeteer with the logged-in user session instead of singlefile + curl.

Changes Made

Converted from Python + curl to JavaScript + Puppeteer:

Why This Approach Is Better

The Chrome + Puppeteer approach solves the core issue you identified:

This matches the infiniscroll plugin pattern and should work much more reliably in practice.

Note on the Cubic Review Issues

The three issues identified by cubic-dev-ai in the review are now resolved:

Issue #3 (the CSS in thumbnail.html) still exists. Would you like me to fix that as well?
- Rewrite implementation in JavaScript using puppeteer-core
- Connect to existing Chrome session via CDP for authenticated downloads
- Removes need for curl binary and cookie/SSL configuration
- Fixes issues with sites requiring captcha/login (Thingiverse, Thangs, etc.)
- Updates tests to reflect JavaScript implementation
- Adds path traversal protection for filenames

Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>
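The path traversal protection mentioned in this commit can be illustrated with a minimal sketch. It is not the code from the PR: the helper name `safeOutputPath`, the fallback name `download`, and the character allowlist are assumptions made for illustration, and only `sanitizeFilename` borrows its name from the review comments. The idea is to reject names made only of dots and then confirm, after resolving, that the final path is still inside the output directory.

```javascript
const path = require('path');

// Hypothetical helper: reduce an untrusted name to one safe path component.
function sanitizeFilename(name) {
  // Keep only the last component, whichever separator was used.
  const base = String(name).split(/[\\/]/).pop() || '';
  // Replace anything outside a conservative allowlist (an assumption).
  const cleaned = base.replace(/[^A-Za-z0-9._-]/g, '_');
  // Names made only of dots ('.', '..', '...') are replaced outright.
  if (cleaned === '' || /^\.+$/.test(cleaned)) return 'download';
  return cleaned;
}

// Hypothetical helper: join, resolve, and verify containment in outputDir.
function safeOutputPath(outputDir, name) {
  const root = path.resolve(outputDir);
  const target = path.resolve(root, sanitizeFilename(name));
  const rel = path.relative(root, target);
  const escapes =
    rel === '' || rel === '..' || rel.startsWith('..' + path.sep) || path.isAbsolute(rel);
  if (escapes) {
    throw new Error(`Refusing to write outside ${root}: ${name}`);
  }
  return target;
}
```

The containment check is deliberately redundant with the sanitizer: if the sanitizer is later loosened, the resolved-path check still refuses to write outside the output directory.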
2 issues found across 5 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="archivebox/plugins/caddl/tests/test_caddl.py">
<violation number="1" location="archivebox/plugins/caddl/tests/test_caddl.py:69">
P2: These tests duplicate the function implementations inline rather than testing the actual script's functions. If `parseSizeLimit` or `sanitizeFilename` in `on_Snapshot__65_caddl.bg.js` is modified, these tests will still pass because they're testing their own inline copy. Consider importing and testing the actual functions from the script, or using a separate test module that the script can also import.</violation>
</file>
<file name="archivebox/plugins/caddl/on_Snapshot__65_caddl.bg.js">
<violation number="1" location="archivebox/plugins/caddl/on_Snapshot__65_caddl.bg.js:216">
P1: Throwing an error inside an event listener callback won't propagate to the outer try/catch - the max size check during download is effectively broken. The error will be unhandled while the download continues. Consider using a flag variable and checking it after the navigation, or calling `downloadPage.close()` to abort.</violation>
</file>
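The P1 issue above can be sketched without a browser. The example below is an illustration under stated assumptions, not the plugin's code: Node's built-in `EventEmitter` stands in for the Puppeteer page, `fakeNavigation` and `downloadWithLimit` are hypothetical names, and each event carries a made-up `size` field. The listener records that the limit was crossed in a flag, and the caller checks the flag after the awaited navigation, inside the `try` block where a `throw` is actually caught.

```javascript
const { EventEmitter } = require('events');

// Stand-in for a page that emits 'response' events asynchronously,
// as happens while a real navigation is in flight.
function fakeNavigation(emitter, chunkSizes) {
  return new Promise((resolve) => {
    setImmediate(() => {
      for (const size of chunkSizes) emitter.emit('response', { size });
      resolve();
    });
  });
}

async function downloadWithLimit(chunkSizes, maxBytes) {
  const page = new EventEmitter();
  let received = 0;
  let sizeExceeded = false; // set by the listener, read by the caller

  page.on('response', ({ size }) => {
    received += size;
    // Record the condition instead of throwing: a throw here would surface
    // in whoever emitted the event, not in the try/catch below.
    if (received > maxBytes) sizeExceeded = true;
  });

  try {
    await fakeNavigation(page, chunkSizes);
    // Checked after navigation, where throwing is caught as intended.
    if (sizeExceeded) throw new Error(`Download exceeded ${maxBytes} bytes`);
    return { ok: true, received };
  } catch (err) {
    return { ok: false, error: err.message };
  }
}
```

A flag only reports the problem after the fact. To stop a transfer early, the listener would also need to trigger an abort, for example by closing the download page as the review suggests.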
## Summary

Fixes a CSS issue in the CAD/3D asset thumbnail template where flex properties were being applied without `display: flex`.

## Problem

The inner div in `archivebox/plugins/caddl/templates/thumbnail.html` had `flex-direction: column` and `align-items: center` styles, but was missing `display: flex`. Without the display property set to flex, these flex properties have no effect, so the emoji (🧊) and "3D Asset" text weren't stacking vertically as intended.

## Solution

Added `display: flex;` to the inner div's style attribute so the flexbox layout takes effect and the content stacks correctly.

## Changes

- `archivebox/plugins/caddl/templates/thumbnail.html`: Added `display: flex;` to line 3

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic

Fixed the CAD/3D asset thumbnail layout by adding display: flex to the inner div so flexbox applies and the 🧊 emoji and "3D Asset" label stack vertically and stay centered.

<sup>Written for commit a01bdd4. Summary will update automatically on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
- Tests now extract and execute actual parseSizeLimit and sanitizeFilename functions from the JS file
- Added more edge case tests (empty strings, invalid input, path traversal)
- Fixed P1 bug where throwing in event listener wouldn't propagate to try/catch
- Use flag variables to track size exceeded state and check after navigation

Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>
Implements a new plugin to download 3D and CAD assets from web pages.
Features
Fixes #668
Generated with Claude Code
Summary by cubic
Adds a new caddl plugin to detect and download 3D/CAD assets from captured pages using Chrome + Puppeteer via the existing browser session. Addresses issue #668 by enabling capture of common 3D formats with simple UI views and configurable limits, including support for login/captcha sites.
New Features
Bug Fixes
Written for commit e937688. Summary will update automatically on new commits.