
Add mend CLI command #145

Merged
NGTmeaty merged 16 commits into master from cli-close on Oct 7, 2025

@CorentinB CorentinB commented Sep 26, 2025

Collaborator

This pull request restructures the gowarc CLI tools to improve usability, code organization, and documentation. The main changes are: splitting each command into its own package, adding global flags for JSON output and verbose logging backed by a shared slog logger, and documenting CLI usage, especially for the new mend command.

CLI mend command:

This command replicates the mend feature of warctool, allowing users to properly truncate and then close (rename) the .open WARC files that gowarc leaves behind after a crash.

CLI and Documentation Enhancements:

  • Added a detailed "CLI Tools" section to README.md, documenting installation, available commands (extract, mend, verify, completion), usage examples, global flags, and linking to further documentation for mend.
  • Introduced a new cmd/mend/README.md with extensive documentation on the mend command, outlining features, usage, safety, and example outputs.

Codebase Refactoring and Modularization:

  • Refactored the extract command implementation:
    • Moved from cmd/extract.go to cmd/extract/extract.go for better modularity and maintainability.
    • Updated imports to use a new utils package for shared functionality.
    • Centralized flag parsing and record filtering logic into utility functions, improving code reuse and clarity.
    • Replaced hardcoded values (e.g., file permissions, filename lengths) with named constants from utils for better maintainability [1] [2] [3] [4] [5].
    • Added a summary report function after extraction completes.
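A sketch of what shared helpers of this kind might look like. Every name here (`DefaultDirPerm`, `DefaultFilePerm`, `MaxFilenameLength`, `ParseRecordTypes`, `ShouldProcessRecord`, `Summary`) is hypothetical and chosen for illustration; it is not the actual contents of the utils package.

```go
package main

import (
	"fmt"
	"io/fs"
	"strings"
)

// Hypothetical named constants replacing previously hardcoded values.
const (
	DefaultDirPerm    fs.FileMode = 0o755
	DefaultFilePerm   fs.FileMode = 0o644
	MaxFilenameLength             = 255 // common per-component limit on Linux filesystems
)

// ParseRecordTypes turns a flag value such as "response,resource" into a
// lowercase set, ignoring blanks.
func ParseRecordTypes(flag string) map[string]bool {
	set := make(map[string]bool)
	for _, t := range strings.Split(flag, ",") {
		if t = strings.TrimSpace(t); t != "" {
			set[strings.ToLower(t)] = true
		}
	}
	return set
}

// ShouldProcessRecord reports whether a record's WARC-Type passes the filter;
// an empty filter accepts every record.
func ShouldProcessRecord(warcType string, filter map[string]bool) bool {
	return len(filter) == 0 || filter[strings.ToLower(warcType)]
}

// Summary is a hypothetical end-of-run report for extract.
type Summary struct {
	Processed, Extracted, Skipped int
}

func (s Summary) String() string {
	return fmt.Sprintf("processed %d records: %d extracted, %d skipped",
		s.Processed, s.Extracted, s.Skipped)
}
```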

CLI Framework and Logging Improvements:

  • Updated cmd/main.go to:
    • Register subcommands (extract, mend, verify) using their modularized Command objects instead of monolithic command definitions [1] [2].
    • Add global flags for JSON output and verbose logging, and configure a global logger using slog based on these flags.
    • Remove duplicate or now-unnecessary flag definitions from the main file, delegating them to each command module.

These changes collectively improve the usability, maintainability, and extensibility of the gowarc CLI, making it easier for users to interact with WARC files and for developers to add new features in the future.



@CorentinB CorentinB self-assigned this Sep 26, 2025
@CorentinB CorentinB added the enhancement New feature or request label Sep 26, 2025

@equals215 equals215 left a comment

Member

LGTM, besides the 2 comments, which are more like good additions IMO

Comment thread cmd/mend/mend.go Outdated
Comment thread cmd/warc/mend/mend_test.go
@NGTmeaty

Collaborator

Some mend tests are failing:

--- FAIL: TestMendOutputMatchesSynthetic (0.01s)
    --- FAIL: TestMendOutputMatchesSynthetic/empty.warc.gz.open (0.01s)
    --- FAIL: TestMendOutputMatchesSynthetic/corrupted-trailing-bytes.warc.gz.open (0.00s)
    --- FAIL: TestMendOutputMatchesSynthetic/corrupted-mid-record.warc.gz.open (0.00s)
    --- FAIL: TestMendOutputMatchesSynthetic/good.warc.gz.open (0.00s)
FAIL

Reviewing everything else in a moment!

@NGTmeaty NGTmeaty left a comment

Collaborator

Initial review looks good! I want to check this against internal warctool next week (or maybe this weekend) but otherwise it looks really good! Thanks again!

Comment thread README.md
Comment thread cmd/warc/utils/utils.go
Comment thread cmd/mend/mend.go Outdated
Comment thread cmd/warc/mend/mend.go
Comment thread cmd/mend/mend.go Outdated
Comment thread cmd/mend/mend.go Outdated
Comment thread cmd/warc/mend/mend_test.go
Comment thread cmd/verify/verify.go Outdated
Comment thread cmd/verify/verify.go Outdated
Comment thread cmd/warc/verify/verify.go
@CorentinB

Collaborator Author

@NGTmeaty I made it so that it asks for confirmation and then deletes empty WARCs, which also includes WARCs containing only 1 warcinfo record (e.g. when Zeno starts but no URL gets crawled)

@equals215 equals215 left a comment

Member

LGTM, not sure whether the first 2 comments are actual issues or just a skill issue on my part

Comment thread cmd/warc/mend/mend.go
Comment thread cmd/warc/mend/mend.go
CorentinB and others added 3 commits October 6, 2025 16:03
Restructure cmd/ to cmd/warc/ to fix go install installing the binary
as 'cmd' instead of 'warc'. Update all documentation and examples to
reflect the new binary name.
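The underlying rule, documented in `go help build`, is that the executable is named after the last non-major-version element of the package import path, so a main package directly under .../cmd installs as `cmd`. A small Go sketch of that rule; `binaryName` is a hypothetical helper, and the module path github.com/internetarchive/gowarc used when checking it is an assumption.

```go
package main

import (
	"path"
	"regexp"
)

// majorVersion matches module major-version suffixes. Like cmd/go, it only
// treats v2 and above as suffixes (v0 and v1 never appear in import paths).
var majorVersion = regexp.MustCompile(`^v([2-9][0-9]*|1[0-9]+)$`)

// binaryName returns the default executable name for a main package's
// import path: its last element, skipping a trailing major-version suffix.
func binaryName(importPath string) string {
	base := path.Base(importPath)
	if majorVersion.MatchString(base) {
		if dir := path.Dir(importPath); dir != "." {
			base = path.Base(dir)
		}
	}
	return base
}
```

Moving the main package to cmd/warc therefore changes the installed binary name without any build flags.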

@NGTmeaty NGTmeaty left a comment

Collaborator

Looks good. Thank you for the modifications!

@NGTmeaty NGTmeaty merged commit 4dfb7f9 into master Oct 7, 2025
4 checks passed
@NGTmeaty NGTmeaty deleted the cli-close branch October 7, 2025 00:20


3 participants