Skip to content

feat: extract specific path, accept stdin as streaming input#384

Merged
rvagg merged 7 commits intomasterfrom
rvagg/path-extract-w-stdin-streaming
Mar 7, 2023
Merged

feat: extract specific path, accept stdin as streaming input#384
rvagg merged 7 commits intomasterfrom
rvagg/path-extract-w-stdin-streaming

Conversation

@rvagg
Copy link
Copy Markdown
Member

@rvagg rvagg commented Mar 6, 2023

(WIP: needs better testing, it works with the very limited test data I have at hand)

  • Adds a --path (-p) to extract just a specific unixfs path (and its children)
  • Adds - as a possible input for --file (-f) to read from stdin - the catch here is that it reads it into memory, so a large CAR is going to be a problem; but it will at least extract as the CAR blocks come in.

So cat foo.car | car extract -f - -p /my/specific/file.txt is a thing .. which will be even neater with lassie fetch <cid>/my/specific/file.txt -o -; when I make that a thing.

I think I might have also fixed the macos-12 flaky test problem, maybe it's not a github.com/rogpeppe/go-internal problem after all but related to how macos-12 handles buffered file reads.

@rvagg rvagg force-pushed the rvagg/path-extract-w-stdin-streaming branch from bb2448a to e072d34 Compare March 6, 2023 11:29
@rvagg
Copy link
Copy Markdown
Member Author

rvagg commented Mar 6, 2023

Latest commit adds - as an argument option that will extract a file to stdout. It'll error if it has to do anything more than a single file.

An alternative would be to provide --stdout as an argument rather than - since the output path is optional already and - is a valid (although very odd) output path that someone make want.

cat foo.car | car extract -f - -p /my/specific/file.txt - - now spits the contents of file.txt to stdout.

@willscott
Copy link
Copy Markdown
Member

this looks good! let me know when you want a full read through the code

Comment on lines +215 to +217
if nf, ok := err.(interface{ NotFound() bool }); ok && nf.NotFound() {
fmt.Fprintf(c.App.ErrWriter, "data for directory entry not found: %s (skipping...)\n", name)
return 0, nil
Copy link
Copy Markdown
Member Author

@rvagg rvagg Mar 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Highlighting this while I think about it and so it doesn't get lost in the code move—in the case where you want to extract all, rather than a specific path, instead of erroring where a block load fails, it skips whatever it's failed to load. This needs more testing, and perhaps it's not feasable, because I've got an example where this partially works but then fails with an iterator overrun error which I don't yet understand. I'm guessing it's to do with sharded directory incompleteness but I need to work that out.

@rvagg rvagg marked this pull request as ready for review March 7, 2023 10:01
@rvagg
Copy link
Copy Markdown
Member Author

rvagg commented Mar 7, 2023

Added a big 'ol test script and some fixture data; ready for review @willscott

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants