split: pass GNU tests/b-chunk.sh#5475
Conversation
|
GNU testsuite comparison: |
|
@tertsdiepraam would you mind reviewing this one ? |
|
setting back to draft - even though it passes the GNU test, the implementation for ---io-blksize option is incomplete, needs more work |
ed7c9ad to
ad29e42
Compare
|
GNU testsuite comparison: |
It seems to be unnecessary since we have already made the path relative using `construct_dest_path`.
rustix & linux-raw-sys
* uucore support for illumos and solaris * use macro to consolidate illumos and solaris signals * fixing some CI issues * replaced macro with better cfg usage
|
GNU testsuite comparison: |
|
GNU testsuite comparison: |
|
@tertsdiepraam @sylvestre this one is ready for review |
| None => b'\n', | ||
| }; | ||
|
|
||
| let io_blksize: Option<usize> = if let Some(s) = matches.get_one::<String>(OPT_IO_BLKSIZE) { |
There was a problem hiding this comment.
What do you think about ?
let io_blksize: Option<usize> = matches.get_one::<String>(OPT_IO_BLKSIZE).map(|s| {
parse_size_u64(s)
.ok()
.and_then(|n| {
let n = n.try_into().map_err(|_| SettingsError::InvalidIOBlockSize(s.to_string()))?;
if n > OPT_IO_BLKSIZE_MAX {
Err(SettingsError::InvalidIOBlockSize(s.to_string()))
} else {
Ok(n)
}
})
}).transpose()?;
There was a problem hiding this comment.
in this case it would "eat" possible error returned by parse_size_u64(), so it cannot be bubbled up to from()->Result. Also, since ok() converts it to Option and the and_then() expects an option to be returned from the closure, it is not straitforward to return the SettingsError in either places within the closure inside closure - i.e. doing return Err(SettingsError::InvalidIOBlockSize(s.to_string())) or Some(Err(SettingsError::InvalidIOBlockSize(s.to_string()))) would not work : as return is not for from()->Result function but closure, and wrapping in Some() to satisfy and_then() signature we will end up with Option<Option<Result<>>> as a retuning value from closure from map() instead of Option<Result<>> ... could unwrap(), but then still missing/eating error from parse_size_u64() ...
I could be missing something though
There was a problem hiding this comment.
please ignore my comment then :(
src/uu/split/src/split.rs
Outdated
| // empty STDIN stream, | ||
| // and files with true file size 0 | ||
| // will also fit here | ||
| input_size = num_bytes; |
There was a problem hiding this comment.
what about an early exit ?
| input_size = num_bytes; | |
| Ok(num_bytes) |
src/uu/split/src/split.rs
Outdated
| let mut tmp_fd = File::open(Path::new(input))?; | ||
| let end = tmp_fd.seek(SeekFrom::End(0))?; | ||
| if end > 0 { | ||
| input_size = end; |
There was a problem hiding this comment.
same:
| input_size = end; | |
| Ok(end) |
src/uu/split/src/split.rs
Outdated
| let read_limit: u64 = if let Some(n) = io_blksize { | ||
| *n | ||
| } else { | ||
| OPT_IO_BLKSIZE_MAX | ||
| } | ||
| .try_into() | ||
| .unwrap(); |
There was a problem hiding this comment.
| let read_limit: u64 = if let Some(n) = io_blksize { | |
| *n | |
| } else { | |
| OPT_IO_BLKSIZE_MAX | |
| } | |
| .try_into() | |
| .unwrap(); | |
| let read_limit = io_blksize.unwrap_or(OPT_IO_BLKSIZE_MAX) as u64; |
src/uu/split/src/split.rs
Outdated
| // could report incorrect file size via `metadata.len()` | ||
| // either `0` while there is content | ||
| // or a size larger than actual content | ||
| input_size = get_irregular_input_size(input, reader, buf, io_blksize)?; |
There was a problem hiding this comment.
same, early return
| input_size = get_irregular_input_size(input, reader, buf, io_blksize)?; | |
| get_irregular_input_size(input, reader, buf, io_blksize) |
src/uu/split/src/split.rs
Outdated
| // Regular file | ||
| let metadata = metadata(input)?; | ||
| input_size = metadata.len(); | ||
| // Double check the size if `metadata.len()` reports `0` | ||
| if input_size == 0 { | ||
| input_size = get_irregular_input_size(input, reader, buf, io_blksize)?; | ||
| } |
There was a problem hiding this comment.
| // Regular file | |
| let metadata = metadata(input)?; | |
| input_size = metadata.len(); | |
| // Double check the size if `metadata.len()` reports `0` | |
| if input_size == 0 { | |
| input_size = get_irregular_input_size(input, reader, buf, io_blksize)?; | |
| } | |
| let metadata = metadata(input)?; | |
| let size = metadata.len(); | |
| // If metadata reports size 0, use get_irregular_input_size to double-check | |
| if size == 0 { | |
| get_irregular_input_size(input, reader, buf, io_blksize) | |
| } else { | |
| Ok(size) | |
| } |
|
GNU testsuite comparison: |
tertsdiepraam
left a comment
There was a problem hiding this comment.
Impressive work! Seeing that GNU test pass is great!
src/uu/split/src/split.rs
Outdated
| // The ---io parameter is consumed and ignored. | ||
| // The parameter is included to make GNU coreutils tests pass. | ||
| static OPT_IO: &str = "-io"; |
There was a problem hiding this comment.
Here's an example from GNU:
❯ split ---
split: option '---io-blksize' requires an argument
Try 'split --help' for more information.
❯ split ---io-bl
split: option '---io-blksize' requires an argument
Try 'split --help' for more information.
This is not a separate option, but one of those things where a long argument can be shortened to the first few letters if its unambiguous. clap supports this too and I think it's turned on (it's infer_long_args(true)) so do we need this?
There was a problem hiding this comment.
yep, looks like it is the case. will remove
src/uu/split/src/split.rs
Outdated
| // of bytes per chunk. | ||
| // | ||
| // Get the size of the input in bytes | ||
| let initial_buf: &mut Vec<u8> = &mut vec![]; |
There was a problem hiding this comment.
| let initial_buf: &mut Vec<u8> = &mut vec![]; | |
| let initial_buf = Vec::new(); |
src/uu/split/src/split.rs
Outdated
| if usize::BITS < 64 { | ||
| let _: usize = num_chunks | ||
| .try_into() | ||
| .map_err(|_| USimpleError::new(1, "Number of chunks too big"))?; |
There was a problem hiding this comment.
Why this change? The original code makes sense to me. If we need it to fit in a usize why wouldn't we make it a usize?
There was a problem hiding this comment.
it made sense for the original code because num_chunks was later used in method calls that require usize like writers.iter_mut().take(num_chunks - 1), and there was a test to test for that error - looks like only for code coverage %. I left it in to keep the test, but looking at it more closely - it does not need to be usize, so I will remove it along with the test that checked for it.
src/uu/split/src/split.rs
Outdated
| if input == "-" | ||
| || input.starts_with("/dev/") | ||
| || input.starts_with("/proc/") | ||
| || input.starts_with("/sys/") |
There was a problem hiding this comment.
This feels inherently brittle and I'm wondering whether there's a better way to do it. For example, we could have links to all of these filesystems and that should still work. Based on the syscalls they make, I don't think GNU has exceptions like, though I'm not entirely sure what they are doing instead.
There was a problem hiding this comment.
I think they might just try to get the length first and try this other method if that returns 0? I'm not sure.
There was a problem hiding this comment.
in GNU they first go straight to reading it into a buffer and then compare that to what metadata/filesystem is reporting + do seek on the file if those disagree (as well as trying to copy things arround into temp file/buffer, etc) and some other edge cases. I tried to avoid reading file into a buffer first for everything, but could re-implement to closer follow GNU approach
There was a problem hiding this comment.
I think doing what GNU does is the right call! Nice investigative work! (As a precaution: remember not to look at the GNU source code)
There was a problem hiding this comment.
should be ready to go
src/uu/split/src/split.rs
Outdated
| loop { | ||
| let chunk_size = chunk_size_base + (chunk_size_reminder > i) as u64; | ||
| let buf: &mut Vec<u8> = &mut vec![]; | ||
| i += 1; |
There was a problem hiding this comment.
Would this work?
for i in 1.. {
...
}|
GNU testsuite comparison: |
|
GNU testsuite comparison: |
|
@tertsdiepraam @sylvestre |
|
GNU testsuite comparison: |
|
@tertsdiepraam @sylvestre I have another set of changes ready that pass next Gnu test for split 'l-chunk', but they are based on code included in this PR. Should I add them into this PR or wait until this one is resolved and then submit those in a new one? |
--------- Co-authored-by: Terts Diepraam <terts.diepraam@gmail.com> Co-authored-by: Daniel Hofstetter <daniel.hofstetter@42dh.com> Co-authored-by: Brandon Elam Barker <brandon.barker@gmail.com> Co-authored-by: Kostiantyn Hryshchuk <statheres@gmail.com> Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
This PR adds
---io-blksizeoption--numberstrategies0for file size, while actually having some content OR report size greater than actual contentpasses GNU tests/b-chunk.sh