improve parsing of values with units#9190
improve parsing of values with units#9190fdncred merged 4 commits intonushell:mainfrom 1Kinoti:parse-units
Conversation
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## main #9190 +/- ##
==========================================
- Coverage 68.92% 68.90% -0.02%
==========================================
Files 635 635
Lines 102041 101974 -67
==========================================
- Hits 70329 70268 -61
+ Misses 31712 31706 -6
|
bobhy
left a comment
There was a problem hiding this comment.
Thank you for diving in and making this change! I think it will make Nushell considerably more user-friendly, especially as a command line tool.
You'll see I'm still agitating for "redundant" valid units, e.g "da" and maybe "d" as well as "day", I do think that's an important part of the user benefit.
I also offered some suggestions about simplifying the interface with parse_unit_value(), but please take them with a grain of salt. I'm not a parser guru and there have been architectural changes since I last hacked on it.
|
|
||
| fn string_to_duration(s: &str, span: Span, value_span: Span) -> Result<i64, ShellError> { | ||
| if let Some(expression) = parse_duration_bytes(s.as_bytes(), span) { | ||
| if let Some(Ok(expression)) = parse_unit_value( |
There was a problem hiding this comment.
It looks a bit weird to have parse_unit_value() require 2 level matching everywhere it is invoked.
Couldn't it have same contract as the top level parse_* do? That is, return Expression and internally call working_set.error() to construct an error if the input doesn't parse as the expected shape? Caller already tells you what the expected units are, you could add an argument for caller to pass in "filesize" or "duration" to plug into the ::LabelledError. Right now, it returns Err(mk_error_for()) to caller, but the only thing captured in the closure is the string "filesize" or "duration", seems the long way around.
There was a problem hiding this comment.
- the reason for the nesting is to handle two sources of failure.
- errors in the number part of the value
- the value not having a valid unit, meaning it's not a
value-with-unit
-
parse_unit_value()**cannot** take external state because it may be (and is) used in contexts where we don't have access toStateWorkingSet -
i read a comment on discord about future plans to support user defined units. those and even other inbuilt units just have to call this function, and
mk_err_for()will make the error message unique to them. my argument for usingmk_err_for()is that the caller has more context. i thought about adding another argument, but i decided to move some work to the caller. either is okay, i think. (though i don't really like functions with too many arguments, andparse_unit_value()is already not looking very good)
a better solution would be to break parse_unit_value() into smaller functions which do part of the work. i haven't really thought of a way to do this
|
|
||
| pub const DURATION_UNIT_GROUPS: &[UnitGroup] = &[ | ||
| (Unit::Nanosecond, "ns", None), | ||
| (Unit::Microsecond, "us", Some((Unit::Nanosecond, 1000))), |
There was a problem hiding this comment.
In my issue, I was wishing for multiple aliases for the unit, such as "ms" as well as "microsecond", but especially "wk" as well as "week", "da" (and maybe "d") as well as "day". Could those be supported just by adding entries in these _GROUPS, or would there be more to it?
There was a problem hiding this comment.
adding entries to *_GROUPS is enough,
but wouldn't this be a bad idea? allowing very many aliases for these units could lead to confusions? especially when newer values-with-units are introduced? unless if it is configurable
There was a problem hiding this comment.
Agreed, it could become a problem.
But I think Duration type particularly needs aliases because they are part of casual conversation and there isn't a standards-based convention for names of the units of measure.
Filesize and other types that might be proposed tend to have SI measures, and those unit names have had the conflicts reconciled.
I'd propose these specific aliases for Duration. I think Filesize already has it covered.
| unit | aliases |
|---|---|
| nanosecond | nanosecond, ns |
| microsecond | microsecond, "\u{03BC}s", us |
| millisecond | millisecond, ms |
| second | second, sec, s |
| minute | minute, min |
| hour | hour, hr, h |
| day | day, d |
| week | week, wk |
There was a problem hiding this comment.
Having said all that, I'd be OK if this PR got merged without further improvements. It's already adding a good ease-of-use feature and the code and tests looks good, so far as I can see.
There was a problem hiding this comment.
I'd propose these specific aliases for Duration. I think Filesize already has it covered.
i think this would better be suited for another pr, which doesn't have to depend on this one
|
let's go! thanks again! |
closes #9111
Description
this pr improves parsing of values with units (
filesizes,durationsand any other future values) by:User-Facing Changes
filesizes and durations can now have underscores for readability