Skip to content

improve parsing of values with units#9190

Merged
fdncred merged 4 commits intonushell:mainfrom
1Kinoti:parse-units
May 17, 2023
Merged

improve parsing of values with units#9190
fdncred merged 4 commits intonushell:mainfrom
1Kinoti:parse-units

Conversation

@1Kinoti
Copy link
Copy Markdown
Contributor

@1Kinoti 1Kinoti commented May 13, 2023

closes #9111

Description

this pr improves parsing of values with units (filesizes, durations and any other future values) by:

  1. allowing underscores in the value part
> 42kb          # okay
> 42_sec        # okay
> 1_000_000mib  # okay
> 69k_b         # not okay, underscores not allowed in the unit
  1. improving error messages involving these values
> sleep 40-sec

# before
Error: nu::parser::parse_mismatch

  × Parse mismatch during operation.
   ╭─[entry #42:1:1]
 1 │ sleep 40-sec
   ·       ──┬──
   ·         ╰── expected duration with valid units
   ╰────

# now
Error:
  × duration value must be a number
   ╭─[entry #41:1:1]
 1 │ sleep 40-sec
   ·       ─┬─
   ·        ╰── not a number
   ╰────
  1. unifying parsing of these values. now all of these use one function

User-Facing Changes

filesizes and durations can now have underscores for readability

@codecov
Copy link
Copy Markdown

codecov bot commented May 13, 2023

Codecov Report

Merging #9190 (6071364) into main (057de06) will decrease coverage by 0.02%.
The diff coverage is 96.77%.

❗ Current head 6071364 differs from pull request most recent head a312352. Consider uploading reports for the commit a312352 to get more accurate results

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #9190      +/-   ##
==========================================
- Coverage   68.92%   68.90%   -0.02%     
==========================================
  Files         635      635              
  Lines      102041   101974      -67     
==========================================
- Hits        70329    70268      -61     
+ Misses      31712    31706       -6     
Impacted Files Coverage Δ
crates/nu-parser/src/lib.rs 100.00% <ø> (ø)
crates/nu-parser/src/parser.rs 85.33% <96.20%> (-0.12%) ⬇️
crates/nu-command/src/conversions/into/duration.rs 65.33% <100.00%> (+0.64%) ⬆️

Copy link
Copy Markdown
Contributor

@bobhy bobhy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for diving in and making this change! I think it will make Nushell considerably more user-friendly, especially as a command line tool.
You'll see I'm still agitating for "redundant" valid units, e.g "da" and maybe "d" as well as "day", I do think that's an important part of the user benefit.
I also offered some suggestions about simplifying the interface with parse_unit_value(), but please take them with a grain of salt. I'm not a parser guru and there have been architectural changes since I last hacked on it.


fn string_to_duration(s: &str, span: Span, value_span: Span) -> Result<i64, ShellError> {
if let Some(expression) = parse_duration_bytes(s.as_bytes(), span) {
if let Some(Ok(expression)) = parse_unit_value(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks a bit weird to have parse_unit_value() require 2 level matching everywhere it is invoked.
Couldn't it have same contract as the top level parse_* do? That is, return Expression and internally call working_set.error() to construct an error if the input doesn't parse as the expected shape? Caller already tells you what the expected units are, you could add an argument for caller to pass in "filesize" or "duration" to plug into the ::LabelledError. Right now, it returns Err(mk_error_for()) to caller, but the only thing captured in the closure is the string "filesize" or "duration", seems the long way around.

Copy link
Copy Markdown
Contributor Author

@1Kinoti 1Kinoti May 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. the reason for the nesting is to handle two sources of failure.
  • errors in the number part of the value
  • the value not having a valid unit, meaning it's not a value-with-unit
  1. parse_unit_value() **cannot** take external state because it may be (and is) used in contexts where we don't have access to StateWorkingSet

  2. i read a comment on discord about future plans to support user defined units. those and even other inbuilt units just have to call this function, and mk_err_for() will make the error message unique to them. my argument for using mk_err_for() is that the caller has more context. i thought about adding another argument, but i decided to move some work to the caller. either is okay, i think. (though i don't really like functions with too many arguments, and parse_unit_value() is already not looking very good)

a better solution would be to break parse_unit_value() into smaller functions which do part of the work. i haven't really thought of a way to do this


pub const DURATION_UNIT_GROUPS: &[UnitGroup] = &[
(Unit::Nanosecond, "ns", None),
(Unit::Microsecond, "us", Some((Unit::Nanosecond, 1000))),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my issue, I was wishing for multiple aliases for the unit, such as "ms" as well as "microsecond", but especially "wk" as well as "week", "da" (and maybe "d") as well as "day". Could those be supported just by adding entries in these _GROUPS, or would there be more to it?

Copy link
Copy Markdown
Contributor Author

@1Kinoti 1Kinoti May 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding entries to *_GROUPS is enough,

but wouldn't this be a bad idea? allowing very many aliases for these units could lead to confusions? especially when newer values-with-units are introduced? unless if it is configurable

Copy link
Copy Markdown
Contributor

@bobhy bobhy May 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, it could become a problem.
But I think Duration type particularly needs aliases because they are part of casual conversation and there isn't a standards-based convention for names of the units of measure.
Filesize and other types that might be proposed tend to have SI measures, and those unit names have had the conflicts reconciled.
I'd propose these specific aliases for Duration. I think Filesize already has it covered.

unit aliases
nanosecond nanosecond, ns
microsecond microsecond, "\u{03BC}s", us
millisecond millisecond, ms
second second, sec, s
minute minute, min
hour hour, hr, h
day day, d
week week, wk

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having said all that, I'd be OK if this PR got merged without further improvements. It's already adding a good ease-of-use feature and the code and tests looks good, so far as I can see.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd propose these specific aliases for Duration. I think Filesize already has it covered.

i think this would better be suited for another pr, which doesn't have to depend on this one

@fdncred fdncred merged commit 0e4729b into nushell:main May 17, 2023
@fdncred
Copy link
Copy Markdown
Contributor

fdncred commented May 17, 2023

let's go! thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Allow _ for readability in unitized constants like 3_day or 5_gb

3 participants