Skip to content

Add lazy closure evaluation to default (#14160)#15654

Merged
132ikl merged 20 commits intonushell:mainfrom
Mrfiregem:lazy-default
May 15, 2025
Merged

Add lazy closure evaluation to default (#14160)#15654
132ikl merged 20 commits intonushell:mainfrom
Mrfiregem:lazy-default

Conversation

@Mrfiregem
Copy link
Copy Markdown
Contributor

@Mrfiregem Mrfiregem commented Apr 28, 2025

Description

This PR adds lazy closure evaluation to the default command (closes #14160).

  • For non-closure values and without providing a column name, default acts the same as before
  • The user can now provide multiple column names to populate if empty
  • If the user provides a column name, the input must be a record or list, otherwise an error is created.
  • The user can now provide a closure as a default value
    • This closure is run without any arguments or input
    • The closure is never evaluated if the value isn't needed
    • Even when column names are supplied, the closure is only run once (and cached to prevent re-calling it)

For example:

> default { 1 + 2 } # => 3
> null | default 3 a   # => previously `null`, now errors
> 1 | default { sleep 5sec; 3 } # => `1`, without waiting 5 seconds

> let optional_var = null; $optional_var | default { input 'Enter value: ' } # => Returns user input
> 5 | default { input 'Enter value: ' } # => `5`, without prompting user

> ls | default { sleep 5sec; 'N/A' } name # => No-op since `name` column is never empty
> ls | default { sleep 5sec; 'N/A' } foo bar # => creates columns `foo` and `bar`; only takes 5 seconds since closure result is cached

# Old behavior is the same
> [] | default 'foo' # => []
> [] | default --empty 'foo' # => 'foo'
> default 5 # => 5

User-Facing Changes

  • Users can add default values to multiple columns now.
  • Users can now use closures as the default value passed to default.
  • To return a closure, the user must wrap the closure they want to return inside another closure, which will be run (default { $my_closure }).

Tests + Formatting

All tests pass.

After Submitting

@Mrfiregem Mrfiregem changed the title Add opt-in closure evaluation to default (issue #14160) Add opt-in closure evaluation to default (fixes #14160) Apr 28, 2025
@weirdan
Copy link
Copy Markdown
Contributor

weirdan commented Apr 29, 2025

It's worth noting that even closures can be returned by lazy default:

default -l { {ls} }

@Mrfiregem
Copy link
Copy Markdown
Contributor Author

I was considering making closure evaluation the default, but it appears there's something relying on default sending a closure object since I got a panic when trying to cargo run. I believe it's prompt related, but I didn't look into it, and figured people might rely on the same thing for their scripts.

@132ikl
Copy link
Copy Markdown
Member

132ikl commented Apr 29, 2025

wait, you got a panic? that's not good, could you push a second branch to test what's up with that?

@Mrfiregem
Copy link
Copy Markdown
Contributor Author

That was from a previous attempt a few weeks back so I could be misremembering, and it was just an error in that code. I've fiddled with the current PR implementation to see if I could find it, but everything seems to work, so I think it's a false alarm.

As an aside, it does seem like we can enable lazy closure evaluation by default, if nobody would mind a breaking change for user scripts.

diff --git a/crates/nu-command/src/filters/default.rs b/crates/nu-command/src/filters/default.rs
index a1ce21148..f02c9e507 100644
--- a/crates/nu-command/src/filters/default.rs
+++ b/crates/nu-command/src/filters/default.rs
@@ -30,14 +30,14 @@ impl Command for Default {
                 Some('e'),
             )
             .switch(
-                "lazy",
-                "if default value is a closure, evaluate it",
-                Some('l'),
+                "no-eval",
+                "if default value is a closure, pass it as-is",
+                Some('n'),
             )
             .switch(
-                "lazy-once",
-                "evaluate the closure only once, even for lists (no closure input)",
-                Some('L'),
+                "eval-once",
+                "evaluate default value closure only once, even for lists (no closure input)",
+                Some('o'),
             )
             .category(Category::Filters)
     }
@@ -54,7 +54,7 @@ impl Command for Default {
         input: PipelineData,
     ) -> Result<PipelineData, ShellError> {
         let empty = call.has_flag(engine_state, stack, "empty")?;
-        let lazy = call.has_flag(engine_state, stack, "lazy")?;
+        let lazy = !call.has_flag(engine_state, stack, "no-eval")?;
         let lazy_once = call.has_flag(engine_state, stack, "lazy-once")?;
         default(engine_state, stack, call, input, empty, lazy, lazy_once)
     }

@Bahex
Copy link
Copy Markdown
Member

Bahex commented Apr 29, 2025

Doesn't upsert cover most of the --lazy cases?

> ls | upsert title {|row| $row.name | str upcase }
╭───┬────────────┬──────┬────────┬────────────┬────────────╮
 # │    name    │ type │  size  │  modified  │   title    │
├───┼────────────┼──────┼────────┼────────────┼────────────┤
 0  Cargo.toml  file  6.9 kB  a week ago  CARGO.TOML 
 1  LICENSE     file  1.0 kB  a week ago  LICENSE    
 2  README.md   file   289 B  a week ago  README.MD  
 3  src         dir   1.0 kB  a day ago   SRC        
 4  tests       dir    192 B  a week ago  TESTS      
╰───┴────────────┴──────┴────────┴────────────┴────────────╯

> [{a:1 b:2} {b:1}] | upsert a {|row| if $in == null { $row.b + 1 } else { $in } }
# or
> [{a:1 b:2} {b:1}] | upsert a {|row| default ($row.b + 1) }
╭───┬───┬───╮
 # │ a │ b │
├───┼───┼───┤
 0  1  2 
 1  2  1 
╰───┴───┴───╯

@132ikl
Copy link
Copy Markdown
Member

132ikl commented Apr 29, 2025

As an aside, it does seem like we can enable lazy closure evaluation by default, if nobody would mind a breaking change for user scripts.

sounds good to me. I would guess not many people are using a closure lazily, and I think the behavior makes sense.

@Mrfiregem
Copy link
Copy Markdown
Contributor Author

Mrfiregem commented Apr 29, 2025

Doesn't upsert cover most of the --lazy cases?

For table/record operations, yes. It's basically upsert with an if $in != null guard (or ""/[]/{} if --empty is used), but that's already the case with the current default implementation, without this change. The biggest benefits, in my eyes, are

  1. Being able to prompt for user input without requiring a whole if-else block, if a value is missing (see this Discord thread), which would be useful when writing scripts
  2. If calculating a default value would take long time, the previous way of supplying it with a subshell or storing it in a variable would always greedily evaluate, but now it's only run when needed

@Mrfiregem Mrfiregem changed the title Add opt-in closure evaluation to default (fixes #14160) Add closure evaluation to default (fixes #14160) Apr 29, 2025
@Bahex
Copy link
Copy Markdown
Member

Bahex commented Apr 30, 2025

Passing input to default's closure just feels wrong to me, it makes default too similar to upsert.

I know it's slightly more verbose but I would rather have default as a smaller tool that can be composed with upsert.

> [{a:1 b:2} {b:1}] | default { $in.b + 1 } a
# vs
> [{a:1 b:2} {b:1}] | upsert a {|row| default { $row.b + 1 } }

This is just my personal preference, I'm not strongly opposed to the PR as it is.


Also is --no-eval really necessary? With existing commands we need to use closure-in-closure pattern @weirdan pointed out, it would be weird to have a single command that has --no-eval while other similar commands don't.

.. | insert fn { {..} }
.. | update fn { {..} }

.. | default { {..} } fn
# vs
.. | default --no-eval {..} fn

@Mrfiregem
Copy link
Copy Markdown
Contributor Author

Mrfiregem commented Apr 30, 2025

So are you suggesting removing the use case of running default with a column name and only letting it work on simple values? It would certainly simplify the command code greatly. I see your point of not wanting two commands that do practically the same thing. If that's the case, then we can make the closure input always be nothing, and point people towards the upsert foo { default [-e] ... } construct.

For the second point, if we do keep eval-by-default, I don't have a strong opinion one way or another for keeping --no-eval, besides that new users might be slightly more confused on how to pass closures when looking at help default (but that can be mentioned in an example)

@Mrfiregem
Copy link
Copy Markdown
Contributor Author

I changed the command to always evaluate closure values, following @weirdan and @Bahex's observations that returning a default value closure is possible without --no-eval, and simplified the code such that the default value is always evaluated upfront if it's a closure, which removes the need for both --no-eval and --eval-once.

If there are no other concerns, I can rewrite the PR description in the initial comment.

@weirdan
Copy link
Copy Markdown
Contributor

weirdan commented May 4, 2025

the default value is always evaluated upfront if it's a closure

wait, does that mean that true | default { random int } evaluates the { random int }? If so, it defeats the purpose of lazy evaluation.

@Mrfiregem
Copy link
Copy Markdown
Contributor Author

Mrfiregem commented May 4, 2025

Yes, you're right, lol 🤦. I'll fix that right now.

Edit: I really can't see a way to do this for the default "value" col_name construct without calculating the default value upfront or putting Value:Error values in cells, which I remember now is why I suggested removing this feature and promoting upsert { default }. If anyone has suggestions for calculating this in the .map closure, let me know. I keep running into lifetime issues, and I need more research for how to deal with them.

@weirdan
Copy link
Copy Markdown
Contributor

weirdan commented May 4, 2025

For the default {..} col form, --once option still makes sense, I think, to generate the value either on first null/empty, or for every null/empty.

@Mrfiregem
Copy link
Copy Markdown
Contributor Author

Mrfiregem commented May 7, 2025

Well, I bit the bullet and stopped trying to shove everything in the map function. The result is a large increase in LOC, but the command runs at the same speed for non-closure values (bench --pretty { ls | default 'N/A' foo } # => 693µs 252ns +/- 63µs 837ns on my machine), and since closure values are cached for defaulting columns, it's not too slow either (bench --pretty { ls | default { 'N/A' } foo } # => 2ms 306µs 666ns +/- 292µs 744ns / timeit { ls | default { sleep 5sec; 'N/A' } foo } # => 5sec 5ms 644µs 791ns).

Summary copied from commit message:

This will only ever run the closure one time, when needed, then cache the values, so long operations are only ever run once, or not at all if not necessary, even for records or tables with multiple columns.
Another change is allowing multiple column names, and erroring when the user gives a column name when it doesn't make sense for the input, whereas previously the command would always return the pipeline input. This can be reverted if needed.

Thanks to everyone for putting up with this for so long. It was much more work than I anticipated or was ready for 😅. I'll try to fix any issues you come across when testing.


Edit: One thing I noticed is that if this is added, carapace will need to update the file it generates from running carapace _carapace nushell


Edit 2: Did my last update until this is reviewed, to speed up the command by postponing creating the closure object (bench --pretty { ls | default { 'N/A' } foo } # => 728µs 975ns +/- 53µs 624ns), which makes it nearly as fast as using a simple value for short closures.


Edit 3: I've updated the original PR description with examples as to how the command functions as of this latest commit, if anyone is confused.

@132ikl
Copy link
Copy Markdown
Member

132ikl commented May 8, 2025

For the default {..} col form, --once option still makes sense, I think, to generate the value either on first null/empty, or for every null/empty.

I think the upsert { default { .. } } col form suggested by Bahex makes more sense than shoving all of that functionality into default

@weirdan
Copy link
Copy Markdown
Contributor

weirdan commented May 8, 2025

I think the upsert { default { .. } } col form suggested by Bahex makes more sense than shoving all of that functionality into default

Does that imply we want to remove the column argument from update? Also, upsert { default { .. } } col does not allow evaluating the closure once, and only if required.

@132ikl
Copy link
Copy Markdown
Member

132ikl commented May 8, 2025

Does that imply we want to remove the column argument from update? Also, upsert { default { .. } } col does not allow evaluating the closure once, and only if required.

Do you mean default here instead of update? If you mean default, then no, it's just that default with a column name would have the evaluate once behavior.

The previous implementation of the PR with --no-eval and --eval-once maps onto the current implementation of the PR like this:

default --no-eval { echo hi } -> default { { echo hi } }
default --eval-once { echo hi } column -> default { echo hi } column
default --eval-multiple { echo hi } column -> upsert { default { echo hi } } column (used --eval-multiple here for clarity even though none of the implementations actually had a flag named that, this corresponds to --lazy in the original PR)

@Mrfiregem
Copy link
Copy Markdown
Contributor Author

I've updated the original PR description with examples as to how this PR currently functions. Since there's no closure input currently, and in the spirit of default, I'm skeptical that --eval-multiple is needed. --no-eval was also removed in favor of default { {|| } } as you both suggested.

@weirdan
Copy link
Copy Markdown
Contributor

weirdan commented May 8, 2025

Do you mean default here instead of update?

Yes, of course. I must have been distracted at the moment.

default with a column name would have the evaluate once behavior.

Ah, ok, sounds good then.

@Mrfiregem Mrfiregem changed the title Add closure evaluation to default (fixes #14160) Add lazy closure evaluation to default (#14160) May 9, 2025
@Mrfiregem Mrfiregem requested a review from 132ikl May 10, 2025 12:39
Copy link
Copy Markdown
Member

@132ikl 132ikl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for implementing this! It's looking much better 😄

I did notice that this made it incompatible with streaming when using column names, which is different to the previous behavior. We also are streaming in the no column names ListStream branch, so I thought we should try to stream here.

I try to make sure what I'm thinking is actually possible before suggesting it to newer contributors, so I started playing around with it and ended up implementing the whole entire thing before I was able to determine that it is indeed possible. I didn't mean to step on your toes so much in this PR, I don't want you to feel like we don't value your contribution! As you mentioned, this ended up being a surprisingly complicated PR.

I tried to split up my changes into individual GitHub suggestions with explanations so you can understand the changes I made. You should be able to safely apply them as-is. Let me know if you have any questions or anything, and again sorry for accidentally writing so much code on what is ostensibly your PR 😅

@Mrfiregem
Copy link
Copy Markdown
Contributor Author

I try to make sure what I'm thinking is actually possible before suggesting it to newer contributors, so I started playing around with it and ended up implementing the whole entire thing before I was able to determine that it is indeed possible. I didn't mean to step on your toes so much in this PR, I don't want you to feel like we don't value your contribution! As you mentioned, this ended up being a surprisingly complicated PR.

It's not a problem at all. All of my PRs so far have been things that I wanted to use as a user of Nushell, so I appreciate your help in adding this feature 🙏. I'm definitely learning a lot about Nushell internals, so hopefully I'll be able to tackle other larger issues.

I tried to split up my changes into individual GitHub suggestions with explanations so you can understand the changes I made. You should be able to safely apply them as-is. Let me know if you have any questions or anything, and again sorry for accidentally writing so much code on what is ostensibly your PR 😅

Thanks for commenting your reasoning behind each change, too. There were some problems because a commit changed tests to use a newer rust edition, but it's all been sorted out easily.

@132ikl
Copy link
Copy Markdown
Member

132ikl commented May 14, 2025

thanks so much!! just one last thing, I promise! would you be able to add a couple tests for the new functionality in nu-command/tests/commands/default.rs? I meant to mention that yesterday but forgot 😅

A test to make sure the evaluation is actually lazy, a test for lazy returning a closure value, and testing closure evaluation with columns and with --empty would be good

@132ikl
Copy link
Copy Markdown
Member

132ikl commented May 15, 2025

looks great! thanks so much for your work on this, a long awaited feature! excited to see future contributions from you 😄

@132ikl 132ikl merged commit bb37306 into nushell:main May 15, 2025
16 checks passed
@github-actions github-actions bot added this to the v0.105.0 milestone May 15, 2025
@Mrfiregem Mrfiregem deleted the lazy-default branch May 15, 2025 19:17
@NotTheDr01ds NotTheDr01ds added the notes:breaking-changes This PR implies a change affecting users and has to be noted in the release notes label May 18, 2025
@NotTheDr01ds
Copy link
Copy Markdown
Contributor

  • To return a closure, the user must wrap the closure they want to return inside another closure, which will be run (default { $my_closure }).

This is a breaking change that gave me a startup error based on my Carapace completion closure using a default from back in the day when the config record wasn't populated by default.

Wrapping it in { $carapace_completer } works, but definitely needs to be documented as a breaking change since I'm probably not the only one with that config.

@NotTheDr01ds
Copy link
Copy Markdown
Contributor

NotTheDr01ds commented May 18, 2025

Ignore the original message - Bruce's response is spot on.

The more I look at this, the more I'm not sure about it. The fact that you have to wrap a closure-variable in additional braces to pass it as a value seems (a) to be different than any other place where we pass a closure in a variable, (b) seems too "magic" for this particular case, and (c) changes the pre-PR behavior.

I think it's going to be difficult to document (or at least for users to understand) why default requires that a closure be wrapped in an additional level of braces when no other command requires this.

Is it possible to switch the behavior so that:

  • An unwrapped variable continues to be treated as it was before, but
  • You need to wrap a closure in double {{ closure }} to have it lazily evaluated

This would seem more logical to me, as we're evaluating the closure in the lazy case.

@weirdan
Copy link
Copy Markdown
Contributor

weirdan commented May 18, 2025

The fact that you have to wrap a closure-variable in additional braces to pass it as a value seems (a) to be different than any other place where we pass a closure in a variable, (b) seems too "magic" for this particular case,

insert/update/upsert work the same

  • You need to wrap a closure in double {{ closure }} to have it lazily evaluated
    This would seem more logical to me, as we're evaluating the closure in the lazy case.

That's ... too magical and has no parallels in nushell, as far as I can tell.

@NotTheDr01ds
Copy link
Copy Markdown
Contributor

Fair enough - Ignore my late night hallucinations :-)

@132ikl
Copy link
Copy Markdown
Member

132ikl commented May 20, 2025

just remember to get back to this now, thanks for adding the breaking change tag @NotTheDr01ds, I forgot to add that! not sure if you saw, but I did also open a PR to fix the carapace issue: carapace-sh/carapace-bin#2796

khaneliman pushed a commit to nix-community/home-manager that referenced this pull request May 22, 2025
After nushell/nushell#15654, the direnv integration fails due to default behaving differently when passing a closure.
sholderbach pushed a commit that referenced this pull request Jun 4, 2025
#15654 (#15806)

# Description
Adds a temporary workaround to prevent #15654 from being a breaking
change when using a closure stored in a variable, and issues a warning.
Also has a special case related to
carapace-sh/carapace-bin#2796 which suggests
re-running `carapace init`


![image](https://github.com/user-attachments/assets/783f3dbf-2a85-4aa5-ac66-efc584ac77fd)


![image](https://github.com/user-attachments/assets/c8fb5ae1-66a8-474c-8244-a22600f4da43)

# After Submitting
Remove variable name detection after grace period
kumarUjjawal pushed a commit to kumarUjjawal/nushell that referenced this pull request Jun 5, 2025
nushell#15654 (nushell#15806)

# Description
Adds a temporary workaround to prevent nushell#15654 from being a breaking
change when using a closure stored in a variable, and issues a warning.
Also has a special case related to
carapace-sh/carapace-bin#2796 which suggests
re-running `carapace init`


![image](https://github.com/user-attachments/assets/783f3dbf-2a85-4aa5-ac66-efc584ac77fd)


![image](https://github.com/user-attachments/assets/c8fb5ae1-66a8-474c-8244-a22600f4da43)

# After Submitting
Remove variable name detection after grace period
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

notes:breaking-changes This PR implies a change affecting users and has to be noted in the release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature request: lazy default

5 participants