Add run-time type checking for command pipeline input by 132ikl · Pull Request #14741 · nushell/nushell

132ikl · 2025-01-04T06:27:11Z

Description

This PR adds type checking of all command input types at run-time. Generally, these errors should be caught by the parser, but sometimes we can't know the type of a value at parse-time. The simplest example is using the echo command, which has an output type of any, so prefixing a literal with echo will bypass parse-time type checking.

Before this PR, each command has to individually check its input types. This can result in scenarios where the input/output types don't match the actual command behavior. This can cause valid usage with an non-any type to become a parse-time error if a command is missing that type in its pipeline input/output (drop nth and history import do this before this PR). Alternatively, a command may not list a type in its input/output types, but doesn't actually reject that type in its code, which can have unintended side effects (get does this on an empty pipeline input, and sort used to before #13154).

After this PR, the type of the pipeline input is checked to ensure it matches one of the input types listed in the proceeding command's input/output types. While each of the issues in the "before this PR" section could be addressed with each command individually, this PR solves this issue for all commands.

This will likely cause some breakage, as some commands have incorrect input/output types, and should be adjusted. Also, some scripts may have erroneous usage of commands. In writing this PR, I discovered that toolkit.nu was passing null values to str join, which doesn't accept nothing types (if folks think it should, we can adjust it in this PR or in a different PR). I found some issues in the standard library and its tests. I also found that carapace's vendor script had an incorrect chaining of get -i:

let expanded_alias = (scope aliases | where name == $spans.0 | get -i 0 | get -i expansion)

Before this PR, if the get -i 0 ever actually did evaluate to null, the second get invocation would error since get doesn't operate on null values. After this PR, this is immediately a run-time error, alerting the user to the problematic code. As a side note, we'll need to PR this fix (get -i 0 | get -i expansion -> get -i 0.expansion) to carapace.

A notable exception to the type checking is commands with input type of nothing -> <type>. In this case, any input type is allowed. This allows piping values into the command without an error being thrown. For example, 123 | echo $in would be an error without this exception. Additionally, custom types bypass type checking (I believe this also happens during parsing, but not certain)

I added a is_subtype method to Value and PipelineData. It functions slightly differently than get_type().is_subtype(), as noted in the doccomments. Notably, it respects structural typing of lists and tables. For example, the type of a value [{a: 123} {a: 456, b: 789}] is a subtype of table<a: int>, whereas the type returned by Value::get_type is a list<any>. Similarly, PipelineData has some special handling for ListStreams and ByteStreams. The latter was needed for this PR to work properly with external commands.

Here's some examples.

Before:

1..2 | drop nth 1
Error: nu::parser::input_type_mismatch

  × Command does not support range input.
   ╭─[entry #9:1:8]
 1 │ 1..2 | drop nth 1
   ·        ────┬───
   ·            ╰── command doesn't support range input
   ╰────

echo 1..2 | drop nth 1
# => ╭───┬───╮
# => │ 0 │ 1 │
# => ╰───┴───╯

After this PR, I've adjusted drop nth's input/output types to accept range input.

Before this PR, zip accepted any value despite not being listed in its input/output types. This caused different behavior depending on if you triggered a parse error or not:

1 | zip [2]
# => Error: nu::parser::input_type_mismatch
# => 
# =>   × Command does not support int input.
# =>    ╭─[entry #3:1:5]
# =>  1 │ 1 | zip [2]
# =>    ·     ─┬─
# =>    ·      ╰── command doesn't support int input
# =>    ╰────
echo 1 | zip [2]
# => ╭───┬───────────╮
# => │ 0 │ ╭───┬───╮ │
# => │   │ │ 0 │ 1 │ │
# => │   │ │ 1 │ 2 │ │
# => │   │ ╰───┴───╯ │
# => ╰───┴───────────╯

After this PR, it works the same in both cases. For cases like this, if we do decide we want zip or other commands to accept any input value, then we should explicitly add that to the input types.

1 | zip [2]
# => Error: nu::parser::input_type_mismatch
# => 
# =>   × Command does not support int input.
# =>    ╭─[entry #3:1:5]
# =>  1 │ 1 | zip [2]
# =>    ·     ─┬─
# =>    ·      ╰── command doesn't support int input
# =>    ╰────
echo 1 | zip [2]
# => Error: nu::shell::only_supports_this_input_type
# => 
# =>   × Input type not supported.
# =>    ╭─[entry #14:2:6]
# =>  2 │ echo 1 | zip [2]
# =>    ·      ┬   ─┬─
# =>    ·      │    ╰── only list<any> and range input data is supported
# =>    ·      ╰── input type: int
# =>    ╰────

User-Facing Changes

Breaking change: The type of a command's input is now checked against the input/output types of that command at run-time. While these errors should mostly be caught at parse-time, in cases where they can't be detected at parse-time they will be caught at run-time instead. This applies to both internal commands and custom commands.

Example function and corresponding parse-time error (same before and after PR):

def foo []: int -> nothing {
  print $"my cool int is ($in)"
}

1 | foo
# => my cool int is 1

"evil string" | foo
# => Error: nu::parser::input_type_mismatch
# => 
# =>   × Command does not support string input.
# =>    ╭─[entry #16:1:17]
# =>  1 │ "evil string" | foo
# =>    ·                 ─┬─
# =>    ·                  ╰── command doesn't support string input
# =>    ╰────
# =>

Before:

echo "evil string" | foo
# => my cool int is evil string

After:

echo "evil string" | foo
# => Error: nu::shell::only_supports_this_input_type
# => 
# =>   × Input type not supported.
# =>    ╭─[entry #17:1:6]
# =>  1 │ echo "evil string" | foo
# =>    ·      ──────┬──────   ─┬─
# =>    ·            │          ╰── only int input data is supported
# =>    ·            ╰── input type: string
# =>    ╰────

Known affected internal commands which erroneously accepted any type:

str join
zip
reduce

Tests + Formatting

🟢 toolkit fmt
🟢 toolkit clippy
🟢 toolkit test
🟢 toolkit test stdlib

After Submitting

Play whack-a-mole with the commands and scripts this will inevitably break

This reverts commit 55f6b18.

This reverts commit 771228f.

NotTheDr01ds · 2025-01-04T09:17:31Z

I also found that carapace's vendor script had an incorrect chaining of get -i:
let expanded_alias = (scope aliases | where name == $spans.0 | get -i 0 | get -i expansion)
Before this PR, if the get -i 0 ever actually did evaluate to null, the second get invocation would error since get doesn't operate on null values.

It's possible that I'm misunderstanding here, but pre-PR this would never error, since both get statements use the --ignore-errors flag, right? The intended behavior is to return a null if errors occur in either position.

Similar test:

use std/assert
let empty_dir = (mktemp -d)
(
  assert equal
  (ls $empty_dir | get -i 0 | get -i name)
  null
)

Post PR, the -i is ignored and generates an error regardless? That makes it seem like get --ignore-errors is broken post-PR.

Side-note: This may mean we don't have (but need) a test case for get --ignore-errors?

NotTheDr01ds · 2025-01-04T09:35:32Z

Known affected internal commands which erroneously accepted any type:

str join

That's odd - help str join shows it accepting a list<any> but not any.

I discovered that toolkit.nu was passing null values to str join, which doesn't accept nothing types (if folks think it should, we can adjust it in this PR or in a different PR).

How was this getting past the parser? Even pre-PR this generates a parser error for me:

null | str join
# => Error: nu::parser::input_type_mismatch
# => 
# =>   × Command does not support nothing input.
# =>    ╭─[entry #1:1:8]
# =>  1 │ null | str join
# =>    ·        ────┬───
# =>    ·            ╰── command doesn't support nothing input
# =>    ╰────

On the other hand, the following works both pre-and-post-PR:

[null 1 'one' null 2 'two'] | str join
# => 1one2two

132ikl · 2025-01-04T19:07:24Z

It's possible that I'm misunderstanding here, but pre-PR this would never error, since both get statements use the --ignore-errors flag, right? The intended behavior is to return a null if errors occur in either position.

I must've done something incorrectly while testing this for the PR, but I thought that get -i was erroring when passing null. Just tested again and this should be allowed. I'll add null to the allowed input/output types, and an explicit null handler when not using -i.

kubouch · 2025-01-05T21:43:13Z

crates/nu-protocol/src/value/mod.rs

@@ -3879,6 +3933,134 @@ mod tests {
        }
    }



Shouldn't be here #[cfg(test)]?

The enclosing module is #[cfg(test)] already

132ikl · 2025-01-05T23:34:11Z

sigh, looks like i have more moles to whack

fdncred · 2025-01-08T22:42:06Z

crates/nu-command/tests/commands/split_by.rs

@@ -0,0 +1,61 @@
+use nu_test_support::fs::Stub::EmptyFile;


I think this file (tests/commands/split_by.rs) is supposed to have been removed. I thought it was removed recently in a PR along with the split-by command.

shoot, i messed up the merge. thanks for catching this!

fdncred · 2025-01-08T23:17:22Z

I'm happy to say that I've seen no errors so far after landing this PR. That's a good sign. Nice work @132ikl!!

@fdncred

# Description  Re-removes the tests for `split_by`, which was removed in #14726 and accidentally re-introduced by #14741 cc @fdncred # User-Facing Changes  N/A # Tests + Formatting  N/A # After Submitting  N/A

nushell/nushell#14741 has surfaced an issue with an incorrect signature in `testing.nu` which caused it to fail completely. This fixes the signature and allows tests to run properly.

# Description  This PR makes two changes related to [run-time pipeline input type checking](#14741): 1. The check which bypasses type checking for commands with only `Type::Nothing` input types has been expanded to work with commands with multiple `Type::Nothing` inputs for different outputs. For example, `ast` has three input/output type pairs, but all of the inputs are `Type::Nothing`: ``` ╭───┬─────────┬────────╮ │ # │ input │ output │ ├───┼─────────┼────────┤ │ 0 │ nothing │ table │ │ 1 │ nothing │ record │ │ 2 │ nothing │ string │ ╰───┴─────────┴────────╯ ``` Before this PR, passing a value (which would otherwise be ignored) to `ast` caused a run-time type error: ``` Error: nu::shell::only_supports_this_input_type × Input type not supported. ╭─[entry #1:1:6] 1 │ echo 123 | ast -j -f "hi" · ─┬─ ─┬─ · │ ╰── only nothing, nothing, and nothing input data is supported · ╰── input type: int ╰──── ``` After this PR, no error is raised. This doesn't really matter for `ast` (the only other built-in command with a similar input/output type signature is `cal`), but it's more logically consistent. 2. Bypasses input type-checking (parse-time ***and*** run-time) for some (not all, see below) commands which have both a `Type::Nothing` input and some other non-nothing `Type` input. This is accomplished by adding a `Type::Any` input with the same output as the corresponding `Type::Nothing` input/output pair.   This is necessary because some commands are intended to operate on an argument with empty pipeline input, or operate on an empty pipeline input with no argument. This causes issues when a value is implicitly passed to one of these commands. I [discovered this issue](https://discord.com/channels/601130461678272522/615962413203718156/1329945784346611712) when working with an example where the `open` command is used in `sort-by` closure: ```nushell ls | sort-by { open -r $in.name | lines | length } ``` Before this PR (but after the run-time input type checking PR), this error is raised: ``` Error: nu::shell::only_supports_this_input_type × Input type not supported. ╭─[entry #1:1:1] 1 │ ls | sort-by { open -r $in.name | lines | length } · ─┬ ──┬─ · │ ╰── only nothing and string input data is supported · ╰── input type: record<name: string, type: string, size: filesize, modified: date> ╰──── ``` While this error is technically correct, we don't actually want to return an error here since `open` ignores its pipeline input when an argument is passed. This would be a parse-time error as well if the parser was able to infer that the closure input type was a record, but our type inference isn't that robust currently, so this technically incorrect form snuck by type checking until #14741. However, there are some commands with the same kind of type signature where this behavior is actually desirable. This means we can't just bypass type-checking for any command with a `Type::Nothing` input. These commands operate on true `null` values, rather than ignoring their input. For example, `length` returns `0` when passed a `null` value. It's correct, and even desirable, to throw a run-time error when `length` is passed an unexpected type. For example, a string, which should instead be measured with `str length`: ```nushell ["hello" "world"] | sort-by { length } # => Error: nu::shell::only_supports_this_input_type # => # => × Input type not supported. # => ╭─[entry #32:1:10] # => 1 │ ["hello" "world"] | sort-by { length } # => · ───┬─── ───┬── # => · │ ╰── only list<any>, binary, and nothing input data is supported # => · ╰── input type: string # => ╰──── ``` We need a more robust way for commands to express how they handle the `Type::Nothing` input case. I think a possible solution here is to allow commands to express that they operate on `PipelineData::Empty`, rather than `Value::Nothing`. Then, a command like `open` could have an empty pipeline input type rather than a `Type::Nothing`, and the parse-time and run-time pipeline input type checks know that `open` will safely ignore an incorrectly typed input. That being said, we have a release coming up and the above solution might take a while to implement, so while unfortunate, bypassing input type-checking for these problematic commands serves as a workaround to avoid breaking changes in the release until a more robust solution is implemented. This PR bypasses input type-checking for the following commands: * `load-env`: can take record of envvars as input or argument * `nu-check`: checks input string or filename argument * `open`: can take filename as input or argument * `polars when`: can be used with input, or can be chained with another `polars when` * `stor insert`: data record can be passed as input or argument * `stor update`: data record can be passed as input or argument * `format date`: `--list` ignores input value * `into datetime`: `--list` ignores input value (also added a `Type::Nothing` input which was missing from this command) These commands have a similar input/output signature to the above commands, but are working as intended: * `cd`: The input/output signature was actually incorrect, `cd` always ignores its input. I fixed this in this PR. * `generate` * `get` * `history import` * `interleave` * `into bool` * `length` # User-Facing Changes  As a temporary workaround, pipeline input type-checking for the following commands has been bypassed to avoid undesirable run-time input type checking errors which were previously not caught at parse-time: * `open` * `load-env` * `format date` * `into datetime` * `nu-check` * `stor insert` * `stor update` * `polars when` # Tests + Formatting  CI became green in the time it took me to type the description 😄 # After Submitting  N/A

132ikl added 24 commits January 3, 2025 17:13

Add runtime type checking of command input types

41bdd0a

Add Value::is_subtype

f0e6af1

Replace usages of get_type().is_subtype()

9edfbe5

Add PipelineData::is_subtype, fix input type checking

13ff1ab

Allow any type input for commands with nothing input type

ae1e8ce

Fix input/output types of various commands

974333b

Fix toolkit.nu

7b9178c

Prettier error for empty pipeline

c9705be

Whack

82de5d5

Whack

e78780e

Add custom value hack

55f6b18

Whack

4cf995d

Whack

2d181bd

Whack

5555f3b

Whack

fedb8ec

Whack

1d4d3a3

Whack

99cf97c

Change polars input/output types

771228f

Revert "Add custom value hack"

efaa98f

This reverts commit 55f6b18.

Pass input type checking for all custom values

d674caf

Revert "Change polars input/output types"

772ce21

This reverts commit 771228f.

Whack

19277e2

Fix stdlib

7c686fe

Remove old code

a6ff171

NotTheDr01ds added the notes:breaking-changes This PR implies a change affecting users and has to be noted in the release notes label Jan 4, 2025

Rename is_subtype to is_subtype_of

8a8faf2

Add explicit nothing support to get

aba1321

Typo

19fb9ad

kubouch reviewed Jan 5, 2025

View reviewed changes

132ikl marked this pull request as draft January 5, 2025 21:48

132ikl added 3 commits January 5, 2025 17:08

Remove last usage of get_type().is_subtype()

ca979a2

Update doccomments to discourage use of Value::get_type().is_subtype()

90cf511

Merge main into input-typecheck

f021637

132ikl marked this pull request as ready for review January 5, 2025 22:23

132ikl marked this pull request as draft January 5, 2025 23:34

Update from csv/tsv and headers to account for Value::is_subtype_of

92833d2

132ikl marked this pull request as ready for review January 6, 2025 06:36

Merge main into input-typecheck

937fd23

sholderbach added the A:type-system Problems or features related to nushell's type system label Jan 8, 2025

sholderbach merged commit 214714e into nushell:main Jan 8, 2025

github-actions bot added this to the v0.102.0 milestone Jan 8, 2025

fdncred reviewed Jan 8, 2025

View reviewed changes

132ikl mentioned this pull request Jan 8, 2025

Remove file accidentally re-introduced by merge #14785

Merged

132ikl mentioned this pull request Jan 27, 2025

Run-time pipeline input typechecking tweaks #14922

Merged

This was referenced Jan 27, 2025

[stdlib-candidate] add aggregate nushell/nu_scripts#991

Merged

Fix issue with testing.nu in 0.101.1 nushell/nu_scripts#1023

Merged

sholderbach mentioned this pull request Jan 30, 2025

Flag typing inconsistency for record #14963

Open

maxim-uvarov mentioned this pull request Feb 5, 2025

Fix needed for nu 0.102 nushell-prophet/numd#69

Merged

132ikl mentioned this pull request Feb 23, 2025

Add PipelineType enum to better express input/output types #15165

Closed

sholderbach mentioned this pull request Mar 5, 2025

bugfix: math commands now return error with infinite range [#15135] #15236

Merged

132ikl mentioned this pull request Jul 7, 2025

fix(engine, type-system)!: enforce assignment type annotations at runtime #16079

Merged

mkatychev mentioned this pull request Sep 25, 2025

incorrect type inference when in collection with more than two types present #16753

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add run-time type checking for command pipeline input#14741

Add run-time type checking for command pipeline input#14741
sholderbach merged 33 commits intonushell:mainfrom
132ikl:input-typecheck

132ikl commented Jan 4, 2025 •

edited

Loading

Uh oh!

NotTheDr01ds commented Jan 4, 2025 •

edited

Loading

Uh oh!

NotTheDr01ds commented Jan 4, 2025 •

edited

Loading

Uh oh!

132ikl commented Jan 4, 2025

Uh oh!

kubouch Jan 5, 2025

Uh oh!

132ikl Jan 5, 2025

Uh oh!

132ikl commented Jan 5, 2025

Uh oh!

fdncred Jan 8, 2025 •

edited

Loading

Uh oh!

132ikl Jan 8, 2025

Uh oh!

fdncred commented Jan 8, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

132ikl commented Jan 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

User-Facing Changes

Tests + Formatting

After Submitting

Uh oh!

NotTheDr01ds commented Jan 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

NotTheDr01ds commented Jan 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

132ikl commented Jan 4, 2025

Uh oh!

kubouch Jan 5, 2025

Choose a reason for hiding this comment

Uh oh!

132ikl Jan 5, 2025

Choose a reason for hiding this comment

Uh oh!

132ikl commented Jan 5, 2025

Uh oh!

fdncred Jan 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

132ikl Jan 8, 2025

Choose a reason for hiding this comment

Uh oh!

fdncred commented Jan 8, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

132ikl commented Jan 4, 2025 •

edited

Loading

NotTheDr01ds commented Jan 4, 2025 •

edited

Loading

NotTheDr01ds commented Jan 4, 2025 •

edited

Loading

fdncred Jan 8, 2025 •

edited

Loading