Skip to content

Add run-time type checking for command pipeline input#14741

Merged
sholderbach merged 33 commits intonushell:mainfrom
132ikl:input-typecheck
Jan 8, 2025
Merged

Add run-time type checking for command pipeline input#14741
sholderbach merged 33 commits intonushell:mainfrom
132ikl:input-typecheck

Conversation

@132ikl
Copy link
Copy Markdown
Member

@132ikl 132ikl commented Jan 4, 2025

Description

This PR adds type checking of all command input types at run-time. Generally, these errors should be caught by the parser, but sometimes we can't know the type of a value at parse-time. The simplest example is using the echo command, which has an output type of any, so prefixing a literal with echo will bypass parse-time type checking.

Before this PR, each command has to individually check its input types. This can result in scenarios where the input/output types don't match the actual command behavior. This can cause valid usage with an non-any type to become a parse-time error if a command is missing that type in its pipeline input/output (drop nth and history import do this before this PR). Alternatively, a command may not list a type in its input/output types, but doesn't actually reject that type in its code, which can have unintended side effects (get does this on an empty pipeline input, and sort used to before #13154).

After this PR, the type of the pipeline input is checked to ensure it matches one of the input types listed in the proceeding command's input/output types. While each of the issues in the "before this PR" section could be addressed with each command individually, this PR solves this issue for all commands.

This will likely cause some breakage, as some commands have incorrect input/output types, and should be adjusted. Also, some scripts may have erroneous usage of commands. In writing this PR, I discovered that toolkit.nu was passing null values to str join, which doesn't accept nothing types (if folks think it should, we can adjust it in this PR or in a different PR). I found some issues in the standard library and its tests. I also found that carapace's vendor script had an incorrect chaining of get -i:

let expanded_alias = (scope aliases | where name == $spans.0 | get -i 0 | get -i expansion)

Before this PR, if the get -i 0 ever actually did evaluate to null, the second get invocation would error since get doesn't operate on null values. After this PR, this is immediately a run-time error, alerting the user to the problematic code. As a side note, we'll need to PR this fix (get -i 0 | get -i expansion -> get -i 0.expansion) to carapace.

A notable exception to the type checking is commands with input type of nothing -> <type>. In this case, any input type is allowed. This allows piping values into the command without an error being thrown. For example, 123 | echo $in would be an error without this exception. Additionally, custom types bypass type checking (I believe this also happens during parsing, but not certain)

I added a is_subtype method to Value and PipelineData. It functions slightly differently than get_type().is_subtype(), as noted in the doccomments. Notably, it respects structural typing of lists and tables. For example, the type of a value [{a: 123} {a: 456, b: 789}] is a subtype of table<a: int>, whereas the type returned by Value::get_type is a list<any>. Similarly, PipelineData has some special handling for ListStreams and ByteStreams. The latter was needed for this PR to work properly with external commands.

Here's some examples.

Before:

1..2 | drop nth 1
Error: nu::parser::input_type_mismatch

  × Command does not support range input.
   ╭─[entry #9:1:8]
 11..2 | drop nth 1
   ·        ────┬───
   ·            ╰── command doesn't support range input
   ╰────

echo 1..2 | drop nth 1
# => ╭───┬───╮
# => │ 0 │ 1 │
# => ╰───┴───╯

After this PR, I've adjusted drop nth's input/output types to accept range input.

Before this PR, zip accepted any value despite not being listed in its input/output types. This caused different behavior depending on if you triggered a parse error or not:

1 | zip [2]
# => Error: nu::parser::input_type_mismatch
# => 
# =>   × Command does not support int input.
# =>    ╭─[entry #3:1:5]
# =>  1 │ 1 | zip [2]
# =>    ·     ─┬─
# =>    ·      ╰── command doesn't support int input
# =>    ╰────
echo 1 | zip [2]
# => ╭───┬───────────╮
# => │ 0 │ ╭───┬───╮ │
# => │   │ │ 0 │ 1 │ │
# => │   │ │ 1 │ 2 │ │
# => │   │ ╰───┴───╯ │
# => ╰───┴───────────╯

After this PR, it works the same in both cases. For cases like this, if we do decide we want zip or other commands to accept any input value, then we should explicitly add that to the input types.

1 | zip [2]
# => Error: nu::parser::input_type_mismatch
# => 
# =>   × Command does not support int input.
# =>    ╭─[entry #3:1:5]
# =>  1 │ 1 | zip [2]
# =>    ·     ─┬─
# =>    ·      ╰── command doesn't support int input
# =>    ╰────
echo 1 | zip [2]
# => Error: nu::shell::only_supports_this_input_type
# => 
# =>   × Input type not supported.
# =>    ╭─[entry #14:2:6]
# =>  2 │ echo 1 | zip [2]
# =>    ·      ┬   ─┬─
# =>    ·      │    ╰── only list<any> and range input data is supported
# =>    ·      ╰── input type: int
# =>    ╰────

User-Facing Changes

Breaking change: The type of a command's input is now checked against the input/output types of that command at run-time. While these errors should mostly be caught at parse-time, in cases where they can't be detected at parse-time they will be caught at run-time instead. This applies to both internal commands and custom commands.

Example function and corresponding parse-time error (same before and after PR):

def foo []: int -> nothing {
  print $"my cool int is ($in)"
}

1 | foo
# => my cool int is 1

"evil string" | foo
# => Error: nu::parser::input_type_mismatch
# => 
# =>   × Command does not support string input.
# =>    ╭─[entry #16:1:17]
# =>  1 │ "evil string" | foo
# =>    ·                 ─┬─
# =>    ·                  ╰── command doesn't support string input
# =>    ╰────
# => 

Before:

echo "evil string" | foo
# => my cool int is evil string

After:

echo "evil string" | foo
# => Error: nu::shell::only_supports_this_input_type
# => 
# =>   × Input type not supported.
# =>    ╭─[entry #17:1:6]
# =>  1 │ echo "evil string" | foo
# =>    ·      ──────┬──────   ─┬─
# =>    ·            │          ╰── only int input data is supported
# =>    ·            ╰── input type: string
# =>    ╰────

Known affected internal commands which erroneously accepted any type:

  • str join
  • zip
  • reduce

Tests + Formatting

  • 🟢 toolkit fmt
  • 🟢 toolkit clippy
  • 🟢 toolkit test
  • 🟢 toolkit test stdlib

After Submitting

  • Play whack-a-mole with the commands and scripts this will inevitably break

@NotTheDr01ds NotTheDr01ds added the notes:breaking-changes This PR implies a change affecting users and has to be noted in the release notes label Jan 4, 2025
@NotTheDr01ds
Copy link
Copy Markdown
Contributor

NotTheDr01ds commented Jan 4, 2025

I also found that carapace's vendor script had an incorrect chaining of get -i:

let expanded_alias = (scope aliases | where name == $spans.0 | get -i 0 | get -i expansion)

Before this PR, if the get -i 0 ever actually did evaluate to null, the second get invocation would error since get doesn't operate on null values.

It's possible that I'm misunderstanding here, but pre-PR this would never error, since both get statements use the --ignore-errors flag, right? The intended behavior is to return a null if errors occur in either position.

Similar test:

use std/assert
let empty_dir = (mktemp -d)
(
  assert equal
  (ls $empty_dir | get -i 0 | get -i name)
  null
)

Post PR, the -i is ignored and generates an error regardless? That makes it seem like get --ignore-errors is broken post-PR.

Side-note: This may mean we don't have (but need) a test case for get --ignore-errors?

@NotTheDr01ds
Copy link
Copy Markdown
Contributor

NotTheDr01ds commented Jan 4, 2025

Known affected internal commands which erroneously accepted any type:

  • str join

That's odd - help str join shows it accepting a list<any> but not any.

I discovered that toolkit.nu was passing null values to str join, which doesn't accept nothing types (if folks think it should, we can adjust it in this PR or in a different PR).

How was this getting past the parser? Even pre-PR this generates a parser error for me:

null | str join
# => Error: nu::parser::input_type_mismatch
# => 
# =>   × Command does not support nothing input.
# =>    ╭─[entry #1:1:8]
# =>  1 │ null | str join
# =>    ·        ────┬───
# =>    ·            ╰── command doesn't support nothing input
# =>    ╰────

On the other hand, the following works both pre-and-post-PR:

[null 1 'one' null 2 'two'] | str join
# => 1one2two

@132ikl
Copy link
Copy Markdown
Member Author

132ikl commented Jan 4, 2025

It's possible that I'm misunderstanding here, but pre-PR this would never error, since both get statements use the --ignore-errors flag, right? The intended behavior is to return a null if errors occur in either position.

I must've done something incorrectly while testing this for the PR, but I thought that get -i was erroring when passing null. Just tested again and this should be allowed. I'll add null to the allowed input/output types, and an explicit null handler when not using -i.

@@ -3879,6 +3933,134 @@ mod tests {
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't be here #[cfg(test)]?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The enclosing module is #[cfg(test)] already

@132ikl 132ikl marked this pull request as draft January 5, 2025 21:48
@132ikl 132ikl marked this pull request as ready for review January 5, 2025 22:23
@132ikl
Copy link
Copy Markdown
Member Author

132ikl commented Jan 5, 2025

sigh, looks like i have more moles to whack

@132ikl 132ikl marked this pull request as draft January 5, 2025 23:34
@132ikl 132ikl marked this pull request as ready for review January 6, 2025 06:36
@sholderbach sholderbach added the A:type-system Problems or features related to nushell's type system label Jan 8, 2025
@sholderbach sholderbach merged commit 214714e into nushell:main Jan 8, 2025
@github-actions github-actions bot added this to the v0.102.0 milestone Jan 8, 2025
@@ -0,0 +1,61 @@
use nu_test_support::fs::Stub::EmptyFile;
Copy link
Copy Markdown
Contributor

@fdncred fdncred Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this file (tests/commands/split_by.rs) is supposed to have been removed. I thought it was removed recently in a PR along with the split-by command.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shoot, i messed up the merge. thanks for catching this!

@fdncred
Copy link
Copy Markdown
Contributor

fdncred commented Jan 8, 2025

I'm happy to say that I've seen no errors so far after landing this PR. That's a good sign. Nice work @132ikl!!

fdncred pushed a commit that referenced this pull request Jan 8, 2025
<!--
if this PR closes one or more issues, you can automatically link the PR
with
them by using one of the [*linking
keywords*](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword),
e.g.
- this PR should close #xxxx
- fixes #xxxx

you can also mention related issues, PRs or discussions!
-->

# Description
<!--
Thank you for improving Nushell. Please, check our [contributing
guide](../CONTRIBUTING.md) and talk to the core team before making major
changes.

Description of your pull request goes here. **Provide examples and/or
screenshots** if your changes affect the user experience.
-->

Re-removes the tests for `split_by`, which was removed in #14726 and
accidentally re-introduced by #14741

cc @fdncred 

# User-Facing Changes
<!-- List of all changes that impact the user experience here. This
helps us keep track of breaking changes. -->

N/A

# Tests + Formatting
<!--
Don't forget to add tests that cover your changes.

Make sure you've run and fixed any issues with these commands:

- `cargo fmt --all -- --check` to check standard code formatting (`cargo
fmt --all` applies these changes)
- `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used` to
check that you're using the standard code style
- `cargo test --workspace` to check that all tests pass (on Windows make
sure to [enable developer
mode](https://learn.microsoft.com/en-us/windows/apps/get-started/developer-mode-features-and-debugging))
- `cargo run -- -c "use toolkit.nu; toolkit test stdlib"` to run the
tests for the standard library

> **Note**
> from `nushell` you can also use the `toolkit` as follows
> ```bash
> use toolkit.nu # or use an `env_change` hook to activate it
automatically
> toolkit check pr
> ```
-->

N/A

# After Submitting
<!-- If your PR had any user-facing changes, update [the
documentation](https://github.com/nushell/nushell.github.io) after the
PR is merged, if necessary. This will help us keep the docs up to date.
-->
N/A
fdncred pushed a commit to nushell/nu_scripts that referenced this pull request Jan 27, 2025
nushell/nushell#14741 has surfaced an issue with
an incorrect signature in `testing.nu` which caused it to fail
completely. This fixes the signature and allows tests to run properly.
132ikl added a commit that referenced this pull request Feb 2, 2025
<!--
if this PR closes one or more issues, you can automatically link the PR
with
them by using one of the [*linking
keywords*](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword),
e.g.
- this PR should close #xxxx
- fixes #xxxx

you can also mention related issues, PRs or discussions!
-->

# Description
<!--
Thank you for improving Nushell. Please, check our [contributing
guide](../CONTRIBUTING.md) and talk to the core team before making major
changes.

Description of your pull request goes here. **Provide examples and/or
screenshots** if your changes affect the user experience.
-->

This PR makes two changes related to [run-time pipeline input type
checking](#14741):

1. The check which bypasses type checking for commands with only
`Type::Nothing` input types has been expanded to work with commands with
multiple `Type::Nothing` inputs for different outputs. For example,
`ast` has three input/output type pairs, but all of the inputs are
`Type::Nothing`:
  ```
  ╭───┬─────────┬────────╮
  │ # │  input  │ output │
  ├───┼─────────┼────────┤
  │ 0 │ nothing │ table  │
  │ 1 │ nothing │ record │
  │ 2 │ nothing │ string │
  ╰───┴─────────┴────────╯
  ```
Before this PR, passing a value (which would otherwise be ignored) to
`ast` caused a run-time type error:
  ```
    Error: nu::shell::only_supports_this_input_type
  
    × Input type not supported.
     ╭─[entry #1:1:6]
   1 │ echo 123 | ast -j -f "hi" 
     ·      ─┬─   ─┬─
· │ ╰── only nothing, nothing, and nothing input data is supported
     ·       ╰── input type: int
     ╰────
  
  ```

  After this PR, no error is raised.

This doesn't really matter for `ast` (the only other built-in command
with a similar input/output type signature is `cal`), but it's more
logically consistent.

2. Bypasses input type-checking (parse-time ***and*** run-time) for some
(not all, see below) commands which have both a `Type::Nothing` input
and some other non-nothing `Type` input. This is accomplished by adding
a `Type::Any` input with the same output as the corresponding
`Type::Nothing` input/output pair.
  &nbsp;
This is necessary because some commands are intended to operate on an
argument with empty pipeline input, or operate on an empty pipeline
input with no argument. This causes issues when a value is implicitly
passed to one of these commands. I [discovered this
issue](https://discord.com/channels/601130461678272522/615962413203718156/1329945784346611712)
when working with an example where the `open` command is used in
`sort-by` closure:
```nushell
ls | sort-by { open -r $in.name | lines | length }
```

Before this PR (but after the run-time input type checking PR), this
error is raised:

```
Error: nu::shell::only_supports_this_input_type

  × Input type not supported.
   ╭─[entry #1:1:1]
 1 │ ls | sort-by { open -r $in.name | lines | length }
   · ─┬             ──┬─
   ·  │               ╰── only nothing and string input data is supported
   ·  ╰── input type: record<name: string, type: string, size: filesize, modified: date>
   ╰────
```

While this error is technically correct, we don't actually want to
return an error here since `open` ignores its pipeline input when an
argument is passed. This would be a parse-time error as well if the
parser was able to infer that the closure input type was a record, but
our type inference isn't that robust currently, so this technically
incorrect form snuck by type checking until #14741.

However, there are some commands with the same kind of type signature
where this behavior is actually desirable. This means we can't just
bypass type-checking for any command with a `Type::Nothing` input. These
commands operate on true `null` values, rather than ignoring their
input. For example, `length` returns `0` when passed a `null` value.
It's correct, and even desirable, to throw a run-time error when
`length` is passed an unexpected type. For example, a string, which
should instead be measured with `str length`:

```nushell
["hello" "world"] | sort-by { length }
# => Error: nu::shell::only_supports_this_input_type
# => 
# =>   × Input type not supported.
# =>    ╭─[entry #32:1:10]
# =>  1 │ ["hello" "world"] | sort-by { length }
# =>    ·          ───┬───              ───┬──
# =>    ·             │                    ╰── only list<any>, binary, and nothing input data is supported
# =>    ·             ╰── input type: string
# =>    ╰────
```

We need a more robust way for commands to express how they handle the
`Type::Nothing` input case. I think a possible solution here is to allow
commands to express that they operate on `PipelineData::Empty`, rather
than `Value::Nothing`. Then, a command like `open` could have an empty
pipeline input type rather than a `Type::Nothing`, and the parse-time
and run-time pipeline input type checks know that `open` will safely
ignore an incorrectly typed input.

That being said, we have a release coming up and the above solution
might take a while to implement, so while unfortunate, bypassing input
type-checking for these problematic commands serves as a workaround to
avoid breaking changes in the release until a more robust solution is
implemented.

This PR bypasses input type-checking for the following commands:
* `load-env`: can take record of envvars as input or argument
* `nu-check`: checks input string or filename argument 
* `open`: can take filename as input or argument
* `polars when`: can be used with input, or can be chained with another
`polars when`
* `stor insert`: data record can be passed as input or argument
* `stor update`: data record can be passed as input or argument
* `format date`: `--list` ignores input value
* `into datetime`: `--list` ignores input value (also added a
`Type::Nothing` input which was missing from this command)

These commands have a similar input/output signature to the above
commands, but are working as intended:
* `cd`: The input/output signature was actually incorrect, `cd` always
ignores its input. I fixed this in this PR.
* `generate`
* `get`
* `history import` 
* `interleave`
* `into bool`
* `length`

# User-Facing Changes
<!-- List of all changes that impact the user experience here. This
helps us keep track of breaking changes. -->

As a temporary workaround, pipeline input type-checking for the
following commands has been bypassed to avoid undesirable run-time input
type checking errors which were previously not caught at parse-time:
* `open`
* `load-env`
* `format date`
* `into datetime`
* `nu-check`
* `stor insert`
* `stor update`
* `polars when`

# Tests + Formatting
<!--
Don't forget to add tests that cover your changes.

Make sure you've run and fixed any issues with these commands:

- `cargo fmt --all -- --check` to check standard code formatting (`cargo
fmt --all` applies these changes)
- `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used` to
check that you're using the standard code style
- `cargo test --workspace` to check that all tests pass (on Windows make
sure to [enable developer
mode](https://learn.microsoft.com/en-us/windows/apps/get-started/developer-mode-features-and-debugging))
- `cargo run -- -c "use toolkit.nu; toolkit test stdlib"` to run the
tests for the standard library

> **Note**
> from `nushell` you can also use the `toolkit` as follows
> ```bash
> use toolkit.nu # or use an `env_change` hook to activate it
automatically
> toolkit check pr
> ```
-->
CI became green in the time it took me to type the description 😄 

# After Submitting
<!-- If your PR had any user-facing changes, update [the
documentation](https://github.com/nushell/nushell.github.io) after the
PR is merged, if necessary. This will help us keep the docs up to date.
-->

N/A
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A:type-system Problems or features related to nushell's type system notes:breaking-changes This PR implies a change affecting users and has to be noted in the release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants