Add run-time type checking for command pipeline input#14741
Add run-time type checking for command pipeline input#14741sholderbach merged 33 commits intonushell:mainfrom
Conversation
It's possible that I'm misunderstanding here, but pre-PR this would never error, since both Similar test: use std/assert
let empty_dir = (mktemp -d)
(
assert equal
(ls $empty_dir | get -i 0 | get -i name)
null
)Post PR, the Side-note: This may mean we don't have (but need) a test case for |
That's odd -
How was this getting past the parser? Even pre-PR this generates a parser error for me: null | str join
# => Error: nu::parser::input_type_mismatch
# =>
# => × Command does not support nothing input.
# => ╭─[entry #1:1:8]
# => 1 │ null | str join
# => · ────┬───
# => · ╰── command doesn't support nothing input
# => ╰────On the other hand, the following works both pre-and-post-PR: [null 1 'one' null 2 'two'] | str join
# => 1one2two |
I must've done something incorrectly while testing this for the PR, but I thought that |
| @@ -3879,6 +3933,134 @@ mod tests { | |||
| } | |||
| } | |||
|
|
|||
There was a problem hiding this comment.
Shouldn't be here #[cfg(test)]?
|
sigh, looks like i have more moles to whack |
| @@ -0,0 +1,61 @@ | |||
| use nu_test_support::fs::Stub::EmptyFile; | |||
There was a problem hiding this comment.
I think this file (tests/commands/split_by.rs) is supposed to have been removed. I thought it was removed recently in a PR along with the split-by command.
There was a problem hiding this comment.
shoot, i messed up the merge. thanks for catching this!
|
I'm happy to say that I've seen no errors so far after landing this PR. That's a good sign. Nice work @132ikl!! |
<!-- if this PR closes one or more issues, you can automatically link the PR with them by using one of the [*linking keywords*](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword), e.g. - this PR should close #xxxx - fixes #xxxx you can also mention related issues, PRs or discussions! --> # Description <!-- Thank you for improving Nushell. Please, check our [contributing guide](../CONTRIBUTING.md) and talk to the core team before making major changes. Description of your pull request goes here. **Provide examples and/or screenshots** if your changes affect the user experience. --> Re-removes the tests for `split_by`, which was removed in #14726 and accidentally re-introduced by #14741 cc @fdncred # User-Facing Changes <!-- List of all changes that impact the user experience here. This helps us keep track of breaking changes. --> N/A # Tests + Formatting <!-- Don't forget to add tests that cover your changes. Make sure you've run and fixed any issues with these commands: - `cargo fmt --all -- --check` to check standard code formatting (`cargo fmt --all` applies these changes) - `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used` to check that you're using the standard code style - `cargo test --workspace` to check that all tests pass (on Windows make sure to [enable developer mode](https://learn.microsoft.com/en-us/windows/apps/get-started/developer-mode-features-and-debugging)) - `cargo run -- -c "use toolkit.nu; toolkit test stdlib"` to run the tests for the standard library > **Note** > from `nushell` you can also use the `toolkit` as follows > ```bash > use toolkit.nu # or use an `env_change` hook to activate it automatically > toolkit check pr > ``` --> N/A # After Submitting <!-- If your PR had any user-facing changes, update [the documentation](https://github.com/nushell/nushell.github.io) after the PR is merged, if necessary. This will help us keep the docs up to date. --> N/A
nushell/nushell#14741 has surfaced an issue with an incorrect signature in `testing.nu` which caused it to fail completely. This fixes the signature and allows tests to run properly.
<!-- if this PR closes one or more issues, you can automatically link the PR with them by using one of the [*linking keywords*](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword), e.g. - this PR should close #xxxx - fixes #xxxx you can also mention related issues, PRs or discussions! --> # Description <!-- Thank you for improving Nushell. Please, check our [contributing guide](../CONTRIBUTING.md) and talk to the core team before making major changes. Description of your pull request goes here. **Provide examples and/or screenshots** if your changes affect the user experience. --> This PR makes two changes related to [run-time pipeline input type checking](#14741): 1. The check which bypasses type checking for commands with only `Type::Nothing` input types has been expanded to work with commands with multiple `Type::Nothing` inputs for different outputs. For example, `ast` has three input/output type pairs, but all of the inputs are `Type::Nothing`: ``` ╭───┬─────────┬────────╮ │ # │ input │ output │ ├───┼─────────┼────────┤ │ 0 │ nothing │ table │ │ 1 │ nothing │ record │ │ 2 │ nothing │ string │ ╰───┴─────────┴────────╯ ``` Before this PR, passing a value (which would otherwise be ignored) to `ast` caused a run-time type error: ``` Error: nu::shell::only_supports_this_input_type × Input type not supported. ╭─[entry #1:1:6] 1 │ echo 123 | ast -j -f "hi" · ─┬─ ─┬─ · │ ╰── only nothing, nothing, and nothing input data is supported · ╰── input type: int ╰──── ``` After this PR, no error is raised. This doesn't really matter for `ast` (the only other built-in command with a similar input/output type signature is `cal`), but it's more logically consistent. 2. Bypasses input type-checking (parse-time ***and*** run-time) for some (not all, see below) commands which have both a `Type::Nothing` input and some other non-nothing `Type` input. This is accomplished by adding a `Type::Any` input with the same output as the corresponding `Type::Nothing` input/output pair. This is necessary because some commands are intended to operate on an argument with empty pipeline input, or operate on an empty pipeline input with no argument. This causes issues when a value is implicitly passed to one of these commands. I [discovered this issue](https://discord.com/channels/601130461678272522/615962413203718156/1329945784346611712) when working with an example where the `open` command is used in `sort-by` closure: ```nushell ls | sort-by { open -r $in.name | lines | length } ``` Before this PR (but after the run-time input type checking PR), this error is raised: ``` Error: nu::shell::only_supports_this_input_type × Input type not supported. ╭─[entry #1:1:1] 1 │ ls | sort-by { open -r $in.name | lines | length } · ─┬ ──┬─ · │ ╰── only nothing and string input data is supported · ╰── input type: record<name: string, type: string, size: filesize, modified: date> ╰──── ``` While this error is technically correct, we don't actually want to return an error here since `open` ignores its pipeline input when an argument is passed. This would be a parse-time error as well if the parser was able to infer that the closure input type was a record, but our type inference isn't that robust currently, so this technically incorrect form snuck by type checking until #14741. However, there are some commands with the same kind of type signature where this behavior is actually desirable. This means we can't just bypass type-checking for any command with a `Type::Nothing` input. These commands operate on true `null` values, rather than ignoring their input. For example, `length` returns `0` when passed a `null` value. It's correct, and even desirable, to throw a run-time error when `length` is passed an unexpected type. For example, a string, which should instead be measured with `str length`: ```nushell ["hello" "world"] | sort-by { length } # => Error: nu::shell::only_supports_this_input_type # => # => × Input type not supported. # => ╭─[entry #32:1:10] # => 1 │ ["hello" "world"] | sort-by { length } # => · ───┬─── ───┬── # => · │ ╰── only list<any>, binary, and nothing input data is supported # => · ╰── input type: string # => ╰──── ``` We need a more robust way for commands to express how they handle the `Type::Nothing` input case. I think a possible solution here is to allow commands to express that they operate on `PipelineData::Empty`, rather than `Value::Nothing`. Then, a command like `open` could have an empty pipeline input type rather than a `Type::Nothing`, and the parse-time and run-time pipeline input type checks know that `open` will safely ignore an incorrectly typed input. That being said, we have a release coming up and the above solution might take a while to implement, so while unfortunate, bypassing input type-checking for these problematic commands serves as a workaround to avoid breaking changes in the release until a more robust solution is implemented. This PR bypasses input type-checking for the following commands: * `load-env`: can take record of envvars as input or argument * `nu-check`: checks input string or filename argument * `open`: can take filename as input or argument * `polars when`: can be used with input, or can be chained with another `polars when` * `stor insert`: data record can be passed as input or argument * `stor update`: data record can be passed as input or argument * `format date`: `--list` ignores input value * `into datetime`: `--list` ignores input value (also added a `Type::Nothing` input which was missing from this command) These commands have a similar input/output signature to the above commands, but are working as intended: * `cd`: The input/output signature was actually incorrect, `cd` always ignores its input. I fixed this in this PR. * `generate` * `get` * `history import` * `interleave` * `into bool` * `length` # User-Facing Changes <!-- List of all changes that impact the user experience here. This helps us keep track of breaking changes. --> As a temporary workaround, pipeline input type-checking for the following commands has been bypassed to avoid undesirable run-time input type checking errors which were previously not caught at parse-time: * `open` * `load-env` * `format date` * `into datetime` * `nu-check` * `stor insert` * `stor update` * `polars when` # Tests + Formatting <!-- Don't forget to add tests that cover your changes. Make sure you've run and fixed any issues with these commands: - `cargo fmt --all -- --check` to check standard code formatting (`cargo fmt --all` applies these changes) - `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used` to check that you're using the standard code style - `cargo test --workspace` to check that all tests pass (on Windows make sure to [enable developer mode](https://learn.microsoft.com/en-us/windows/apps/get-started/developer-mode-features-and-debugging)) - `cargo run -- -c "use toolkit.nu; toolkit test stdlib"` to run the tests for the standard library > **Note** > from `nushell` you can also use the `toolkit` as follows > ```bash > use toolkit.nu # or use an `env_change` hook to activate it automatically > toolkit check pr > ``` --> CI became green in the time it took me to type the description 😄 # After Submitting <!-- If your PR had any user-facing changes, update [the documentation](https://github.com/nushell/nushell.github.io) after the PR is merged, if necessary. This will help us keep the docs up to date. --> N/A
Description
This PR adds type checking of all command input types at run-time. Generally, these errors should be caught by the parser, but sometimes we can't know the type of a value at parse-time. The simplest example is using the
echocommand, which has an output type ofany, so prefixing a literal withechowill bypass parse-time type checking.Before this PR, each command has to individually check its input types. This can result in scenarios where the input/output types don't match the actual command behavior. This can cause valid usage with an non-
anytype to become a parse-time error if a command is missing that type in its pipeline input/output (drop nthandhistory importdo this before this PR). Alternatively, a command may not list a type in its input/output types, but doesn't actually reject that type in its code, which can have unintended side effects (getdoes this on an empty pipeline input, andsortused to before #13154).After this PR, the type of the pipeline input is checked to ensure it matches one of the input types listed in the proceeding command's input/output types. While each of the issues in the "before this PR" section could be addressed with each command individually, this PR solves this issue for all commands.
This will likely cause some breakage, as some commands have incorrect input/output types, and should be adjusted. Also, some scripts may have erroneous usage of commands. In writing this PR, I discovered that
toolkit.nuwas passingnullvalues tostr join, which doesn't accept nothing types (if folks think it should, we can adjust it in this PR or in a different PR). I found some issues in the standard library and its tests. I also found that carapace's vendor script had an incorrect chaining ofget -i:Before this PR, if the
get -i 0ever actually did evaluate tonull, the secondgetinvocation would error sincegetdoesn't operate onnullvalues. After this PR, this is immediately a run-time error, alerting the user to the problematic code. As a side note, we'll need to PR this fix (get -i 0 | get -i expansion->get -i 0.expansion) to carapace.A notable exception to the type checking is commands with input type of
nothing -> <type>. In this case, any input type is allowed. This allows piping values into the command without an error being thrown. For example,123 | echo $inwould be an error without this exception. Additionally, custom types bypass type checking (I believe this also happens during parsing, but not certain)I added a
is_subtypemethod toValueandPipelineData. It functions slightly differently thanget_type().is_subtype(), as noted in the doccomments. Notably, it respects structural typing of lists and tables. For example, the type of a value[{a: 123} {a: 456, b: 789}]is a subtype oftable<a: int>, whereas the type returned byValue::get_typeis alist<any>. Similarly,PipelineDatahas some special handling forListStreams andByteStreams. The latter was needed for this PR to work properly with external commands.Here's some examples.
Before:
After this PR, I've adjusted
drop nth's input/output types to accept range input.Before this PR, zip accepted any value despite not being listed in its input/output types. This caused different behavior depending on if you triggered a parse error or not:
After this PR, it works the same in both cases. For cases like this, if we do decide we want
zipor other commands to accept any input value, then we should explicitly add that to the input types.User-Facing Changes
Breaking change: The type of a command's input is now checked against the input/output types of that command at run-time. While these errors should mostly be caught at parse-time, in cases where they can't be detected at parse-time they will be caught at run-time instead. This applies to both internal commands and custom commands.
Example function and corresponding parse-time error (same before and after PR):
Before:
After:
Known affected internal commands which erroneously accepted any type:
str joinzipreduceTests + Formatting
toolkit fmttoolkit clippytoolkit testtoolkit test stdlibAfter Submitting