Skip to content

Optional chaining for cell paths (?.)#7540

Closed
rgwood wants to merge 30 commits intonushell:mainfrom
rgwood:optional-chaining
Closed

Optional chaining for cell paths (?.)#7540
rgwood wants to merge 30 commits intonushell:mainfrom
rgwood:optional-chaining

Conversation

@rgwood
Copy link
Copy Markdown
Contributor

@rgwood rgwood commented Dec 20, 2022

This PR adds the ability to use ?. in cell paths, somewhat like optional chaining in JS and C#'s null-conditional operator.

This provides a succinct way for users to specify whether a failed cell path access should return a ShellError or Value::Nothing. It is not aiming for exact compatibility with JS and C#.

Examples

# fails because Value::Nothing has no data to access
$nothing.foo
# returns Value::Nothing because `?` was used
$nothing?.foo

# fails because `asdf` is not a column on the record
{foo: 'bar'}.asdf
# returns Value::Nothing because `?` was used
{foo: 'bar'}?.asdf

# fails because `asdf` is not a column on the nested record
{foo: {bar: 'baz'} }.foo.asdf
# returns Value::Nothing
{foo: {bar: 'baz'} }.foo?.asdf

# fails because there is no 10th row in the list
[{foo: 'bar'}].10
# returns Value::Nothing
[{foo: 'bar'}]?.10

# fails because there is no `foo` column in the 2nd row
# BREAKING CHANGE - PREVIOUSLY THIS WOULD SUCCEED AS LONG AS 1+ ROW HAD A `foo` COLUMN
[{foo: 'bar'} {}].foo

# returns a `list<any>` with 2 values: "bar" and $nothing
[{foo: 'bar'} {}]?.foo

How does this relate to get and select?

As always, any cell path access can be rewritten to use get (ex: $foo.a -> $foo | get a). The arguments given to get and select (ex: foo and bar in select foo bar) are actually cell paths.

Before this PR, cell paths used by get and select were not allowed to start with a .; parsing get .a would fail. After this PR, a . before the first member is (optionally) allowed. All of the following are now valid:

get a
get .a
get ?a
get ?.a

I think this makes sense; the user's intent is unambiguous in each case and it provides consistency whether using get or accessing a cell path without get.

As part of this PR, the -i/--ignore-errors flag has been removed from get and select. Using ? in cell paths now offers more fine-grained control over error handling:

# this fails because `b` is missing on the 2nd row (not new behaviour)
[{a: 1, b: 2} {a: 3}] | select b

# previously you had to use something like `default` to convert missing values to null
[{a: 1, b: 2} {a: 3}] | default null b | select b

# the old `-i` flag would return a single null value instead of a table with some null cells - usually not what you want!
[{a: 1, b: 2} {a: 3}] | select -i b

# now you can use `?`
[{a: 1, b: 2} {a: 3}] | select ?b

Future Work

If this change lands, I should be able to follow it up with another PR that makes cell path access stream properly instead of collecting ListStreams.

@rgwood rgwood added status:wait-until-after-nushell-release notes:breaking-changes This PR implies a change affecting users and has to be noted in the release notes labels Dec 20, 2022
@rgwood
Copy link
Copy Markdown
Contributor Author

rgwood commented Dec 20, 2022

cc: @webbedspace

},
Example {
description: "Get a column from a table, and return null for any rows where that column does not exist",
example: "[{A: A0} {}] | get ?.A",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding an example for get .A as well, with the text indicating that the dot is strictly optional. Such an example would provide needed clarity as to what the ?. syntax "means". (Currently, it sorta looks like the ? is another cell name, when actually it's an extension of the separator between names.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First attempt:
image

I feel like that could still be a little more clear. It's late here, will sleep on it.

@webbedspace
Copy link
Copy Markdown
Contributor

Could you add tests that this works with into cellpath? (i.e. Having ?. in a "cellpath string" works correctly.)

@rgwood
Copy link
Copy Markdown
Contributor Author

rgwood commented Dec 20, 2022

@webbedspace into cellpath has been deleted for now: #7523

@webbedspace
Copy link
Copy Markdown
Contributor

Ah, very well then.

@rgwood
Copy link
Copy Markdown
Contributor Author

rgwood commented Dec 22, 2022

I've confirmed that this works as expected with select, and removed the -i flag from select. I believe this would solve a real usability problem for select: #7558

@rgwood
Copy link
Copy Markdown
Contributor Author

rgwood commented Dec 22, 2022

This proposal confused a few people on Discord. It's possible I didn't explain it well there, but it's also possible there might be better syntax out there for this kind of functionality.

I feel that this PR is ready to go from a code quality perspective, but I'd like to hear a few more people weigh in on whether they think the syntax is OK (with alternative examples if you don't like the current syntax).

@webbedspace
Copy link
Copy Markdown
Contributor

webbedspace commented Dec 24, 2022

Actually, now that you mention it, I just noticed this does have some peculiarities I don't know if I like. (Sorry for not scrutinising that closely earlier).

It is not aiming for exact compatibility with JS and C#.

So, like, in JS and C# (and Swift), ?. means "return null if the left side is null". (JS uses undefined, but w/e)
Here, it actually means "return null if the right side is not in the left side", which I just realised is a little problematic for two reasons.
One, it requires the reader to read in both directions, versus the JS/C# left-to-right reading approach.
Two, because "the right side" isn't really visually unambiguous.
Consider $a?.b.c.d. Is the "right side" b.c.d or just b? If b is {c:{}}, then will $a?.b.c work? I assume that in this case the right side is just b, but there is this ambiguity there.

How about this alternative:

# fails because Value::Nothing has no data to access
$nothing.foo
# returns Value::Nothing because `?` was used
$nothing?.foo

# fails because `asdf` is not a column on the record
{foo: 'bar'}.asdf
# returns Value::Nothing because `?` was used
{foo: 'bar'}.asdf?

# fails because `asdf` is not a column on the nested record
{foo: {bar: 'baz'} }.foo.asdf
# returns Value::Nothing
{foo: {bar: 'baz'} }.foo.asdf?
# returns 'baz'
{foo: {bar: 'baz'} }.foo?.bar
{foo: {bar: 'baz'} }.foo?.bar?
{foo: {bar: 'baz'} }?.foo?.bar

# fails
{foo: 1} | get .bar
{foo: 1} | get ?.bar
null | get .bar
null | get .bar?
# returns Value::Nothing
{foo: 1} | get ?.bar
{foo: 1} | get .bar?
{foo: 1} | get ?.bar?
null | get ?.bar
null | get ?.bar?

Here, I've split ? away from the ?.. It's almost the same, but the new "terminal ?" makes it fully unambiguous which access is optional.

@rgwood
Copy link
Copy Markdown
Contributor Author

rgwood commented Dec 24, 2022

OK, yeah, people on Discord had similar issues with the placing of the ? - I didn't quite understand them at the time but if this many people have expressed concerns there's something wrong with the current approach...

Consider $a?.b.c.d. Is the "right side" b.c.d or just b? If b is {c:{}}, then will $a?.b.c work? I assume that in this case the right side is just b, but there is this ambiguity there.

Correct, the ? only applies to b.

I'm not sure I understand all of your examples. You mention a "terminal ?" but it seems like you're putting it on either side of the path member. For example, for these examples that return Value::Nothing why is the ? sometimes on the left and sometimes on the right?

# returns Value::Nothing because `?` was used
$nothing?.foo
{foo: 'bar'}.asdf?
{foo: 1} | get ?.bar
{foo: 1} | get .bar?

@rgwood
Copy link
Copy Markdown
Contributor Author

rgwood commented Dec 24, 2022

My stab at the "terminal ?" approach looks like this:

# fails because Value::Nothing has no data to access
$nothing.foo
# returns Value::Nothing because `?` makes the `foo` path member optional
$nothing.foo?

# fails because `asdf` is not a column on the record
{foo: 'bar'}.asdf
# returns Value::Nothing because `?` makes the `asdf` path member optional
{foo: 'bar'}.asdf?

# fails
{foo: {bar: 'baz'} }.foo.asdf
{foo: {bar: 'baz'} }.asdf.asdf
# returns Value::Nothing
{foo: {bar: 'baz'} }.foo.asdf?
{foo: {bar: 'baz'} }.asdf?.asdf?

# fails
null | get bar
{foo: 1} | get bar
# returns Value::Nothing
null | get bar?
{foo: 1} | get bar?

@webbedspace
Copy link
Copy Markdown
Contributor

webbedspace commented Dec 25, 2022

I'll explain further.

Remember that my ? means "return null if the left side is null or nonexistent".

# "exists" here means "is not null or a hole"

# this means "if $nothing exists then $nothing.foo else null"
$nothing?.foo

# this means "if {foo: 'bar'}.asdf exists then {foo: 'bar'}.asdf else null"
{foo: 'bar'}.asdf? 

 # this means "if {foo: {bar: 'baz'} }.foo.asdf exists then {foo: {bar: 'baz'} }.foo.asdf else null"
{foo: {bar: 'baz'} }.foo.asdf?
# this means "if {foo: {bar: 'baz'} }.foo exists then {foo: {bar: 'baz'} }.foo.bar else null"
{foo: {bar: 'baz'} }.foo?.bar
# this means "if {foo: {bar: 'baz'} }.foo exists and {foo: {bar: 'baz'} }.foo.bar exists then {foo: {bar: 'baz'} }.foo.bar else null"
{foo: {bar: 'baz'} }.foo?.bar?
# this means "if {foo: {bar: 'baz'} } exists and {foo: {bar: 'baz'} }.foo exists and {foo: {bar: 'baz'} }.foo.bar exists then {foo: {bar: 'baz'} }.foo.bar else null"
{foo: {bar: 'baz'} }?.foo?.bar

@rgwood
Copy link
Copy Markdown
Contributor Author

rgwood commented Dec 25, 2022

Thanks; I don't find that intuitive right now but maybe it'll grow on me.

I think you accidentally listed {foo: 1} | get ?.bar twice in your first example - could you confirm how that behaves (I'm guessing it fails)?

@webbedspace
Copy link
Copy Markdown
Contributor

webbedspace commented Dec 25, 2022

{foo: 1} | get ?.bar means "if the input (also called $in) exists then $in.bar else null" ($in.bar then fails)

If it helps, recall that this syntax evolved from the C ternary operator: $a?.foo in JS is syntactic sugar for "$a ? $a.foo : undefined"

@rgwood
Copy link
Copy Markdown
Contributor Author

rgwood commented Dec 25, 2022

Gotcha. Might want to update your examples:

...
# returns Value::Nothing
{foo: 1} | get ?.bar        <----- think this line is wrong, it fails instead of returning Value::Nothing
{foo: 1} | get .bar?
...

@webbedspace
Copy link
Copy Markdown
Contributor

Ah yes, good point, sorry.

@rgwood
Copy link
Copy Markdown
Contributor Author

rgwood commented Dec 25, 2022

Maybe it would help to recap the initial motivation for this: using get and select on lists where records don't all have the same columns.

Current Behaviour

Command Result
[{a: 1, b: 'foo'} {a: 2}] | select b failure
[{a: 1, b: 'foo'} {a: 2}] | get b ['foo', null]
[{a: 1, b: 'foo'} {a: 2}] | get c failure
[{a: 1, b: 'foo'} {a: 2}] | select -i b null
[{a: 1, b: 'foo'} {a: 2}] | get -i c null

Note that:

  1. get and select are inconsistent in how they handle missing data
  2. get only succeeds if it can get a value from at least 1 row; there is no way for a user to say "replace missing data with null, even if all rows are missing data"
  3. -i replaces the entire result with null - not just the missing data1.

Desired Behaviour

It would be a lot simpler if there were only 2 ways to handle a failed cell path access: fail, or replace missing data with null. The default should probably be failure, and users should have an easy way to switch to the more lenient behaviour.

Next Steps

I'm still not sure what the syntax should be. I think I am trying to solve a problem that is slightly different than the one JS solves with ?., and that has been a major point of confusion for everyone. It's currently pretty late on Christmas Eve, I'll come back to this another time.

Footnotes

  1. I think we need to get rid of -i. The biggest reason: -i's behaviour is not compatible with streaming data. You can't change your mind halfway through a ListStream and say "actually I want to return a single null instead of multiple values."

@webbedspace
Copy link
Copy Markdown
Contributor

webbedspace commented Dec 25, 2022

I think I am trying to solve a problem that is slightly different than the one JS solves with ?.

The problems are very similar, however. I don't think the difference is too major.

In JS, 1) access of a missing property produces undefined and 2) access of a property on undefined is an error; ?. is used to avoid this by having a multi-level path short-circuit (so that the erroneous access of a property of a nonexistant property isn't performed).

Nushell is more strict than JS: 1) access of a missing property is an error, and 2) access of a property on null is an error. ?. can be used to make a multi-level path turn to null + short-circuit, and the terminal ? (needed for point 1) can make a single-level path turn to null.

@rgwood
Copy link
Copy Markdown
Contributor Author

rgwood commented Dec 26, 2022

I think this is the main thing that bothers me about your proposal:

# returns null
$nothing?.foo
# also returns null
{foo: 'bar'}.asdf?

The position of the ? moves around even though in both cases I want the same behaviour (try to get a field, return null if the access fails).

@rgwood
Copy link
Copy Markdown
Contributor Author

rgwood commented Dec 27, 2022

Well, I mean, if you're doing get foo on a list in general, then you should have assurance that it's a proper table, right?

Yeah, that's usually a safe assumption to make - something's probably gone wrong if you have mixed records and null values in the same list.

I'm willing to give your proposal a try, thanks for walking me through it. I'm still not sure I have an intuitive understanding of it, but perhaps attempting to implement it will clear things up.

@webbedspace
Copy link
Copy Markdown
Contributor

webbedspace commented Dec 27, 2022

One other thing I forgot to mention all this time: Nushell already has a notion of a suffixing ? indicating "optional": def a [foo?:int] {} produces an optional argument using ? to denote it. So, this symbology already exists in the language, for whatever that's worth.

@rgwood rgwood marked this pull request as draft December 27, 2022 06:20
@rgwood
Copy link
Copy Markdown
Contributor Author

rgwood commented Dec 27, 2022

I'm going to work on other things for a while; my brief attempt to implement Leon’s approach didn’t go well and I’d like to shift gears for a bit. If anyone else wants to try implementing something like this, feel free.

@rgwood
Copy link
Copy Markdown
Contributor Author

rgwood commented Dec 31, 2022

Closing this for now. I believe we still need something like this but I wasn't able to find a syntax that makes everyone happy.

Leon's suggestion might be the way to go but I wasn't able to wrap my head around it.

@rgwood rgwood closed this Dec 31, 2022
@fdncred
Copy link
Copy Markdown
Contributor

fdncred commented Jan 1, 2023

If it were up to me, I'd land this PR even though I don't think the operators are "perfect". I'd also want to keep the get -i and select -i functionality until we are happy with how this works.

If we're not deprecating anything with this PR and we're just adding new optional syntax, I'm not sure why we wouldn't land it and just keep iterating on it. I haven't looked at the details for a while so I may have forgotten something.

@rgwood
Copy link
Copy Markdown
Contributor Author

rgwood commented Jan 1, 2023

I think enough people found the initial syntax unintuitive that there's definitely something wrong with it. I'd rather not spend time polishing and documenting this before we find a syntax people are happy with.

If anyone else wants to run with this work, feel free; I've lost the motivation to work on it for now. My initial motivation for working on this was to make it easier for get and select to stream their data, but it's turned into a too-large-for-me yak shave.

@webbedspace
Copy link
Copy Markdown
Contributor

webbedspace commented Jan 1, 2023

I want to take this branch and finish it myself… although I've been thinking a bit more about the semantics (as you might expect).

Something I've been pondering is what ?. in other languages (C sharp, TypeScript) was actually designed to do. In those languages, this syntax only works on explicit null/undefined. Why? Because it's meant to be used with well-defined structure shapes – things where a property name is explicitly marked ? in the interface or w/e. They importantly do not work on holes (missing elements) because holes potentially denote a much more serious problem (that entirely the wrong class or object is being used here).

So, I've been wondering if using this syntax for holes at all is a good idea, given that it goes against this operator's "original" design intent. Maybe a second variant is needed, with a different symbol like &., that only works on holes. But the fact that other languages generally don't have this (JS being an exception due to its hole-access semantics) gives me pause for whether this solves the problem correctly.

I do also think, in this light, that the -i flag is way too coarse-grained a solution for what it does, and really should be replaced with something more specific.

@fdncred
Copy link
Copy Markdown
Contributor

fdncred commented Jan 1, 2023

For me, it's always been about the postfix operator meaning the item before it is optional. So, I think syntax should be closer to this.
Both of these have the same result.

[{foo: 'bar'} {}] | get foo?
[{foo: 'bar'} {}].foo?
╭───┬─────╮
│ 0 │ bar │
│ 1 │     │
╰───┴─────╯

Both of these have the same result.

[{foo: 'bar'} {}] | get foo
[{foo: 'bar'} {}].foo
Error: nu::shell::column_not_found (link)

  × Cannot find column
   ╭─[entry #19:1:1]
 1 │ [{foo: 'bar'} {}] | get foo
   · ────────┬────────   ─┬─
   ·         │            ╰── value originates here
   ·         ╰── cannot find column 'Empty cell'

And if you want to keep drilling down into nested structures it would be like:
$var | get foo?.bar?.baz? <-- foo, bar, and baz are optional
$var | get foo?.bar.baz? <-- foo and baz are optional but bar is required
$var | get foo.bar.baz? <-- foo and bar are required but baz is optional
the same with
$var.foo?.bar?.baz?
$var.foo?.bar.baz?
$var.foo.bar.baz?

@webbedspace
Copy link
Copy Markdown
Contributor

webbedspace commented Jan 1, 2023

So, like, do you really think it's OK to have this one cell path modifier glyph produce null on both of these:

  • Holes in structures (missing record key {a:b}.c?, list access-beyond-end [].0?, etc.)
  • The container doesn't support that cell path format ({a:b}.0?, $nothing.foo?, (2).0?)

To me, these are different enough to merit consideration.

@rgwood
Copy link
Copy Markdown
Contributor Author

rgwood commented Jan 1, 2023

do you really think it's OK to have this one cell path modifier glyph produce null on both of these

I go back and forth on this. I like the simplicity of having 1 glyph in a known position suppress any error. But I can appreciate that those are meaningfully different types of errors, if we can find an ergonomic way to handle them separately let's do it.

Either way, I think we probably need short-circuiting behaviour (where we stop evaluating a cell path after a failure to access one optional cell path). I initially implemented this without short-circuiting and that was bad in retrospect; it's not very useful if $var | get foo?.bar.baz? succeeds on an optional access to foo? only to fail on an access to bar shortly thereafter.

@rgwood
Copy link
Copy Markdown
Contributor Author

rgwood commented Jan 1, 2023

Good comment by @kubouch on Discord:

Since we have try/catch now, I would lean towards ? handling only "holes" in structures. {a:b}.0? and other examples listed in the second case should throw error IMO, which can be caught in try/catch.

Yes, try/catch is more verbose than -i but at least it's obvious what is happening. (I was looking at some Fish scripts the other day and while it's "simple", the amount of flags on everything makes it impossible to figure out what is going on.)

So maybe limiting the ? to only "holes" might be a good starting point. It's also more focused so possibly less unintended side effects.

I think I agree with that.

@rgwood
Copy link
Copy Markdown
Contributor Author

rgwood commented Jan 2, 2023

So, I think we want to try the following approach:

  1. Postfix syntax: ? goes after the path member e.g. .foo? instead of ?.foo
  2. ? only suppresses errors from holes, incorrect types still generate an error

@daniel-vainsencher
Copy link
Copy Markdown

That sounds good to me.

Cases other than optionality might be better addressed via a pattern-matching syntax, anything in that vein for nushell?

@rgwood
Copy link
Copy Markdown
Contributor Author

rgwood commented Mar 10, 2023

New PR is up for postfix ?: #8379

rgwood added a commit that referenced this pull request Mar 16, 2023
This is a follow up from #7540.
Please provide feedback if you have the time!

## Summary

This PR lets you use `?` to indicate that a member in a cell path is
optional and Nushell should return `null` if that member cannot be
accessed.

Unlike the previous PR, `?` is now a _postfix_ modifier for cell path
members. A cell path of `.foo?.bar` means that `foo` is optional and
`bar` is not.

`?` does _not_ suppress all errors; it is intended to help in situations
where data has "holes", i.e. the data types are correct but something is
missing. Type mismatches (like trying to do a string path access on a
date) will still fail.

### Record Examples

```bash

{ foo: 123 }.foo # returns 123

{ foo: 123 }.bar # errors
{ foo: 123 }.bar? # returns null

{ foo: 123 } | get bar # errors
{ foo: 123 } | get bar? # returns null

{ foo: 123 }.bar.baz # errors
{ foo: 123 }.bar?.baz # errors because `baz` is not present on the result from `bar?`
{ foo: 123 }.bar.baz? # errors
{ foo: 123 }.bar?.baz? # returns null
```

### List Examples
```
〉[{foo: 1} {foo: 2} {}].foo
Error: nu::shell::column_not_found

  × Cannot find column
   ╭─[entry #30:1:1]
 1 │ [{foo: 1} {foo: 2} {}].foo
   ·                    ─┬  ─┬─
   ·                     │   ╰── cannot find column 'foo'
   ·                     ╰── value originates here
   ╰────
〉[{foo: 1} {foo: 2} {}].foo?
╭───┬───╮
│ 0 │ 1 │
│ 1 │ 2 │
│ 2 │   │
╰───┴───╯
〉[{foo: 1} {foo: 2} {}].foo?.2 | describe
nothing

〉[a b c].4? | describe
nothing

〉[{foo: 1} {foo: 2} {}] | where foo? == 1
╭───┬─────╮
│ # │ foo │
├───┼─────┤
│ 0 │   1 │
╰───┴─────╯
```

# Breaking changes

1. Column names with `?` in them now need to be quoted.
2. The `-i`/`--ignore-errors` flag has been removed from `get` and
`select`
1. After this PR, most `get` error handling can be done with `?` and/or
`try`/`catch`.
4. Cell path accesses like this no longer work without a `?`:
```bash
〉[{a:1 b:2} {a:3}].b.0
2
```
We had some clever code that was able to recognize that since we only
want row `0`, it's OK if other rows are missing column `b`. I removed
that because it's tricky to maintain, and now that query needs to be
written like:


```bash
〉[{a:1 b:2} {a:3}].b?.0
2
```

I think the regression is acceptable for now. I plan to do more work in
the future to enable streaming of cell path accesses, and when that
happens I'll be able to make `.b.0` work again.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

notes:breaking-changes This PR implies a change affecting users and has to be noted in the release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants