Skip to content

Improve CellPath display output#14197

Merged
fdncred merged 2 commits intonushell:mainfrom
aionescu:cell-path-display
Nov 2, 2024
Merged

Improve CellPath display output#14197
fdncred merged 2 commits intonushell:mainfrom
aionescu:cell-path-display

Conversation

@aionescu
Copy link
Copy Markdown
Contributor

@aionescu aionescu commented Oct 29, 2024

Description

Fixes: #13362

This PR fixes the Display impl for CellPath, as laid out in #13362 and #14090:

> $.0."0"
$.0."0"

> $."foo.bar".baz
$."foo.bar".baz

User-Facing Changes

Cell-paths are now printed using the same $. notation that is used to create them, and ambiguous column names are properly quoted.

Tests + Formatting

After Submitting

@aionescu
Copy link
Copy Markdown
Contributor Author

aionescu commented Oct 29, 2024

Started this as a draft because there's some details to iron out (CC @fdncred @sholderbach):

  • This patch breaks the output of select, even in the most basic cases like select x. The output column is now named $.x (as a string).
  • This patch also breaks flatten for some reason (hence the failing tests): It now just returns the input table unmodified. I haven't properly investigated yet, but at first glance flatten pops a Vec<CellPath> of the call-stack, which looks like the culprit.
    • flatten's signature says it accepts string rest parameters, not cell-paths. Also, calling it with more than one column name results in an error, so maybe its signature should be changed to a single optional string parameter?

@sholderbach
Copy link
Copy Markdown
Member

  • This patch breaks the output of select, even in the most basic cases like select x. The output column is now named $.x (as a string).

I think we should have a separate function generating the select column name. So without the $. or ? optional annotation as that is indicating something about the gathering operation and not really about the location specified by the path.

  • This patch also breaks flatten for some reason (hence the failing tests): It now just returns the input table unmodified. I haven't properly investigated yet, but at first glance flatten pops a Vec<CellPath> of the call-stack, which looks like the culprit.

Here's the line where it naively tries to turn a CellPath into a string assuming that would yield a column name

let column_requested = columns.iter().find(|c| c.to_string() == column);

  • flatten's signature says it accepts string rest parameters, not cell-paths. Also, calling it with more than one column name results in an error, so maybe its signature should be changed to a single optional string parameter?

Yeah but we have a silent conversion to CellPath available for CallExt to use. (and if taken from a value that is not parse time checked it wouldn't get caught by the SyntaxShape matching anyway)

impl FromValue for CellPath {
fn from_value(v: Value) -> Result<Self, ShellError> {
let span = v.span();
match v {
Value::CellPath { val, .. } => Ok(val),
Value::String { val, .. } => Ok(CellPath {
members: vec![PathMember::String {
val,
span,
optional: false,
}],
}),
Value::Int { val, .. } => {
if val.is_negative() {

I am suspicious of those coercions #14147

@aionescu
Copy link
Copy Markdown
Contributor Author

aionescu commented Oct 29, 2024

I think we should have a separate function generating the select column name. So without the $. or ? optional annotation as that is indicating something about the gathering operation and not really about the location specified by the path.

Good point. I added a new method CellPath::to_column_name which has the same behavior as CellPath's old Display (no $., no ?, no quoting), and replaced all calls to CellPath::to_string in select with to_column_name.

let column_requested = columns.iter().find(|c| c.to_string() == column);

Yeah, that looks naive indeed :) So basically flatten takes a string, coerces it to a CellPath, and then converts it back to a string?

For now, I just replaced that to_string with to_column_name to make the tests pass, but it feels like a hack.

Also, a side-effect of this PR is that more strings are quoted by to nuon when inside structures, because the same needs_quotes is used everywhere:

> [...] | to nuon # old
[...]

> [...] | to nuon # new
["..."]

> "..." | to nuon # unchanged, outside structures it was quoted anyway
"..."

I personally like this (as it's more consistent between inside & outside structures), but I'm not sure if that's the intended output of to nuon (should it only quote when absolutely necessary?)

@aionescu aionescu marked this pull request as ready for review October 29, 2024 22:56
Copy link
Copy Markdown
Member

@sholderbach sholderbach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for implementing that!

Not sure how the internals of flatten actually behave without digging into it.

Some folks want(ed) to make NUON way more parsimonous with characters than it is sane to read ^^

I personally would be fine with us disallowing the . for the existence of ranges and the spread operator alone.

s += &val.to_string();
}
PathMember::String { val, .. } => {
s += val;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we still need quotes around number-like columns otherwise they will become indistinguishable from rows.

Copy link
Copy Markdown
Contributor Author

@aionescu aionescu Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but is that a problem in this situation? Since all the column names get crammed into a string anyway, I think it's easier for users to not quote indexes, e.g. get 'a."0".x' vs get 'a.0.x'

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0- PathMember::Int -> row
"0" - PathMember::String -> column

It would be impossible to determine the meaning of something like

let confusing = [{"0": bla, "1": blub}, {"0": foo, "1": bar}]
let cellp = $.0."1"

print $cellp
$confusing | select $cellp

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example doesn't work (for unrelated reasons: select with a column after a row fails with Select only allows row numbers for rows), but I see your point.

We could fix this in a separate PR, but it would be a breaking change (similar to adding ?, cf. #14233), so maybe @fdncred should chime in.

Copy link
Copy Markdown
Contributor

@fdncred fdncred Nov 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not following? Rows should be ints and Columns should be strings. Are we talking about changing that? If so, in what way?

In the example above $cellp should return blub, correct? Not sure why it doesn't.

Seems like $confusing | get $cellp works but $confusing | select $cellp doesn't. That should be fixed IMO.

Copy link
Copy Markdown
Contributor Author

@aionescu aionescu Nov 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like $confusing | get $cellp works but $confusing | select $cellp doesn't.

That's because select doesn't let you mix rows and columns in a single cell-path (see this check).

And I think this actually makes the point raised by @sholderbach not a problem, because the column names produced by select will never contain row indices. e.g. if select produces a column a.0.b, that 0 came from a pre-existing column name, so there's no point quoting it?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just my opinion, but I like the way get works, as far as following cell paths, better than how select works right now. I like the flexibility of being able to do get 0.a or get a.0 and getting the same result.


for (column_index, (column, value)) in val.into_owned().into_iter().enumerate() {
let column_requested = columns.iter().find(|c| c.to_string() == column);
let column_requested = columns.iter().find(|c| c.to_column_name() == column);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is flatten accepting a nested cell path semantically or only a single column level?

Copy link
Copy Markdown
Contributor Author

@aionescu aionescu Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It only accepts a single (string) column name, and trying to pass it a cell-path results in an error. (Sorry, I was thinking of select).

Yes, flatten accepts a nested cell-path semantically, but as a string (e.g. "a.b" not $.a.b).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK, otherwise we could have moved away from using CellPath internally to something more explicit as just a string.

run_test(
r#"let a = 4; [... $a ... [1] ... (5) ...bare ...] | to nuon"#,
"[..., 4, ..., [1], ..., 5, ...bare, ...]",
r#"["...", 4, "...", [1], "...", 5, "...bare", "..."]"#,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That run_test kind of escapes the meaning of that old test and is more one for the nuon sanity checks right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, yes.

@fdncred
Copy link
Copy Markdown
Contributor

fdncred commented Nov 2, 2024

It would be nice to land this soon, if it's ready, since we have a breaking change in #14042 with a revert to unbreak it in #14233.

@aionescu
Copy link
Copy Markdown
Contributor Author

aionescu commented Nov 2, 2024

IMO it's ready, but we should wait for @sholderbach's opinion.

@sholderbach
Copy link
Copy Markdown
Member

We can land this and rework the last bug source #14197 (comment) later if we don't want to entertain the breakage during development.

@fdncred
Copy link
Copy Markdown
Contributor

fdncred commented Nov 2, 2024

ok, let's do that. i'll land this and close my revert. thanks!

@fdncred fdncred merged commit 1c3ff17 into nushell:main Nov 2, 2024
@github-actions github-actions bot added this to the v0.100.0 milestone Nov 2, 2024
@aionescu aionescu deleted the cell-path-display branch November 2, 2024 15:37
@amtoine amtoine added the notes:breaking-changes This PR implies a change affecting users and has to be noted in the release notes label Nov 20, 2024
@amtoine
Copy link
Copy Markdown
Member

amtoine commented Nov 20, 2024

marking this as a "breaking change" as it broke the CI of Nupm

@aionescu
Copy link
Copy Markdown
Contributor Author

@amtoine
How difficult is the fix?
I'd rather not revert this (as my scripts now rely on the new behavior)

@amtoine
Copy link
Copy Markdown
Member

amtoine commented Nov 20, 2024

i have been distracted by some other stuff tonight but a fix for Nupm is in preparation, hopefully it's not too bad, should just be a matter of getting the new hashes 😉

and it was not to suggest reverting this PR, no worries 😋

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

notes:breaking-changes This PR implies a change affecting users and has to be noted in the release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cell path display output doesn't escape/quote properly

4 participants