Skip to content

Path Enhancement Project #2: parse, join and split#3256

Merged
sophiajt merged 27 commits intonushell:mainfrom
kubouch:path-parse
Apr 20, 2021
Merged

Path Enhancement Project #2: parse, join and split#3256
sophiajt merged 27 commits intonushell:mainfrom
kubouch:path-parse

Conversation

@kubouch
Copy link
Copy Markdown
Contributor

@kubouch kubouch commented Apr 3, 2021

First PR about representing paths as structured data. For more broad picture, motivation and examples, see #3219.

List of changes:

  • : Add path parse: Splits a path into a table with parent, stem and extension. On Windows, there is an additional prefix column (see Rust's Prefix). On non-Windows, the prefix is always empty, so it's pointless to have the column.
  • : Remove filestem and extension subcommands as these can be retrieved from the table
  • : Add option to easily extract complex extensions (e.g., .tar.gz) or ignore the extension (e.g., conf.d directory)
  • : Modify path join command to convert the path table back into a full path (i.e., reverse the effect of path parse)
  • : Modify remaining path subcommands to accept rows with parent, stem or extension columns.
  • : Add path split command that splits path by separator

To fix:

  • : Broken column path with named flags (passing column path causes Expected Tagged, found column path; solution: rest arg needs to be always first)
  • : Inconsistent split behavior with and without column path (ls | path split name | get name vs ls | get name | path split)
  • : Windows bugs
  • : path join panic when there is empty stream

We lose the following functionality:

  • prefix/suffix flags of path filestem

Additional TODO list:

  • : craft error messages
  • : usages
  • : examples (especially of the removed features, such as changing extension)
  • : tests
  • : rebase
  • : clippy
  • : fmt

Comments:

  • Misplaced errors (echo home/viking/spam.txt | path parse | update extension 0.9 | path join)
    • should be probably treated more generally
  • Hand-waving / vs \ in Windows tests (Windows seems to accept both \ and / as a separator)

@andrasio
Copy link
Copy Markdown
Member

andrasio commented Apr 3, 2021

@kubouch Left a comment.

@kubouch kubouch changed the title Path Enhancement Project #2: path parse & path encode Path Enhancement Project #2: parse, join and split Apr 4, 2021
@kubouch kubouch force-pushed the path-parse branch 3 times, most recently from a056e2a to f9762bf Compare April 5, 2021 15:15
@kubouch
Copy link
Copy Markdown
Contributor Author

kubouch commented Apr 5, 2021

All the basic commands are now done. I made path join to be implicit, so now all path subcommands understand the output of path parse and path split. It's pretty cool since the path join is now basically a noop :-D (it still has the append feature).

The output of path split is a table which you need to flatten to see the components (for example echo /home/viking/spam.txt | path split outputs [table 4 rows]). It seems weird at first, but I found it to be the most consistent solution. Now, the path subcommands always operate on every value of the input stream. If path split unrolled the table and we wanted to path join them back, path join would be in a conflict: should it collect the table values or apply path join on each value as the rest of the subcommands? There might be a way around this, I just couldn't find it right now.

There is one issue that needs to be solved:

❯ ls | first 2 | path expand name | get name | path split
   0   [table 7 rows]
   1   [table 7 rows]
   
❯ ls | first 2 | path expand name | path split name | get name
    0   /
    1   home
    2   kubouch
    3   git
    4   forks
    5   nushell
    6   CODE_OF_CONDUCT.md
    7   /
    8   home
    9   kubouch
   10   git
   11   forks
   12   nushell
   13   CONTRIBUTING.md

Seems like using column path somehow steps into the resulting table. Or the get recursively steps into into the tables. However, it should ideally output the same result as in the first case.

Still needs some polish: good error messages, example, tests, etc.

@kubouch
Copy link
Copy Markdown
Contributor Author

kubouch commented Apr 10, 2021

I modified the path join and path split commands to behave more like str collect and slpit row:

  • path join will attempt to join the input list instead of requiring a table
  • path split will directly output an output stream of split parts instead of wrapping it into a table

This should match better what a user would expect, IMO. The above applies when not operating by column path. Swapping table values by column path is still supported the same way as before.

Here are some examples:

❯ def ls3 [ ] { ls | first 3 | path expand name }

❯ ls3
  #                          name                          type     size       modified
───────────────────────────────────────────────────────────────────────────────────────────
  0   /home/kubouch/git/forks/nushell/CODE_OF_CONDUCT.md   File     3.4 KB   6 months ago
  1   /home/kubouch/git/forks/nushell/CONTRIBUTING.md      File     1.7 KB   1 week ago
  2   /home/kubouch/git/forks/nushell/Cargo.lock           File   164.7 KB   3 days ago

❯ ls3 | path parse name
  #              name               type     size       modified
────────────────────────────────────────────────────────────────────
  0   [row parent stem extension]   File     3.4 KB   6 months ago
  1   [row parent stem extension]   File     1.7 KB   1 week ago
  2   [row parent stem extension]   File   164.7 KB   3 days ago

❯ ls3 | path parse name | get name
  #               parent                     stem         extension
─────────────────────────────────────────────────────────────────────
  0   /home/kubouch/git/forks/nushell   CODE_OF_CONDUCT   md
  1   /home/kubouch/git/forks/nushell   CONTRIBUTING      md
  2   /home/kubouch/git/forks/nushell   Cargo             lock

❯ ls3 | path split name
  #        name        type     size       modified
───────────────────────────────────────────────────────
  0   [table 7 rows]   File     3.4 KB   6 months ago
  1   [table 7 rows]   File     1.7 KB   1 week ago
  2   [table 7 rows]   File   164.7 KB   3 days ago

❯ ls3 | path split name | get name
   0   /
   1   home
   2   kubouch
   3   git
   4   forks
   5   nushell
   6   CODE_OF_CONDUCT.md
   7   /
   8   home
   9   kubouch
  10   git
  11   forks
  12   nushell
  13   CONTRIBUTING.md
  14   /
  15   home
  16   kubouch
  17   git
  18   forks
  19   nushell
  20   Cargo.lock

❯ ls3 | path split name | each { echo $it.name | path join }
  0   /home/kubouch/git/forks/nushell/CODE_OF_CONDUCT.md
  1   /home/kubouch/git/forks/nushell/CONTRIBUTING.md
  2   /home/kubouch/git/forks/nushell/Cargo.lock

❯ ls3 | get name | path parse | select stem extension | path join
  0   CODE_OF_CONDUCT.md
  1   CONTRIBUTING.md
  2   Cargo.lock

❯ ls3 | get name | path parse | update parent { path split | first 3 | path join } | path join
  0   /home/kubouch/CODE_OF_CONDUCT.md
  1   /home/kubouch/CONTRIBUTING.md
  2   /home/kubouch/Cargo.lock

So far, I like the result. It still needs some refactoring and there might still be some edge cases, but I would propose this functionality as final.

@kubouch kubouch force-pushed the path-parse branch 2 times, most recently from db543ad to 960819d Compare April 11, 2021 14:13
kubouch added 17 commits April 17, 2021 02:06
This includes a slight refactor to all the path subcommand `action()`
functions.
Structured path is implicitly joined at every patch subcommand call.
When the input is just a list of values, the `path split` command will
split each value directly into the output stream, similar to
`split-row`. Column path--specified values are still wrapped into a
table so they can still be used to replace table fields.
When `path join` encounters a list of values, it attempts to join them,
instead of going one-by-one like the rest of the path commands. You can
still `each { echo $it | path join }` to join them one-by-one, if the
values are, e.g., tables.

Now, the behavior of `path split` and `path join` should match the
`split-row` and `str collect` counterparts and should hopefully align
better with user's expectations.
Also, doesn't collect input into vector unnecessarily.
kubouch added 8 commits April 17, 2021 02:07
* OsStr -> String encoding is now lossy, instead of throwing an error
* The consequence is action() now always returns Value instead of Result
* Removed redundant handle_value() call in `path join`
* Fix possible incorrect error detection in `path split`
* Applied rustfmt + clippy
The 'parent' column was required to be a path but didn't work with
string.
Reducing code repetition
@kubouch kubouch marked this pull request as ready for review April 17, 2021 00:25
@sophiajt
Copy link
Copy Markdown
Contributor

Lots of cool stuff in this one!

@sophiajt sophiajt merged commit 3b2ed76 into nushell:main Apr 20, 2021
@kubouch kubouch deleted the path-parse branch April 26, 2021 18:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants