
Path Enhancement Project #2 #3219

@kubouch

In the spirit of my previous PRs (#2742 #3123), I propose more changes to the Nushell path system. Some of these ideas come from the discussion with @jonathandturner and @John-Goff on Discord. At first I considered opening a draft PR, but the work will likely span multiple PRs, so I decided to create this tracking issue, which can be treated as a sort of mini-RFC. Without further ado:

1. Path as structured data (WIP #3256)

Currently, we have a bunch of path subcommands with hard-coded functionality. If we need more complex path manipulations, we have to change the Rust code or use some clumsy workaround. Instead, we could express a path as structured data, which would allow more flexibility.

Proposed changes

path parse: Break a path into structured data (what would be good fields? home? exists?):

> echo ['/home/viking/spam.txt', 'C:\Users\viking\spam.txt'] | path parse

# prefix  parent           stem  extension
------------------------------------------
0         /home/viking     spam  txt
1 C:      C:\Users\viking  spam  txt
  • It could expand all tildes and dots but wouldn't make the path absolute (this should be handled by path expand)
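A rough Python sketch of the record that path parse could produce, using pathlib's pure path classes as a stand-in for the real implementation. The function name parse_path and the windows switch are made up for illustration; the column names follow the table above. Pure paths do no tilde or dot expansion, so that part of the proposal is not covered here.

```python
from pathlib import PurePosixPath, PureWindowsPath


def parse_path(path, windows=False):
    """Split a path string into the four columns of the proposed table."""
    # Pure paths never touch the filesystem, so either flavor works on any OS.
    p = (PureWindowsPath if windows else PurePosixPath)(path)
    return {
        "prefix": p.drive,                  # empty string for POSIX-style paths
        "parent": str(p.parent),
        "stem": p.stem,
        "extension": p.suffix.lstrip("."),  # drop the leading dot
    }


print(parse_path("/home/viking/spam.txt"))
print(parse_path(r"C:\Users\viking\spam.txt", windows=True))
```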

path split: Split a path into parts, without the structuring

> echo '/home/viking/spam.txt' | path split

0 /
1 home
2 viking
3 spam.txt

> echo 'C:\Users\viking\spam.txt' | path split

0 C:
1 Users
2 viking
3 spam.txt
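For comparison, Python's pathlib already exposes a similar split through the parts attribute. The sketch below (split_path is a hypothetical name) shows what it returns for the two inputs above. Note one difference from the proposed output: pathlib keeps the root attached to the drive, so the first Windows component is `C:\` rather than `C:`.

```python
from pathlib import PurePosixPath, PureWindowsPath


def split_path(path, windows=False):
    """Return the path components as a flat list, without any structuring."""
    flavor = PureWindowsPath if windows else PurePosixPath
    return list(flavor(path).parts)


print(split_path("/home/viking/spam.txt"))
print(split_path(r"C:\Users\viking\spam.txt", windows=True))
```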

path join: Change the current command to be more multifunctional. It would:

  • Convert a table (obtained via path parse or constructed manually) back into a path string.
    • It would look for column names (drive, dirname, etc.) and throw an error if it couldn't find any that would match the path table spec
    • Allow incomplete table (e.g. only filestem and extension columns)
    • Joins with OS-specific separator (\ vs. /). Filename + extension joins with .
  • Join a list of paths/strings back into a path (similar to str collect)
  • Still allow current functionality of appending a path/string passed as argument
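The three roles of the new path join could be pictured roughly as follows. This is only a sketch under assumptions: join_record, join_parts and PATH_COLUMNS are hypothetical names, the column names are the dirname/filestem/extension ones used in the later examples, and drive handling is left out entirely.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Columns this sketch recognizes (the drive column is deliberately omitted).
PATH_COLUMNS = ("dirname", "filestem", "extension")


def join_record(record, windows=False):
    """Turn a (possibly incomplete) path record back into a path string."""
    if not any(col in record for col in PATH_COLUMNS):
        raise ValueError("no path columns found in record")
    flavor = PureWindowsPath if windows else PurePosixPath
    basename = record.get("filestem") or ""
    if record.get("extension"):
        basename += "." + record["extension"]  # filestem + extension join with "."
    parts = [part for part in (record.get("dirname"), basename) if part]
    return str(flavor(*parts)) if parts else ""


def join_parts(parts, windows=False):
    """Join a list of path pieces with the separator of the chosen flavor."""
    flavor = PureWindowsPath if windows else PurePosixPath
    return str(flavor(*parts))


print(join_record({"dirname": "/home/viking", "filestem": "spam", "extension": "txt"}))
print(join_record({"filestem": "spam", "extension": "txt"}))   # incomplete record
print(join_parts(["home", "arthur", "britons"] + ["spam.txt"]))  # list plus appended argument
```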

Existing path subcommands:

  • Remove extension
  • Remove filestem, but we lose the suffix/prefix flags functionality.
  • Keep for now: basename, dirname (Why? Explained later.)
  • Keep expand
  • Keep type and exists or move them to the path table

Limitations

I considered having the parts (output of path split) and basename in the path table. This would let us drop path split entirely and remove path basename. However, it is problematic when you need to replace something: you would need to replace it in multiple places! For example, to change the filestem, you'd need to edit the basename and the last entry of parts as well. Therefore, no duplication is allowed in the output of path parse. This could be resolved by having "dependent columns" as introduced in my other issue: #3220
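One way to picture the duplication problem and a derived-value way around it, sketched in Python. PathRecord is a hypothetical type and is not meant to describe what #3220 actually proposes: the record stores only the independent fields, and basename is computed on demand, so editing the filestem can never leave a stale copy behind.

```python
from dataclasses import dataclass


@dataclass
class PathRecord:
    # Only independent fields are stored.
    dirname: str
    filestem: str
    extension: str

    @property
    def basename(self):
        # Derived from the stored fields on every access, never stored itself.
        if self.extension:
            return f"{self.filestem}.{self.extension}"
        return self.filestem


rec = PathRecord("/home/viking", "spam", "txt")
rec.filestem = "eggs"   # a single edit...
print(rec.basename)     # ...and the derived value follows along
```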

Questions

  • Currently, we have both path and string data types in nu. Should we remove the path type once we have the path table and handle path-like strings as regular strings? Or is it possible to set the path table as the path type?
  • How do we handle complex extensions? (e.g., .tar.gz) — this could be a flag
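For the complex-extension question, a flag could switch between "last suffix only" and "all suffixes". A minimal Python sketch, assuming a hypothetical split_extension helper with a complex_ext switch; it relies on pathlib's suffix and suffixes attributes. One caveat: with the switch on, every dot-separated tail counts as extension, so a name like my.notes.txt would end up with the extension notes.txt.

```python
from pathlib import PurePosixPath


def split_extension(name, complex_ext=False):
    """Return (filestem, extension) for a file name."""
    p = PurePosixPath(name)
    if not complex_ext:
        # Default: only the last suffix is the extension.
        return p.stem, p.suffix.lstrip(".")
    # Flag set: treat all suffixes together as one extension.
    ext = "".join(p.suffixes)
    stem = p.name[: len(p.name) - len(ext)] if ext else p.name
    return stem, ext.lstrip(".")


print(split_extension("spam.tar.gz"))
print(split_extension("spam.tar.gz", complex_ext=True))
```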

2. (ON HOLD) Platform independence & related fixes

One of the goals of this revamp is to have platform-independent paths. It should be possible on any OS to create a path for another OS.

Proposed changes

  • There could be a flag to the path command which would force the path to follow the target OS conventions (path separator, drive/root and user home folder):
    • echo ~viking/spam.txt | path --windows → C:\Users\viking\spam.txt
    • echo ~viking/spam.txt | path --unix → /home/viking/spam.txt
    • Without the flag, path would follow the host OS.
  • The previous point raises the question of how to expand the drive and user home folder. It could be controlled by directly modifying the path table or by the following:
Option            Default                                          Cargo.toml     Env. var.  Flag to path command
-----------------------------------------------------------------------------------------------------------------
drive             C: (Windows) or $nothing                         drive = "..."  DRIVE      --drive letter:
home folder       /home (Linux), /Users (Mac), drive:\Users (Win)  home = "..."   HOME       --home
user              derive from current user                         N/A            USER       --user
operating system  derive from host OS                              N/A            OS         --unix or --windows

(Right overwrites left)
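The "right overwrites left" rule amounts to a simple precedence chain. A small Python sketch, where resolve_option is a hypothetical helper and None stands for "not set at this level":

```python
def resolve_option(default, config=None, env=None, flag=None):
    """Pick the effective value: default < config file < env var < command flag."""
    value = default
    # Walk the columns left to right; each one that is set overrides the previous.
    for candidate in (config, env, flag):
        if candidate is not None:
            value = candidate
    return value


print(resolve_option("C:", config="D:", flag="E:"))  # the flag wins
print(resolve_option("/home", env="/Users"))         # the env var beats the default
```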

  • Uniform behavior of special characters on all OSes. The following should work everywhere:
    • ~, ~user
    • ., .., ... expansion
    • Liberal mixing of \ and /
  • Path separator is always only one (back)slash
    • Translate multi-slashes into only one (e.g., \\ , //, /// or \\/\\//////\ into / on Unix or \ on Windows)
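The separator rule can be sketched as a single regular-expression substitution. normalize_separators is a hypothetical name, and the sketch deliberately ignores Windows UNC paths, whose leading double backslash is meaningful and would need special handling.

```python
import re


def normalize_separators(path, windows=False):
    """Collapse any run of mixed slashes/backslashes into one target separator."""
    sep = "\\" if windows else "/"
    # A function is used as the replacement so the backslash is inserted
    # literally instead of being parsed as a regex escape.
    return re.sub(r"[\\/]+", lambda match: sep, path)


print(normalize_separators(r"foo\\/\\//////\bar"))
print(normalize_separators("foo//bar/baz", windows=True))
```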

3. Additional features & fixes

Mostly unrelated to the above, but good to have IMO

  • Add missing features
    • Replace prefix/suffix
    • Construct relative paths (echo /home/viking/foo.txt | path relative-to /home → viking/foo.txt)
    • Command to query a path separator
  • Fix path expand
    • Currently, path expand does not expand a non-existing path, which is confusing (e.g., expanding ../existing-folder works properly while ../non-existing-folder just returns the same string)
    • Also, echo .. | path expand just entered an infinite loop on Windows for me, eating all my RAM...
  • Look into related issues
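Two of the items above have close analogues in Python's standard library, which may help pin down the expected behavior. The sketch below uses hypothetical helper names: relative_to mirrors the relative-path example, and expand_lexically resolves dots purely on the string, so it works the same whether or not the target exists. A purely lexical expansion does not follow symlinks, so it can differ from what the filesystem would report.

```python
import posixpath
from pathlib import PurePosixPath


def relative_to(path, base):
    """Express path relative to base (raises ValueError if it is not below base)."""
    return str(PurePosixPath(path).relative_to(base))


def expand_lexically(path, cwd):
    """Resolve '.' and '..' against cwd without consulting the filesystem."""
    return posixpath.normpath(posixpath.join(cwd, path))


print(relative_to("/home/viking/foo.txt", "/home"))
print(expand_lexically("../non-existing-folder", "/home/viking"))
```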

Future outlook

I think the additional verbosity of replicating basename and dirname with the new commands is a significant drawback; therefore, I propose keeping them in addition to the new commands. Later, when we have a standard library, the dirname and basename subcommands could be removed and implemented in nu on top of the new commands.

Examples, case studies & edge cases

There are a lot of edge cases to cover. I'm just writing down how the new subcommands could be used and trying to break them. It's a bit of rambling from now on.

Replicating the functionality of current subcommands

Getting a basename (path join would need to be smart enough to join file stem and extension with .). I'm not a big fan of the verbosity, but it can be easily hidden inside a custom command.

> echo ['/home/viking/spam.txt', 'C:\Users\viking\spam.txt']
| path parse
| select filestem extension
| path join 

0  spam.txt
1  spam.txt

dirname, filestem and extension are direct outputs of parse.

Let's check some dirname flags. How about path dirname -n 3?

> echo '/home/viking/foo/bar/baz/spam.txt'
| path parse
| get dirname
| path split
| drop 3
| path join

0  /home/viking

Even though it is more verbose, it can be extended to allow more complex dirname manipulations, including replacement (-r flag). For example:

> echo '/home/viking/foo/bar/baz/spam.txt' | path parse | update dirname {
  let parts = $(get dirname | path split)
  echo [ $(echo $parts | first 2) arthur/britons $(echo $parts | last) ] | path join
} | path join

0  /home/arthur/britons/baz/spam.txt

This is currently impossible using plain dirname flags. We could use some better mechanism for replacing rows in nu (or I just didn't see it).
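The two dirname manipulations above boil down to slicing the list of directory components. A Python sketch of the same idea, with hypothetical helper names, which reproduces the outputs shown for both examples:

```python
from pathlib import PurePosixPath


def dirname_drop(path, n):
    """Take the dirname, split it, drop the last n components and join again."""
    parts = PurePosixPath(path).parent.parts
    kept = parts[: max(len(parts) - n, 0)]
    return str(PurePosixPath(*kept)) if kept else ""


def replace_dirname_middle(path, keep_first, replacement, keep_last):
    """Keep the first/last dirname components and swap out everything between."""
    p = PurePosixPath(path)
    parts = p.parent.parts
    head = parts[:keep_first]
    tail = parts[len(parts) - keep_last:]
    return str(PurePosixPath(*head, *replacement, *tail) / p.name)


path = "/home/viking/foo/bar/baz/spam.txt"
print(dirname_drop(path, 3))
print(replace_dirname_middle(path, 2, ["arthur", "britons"], 1))
```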

Replacing filestem and extension is trivial using the output of path parse. However, path filestem has --prefix and --suffix flags that strip a prefix/suffix from the filestem. I believe this could be better implemented by extending the str subcommands since it might be useful for generic strings as well (and filestem is just a string after all).
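Since a filestem is just a string, the prefix/suffix stripping needs nothing path-specific. A minimal Python sketch of a generic string helper (strip_affixes is a hypothetical name, not an existing str subcommand):

```python
def strip_affixes(text, prefix="", suffix=""):
    """Remove prefix and suffix from text if (and only if) they are present."""
    if prefix and text.startswith(prefix):
        text = text[len(prefix):]
    if suffix and text.endswith(suffix):
        text = text[: -len(suffix)]
    return text


print(strip_affixes("backup_spam_v2", prefix="backup_", suffix="_v2"))
```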

expand, exists and type could be left as they are. exists and type could potentially be fields of the path parse output.

Other examples

The new path join could still accept an argument:

> echo [ home arthur britons ] | path join spam.txt

/home/arthur/britons/spam.txt

Let's check some OS-specific examples. Mixing slashes is fine:

> echo ~arthur/britons\\/spam.txt | path --unix

/home/arthur/britons/spam.txt
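A rough Python sketch of what rendering a ~user path for a target OS might involve. Everything here is an assumption for illustration: the render_for name, the windows and drive parameters, and the home bases (/home for Unix, drive:\Users for Windows, taken from the defaults table in section 2; the Mac default is ignored). The sketch only handles relative and ~user-prefixed inputs, since a leading separator would be dropped by the split.

```python
import re


def render_for(path, windows=False, drive="C:"):
    """Render a relative or ~user-prefixed path using the target OS conventions."""
    sep = "\\" if windows else "/"
    home_base = drive + "\\Users" if windows else "/home"
    # Split on any run of slashes or backslashes, so mixed separators are fine.
    parts = [part for part in re.split(r"[\\/]+", path) if part]
    if parts and parts[0].startswith("~") and len(parts[0]) > 1:
        # "~user" becomes "<home base><sep>user"
        parts[0] = home_base + sep + parts[0][1:]
    return sep.join(parts)


print(render_for(r"~arthur/britons\\/spam.txt"))
print(render_for(r"~arthur/britons\\/spam.txt", windows=True))
```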

Typing --windows every time can be annoying (assuming we're on Mac for example). We could have an env var session instead:

> echo ~arthur | with-env [OS windows] {
  path join | autoview
  ... handling more windows paths
}

C:\Users\arthur
...

How about an empty drive? (A partial table is fine.)

> echo [[drive dirname]; [$nothing usr]] | path --windows

usr  # assume it's just a relative path

How would drive on Unix work?


> echo [[drive dirname]; [$nothing usr]] | path --unix

/usr  # uses empty string as "drive", join with path separator

> echo [[drive dirname]; [C: usr]] | path --unix

# Should throw error

Empty string should be treated as $nothing:

> echo [[drive dirname]; ["" usr]] | path --unix

/usr

Some joining:

> echo [/home/arthur britons] | path --windows join

/home/arthur\britons

This could be fixed by encoding the path again. But we have another problem:

> echo [/home/arthur britons] | with-env [OS windows] { path join | path join }

\home\arthur\britons     # Should it be this?
C:\Users\arthur\britons  # or this?

Should we autodetect the home path without ~? I'm not sure, probably best to keep it simple and stick with the 1st option.
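As a data point, Python's pathlib takes the first route: it converts the separators but never guesses a drive or a home folder. A short check:

```python
from pathlib import PureWindowsPath

# Joining a Unix-looking absolute path under Windows rules only swaps separators.
joined = PureWindowsPath("/home/arthur", "britons")
print(joined)        # rooted path with backslashes
print(joined.drive)  # empty: no drive is invented
```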

Should a non-existing path be treated as a file or a directory (or raise an error)?

> echo [foo/bar] | path parse

# drive  dirname          filestem  extension
---------------------------------------------
0        foo/bar                                # this?
0        foo              bar                   # or this?
                                                # or throw an error?
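For reference, Python's pure paths never look at the filesystem and always treat the last component as the file name, which corresponds to the second row above. A sketch using the column names from the table:

```python
from pathlib import PurePosixPath

p = PurePosixPath("foo/bar")
row = {
    "dirname": str(p.parent),
    "filestem": p.stem,
    "extension": p.suffix.lstrip("."),
}
print(row)
```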

Related Issues

#2535 #3143 #3199 #3220 #3329

Related PRs

#2742 #3123 #3201 #3210
