
direnv rewrite?

Open zimbatm opened this issue 5 years ago • 16 comments

Except that then you'd never get it to work properly on Windows, unless you bundled both an msys and cygwin version. And you'd still have to solve the cross-system environment munging problem: PATH on cygwin, and all env vars containing anything resembling paths on msys/git bash. (Assuming you could link msys or cygwin to Go code at all, which I have yet to find a single example on the internet of anybody successfully doing.)

The status of my previous experiment in getting direnv to work on Windows was that the "pass an extra argument" trick will get it to basically work on cygwin with some adjustment to the tests (and an inability to handle empty strings), but that on msys/git bash that trick breaks horribly. AFAICT, the only way to get uncorrupted vars on msys (and empty vars on Windows, period) is to pipe the cygwin/msys environment into direnv. That is, direnv can never trust the contents of the actual process environment outside of its own (non-empty, and obfuscated so msys doesn't munge them) state variables.

Basically you need the shell-side hook to be something like

  • call some direnv command that's the first half of export, to see if the dir or watchlist indicate a change
  • run, say, bash -c 'declare -x' | direnv export-part-2 myshelltype to run the second half of export, parsing the bash declarations to get a valid environment

I mean, technically you could do that by having direnv export simply output that second command when it detects a change, but you still have to have the parent shell output the environment or it's no good.
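As a rough illustration of that two-step hook, here is a minimal bash sketch. The names envrc_changed, consume_env and DEMO_LAST_DIR are made up for the example (direnv has no such commands). It only demonstrates the core point: declarations piped from the parent shell arrive intact, empty values included, because they never travel through the child's process environment.

```bash
# Sketch only: envrc_changed and consume_env are hypothetical stand-ins,
# not real direnv commands.

# "First half": a cheap check for whether anything needs reloading.
# Here it only notices a directory change; a real check would also
# compare watch-list timestamps.
envrc_changed() {
  [[ ${DEMO_LAST_DIR:-} != "$PWD" ]]
}

# "Second half": reads `declare -x` output on stdin instead of trusting its
# own process environment. This stand-in just reports the empty variables,
# which are the ones that get lost when passed via the environment.
consume_env() {
  grep -E '^declare -x [A-Za-z_][A-Za-z0-9_]*=""$'
}

hook_sketch() {
  envrc_changed || return 0
  DEMO_LAST_DIR=$PWD
  # The parent shell serialises its own view of the environment.
  declare -x | consume_env
}

export DEMO_EMPTY=""
hook_sketch    # output includes: declare -x DEMO_EMPTY=""
```

Calling hook_sketch a second time from the same directory prints nothing, which is the fast path: no fork of a second process unless the cheap check says something changed.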

That realization is what prompted the, "well hell, might as well just write it in bash to begin with, then it's one executable, portable everywhere, and a lot fewer process swaps, no Go API worries, etc.". Plus, direnv is already 25% bash by volume. :wink: (That's line count; it's >30% by character count.)

But if you prefer to use the more convoluted way to get windows to work, or to give up on windows altogether, those are certainly options that exist. I'm just looking at it from the POV of, "what value is Go adding here?" And AFAICT the answer is, "not much". And for Windows support, it's an active detriment.

There is apparently an msys go compiler based on gccgo, but it doesn't do cygwin. I also don't know if said compiler actually produces msys-linked programs, or if it's just meant to build mingw programs (which aren't the same thing). That is, if it builds Go programs that think their target runtime is Windows, then it's useless. And even if it didn't, then cygwin would still not be doable.

Originally posted by @pjeby in https://github.com/direnv/direnv/pull/636#issuecomment-627482713

zimbatm avatar May 15 '20 15:05 zimbatm

If dropping cygwin allows direnv to work on Windows, then I would be inclined to drop cygwin. Correct me if I am wrong but it seems like cygwin is quite legacy at this point and msys is the new "linux-in-windows" environment?

zimbatm avatar May 15 '20 16:05 zimbatm

What? I don't know if that's true, but I do use cygwin as my main shell on Windows, not msys. (Maybe you're thinking of WSL? IIUC, WSL is actually Linux with a Windows filesystem mapped into it, and would use the linux version of direnv.)

The difference between cygwin and msys is that cygwin supports running posix-y programs in a posix-y environment on Windows, so you can do posix-y things. msys, on the other hand, is more for you to run posix-y programs to do Windows-y things. :wink: It tries to provide a minimal posix environment, and is optimized towards being able to run Windows programs within that environment... hence the variable and command-line munging that causes problems for direnv.

As far as msys goes, I really only bother having git bash (a very minimalist msys installation) around to test things when I want to write portable bash, to be sure how it differs in platform-specific behavior of edge cases (things like command substitution, piping from strings, etc.) from cygwin.

In any case it's moot, since cygwin is actually pretty doable; it's msys that's the pain in the assets. In truth, on cygwin I can get stock direnv to work by hacking the bash hook and .direnvrc to pass PATH in another environment variable, because cygwin only corrupts PATH.
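The actual hook and .direnvrc changes aren't reproduced in this comment. As a minimal sketch of the idea only, assuming a made-up variable name DEMO_REAL_PATH (any name other than PATH should do, given that cygwin only rewrites PATH), the two halves might look like this:

```bash
# Sketch only: DEMO_REAL_PATH, run_with_real_path and restore_real_path
# are hypothetical names, not part of direnv.

# Hook side: copy PATH into a variable the compatibility layer leaves
# alone, then run the given command with that copy in its environment.
run_with_real_path() {
  DEMO_REAL_PATH=$PATH "$@"
}

# .direnvrc side: put the untouched copy back before any .envrc logic
# looks at PATH, and drop the helper variable so it doesn't leak.
restore_real_path() {
  if [[ -n ${DEMO_REAL_PATH:-} ]]; then
    PATH=$DEMO_REAL_PATH
    unset DEMO_REAL_PATH
  fi
}
```

In use, the hook would wrap the call (something like run_with_real_path direnv export bash), and restore_real_path would run at the top of the .direnvrc. The diff that comes back still has to be handled so that the restored PATH is what ends up in the shell, which this sketch does not cover.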

msys, on the other hand, corrupts every single environment variable it can get its grubby little paws on, and command-line arguments as well. I actually got my cygwin port to run most of the test suite without issue, even run fish. msys was dead in the water from the starting gate because it can't use the command line as a way to bypass the environment munging.

In summary:

  • cygwin: munges PATH and drops empty vars, can be mostly worked around using another env var or the command line
  • msys: munges every environment variable and the command line; only workaround is piping the entire environment, or obfuscating it within an env variable or command line argument

The one positive that msys has over cygwin (sorta) is that its munging of env vars and command line arguments ensures that DIRENV_DUMP_FILE_PATH is a Windows path for direnv to use instead of a posix path. (But I'd already worked out how to work around that when I was doing testing on cygwin, so it's not exactly a major coup.)

So, be it cygwin or msys, a fully functional direnv port on Windows must have some out-of-band method for communicating the environment to direnv, if direnv itself is not a native posix app for the given environment. The cygwin and msys DLLs take care of forwarding the "real" (i.e. posix) environment and command line to binaries they run, but not for foreign Windows apps (e.g. anything written in Go).

Now, for cygwin, the "out of band" communication was to add an extra argument to export and dump, that carried the real PATH. This worked out pretty decently, but I didn't try it on msys until I had most of the tests running on cygwin and didn't realize just how much more weird msys is. For msys, the command line arguments are not "out of band", and all environment variables are translated, not just the PATH. That basically leaves piping and obfuscation as the only out-of-band communication methods available.

So you can see why my conclusion at that point was "see if there's a way to do fast timestamp checking in bash". :wink: Because writing code to obfuscate and/or pipe env vars from the parent shell to another program so it can pipe them back to bash (note that the empty vars and translation problems go both ways here!) so that it can then obfuscate or pipe them again to direnv dump, and thence back to export, seemed to make a lot less sense than translating the Go bits to bash, so that none of that back-and-forth has to happen in the first place.

Fundamentally, the issue is that Go builds Windows programs, which are useless for any environment that you'd want to use direnv in on Windows. (Except WSL, which is actually Linux, so a Linux build of direnv is fine... but then that's not really supporting Windows, is it? :wink: And IIUC, WSL is not an option if you want to run any other kind of VM on your Windows box.)

I guess to sum up: cygwin support on Windows is easy and mostly written, if you ignore the "direnv will unset your empty variables when it reverts them if they change" problem. (And you can use stock direnv if you hack the hook and .direnvrc to forward the real PATH to the .envrc and pull it back out again.) msys is sorta working with stock direnv, as long as you don't mind having Windows paths in your msys environment. But you will definitely mess up anything that relies on manipulating posix paths in the environment there. (Unlike cygwin, which will only screw up your PATH unless you do the workarounds.)

So yeah.... all that work I was doing on cygwin porting? That was easy compared to what msys would need to be fully functional, in the sense of allowing the .envrc to run with the proper environment.

Theoretically, you could make a launcher program in C and build it for cygwin and msys, to do the translation. Maybe build direnv as a dll using cgo, with a main function taking a command line and environment variables in a format easy to create from C. Then the front end would load direnv.dll and go from there. But in that case, the Go dll would need to not use any Go stdlib functions related to the OS or paths, and instead delegate those to functions exposed by the launcher. (Ideally, the launcher would be the thing to invoke bash, so that the environment variables passed that direction would be sane as well, and I/O redirection could work.)

Researching that angle, I found these potentially useful tidbits:

  • https://stackoverflow.com/questions/40573401/building-a-dll-with-go-1-7
  • http://blog.ralch.com/tutorial/golang-sharing-libraries/
  • https://github.com/golang/go/issues/11058
  • https://github.com/golang/go/issues/11100
  • https://github.com/glycerine/guestdll

Some of these include code samples of at least loading a Go DLL from a non-Go program, albeit Windows rather than Cygwin or msys. (But some are for linking with a main program built with gcc, which is used by both msys and Cygwin, so that's promising). Also, the code looks similar to the Cygwin code for loading a Windows DLL generally, so there's that. (i.e. msys is likely similar.)

Another possible alternative to piping is to use a special environment variable, e.g. DIRENV_REAL_ENVIRONMENT=$(declare -x) direnv export bash. I tested this with msys and msys does not corrupt the variables embedded within DIRENV_REAL_ENVIRONMENT, but of course it would have to be parsed, and it also slows things down with an extra fork and subshell on every prompt. This could be used with dump as well, and within the stdlib (i.e., export would pass it on to the stdlib, after reverting the previous changes to it).
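To make the mechanics concrete, here is a small bash sketch of the round trip. It is an illustration under assumptions, not how direnv would do it: the receiver is just another bash started with an otherwise empty environment, and it rebuilds its variables by evaluating the snapshot. The DEMO_ variables are invented for the example, and eval is only tolerable here because the text was produced by the parent shell itself.

```bash
# Sender: serialise every exported variable into one string.
export DEMO_TOOLS="/usr/local/bin:/opt/tools" DEMO_EMPTY=""
snapshot=$(declare -x)

# Receiver: sees nothing but DIRENV_REAL_ENVIRONMENT, and rebuilds the
# sender's exported variables from the declarations stored inside it.
receiver='eval "$DIRENV_REAL_ENVIRONMENT"
printf "%s|%s\n" "$DEMO_TOOLS" "${DEMO_EMPTY-unset}"'

# env -i clears PATH too, so bash is located before the environment is wiped.
env -i DIRENV_REAL_ENVIRONMENT="$snapshot" "$(command -v bash)" -c "$receiver"
# prints: /usr/local/bin:/opt/tools|
```

The empty field after the bar shows DEMO_EMPTY arriving as set-but-empty rather than unset. A receiver that is not bash (for instance a Go program) would need its own parser for the declare -x format, which is the parsing cost mentioned above.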

There are a lot of open issues and potential gotchas with that approach, especially in the "passing stuff to stdlib/.envrc" part, though, which is why I went "screw this, why not just do it in bash and make all these problems go away, and have only one executable to distribute everywhere?"

But I get that you might not be comfortable with that, so it might make more sense to fork or create a bash alternative, though I'd really like to keep a common file format and API as much as possible if that did happen. Right now I'm mostly getting by with my bash hacks for cygwin to make direnv work for the one thing I actually need it for right now (to gimme up the Go environment, ironically!), so I don't feel a strong need to put a lot of effort into any of these porting strategies right away. (IOW, I'm not in any rush for any of these, so don't feel rushed to decide or do anything about any of these, on my account at least.)

pjeby avatar May 15 '20 18:05 pjeby

After a bit of fiddling, I got one of those examples to build and link a C main program to a go library under msys... and it built a plain Windows executable. I tried linking it with -lmsys-2.0, and got it linked against the DLL... whereupon it promptly segfaulted before doing anything.

It appears the Go side wants to load msvcrt as its C runtime, which probably conflicts with msys posix emulation runtime. It also links to various OS-level Windows DLLs.

So, that route doesn't seem like it would be a quick win, if it can be done at all.

pjeby avatar May 15 '20 20:05 pjeby

Oops, looks like the reason it was segfaulting was because I was building it with the mingw toolchain, not the msys toolchain. So it is possible to build a windows DLL version of a go library, and build an msys C main that links to it. The C main sees correct env vars, the Go library sees a minimal Windows environment.

In principle, the same should be possible for Cygwin, but I haven't yet managed to set up a 64-bit cygwin gcc to try it with.

Attempting to link statically instead of having a separate DLL doesn't work, btw (or at least I haven't figured out how), so it'd be two files to install.

Also, the Achilles' heel of this approach is that the launcher has to be built for the exact runtime environment it's used in: when I tried running the msys-built executable under git bash, it wound up with a weird environment that didn't much resemble the original except for PATH. Running it under Cygwin, it still dropped empty environment vars and mangled PATH.

So, even if you go this route, I'm not sure how many platform-specific .exe's would be required to go with the (hopefully common) golang .dll. (And just to make things extra fun, the golang.dll has to be built under the mingw64 environment, but the main program has to be built from the msys2 environment. Fun times!)

pjeby avatar May 15 '20 20:05 pjeby

Oh, also, the golang DLL I built did nothing but output the environment as seen by Go. Dunno if anything more complicated (e.g. all of direnv) would work.

pjeby avatar May 15 '20 21:05 pjeby

Thanks @pjeby, for taking the time to explore and expand on the Windows environments. I think that I have finally caught up with your explanations and understanding of the problem.

To summarize the current situation:

  • cygwin: munges the PATH
  • msys2: munges all env vars and arguments
  • WSL: all good since it's just Linux, and the user should install direnv as a Linux binary.

And this happens because direnv is currently compiled as a Windows executable, so the compatibility layer kicks in. If there were cygwin and msys2 editions of direnv, then this issue wouldn't happen, correct?

Another consideration we didn't talk about is whether PowerShell support is desirable. In that case, I believe that the Windows version of direnv would be fine?

zimbatm avatar May 16 '20 10:05 zimbatm

Well, not unless you had the ability to write an .envrc in Powershell (or maybe Elvish?) because otherwise the bash executable you run to parse it is still going to have to be built with msys or cygwin, so the same problems will exist at the direnv<->bash interface, even if the powershell<->direnv part is fine.

Also, the cygwin and msys2 editions of direnv would each have to come in at least 32-bit and 64-bit versions. And cygwin also drops empty environment variables when it runs non-cygwin programs. But yes, native cygwin and msys direnv programs would work. Basically, if direnv were written in C or some other language supported by gcc (other than Go), then it could be compiled for each runtime, and would be seen as native by that runtime, and so would see the same environment vars and command line arguments as its parent or child shells.

I'm planning to play around a bit more with my sketches of a pure-bash direnv; last night I wrote and tested a 7-line bash 3.2 function that parses the output of declare -x into a pseudo-array for purpose of diffing environment variables, that should work with my existing "diff two env pseudo-arrays" and "generate patches from pseudo-arrays by calling a specified function" functions. I'm getting a progressively clearer picture of how the core export command would work, in the second half of the algorithm. (i.e., after the fast path where the target .envrc is identified and the timestamps are validated.)

It'd probably look something like:

# Save the current environment
__env_current=$(declare -x)
if [[ $DIRENV_DIFF ]]; then
  # restore the old environment
  env::parse "" "$(echo "$DIRENV_DIFF"|base64 -d|gunzip)"
  env::patch __env_new_ __env_old_ env::apply
fi

# Unset existing watches and remove the timestamp files;
# new ones will be added to the environment in the subshell
watches::clear
# TODO should add a watch on `allow` source here

# This would actually put the declare -x in a trap, but basically this:
__env_new=$(exec 3>&1 1>&2; load_envrc "$ENVRC_TO_LOAD"; declare -x >&3)

# Run a diff from current to new 
unset "${!__env_new_[@]}" "${!__env_old_[@]}"
env::parse __env_old_ "$__env_current"
env::parse __env_new_ "$__env_new"
env::diff __env_old_ __env_new_

# Inject meta stuff into the outgoing patch
__env_new_DIRENV_DIFF=$(declare -p "${!__env_new_[@]}" "${!__env_old_[@]}"|gzip|base64)

# Output it in a form the calling shell can handle
env::patch __env_old_ __env_new_ env::export-${shell_type}

Right now the main things lacking a draft are env::export-fish and env::export-tcsh, and making sure that the diff function excludes the right things (e.g. PWD, OLDPWD, PS1, direnv stuff, etc.). I also haven't done anything about allow and checking to see if an envrc is allowed.
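The parsing function itself isn't shown in this thread. Purely to illustrate the pseudo-array idea (variables sharing a name prefix, since bash 3.2 has no associative arrays), a parser could look something like the sketch below. env_parse and the __old_/__new_ prefixes are invented for this example, it skips values containing newlines and declarations with extra attributes, and it relies on eval, so it should only ever be fed declare -x output from a trusted shell.

```bash
# Sketch only: env_parse is a hypothetical helper, not the function
# described in this thread.

# Load `declare -x` output ($2) into variables named PREFIX+NAME ($1).
env_parse() {
  local prefix=$1 line
  local re='^declare -x ([A-Za-z_][A-Za-z0-9_]*)=(".*")$'
  while IFS= read -r line; do
    # Only single-line NAME="value" declarations are handled.
    [[ $line =~ $re ]] || continue
    # bash already quoted the value, so eval can re-read it as an assignment.
    eval "${prefix}${BASH_REMATCH[1]}=${BASH_REMATCH[2]}" || continue
  done <<<"$2"
}

export DEMO_A="one two" DEMO_B=""
env_parse __old_ "$(declare -x)"
export DEMO_A="changed"
env_parse __new_ "$(declare -x)"

# Walk the "new" pseudo-array and report names whose value differs.
for var in "${!__new_@}"; do
  name=${var#__new_}
  old_var=__old_$name
  if [[ ${!old_var-} != "${!var}" ]]; then
    echo "changed: $name"
  fi
done
```

The ${!prefix@} expansion is what makes the prefix behave like an array: it lists every variable whose name starts with the prefix, and the same list can be handed to unset to clear a snapshot.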

There's also not a lot of error checking yet; if for example the DIRENV_DIFF is corrupt then the current draft of env::parse would just silently stop parsing it at the point of error. But this is basically what direnv export's second half would look like, and it's not that much longer or complex than the Go version. (In some ways, it's simpler, since rc.Load is a single line here, with no need for DIRENV_BASH or special setup for the environment to pass in. And direnv dump is basically declare -x.)
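For the corrupt DIRENV_DIFF case specifically, one way to avoid silently applying a half-decoded payload is to check the decode pipeline before parsing anything. The following is a hedged sketch, not part of the draft above: diff_encode and diff_decode are hypothetical names, and it assumes a base64 that accepts -d (as GNU coreutils does).

```bash
# Sketch only: diff_encode and diff_decode are hypothetical helper names.

# Compress and base64-encode a payload onto a single line.
diff_encode() {
  printf '%s' "$1" | gzip -c | base64 | tr -d '\n'
}

# Decode a payload, returning non-zero instead of yielding partial
# output when the input is not valid base64 or not valid gzip data.
diff_decode() {
  local out
  # pipefail is set inside the command substitution only, so a failure in
  # either base64 or gunzip fails the whole pipeline without changing the
  # caller's shell options.
  out=$(set -o pipefail
        printf '%s' "$1" | base64 -d 2>/dev/null | gunzip -c 2>/dev/null) || return 1
  printf '%s' "$out"
}

payload='declare -x DEMO_VAR="some value"'
encoded=$(diff_encode "$payload")
if decoded=$(diff_decode "$encoded"); then
  printf 'round trip ok: %s\n' "$decoded"
fi
if ! diff_decode 'definitely-not-gzip' >/dev/null; then
  echo 'corrupt diff detected, refusing to apply it' >&2
fi
```

Because the decoded text is captured with command substitution, trailing newlines are stripped; that is harmless for declare output but worth knowing if the payload format ever changes.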

(Not that this obligates you to anything, just trying to show that a bash version would be practical and in many areas a lot simpler and considerably more compact than the Go version.)

pjeby avatar May 16 '20 17:05 pjeby

I don't want to hijack this thread, and I understand the difficulties of porting direnv to msys/cygwin, but as a long-time direnv user I wonder if anyone has found a direnv alternative (even a simple/bad one, even zsh-only or something) that works on msys/cygwin? I have to do some msys stuff and I miss direnv! (Feel free to delete this comment if it's considered off-topic.)

garyo avatar Sep 20 '21 17:09 garyo

I'm currently using direnv on cygwin, with a patch to work around path issues, based on #630 with some additional changes I never pushed. I still want to write a pure bash version, but it's a "someday" sort of project.

Before I created my patches, though, I had an alternative workaround which is described here: https://github.com/direnv/direnv/issues/594#issuecomment-624373690

Specifically, the two code blocks to be added to .bashrc and .config/direnv/lib were, I believe, sufficient to use as a day-to-day workaround for the issues, but it's been over a year so my memory may be a bit hazy.

pjeby avatar Sep 20 '21 17:09 pjeby

Another angle I've been exploring is to use https://github.com/mvdan/sh as a replacement for the bash interpreter. That way direnv would be freed from its dependency on bash, and have fewer path-conversion issues. There is a long-running "gosh" branch if you want to take a look.

zimbatm avatar Dec 24 '21 13:12 zimbatm

I wonder if the effort with gosh ended up anywhere. I'm thinking about cross-platform Direnv that would work on Windows (natively, in CMD and/or PowerShell) nicely every single day :)

burdiyan avatar Dec 02 '23 22:12 burdiyan

If I recall correctly, the problem was that cygwin/msys/etc. environment variables have translation issues going to a pure Windows Go app, which means the approach of using a shell interpreter written in Go doesn't address the issues in any way.

It's true that if you only care about Windows native shells, this isn't an issue, but for posix-ish environments on Windows rewriting in gosh won't help anything, if I recall correctly.

pjeby avatar Dec 03 '23 02:12 pjeby

Isn't this something that could be fixed in the embedded shell interpreter itself?

burdiyan avatar Dec 03 '23 15:12 burdiyan

Not really, no. Not the issues I'm talking about, anyway. And even if you use a shell written in Go, it's still presumably written for Posix paths, not Windows paths -- and then your .envrc files aren't portable any more. If you want an equivalent for Cmd/Powershell, you probably need to write something in Powershell.

pjeby avatar Dec 04 '23 08:12 pjeby

It would remove one layer of translation making this much more manageable. But somebody has to finish the gosh branch first.

zimbatm avatar Dec 04 '23 13:12 zimbatm

Is there any writeup on what's left for the gosh branch to be finished? Or the takeaways and progress so far?

burdiyan avatar Dec 04 '23 15:12 burdiyan