Improved prompt detection and passthrough #7

@bitprophet

Description

Pre-intro

Apologies for ticket length; the issue at hand is not simple and has many overlapping factors/considerations. Consider skipping down to the bottom of the description, where there is a concise summary that should function as a tl;dr.

Intro

This ticket used to be partly about prompt detection. We're now of the opinion that detecting prompts beforehand (in order to know when to present users with a Python-level prompt) will always be painful and will never cover 100% of possible use cases. Instead, we feel that actual live interaction with the remote end (i.e. sending local stdin to the other side) will not only sidestep this problem, but also be more useful and more in line with user expectations. See #177 for more on the "expect" approach.

The "live" approach itself has shortcomings, but none significantly worse than manually invoking ssh by hand, and anything in this space is certainly better than the "nothing" we have now.

Investigation into SSH and terminal behavior

We start by examining how the vanilla ssh client behaves, mostly because we
can't really hope to offer "better" behavior than vanilla ssh does. Plus this
presents a learning opportunity -- all of the below behaviors are reflected in
Paramiko itself, as one might expect.

There are basically two issues at stake when performing fully interactive command line calls remotely: the mixing of stdout and stderr, and how stdin is echoed.

Stdout/stderr

Stdout and stderr mixing were tested with the following program (which prints 0
through 9 alternating to stdout and stderr, unbuffered).

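#!/usr/bin/env python

import sys
from itertools import cycle

# Alternate between stdout and stderr, writing 0 through 9, one number per line.
# The builtin zip stops once range(10) is exhausted (works on Python 2 and 3).
for pipe, num in zip(cycle([sys.stdout, sys.stderr]), range(10)):
    pipe.write("%s\n" % num)
    pipe.flush()  # flush after every write so nothing lingers in a buffer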

No pty

When invoked normally (without -t), ssh appears to separate stdout and stderr
on at least a line-by-line basis, if not more so, insofar as we see all of
stdout first, and then stderr. Printed normally:

$ ssh localhost "~/test.py"
0
2
4
6
8
1
3
5
7
9

With streams separated for examination:

$ ssh localhost "~/test.py" >out 2>err
$ cat out
0
2
4
6
8
$ cat err
1
3
5
7
9

Thus, pty-less SSH is going to look a bit different than the same program interacted with locally.

With pty

When invoked with a pty, we get the expected result of the numbers being in
order, but the streams are now combined together before we get to them (since
all we get is the output from the pseudo-terminal device on the remote end,
just as if we were reading a real terminal window). Printed normally:

$ ssh localhost -t "~/test.py"
0
1
2
3
4
5
6
7
8
9
Connection to localhost closed.

Examining the streams:

$ ssh localhost -t "~/test.py" >out 2>err
$ cat out
0
1
2
3
4
5
6
7
8
9
$ cat err
Connection to localhost closed.

Thus, the tradeoff here is "correct"-looking output versus the ability to get a
distinct stdout and stderr.
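
The same tradeoff can be reproduced locally, without ssh, as a rough illustration. The sketch below is only a stand-in and makes no claim about what ssh or Paramiko do internally: it runs a copy of the test program once with separate pipes and once attached to a pseudo-terminal created with Python's Unix-only pty module. The helper names run_with_pipes and run_with_pty are made up for this example.

```python
import os
import pty
import subprocess
import sys

# Same behavior as the test program: 0-9 alternating between stdout and stderr.
CHILD = (
    "import sys\n"
    "from itertools import cycle\n"
    "for pipe, num in zip(cycle([sys.stdout, sys.stderr]), range(10)):\n"
    "    pipe.write('%s\\n' % num)\n"
    "    pipe.flush()\n"
)


def run_with_pipes():
    """No pty: stdout and stderr arrive on two distinct pipes."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return proc.stdout.decode(), proc.stderr.decode()


def run_with_pty():
    """With a pty: both streams are written to one terminal device."""
    master, slave = pty.openpty()
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD], stdin=slave, stdout=slave, stderr=slave
    )
    os.close(slave)  # the child keeps its own copies of the slave end
    chunks = []
    while True:
        try:
            data = os.read(master, 1024)
        except OSError:
            break  # Linux reports EIO once the child has closed the pty
        if not data:
            break  # other platforms signal end-of-file with an empty read
        chunks.append(data)
    proc.wait()
    os.close(master)
    # The terminal layer turns "\n" into "\r\n"; undo that for readability.
    return b"".join(chunks).decode().replace("\r\n", "\n")
```

With pipes the caller can tell the streams apart; with the pty there is only a single, correctly ordered stream to read.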

Echoing of stdin

No pty

Without a pty, ssh must echo the user's stdin wholesale (or hide it entirely,
though there do not appear to be options for this) and this means that password
prompts become unsafe. Sudo without a pty:

$ ssh localhost "sudo ls /"
Password:mypassword

.DS_Store
.Spotlight-V100
.Trashes
.com.apple.timemachine.supported
Applications
Developer
[...]

Note that the user's password, typed to stdin, shows up in the output. For
thoroughness, let's examine what went to which stream:

$ ssh localhost "sudo ls /" >out 2>err
mypassword
$ cat out
.DS_Store
.Spotlight-V100
.Trashes
.com.apple.timemachine.supported
Applications
Developer
[...]
$ cat err
Password:

As expected, the user's stdin didn't end up in the streams from the remote end
(ergo it is the local terminal echoing stdin, and not the remote end) and the
password prompt showed up in stderr.

With pty

Here's the same sequence but with -t enabled, forcing a pty:

$ ssh -t localhost "sudo ls /"
Password:
.DS_Store               Applications
.Spotlight-V100         .Trashes
Developer               [...]
Connection to localhost closed.

Note that in addition to not echoing the user's password, ls picked up on the
terminal being present and altered its behavior. This is orthogonal to our
research but is still a useful thing to keep in mind.

As before, use of pty means that all output now goes into stdout, leaving
stderr empty save for local output from the ssh program itself:

$ ssh -t localhost "sudo ls /" >out 2>err
$ cat out
Password:
.DS_Store               Applications
.Spotlight-V100         .Trashes
Developer               [...]
$ cat err
Connection to localhost closed.

And as with the previous invocation, our password never shows up, even on our
local terminal.

Non-hidden output

Finally, as a sanity test to ensure that non-password stdin is echoed by the
remote pty when appropriate, we remove a (previously created) test file with
rm's "are you sure" option enabled:

$ ssh -t localhost "rm -i /tmp/testfile"
remove /tmp/testfile? y
Connection to localhost closed.

And proof that it is the remote end doing the echoing -- our stdin shows up in
the stdout from the remote end:

$ ssh -t localhost "rm -i /tmp/testfile" >out 2>err
$ cat out
remove /tmp/testfile? y
$ cat err
Connection to localhost closed.

Conclusion

As seen above, there are a number of different behaviors one may encounter when
using, or not using, a pty. The tradeoff is, essentially, access to distinct
stdout and stderr streams (but garbled output and blanket echo of stdin) versus
more shell-like behavior (but without the ability to tell the remote stderr
from stdout).

In our experience, the ssh program defaults to not using a pty, but the
average Fabric user is probably best served by enforcing one. New
users are more likely to expect "shell-like" behavior (such as proper
multiplexing of stdout and stderr, and hiding of password prompt stdin) and
Fabric already defaults to a "shell-like" behavior insofar as it wraps commands
in a login shell.

Summation of early comments

A summary of findings so far (contains up through comment 16):

  1. Python's default I/O buffering is typically line-by-line (linewise). I/O is
    not typically printed to the destination until a line ending is encountered.
    This applies both to input and output. (It's also why
    fabric.utils.fastprint was created -- one must manually flush output to
    e.g. stdout to get things like progress bars to show up reliably.)
  2. Fab's current mode of I/O is also linewise, partly because of point 1, and
    partly to allow printing of stdout and stderr streams independently. As a
    side effect, partial line output such as prompts will not be displayed to
    the Fabric user's console.
  3. As seen above, SSH's default buffering mode is mostly linewise, insofar as the
    default non-pty behavior mixes the two streams up but on a line by line
    basis, but it is still capable of presenting partial lines (prompts) when
    necessary.
  4. Because we cannot discern a reliable way of printing less-than-a-line output without moving to bytewise buffering, we'll need to switch to printing every byte as we receive it, in order for the user to see things such as prompts (or more complicated output, e.g. curses apps or things like top).
    • If/when the secret of ssh's print buffering is found, use that algorithm instead.
  5. Forcing Python's stdin to be bytewise requires the use of the Unix-only
    termios and tty libraries, but I believe there may be Windows
    alternatives. For now, we plan to focus on the best Unix-oriented approach
    and will implement Windows compatibility later if possible. (Sorry, Windows
    folks.)
  6. Obtaining remote data bytewise is a bit easier insofar as data from the
    client isn't linewise. However, shortening the size of the buffer throws a
    wrench in Fabric's current method of detecting whether there is no more
    output to be had, so we are currently experimenting with other approaches,
    specifically select.select (which, yes, is another Windows compatibility
    pain point.)
    • Any new solution should also hopefully obviate all the annoying, painful,
      error-prone issues with the current output_thread I/O loop, insofar as line
      remainders and such are concerned.
    • Ideally, as with select, this should also remove the need for threads
      entirely, which will make it easier to fully parallelize Fabric in the
      future, and kill another entire class of occasional problems.
  7. With bytewise output, we run into problems where the remote stdout and
    stderr get mixed up character-by-character (e.g. the last line of regular
    output can become garbled up with a "following" line containing a prompt, since
    many prompts print to stderr). Until/unless we can figure out how the
    regular SSH client accomplishes its "linewise but not really" buffering, the
    only way to avoid this problem is to set set_combine_stderr to True.
    • We could, and probably should, offer this as a setting in case users have
      need for it.
  8. And without using a pty, we are forced to manually echo all stdin, just as
    vanilla SSH does (see previous major section). This then presents issues
    with password prompts becoming insecure.
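
Point 6 can be illustrated with a small local stand-in. The following is a minimal sketch, not Fabric's actual I/O loop: it uses select.select to read a child process's stdout and stderr one byte at a time from a single thread, and treats an empty read as the end of that stream. It assumes a Unix platform (select on pipes does not work on Windows) and uses a local subprocess in place of a Paramiko channel; read_bytewise is a hypothetical name.

```python
import os
import select
import subprocess
import sys


def read_bytewise(cmd, chunk_size=1):
    """Run cmd and collect its stdout/stderr using select, without threads."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # Map each pipe's file descriptor to the name of the stream it carries.
    open_fds = {proc.stdout.fileno(): "stdout", proc.stderr.fileno(): "stderr"}
    captured = {"stdout": b"", "stderr": b""}
    while open_fds:
        # Block until at least one pipe has data or has reached end-of-file.
        readable, _, _ = select.select(list(open_fds), [], [])
        for fd in readable:
            data = os.read(fd, chunk_size)
            if data:
                captured[open_fds[fd]] += data
            else:
                # An empty read means the child closed this stream.
                del open_fds[fd]
    proc.wait()
    proc.stdout.close()
    proc.stderr.close()
    return captured


if __name__ == "__main__":
    # A partial line on stderr (no trailing newline), like a password prompt.
    child = "import sys; sys.stdout.write('out'); sys.stderr.write('Password: ')"
    print(read_bytewise([sys.executable, "-c", child]))
```

Because the loop ends when both streams report end-of-file, it does not depend on the read buffer size to decide that there is no more output.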

Putting it all together

So, here's the planned TODO for this issue, given all of the above and the
current state of the feature branch (namely, hardcoded bytewise stdin, skipping
out on the output threads in favor of select, and printing prefixes after
each newline):

  1. Abstract out the currently-implemented stdin manipulation; it essentially
    requires a try/finally and I think it'd be handy to have as a
    context manager or similar.
    • Possibly also make it configurable, since bytewise stdin is not
      absolutely required much of the time. Still feel it should be enabled by
      default, though.
    • Offer an option to allow suppression of stdin echoing, just because.
  2. Expose set_combine_stderr as a user-facing option. Default should be on -- not too many people need the distinct
    stderr access, and with it off, output is very likely to be garbled
    unexpectedly. It's an advanced user sort of thing.
  3. Change the pty option to default to True (currently False). This will
    provide the smoothest user experience, and since we're combining the streams
    by default anyway, it's a no-brainer.
  4. Decide what to do with output_thread's password detection and response.
    This may become more difficult with bytewise buffering, and was originally
    implemented to get around the lack of stdin.
    • Drop the feature entirely, since users can now enter prompts
      interactively. Dropping features isn't great, though.
    • Repackage it as a "password memory" feature (it needs an overhaul
      anyways). Maybe as part of #177 (Investigate pexpect/expect integration).
    • Keep it entirely as-is, and just use the output capturing as the read
      buffer in place of the current approach (checking the as-big-as-possible
      chunk from the remote end). Possibly quickest. We won't be able to hide the
      prompt itself from user eyes anymore (that's the biggest reason #80, "See
      whether paramiko.SSHClient.invoke_shell + paramiko.Channel.send is
      feasible", can't work) but that's not required, just nice.
  5. Figure out if it's possible to omit printing the output prefix in lines where the user's input is being echoed by the remote end. Currently this results in said prefix showing up mid-line in some prompt situations (usually where the echoed stdin is the first data to show up in the stdout buffer, though it could also be a problem once the user hits Enter to submit the prompt too).
    • Might be able to conditionally hide prefix in cases where the byte coming in to stdout is the same as the last byte seen on stdin, but that is messy (e.g. output coming in long after the user is done typing -- do we add time memory? how much of one? etc)
    • Depending on exactly how it shakes out, this may not even be an issue for anything but the case where the typed input's echo is the first stdout. Will have to see.
  6. Add an interact_with that makes use of invoke_shell, assuming it can work seamlessly with the final exec_command based solution without code duplication.
  7. Come up with Windows-compatible solutions, if possible, for all Unix-isms
    used in this effort.
  8. Note in the parallel-related ticket(s) that this solution will make it more difficult for a parallel execution setup to function, insofar as bytewise-vs-linewise output is concerned. A truly parallel execution would be incredibly confusing even on a line-by-line basis, however, so a better solution is likely to be needed anyways.
  9. Reorganize operations.py and network.py -- nuke old outdated code and shuffle around new code; it should ideally live in another module that is neither network nor operations (?)
  10. Document all of the above changes thoroughly, and attend to related tickets re: tutorial etc.
    • Update changelog (the pty default is now backwards incompatible!)
    • Make sure users know they need to deactivate both pty and
      combine-streams options in order to get distinct streams.
    • Update skeleton usage docs re: interactivity
    • Search for mentions of use of the stderr attribute and update them since it's not populated by default anymore
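
For item 1, the try/finally around the stdin manipulation might be packaged roughly as below. This is a minimal sketch under stated assumptions, not the code on the feature branch: it relies on the Unix-only termios and tty modules, puts the terminal into cbreak mode (which also turns off local echo), and does nothing when the descriptor is not a terminal. The name bytewise_input is hypothetical.

```python
import contextlib
import os
import termios
import tty


@contextlib.contextmanager
def bytewise_input(fd):
    """Temporarily switch the terminal on fd to character-at-a-time input."""
    if not os.isatty(fd):
        # Pipes and regular files have no terminal settings to change.
        yield
        return
    saved = termios.tcgetattr(fd)
    try:
        # cbreak mode: no line buffering and no local echo.
        tty.setcbreak(fd)
        yield
    finally:
        # Always restore the original settings, even if the body raised.
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)
```

Used as `with bytewise_input(sys.stdin.fileno()):`, a call such as `os.read(sys.stdin.fileno(), 1)` inside the block should return after a single keypress rather than waiting for Enter, and the terminal goes back to its previous mode when the block exits.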

Originally submitted by Jeff Forcier (bitprophet) on 2009-07-20 at 05:24pm EDT


Closed as Done on 2010-08-06 at 11:22pm EDT
