Skip to content

Redirection: Avoid blocking when err+out> redirects too much stderr message#7240

Closed
WindSoilder wants to merge 1 commit intonushell:mainfrom
WindSoilder:redirection
Closed

Redirection: Avoid blocking when err+out> redirects too much stderr message#7240
WindSoilder wants to merge 1 commit intonushell:mainfrom
WindSoilder:redirection

Conversation

@WindSoilder
Copy link
Copy Markdown
Contributor

@WindSoilder WindSoilder commented Nov 25, 2022

Description

After reading into code, I found that err+out> will always redirect stdout message first, then stderr message.

Then when we have a script which want to redirect too much error message too make nushell hangs.

Here is a python script example:

# test.py
import sys

print('aa'*300, flush=True)
print('bb'*999999, file=sys.stderr, flush=True)
print('cc'*300, flush=True)

And running script:
python test.py err+out> 1.txt
I think it's better to make them interleaved like #6658 mentioned.

And BTW, fix --raw flag passing into external save command.

User-Facing Changes

(List of all changes that impact the user experience here. This helps us keep track of breaking changes.)

Tests + Formatting

Don't forget to add tests that cover your changes.

Make sure you've run and fixed any issues with these commands:

  • cargo fmt --all -- --check to check standard code formatting (cargo fmt --all applies these changes)
  • cargo clippy --workspace -- -D warnings -D clippy::unwrap_used -A clippy::needless_collect to check that you're using the standard code style
  • cargo test --workspace to check that all tests pass

After Submitting

If your PR had any user-facing changes, update the documentation after the PR is merged, if necessary. This will help us keep the docs up to date.

@sholderbach
Copy link
Copy Markdown
Member

I only looked at the diff so far. How is this enabling interleaved reading? And how do we ensure that there is no blocking with potentially buffered reading?

@WindSoilder
Copy link
Copy Markdown
Contributor Author

WindSoilder commented Nov 26, 2022

How is this enabling interleaved reading?

Actually the key lays in the save command, when save command receive --stderr parameter, it consumes stdout and stderr in different thread.

And how do we ensure that there is no blocking with potentially buffered reading?

The key here is still multi-threading in save command. And, I just found that current implementation in main branch can lead to blocking.

consider the following python script:

# test.py
import sys
import time

print('aa'*300, flush=True)
time.sleep(3)
print('bb'*999999, file=sys.stderr, flush=True)
time.sleep(3)
print('cc'*300, flush=True)

And run the script with python test.py err+out> 1.txt will block nushell. This pr also solves the issue

@sophiajt
Copy link
Copy Markdown
Contributor

sophiajt commented Nov 26, 2022

I tested this against:

fn main() {
    eprintln!("Hello1, world!");
    println!("Hello2, world!");
    eprintln!("Hello3, world!");
    println!("Hello4, world!");
    eprintln!("Hello5, world!");
    println!("Hello6, world!");
}

then did:

> errout err+out> ../bar.txt
> cat ../bar.txt
Hello2, world!
Hello4, world!
Hello6, world!
Hello1, world!
Hello3, world!
Hello5, world!

The bash equivalent keeps them in the same order:

jt@pop-os:~/Source/nushell$ errout &> ../foo.txt
jt@pop-os:~/Source/nushell$ cat ../foo.txt 
Hello1, world!
Hello2, world!
Hello3, world!
Hello4, world!
Hello5, world!
Hello6, world!

@WindSoilder
Copy link
Copy Markdown
Contributor Author

Ok, I've checked bash, the bash way to implement redirect is using dup system call, I should change the title.

@WindSoilder WindSoilder changed the title Redirection: make err+out> works like bash way Redirection: Avoid blocking when err+out> redirects too much stderr message Nov 28, 2022
@WindSoilder WindSoilder marked this pull request as draft November 28, 2022 06:34
@WindSoilder WindSoilder marked this pull request as ready for review November 29, 2022 06:27
@rgwood
Copy link
Copy Markdown
Contributor

rgwood commented Nov 29, 2022

I spent some time looking into interleaving stderr and stdout. This blog post was quite helpful, as was section 5.5 in The Linux Programming Interface.

How Bash Does It

As WindSoilder says, Bash uses dup()/dup2(). For example dup2(1, 2) will close the current process's stderr (2) and replace it with a copy of stdout (1) so that writes to stderr go to the same place as stdout. I believe the semantics of concurrent writes are then handled by POSIX's guarantees for pipes:

Write requests of {PIPE_BUF} bytes or less shall not be interleaved with data from other processes doing writes on the same pipe. Writes of greater than {PIPE_BUF} bytes may have data interleaved

How Nu Can Do It

I don't think using dup() is very practical for Nu. dup() needs to be called from the child process, which means we'd need to give up on using Rust's nice cross-platform Command/spawn()/Child abstractions and drop down to fork+dup+exec on *nix. I guess this is technically possible but I'm scared that it will have a long tail of bugs and cross-platform inconsistencies (I have no idea how we'd accomplish this on Windows).

Another option would be to use a Rust channel and spin up threads that just pump data from the external's stderr+stdout into the channel as it arrives. Some hacky prototype code that seems to work:

let mut process = Command::new("some-external")
.stderr(Stdio::piped())
.stdout(Stdio::piped())
.spawn()?;

let mut child_stderr = process.stderr.take().unwrap();
let mut child_stdout = process.stdout.take().unwrap();

let (tx, rx): (Sender<Vec<u8>>, Receiver<Vec<u8>>) = mpsc::channel();

// pump stderr into the channel
let thread_tx = tx.clone();
thread::spawn(move || {
    let mut buf = [0; 10];
    loop {
        let bytes_read = child_stderr.read(&mut buf).unwrap();
        if bytes_read == 0 {
            break;
        }

        let v = Vec::from(&buf[..bytes_read]);
        thread_tx.send(v).unwrap();
    }
});

// pump stdout into the channel
let thread_tx = tx.clone();
thread::spawn(move || {
    let mut buf = [0; 10];
    loop {
        let bytes_read = child_stdout.read(&mut buf).unwrap();
        if bytes_read == 0 {
            break;
        }

        let v = Vec::from(&buf[..bytes_read]);
        thread_tx.send(v).unwrap();
    }
});

// read from the channel
while let Ok(bytes) = rx.recv() {
    print!("{}", String::from_utf8_lossy(&bytes));
}

@rgwood
Copy link
Copy Markdown
Contributor

rgwood commented Dec 1, 2022

Running the first Python script like python3 test.py err+out> 1.txt gives me unexpected, nondeterministic results;

... (lots of bbbb chars)
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
... (lots more bbbb chars)

I'm a little surprised that the print('aa'*300, flush=True) call doesn't show up first in the output. I don't have a good understanding of why this PR works the way it does, will try to fix that.

@WindSoilder
Copy link
Copy Markdown
Contributor Author

don't have a good understanding of why this PR works the way it does

After checking my code a bit, I think this PR is not always work like it does, it uses multi-threading, so there is no guarantee about the order it printed out...

It can only guarantee that all message to stdout will consume in order, and all message to stderr will consume in order. But message between stdout and stderr can have different order..... And it even can't guatantee that print('aa' * 300, flush=True) is atomic :-( That's bad

@WindSoilder WindSoilder marked this pull request as draft December 2, 2022 00:28
@WindSoilder
Copy link
Copy Markdown
Contributor Author

Have a look into this again, the external result handling should be done at the beginning of process creation:

https://rust-lang-nursery.github.io/rust-cookbook/os/external.html#redirect-both-stdout-and-stderr-of-child-process-to-the-same-file

I don't think it's an easy task, and it requires a lot of change.

Close the pr for now, it's a corner case of redirection, I think it's ok to leave current behavior for now

@WindSoilder WindSoilder closed this Dec 3, 2022
@rgwood
Copy link
Copy Markdown
Contributor

rgwood commented Dec 3, 2022

That example is a great find, thanks.

@WindSoilder WindSoilder deleted the redirection branch December 24, 2022 22:06
fdncred pushed a commit that referenced this pull request May 17, 2023
# Description

Fixes: #8565

Here is another pr #7240 tried to address the issue, but it works in a
wrong way.

After this change `o+e>` won't redirect all stdout message then stderr
message and it works more like how bash does.

# User-Facing Changes

For the given python code:
```python
# test.py
import sys

print('aa'*300, flush=True)
print('bb'*999999, file=sys.stderr, flush=True)
print('cc'*300, flush=True)
```

Running `python test.py out+err> a.txt` shoudn't hang nushell, and
`a.txt` keeps output in the same order

## About the change
The core idea is that when doing lite-parsing, introduce a new variant
`LiteElement::SameTargetRedirection` if we meet `out+err>` redirection
token(which is generated by lex function),

During converting from lite block to block,
LiteElement::SameTargetRedirection will be converted to
PipelineElement::SameTargetRedirection.

Then in the block eval process, if we get
PipelineElement::SameTargetRedirection, we'll invoke `run-external` with
`--redirect-combine` flag, then pipe the result into save command

## What happened internally?

Take the following command as example:
`^ls o+e> log.txt`

lex parsing result(`Tokens`) are not changed, but `LiteBlock` and
`Block` is changed after this pr.
### LiteBlock before
```rust
LiteBlock {
    block: [
        LitePipeline { commands: [
            Command(None, LiteCommand { comments: [], parts: [Span { start: 39041, end: 39044 }] }),
            // actually the span of first Redirection is wrong too..
            Redirection(Span { start: 39058, end: 39062 }, StdoutAndStderr, LiteCommand { comments: [], parts: [Span { start: 39050, end: 39057 }] }),
        ]
    }]
}
```
### LiteBlock after
```rust
LiteBlock { 
    block: [
        LitePipeline {
            commands: [
                SameTargetRedirection {
                    cmd: (None, LiteCommand { comments: [], parts: [Span { start: 147945, end: 147948}]}),
                    redirection: (Span { start: 147949, end: 147957 }, LiteCommand { comments: [], parts: [Span { start: 147958, end: 147965 }]})
                }
            ]
        }
    ]
}
```
### Block before
```rust
Pipeline {
    elements: [
        Expression(None, Expression {
            expr: ExternalCall(Expression { expr: String("ls"), span: Span { start: 39042, end: 39044 }, ty: String, custom_completion: None }, [], false),
            span: Span { start: 39041, end: 39044 },
            ty: Any, custom_completion: None 
        }),
        Redirection(Span { start: 39058, end: 39062 }, StdoutAndStderr, Expression { expr: String("out.txt"), span: Span { start: 39050, end: 39057 }, ty: String, custom_completion: None })] }
```
### Block after
```rust
Pipeline {
    elements: [
        SameTargetRedirection { 
            cmd: (None, Expression {
                expr: ExternalCall(Expression { expr: String("ls"), span: Span { start: 147946, end: 147948 }, ty: String, custom_completion: None}, [], false),
                span: Span { start: 147945, end: 147948},
                ty: Any, custom_completion: None
            }),
            redirection: (Span { start: 147949, end: 147957}, Expression {expr: String("log.txt"), span: Span { start: 147958, end: 147965 },ty: String,custom_completion: None}
        }
    ]
}
```

# Tests + Formatting

Don't forget to add tests that cover your changes.

Make sure you've run and fixed any issues with these commands:

- `cargo fmt --all -- --check` to check standard code formatting (`cargo
fmt --all` applies these changes)
- `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used -A
clippy::needless_collect` to check that you're using the standard code
style
- `cargo test --workspace` to check that all tests pass
- `cargo run -- crates/nu-utils/standard_library/tests.nu` to run the
tests for the standard library

> **Note**
> from `nushell` you can also use the `toolkit` as follows
> ```bash
> use toolkit.nu # or use an `env_change` hook to activate it
automatically
> toolkit check pr
> ```

# After Submitting

If your PR had any user-facing changes, update [the
documentation](https://github.com/nushell/nushell.github.io) after the
PR is merged, if necessary. This will help us keep the docs up to date.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants