Skip to content

make save stream on list stream data#7675

Merged
rgwood merged 5 commits intonushell:mainfrom
WindSoilder:stream_save
Jan 15, 2023
Merged

make save stream on list stream data#7675
rgwood merged 5 commits intonushell:mainfrom
WindSoilder:stream_save

Conversation

@WindSoilder
Copy link
Copy Markdown
Contributor

@WindSoilder WindSoilder commented Jan 4, 2023

Description

Closes: #7590

User-Facing Changes

So the following command

1..100 | each { |i| sleep 400ms; $i} | save --raw -f output.txt

Will stream data to output.txt

But I'm note sure how to make a proper test for it, so I leave with no new test cases..

Also rename from string_binary_list_value_to_bytes to value_to_bytes to accepts more Value type.

Tests + Formatting

Don't forget to add tests that cover your changes.

Make sure you've run and fixed any issues with these commands:

  • cargo fmt --all -- --check to check standard code formatting (cargo fmt --all applies these changes)
  • cargo clippy --workspace -- -D warnings -D clippy::unwrap_used -A clippy::needless_collect to check that you're using the standard code style
  • cargo test --workspace to check that all tests pass

After Submitting

If your PR had any user-facing changes, update the documentation after the PR is merged, if necessary. This will help us keep the docs up to date.

Copy link
Copy Markdown
Member

@sholderbach sholderbach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this would lead to a regression how the output gets formatted.

Comment on lines +110 to +113
file.write_all(&value_to_bytes(val)?)
.map_err(|err| ShellError::IOError(err.to_string()))?;
file.write_all("\n".as_bytes())
.map_err(|err| ShellError::IOError(err.to_string()))?;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is the correct behavior if we want to generate a file in a streaming way for some file types?
This change would always use the Value::as_string() if I understand it. But formats like nuon/csv/json are reasonable to stream from a list of values, with some effort.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is a regression. Previously ls | save foo.json converted ls results to JSON, now it fails with Can't convert to string.

Maybe that should be fixed. But to be honest I don't like the existing functionality where save magically converts formats based on the file extension, and I would not be sad if someone decided to get rid of it.

Copy link
Copy Markdown
Contributor Author

@WindSoilder WindSoilder Jan 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But formats like nuon/csv/json are reasonable to stream from a list of values, with some effort.

Ah, that's a good catch.

Maybe that should be fixed. But to be honest I don't like the existing functionality where save magically converts formats based on the file extension, and I would not be sad if someone decided to get rid of it.

I think we should keep this behavior, we have --raw flag to disable it

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently it will only stream data if --raw flag is provided or target file doesn't have an extension

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry not sure if I consider this resolved.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sholderbach My bad, sorry for merging.

What exactly are you looking for? ls | save foo.json now converts to JSON as before.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think for something like ls | save foo.json, the data is converted to json, streaming only happened when we provide --raw flag

Comment on lines -233 to +244
other => Err(ShellError::OnlySupportsThisInputType(
"string, binary or list".into(),
other.get_type().to_string(),
span,
// This line requires the Value::Error match above.
other.expect_span(),
)),
other => Ok(other.as_string()?.into_bytes()),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the intended semantics here? Was this a recent regression or is this making things more implicit?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, when calling input.into_value(span), if the input is PipelineData::ListStream, the value will be Value::List, then value_to_bytes will try to map each element as string.

It will goes to error if we're trying to iterate through ListStream and doesn't try to convert from element(maybe integer) to string

@rgwood rgwood added the A:streaming Issues related to streaming data (or collecting data when it should be streamed) label Jan 4, 2023
@rgwood
Copy link
Copy Markdown
Contributor

rgwood commented Jan 4, 2023

Thank you for working on this! I'm always really happy to see streaming improvements.

But I'm note sure how to make a proper test for it, so I leave with no new test cases..

Yeah, it's difficult to test that the data is actually streaming. But it can be useful to test that the end result is correct; for that I like to use each to convert Lists to ListStreams (ex: [1 2 3] | each { |i| $i } | ...)

@WindSoilder WindSoilder marked this pull request as draft January 5, 2023 01:29
@sholderbach
Copy link
Copy Markdown
Member

Apart from the streaming in the appropriate output format, we might also want take into account the file creation behavior on failure.

main
grafik

stream_save
grafik

@WindSoilder WindSoilder marked this pull request as ready for review January 6, 2023 12:37
@NotLebedev
Copy link
Copy Markdown
Contributor

Also not clear how errors mid-stream are handled. Currently this example

image

results in

image

which is logical if error mid-stream is considered as break. If this is intended than documentation should specify this semantics and differentiate this from other types of errors.

@WindSoilder
Copy link
Copy Markdown
Contributor Author

It's strange that my code results to this error:
img

And then file output.txt contains the following:

1
2
3
4
5
6
7
8
9

I think it works as expected..And error message is fine

Copy link
Copy Markdown
Contributor

@rgwood rgwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a nice improvement, let's land it and try it out. I did some basic testing and the results seemed good.

We need to do some work on general error handling in streams eventually, but I think that's beyond the scope of this work item.

@rgwood rgwood merged commit 0353eb4 into nushell:main Jan 15, 2023
@WindSoilder WindSoilder deleted the stream_save branch August 2, 2023 22:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A:streaming Issues related to streaming data (or collecting data when it should be streamed)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

save should stream data whenever possible

4 participants