Skip to content

Support Arbitrary JSON values in JSON Reader (#4905)#4911

Merged
tustvold merged 4 commits intoapache:masterfrom
tustvold:json-list-root
Oct 12, 2023
Merged

Support Arbitrary JSON values in JSON Reader (#4905)#4911
tustvold merged 4 commits intoapache:masterfrom
tustvold:json-list-root

Conversation

@tustvold
Copy link
Copy Markdown
Contributor

@tustvold tustvold commented Oct 9, 2023

Which issue does this PR close?

Closes #4905

Rationale for this change

We should support decoding JSON payloads that don't have an object as the root.

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Oct 9, 2023
@tustvold
Copy link
Copy Markdown
Contributor Author

This does represent a fairly minor performance regression, but I'm not too concerned

small_bench_primitive   time:   [7.0711 µs 7.0744 µs 7.0778 µs]
                        change: [+2.9718% +3.2020% +3.6122%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

large_bench_primitive   time:   [2.4153 ms 2.4161 ms 2.4169 ms]
                        change: [+5.7843% +5.8318% +5.8805%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) high mild
  9 (9.00%) high severe

small_bench_list        time:   [13.203 µs 13.211 µs 13.220 µs]
                        change: [+1.2264% +1.3806% +1.5824%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really solid to me -- thank you @tustvold

I had some documentation suggestions, but nothing I think that is required before merging

Comment on lines +24 to +26
//! The reader is agnostic to whitespace, including `\n` and `\r`, and will ignore such characters.
//! This allows parsing sequences of one or more arbitrarily formatted JSON values, including
//! but not limited to newline-delimited JSON.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A minor suggestion to improve the wording here

Suggested change
//! The reader is agnostic to whitespace, including `\n` and `\r`, and will ignore such characters.
//! This allows parsing sequences of one or more arbitrarily formatted JSON values, including
//! but not limited to newline-delimited JSON.
//! The reader ignores whitespace between JSON values, including `\n` and `\r`.
//! This allows parsing sequences of one or more arbitrarily formatted JSON values, including
//! but not limited to newline-delimited JSON.

}
}

/// Create a new [`ReaderBuilder`] with the provided [`FieldRef`]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Create a new [`ReaderBuilder`] with the provided [`FieldRef`]
/// Create a new [`ReaderBuilder`] that will parse JSON values with a root schema of [`FieldRef`].

Perhaps we can add a note in new() that says it does require the root to be an object like {..}.

I wonder if new_from_field might be a more descriptive name (as this isn't making a new Field, it is making a new reader that reads data with the type on the field

false,
)?;
let (data_type, nullable) = match self.is_field {
false => (DataType::Struct(self.schema.fields.clone()), false),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not allow null root fields with structs?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because RecordBatch can't support nulls at the root level

elements: Vec<TapeElement>,

num_rows: usize,
cur_row: usize,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cur_row: usize,
/// logical row being decoded
cur_row: usize,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support for reading JSON Array to Arrow

2 participants