
Handle BYTE_ARRAY physical type in arrow-json (be able to load files output from pandas with no dtypes) #3373

@ehiggs

Description


Goal

arrow-json should be able to load Parquet files written by Python pandas with no dtypes specified.

Use case

Given the following python code:

import pandas as pd
data = '[{"a": 1, "b": "Hello", "c": {"d": "something"}, "e": [1,2,3]}]'
df = pd.read_json(data, dtype=False, orient='records')
df.to_parquet("test.parquet", engine="fastparquet", object_encoding="json", stats=False)
df2 = pd.read_parquet("test.parquet", engine="fastparquet")
print(df2)
print(df2.dtypes)

This outputs:

   a      b                   c          e
0  1  Hello  {'d': 'something'}  [1, 2, 3]
a     int64
b    object
c    object
e    object
dtype: object

The types aren't great, but pandas can write the file and read it back. ✅
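As an aside, a hedged illustration (not fastparquet's actual code path) of what `object_encoding="json"` implies: each object cell is serialized to its JSON text and stored as UTF-8 bytes, which is why the columns come out as `BYTE_ARRAY` with the `JSON` converted type in the schema below:

```python
import json

# Hedged illustration, not fastparquet internals: with
# object_encoding="json", an object cell is written as its JSON text,
# stored as UTF-8 bytes in a BYTE_ARRAY column.
cell = {"d": "something"}
encoded = json.dumps(cell).encode("utf-8")      # what lands in the column
decoded = json.loads(encoded.decode("utf-8"))   # what a reader recovers
print(decoded)  # → {'d': 'something'}
```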

Using the VSCode parquet-viewer plugin (TypeScript), we can see the loaded data:

[screenshot: parquet-viewer rendering of test.parquet]

The TypeScript/JavaScript implementation is able to load the file. ✅

However, when I try to load this using arrow-json, the following code fails:

use arrow::json::LineDelimitedWriter;
use futures::TryStreamExt;
use parquet::arrow::async_reader::{AsyncFileReader, ParquetRecordBatchStreamBuilder};

async fn parquet_to_json<T>(data: T)
where
    T: AsyncFileReader + Send + Unpin + 'static,
{

    let builder = ParquetRecordBatchStreamBuilder::new(data)
        .await
        .unwrap()
        .with_batch_size(3);
    let file_metadata = builder.metadata().file_metadata();
    println!("schema: {:?}", file_metadata.schema_descr());

    let stream = builder.build().unwrap();
    let results = stream.try_collect::<Vec<_>>().await.unwrap();
    let mut out_buf = Vec::new();
    let mut writer = LineDelimitedWriter::new(&mut out_buf);
    writer
        .write_batches(&results)
        .expect("could not write batches");
    let json_out = String::from_utf8_lossy(&out_buf);
    println!("result: {}", json_out);
}
This panics with:

thread 'main' panicked at 'could not write batches: JsonError("data type Binary not supported in nested map for json writer")'

The schema as arrow-rs knows it:

schema: SchemaDescriptor { schema: GroupType { basic_info: BasicTypeInfo { name: "schema", repetition: None, converted_type: NONE, logical_type: None, id: None }, fields: [PrimitiveType { basic_info: BasicTypeInfo { name: "a", repetition: Some(OPTIONAL), converted_type: NONE, logical_type: None, id: None }, physical_type: INT64, type_length: 64, scale: -1, precision: -1 }, PrimitiveType { basic_info: BasicTypeInfo { name: "b", repetition: Some(OPTIONAL), converted_type: JSON, logical_type: None, id: None }, physical_type: BYTE_ARRAY, type_length: -1, scale: -1, precision: -1 }, PrimitiveType { basic_info: BasicTypeInfo { name: "c", repetition: Some(OPTIONAL), converted_type: JSON, logical_type: None, id: None }, physical_type: BYTE_ARRAY, type_length: -1, scale: -1, precision: -1 }, PrimitiveType { basic_info: BasicTypeInfo { name: "e", repetition: Some(OPTIONAL), converted_type: JSON, logical_type: None, id: None }, physical_type: BYTE_ARRAY, type_length: -1, scale: -1, precision: -1 }] } }

I don't know what the Parquet spec says here, but basic files like this are loadable by other implementations, and being able to read files written by pandas is surely a significant use case.
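For what it's worth, since the schema marks these `BYTE_ARRAY` columns with the `JSON` converted type, the bytes already hold valid JSON text. A minimal sketch (in Python for brevity, with hypothetical cell values, not the arrow-rs writer itself) of how a line-delimited JSON writer could embed them verbatim instead of erroring:

```python
import json

# Hypothetical cell values as a Parquet reader might surface them:
# JSON-converted BYTE_ARRAY cells are UTF-8 JSON text.
row = {
    "a": 1,
    "b": b'"Hello"',
    "c": b'{"d": "something"}',
    "e": b"[1, 2, 3]",
}

def render_cell(value):
    # Byte cells already contain JSON text, so embed them verbatim;
    # everything else is serialized normally.
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return json.dumps(value)

# One NDJSON record reconstructing the original row.
line = "{" + ", ".join(f"{json.dumps(k)}: {render_cell(v)}" for k, v in row.items()) + "}"
print(line)
```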

Related tickets / PRs:

Related ticket: #154
`BinaryArray` doesn't seem to exist anymore; I only see `Binary` as a `DataType` and `BYTE_ARRAY` in the schema output, so I wasn't sure whether this is the same issue.

There was a previous PR for the above ticket, apache/arrow#8971, which was closed. It looks like it would also have failed to do 'the right thing'.
