Description
Right now, Writer supports only one way of writing Parquet files:
Writer::write(string $path, Schema $schema, iterable $rows)
With this approach, we can pass a \Generator, and internally RowGroupBuilder will keep reading from it, splitting the data into RowGroups by size.
Once all data is written, the writer closes the file, writing all metadata at the end of it.
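To make the current behaviour concrete, here is a minimal, self-contained sketch of lazily splitting an iterable into groups. It is not the actual RowGroupBuilder: `splitIntoRowGroups()` and `generateRows()` are hypothetical names, and a row-count threshold stands in for the real size-based one.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for row group splitting (not Flow's RowGroupBuilder).
 * Consumes any iterable lazily and yields a group each time the threshold is reached.
 * Assumption: the threshold is a row count; the real builder splits by size.
 */
function splitIntoRowGroups(iterable $rows, int $maxRowsPerGroup): \Generator
{
    $group = [];

    foreach ($rows as $row) {
        $group[] = $row;

        if (\count($group) >= $maxRowsPerGroup) {
            yield $group;
            $group = [];
        }
    }

    // Whatever is left when the input is exhausted becomes the last, smaller group.
    if ($group !== []) {
        yield $group;
    }
}

/** Hypothetical row source: rows are produced one at a time, never all in memory. */
function generateRows(int $total): \Generator
{
    for ($i = 1; $i <= $total; $i++) {
        yield ['id' => $i];
    }
}

$groupSizes = [];

foreach (splitIntoRowGroups(generateRows(10), 4) as $group) {
    $groupSizes[] = \count($group);
}

// 10 rows with a threshold of 4 give groups of 4, 4 and 2 rows.
```

Because the whole input is available through a single iterable, only the last group can end up smaller than the threshold.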
This approach does not let us fully integrate Writer with ETL Loaders, because the load method is executed only for a given chunk of Rows. We would need to keep appending data to the file one chunk at a time, which would create too many RowGroups that are too small and would hurt Parquet reader performance.
What we need is a method that does not close the Parquet file and lets us keep adding more batches (the internal RowGroupBuilder should work exactly as it does now).
Once all data is saved, the user should call the close() method, which writes the Parquet file metadata at the end of the file and also closes the file stream.
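One possible shape for this, as a rough sketch only: the class below is hypothetical (`BatchingWriterSketch`, `open()` and `writeBatch()` are assumed names, not part of the existing Writer), it keeps everything in memory instead of producing a real Parquet file, and it uses a row-count threshold in place of the size-based one. It requires PHP 8.1+. The point it illustrates is that the buffer survives between batches, so a row group is flushed only when it is full or when close() is called.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch, not Flow's Writer. Rows are buffered across writeBatch() calls
 * and a row group is flushed only when the threshold is reached or on close().
 */
final class BatchingWriterSketch
{
    /** @var list<array<string, mixed>> rows waiting for the next row group */
    private array $buffer = [];

    /** @var list<int> number of rows in each flushed row group */
    private array $rowGroupSizes = [];

    private bool $open = false;

    public function __construct(private readonly int $maxRowsPerGroup)
    {
    }

    public function open(): void
    {
        if ($this->open) {
            throw new \LogicException('Writer is already open.');
        }

        $this->open = true;
        $this->buffer = [];
        $this->rowGroupSizes = [];
    }

    public function writeBatch(iterable $rows): void
    {
        if (!$this->open) {
            throw new \LogicException('Call open() before writeBatch().');
        }

        foreach ($rows as $row) {
            $this->buffer[] = $row;

            if (\count($this->buffer) >= $this->maxRowsPerGroup) {
                $this->flushRowGroup();
            }
        }
    }

    /**
     * Flushes the remaining rows and marks the writer as closed. A real writer would
     * write the file metadata here. Returns the row group sizes for inspection.
     *
     * @return list<int>
     */
    public function close(): array
    {
        if (!$this->open) {
            throw new \LogicException('Writer is not open.');
        }

        if ($this->buffer !== []) {
            $this->flushRowGroup();
        }

        $this->open = false;

        return $this->rowGroupSizes;
    }

    private function flushRowGroup(): void
    {
        $this->rowGroupSizes[] = \count($this->buffer);
        $this->buffer = [];
    }
}

$writer = new BatchingWriterSketch(maxRowsPerGroup: 5);
$writer->open();

// Simulate three loader calls, each delivering a chunk of three rows.
for ($chunk = 0; $chunk < 3; $chunk++) {
    $writer->writeBatch([['id' => 1], ['id' => 2], ['id' => 3]]);
}

$rowGroupSizes = $writer->close();

// 9 rows in chunks of 3 end up as two row groups (5 and 4 rows), not three groups of 3.
```

Closing the file after every chunk would instead produce one row group per loader call, which is the small-RowGroup problem described above.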