@norberttech
Member

Change Log

Added

  • Schema::normalize() : array
  • Schema::fromArray(array $data) : self

Fixed

Changed

Removed

  • Schema Definition Constraints
  • Schema Definition Metadata no longer accepts objects

Deprecated

Security


Resolves: #932

Description

Example:

[
    {
        "ref": "id",
        "type": {
            "type": "scalar",
            "scalar_type": "integer",
            "nullable": false
        },
        "metadata": []
    },
    {
        "ref": "str",
        "type": {
            "type": "scalar",
            "scalar_type": "string",
            "nullable": true
        },
        "metadata": []
    },
    {
        "ref": "uuid",
        "type": {
            "type": "uuid",
            "nullable": false
        },
        "metadata": []
    },
    {
        "ref": "json",
        "type": {
            "type": "json",
            "nullable": true
        },
        "metadata": []
    },
    {
        "ref": "map",
        "type": {
            "type": "map",
            "key": {
                "type": {
                    "type": "scalar",
                    "scalar_type": "string",
                    "nullable": false
                }
            },
            "value": {
                "type": {
                    "type": "scalar",
                    "scalar_type": "integer",
                    "nullable": false
                }
            },
            "nullable": false
        },
        "metadata": []
    },
    {
        "ref": "list",
        "type": {
            "type": "list",
            "element": {
                "type": {
                    "type": "scalar",
                    "scalar_type": "integer",
                    "nullable": false
                }
            },
            "nullable": false
        },
        "metadata": []
    },
    {
        "ref": "struct",
        "type": {
            "type": "structure",
            "elements": [
                {
                    "name": "street",
                    "type": {
                        "type": "scalar",
                        "scalar_type": "string",
                        "nullable": false
                    }
                },
                {
                    "name": "city",
                    "type": {
                        "type": "scalar",
                        "scalar_type": "string",
                        "nullable": false
                    }
                }
            ],
            "nullable": false
        },
        "metadata": []
    }
]
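
// Load the schema from its JSON representation (for example, the document above saved as schema.json).
$schema = schema_from_json(\file_get_contents('schema.json'));

// Pass the schema to the CSV extractor and print the rows.
df()
  ->read(from_csv('file.csv', schema: $schema))
  ->write(to_output())
  ->run();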
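
To make the shape of the normalized format easier to follow, here is a minimal sketch in plain PHP that walks such a document and prints one line per definition. It does not use Flow PHP at all: `describe_type()` and `describe_schema()` are hypothetical helpers written only for this illustration, and the key names (`ref`, `type`, `scalar_type`, `nullable`, `key`, `value`, `element`, `elements`, `name`) are taken from the example above rather than from a formal specification.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of Flow PHP): renders one normalized
 * "type" node from the example JSON as a short human-readable string.
 */
function describe_type(array $type): string
{
    $label = match ($type['type']) {
        'scalar' => $type['scalar_type'],
        // Map and list nodes wrap their nested types in an extra {"type": ...} object.
        'map' => sprintf(
            'map<%s, %s>',
            describe_type($type['key']['type']),
            describe_type($type['value']['type'])
        ),
        'list' => sprintf('list<%s>', describe_type($type['element']['type'])),
        'structure' => sprintf('structure{%s}', implode(', ', array_map(
            static fn (array $element): string => $element['name'] . ': ' . describe_type($element['type']),
            $type['elements']
        ))),
        // Types without nested parts in the example, such as "uuid" and "json".
        default => $type['type'],
    };

    // Nullable types get a leading "?", as in PHP type declarations.
    return (($type['nullable'] ?? false) ? '?' : '') . $label;
}

/**
 * Hypothetical helper: maps every definition's "ref" to its described type.
 *
 * @return array<string, string>
 */
function describe_schema(string $json): array
{
    $definitions = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

    $described = [];
    foreach ($definitions as $definition) {
        $described[$definition['ref']] = describe_type($definition['type']);
    }

    return $described;
}

// A shortened version of the example document above.
$json = <<<'JSON'
[
    {"ref": "id", "type": {"type": "scalar", "scalar_type": "integer", "nullable": false}, "metadata": []},
    {"ref": "str", "type": {"type": "scalar", "scalar_type": "string", "nullable": true}, "metadata": []},
    {"ref": "list", "type": {"type": "list", "element": {"type": {"type": "scalar", "scalar_type": "integer", "nullable": false}}, "nullable": false}, "metadata": []}
]
JSON;

foreach (describe_schema($json) as $ref => $description) {
    echo $ref . ': ' . $description . PHP_EOL;
}
```

For the shortened input this prints `id: integer`, `str: ?string` and `list: list<integer>`. The `match` expression needs PHP 8.0 or newer, which the named argument in the example above already requires.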

@github-actions
Contributor

Flow PHP - Benchmarks

Results of the benchmarks from this PR are compared with the results from 1.x branch.

Extractors
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
| benchmark             | subject           | revs | its | mem_peak         | mode             | rstdev          |
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
| AvroExtractorBench    | bench_extract_10k | 1    | 3   | 35.238mb +0.06%  | 812.348ms -0.15% | ±1.95% +66.70%  |
| CSVExtractorBench     | bench_extract_10k | 1    | 3   | 4.964mb +0.23%   | 339.657ms -1.70% | ±0.52% -85.17%  |
| JsonExtractorBench    | bench_extract_10k | 1    | 3   | 5.107mb +0.39%   | 1.052s +0.36%    | ±0.85% +173.86% |
| ParquetExtractorBench | bench_extract_10k | 1    | 3   | 239.806mb +0.01% | 1.239s +0.39%    | ±0.38% -50.47%  |
| TextExtractorBench    | bench_extract_10k | 1    | 3   | 4.885mb +0.23%   | 34.978ms +0.60%  | ±0.78% -14.93%  |
| XmlExtractorBench     | bench_extract_10k | 1    | 3   | 4.887mb +0.23%   | 430.973ms -0.66% | ±0.27% -18.65%  |
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
Transformers
+-----------------------------+--------------------------+------+-----+------------------+-----------------+-----------------+
| benchmark                   | subject                  | revs | its | mem_peak         | mode            | rstdev          |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+-----------------+
| RenameEntryTransformerBench | bench_transform_10k_rows | 1    | 3   | 110.587mb +0.01% | 64.263ms +0.22% | ±1.10% +130.73% |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+-----------------+
Loaders
+--------------------+----------------+------+-----+------------------+------------------+----------------+
| benchmark          | subject        | revs | its | mem_peak         | mode             | rstdev         |
+--------------------+----------------+------+-----+------------------+------------------+----------------+
| AvroLoaderBench    | bench_load_10k | 1    | 3   | 95.618mb +0.02%  | 462.880ms -1.47% | ±0.27% -62.54% |
| CSVLoaderBench     | bench_load_10k | 1    | 3   | 54.101mb +0.02%  | 71.279ms -1.53%  | ±0.49% -45.30% |
| JsonLoaderBench    | bench_load_10k | 1    | 3   | 106.523mb +0.02% | 51.837ms -0.95%  | ±0.47% -73.44% |
| ParquetLoaderBench | bench_load_10k | 1    | 3   | 321.728mb +0.01% | 1.483s -1.63%    | ±0.27% -48.50% |
| TextLoaderBench    | bench_load_10k | 1    | 3   | 17.925mb +0.06%  | 40.831ms -0.20%  | ±0.33% -2.47%  |
+--------------------+----------------+------+-----+------------------+------------------+----------------+
Building Blocks
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| benchmark               | subject                    | revs | its | mem_peak         | mode             | rstdev          |
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| RowsBench               | bench_chunk_10_on_10k      | 2    | 3   | 76.662mb +0.01%  | 3.322ms -3.10%   | ±1.60% -40.15%  |
| RowsBench               | bench_diff_left_1k_on_10k  | 2    | 3   | 96.388mb +0.01%  | 182.030ms -2.49% | ±0.15% -87.22%  |
| RowsBench               | bench_diff_right_1k_on_10k | 2    | 3   | 74.914mb +0.02%  | 18.170ms -1.50%  | ±0.78% +70.03%  |
| RowsBench               | bench_drop_1k_on_10k       | 2    | 3   | 77.902mb +0.01%  | 1.850ms +1.51%   | ±3.18% +79.20%  |
| RowsBench               | bench_drop_right_1k_on_10k | 2    | 3   | 77.902mb +0.01%  | 1.712ms -5.17%   | ±0.19% -91.37%  |
| RowsBench               | bench_entries_on_10k       | 2    | 3   | 75.014mb +0.02%  | 2.576ms -3.02%   | ±2.52% -18.66%  |
| RowsBench               | bench_filter_on_10k        | 2    | 3   | 75.543mb +0.01%  | 14.414ms -1.27%  | ±0.20% -86.25%  |
| RowsBench               | bench_find_on_10k          | 2    | 3   | 75.543mb +0.01%  | 14.810ms -0.20%  | ±1.95% +56.66%  |
| RowsBench               | bench_find_one_on_10k      | 10   | 3   | 73.447mb +0.02%  | 1.700μs -5.24%   | ±0.00% -100.00% |
| RowsBench               | bench_first_on_10k         | 10   | 3   | 73.447mb +0.02%  | 0.300μs -25.00%  | ±0.00% -100.00% |
| RowsBench               | bench_flat_map_on_1k       | 2    | 3   | 87.001mb +0.01%  | 12.981ms -1.97%  | ±0.55% -69.12%  |
| RowsBench               | bench_map_on_10k           | 2    | 3   | 116.362mb +0.01% | 65.045ms -0.63%  | ±0.90% +152.92% |
| RowsBench               | bench_merge_1k_on_10k      | 2    | 3   | 76.062mb +0.01%  | 1.268ms -13.52%  | ±2.93% -8.50%   |
| RowsBench               | bench_partition_by_on_10k  | 2    | 3   | 79.344mb +0.10%  | 58.920ms -0.18%  | ±1.07% +881.73% |
| RowsBench               | bench_remove_on_10k        | 2    | 3   | 78.164mb +0.01%  | 3.878ms -6.00%   | ±1.78% +52.72%  |
| RowsBench               | bench_sort_asc_on_1k       | 2    | 3   | 73.525mb +0.02%  | 40.091ms -2.65%  | ±2.81% +92.13%  |
| RowsBench               | bench_sort_by_on_1k        | 2    | 3   | 73.525mb +0.02%  | 40.141ms -3.08%  | ±2.02% +227.08% |
| RowsBench               | bench_sort_desc_on_1k      | 2    | 3   | 73.525mb +0.02%  | 39.584ms -2.45%  | ±0.65% +47.44%  |
| RowsBench               | bench_sort_entries_on_1k   | 2    | 3   | 75.888mb +0.01%  | 7.300ms -0.66%   | ±0.82% +9.98%   |
| RowsBench               | bench_sort_on_1k           | 2    | 3   | 73.447mb +0.02%  | 28.810ms -1.80%  | ±0.19% -81.20%  |
| RowsBench               | bench_take_1k_on_10k       | 10   | 3   | 73.447mb +0.02%  | 13.318μs -0.93%  | ±1.06% -56.61%  |
| RowsBench               | bench_take_right_1k_on_10k | 10   | 3   | 73.447mb +0.02%  | 15.876μs -0.52%  | ±1.20% -22.97%  |
| RowsBench               | bench_unique_on_1k         | 2    | 3   | 96.390mb +0.01%  | 186.445ms -0.29% | ±0.70% -34.13%  |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 116.688mb +0.01% | 491.754ms -0.21% | ±0.31% -67.26%  |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 60.166mb +0.03%  | 246.100ms -0.77% | ±0.80% -21.87%  |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 15.099mb +0.11%  | 52.863ms -0.67%  | ±2.22% +248.37% |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 59.943mb +0.02%  | 436.438ms +0.63% | ±0.87% +373.23% |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 14.482mb +0.09%  | 85.865ms -1.17%  | ±0.06% -91.81%  |
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+

@norberttech norberttech merged commit b445935 into flow-php:1.x Jan 28, 2024
Development

Successfully merging this pull request may close these issues.

Schema to/from Array
