
Comparing changes

base repository: delta-io/delta-rs
base: python-v0.16.2
head repository: delta-io/delta-rs
compare: python-v0.16.3

  • 10 commits
  • 27 files changed
  • 6 contributors

Commits on Mar 21, 2024

  1. chore: object store 0.9.1 (#2311)

    ion-elgreco authored Mar 21, 2024 · eb8c19c

Commits on Mar 22, 2024

  1. fix: try to fix timeouts (#2318)

    # Description
    I ran this version in a pipeline at work, and it no longer produced
    this error:
    
    `OSError: Generic MicrosoftAzure error: Error after 0 retries in
    30.000943594s, max_retries:10, retry_timeout:180s, source:error sending
    request for url
    (https://<redacted>.blob.core.windows.net/dev-preprocessed/<redacted>/product_line_code=<redacted>/event_year_month_clt=202209/part-00001-874ec394-72d0-44fb-8cbb-2b7b73b41ede-c000.snappy.parquet):
    operation timed out`
    ion-elgreco authored Mar 22, 2024 · 22f03b4
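The error above comes from a retry policy giving up before the overall deadline, because the per-request timeout dominated. A minimal stdlib sketch of that kind of combined retry-count / overall-deadline policy (the function name and defaults here are illustrative, not the actual `object_store` API):

```python
import time

def call_with_retries(op, max_retries=10, retry_timeout=180.0):
    """Hypothetical sketch of a retry policy like the one in the error
    message above: give up after max_retries failed attempts or once
    the overall retry_timeout deadline passes, whichever comes first."""
    deadline = time.monotonic() + retry_timeout
    retries = 0
    while True:
        try:
            return op()
        except OSError as exc:
            if retries >= max_retries or time.monotonic() >= deadline:
                # Mirrors the message format seen in the error above.
                raise OSError(
                    f"Error after {retries} retries, "
                    f"max_retries:{max_retries}, "
                    f"retry_timeout:{int(retry_timeout)}s"
                ) from exc
            retries += 1
```

Note how "Error after 0 retries" can appear when a single slow request eats the whole budget, which is the symptom this fix addressed by adjusting the client timeout.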
  2. docs: add example in to_pyarrow_dataset (#2315)

    # Description
    Adds the example from the docs into the docstring.
    
    ---------
    
    Co-authored-by: R. Tyler Croy <rtyler@brokenco.de>
    ion-elgreco and rtyler authored Mar 22, 2024 · a9da105

Commits on Mar 23, 2024

  1. fix: handle conflict checking in optimize correctly (#2208)

    # Description
    This removes the optimize `update()`-before-commit behaviour.
    
    When digging, I discovered that a z-order after a merge would cause
    corrupted commits:
    
    https://gist.github.com/emcake/4edfb72d77e08e8a600b8c0c902e2718
    
    This should be prevented: I'd expect the merge to remove files and
    the conflict checker to kick in and prevent the z-order from going
    through. On digging I found that the conflict checker never came into
    play because of the call to update() before commit:
    https://github.com/delta-io/delta-rs/blob/main/crates/core/src/operations/optimize.rs#L738
    
    This should have been caught by tests, but the test for conflict
    checking had been ignored since it was written:
    https://github.com/delta-io/delta-rs/blob/main/crates/core/tests/command_optimize.rs#L261
    
    It looks like removing the update passes all tests and allows the
    conflict-checking test to be added back in too. This causes one minor
    dilemma for long-running optimizes that use the min commit interval
    parameter: due to the way commit works, without the update the
    operation would fail once 15 intermediate commits had accumulated.
    I've changed it to use `commit_with_retries`, and it now counts the
    commits it has made against the retry count.
    
    ---------
    
    Co-authored-by: R. Tyler Croy <rtyler@brokenco.de>
    Co-authored-by: David Blajda <db@davidblajda.com>
    3 people authored Mar 23, 2024 · f49eedb
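The behaviour described above — conflict-checking concurrent commits instead of silently `update()`-ing past them, and counting intermediate commits against the retry budget — can be sketched with a toy in-memory commit log. Every name below is illustrative, not the delta-rs API:

```python
class CommitConflict(Exception):
    """Raised when a concurrent commit invalidates ours, e.g. a merge
    removed files that a z-order still planned to rewrite."""

def commit_with_retries(log, actions, read_version, conflicts, max_retries=15):
    # Toy optimistic-concurrency loop over an in-memory commit log
    # (a list of action sets). Sketch only, not the delta-rs internals.
    version = read_version
    for _ in range(max_retries):
        if len(log) == version:       # no concurrent commit landed: we win
            log.append(actions)
            return len(log) - 1       # the version we committed at
        # Concurrent commits landed since we last looked: check each one
        # for a conflict instead of blindly rebasing past it.
        for other in log[version:]:
            if conflicts(actions, other):
                raise CommitConflict("concurrent commit conflicts with ours")
        version = len(log)            # rebase and retry
    raise RuntimeError(f"gave up after {max_retries} retries")
```

The key point matching the PR: the conflict check runs on every retry, and each rebase consumes part of the same `max_retries` budget that a long-running optimize's intermediate commits count against.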
  2. Revert 2291 merge predicate fix (#2323)

    Reverts #2291, which introduces a regression to merge that occurred
    before #2158 and causes the added test case to fail.
    Blajda authored Mar 23, 2024 · 39cec78
  3. fix: merge concurrency control (#2324)

    # Description
    Reintroduces the concurrency control with a fix.
    ion-elgreco authored Mar 23, 2024 · 7928e95
  4. fix: merge pushdown handling (#2326)

    # Description
    Fix broken test case with partitions
    
    - fixes #2158
    
    ---------
    
    Co-authored-by: ion-elgreco <15728914+ion-elgreco@users.noreply.github.com>
    Blajda and ion-elgreco authored Mar 23, 2024 · 00c919f

Commits on Mar 24, 2024

  1. fix(rust): serialize MetricDetails from compaction runs to a string (#2317)
    
    I am by no means a Rust developer and haven't touched it in years, so
    please let me know if there's a better way to go about this. The Rust
    z_order and optimize.compact already serialize the metrics before they
    are passed back to Python, which then deserializes them, so the Python
    behavior of expecting this as a Dict has not changed, which I think is
    what we want.
    
    # Description
    Adds a custom serializer and Display implementation for the
    `MetricDetails` fields, namely `filesAdded` and `filesRemoved` so that
    those fields are written as strings instead of a struct to the commit
    log. Query engines expect these fields to be strings on reads.
    
    I had trouble getting the pyspark tests running locally, but here is an
    example optimize commit log that gets written with these changes:
    
    ```
    {"commitInfo":{"timestamp":1711125995487,"operation":"OPTIMIZE","operationParameters":{"targetSize":"104857600","predicate":"[]"},"clientVersion":"delta-rs.0.17.1","readVersion":10,"operationMetrics":{"filesAdded":"{\"avg\":19956.0,\"max\":19956,\"min\":19956,\"totalFiles\":1,\"totalSize\":19956}","filesRemoved":"{\"avg\":4851.833333333333,\"max\":10358,\"min\":3734,\"totalFiles\":6,\"totalSize\":29111}","numBatches":6,"numFilesAdded":1,"numFilesRemoved":6,"partitionsOptimized":1,"preserveInsertionOrder":true,"totalConsideredFiles":6,"totalFilesSkipped":0}}}
    ```
    
    # Related Issue(s)
    - #2087
    
    # Documentation
    
    N/A
    liamphmurphy authored Mar 24, 2024 · 923dfef
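The shape of that commit entry — nested MetricDetails rendered as JSON strings inside `operationMetrics`, while scalar metrics stay numeric — can be reproduced with a small stdlib sketch. This illustrates the idea only; it is not the actual Rust serializer:

```python
import json

def stringify_metric_details(operation_metrics: dict) -> dict:
    """Serialize struct-valued metrics (filesAdded / filesRemoved) to
    JSON strings, matching what query engines expect to read back from
    the commit log; scalar values pass through unchanged."""
    return {
        key: json.dumps(value, separators=(",", ":"), sort_keys=True)
        if isinstance(value, dict) else value
        for key, value in operation_metrics.items()
    }
```

Only the dict-valued entries become strings, which is exactly the distinction visible in the example commit log above, where `filesAdded` is an escaped JSON string but `numBatches` remains a number.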
  2. feat: logical Node for find files (#2194)

    # Description
    Some of my first workings on David's proposal in #2006, this is also
    meant to push #2048 and general CDF forward as well by making the
    logical operations of delta tables more composable than they are today.
    
    # Related Issue(s)
    #2006 
    #2048 
    
    I think, and @Blajda correct me here, we can build upon this and
    eventually move towards a `DeltaPlanner`-esque enum for operations and
    their associated logical-plan building.
    
    # Still to do
    
    - [ ] Implement different path for partition columns that don't require
    scanning the file
    - [ ] Plumbing into `DeltaScan` so delta scan can make use of this
    logical node
    - [ ] General polish and cleanup; there are lots of unnecessary
    fields and ways things are built
    - [ ] More tests; there is currently a large integration-style
    end-to-end test, but it can and should be broken down
    hntd187 authored Mar 24, 2024 · f56d8c9

Commits on Mar 25, 2024

  1. adhere to protocol

    ion-elgreco authored and rtyler committed Mar 25, 2024 · abafd2d