Comparing changes
base repository: delta-io/delta-rs
base: python-v0.16.2
head repository: delta-io/delta-rs
compare: python-v0.16.3
- 10 commits
- 27 files changed
- 6 contributors
Commits on Mar 21, 2024
chore: object store 0.9.1 (#2311)
Commit eb8c19c
Commits on Mar 22, 2024
fix: try to fix timeouts (#2318)
# Description I ran this version in a pipeline at work and it no longer raised this error: `OSError: Generic MicrosoftAzure error: Error after 0 retries in 30.000943594s, max_retries:10, retry_timeout:180s, source:error sending request for url (https://<redacted>.blob.core.windows.net/dev-preprocessed/<redacted>/product_line_code=<redacted>/event_year_month_clt=202209/part-00001-874ec394-72d0-44fb-8cbb-2b7b73b41ede-c000.snappy.parquet): operation timed out`
Commit 22f03b4
docs: add example in to_pyarrow_dataset (#2315)
# Description Adds example from docs into docstring --------- Co-authored-by: R. Tyler Croy <rtyler@brokenco.de>
Commit a9da105
Commits on Mar 23, 2024
fix: handle conflict checking in optimize correctly (#2208)
# Description
This removes the optimize `update()`-before-commit behaviour.
While digging, I discovered that a z-order after a merge would cause corrupted commits: https://gist.github.com/emcake/4edfb72d77e08e8a600b8c0c902e2718
This should be prevented: I'd expect a merge to remove files and the conflict checker to kick in and prevent the z-order from going through. Digging further, I found that the conflict checker never came into play because of the call to update() before commit: https://github.com/delta-io/delta-rs/blob/main/crates/core/src/operations/optimize.rs#L738
This should have been caught by tests, but the test for conflict checking has been ignored since it was written: https://github.com/delta-io/delta-rs/blob/main/crates/core/tests/command_optimize.rs#L261
Removing the update passes all tests and also allows the conflict checking test to be added back in.
This causes one minor dilemma for long-running optimizes that use the min commit interval parameter: due to the way commit works, without the update the optimize would fail after 15 intermediate commits. I've changed it to use `commit_with_retries`, and it now counts the commits it has made itself toward the retry count.

---------

Co-authored-by: R. Tyler Croy <rtyler@brokenco.de>
Co-authored-by: David Blajda <db@davidblajda.com>
Commit f49eedb
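The #2208 fix relies on optimistic concurrency: an optimize that read version N must check every commit written after N before it can claim a new version, and must fail if one of those commits removed a file the optimize is rewriting. The Python sketch below models that idea with a hypothetical in-memory log (list index == table version). `Commit`, `ConflictError`, `commit_with_retries` and its parameters are illustrative names only, not the delta-rs API; the PR's `commit_with_retries` is a Rust function whose signature is not shown here, and the single overlap rule is a simplification of the real conflict checker. The `own_commits` parameter illustrates the PR's point that an optimize's own intermediate commits should count toward the retry budget; the default of 15 mirrors the "15 intermediate commits" mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    # Hypothetical model: the data files a commit adds to and removes from the table.
    added: set = field(default_factory=set)
    removed: set = field(default_factory=set)


class ConflictError(Exception):
    pass


def commit_with_retries(log, read_version, commit, max_retries=15, own_commits=0):
    """Append `commit` to `log`, where log[i] is the commit for table version i.

    Every version written after `read_version` is checked for conflicts before
    the next free version is claimed. Each newer version costs one retry, so the
    budget is widened by `own_commits` to cover the caller's own commits.
    """
    budget = max_retries + own_commits
    version = read_version + 1
    for _ in range(budget + 1):  # first attempt plus `budget` retries
        if version == len(log):
            # No newer commit occupies this version: claim it.
            log.append(commit)
            return version
        winner = log[version]
        # Simplified conflict rule: a concurrent commit already removed a file
        # that this commit also removes (e.g. a merge rewrote a file the
        # optimize is compacting).
        if winner.removed & commit.removed:
            raise ConflictError(f"conflict with version {version}")
        version += 1
    raise ConflictError("retry budget exhausted")


# Usage: version 0 holds files "a" and "b"; a merge at version 1 removes "a".
log = [Commit(added={"a", "b"})]
log.append(Commit(added={"c"}, removed={"a"}))
try:
    # An optimize that read version 0 tries to compact "a" and "b" into "ab".
    commit_with_retries(log, read_version=0, commit=Commit(added={"ab"}, removed={"a", "b"}))
except ConflictError as err:
    print(err)  # prints: conflict with version 1
```

Calling an `update()` before committing would correspond to bumping `read_version` to the latest version, which skips the conflict check entirely; that is the behaviour the PR removes.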
Commit 39cec78
fix: merge concurrency control (#2324)
# Description Reintroduces the concurrency control, with a fix.
Commit 7928e95
fix: merge pushdown handling (#2326)
# Description Fix broken test case with partitions - fixes #2158 --------- Co-authored-by: ion-elgreco <15728914+ion-elgreco@users.noreply.github.com>
Commit 00c919f
Commits on Mar 24, 2024
fix(rust): serialize MetricDetails from compaction runs to a string (#2317)
I am by no means a Rust developer and haven't touched it in years, so please let me know if there's a better way to go about this. The Rust z_order and optimize.compact already serialize the metrics before they are passed back to Python, which then deserializes them, so the Python behavior of expecting this as a Dict has not changed, which I think is what we want.
# Description
Adds a custom serializer and Display implementation for the `MetricDetails` fields, namely `filesAdded` and `filesRemoved`, so that those fields are written to the commit log as strings instead of structs. Query engines expect these fields to be strings on reads.
I had trouble getting the pyspark tests running locally, but here is an example optimize commit log that gets written with these changes:
```
{"commitInfo":{"timestamp":1711125995487,"operation":"OPTIMIZE","operationParameters":{"targetSize":"104857600","predicate":"[]"},"clientVersion":"delta-rs.0.17.1","readVersion":10,"operationMetrics":{"filesAdded":"{\"avg\":19956.0,\"max\":19956,\"min\":19956,\"totalFiles\":1,\"totalSize\":19956}","filesRemoved":"{\"avg\":4851.833333333333,\"max\":10358,\"min\":3734,\"totalFiles\":6,\"totalSize\":29111}","numBatches":6,"numFilesAdded":1,"numFilesRemoved":6,"partitionsOptimized":1,"preserveInsertionOrder":true,"totalConsideredFiles":6,"totalFilesSkipped":0}}}
```
# Related Issue(s)
- #2087
# Documentation
N/A
Commit 923dfef
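Because #2317 writes `filesAdded` and `filesRemoved` as JSON-encoded strings, a reader that wants the nested metrics has to decode them a second time. The minimal sketch below uses only Python's standard `json` module on a trimmed copy of the OPTIMIZE commit entry from that commit message; `read_file_metrics` is a hypothetical helper, not part of delta-rs or its Python bindings. It also accepts the nested-object form that was written before this change.

```python
import json

# Trimmed copy of the OPTIMIZE commitInfo entry from the #2317 commit message.
COMMIT_LINE = (
    '{"commitInfo":{"operation":"OPTIMIZE","operationMetrics":{'
    '"filesAdded":"{\\"avg\\":19956.0,\\"max\\":19956,\\"min\\":19956,'
    '\\"totalFiles\\":1,\\"totalSize\\":19956}",'
    '"filesRemoved":"{\\"avg\\":4851.833333333333,\\"max\\":10358,'
    '\\"min\\":3734,\\"totalFiles\\":6,\\"totalSize\\":29111}",'
    '"numFilesAdded":1,"numFilesRemoved":6}}}'
)


def read_file_metrics(line):
    """Return filesAdded/filesRemoved from one commit-log line as dicts (hypothetical helper)."""
    metrics = json.loads(line)["commitInfo"]["operationMetrics"]
    out = {}
    for key in ("filesAdded", "filesRemoved"):
        value = metrics[key]
        # String form (after #2317): the value is itself a JSON document.
        # Object form (before #2317): the value is already a dict.
        out[key] = json.loads(value) if isinstance(value, str) else value
    return out


metrics = read_file_metrics(COMMIT_LINE)
print(metrics["filesRemoved"]["totalFiles"])  # prints: 6
```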
feat: logical Node for find files (#2194)
# Description
Some of my first work on David's proposal in #2006. This is also meant to push #2048, and CDF in general, forward by making the logical operations of delta tables more composable than they are today.
# Related Issue(s)
#2006 #2048
I think (and @Blajda, correct me here) we can build upon this and eventually move towards a `DeltaPlanner`-esque enum for operations and their associated logical plan building.
# Still to do
- [ ] Implement a different path for partition columns that doesn't require scanning the file
- [ ] Plumbing into `DeltaScan` so that delta scan can make use of this logical node
- [ ] General polish and cleanup: there are lots of unnecessary fields, and the way things are built needs tidying
- [ ] More tests: there is currently a large integration-style end-to-end test, but it can and should be broken down
Commit f56d8c9
Commits on Mar 25, 2024
Commit abafd2d
You can try running this command locally to see the comparison on your machine:
git diff python-v0.16.2...python-v0.16.3