feat: table overwrite functionality #674
Conversation
```go
	metadata:        meta,
	fs:              fs,
	projectedSchema: t.meta.CurrentSchema(),
	boundRowFilter:  boundFilter,
	caseSensitive:   true,
	rowLimit:        -1, // No limit
	concurrency:     1,
}
```

I don't know if this is correct.
I'll give this a look to start reviewing; in the meantime, can you fix the lint issues?

This requires the replacing table to match the schema of the replaced table, right? Asking since I'm interested in something like

Yes.

The spark integration tests for overwrite don't seem to be passing.

I definitely didn't do those correctly. Any guidance on whether they're structured correctly, or whether I'm going about it the right way? I couldn't figure out how to test them locally.
I'll try to find some time next week to take a look at the spark tests and see if I can figure it out. |
Force-pushed from 92c5269 to 2ffd1bb
zeroshade left a comment:

Gave this a closer review; let me know what you think.
zeroshade left a comment:

Overall this looks like a good first pass. There are some possible optimizations I can think of, but we can do those in a second pass.
I have one significant comment, but otherwise everything looks great! Thanks!
```go
// OverwriteTable is a shortcut for NewTransaction().OverwriteTable() and then committing the transaction.
// The batchSize parameter refers to the batch size for reading the input data, not the batch size for writes.
// The concurrency parameter controls the level of parallelism. If concurrency <= 0, defaults to runtime.GOMAXPROCS(0).
func (t Table) OverwriteTable(ctx context.Context, tbl arrow.Table, batchSize int64, filter iceberg.BooleanExpression, caseSensitive bool, concurrency int, snapshotProps iceberg.Properties) (*Table, error) {
```

Update the docstring to explain what the filter actually means (i.e. how it affects the overwrite), and please add the same to the function below.
table/transaction.go (Outdated)

```go
	return nil, nil
}

arrowSchema, err := SchemaToArrowSchema(t.meta.CurrentSchema(), nil, false, false)
```

the first return value from scanner.GetRecords is the arrow schema, so we shouldn't need to do this again here, right?
table/transaction.go (Outdated)

```go
table := array.NewTableFromRecords(arrowSchema, records)
defer table.Release()

rdr := array.NewTableReader(table, table.NumRows())
defer rdr.Release()

var result []iceberg.DataFile
itr := recordsToDataFiles(ctx, t.tbl.Location(), t.meta, recordWritingArgs{
	sc:        rdr.Schema(),
	itr:       array.IterFromReader(rdr),
	fs:        fs.(io.WriteFileIO),
	writeUUID: &commitUUID,
})
```

you received a record iterator from scanner.GetRecords, why create a table from it, and a reader, instead of just passing in the same iterator that you got back from scanner.GetRecords?
i.e. something like:

```go
sc, iter, err := scanner.GetRecords(ctx, []FileScanTask{*scanTask})
if err != nil { /* .... */ }
dfIter := recordsToDataFiles(ctx, t.tbl.Location(), t.meta, recordWritingArgs{
	sc:        sc,
	itr:       iter,
	fs:        fs.(io.WriteFileIO),
	writeUUID: &commitUUID,
})
```

In theory, if there are no records, you should just get an empty list of data files at the end, right?
Oops, big miss by me here 😅. Thanks for pointing it out!
zeroshade left a comment:

Looks great to me; there are some optimizations we can look into as follow-ups, but this is good for a first pass. Thanks!
This makes a few tweaks to the [recently added](#674) `Overwrite` API, favoring options for parameters that have a sensible default and migrating them to `OverwriteOptions`. Technically, it also fixes an issue introduced there where the caseSensitive parameter was simply ignored and hardcoded to `true` in the functions supporting the `Overwrite` functionality, but that wasn't the primary motivation on my end; I just happened to notice it while refactoring.
This adds support for performing delete operations on a table with the addition of `Delete(ctx context.Context, filter iceberg.BooleanExpression, snapshotProps iceberg.Properties, opts ...DeleteOption)` on both transaction and table (as with the other methods, the table version just wraps creating a new transaction, doing the work, and committing it). I did some refactoring to reuse the deletion code from the overwrite implementation, and I also updated the integration tests to make it easier to add validation SQL statements.

Previously, each validation meant adding a test function in `validation.py` that wraps a Spark SQL statement, which I hope people agree is less than ideal. To keep the validation SQL closer to the test logic, I refactored `validation.py` to take a `sql` argument, which lets the integration test run the validation script and pass it the SQL needed to produce the output used for the assertions. The tradeoff is that some validations were running two SQL statements, which now means two Spark SQL round trips instead of one. However, for most of them the validation was redundant (doing a count and _then_ fetching the full rows, which implicitly also shows the number of rows), so only one test ends up running two SQL statements.

Thanks to @dontirun for contributing the overwrite support in #674, which laid the foundation that made this very easy, since `copy-on-write` deletion is essentially a subset of overwrite with a filter.
Resolves #668
There are quite a few things I'm not too sure about in this implementation (like the filtering logic, or whether I've properly tested it).