Skip to content

Conversation

@viirya
Copy link
Member

@viirya viirya commented Jul 1, 2025

Which issue does this PR close?

Rationale for this change

This minor patch adds some comments to ObjectBuilder and ListBuilder and fixes one existing test.

This patch makes sure ObjectBuilder and ListBuilder to be finalized before its parent builder is finalized. If finish is forgotten to be called on them, the compiler will show error.

What changes are included in this PR?

Are these changes tested?

Updated tests.

Are there any user-facing changes?

No

@github-actions github-actions bot added the parquet Changes to the parquet crate label Jul 1, 2025
Comment on lines +1513 to +1515
inner_list.finish();
outer_list.finish();

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a bug. Caught by the change. These builders were not called finish.

Copy link
Contributor

@alamb alamb Jul 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome -- i also think it is a great example of how this change improves the variant builders

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the test is still broken/useless, if it lacks any checking to verify the builder actually produced what it was asked to build?

Copy link
Member Author

@viirya viirya Jul 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the test just wants to test field validation for object builders, one failure case, one success case. It doesn't care about what it actually builds.

I said "bug" doesn't mean that the test is useless but it lacks the finish calls making it looks not following the insert - finish pattern that is the what the issue proposed to worry about.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words, it can achieve the effort to address the original issue without the worries.

it does seem like the empty drop helps this one usecase

I will try and write a test that shows what happens when a builder isn't finishd and we can evaluate the issues

/// Finalize object
///
/// This consumes self and writes the object to the parent buffer.
pub fn finish(mut self) -> Result<(), ArrowError> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't actually remove finish as there are still examples where the builders are in same scope with top-level builder. So explicit call of finish is still necessary for that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's said, and, for the case these builders are in the same scope as the top-level builder, after this PR, if such a builder is not called finish, it will be caught by the compiler now, e.g.,

error[E0505]: cannot move out of `builder` because it is borrowed
    --> parquet-variant/src/builder.rs:1510:9
     |
1469 |         let mut builder = VariantBuilder::new().with_validate_unique_fields(true);
     |             ----------- binding `builder` declared here
...
1485 |         let mut outer_list = builder.new_list();
     |                              ------- borrow of `builder` occurs here
...
1510 |         builder.finish();
     |         ^^^^^^^ move out of `builder` occurs here
1511 |     }
     |     - borrow might be used here, when `outer_list` is dropped and runs the `Drop` code for type `ListBuilder`

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this sounds like a great idea

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting! The compiler is normally willing to end the lifetime of the orphaned outer_list just before the builder.finish call, but providing an impl Drop somehow force-extends the lifetime?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compiler is normally willing to end the lifetime of the orphaned outer_list just before the builder.finish call

No, outer_list's lifetime normally will be ended (if no finish call) after the scope as usual. That's why the compiler complains because it borrows builder. So builder.finish call cannot be happened before the end of outer_list lifetime.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That only happens if there's an impl Drop tho:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=fe049aba262d590948887ee08f43618c

Somehow, the compiler is able to shorten the lifetime of outer_list if impl Drop doesn't get in the way.

Copy link
Member Author

@viirya viirya Jul 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That only happens if there's an impl Drop tho:

Isn't what this PR does?

///
/// This doesn't consume self but just writes the object to the parent buffer.
fn _finish(&mut self) -> Result<(), ArrowError> {
assert!(!self.finished, "ObjectBuilder has already been finished");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This prevents that dropping a finished builder causes re-finishing it.

@viirya
Copy link
Member Author

viirya commented Jul 1, 2025

cc @alamb

@alamb
Copy link
Contributor

alamb commented Jul 1, 2025

👋 @viirya -- I am checking this one out

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TLDR is I think it is great. Thank you @viirya

I also wrote up some additional documentation and examples for your consideration here:

Comment on lines +1513 to +1515
inner_list.finish();
outer_list.finish();

Copy link
Contributor

@alamb alamb Jul 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome -- i also think it is a great example of how this change improves the variant builders

@alamb
Copy link
Contributor

alamb commented Jul 1, 2025

FYI @friendlymatthew and @scovich and @micoo227 who just added the validation code today!

Add comments and examples to clarify panic behavior on drop
@viirya
Copy link
Member Author

viirya commented Jul 1, 2025

I also wrote up some additional documentation and examples for your consideration here:

Thank you. I merged it.

@scovich
Copy link
Contributor

scovich commented Jul 2, 2025

This change makes me nervous, after having been bitten in the past -- multiple times in multiple projects and multiple languages -- by unconditional finalizations like this that don't consider whether the the finalization is actually desirable. The most recent was a bad interaction between spark and hadoop, where the upload stream's unconditional auto-close behavior meant that early return due to an exception uploaded a partial object to cloud storage (= a torn write that caused havoc to readers who don't expect torn writes in cloud storage to be possible). The offending code was buried many levels deep, which made it a real treat to root-cause, and even more "fun" to workaround.

Having any object attempt to "finish" in spite of a panic=unwind, or even a ? return would be unexpected and unwelcome behavior IMO. And changing a Result into a panic inside drop glue is even worse (double fault risk).

If the worry is that somebody might forget to invoke the finish method... can we try to figure out other ways to address it, which don't resort to implicit/spooky behavior?

@viirya
Copy link
Member Author

viirya commented Jul 2, 2025

Making sense. I forgot that this is not a good practice to put the finalization login into Drop under panic=unwind.

@viirya
Copy link
Member Author

viirya commented Jul 2, 2025

I reverted the Drop impl and related comments and tests added by @alamb. Just keep some comments unrelated to Drop and the small fix to the original tests.

We can figure out the other ways to address that.

@viirya viirya changed the title Finalize ObjectBuilder and ListBuilder in their Drop impl Add comments to ObjectBuilder and ListBuilder and fix test Jul 2, 2025
@viirya
Copy link
Member Author

viirya commented Jul 2, 2025

Hmm, actually, with an empty Drop, it can catch up the bug in the test like I mentioned in #7843 (comment). So it doesn't do any "finish" stuff or panic in Drop.

In other words, it can achieve the effort to address the original issue without the worries.

@viirya viirya changed the title Add comments to ObjectBuilder and ListBuilder and fix test Make sure ObjectBuilder and ListBuilder to be finalized before its parent builder Jul 2, 2025
Copy link
Contributor

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm neutral about this change. Not convinced it's actually useful in practice, but it also seems harmless enough so there's no reason to block it.

Against: Outside unit tests, recursive builders will almost always live in a nested scope vs. their parent -- usually a called function or method, or at least a match clause -- which would trivially defeat the drop sentinel.

For: The sentinel is helpful when it is able to catch an issue, and harmless the rest of the time. In the uncommon case that somebody actually wants to avoid calling finish, they can just drop the builder (which also documents the intent).

For: Like it or not, unit tests will end up being a very significant consumer of the variant API, so "quality of life" features that make unit tests easier to write and maintain are not necessarily a bad thing.

Against: Suppose a unit test does have a bug in forgetting to finish a builder. I'm very unclear on what was being tested, if the resulting variant wasn't checked in a way that would notice big swaths of missing data? Even if the drop sentinel catches it, the test is still arguably broken/useless due to incomplete verification of what it claims to test.

Against: A quick grep over the rest of arrow-rs suggests that none of the other crates' builders employ drop sentinels like this. The only matches for impl.* Drop are for buffers and FFI wrappers (to manually delete internal pointers).

/// Finalize object
///
/// This consumes self and writes the object to the parent buffer.
pub fn finish(mut self) -> Result<(), ArrowError> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting! The compiler is normally willing to end the lifetime of the orphaned outer_list just before the builder.finish call, but providing an impl Drop somehow force-extends the lifetime?

Comment on lines +1513 to +1515
inner_list.finish();
outer_list.finish();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the test is still broken/useless, if it lacks any checking to verify the builder actually produced what it was asked to build?

@alamb
Copy link
Contributor

alamb commented Jul 3, 2025

So I personally prefer the automatic finalize on drop behavior -- I can understand how that behavior can cause trouble when external resources are affected but in this case I think it makes this code much easier to use for most cases.

If we don't finalize the object on drop of the builder, for example, the resulting Variant object built by the builder seems to be malformed. Here is an example that would work with finalize on drop, but not with this PR as is

#[test]
    fn test_object_list_no_finish() {
        let mut builder = VariantBuilder::new();

        let mut list_builder = builder.new_list();

        {
            let mut object_builder = list_builder.new_object();
            object_builder.insert("id", 1);
            object_builder.finish().unwrap();
        }

        {
            let mut object_builder = list_builder.new_object();
            object_builder.insert("id", 2);
            // NOTE object_builder.finish() is not called here
        }

        list_builder.finish();

        let (metadata, value) = builder.finish();

        let variant = Variant::try_new(&metadata, &value).unwrap();
        let list = variant.as_list().unwrap();
}

If object_builder.finish() is not called, then this line fails:

        let list = variant.as_list().unwrap();

The error is like this (which is pretty unintuitive to me):

called `Result::unwrap()` on an `Err` value: InvalidArgumentError("Received empty bytes")
thread 'builder::tests::test_object_list_no_finish' panicked at parquet-variant/src/builder.rs:1154:59:
called `Result::unwrap()` on an `Err` value: InvalidArgumentError("Received empty bytes")
stack backtrace:
   0: __rustc::rust_begin_unwind
             at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/std/src/panicking.rs:697:5
   1: core::panicking::panic_fmt
             at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/core/src/panicking.rs:75:14
   2: core::result::unwrap_failed
             at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/core/src/result.rs:1732:5
   3: core::result::Result<T,E>::unwrap
             at /Users/andrewlamb/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/result.rs:1137:23
   4: parquet_variant::builder::tests::test_object_list_no_finish
             at ./src/builder.rs:1154:23
   5: parquet_variant::builder::tests::test_object_list_no_finish::{{closure}}
             at ./src/builder.rs:1133:36
   6: core::ops::function::FnOnce::call_once
             at /Users/andrewlamb/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5
   7: core::ops::function::FnOnce::call_once
             at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

@alamb
Copy link
Contributor

alamb commented Jul 3, 2025

So TLDR I still prefer the finalize on drop behavior

@alamb
Copy link
Contributor

alamb commented Jul 3, 2025

I'm neutral about this change. Not convinced it's actually useful in practice, but it also seems harmless enough so there's no reason to block it.

Against: Outside unit tests, recursive builders will almost always live in a nested scope vs. their parent -- usually a called function or method, or at least a match clause -- which would trivially defeat the drop sentinel.

I don't understand what you mean by defeat the drop sentinel. Since the ObjectBuilder has a mut reference to its parent, I don't think the rust compiler will allow anything to the same underlying parent builder

Maybe I am missing something here and an example would help 🤔

For: The sentinel is helpful when it is able to catch an issue, and harmless the rest of the time. In the uncommon case that somebody actually wants to avoid calling finish, they can just drop the builder (which also documents the intent).

My example above shows that not dropping the builder can lead to a corrupted output so I don't think we support the case of avoiding calling finish without some more work (to be clear I am not sure the "rollback a partially built object" s a valuable API yet)

For: Like it or not, unit tests will end up being a very significant consumer of the variant API, so "quality of life" features that make unit tests easier to write and maintain are not necessarily a bad thing.

Against: Suppose a unit test does have a bug in forgetting to finish a builder. I'm very unclear on what was being tested, if the resulting variant wasn't checked in a way that would notice big swaths of missing data? Even if the drop sentinel catches it, the test is still arguably broken/useless due to incomplete verification of what it claims to test.

I agree the tests could be improved and we should add a test showing what happens if you don't call finish (I have an example above)

Against: A quick grep over the rest of arrow-rs suggests that none of the other crates' builders employ drop sentinels like this. The only matches for impl.* Drop are for buffers and FFI wrappers (to manually delete internal pointers).

Another difference to the other builders is they don't have this model of nested nested builders -- so there is no "parent" state to finalize. Variant is somewhat different than the other arrow types because its nesting level camn be arbitary, which is what these nested builders allow)

@scovich
Copy link
Contributor

scovich commented Jul 3, 2025

Against: Outside unit tests, recursive builders will almost always live in a nested scope vs. their parent -- usually a called function or method, or at least a match clause -- which would trivially defeat the drop sentinel.

I don't understand what you mean by defeat the drop sentinel. Since the ObjectBuilder has a mut reference to its parent, I don't think the rust compiler will allow anything to the same underlying parent builder

Maybe I am missing something here and an example would help 🤔

From what I understood, the example you gave above would trivially defeat an impl Drop sentinel (= empty drop method), because the object_builder goes out of scope before the call to list_builder.finish():

        let mut builder = VariantBuilder::new();
        let mut list_builder = builder.new_list();

        {
            let mut object_builder = list_builder.new_object();
            object_builder.insert("id", 1);
            // NOTE object_builder.finish() is not called here
        }

        list_builder.finish();
        let (metadata, value) = builder.finish();        

@scovich
Copy link
Contributor

scovich commented Jul 3, 2025

For: The sentinel is helpful when it is able to catch an issue, and harmless the rest of the time. In the uncommon case that somebody actually wants to avoid calling finish, they can just drop the builder (which also documents the intent).

My example above shows that not dropping the builder can lead to a corrupted output

Doesn't that mean our current implementation is not even safe in the presence of early return by e.g. ?, let alone UnwindSafe? Any error at any nesting depth must be treated as unrecoverable, and the entire variant builder discarded? e.g. instead of this:

        {
            let mut object_builder = list_builder.new_object();
            object_builder.insert("id", 2);
            // NOTE object_builder.finish() is not called here
        }

imagine this:

        fn try_append(...) -> Result<(), ArrowError> {
            let mut object_builder = list_builder.new_object();
            object_builder.insert("id", 2);

            // some fallible operation whose `?` could return early

            object_builder.finish()?; // an err here also renders the parent unusable
            Ok(())
        }

With the current behavior, the caller of try_append is forced to fail the entire variant encoding attempt, for fear that it might have been left in an inconsistent state. And if they forget to treat some Err as fatal... boom. At a minimum, it seems like we should "poison" the parent builder so it knows this happened and its (now fallible) finish can fail cleanly? It should also refuse to create new sub-builders or append new values, to avoid wasting work. But that's a lot of work to code up...

I don't think we support the case of avoiding calling finish without some more work (to be clear I am not sure the "rollback a partially built object" s a valuable API yet)

Honestly, now that the nested builders store their bytes in a separate vec, I would have expected rollback safety for free. But I guess we're still storing something in the parent vec. I would imagine that's easily addressed by having the constructor grab a marker, which finish advances; then drop can simply truncate the vec to that mark. Seems simpler than safely poisoning it, and safer than relying on callers to maintain an invisible (and unusual) error handling invariant?

No need to remove extra entries from the metadata dictionary, they're harmless.

@scovich
Copy link
Contributor

scovich commented Jul 3, 2025

So I personally prefer the automatic finalize on drop behavior -- I can understand how that behavior can cause trouble when external resources are affected but in this case I think it makes this code much easier to use for most cases.

Relating to the ?-safety and unwind safety comment above: unconditionally finalizing on drop (hopefully) produces a physically valid variant -- but the result is still a logically invalid "torn write". That kind of spookiness is really hard to reason around when coding, and even harder to triage when something goes wrong. As I mentioned, I've hit this issue many times over the years, in various forms. It's always surprising and never in a good way.

@scovich
Copy link
Contributor

scovich commented Jul 3, 2025

Honestly, now that the nested builders store their bytes in a separate vec, I would have expected rollback safety for free. But I guess we're still storing something in the parent vec. I would imagine that's easily addressed by having the constructor grab a marker, which finish advances; then drop can simply truncate the vec to that mark. Seems simpler than safely poisoning it, and safer than relying on callers to maintain an invisible (and unusual) error handling invariant?

Looking at the code, I think the bad behavior comes from the pending concept that the list and object builders employ. In a sense, they are unconditionally finalizing the builders (any subsequent method call checks for a pending builder)... but the finalization is only partial if the builder didn't actually finalize. Meanwhile, the variant builder itself lacks any such "pending" mechanism, which likely allows a different set of problems. In fact, it looks like the variant builder allows to "append" any number of variant values -- almost a column-oriented builder concept, but lacking any mechanism for tracking objects and the boundaries between them.

@alamb
Copy link
Contributor

alamb commented Jul 3, 2025

Relating to the ?-safety and unwind safety comment above: unconditionally finalizing on drop (hopefully) produces a physically valid variant -- but the result is still a logically invalid "torn write". That kind of spookiness is really hard to reason around when coding, and even harder to triage when something goes wrong. As I mentioned, I've hit this issue many times over the years, in various forms. It's always surprising and never in a good way.

The idea of a logically lost object being hard to debug makes sense to me. Thank you -- I missed that

Propsal

  1. Add a function like ObjectBuilder::rollback to explicitly "cancel" the currently in progress object
  2. Implement ObjectBuilder::drop such that it calls ObjectBuilder::rollback if ObjectBuilder::build() has not been called

That way:

  1. No way to get accidentally completed ("torn") objects
  2. No panics on drop
  3. No partially / corrupt objects if finish isn't called

To be clear, it doesn't mean that @viirya needs to fix this in this PR (though if he does that would be great)

@alamb
Copy link
Contributor

alamb commented Jul 3, 2025

Also, I filed a separate ticket for corrupted objects if build() is not called;
-#7863

@alamb alamb self-requested a review July 3, 2025 20:05
@alamb alamb changed the title Make sure ObjectBuilder and ListBuilder to be finalized before its parent builder [Variant] Make sure ObjectBuilder and ListBuilder to be finalized before its parent builder Jul 3, 2025
@viirya
Copy link
Member Author

viirya commented Jul 3, 2025

To be clear, it doesn't mean that @viirya needs to fix this in this PR (though if he does that would be great)

Sounds great to me. Is @scovich okay for this proposal? If so, I can work it on this PR too.

@scovich
Copy link
Contributor

scovich commented Jul 3, 2025

To be clear, it doesn't mean that @viirya needs to fix this in this PR (though if he does that would be great)

Sounds great to me. Is @scovich okay for this proposal? If so, I can work it on this PR too.

Oi... I didn't see this comment, and I already have a PR almost ready to go.

@scovich
Copy link
Contributor

scovich commented Jul 3, 2025

NOTE: The sentinel impl Drop in this PR will still be useful -- it will detect the missing finish calls in the new unit tests I'm adding (requiring them to drop the unfinished builders).

@viirya
Copy link
Member Author

viirya commented Jul 4, 2025

Then maybe we can merge this now?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's merge this one in and continue on #7865

@alamb alamb merged commit 13d79b3 into apache:main Jul 4, 2025
12 checks passed
@viirya
Copy link
Member Author

viirya commented Jul 4, 2025

Thanks @alamb @scovich

@viirya viirya deleted the builder_drop branch July 4, 2025 19:13
alamb pushed a commit that referenced this pull request Jul 7, 2025
)

# Which issue does this PR close?

- Closes #7863
- Closes #7798

# Rationale for this change

Reviews and testing on #7843
exposed the fact that creating a variant list or object builder has side
effects that leave the parent in an inconsistent/invalid state if the
child builder is never finalized. Rework the finalization logic to be
more direct so that child builders have no effect on their parents
before their `finish` method is called.

# What changes are included in this PR?

* Define a new `ParentState` enum that tracks the necessary information
for a child to fully finalize its parent.
* Remove the `pending` machinery from builders

# Are these changes tested?

Existing unit tests mostly cover this change. 

Added new tests to verify that failing to call `finish` leaves the
parent unmodified.

# Are there any user-facing changes?

No.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parquet Changes to the parquet crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Variant] Remove explicit ObjectBuilder::finish() and ListBuilder::finish and move to Drop impl

3 participants