Skip to content

persist: address a batch of encoding TODOs#9338

Merged
danhhz merged 6 commits intoMaterializeInc:mainfrom
danhhz:persist_meta_todos
Nov 29, 2021
Merged

persist: address a batch of encoding TODOs#9338
danhhz merged 6 commits intoMaterializeInc:mainfrom
danhhz:persist_meta_todos

Conversation

@danhhz
Copy link
Copy Markdown
Contributor

@danhhz danhhz commented Nov 29, 2021

See individual commits for details.

Motivation

  • This PR refactors existing code.

Cleanups in prep for protobuf encoding of BlobMeta.

Tips for reviewer

I left the unsealed_ts_upper here as a TODO because the rest of this PR is really straightforward refactoring and I wanted to keep it really easy to review. I'll get to it this week.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • [N/A] This PR adds a release note for any
    user-facing behavior changes.

It's derivable from id_mapping and graveyard and so unnecessary.
Our usage of Description for data timestamps in trace batches is
correct, but somewhat just because it was there, we also used it for
declaring SeqNo bounds of unsealed batches. In that role, it's correct,
but unnecessary. First, we never compact anything on a SeqNo basis, so
the since is always "zero" (the single element antichain of the minimum
SeqNo). Second, SeqNos are totally ordered and always will be, so
there's no need for the lower and upper bounds to be Antichain (as
Description hardcodes).

In theory, Description is just a superset of the information we need to
represent and our usage of it was correct. In practice, the Antichain
caused some bits of code to be unnecessarily complicated and we probably
weren't being defensive about erroring on non-"zero" sinces.
To mirror the previous merge of the in-mem Unsealed and Trace into
Arrangement.
@danhhz danhhz requested a review from ruchirK November 29, 2021 17:09
@ruchirK
Copy link
Copy Markdown
Contributor

ruchirK commented Nov 29, 2021

haven't looked at this in detail, but one small thought here is -- maybe do we want to bump meta/update the golden test on each commit? I guess the two cases I'm thinking about are:

  1. In case we want to revert a subset of the commits for some reason it would be a bit simpler
  2. In case someone else tries to build and run MZ with a subset of the commits here that doesn't include the final commit

I dunno how compelling either of those reasons are, and I know bumping meta/updating golden each time is tedious. Wasn't sure if doing it each time also came with other downsides.

No strong preference either way, so feel free to leave it as is if you feel strongly

@danhhz
Copy link
Copy Markdown
Contributor Author

danhhz commented Nov 29, 2021

I was 50/50 on this. Tbh, neither of those reasons are particularly compelling to me, but it's easy enough to switch that 51/49 is worth it. I'll have to force push it though, there's no way to model this as fixup commits, have you started reviewing yet? If so, I'll just do it after the LGTM

@ruchirK
Copy link
Copy Markdown
Contributor

ruchirK commented Nov 29, 2021

I'll review asap -- gimme like 15 minutes (ive already seen most of these changes so it shouldn't take long)

@danhhz
Copy link
Copy Markdown
Contributor Author

danhhz commented Nov 29, 2021

  1. In case we want to revert a subset of the commits for some reason it would be a bit simpler

Of the two, this is more compelling. In theory I agree with it. In practice, no one is going to want to revert any of these commits individually. Setting a good precedent is probably where I go from 50/50 to 51/49 on this :)

  1. In case someone else tries to build and run MZ with a subset of the commits here that doesn't include the final commit

This to me is a non-goal. I have a reasonably strong opinion that the unit of atomic work is a PR, not a commit. I suppose this also negates (1) a bit, but I'm still willing to change it.

Copy link
Copy Markdown
Contributor

@ruchirK ruchirK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! First 4 commits and last commit definitely look good -- id love to do another quick pass over the ArrangementMeta commit after lunch but it seems totally reasonable. feel free to force push!

/// - The compaction level of trace_batches is weakly decreasing when iterating
/// from oldest to most recent time intervals.
/// - Every trace_batch's upper is <= the overall trace's seal frontier.
/// - TODO: key uniqueness invariants?
Copy link
Copy Markdown
Contributor

@ruchirK ruchirK Nov 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this todo is a bit stale -- we already enforce that all blob keys are globally unique across all arrangements in BlobMeta

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦 Totally blanked on this, I'll fix it in the next PR.

@ruchirK
Copy link
Copy Markdown
Contributor

ruchirK commented Nov 29, 2021

This to me is a non-goal. I have a reasonably strong opinion that the unit of atomic work is a PR, not a commit.

I think I agree with that, with the one exception that commits that crash on startup could make git bisection difficult unless theres a way to git bisect across pr boundaries (maybe that difficulty is already baked in with how we handle old versions of meta idk?)

@danhhz
Copy link
Copy Markdown
Contributor Author

danhhz commented Nov 29, 2021

I think I agree with that, with the one exception that commits that crash on startup could make git bisection difficult unless theres a way to git bisect across pr boundaries (maybe that difficulty is already baked in with how we handle old versions of meta idk?)

git bisect does exactly what you want with merge commits. Since we use either a merge commit or a squash for all our PRs, it does this now! One thing git got totally completely right

@danhhz
Copy link
Copy Markdown
Contributor Author

danhhz commented Nov 29, 2021

I just went and did this rebase, but didn't push it because I think I've changed my mind. Thoughts on the following?

  • Bumping this in every commit makes it a big hassle if I have to rebase this to merge it. For example, we might want to hold up persist: actually use columnar encoding for unsealed batches. #9239 on top of this merging, which is a practice I'd like to keep to a minimum
  • Bumping it at every commit churns the golden_test.json file, which needlessly adds to the size of the git repo
  • Being at a high version number might create an impression of persist volatility
  • We have a limited number of versions before this has to get expanded to more than 1 byte. I've seen cases where the original author thought they'd never run out and then something unforeseen meant they did. It seems unlikely that this would be the case for us here and that we'd ever run out of 256 of them given that we're hoping to make this actually backward compatible soon, but I'm still somewhat uncomfortable with using them recklessly

@ruchirK
Copy link
Copy Markdown
Contributor

ruchirK commented Nov 29, 2021

Sounds like we should leave it as is. The reasons for changing were never super strong, and I think I'm most convinced by the fact that the version bumps we have available are a finite resource and it would be genuinely annoying to exhaust that.

@danhhz
Copy link
Copy Markdown
Contributor Author

danhhz commented Nov 29, 2021

Did you get a chance to look at the ArrangementMeta commit to your satisfaction?

@ruchirK
Copy link
Copy Markdown
Contributor

ruchirK commented Nov 29, 2021

Yup I did! sorry should have mentioned that earlier

@danhhz
Copy link
Copy Markdown
Contributor Author

danhhz commented Nov 29, 2021

No worries! TFTR!

This is a single bump to cover the other commits in this PR.
@danhhz danhhz force-pushed the persist_meta_todos branch from 81d9e0c to ea74a45 Compare November 29, 2021 21:02
@danhhz danhhz enabled auto-merge November 29, 2021 21:02
@danhhz danhhz merged commit 3a3897e into MaterializeInc:main Nov 29, 2021
@danhhz danhhz deleted the persist_meta_todos branch November 29, 2021 21:54
@danhhz
Copy link
Copy Markdown
Contributor Author

danhhz commented Nov 30, 2021

Doh! I was supposed to hold this until after the 0.11.0 release got cut but totally brain farted on it. Opening up a revert now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants