Skip to content

execinfra: prefer leased table descriptors#74722

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
postamar:use-leased-descs-in-flow
Jan 12, 2022
Merged

execinfra: prefer leased table descriptors#74722
craig[bot] merged 1 commit intocockroachdb:masterfrom
postamar:use-leased-descs-in-flow

Conversation

@postamar
Copy link
Copy Markdown

This commit makes better use of the table descriptor protos in the
DistSQL specs by first checking if they've already been leased. If so,
then we use those instead of re-generating catalog.TableDescriptor
instances.

Release note: None

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

}
if ctx != nil && ctx.Descriptors != nil {
leased, _ := ctx.Descriptors.GetLeasedTableByID(ctx.EvalCtx.Ctx(), ctx.Txn, desc.ID)
if leased != nil && leased.GetVersion() == desc.Version {
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is checking the version even necessary at all?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that it is, yes. Or, at least, I certainly don't want to have to convince myself it's not. The alternative is that the descriptor is either newer or older.

It's worth noting that much hand-wringing was done to convince ourselves that it's safe to use enums that are out of sync because the values are always there if they're going to be used. That made me uneasy at the time. I worry about more complex cross-descriptor interactions.

My preference in the longer term is to encode id and version pairs for all needed descriptors and then make it cheap to retrieve them in one go. Generally they'll be cached. That way we'd have nice invariants that all of the exact descriptors matched across machines.

The edge case is if the descriptor hasn't been committed yet. In that case we'll still need to serialize it all. That fact has always made me sort of sad.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this information, I was also wondering about the enums. I agree with your general assessment.

@postamar
Copy link
Copy Markdown
Author

postamar commented Jan 12, 2022

name                    old time/op    new time/op    delta
KV/Scan/SQL/rows=1-16      190µs ± 7%     184µs ± 4%   -3.60%  (p=0.016 n=10+8)
KV/Scan/SQL/rows=10-16     196µs ± 4%     198µs ±12%     ~     (p=0.762 n=8+10)

name                    old alloc/op   new alloc/op   delta
KV/Scan/SQL/rows=1-16     19.5kB ± 1%    17.4kB ± 1%  -11.12%  (p=0.000 n=9+10)
KV/Scan/SQL/rows=10-16    21.0kB ± 1%    18.9kB ± 1%  -10.20%  (p=0.000 n=10+10)

name                    old allocs/op  new allocs/op  delta
KV/Scan/SQL/rows=1-16        222 ± 0%       210 ± 1%   -5.59%  (p=0.000 n=7+10)
KV/Scan/SQL/rows=10-16       256 ± 0%       244 ± 0%   -4.84%  (p=0.000 n=10+7)

@postamar postamar force-pushed the use-leased-descs-in-flow branch 2 times, most recently from 1cd6d1e to cefc87c Compare January 12, 2022 03:25
Copy link
Copy Markdown
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! 🚀

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @postamar, and @yuzefovich)


pkg/sql/colfetcher/colbatch_scan.go, line 203 at r2 (raw file):

	limitHint := rowinfra.RowLimit(execinfra.LimitHint(spec.LimitHint, post))
	// TODO(ajwerner): The need to construct an immutable here

This TODO can now be removed, right?


pkg/sql/colfetcher/index_join.go, line 409 at r2 (raw file):

	}

	// TODO(ajwerner): The need to construct an immutable here

ditto

@postamar
Copy link
Copy Markdown
Author

Yes, good catch on those TODOs, they flew under my nose. I'm still messing around with this, btw, and I'll keep the microbenchmark results updated.

This commit makes better use of the table descriptor protos in the
DistSQL specs by first checking if they've already been leased. If so,
then we use those instead of re-generating catalog.TableDescriptor
instances.

This has a statistically-significant impact on memory allocations, as
illustrated with this microbenchmark which I ran on my development
machine:

    name                    old time/op    new time/op    delta
    KV/Scan/SQL/rows=1-16      190µs ± 7%     184µs ± 4%   -3.60%  (p=0.016 n=10+8)
    KV/Scan/SQL/rows=10-16     196µs ± 4%     198µs ±12%     ~     (p=0.762 n=8+10)

    name                    old alloc/op   new alloc/op   delta
    KV/Scan/SQL/rows=1-16     19.5kB ± 1%    17.4kB ± 1%  -11.12%  (p=0.000 n=9+10)
    KV/Scan/SQL/rows=10-16    21.0kB ± 1%    18.9kB ± 1%  -10.20%  (p=0.000 n=10+10)

    name                    old allocs/op  new allocs/op  delta
    KV/Scan/SQL/rows=1-16        222 ± 0%       210 ± 1%   -5.59%  (p=0.000 n=7+10)
    KV/Scan/SQL/rows=10-16       256 ± 0%       244 ± 0%   -4.84%  (p=0.000 n=10+7)

This change opens us to the possibility of no longer shipping the whole
table descriptor proto in the DistSQL spec.

Release note: None
@postamar postamar force-pushed the use-leased-descs-in-flow branch from cefc87c to b4e85a5 Compare January 12, 2022 16:47
@postamar
Copy link
Copy Markdown
Author

I made only very minor changes since the reviews: removed the TODOs and removed an unnecessary cast that the linter was complaining about. I'll merge as soon as this passes CI.

This change now opens the door to replacing the whole table descriptor proto in the specs with an id+version pair. This ought to have quite an effect in reducing the footprint of proto de/serialization! This requires some version gating and improvements to the catalog API. The latter (batching + retrieval by version in addition to by ID) is stuff that Schema needs to do anyway at some point.

@postamar postamar marked this pull request as ready for review January 12, 2022 16:52
@postamar postamar requested a review from a team January 12, 2022 16:52
Copy link
Copy Markdown
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm_strong: this is a big win on its own

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @postamar, and @yuzefovich)

@postamar
Copy link
Copy Markdown
Author

Thanks for the reviews!

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jan 12, 2022

Build failed (retrying...):

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jan 12, 2022

Build succeeded:

@craig craig bot merged commit 34aefc2 into cockroachdb:master Jan 12, 2022
@nvb
Copy link
Copy Markdown
Contributor

nvb commented Jan 12, 2022

Very nice! Does this resolve #74352 on its own, or would you also like to address the "replacing the whole table descriptor proto in the specs with an id+version pair" point first?

@postamar
Copy link
Copy Markdown
Author

Thanks! No, I think it only resolves part of it. Indeed, the other part would be not shipping around the whole table descriptor proto in the first place, which is a lot more work than I'm prepared to do right now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants