Skip to content

bulk-io: Immutable.VisibleColumns responsible for over half of heap allocated memory in IMPORT #55842

@nvb

Description

@nvb

When running a large import, I see two nearby calls to tabledesc.(*Immutable).VisibleColumns responsible for 54% of all heap-allocated space on one of the machines. This seems pretty bad – to put this in context, we only allocate about a 1/10th of this amount of heap memory in service of AddSSTable requests.

Screen Shot 2020-10-21 at 10 29 56 PM

Type: alloc_space
Time: Oct 22, 2020 at 2:27am (UTC)
Showing nodes accounting for 1923370MB, 100% of 1923370MB total
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context 	 	 
----------------------------------------------------------+-------------
                                       522060.88MB 50.05% |   github.com/cockroachdb/cockroach/pkg/ccl/importccl.(*csvRowProducer).Row /go/src/github.com/cockroachdb/cockroach/pkg/ccl/importccl/read_import_csv.go:146
                                       520953.11MB 49.95% |   github.com/cockroachdb/cockroach/pkg/ccl/importccl.(*csvRowProducer).Row /go/src/github.com/cockroachdb/cockroach/pkg/ccl/importccl/read_import_csv.go:138
                                            2.51MB 0.00024% |   github.com/cockroachdb/cockroach/pkg/sql/row.NewDatumRowConverter /go/src/github.com/cockroachdb/cockroach/pkg/sql/row/row_converter.go:266
1043016.50MB 54.23% 54.23% 1043016.50MB 54.23%                | github.com/cockroachdb/cockroach/pkg/sql/catalog/tabledesc.(*Immutable).VisibleColumns /go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/tabledesc/structured.go:3456
----------------------------------------------------------+-------------
                                        82425.83MB   100% |   github.com/cockroachdb/cockroach/pkg/kv/kvserver/kvserverpb.(*ReplicatedEvalResult).Unmarshal /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/kvserverpb/proposer_kv.pb.go:2699
82425.83MB  4.29% 58.51% 82425.83MB  4.29%                | github.com/cockroachdb/cockroach/pkg/kv/kvserver/kvserverpb.(*ReplicatedEvalResult_AddSSTable).Unmarshal /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/kvserverpb/proposer_kv.pb.go:2906
----------------------------------------------------------+-------------

The two calls to VisibleColumns are both in csvRowProducer.Row:

// TODO(anzoteh96): this should really be only checked once per import instead of every row.
for _, col := range p.importCtx.tableDesc.VisibleColumns() {
if col.IsComputed() {
return nil,
errors.Newf(
"IMPORT CSV with computed column %q requires targeted column specification",
col.Name)
}
}
expectedColsLen = len(p.importCtx.tableDesc.VisibleColumns())

We should address the TODO here and avoid these calls. There's no reason that each row should have to allocate two or even one []descpb.ColumnDescriptor. ColumnDescriptors are large, and tables can have many columns.

We could also consider caching this filtered slice on the tabledesc.Immutable struct, though we'd need to confirm that doing so is worth the cost. We already store two almost identical []ColumnDescriptor slices on tabledesc.Immutable.

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-disaster-recoveryC-performancePerf of queries or internals. Solution not expected to change functional behavior.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions