-
Notifications
You must be signed in to change notification settings - Fork 4.1k
bulk-io: Immutable.VisibleColumns responsible for over half of heap allocated memory in IMPORT #55842
Description
When running a large import, I see two nearby calls to tabledesc.(*Immutable).VisibleColumns responsible for 54% of all heap-allocated space on one of the machines. This seems pretty bad – to put this in context, we only allocate about a 1/10th of this amount of heap memory in service of AddSSTable requests.
Type: alloc_space
Time: Oct 22, 2020 at 2:27am (UTC)
Showing nodes accounting for 1923370MB, 100% of 1923370MB total
----------------------------------------------------------+-------------
flat flat% sum% cum cum% calls calls% + context
----------------------------------------------------------+-------------
522060.88MB 50.05% | github.com/cockroachdb/cockroach/pkg/ccl/importccl.(*csvRowProducer).Row /go/src/github.com/cockroachdb/cockroach/pkg/ccl/importccl/read_import_csv.go:146
520953.11MB 49.95% | github.com/cockroachdb/cockroach/pkg/ccl/importccl.(*csvRowProducer).Row /go/src/github.com/cockroachdb/cockroach/pkg/ccl/importccl/read_import_csv.go:138
2.51MB 0.00024% | github.com/cockroachdb/cockroach/pkg/sql/row.NewDatumRowConverter /go/src/github.com/cockroachdb/cockroach/pkg/sql/row/row_converter.go:266
1043016.50MB 54.23% 54.23% 1043016.50MB 54.23% | github.com/cockroachdb/cockroach/pkg/sql/catalog/tabledesc.(*Immutable).VisibleColumns /go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/tabledesc/structured.go:3456
----------------------------------------------------------+-------------
82425.83MB 100% | github.com/cockroachdb/cockroach/pkg/kv/kvserver/kvserverpb.(*ReplicatedEvalResult).Unmarshal /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/kvserverpb/proposer_kv.pb.go:2699
82425.83MB 4.29% 58.51% 82425.83MB 4.29% | github.com/cockroachdb/cockroach/pkg/kv/kvserver/kvserverpb.(*ReplicatedEvalResult_AddSSTable).Unmarshal /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/kvserverpb/proposer_kv.pb.go:2906
----------------------------------------------------------+-------------
The two calls to VisibleColumns are both in csvRowProducer.Row:
cockroach/pkg/ccl/importccl/read_import_csv.go
Lines 137 to 146 in 40b7942
| // TODO(anzoteh96): this should really be only checked once per import instead of every row. | |
| for _, col := range p.importCtx.tableDesc.VisibleColumns() { | |
| if col.IsComputed() { | |
| return nil, | |
| errors.Newf( | |
| "IMPORT CSV with computed column %q requires targeted column specification", | |
| col.Name) | |
| } | |
| } | |
| expectedColsLen = len(p.importCtx.tableDesc.VisibleColumns()) |
We should address the TODO here and avoid these calls. There's no reason that each row should have to allocate two or even one []descpb.ColumnDescriptor. ColumnDescriptors are large, and tables can have many columns.
We could also consider caching this filtered slice on the tabledesc.Immutable struct, though we'd need to confirm that doing so is worth the cost. We already store two almost identical []ColumnDescriptor slices on tabledesc.Immutable.
