I see some unexpected memory consumption during the replication process. It is not reproducible on dev-env, but it shows up in other environments, both hardware and virtual machines.
If a node considers an object to have too few replicas, it reads the object into memory to put it onto other container nodes.
I reduced replicator.pool_size down to 1 and set a more aggressive GC target (GOGC=20). However, after some time I still see that some objects stay in memory longer than I expect.
```
heap profile: 63: 803864784 [36195: 73107425384] @ heap/1048576
6: 402702336 [122: 8188280832] @ 0x4d8e0b 0xc4ca2e 0xc520b5 0xc899b8 0xc89e9c 0xc897ae 0xc9a574 0xca48a8 0xc99e65 0xc99ad6 0xc939ef 0xc99a39 0xc9aa3f 0xd0d4c6 0xe12576 0xe112fa 0xe138bb 0xbcc0f7 0x46b921
# 0x4d8e0a os.ReadFile+0xea os/file.go:693
# 0xc4ca2d github.com/nspcc-dev/neofs-node/pkg/local_object_storage/blobstor/fstree.(*FSTree).Get+0xad github.com/nspcc-dev/neofs-node/pkg/local_object_storage/blobstor/fstree/fstree.go:304
# 0xc520b4 github.com/nspcc-dev/neofs-node/pkg/local_object_storage/blobstor.(*BlobStor).Get+0x2f4 github.com/nspcc-dev/neofs-node/pkg/local_object_storage/blobstor/get.go:20
# 0xc899b7 github.com/nspcc-dev/neofs-node/pkg/local_object_storage/shard.(*Shard).Get.func1+0xb7 github.com/nspcc-dev/neofs-node/pkg/local_object_storage/shard/get.go:73
# 0xc89e9b github.com/nspcc-dev/neofs-node/pkg/local_object_storage/shard.(*Shard).fetchObjectData+0x41b github.com/nspcc-dev/neofs-node/pkg/local_object_storage/shard/get.go:127
# 0xc897ad github.com/nspcc-dev/neofs-node/pkg/local_object_storage/shard.(*Shard).Get+0x22d github.com/nspcc-dev/neofs-node/pkg/local_object_storage/shard/get.go:86
# 0xc9a573 github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).get.func1+0x113 github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/get.go:84
# 0xca48a7 github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).iterateOverSortedShards+0xc7 github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/shards.go:225
# 0xc99e64 github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).get+0x324 github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/get.go:78
# 0xc99ad5 github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).Get.func1+0x55 github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/get.go:48
# 0xc939ee github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).execIfNotBlocked+0xce github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/control.go:147
# 0xc99a38 github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).Get+0xb8 github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/get.go:47
# 0xc9aa3e github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.Get+0x9e github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/get.go:172
# 0xd0d4c5 github.com/nspcc-dev/neofs-node/pkg/services/replicator.(*Replicator).HandleTask+0x105 github.com/nspcc-dev/neofs-node/pkg/services/replicator/process.go:30
# 0xe12575 github.com/nspcc-dev/neofs-node/pkg/services/policer.(*Policer).processNodes+0xfd5 github.com/nspcc-dev/neofs-node/pkg/services/policer/check.go:241
# 0xe112f9 github.com/nspcc-dev/neofs-node/pkg/services/policer.(*Policer).processObject+0xc19 github.com/nspcc-dev/neofs-node/pkg/services/policer/check.go:127
# 0xe138ba github.com/nspcc-dev/neofs-node/pkg/services/policer.(*Policer).shardPolicyWorker.func1+0x17a github.com/nspcc-dev/neofs-node/pkg/services/policer/process.go:65
# 0xbcc0f6 github.com/panjf2000/ants/v2.(*goWorker).run.func1+0x96 github.com/panjf2000/ants/v2@v2.4.0/worker.go:68
```
In this memory profile there are 6 objects of 64 MiB each (the maximum object size in the network) held in memory. This number fluctuates but tends to grow over time (later in the same run I saw 10 objects). Maybe we can do something about that.
Maybe it is just a GC thing (need to try GOMEMLIMIT with a go1.19 build), maybe something else.
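For reference, a text dump like the one above can be produced with the standard runtime/pprof package, independent of any HTTP pprof endpoint the node may or may not expose; a minimal sketch:

```go
package main

import (
	"os"
	"runtime"
	"runtime/pprof"
)

// dumpHeap writes the "heap" profile; debug=1 yields the same
// human-readable "heap profile: ..." text format quoted above, with
// resolved stacks for each allocation site.
func dumpHeap(path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	runtime.GC() // flush pending frees so in-use numbers are current
	return pprof.Lookup("heap").WriteTo(f, 1)
}

func main() {
	if err := dumpHeap("heap.pprof"); err != nil {
		panic(err)
	}
}
```

The in-use columns (the numbers before the brackets) are what matter here: they show live bytes still retained, as opposed to cumulative allocations in the bracketed columns.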
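On the GC angle: since go1.19 the GOMEMLIMIT soft limit can also be set programmatically and combined with a low GOGC. A hedged sketch of both knobs (the values are illustrative, not a tuning recommendation):

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// tuneGC applies the two knobs discussed above: a soft heap limit
// (the programmatic form of GOMEMLIMIT, go1.19+) and an aggressive
// GC target (the programmatic form of GOGC=20). Both setters return
// the previous value, which is handy for restoring defaults.
func tuneGC(limitBytes int64, gogc int) (prevLimit int64, prevGOGC int) {
	prevLimit = debug.SetMemoryLimit(limitBytes)
	prevGOGC = debug.SetGCPercent(gogc)
	return
}

func main() {
	pl, pg := tuneGC(4<<30, 20) // 4 GiB soft limit, GOGC=20
	fmt.Printf("previous settings: GOMEMLIMIT=%d GOGC=%d\n", pl, pg)
}
```

Unlike GOGC alone, the memory limit makes the collector run harder only as the heap approaches the cap, so it may keep the 64 MiB buffers from accumulating without paying GOGC=20's CPU cost all the time.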
To reduce the number of object reads, the node might want to skip replication when it receives access denied (status 2048) errors (related to #1709). This can happen when an eACL restricts system operations.
Any other ideas are appreciated.
/cc @fyrchik