Skip to content

engine: Support reading objects without decoding#2753

Merged
cthulhu-rider merged 2 commits intomasterfrom
1723-storage-get-binary
Mar 5, 2024
Merged

engine: Support reading objects without decoding#2753
cthulhu-rider merged 2 commits intomasterfrom
1723-storage-get-binary

Conversation

@cthulhu-rider
Copy link
Contributor

No description provided.

@codecov
Copy link

codecov bot commented Feb 26, 2024

Codecov Report

Attention: Patch coverage is 79.05759% with 40 lines in your changes are missing coverage. Please review.

Project coverage is 22.03%. Comparing base (2da3676) to head (09e385c).

Files Patch % Lines
pkg/local_object_storage/blobstor/fstree/fstree.go 44.44% 11 Missing and 4 partials ⚠️
pkg/local_object_storage/blobstor/peapod/peapod.go 66.66% 6 Missing and 1 partial ⚠️
cmd/blobovnicza-to-peapod/blobovniczatree/get.go 0.00% 5 Missing ⚠️
...al_object_storage/blobstor/compression/compress.go 0.00% 5 Missing ⚠️
pkg/local_object_storage/engine/get.go 89.18% 2 Missing and 2 partials ⚠️
pkg/local_object_storage/shard/get.go 95.34% 2 Missing ⚠️
pkg/local_object_storage/writecache/get.go 87.50% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2753      +/-   ##
==========================================
+ Coverage   21.89%   22.03%   +0.14%     
==========================================
  Files         787      787              
  Lines       46643    46763     +120     
==========================================
+ Hits        10211    10304      +93     
- Misses      35555    35576      +21     
- Partials      877      883       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@carpawell carpawell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, not valuable to me as a separate PR. Not clear why we need it, operation params are questionable, the issue does not describe why this should be done (it appeared earlier than the one we are trying to resolve now).

return nil, fmt.Errorf("stat object file %q: %w", p, err)
}
sz := fi.Size()
if sz > math.MaxInt {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what this check does save us from?
@roman-khimov

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only from garbage in the storage maybe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise data cannot be fully read

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but how int64 can be bigger than MaxInt? do we target such systems? can we run on such systems?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on 32-bit systems

do we target such systems? can we run on such systems?

yes, yes

Copy link
Member

@carpawell carpawell Mar 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for me it is no, no. i had to google examples of such machines (meaning years of their production), also it seems that you have a file/object bigger than such machines can have ram

Copy link
Contributor Author

@cthulhu-rider cthulhu-rider Mar 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

machines (meaning years of their production)

this may be at the OS level

anyway, i dont see such limitation in neither docs nor code

b = make([]byte, sz)
}

_, err = io.ReadFull(f, b)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not streaming?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

u mean return io.Reader? no need for now, may be added later

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have big problems with memory, i thought that extending the storage interface with row data should be done for memory optimizations mainly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for memory optimizations

it is: as i mentioned in the commit, in some cases we wanna skip decode-encode round

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i mean reading a full object in memory at once is not good to me in a PR that tries to beat memory problems

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR that tries to beat memory problems

it doesn't. Main commit says what problem it solves - excessive decode-encode round

@cthulhu-rider
Copy link
Contributor Author

Not clear why we need it

tried to explain in the commit. This is 100%-needed functionality for any storage IMO - just get what u store as []byte

Copy link
Member

@roman-khimov roman-khimov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need better coverage as well.

return nil, fmt.Errorf("stat object file %q: %w", p, err)
}
sz := fi.Size()
if sz > math.MaxInt {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only from garbage in the storage maybe.

@cthulhu-rider cthulhu-rider force-pushed the 1723-storage-get-binary branch from 8654d4f to 06a5d3b Compare March 1, 2024 10:48
@cthulhu-rider
Copy link
Contributor Author

increased test coverage. Fixed old bug on the way

@cthulhu-rider cthulhu-rider force-pushed the 1723-storage-get-binary branch 2 times, most recently from b306b0d to 5123eda Compare March 1, 2024 10:59
@roman-khimov
Copy link
Member

We need to have some broader context for this. It looks like a beginning of a custom memory allocator which just shouldn't be done. We can mmap things, we can stream things, we can pool some chunks (but it doesn't work well for objects of different sizes unfortunately). If nothing else works in-place allocation is an option too (which we had for years anyway, just in a bit different form), especially given that it will be more effective already (one flat buffer, no decoding).

@cthulhu-rider
Copy link
Contributor Author

ok lets use make only for now

@cthulhu-rider cthulhu-rider force-pushed the 1723-storage-get-binary branch from 5123eda to cf86ba6 Compare March 4, 2024 12:58
@cthulhu-rider
Copy link
Contributor Author

dropped options

Previously, the local object storage engine always decoded the requested
objects. In cases where it was necessary to obtain a binary object, an
excessive decoding-encoding round was performed.

Now `GetBytes` method to read object binary is provided.

Closes #1723.

Signed-off-by: Leonard Lyubich <leonard@morphbits.io>
Previously, `Shard.Get*` methods returned unset meta presence flag
if when object was requested and found in the metabase. The
documentation did not explain why. Callers do not isolate the case of a
cache hit (this is not visible from the return anyway). In total, this
behavior was recognized as incorrect and fixed.

Refs 69e1e6c.

Signed-off-by: Leonard Lyubich <leonard@morphbits.io>
@cthulhu-rider cthulhu-rider force-pushed the 1723-storage-get-binary branch from cf86ba6 to 09e385c Compare March 4, 2024 14:18
Copy link
Member

@carpawell carpawell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets try it, but IMO we need object streaming RPC for a correct optimization.

@cthulhu-rider
Copy link
Contributor Author

we need object streaming RPC for a correct optimization

two different topics. Here we cover redundant decoding, while ur talking about smaller bufferization (which is also good to be done, but unrelated to the current PR/issue)

@cthulhu-rider cthulhu-rider merged commit 14b650c into master Mar 5, 2024
@cthulhu-rider cthulhu-rider deleted the 1723-storage-get-binary branch March 5, 2024 08:39
@carpawell
Copy link
Member

two different topics

Not different to me, the main idea is the same.

ur talking about smaller bufferization

Im talking about less memory consumption and/or allocations number. RPC that streams an object with partial reading may improve important things by orders of magnitude, while reading the whole object into memory and transferring it as a single message should only make it a few times better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants