osd,librados: add fingerprint functions for dedup tier#22987
osd,librados: add fingerprint functions for dedup tier#22987liewegas merged 8 commits intoceph:masterfrom
Conversation
a9147ae to
c078684
Compare
c078684 to
64b434b
Compare
64b434b to
d883af0
Compare
|
@liewegas These commits show the dedup implemetation with fixed chunking. Can you take a look?
|
src/osd/PrimaryLogPG.h
Outdated
| unsigned char fingerprint[CEPH_CRYPTO_SHA1_DIGESTSIZE + 1]; | ||
| public: | ||
| string do_fingerprint(bufferlist & list) { | ||
| char * ptr = list.c_str(); |
There was a problem hiding this comment.
instead of c_str(), this should iterate over the bufferlist buffers. it may not alrady be a contiguous buffer and c_str() might trigger an expensive rebuild().
There was a problem hiding this comment.
BTW this could be added as a bufferlist member, next to crc32c(). I'm not sure that's really where we want it long term, but they two can move together.
There was a problem hiding this comment.
hmm, maybe an implementation that adds a type sha1_digest_t (that's 128 or 160 or however many bits), and an operator<< and to_str() method that renders the usual hex string...
There was a problem hiding this comment.
I agree. I will fix it as your comment.
There was a problem hiding this comment.
Out of curiosity, why did you choose SHA1 and not XXHASH as the fingerprint?
There was a problem hiding this comment.
@mykaul SHA1 is just one of option. I'm focusing on dedup operations such as flushing, reference counting. But, as s future work, I will add other hash algorithms.
src/osd/PrimaryLogPG.cc
Outdated
| osd_op.indata.claim_append(chunk_data); | ||
| osd_op.op.flags = osd_op.op.flags | CEPH_OSD_OP_FLAG_WITH_DEDUP; | ||
| // add refcount op | ||
| call.source = tgt_soid; |
There was a problem hiding this comment.
the source is this object, not the target object (named as the sha1 hash)
src/osd/osd_types.h
Outdated
| TYPE_NONE = 0, | ||
| TYPE_REDIRECT = 1, | ||
| TYPE_CHUNKED = 2, | ||
| TYPE_DEDUPED = 3, |
There was a problem hiding this comment.
This doesn't seem like the place to add it. REDIRECT means the whole object is elsewhere, CHUNKED means pieces of it are elsewhere. Dedup seems like a particular case of either. Can we stick with CHUNKED here? Some chunks maybe deduped chunks, some not; some may be refcouned, some not. The current types already allow us to express that.
There was a problem hiding this comment.
Right. Thanks for your comment.
src/osd/osd_types.h
Outdated
| TYPE_DEDUPED = 3, | ||
| }; | ||
| enum { | ||
| TYPE_SHA1_FINGERPRINT = 1, |
There was a problem hiding this comment.
This seems like the important addition. It feels like a "policy" or preference, though: it does't tell us anything about the existing chunks (which may or may not use this hash, or any hash), only what we ought to do with any new chunks.
In fact, do we need to persist this at all?
There was a problem hiding this comment.
@liewegas This is post-dedup, which means the fingerprint is not generated when client's data is come. The fingerprint will be generated at flushing step. So, we need to keep fingerprint info such as sha1, sha256.
Without fingerprint info, how do we know which fingerprint method is used?
| } | ||
| void clear_flag(cflag_t f) { | ||
| flags = (cflag_t)(flags & ~f); | ||
| } |
There was a problem hiding this comment.
i would add clear_flags() { flags = 0; } and set_flagss(cflag_t t) { flags = t; }
src/osd/PrimaryLogPG.cc
Outdated
| break; | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
i don't understand why this is helping. and regardless, i don't think we should be special casing something like this in the generic OP_CALL case!
There was a problem hiding this comment.
@liewegas
Let assume that we flush the chunked object.
Operations look like below
- op[2] = "write op", "chunk_get"
Write op will make a new object, but the object will not be stored on filesystem yet when chunk_get is called. So, chunk_get will fail and it receives ENOENT. In this case, we need to use chunk_set instead of chunk_get because chunk_get read existing value first and then add a new value.
To handle this case, I think there are two solutions. First is making a new OP such as CEPH_OSD_OP_WRITE_WITH_REF. Second is adding a special case as above.
What is your opinion?
|
If we really need to persist a checksum preference, let's add a field to object_info_t.
I wonder, though, if it's necessary. The fingerprint, chunking policy, and so on is more likely going to be a pool property?
|
|
I would make a new operation, chunk_write_or_get, that does it in one go.
That's better anyway because you want to teh write part to turn into a
no-op if the object is already there (or, maybe, to do a read and compare
if we are feeling paranoid). Either way a single new op that doe sthe
atomic write-or-get is better, I think!
|
|
It might even be that the policy around how to chunk teh object and
what hash to use is a property of the *target* pool so that multiple
consumers will align their chunking strategies and dedup effectively.
Maybe!
|
286de2d to
cc0695a
Compare
|
@liewegas I updated commits as your comments. Can you take a look? |
1e98641 to
46a4213
Compare
src/osd/osd_types.h
Outdated
| default: return "unknown"; | ||
| } | ||
| } | ||
| fingerprint_t fingerprint; |
There was a problem hiding this comment.
pool_opts_t might be a better choice for this?
src/mon/OSDMonitor.cc
Outdated
| wait_for_finished_proposal(op, new Monitor::C_Command(mon, op, 0, rs, | ||
| get_last_committed() + 1)); | ||
| return true; | ||
| } else if (prefix == "osd pool fingerprint") { |
There was a problem hiding this comment.
I think this should be a pool property set via 'osd pool set '. Look at something like 'compression_mode' as a model
src/osd/PrimaryLogPG.cc
Outdated
|
|
||
| break; | ||
|
|
||
| case CEPH_OSD_OP_CHUNK_WRITE_OR_GET: |
There was a problem hiding this comment.
This logic looks right, but I think this should live in the same class as the GET/PUT.
In fact, now that I think about it, since we already have a specialized GET/PUT method (for chunks) that is different than the regular refcount behavior, perhaps we should create a new class calls 'cas' that collects all of these methods for managing the objects in the cas pool: WRITE_OR_GET, PUT, and eventually things like enumerating backreferences to support scrub, etc. That will also let us collect the user interface into a cls_cas_client.h (or whatever) for non-tiering users of the CAS pool (for example, we expect that RGW will write chunks directly into the cas tier).
There was a problem hiding this comment.
@liewegas I agree. A new class call is right way. I'll rebase this commit.
46a4213 to
85fd749
Compare
Signed-off-by: Myoungwon Oh <omwmw@sk.com>
Signed-off-by: Myoungwon Oh <omwmw@sk.com>
85fd749 to
2d6ca55
Compare
|
@liewegas Updated. Can you take a look ? |
| @@ -0,0 +1,45 @@ | |||
| // -*- mode:C; tab-width:8; c-basic-offset:2; indent-tabs-mode:t -*- | |||
There was a problem hiding this comment.
I think we should aim to have chunk_write_or_get, chunk_put, and chunk_read_refcount all in the same cls. When I suggested 'cas' I expected them all to be there, but I forgot that the manifest can also refcount a non-cas object.
The main thing that worries me is that the existing methods in cls_refcount are already used by rgw and we have to be careful not to break them. On the other hand, the refcounts I expect for the cas objects are a bit different, where I expect we'll have some objects will millions of refs and we want to do some clever things with the way those are efficiently represented and scrubbed. This makes me want a clean cls_cas... except for the non-fingerprint objects that aren't actually content-addressibly named. Maybe cls_chunk captures the spirit of both? And lets us pull all of the chunk refcounting into a single class without recursively calling into other cls_refcount methods.
I'm sorry to keep moving the goal posts around on this! If you want we can sketch this out in a bit more detail first before adjusting the code again. The main thing I want to ensure is that other consumers of the cas pool (besides the tiering code... probably rgw in the nearish future) have a simple and clean interface to consume.
src/include/types.h
Outdated
| WRITE_EQ_OPERATORS_1(errorcode32_t, code) | ||
| WRITE_CMP_OPERATORS_1(errorcode32_t, code) | ||
|
|
||
| struct sha1_fp_t { |
There was a problem hiding this comment.
nit: can we call this sha1_digest_t (fp == fingerprint i assume, but digest is clearer)
src/include/types.h
Outdated
| struct sha1_fp_t { | ||
| #define SHA1_DIGEST_SIZE 20 | ||
| unsigned char v[SHA1_DIGEST_SIZE]; | ||
| bool valid; |
There was a problem hiding this comment.
I'm not sure that the bool valid is necessary. The default ctor can just zero the sha1, and if we really need to represent an undefined value we can use boost::optional<sha1_digest_t>?
src/include/types.h
Outdated
|
|
||
| struct sha1_fp_t { | ||
| #define SHA1_DIGEST_SIZE 20 | ||
| unsigned char v[SHA1_DIGEST_SIZE]; |
Signed-off-by: Myoungwon Oh <omwmw@sk.com>
This commit change the inferface of set/get/clear flag in order to handle bitwise operation Signed-off-by: Myoungwon Oh <omwmw@sk.com>
cas class is introduced(cas class includes a write_or_get op) This operation increase the reference count if the chunk is already stored. Signed-off-by: Myoungwon Oh <omwmw@sk.com>
Signed-off-by: Myoungwon Oh <omwmw@sk.com>
Signed-off-by: Myoungwon Oh <omwmw@sk.com>
2ea5572 to
917062d
Compare
|
@liewegas Updated. please review again. sha1_digest_t is already defined in rgw_common.h So, we need another name. |
it's only used in 3 instances.. let's rename the rgw one to sha1_digest_array_t? Is that okay @rzarzynski ? |
|
@liewegas: I don't see any reason why not. Let's do. |
rename existing sha1_digest_t to sha1_digest_array_t and add a new sha1_digest_t Signed-off-by: Myoungwon Oh <omwmw@sk.com>
|
@liewegas rename is done. |
* refs/pull/22987/head: common,rgw: rename sha1_digest_t osd: decrement old chunk's reference count if the chunk has a reference. src/test: add a unit test osd: using fingerprint OID if fingerprint is set osd: add flag interfaces in chunk_info_t common/buffer.cc: add sha1 fingerprint osd: add fingerprint property mon: add a command to set fingerprint algorithm
This PR introduces chunking and fingerprint implementation.
Please refer to #21999.
Signed-off-by: Myoungwon Oh omwmw@sk.com