Skip to content

osd,librados: add fingerprint functions for dedup tier#22987

Merged
liewegas merged 8 commits intoceph:masterfrom
myoungwon:wip-chunk-fingerprint
Sep 11, 2018
Merged

osd,librados: add fingerprint functions for dedup tier#22987
liewegas merged 8 commits intoceph:masterfrom
myoungwon:wip-chunk-fingerprint

Conversation

@myoungwon
Copy link
Member

@myoungwon myoungwon commented Jul 11, 2018

This PR introduces chunking and fingerprint implementation.

Please refer to #21999.

  • add fingerprint interfaces
  • implement post-dedup

Signed-off-by: Myoungwon Oh omwmw@sk.com

@myoungwon myoungwon changed the title WIP: osd,librados: add chunking and fingerprint WIP: osd,librados: add chunking and fingerprint functions for dedup Jul 11, 2018
@myoungwon myoungwon changed the title WIP: osd,librados: add chunking and fingerprint functions for dedup WIP: osd,librados: add chunking and fingerprint functions for dedup tier Jul 11, 2018
@myoungwon myoungwon force-pushed the wip-chunk-fingerprint branch 2 times, most recently from a9147ae to c078684 Compare July 25, 2018 05:37
@myoungwon myoungwon force-pushed the wip-chunk-fingerprint branch from c078684 to 64b434b Compare August 11, 2018 16:49
@myoungwon myoungwon changed the title WIP: osd,librados: add chunking and fingerprint functions for dedup tier WIP: osd,librados: add fingerprint functions for dedup tier Aug 11, 2018
@myoungwon myoungwon force-pushed the wip-chunk-fingerprint branch from 64b434b to d883af0 Compare August 12, 2018 13:08
@myoungwon
Copy link
Member Author

@liewegas These commits show the dedup implemetation with fixed chunking. Can you take a look?

  1. I think set_chunk is very similar with fixed chunking. So, fingerprint and dedup process can be applied to the set_chunk op. Other chunking method with a new interface (e.g., contents defined chunking..) would be added later.

  2. After finishing this work, the scrub process to fix leaking reference should be added. I think many test cases (not only unit test, but also stress tests) can be added at that time.

  3. The key operation is that the base tier send write op + ref_count op when flushing is needed.

@myoungwon myoungwon requested a review from liewegas August 15, 2018 13:15
unsigned char fingerprint[CEPH_CRYPTO_SHA1_DIGESTSIZE + 1];
public:
string do_fingerprint(bufferlist & list) {
char * ptr = list.c_str();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of c_str(), this should iterate over the bufferlist buffers. it may not alrady be a contiguous buffer and c_str() might trigger an expensive rebuild().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW this could be added as a bufferlist member, next to crc32c(). I'm not sure that's really where we want it long term, but they two can move together.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, maybe an implementation that adds a type sha1_digest_t (that's 128 or 160 or however many bits), and an operator<< and to_str() method that renders the usual hex string...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I will fix it as your comment.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, why did you choose SHA1 and not XXHASH as the fingerprint?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mykaul SHA1 is just one of option. I'm focusing on dedup operations such as flushing, reference counting. But, as s future work, I will add other hash algorithms.

osd_op.indata.claim_append(chunk_data);
osd_op.op.flags = osd_op.op.flags | CEPH_OSD_OP_FLAG_WITH_DEDUP;
// add refcount op
call.source = tgt_soid;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the source is this object, not the target object (named as the sha1 hash)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right.

TYPE_NONE = 0,
TYPE_REDIRECT = 1,
TYPE_CHUNKED = 2,
TYPE_DEDUPED = 3,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem like the place to add it. REDIRECT means the whole object is elsewhere, CHUNKED means pieces of it are elsewhere. Dedup seems like a particular case of either. Can we stick with CHUNKED here? Some chunks maybe deduped chunks, some not; some may be refcouned, some not. The current types already allow us to express that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Thanks for your comment.

TYPE_DEDUPED = 3,
};
enum {
TYPE_SHA1_FINGERPRINT = 1,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like the important addition. It feels like a "policy" or preference, though: it does't tell us anything about the existing chunks (which may or may not use this hash, or any hash), only what we ought to do with any new chunks.

In fact, do we need to persist this at all?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liewegas This is post-dedup, which means the fingerprint is not generated when client's data is come. The fingerprint will be generated at flushing step. So, we need to keep fingerprint info such as sha1, sha256.
Without fingerprint info, how do we know which fingerprint method is used?

}
void clear_flag(cflag_t f) {
flags = (cflag_t)(flags & ~f);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would add clear_flags() { flags = 0; } and set_flagss(cflag_t t) { flags = t; }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

break;
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't understand why this is helping. and regardless, i don't think we should be special casing something like this in the generic OP_CALL case!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liewegas
Let assume that we flush the chunked object.
Operations look like below

  • op[2] = "write op", "chunk_get"

Write op will make a new object, but the object will not be stored on filesystem yet when chunk_get is called. So, chunk_get will fail and it receives ENOENT. In this case, we need to use chunk_set instead of chunk_get because chunk_get read existing value first and then add a new value.

To handle this case, I think there are two solutions. First is making a new OP such as CEPH_OSD_OP_WRITE_WITH_REF. Second is adding a special case as above.

What is your opinion?

@liewegas
Copy link
Member

liewegas commented Aug 17, 2018 via email

@liewegas
Copy link
Member

liewegas commented Aug 17, 2018 via email

@liewegas
Copy link
Member

liewegas commented Aug 17, 2018 via email

@myoungwon myoungwon force-pushed the wip-chunk-fingerprint branch from 286de2d to cc0695a Compare August 24, 2018 08:32
@myoungwon myoungwon changed the title WIP: osd,librados: add fingerprint functions for dedup tier osd,librados: add fingerprint functions for dedup tier Aug 24, 2018
@myoungwon myoungwon removed the DNM label Aug 24, 2018
@myoungwon
Copy link
Member Author

@liewegas I updated commits as your comments. Can you take a look?

@myoungwon myoungwon requested a review from liewegas August 24, 2018 11:06
@myoungwon myoungwon force-pushed the wip-chunk-fingerprint branch 2 times, most recently from 1e98641 to 46a4213 Compare August 25, 2018 03:07
default: return "unknown";
}
}
fingerprint_t fingerprint;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pool_opts_t might be a better choice for this?

wait_for_finished_proposal(op, new Monitor::C_Command(mon, op, 0, rs,
get_last_committed() + 1));
return true;
} else if (prefix == "osd pool fingerprint") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be a pool property set via 'osd pool set '. Look at something like 'compression_mode' as a model

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok


break;

case CEPH_OSD_OP_CHUNK_WRITE_OR_GET:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic looks right, but I think this should live in the same class as the GET/PUT.

In fact, now that I think about it, since we already have a specialized GET/PUT method (for chunks) that is different than the regular refcount behavior, perhaps we should create a new class calls 'cas' that collects all of these methods for managing the objects in the cas pool: WRITE_OR_GET, PUT, and eventually things like enumerating backreferences to support scrub, etc. That will also let us collect the user interface into a cls_cas_client.h (or whatever) for non-tiering users of the CAS pool (for example, we expect that RGW will write chunks directly into the cas tier).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liewegas I agree. A new class call is right way. I'll rebase this commit.

@myoungwon myoungwon force-pushed the wip-chunk-fingerprint branch from 46a4213 to 85fd749 Compare September 2, 2018 03:15
Signed-off-by: Myoungwon Oh <omwmw@sk.com>
Signed-off-by: Myoungwon Oh <omwmw@sk.com>
@myoungwon myoungwon force-pushed the wip-chunk-fingerprint branch from 85fd749 to 2d6ca55 Compare September 2, 2018 03:33
@myoungwon
Copy link
Member Author

@liewegas Updated. Can you take a look ?

@myoungwon myoungwon requested a review from liewegas September 2, 2018 05:20
@@ -0,0 +1,45 @@
// -*- mode:C; tab-width:8; c-basic-offset:2; indent-tabs-mode:t -*-
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should aim to have chunk_write_or_get, chunk_put, and chunk_read_refcount all in the same cls. When I suggested 'cas' I expected them all to be there, but I forgot that the manifest can also refcount a non-cas object.

The main thing that worries me is that the existing methods in cls_refcount are already used by rgw and we have to be careful not to break them. On the other hand, the refcounts I expect for the cas objects are a bit different, where I expect we'll have some objects will millions of refs and we want to do some clever things with the way those are efficiently represented and scrubbed. This makes me want a clean cls_cas... except for the non-fingerprint objects that aren't actually content-addressibly named. Maybe cls_chunk captures the spirit of both? And lets us pull all of the chunk refcounting into a single class without recursively calling into other cls_refcount methods.

I'm sorry to keep moving the goal posts around on this! If you want we can sketch this out in a bit more detail first before adjusting the code again. The main thing I want to ensure is that other consumers of the cas pool (besides the tiering code... probably rgw in the nearish future) have a simple and clean interface to consume.

WRITE_EQ_OPERATORS_1(errorcode32_t, code)
WRITE_CMP_OPERATORS_1(errorcode32_t, code)

struct sha1_fp_t {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we call this sha1_digest_t (fp == fingerprint i assume, but digest is clearer)

struct sha1_fp_t {
#define SHA1_DIGEST_SIZE 20
unsigned char v[SHA1_DIGEST_SIZE];
bool valid;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that the bool valid is necessary. The default ctor can just zero the sha1, and if we really need to represent an undefined value we can use boost::optional<sha1_digest_t>?


struct sha1_fp_t {
#define SHA1_DIGEST_SIZE 20
unsigned char v[SHA1_DIGEST_SIZE];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

= {0}

Signed-off-by: Myoungwon Oh <omwmw@sk.com>
This commit change the inferface of set/get/clear flag
in order to handle bitwise operation

Signed-off-by: Myoungwon Oh <omwmw@sk.com>
cas class is introduced(cas class includes a write_or_get op)
This operation increase the reference count
if the chunk is already stored.

Signed-off-by: Myoungwon Oh <omwmw@sk.com>
Signed-off-by: Myoungwon Oh <omwmw@sk.com>
@myoungwon myoungwon force-pushed the wip-chunk-fingerprint branch 2 times, most recently from 2ea5572 to 917062d Compare September 7, 2018 13:00
@myoungwon
Copy link
Member Author

@liewegas Updated. please review again. sha1_digest_t is already defined in rgw_common.h So, we need another name.

@myoungwon myoungwon requested a review from liewegas September 7, 2018 14:31
@liewegas
Copy link
Member

liewegas commented Sep 7, 2018

../src/rgw/rgw_common.h:using sha1_digest_t = \
../src/rgw/rgw_common.h:static inline sha1_digest_t
../src/rgw/rgw_common.h:  sha1_digest_t dest;

it's only used in 3 instances.. let's rename the rgw one to sha1_digest_array_t? Is that okay @rzarzynski ?

@rzarzynski
Copy link
Contributor

@liewegas: I don't see any reason why not. Let's do.

rename existing sha1_digest_t to sha1_digest_array_t
and add a new sha1_digest_t

Signed-off-by: Myoungwon Oh <omwmw@sk.com>
@myoungwon
Copy link
Member Author

@liewegas rename is done.

@liewegas liewegas merged commit da749d6 into ceph:master Sep 11, 2018
liewegas added a commit that referenced this pull request Sep 11, 2018
* refs/pull/22987/head:
	common,rgw: rename sha1_digest_t
	osd: decrement old chunk's reference count if the chunk has a reference.
	src/test: add a unit test
	osd: using fingerprint OID if fingerprint is set
	osd: add flag interfaces in chunk_info_t
	common/buffer.cc: add sha1 fingerprint
	osd: add fingerprint property
	mon: add a command to set fingerprint algorithm
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants