osd,librados: add fingerprint functions for dedup tier by myoungwon · Pull Request #22987 · ceph/ceph

myoungwon · 2018-07-11T09:21:10Z

This PR introduces the chunking and fingerprint implementation.

Please refer to #21999.

add fingerprint interfaces
implement post-dedup

Signed-off-by: Myoungwon Oh <omwmw@sk.com>

myoungwon · 2018-08-15T13:14:49Z

@liewegas These commits show the dedup implementation with fixed chunking. Can you take a look?

I think set_chunk is very similar to fixed chunking, so the fingerprint and dedup process can be applied to the set_chunk op. Other chunking methods with a new interface (e.g., content-defined chunking) will be added later.
After finishing this work, a scrub process to fix leaked references should be added. I think many test cases (not only unit tests, but also stress tests) can be added at that time.
The key operation is that the base tier sends a write op plus a ref_count op when flushing is needed.
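The flush step described in this comment can be sketched as a plan of per-chunk steps: split the object into fixed-size chunks and, for each one, pair a write to the dedup tier with a reference taken on behalf of the source object. This is a minimal sketch under stated assumptions; `FlushOp` and `plan_fixed_chunks` are hypothetical names, not code from this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One planned step of a (hypothetical) flush: write this chunk to the
// dedup tier and take a reference on it on behalf of `source`.
struct FlushOp {
  uint64_t offset;     // offset of the chunk within the source object
  uint64_t length;     // chunk length (the last chunk may be shorter)
  std::string source;  // object that will hold the reference
};

// Split an object of `object_size` bytes into fixed-size chunks.
// Assumes chunk_size > 0.
std::vector<FlushOp> plan_fixed_chunks(const std::string& source,
                                       uint64_t object_size,
                                       uint64_t chunk_size) {
  std::vector<FlushOp> ops;
  for (uint64_t off = 0; off < object_size; off += chunk_size) {
    uint64_t len = std::min(chunk_size, object_size - off);
    ops.push_back({off, len, source});
  }
  return ops;
}
```

For a 10-byte object and 4-byte chunks this yields chunks at offsets 0, 4, and 8, the last one 2 bytes long.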

liewegas · 2018-08-16T19:55:36Z

src/osd/PrimaryLogPG.h

+  unsigned char fingerprint[CEPH_CRYPTO_SHA1_DIGESTSIZE + 1];
+public:
+  string do_fingerprint(bufferlist & list) {
+    char * ptr = list.c_str();


instead of c_str(), this should iterate over the bufferlist buffers. it may not already be a contiguous buffer and c_str() might trigger an expensive rebuild().

BTW this could be added as a bufferlist member, next to crc32c(). I'm not sure that's really where we want it long term, but the two can move together.

hmm, maybe an implementation that adds a type sha1_digest_t (that's 128 or 160 or however many bits), and an operator<< and to_str() method that renders the usual hex string...

I agree. I will fix it per your comment.

Out of curiosity, why did you choose SHA1 and not XXHASH as the fingerprint?

@mykaul SHA1 is just one option. I'm focusing on dedup operations such as flushing and reference counting. But as future work, I will add other hash algorithms.
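The first review comment in this thread suggests hashing the bufferlist segment by segment rather than calling c_str(), which may force a rebuild into one contiguous buffer. The sketch below shows that pattern with a self-contained incremental SHA-1. A `std::vector<std::string>` stands in for the bufferlist's list of buffers (an assumption for illustration), and the names `Sha1`, `sha1_segments`, and `to_hex` are hypothetical; real code would iterate the bufferlist's buffers and use Ceph's existing crypto wrappers (the diff already references `CEPH_CRYPTO_SHA1_DIGESTSIZE`) rather than a hand-rolled hash.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Minimal incremental SHA-1 (FIPS 180-4), for illustration only.
class Sha1 {
 public:
  // Feed bytes; may be called once per buffer segment.
  void update(const unsigned char* p, size_t n) {
    for (size_t i = 0; i < n; ++i) {
      block_[used_++] = p[i];
      if (used_ == 64) { compress(); used_ = 0; }
    }
    total_ += n;
  }
  // Pad, append the 64-bit big-endian bit length, and emit the digest.
  std::array<unsigned char, 20> digest() {
    const uint64_t bits = total_ * 8;
    const unsigned char one = 0x80, zero = 0x00;
    update(&one, 1);
    while (used_ != 56) update(&zero, 1);
    unsigned char len[8];
    for (int i = 0; i < 8; ++i) len[i] = static_cast<unsigned char>(bits >> (56 - 8 * i));
    update(len, 8);  // completes the final block
    std::array<unsigned char, 20> out{};
    for (int i = 0; i < 5; ++i)
      for (int j = 0; j < 4; ++j)
        out[4 * i + j] = static_cast<unsigned char>(h_[i] >> (24 - 8 * j));
    return out;
  }

 private:
  static uint32_t rol(uint32_t x, int s) { return (x << s) | (x >> (32 - s)); }
  void compress() {
    uint32_t w[80];
    for (int i = 0; i < 16; ++i)
      w[i] = (uint32_t(block_[4 * i]) << 24) | (uint32_t(block_[4 * i + 1]) << 16) |
             (uint32_t(block_[4 * i + 2]) << 8) | uint32_t(block_[4 * i + 3]);
    for (int i = 16; i < 80; ++i) w[i] = rol(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);
    uint32_t a = h_[0], b = h_[1], c = h_[2], d = h_[3], e = h_[4];
    for (int i = 0; i < 80; ++i) {
      uint32_t f, k;
      if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
      else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
      else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
      else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
      const uint32_t t = rol(a, 5) + f + e + k + w[i];
      e = d; d = c; c = rol(b, 30); b = a; a = t;
    }
    h_[0] += a; h_[1] += b; h_[2] += c; h_[3] += d; h_[4] += e;
  }
  uint32_t h_[5] = {0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0};
  unsigned char block_[64];
  size_t used_ = 0;
  uint64_t total_ = 0;
};

// Hash several non-contiguous segments without first copying them into
// one contiguous string (the c_str() concern from the review).
std::array<unsigned char, 20> sha1_segments(const std::vector<std::string>& segs) {
  Sha1 h;
  for (const auto& s : segs)
    h.update(reinterpret_cast<const unsigned char*>(s.data()), s.size());
  return h.digest();
}

// Render a digest as the usual lowercase hex string.
std::string to_hex(const std::array<unsigned char, 20>& d) {
  static const char* hex = "0123456789abcdef";
  std::string out;
  for (unsigned char c : d) { out += hex[c >> 4]; out += hex[c & 0xf]; }
  return out;
}
```

Because the hash state carries across update() calls, splitting the input into any number of segments gives the same digest as hashing it in one piece.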

liewegas · 2018-08-16T20:00:45Z

src/osd/PrimaryLogPG.cc

+	  osd_op.indata.claim_append(chunk_data);
+	  osd_op.op.flags = osd_op.op.flags | CEPH_OSD_OP_FLAG_WITH_DEDUP;
+	  // add refcount op
+	  call.source = tgt_soid;


the source is this object, not the target object (named as the sha1 hash)

liewegas · 2018-08-16T20:07:54Z

src/osd/osd_types.h

    TYPE_NONE = 0,
    TYPE_REDIRECT = 1, 
    TYPE_CHUNKED = 2, 
+    TYPE_DEDUPED = 3, 


This doesn't seem like the place to add it. REDIRECT means the whole object is elsewhere; CHUNKED means pieces of it are elsewhere. Dedup seems like a particular case of either. Can we stick with CHUNKED here? Some chunks may be deduped chunks, some not; some may be refcounted, some not. The current types already allow us to express that.

Right. Thanks for your comment.
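To illustrate the point that CHUNKED can already express a mix of deduped and non-deduped chunks, here is a hypothetical manifest that maps logical offsets to chunk descriptors and finds the chunk covering a given offset. `ChunkRef`, `ChunkMap`, and `find_chunk` are assumptions for illustration, not Ceph's `chunk_info_t` or its chunk map.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical per-chunk descriptor: where the chunk lives and whether
// its target object is named by a content fingerprint.
struct ChunkRef {
  uint64_t length;
  std::string target_oid;  // fingerprint-named object, or any other object
  bool deduped;            // true if target_oid is content-addressed
};

// logical offset -> chunk covering [offset, offset + length)
using ChunkMap = std::map<uint64_t, ChunkRef>;

// Return the chunk covering `off`, if any.
std::optional<ChunkRef> find_chunk(const ChunkMap& m, uint64_t off) {
  auto it = m.upper_bound(off);  // first chunk starting after off
  if (it == m.begin()) return std::nullopt;
  --it;                          // last chunk starting at or before off
  if (off < it->first + it->second.length) return it->second;
  return std::nullopt;           // off falls in a hole between chunks
}
```

Whether a chunk is deduped is then just a per-chunk property, with no need for a separate manifest type.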

liewegas · 2018-08-16T20:11:03Z

src/osd/osd_types.h

+    TYPE_DEDUPED = 3, 
+  };
+  enum {
+    TYPE_SHA1_FINGERPRINT = 1,


This seems like the important addition. It feels like a "policy" or preference, though: it doesn't tell us anything about the existing chunks (which may or may not use this hash, or any hash), only what we ought to do with any new chunks.

In fact, do we need to persist this at all?

@liewegas This is post-dedup, which means the fingerprint is not generated when the client's data arrives. The fingerprint is generated at the flushing step, so we need to keep the fingerprint info (e.g., sha1, sha256).
Without fingerprint info, how do we know which fingerprint method was used?

liewegas · 2018-08-16T20:24:31Z

src/osd/osd_types.h

+  }
+  void clear_flag(cflag_t f) {
+    flags = (cflag_t)(flags & ~f);
+  }


i would add clear_flags() { flags = 0; } and set_flags(cflag_t t) { flags = t; }
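The suggested helpers, next to the existing set/clear, could look like the sketch below. Only the `cflag_t` name and the shape of clear_flag() come from the diff; the struct name and the flag values are hypothetical placeholders, not the flags chunk_info_t actually defines.

```cpp
#include <cassert>
#include <cstdint>

struct chunk_flags_sketch {
  // Hypothetical flag bits; the real chunk_info_t defines its own set.
  enum cflag_t : uint32_t {
    FLAG_NONE = 0,
    FLAG_DIRTY = 1,
    FLAG_HAS_REFERENCE = 2,
    FLAG_HAS_FINGERPRINT = 4,
  };
  cflag_t flags = FLAG_NONE;

  void set_flag(cflag_t f) { flags = (cflag_t)(flags | f); }     // add bits
  void clear_flag(cflag_t f) { flags = (cflag_t)(flags & ~f); }  // drop bits
  bool test_flag(cflag_t f) const { return (flags & f) == f; }   // all bits set?
  void clear_flags() { flags = FLAG_NONE; }                      // reset
  void set_flags(cflag_t f) { flags = f; }                       // overwrite
};
```

Keeping flags as independent bits lets a chunk be, for example, both referenced and fingerprinted at once, which a single type enum cannot express.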

liewegas · 2018-08-16T20:27:12Z

src/osd/PrimaryLogPG.cc

+	      break;
+	    }
+	}
+


i don't understand why this is helping. and regardless, i don't think we should be special casing something like this in the generic OP_CALL case!

@liewegas
Let's assume that we flush the chunked object.
The operations look like this:

op[2] = "write op", "chunk_get"

The write op will create a new object, but the object will not be stored on the filesystem yet when chunk_get is called. So chunk_get fails with ENOENT. In this case, we need to use chunk_set instead of chunk_get, because chunk_get reads the existing value first and then adds a new value.

To handle this case, I think there are two solutions. The first is to make a new op such as CEPH_OSD_OP_WRITE_WITH_REF. The second is to add a special case as above.

What is your opinion?

liewegas · 2018-08-17T12:30:09Z

If we really need to persist a checksum preference, let's add a field to object_info_t. I wonder, though, if it's necessary. The fingerprint, chunking policy, and so on are more likely going to be pool properties?

liewegas · 2018-08-17T12:31:54Z

I would make a new operation, chunk_write_or_get, that does it in one go. That's better anyway because you want the write part to turn into a no-op if the object is already there (or, maybe, to do a read and compare if we are feeling paranoid). Either way, a single new op that does the atomic write-or-get is better, I think!
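A minimal in-memory model of the proposed chunk_write_or_get semantics: one operation that creates the chunk with a single reference if it is absent, and otherwise skips the write (optionally comparing data, the "paranoid" mode) and adds a reference. `CasStore`, `CasObject`, and `WriteOrGetResult` are hypothetical; they only simulate what an object class method would do atomically on the OSD.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical in-memory stand-in for a CAS pool: chunk oid -> data + refs.
struct CasObject {
  std::string data;
  std::set<std::string> refs;  // source objects referencing this chunk
};

enum class WriteOrGetResult { Created, Referenced, Mismatch };

struct CasStore {
  std::map<std::string, CasObject> objects;

  // Create-or-reference in a single step, so there is no separate
  // write followed by a chunk_get that can race with it and see ENOENT.
  WriteOrGetResult write_or_get(const std::string& oid, const std::string& data,
                                const std::string& source, bool verify = false) {
    auto it = objects.find(oid);
    if (it == objects.end()) {
      objects[oid] = CasObject{data, {source}};
      return WriteOrGetResult::Created;
    }
    if (verify && it->second.data != data)
      return WriteOrGetResult::Mismatch;  // same name, different content
    it->second.refs.insert(source);       // the write itself becomes a no-op
    return WriteOrGetResult::Referenced;
  }
};
```

Tracking references as a set of source objects makes a repeated call from the same source idempotent; a plain counter would be cheaper but could not be checked against its sources.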

liewegas · 2018-08-17T12:32:42Z

It might even be that the policy around how to chunk the object and what hash to use is a property of the *target* pool, so that multiple consumers will align their chunking strategies and dedup effectively. Maybe!

myoungwon · 2018-08-24T11:05:50Z

@liewegas I updated the commits per your comments. Can you take a look?

liewegas · 2018-08-26T20:57:05Z

src/osd/osd_types.h

+    default: return "unknown";
+    }
+  }
+  fingerprint_t fingerprint;


pool_opts_t might be a better choice for this?

liewegas · 2018-08-26T20:57:44Z

src/mon/OSDMonitor.cc

    wait_for_finished_proposal(op, new Monitor::C_Command(mon, op, 0, rs,
 					      get_last_committed() + 1));
    return true;
+  } else if (prefix == "osd pool fingerprint") {


I think this should be a pool property set via 'osd pool set '. Look at something like 'compression_mode' as a model
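Modeled loosely on an enum-valued pool option such as compression_mode, the fingerprint algorithm could be an enum with string conversion, used when parsing the value passed to `osd pool set`. This is a sketch; `fingerprint_t`'s values, the accepted strings, and both helper functions are assumptions, not the option that was eventually merged.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical fingerprint algorithms a pool could be configured with.
enum class fingerprint_t { NONE, SHA1 };

const char* fingerprint_name(fingerprint_t f) {
  switch (f) {
    case fingerprint_t::NONE: return "none";
    case fingerprint_t::SHA1: return "sha1";
  }
  return "unknown";
}

// Parse a pool-option value; reject unknown names instead of silently
// falling back to a default.
std::optional<fingerprint_t> parse_fingerprint(const std::string& s) {
  if (s == "none") return fingerprint_t::NONE;
  if (s == "sha1") return fingerprint_t::SHA1;
  return std::nullopt;
}
```

Rejecting unknown names up front means a typo in the pool setting fails loudly rather than leaving the pool without fingerprinting.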

liewegas · 2018-08-26T21:08:17Z

src/osd/PrimaryLogPG.cc


      break;

+    case CEPH_OSD_OP_CHUNK_WRITE_OR_GET:


This logic looks right, but I think this should live in the same class as the GET/PUT.

In fact, now that I think about it, since we already have a specialized GET/PUT method (for chunks) that is different than the regular refcount behavior, perhaps we should create a new class called 'cas' that collects all of these methods for managing the objects in the cas pool: WRITE_OR_GET, PUT, and eventually things like enumerating backreferences to support scrub, etc. That will also let us collect the user interface into a cls_cas_client.h (or whatever) for non-tiering users of the CAS pool (for example, we expect that RGW will write chunks directly into the cas tier).

@liewegas I agree. A new class is the right way. I'll rebase this commit.

myoungwon · 2018-09-02T05:20:06Z

@liewegas Updated. Can you take a look ?

liewegas · 2018-09-06T18:39:43Z

src/cls/refcount/cls_refcount_chunk.cc

@@ -0,0 +1,45 @@
+// -*- mode:C; tab-width:8; c-basic-offset:2; indent-tabs-mode:t -*-


I think we should aim to have chunk_write_or_get, chunk_put, and chunk_read_refcount all in the same cls. When I suggested 'cas' I expected them all to be there, but I forgot that the manifest can also refcount a non-cas object.

The main thing that worries me is that the existing methods in cls_refcount are already used by rgw and we have to be careful not to break them. On the other hand, the refcounts I expect for the cas objects are a bit different: I expect we'll have some objects with millions of refs, and we want to do some clever things with the way those are efficiently represented and scrubbed. This makes me want a clean cls_cas... except for the non-fingerprint objects that aren't actually content-addressably named. Maybe cls_chunk captures the spirit of both? And it lets us pull all of the chunk refcounting into a single class without recursively calling into other cls_refcount methods.

I'm sorry to keep moving the goal posts around on this! If you want we can sketch this out in a bit more detail first before adjusting the code again. The main thing I want to ensure is that other consumers of the cas pool (besides the tiering code... probably rgw in the nearish future) have a simple and clean interface to consume.
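To make the shape of a single chunk class concrete, here is a hypothetical interface that keeps get, put, and refcount reads together and stores back-references so a future scrub could enumerate them. All names are illustrative; a real implementation would be an OSD object class, and, as noted above, objects with millions of refs would need a more compact representation than a plain set.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical refcount store covering both fingerprint-named (CAS)
// chunks and ordinary chunk objects.
class ChunkRefs {
 public:
  // Add a back-reference from `source`; idempotent per source.
  void get(const std::string& chunk, const std::string& source) {
    refs_[chunk].insert(source);
  }
  // Drop a back-reference; returns true if the chunk is now unreferenced
  // and may be deleted.
  bool put(const std::string& chunk, const std::string& source) {
    auto it = refs_.find(chunk);
    if (it == refs_.end()) return false;
    it->second.erase(source);
    if (!it->second.empty()) return false;
    refs_.erase(it);
    return true;
  }
  size_t read_refcount(const std::string& chunk) const {
    auto it = refs_.find(chunk);
    return it == refs_.end() ? 0 : it->second.size();
  }
  // Enumerate back-references, e.g. for a scrub looking for leaks.
  std::vector<std::string> list_refs(const std::string& chunk) const {
    auto it = refs_.find(chunk);
    if (it == refs_.end()) return {};
    return std::vector<std::string>(it->second.begin(), it->second.end());
  }

 private:
  std::map<std::string, std::set<std::string>> refs_;
};
```

When a manifest chunk is overwritten, the old chunk would be released with put(), in the spirit of this PR's "decrement old chunk's reference count" commit.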

liewegas · 2018-09-06T18:40:14Z

src/include/types.h

 WRITE_EQ_OPERATORS_1(errorcode32_t, code)
 WRITE_CMP_OPERATORS_1(errorcode32_t, code)

+struct sha1_fp_t {


nit: can we call this sha1_digest_t (fp == fingerprint i assume, but digest is clearer)

liewegas · 2018-09-06T18:41:15Z

src/include/types.h

+struct sha1_fp_t {
+#define SHA1_DIGEST_SIZE 20
+  unsigned char v[SHA1_DIGEST_SIZE];
+  bool valid;


I'm not sure that the bool valid is necessary. The default ctor can just zero the sha1, and if we really need to represent an undefined value we can use boost::optional<sha1_digest_t>?
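Following this suggestion, a digest type can zero-initialize its bytes (the `= {0}` suggestion later in this review) and leave "no digest" to an optional wrapper. The sketch uses `std::optional`, the standard counterpart of `boost::optional`; the `to_str()` and `operator<<` members mirror the earlier review request, but the struct name and layout are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <optional>
#include <ostream>
#include <sstream>
#include <string>

struct sha1_digest_sketch {
  static constexpr size_t SIZE = 20;  // SHA-1 digests are 160 bits
  unsigned char v[SIZE] = {0};        // default: all zero, no "valid" flag

  bool operator==(const sha1_digest_sketch& o) const {
    return std::memcmp(v, o.v, SIZE) == 0;
  }
  // Render as the usual 40-character lowercase hex string.
  std::string to_str() const {
    static const char* hex = "0123456789abcdef";
    std::string s;
    for (unsigned char c : v) { s += hex[c >> 4]; s += hex[c & 0xf]; }
    return s;
  }
};

inline std::ostream& operator<<(std::ostream& out, const sha1_digest_sketch& d) {
  return out << d.to_str();
}

// "No fingerprint yet" is an empty optional rather than a flag.
using maybe_digest = std::optional<sha1_digest_sketch>;
```

This keeps the digest a plain 20-byte value, and the "unset" state lives in the type system instead of in a field every caller must remember to check.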

liewegas · 2018-09-06T18:41:21Z

src/include/types.h


+struct sha1_fp_t {
+#define SHA1_DIGEST_SIZE 20
+  unsigned char v[SHA1_DIGEST_SIZE];


myoungwon · 2018-09-07T14:29:19Z

@liewegas Updated, please review again. sha1_digest_t is already defined in rgw_common.h, so we need another name.

liewegas · 2018-09-07T15:53:59Z

../src/rgw/rgw_common.h:using sha1_digest_t = \
../src/rgw/rgw_common.h:static inline sha1_digest_t
../src/rgw/rgw_common.h:  sha1_digest_t dest;

it's only used in 3 instances, so let's rename the rgw one to sha1_digest_array_t? Is that okay @rzarzynski ?

rzarzynski · 2018-09-07T16:22:19Z

@liewegas: I don't see any reason why not. Let's do.

myoungwon · 2018-09-07T18:39:06Z

@liewegas rename is done.

* refs/pull/22987/head:
  - common,rgw: rename sha1_digest_t
  - osd: decrement old chunk's reference count if the chunk has a reference.
  - src/test: add a unit test
  - osd: using fingerprint OID if fingerprint is set
  - osd: add flag interfaces in chunk_info_t
  - common/buffer.cc: add sha1 fingerprint
  - osd: add fingerprint property
  - mon: add a command to set fingerprint algorithm

myoungwon changed the title ~~WIP: osd,librados: add chunking and fingerprint~~ WIP: osd,librados: add chunking and fingerprint functions for dedup Jul 11, 2018

myoungwon changed the title ~~WIP: osd,librados: add chunking and fingerprint functions for dedup~~ WIP: osd,librados: add chunking and fingerprint functions for dedup tier Jul 11, 2018

batrick added core DNM labels Jul 11, 2018

myoungwon force-pushed the wip-chunk-fingerprint branch 2 times, most recently from a9147ae to c078684 on July 25, 2018 05:37

myoungwon force-pushed the wip-chunk-fingerprint branch from c078684 to 64b434b on August 11, 2018 16:49

myoungwon changed the title ~~WIP: osd,librados: add chunking and fingerprint functions for dedup tier~~ WIP: osd,librados: add fingerprint functions for dedup tier Aug 11, 2018

myoungwon added the feature label Aug 11, 2018

myoungwon force-pushed the wip-chunk-fingerprint branch from 64b434b to d883af0 on August 12, 2018 13:08

myoungwon requested a review from liewegas August 15, 2018 13:15

liewegas reviewed Aug 16, 2018

myoungwon force-pushed the wip-chunk-fingerprint branch from 286de2d to cc0695a on August 24, 2018 08:32

myoungwon changed the title ~~WIP: osd,librados: add fingerprint functions for dedup tier~~ osd,librados: add fingerprint functions for dedup tier Aug 24, 2018

myoungwon removed the DNM label Aug 24, 2018

myoungwon requested a review from liewegas August 24, 2018 11:06

myoungwon force-pushed the wip-chunk-fingerprint branch 2 times, most recently from 1e98641 to 46a4213 on August 25, 2018 03:07

liewegas reviewed Aug 26, 2018

myoungwon force-pushed the wip-chunk-fingerprint branch from 46a4213 to 85fd749 on September 2, 2018 03:15

myoungwon added 2 commits September 2, 2018 12:22

mon: add a command to set fingerprint algorithm

ad7096a

Signed-off-by: Myoungwon Oh <omwmw@sk.com>

osd: add fingerprint property

c897267

Signed-off-by: Myoungwon Oh <omwmw@sk.com>

myoungwon force-pushed the wip-chunk-fingerprint branch from 85fd749 to 2d6ca55 on September 2, 2018 03:33

myoungwon requested a review from liewegas September 2, 2018 05:20

liewegas reviewed Sep 6, 2018

src/include/types.h (outdated)

+struct sha1_fp_t {
+#define SHA1_DIGEST_SIZE 20
+  unsigned char v[SHA1_DIGEST_SIZE];

liewegas Sep 6, 2018

= {0}

myoungwon added 5 commits September 7, 2018 20:57

common/buffer.cc: add sha1 fingerprint

7fa45e0

Signed-off-by: Myoungwon Oh <omwmw@sk.com>

osd: add flag interfaces in chunk_info_t

4efb0b6

This commit changes the interface of set/get/clear flag in order to handle bitwise operations. Signed-off-by: Myoungwon Oh <omwmw@sk.com>

osd: using fingerprint OID if fingerprint is set

42e24a4

A cas class is introduced (the cas class includes a write_or_get op). This operation increases the reference count if the chunk is already stored. Signed-off-by: Myoungwon Oh <omwmw@sk.com>

src/test: add a unit test

f61c750

Signed-off-by: Myoungwon Oh <omwmw@sk.com>

osd: decrement old chunk's reference count if the chunk has a reference.

917062d

Signed-off-by: Myoungwon Oh <omwmw@sk.com>

myoungwon force-pushed the wip-chunk-fingerprint branch 2 times, most recently from 2ea5572 to 917062d on September 7, 2018 13:00

myoungwon requested a review from liewegas September 7, 2018 14:31

liewegas added the needs-qa label Sep 7, 2018

common,rgw: rename sha1_digest_t

da749d6

rename existing sha1_digest_t to sha1_digest_array_t and add a new sha1_digest_t Signed-off-by: Myoungwon Oh <omwmw@sk.com>

liewegas added the wip-sage-testing label Sep 10, 2018

liewegas approved these changes Sep 10, 2018

liewegas merged commit da749d6 into ceph:master Sep 11, 2018

myoungwon mentioned this pull request Sep 24, 2018

WIP: osd, librados: chunk scrub for the dedup tier #24230

Closed

4 tasks
