Add command to stream contents of DB into another DB. #1463

Merged
martinmr merged 8 commits into master from martinmr/stream-2007
Aug 26, 2020

@martinmr martinmr commented Aug 17, 2020

For now, this tool streams the contents into another DB with compression turned off.



@parasssh parasssh left a comment

LGTM but I think Ibrahim should also approve.

@jarifibrahim jarifibrahim left a comment

Got some comments.

Reviewable status: 0 of 3 files reviewed, 11 unresolved discussions (waiting on @ashish-goswami, @jarifibrahim, @manishrjain, and @martinmr)


db.go, line 1755 at r1 (raw file):

// Stream the contents of this DB to a new DB with options outOptions that will be
// created in outDir.
func (db *DB) StreamDB(outDir string , outOptions Options) error {

You don't need outDir. outOptions already contains the outDir.


db.go, line 1756 at r1 (raw file):

// created in outDir.
func (db *DB) StreamDB(outDir string , outOptions Options) error {
	if err := os.MkdirAll(outDir, 0700); err != nil {

We can remove this as well.


db_test.go, line 385 at r1 (raw file):

	require.NoError(t, err)
	defer removeDir(dir)
	opts := getTestOptions(dir)

Compression is disabled by default. You should enable it here so that we can disable it while streaming out.


db_test.go, line 403 at r1 (raw file):

	outDir, err := ioutil.TempDir("", "badger-test")
	require.NoError(t, err)
	outOpt := getTestOptions(outDir).WithCompression(options.None).WithReadOnly(false)

You can remove WithCompression and WithReadOnly. They're set correctly by default.


db_test.go, line 415 at r1 (raw file):

		key := []byte(fmt.Sprintf("key%d", i))
		val := []byte(fmt.Sprintf("val%d", i))
		txn := db.NewTransactionAt(1, false)

This should be outDB: the data is inserted into db, so it should be read back from outDB.
Alternatively, you could do a Get on both DBs and compare the values.
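
The second suggestion (read each key from both DBs and compare) can be sketched as follows. This is only an illustration under assumptions: `kvReader`, `mapStore`, and `sameValues` are hypothetical names, and the in-memory map stands in for reading through a Badger transaction so that the example runs without the library.

```go
package main

import (
	"bytes"
	"fmt"
)

// kvReader is a hypothetical read-only view of a key-value store.
// In the real test this role would be played by reads against db and outDB.
type kvReader interface {
	Get(key []byte) ([]byte, error)
}

// mapStore is an in-memory stand-in used only for this sketch.
type mapStore map[string][]byte

func (m mapStore) Get(key []byte) ([]byte, error) {
	v, ok := m[string(key)]
	if !ok {
		return nil, fmt.Errorf("key %q not found", key)
	}
	return v, nil
}

// sameValues reads key0..key(n-1) from both stores and returns an error
// describing the first key that is missing or whose values differ.
func sameValues(src, dst kvReader, n int) error {
	for i := 0; i < n; i++ {
		key := []byte(fmt.Sprintf("key%d", i))
		want, err := src.Get(key)
		if err != nil {
			return fmt.Errorf("source: %w", err)
		}
		got, err := dst.Get(key)
		if err != nil {
			return fmt.Errorf("copy: %w", err)
		}
		if !bytes.Equal(want, got) {
			return fmt.Errorf("key %q: source has %q, copy has %q", key, want, got)
		}
	}
	return nil
}

func main() {
	src := mapStore{"key0": []byte("val0"), "key1": []byte("val1")}
	dst := mapStore{"key0": []byte("val0"), "key1": []byte("val1")}
	fmt.Println(sameValues(src, dst, 2))
}
```

Comparing against the source DB rather than against regenerated `val%d` strings also catches the case where the insert itself wrote something unexpected.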


badger/cmd/stream.go, line 32 at r1 (raw file):

	Short: "Stream DB into another DB with different options",
	Long: `
This command streams the contents of this DB into another DB with the given options.

Mention over here that outDir should be empty.

The stream writer will drop all data if the directory already contains data.
You can add two checks as well:

  1. If the outDir exists, ensure it is empty.
  2. If the outDir exists and it is non-empty, abort.

We shouldn't modify an existing Badger DB. If there is data in outDir, the user can use a different directory or clean up the outDir.
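
A minimal sketch of the kind of check described above, using only the Go standard library. `ensureEmptyOrMissing` is a hypothetical helper name, not something taken from this PR or from Badger; the check that was actually merged may look different.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// ensureEmptyOrMissing (hypothetical helper) returns nil when dir does not
// exist or exists and is empty, and an error when it holds any entry.
func ensureEmptyOrMissing(dir string) error {
	f, err := os.Open(dir)
	if errors.Is(err, os.ErrNotExist) {
		return nil // Nothing there yet, so nothing can be overwritten.
	}
	if err != nil {
		return err
	}
	defer f.Close()

	// Asking for a single name is enough to tell whether the directory is empty.
	_, err = f.Readdirnames(1)
	if err == io.EOF {
		return nil
	}
	if err != nil {
		return err
	}
	return fmt.Errorf("output directory %q is not empty, aborting", dir)
}

func main() {
	dir, err := os.MkdirTemp("", "stream-out")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	fmt.Println("empty dir:", ensureEmptyOrMissing(dir))

	// Simulate an existing DB by dropping a file into the directory.
	if err := os.WriteFile(filepath.Join(dir, "MANIFEST"), nil, 0600); err != nil {
		panic(err)
	}
	fmt.Println("non-empty dir:", ensureEmptyOrMissing(dir))
}
```

Running the check before the output DB is opened means an existing Badger directory is never touched, which is the behavior the comment asks for.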


badger/cmd/stream.go, line 42 at r1 (raw file):

	// TODO: Add more options.
	RootCmd.AddCommand(streamCmd)
	streamCmd.Flags().StringVarP(&outDir, "out", "o", "", "Path to input DB")

Path to output DB


badger/cmd/stream.go, line 45 at r1 (raw file):

	streamCmd.Flags().BoolVarP(&truncate, "truncate", "", false, "Option to truncate the DBs")
	streamCmd.Flags().BoolVarP(&readOnly, "read_only", "", true,
		"Option to open in DB in read-only mode")

Option to open input DB in read-only mode.


badger/cmd/stream.go, line 54 at r1 (raw file):

		WithTruncate(truncate).
		WithValueThreshold(1 << 10 /* 1KB */).
		WithNumVersionsToKeep(math.MaxInt32)

Add a flag for NumVersionsToKeep. Use math.MaxInt32 value if the flag is set to 0 (mention this in the flag description).
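
One way the requested flag handling could look, sketched with the standard library `flag` package so that it runs on its own (the command in this PR uses a `streamCmd.Flags()` API instead). The flag name `num_versions` and the helper `effectiveNumVersions` are assumptions made for illustration.

```go
package main

import (
	"flag"
	"fmt"
	"math"
)

// effectiveNumVersions (hypothetical helper) maps the flag value to the value
// passed to WithNumVersionsToKeep: 0 means "keep every version".
func effectiveNumVersions(flagValue int) int {
	if flagValue == 0 {
		return math.MaxInt32
	}
	return flagValue
}

func main() {
	fs := flag.NewFlagSet("stream", flag.ContinueOnError)
	// The description states the special meaning of 0, as requested in the review.
	numVersions := fs.Int("num_versions", 0,
		"Number of versions to keep per key. 0 keeps all versions (math.MaxInt32).")
	if err := fs.Parse([]string{"-num_versions=0"}); err != nil {
		panic(err)
	}
	fmt.Println(effectiveNumVersions(*numVersions))
}
```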


badger/cmd/stream.go, line 58 at r1 (raw file):

	// Options for output DB.
	outOpt := inOpt.WithDir(outDir).WithValueDir(outDir).
		WithCompression(options.None).WithReadOnly(false)

Allow the user to specify a different compression algorithm. You can have 3 values: 0 to disable, 1 for snappy, and 2 for zstd.

This would allow us to switch/remove compression from a badger directory.
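
The 0/1/2 mapping suggested above could be validated roughly like this. The sketch defines its own `compressionType` as a local stand-in for Badger's `options.CompressionType` so that it compiles without the library; `parseCompression` is a hypothetical name.

```go
package main

import "fmt"

// compressionType is a local stand-in for Badger's options.CompressionType.
type compressionType int

const (
	none compressionType = iota
	snappy
	zstd
)

// parseCompression (hypothetical helper) converts the numeric flag value
// into a compression type and rejects anything outside 0..2.
func parseCompression(v int) (compressionType, error) {
	switch v {
	case 0:
		return none, nil
	case 1:
		return snappy, nil
	case 2:
		return zstd, nil
	default:
		return none, fmt.Errorf("invalid compression value %d: use 0 (none), 1 (snappy) or 2 (zstd)", v)
	}
}

func main() {
	for _, v := range []int{0, 1, 2, 3} {
		c, err := parseCompression(v)
		fmt.Println(v, c, err)
	}
}
```

Rejecting unknown values explicitly, rather than falling back to no compression, avoids silently writing an output DB with a different setting than the user asked for.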


badger/cmd/stream.go, line 65 at r1 (raw file):

	}
	defer inDB.Close()
	return inDB.StreamDB(outDir, outOpt)

We don't need the outDir. It is set in the outOpt.


@martinmr martinmr left a comment

Reviewable status: 0 of 3 files reviewed, 11 unresolved discussions (waiting on @ashish-goswami, @jarifibrahim, and @manishrjain)


db.go, line 1755 at r1 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

You don't need outDir. outOptions already contains the outDir.

Done.


db.go, line 1756 at r1 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

We can remove this as well.

Done.


db_test.go, line 385 at r1 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

Compression is disabled by default. You should enable it here so that we can disable it while streaming out.

Done.


db_test.go, line 403 at r1 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

You can remove WithCompression and WithReadOnly. They're set correctly by default.

Done.


db_test.go, line 415 at r1 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

This should be outDB: the data is inserted into db, so it should be read back from outDB.
Alternatively, you could do a Get on both DBs and compare the values.

Done.


badger/cmd/stream.go, line 32 at r1 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

Mention over here that outDir should be empty.

The stream writer will drop all data if the directory already contains data.
You can add two checks as well:

  1. If the outDir exists, ensure it is empty.
  2. If the outDir exists and it is non-empty, abort.

We shouldn't modify an existing Badger DB. If there is data in outDir, the user can use a different directory or clean up the outDir.

Done.


badger/cmd/stream.go, line 42 at r1 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

Path to output DB

Done.


badger/cmd/stream.go, line 45 at r1 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

Option to open input DB in read-only mode.

Done.


badger/cmd/stream.go, line 54 at r1 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

Add a flag for NumVersionsToKeep. Use math.MaxInt32 value if the flag is set to 0 (mention this in the flag description).

Done.


badger/cmd/stream.go, line 58 at r1 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

Allow the user to specify a different compression algorithm. You can have 3 values: 0 to disable, 1 for snappy, and 2 for zstd.

This would allow us to switch/remove compression from a badger directory.

Done.


badger/cmd/stream.go, line 65 at r1 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

We don't need the outDir. It is set in the outOpt.

Done.

@martinmr martinmr requested a review from jarifibrahim August 21, 2020 18:12
@martinmr martinmr dismissed jarifibrahim’s stale review August 21, 2020 18:12

Addressed comments


@jarifibrahim jarifibrahim left a comment

:lgtm: I got one comment.

Reviewable status: 0 of 3 files reviewed, 1 unresolved discussion (waiting on @ashish-goswami, @jarifibrahim, @manishrjain, and @martinmr)


db.go, line 1756 at r2 (raw file):

// created in outDir.
func (db *DB) StreamDB(outOptions Options) error {
	if outOptions.Dir != outOptions.ValueDir {

We don't need them to match. Users can specify different output directories for the vlog and the SSTs.


@martinmr martinmr left a comment

Reviewable status: 0 of 3 files reviewed, 1 unresolved discussion (waiting on @ashish-goswami, @jarifibrahim, and @manishrjain)


db.go, line 1756 at r2 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

We don't need them to match. Users can specify different output directories for the vlog and the SSTs.

Done.

@martinmr martinmr force-pushed the martinmr/stream-2007 branch from 3ac07a9 to 9a7f62d Compare August 26, 2020 14:30
@martinmr martinmr changed the base branch from release/v20.07 to master August 26, 2020 14:30
@martinmr martinmr merged commit dc653b0 into master Aug 26, 2020
@martinmr martinmr deleted the martinmr/stream-2007 branch August 26, 2020 20:25
martinmr added a commit that referenced this pull request Aug 26, 2020