Offline data preparation tools #820

Merged

jsign merged 20 commits into master

Apr 7, 2021

Contributor

jsign commented Apr 7, 2021

This PR adds new Powergate CLI commands that prepare data for making offline deals.

We'll soon add offline-deal support to the Powergate APIs, but these tools are already useful for preparing data without having to run a Lotus or go-ipfs daemon. The prepared data can also be used with any Filecoin client.

It might be helpful to read the docs PR that explains how to use these new features: textileio/community#281

This PR also updates the Lotus client, devnet, and Docker image to v1.6.0.

jsign added the rd-minor label

jsign self-assigned this

jsign mentioned this pull request

pow: offline deals docs textileio/community#281

Merged

jsign added 16 commits

April 7, 2021 11:10


          prepare work

631ec98

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>


          create dag and car file

8f2e531

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>


          rename subcommand and extend cli output helpers

78abdb8

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>


          add preliminary commp calc

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>


          add commp subcommand

cf04a31

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>


          sketch gen car command

0a1d224

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>


          add car validation & remote go-ipfs support

7b3f1b0

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>


          mod tidy

7193a45

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>


          extract commp and dagify to lib

fbc91d9

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>


          add json support to the rest of commands

d403f50

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>


          untangle progress bar from lib in dagification

af09106

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>


          update to Lotus v1.6.0

bbccb4e

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>


          add test

0cd44a1

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>


          remove todo

ad98726

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>


          add lotus command generation

cd55d13

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>


          lints

83ac56d

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>

jsign force-pushed the jsign/lcl branch from 196d3c4 to 83ac56d Compare

April 7, 2021 14:10

jsign added 4 commits

April 7, 2021 11:13


          make docs

e3538e7

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>


          revert unrelated change

7b3002a

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>


          update deps

2d7df07

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>


          fix printf argument order

bb429f1

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>

jsign commented

cmd/pow/cmd/prepare/prepare.go

Comment on lines +259 to +260

		c.Message("Lotus offline-deal command:")
		c.Message("lotus client deal --manual-piece-cid=%s --manual-piece-size=%d %s <miner> <price> <duration>", pieceCid, pieceSize, dataCid)

Contributor Author

jsign Apr 7, 2021

When we add offline-deals support in Powergate, we'll add some Powergate CLI commands here.

cmd/pow/cmd/prepare/prepare.go

+              	Short: "Provides commands to prepare data for Filecoin onbarding",
+              	Long:  `Provides commands to prepare data for Filecoin onbarding`,
+              }

Contributor Author

jsign Apr 7, 2021

The command logic doesn't have much of interest apart from io.Pipe wiring and similar plumbing.

The pow offline prepare command runs the complete pipeline, whereas pow offline commp and pow offline car let power users run individual steps. As I mentioned in the docs, pow offline prepare performs better than running both commands separately, since it starts calculating Piece(size/cid) while it is still generating the CAR file.

Anyway, the important logic is extracted into a lib that I'll comment on.

cmd/pow/cmd/prepare/prepare.go

+              		if err != nil {
+              			c.Fatal(fmt.Errorf("parsing json flag: %s", err))
+              		}
+              		dataCid, dagService, cls, err := prepareDAGService(cmd, args, quiet)

Contributor Author

jsign Apr 7, 2021

This prepareDAGService call gives us a temporary Badger-backed ipld.DAGService that's used while DAGifying the data.

cmd/pow/cmd/prepare/prepare.go

Comment on lines +225 to +226

		prCommP, pwCommP := io.Pipe()
		teeCAR := io.TeeReader(prCAR, pwCommP)

Contributor Author

jsign Apr 7, 2021

Here we do some piping magic: while the CAR file is being streamed out and saved to the output file (or stdout), we keep a copy of the same stream and pipe it to the Piece(size/cid) calculation.

cmd/pow/cmd/prepare/prepare.go


		type closeFunc func() error

		func prepareDAGService(cmd *cobra.Command, args []string, quiet bool) (cid.Cid, ipld.DAGService, closeFunc, error) {

Contributor Author

jsign Apr 7, 2021

So here we create the DAGService, which is the storage layer for the blocks of the DAG.
Depending on the --ipfs-api flag, we either create a temporary Badger-based one or rely on a remote go-ipfs node that already has the DAG.

From the POV of the commands, they simply receive an ipld.DAGService; whether it's Badger-based or go-ipfs-based is irrelevant to the rest of the logic.
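The backend selection can be pictured roughly as below. Everything in this sketch is hypothetical: blockGetter is a much smaller stand-in for ipld.DAGService, tempStore and remoteStore only imitate the Badger and go-ipfs backends, and no database or network is touched. Only the shape (pick a backend from the flag, return it with a closeFunc) follows the description above.

```go
package main

import (
	"errors"
	"fmt"
)

// blockGetter is a hypothetical, much smaller stand-in for ipld.DAGService.
type blockGetter interface {
	Get(key string) ([]byte, error)
}

// closeFunc releases whatever resources the chosen backend holds.
type closeFunc func() error

// tempStore plays the role of the temporary local store.
type tempStore struct{ blocks map[string][]byte }

func (s *tempStore) Get(key string) ([]byte, error) {
	b, ok := s.blocks[key]
	if !ok {
		return nil, errors.New("block not found")
	}
	return b, nil
}

// remoteStore plays the role of a client for a remote node. It only records
// the address; this sketch makes no network call.
type remoteStore struct{ addr string }

func (s *remoteStore) Get(key string) ([]byte, error) {
	return nil, fmt.Errorf("fetching %q from %s: not implemented in this sketch", key, s.addr)
}

// newBlockGetter picks the backend from the flag value: empty means local.
func newBlockGetter(ipfsAPI string) (blockGetter, closeFunc, error) {
	if ipfsAPI == "" {
		s := &tempStore{blocks: map[string][]byte{}}
		// Closing the temporary store drops its data.
		return s, func() error { s.blocks = nil; return nil }, nil
	}
	return &remoteStore{addr: ipfsAPI}, func() error { return nil }, nil
}

func main() {
	for _, api := range []string{"", "/ip4/127.0.0.1/tcp/5001"} {
		bg, cls, err := newBlockGetter(api)
		if err != nil {
			panic(err)
		}
		// Callers only see the interface; the concrete type is printed for illustration.
		fmt.Printf("ipfs-api=%q backend=%T\n", api, bg)
		if err := cls(); err != nil {
			panic(err)
		}
	}
}
```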

cmd/pow/cmd/prepare/prepare.go

+              	fmt.Fprint(jsonOutput, string(out))
+              }
+              func dagify(ctx context.Context, dagService ipld.DAGService, path string, quiet bool) (cid.Cid, error) {

Contributor Author

jsign Apr 7, 2021

So, we leverage some go-ipfs packages to dagify the data, show progress bars, and that kind of pretty stuff.

cmd/pow/cmd/prepare/prepare_test.go

Comment on lines +14 to +38

+              // NOTE: Testing only the `prepare` subcommand will indirectly test
+              // the `car` and `commp` subcommands. This test simply prepares
+              // some data and compares the final piece-size and piece-cid to a
+              // known correct value. If anything in the process (DAGification, CARing)
+              // misbehaves, it will result in a different PieceCID since, at the end of
+              // the day, PieceCID is a fingerprint of the prepared data.
+              func TestOfflinePreparation(t *testing.T) {
+              	testCases := []struct {
+              		size int
+              		json string
+              	}{
+              		{size: 10000, json: `{"piece_size":16384,"piece_cid":"baga6ea4seaqjuk4uh5g7cu5znbvrr7wvfsn2l3xj47rbymvi63uiiroya44lkiy"}`},
+              		{size: 1000, json: `{"piece_size":2048,"piece_cid":"baga6ea4seaqadahcx4ct54tlbvgkqlhmif7kxxkvxz3yf3vr2e4puhvsxdbrgka"}`},
+              		{size: 100, json: `{"piece_size":256,"piece_cid":"baga6ea4seaqd4hgfl6texpf377k7igx2ga2mfwn3lb4c4kdpaq3g3oao2yftuki"}`},
+              	}
+              	for _, test := range testCases {
+              		test := test
+              		t.Run(strconv.Itoa(test.size), func(t *testing.T) {
+              			out, err := run(t, test.size)
+              			require.NoError(t, err)
+              			require.Equal(t, test.json, out)
+              		})
+              	}
+              }

Contributor Author

jsign Apr 7, 2021

We basically test the pow offline prepare CLI command directly.
I've prepared some deterministic test cases with various sizes and defined the expected output.

As mentioned in the comment, since pow offline prepare does everything (dagify, CAR file, and its CommP), it already tests everything important that could potentially lead to a wrong result.

cmd/pow/cmd/prepare/prepare_test.go

Comment on lines +59 to +65

+              	stdbuf := bytes.NewBuffer(nil)
+              	jsonOutput = stdbuf
+              	Cmd.SetArgs([]string{"prepare", "--json", f.Name()})
+              	if _, err := Cmd.ExecuteC(); err != nil {
+              		return "", fmt.Errorf("executing command: %s", err)
+              	}

Contributor Author

jsign Apr 7, 2021

Here we override the CLI's jsonOutput writer so the test can plug in a buffer and later inspect the output.

cmd/pow/common/common.go

               	CmdTimeout = time.Second * 60
               )
+              // FmtOutput allows to configure where Message(), Success(), and

Contributor Author

jsign Apr 7, 2021

Just adding flexibility here to let the helper functions write to other places.
By default they still do the usual thing: write to stdout.

But these commands write generated assets to stdout, so they write other metadata output (loading bars, JSON output, etc.) to stderr. This lets a CLI command set FmtOutput = os.Stderr, and then all message helpers write to stderr without messing with the stdout that's being used for something else.

dataprep/dataprep.go

		@@ -0,0 +1,103 @@
		package dataprep

Contributor Author

jsign Apr 7, 2021

In this package we have the functions that calculate the Piece(size/cid) of the data and do the DAGification.

The CLI commands use these functions with some extra piping wiring.

jsign marked this pull request as ready for review

April 7, 2021 18:59

jsign requested a review from asutula

April 7, 2021 19:10

jsign commented

go.mod

+              	github.com/ipfs/go-ds-badger v0.2.6
               	github.com/ipfs/go-ds-badger2 v0.1.1-0.20200708190120-187fc06f714e
+              	github.com/ipfs/go-graphsync v0.7.0 // indirect
+              	github.com/ipfs/go-ipfs v0.8.0

Contributor Author

jsign Apr 7, 2021

I might regret adding this dependency, mostly because of the indirect dependencies that Lotus also brings in.
If at some point in the future this becomes too much of a mess, we might need to extract the subcommand into a separate binary with its own mod.

It doesn't seem like a problem now.

Contributor

sanderpick Apr 7, 2021

At some point, the adder logic could probably be pulled out of coreunix.NewAdder, I think that's what I did for the local bucket repo stuff... but yeah, nbd.

Contributor Author

jsign Apr 7, 2021

Yeah, actually go-ipfs ends up using that package, so it can be narrowed down.

sanderpick approved these changes

Contributor

sanderpick left a comment

Nice! LGTM

jsign merged commit a304bdc into master

jsign deleted the jsign/lcl branch

April 7, 2021 19:31
