WIP: simplified smlbench - two containers (for comparison) by jimpick · Pull Request #74 · testground/testground

jimpick · 2019-10-20T07:01:38Z

Don't merge ... for discussion

I took #73 and modified the test so it would run in 2 separate processes.

It uses the Testground Sync SDK (which uses Redis) to figure out which node is the getter and which is the setter. It uses the Sync SDK to communicate peer IDs, multiaddrs, state and to pass the CID.

It doesn't work in Docker yet (because it's using 127.0.0.1 for the Swarm addresses) ... I'll look into how to fix that next.

jimpick · 2019-10-20T19:29:52Z

Looks like there is the possibility of passing in a 'swarmaddr' to IPTB, so we could set an IP address that is visible from the outside of the Docker container.

https://github.com/ipfs/iptb-plugins/blob/0183b37641a0bf07aafb07dbb778060855cc3e74/local/localipfs.go#L80

jimpick · 2019-10-20T23:34:16Z

I added some code to query the IPv4 address of eth0 and I pass that as the swarmaddr to IPTB. Now works between 2 Docker containers!

daviddias

Lot's of great progress here @jimpick! I made some comments/questions/observations. Would love to see this plan renamed to Plan 2 (as suggested at https://trello.com/c/6kEQv6Xn/36-impl-plan-2-data-transfer-of-random-datasets-bitswap-graphsync) and would love to see an end to end integration with Elastic Search + Kibana. Seeing that will create the template we need to then invite others to create more Test Plans :)

daviddias · 2019-10-21T04:54:11Z

manifests/smlbench.toml

@@ -1,51 +1,52 @@
 name = "smlbench"


@jimpick mind rebasing master onto this branch so that it doens't show things that are smlbench in the smlbench2 (aka Data Transfer Plan) in this PR?

My original thinking was that this might go into the smlbench test - although I stripped out the bit where it runs multiple iterations of loading different file sizes. It's not clear to me if that should be handled internally in the test (eg. spin up the peers first, then loop and try the different file sizes) or if it might be better to loop over the various file sizes at the test runner level, and have completely independent tests. We probably want to support both patterns.

But, as you say, this will be going into "plan 2"... I can easily just rename this and we can backport what we want into smlbench later. I think smlbench (or something similar) can serve as a simpler test case to show to new developers that want to develop a new test case.

daviddias · 2019-10-21T04:55:00Z

manifests/smlbench2.toml

+# seq 0
+[[testcases]]
+name = "simple-add-get"
+instances = { min = 1, max = 1, default = 1 }


If it runs with 2 instances, shouldn't these numbers be min=2, max=2 default = 2?

Going through the code and your update. I read that you used 2 docker containers, however, I don't see anywhere where that instruction happens. I'm guessing you pass --instances=2 when running and it simply doesn't error and does indeed spawn 2 instances, is that the case?

Yes, I pass --instances=2. Maybe I don't need that if the toml file is correct? I'll try that out.

daviddias · 2019-10-21T04:59:23Z

pkg/build/Dockerfile.template

    && CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o testplan ${TESTPLAN_EXEC_PKG}

+# Optionally install IPFS
+RUN if [ -n "${GO_IPFS_VERSION}" ]; then echo Install IPFS ${GO_IPFS_VERSION}; bash /install-ipfs.sh ${GO_IPFS_VERSION}; fi


I wonder if having a separate bash script (https://github.com/ipfs/testground/blob/master/pkg/build/install-ipfs.sh) that needs to be c&p into the container at build time just to do a wget is worth it. It just adds layers of indirection to be. (I know it already existed before this PR, just adding this comment for discussion)

That was my first attempt ... I was trying to keep a single Dockerfile.template that worked for both the DHT test and the smlbench test - Dockerfile doesn't support conditionals like "if" statements, so I needed to have a shell script for that. I'm not really happy with how it turned out, so I think it might make more sense to have multiple templates, or a template that is generated dynamically by testground.

daviddias · 2019-10-21T05:03:05Z

plans/smlbench/utils/testcase.go

@@ -1,11 +1,14 @@
-package smlbench
+package utils


file changes will disappear on rebase from master

daviddias · 2019-10-21T05:03:10Z

plans/smlbench/utils/metrics.go

@@ -1,4 +1,4 @@
-package smlbench
+package utils


file changes will disappear on rebase from master

daviddias · 2019-10-21T05:09:47Z

plans/smlbench2/main.go

+	}
+	addrInfo := &peer.AddrInfo{ID, addrs}
+
+	seq, err := writer.Write(sync.PeerSubtree, addrInfo)


So, it gets the its sequence number when it does the first write because Redis returns the value of an incremental counter as peers appear?

daviddias · 2019-10-21T05:10:13Z

plans/smlbench2/main.go

+        }
+        defer writer.Close()
+
+	client := localNode.Client()


There is never a "remote" in this test as it always spawns a node for itself

In the original test, there's an IPTB node called "adder" and another one called "getter". In this one, the node gets spawned before it knows what it's role is going to be, so I just called it "localNode" as it's local to the Docker container ... but it could be named anything. Maybe just calling it "node" is good? Naming things is hard.

just calling it node would be better indeed

daviddias · 2019-10-21T05:11:15Z

plans/smlbench2/main.go

+			return
+		}
+
+		runenv.EmitMetric(utils.MetricTimeToAdd, float64(time.Now().Sub(tstarted)/time.Millisecond))


Note that for this Plan. The time adding the file isn't important. Only the time transferring it. (+ the other things to measure, such as number of repeated blocks, etc as described in the spec for the plan)

I just copied that from the original smlbench test. I'm not even really sure yet what schema for the JSON is going to work well with graphing things in Kibana - I think we need to try to build some reports/graphing there really soon.

I'd love to dump as many metrics, events, traces as we possibly can, but tag things in an understandable way so we can easily generate reports with the most important metrics that we care about. It should be clear in the test what the primary metrics are vs. secondary metrics.

daviddias · 2019-10-21T05:13:30Z

plans/smlbench2/test/simple_add_get.go

@@ -0,0 +1,77 @@
+package test


It seams that this file is just c&p from smlbench as the test for smlbench2 is on main.go

Oh yeah, I was going to delete that file, I just forgot to. I was copying/pasting from it.

daviddias · 2019-10-21T05:14:14Z

plans/smlbench2/utils/files.go

+// specified directory.
+//
+// It is the callers responsibility to delete this file when done.
+func TempRandFile(runenv *runtime.RunEnv, dir string, size int64) *os.File {


This util will probably end up on the sdk as I'm sure it will be used many times across many plans :)

Once we write a few more tests, I think it's going to become clear what needs to go into the SDK. Already, I can see a lot of things that don't have to be boilerplate.

daviddias · 2019-10-22T22:43:11Z

@jimpick when you refactor and rename this smlbench2 plan, please do so on top of this folder https://github.com/ipfs/testground/tree/master/plans/data-transfer-datasets-random that has the Test Plan that you are implementing.

jimpick · 2019-10-22T23:50:04Z

@daviddias Sounds great. I should be starting that very soon.

jimpick · 2019-10-23T06:35:23Z

I got the 2 container test to work across Docker containers running on two different EC2 machines!

The setup was:

setup Redis on the first machine and export the REDIS_HOST environment variable with the IP address
build the testground Docker image for the test, and manually push it to an AWS ECR Docker registry
manually create a Docker network on each machine with a different subnet
create GRE tunnel between the two machines, with each end of the tunnel terminated in the Docker bridge subnets https://www.tldp.org/HOWTO/Adv-Routing-HOWTO/lartc.tunnel.gre.html
added routing on each EC2 machine for the subnets
add a high-priority ACCEPT rule in the iptables FORWARD table for the tunnel
match the IP range in the iptables masquerade tables to prevent NAT
small shell script on each machine to invoke the test

Here's the demo GIF showing it working - transferring 100MB between Docker instances on two different EC2 machines:

jimpick · 2019-10-23T17:17:04Z

I wrote up an explanation of how the test works yesterday. I'll link it from here so we can find it again.

#78 (comment)

daviddias · 2019-11-04T12:53:44Z

plans/smlbench2/doc.go

@@ -0,0 +1,3 @@
+// Package smlbench runs small benchmarks on IPFS.


What's this file for?

daviddias · 2019-11-04T12:55:27Z

plans/smlbench2/main.go

+        }
+        defer writer.Close()
+
+	client := localNode.Client()


just calling it node would be better indeed

daviddias · 2019-11-04T12:56:09Z

plans/smlbench2/main.go

+		defer os.Remove(file.Name())
+
+		tstarted := time.Now()
+		hash, err := adder.Add(file)


Suggested change

hash, err := adder.Add(file)

cidStr, err := adder.Add(file)

daviddias · 2019-11-04T12:56:25Z

plans/smlbench2/main.go

+			runenv.Abort(err)
+			return
+		}
+		fmt.Println("cid", hash)


Suggested change

fmt.Println("cid", hash)

fmt.Println("cid", cidStr)

daviddias · 2019-11-04T12:58:21Z

plans/smlbench2/main.go

+		}
+		fmt.Println("cid", hash)
+
+		c, err := cid.Parse(hash)


Does this cid. come from the go-cid package? (The lack of convention on naming imports in go breaks my brain)

daviddias · 2019-11-04T13:01:22Z

plans/smlbench2/main.go

+				runenv.Abort(err)
+				return
+			}
+			runenv.EmitMetric(utils.MetricTimeToGet, float64(time.Now().Sub(tstarted)/time.Millisecond))


Probably move the metrics to its own package so that it is metrics.TimeToGet

daviddias · 2019-11-06T22:53:27Z

Once #126 is merged, let's get this one to work with the cluster:runner as well

daviddias · 2019-11-07T10:15:27Z

Alright, #126 has been merged.

Next steps for this PR:

Rebase master onto this branch
Rename smlbench2 to data-transfer-of-random-datasets
Make this test work with the Docker Swarm cluster in AWS

daviddias · 2019-12-02T20:15:41Z

@jimpick anything left to be done in this PR?

jimpick · 2019-12-03T00:28:31Z

Previously you suggested using this as a base for: https://github.com/ipfs/testground/tree/master/plans/data-transfer-datasets-random

But it's also completely different than the test plan described in the README... not sure how things converge.

I'll go ahead and prepare the PR to apply it against that test plan as requested.

Also, now that the sidecar-based traffic shaping is working, that would go really nicely with this.

Include go-ipfs in Docker image when go_ipfs_version is defined If not defined, Docker build still works. Get smlbench working again Changed Docker base image to Debian because busybox or alpine don't work with our precompiled ipfs binaries. Also added a small delay when destroying the ensemble to avoid a race. Create new smlbench2 test suite Copied from smlbench - only use simple_add_get Use Testground Sync SDK ... create watcher, writer Get sequence number via Testground Sync SDK API (from Redis) Construct the AddrInfo from the PeerId from IPTB ... will add the multiaddrs on as well. Add random file on adder peer Wait until 'added' state is reached on getter peer I made a small change to the Sync SDK 'Barrier' method so that it doesn't error out if the state doesn't exist yet in redis. Signal and wait for "added" and "received" states Might need to add timeouts somewhere Add multiaddrs to redis Connect from getter to adder node Pass CID via subtree on Redis, download on getter node It works! Query for IPv4 address of eth0 and pass as swarmaddr to IPTB Make the testcase work between 2 docker instances! Remove unused file

Still fails at runtime.

jimpick · 2019-12-03T02:30:41Z

I've renamed the test and squashed/rebased to master, but it's failing when I run it:

./testground --vv run --builder docker:go --runner local:docker --build-cfg bypass_cache=true data-transfer-datasets-random/simple-add-get

"swarm/connect: connect QmRqBzsk6MWGxKp1T7uAfpvie2T4aN9XaLg8wAcWzK4c3V failure: failed to dial : all dials failed\n  * [/ip4/172.20.0.7/tcp/46459] dial tcp4 172.20.0.7:46459: connect: network is unreachable"

(possibly related to sidecar?)

The branch modifies the builder and Dockerfile under pkg/build/golang in an unclean way ... I think it needs a different approach.

daviddias · 2019-12-11T09:23:01Z

AFAIU, we won't be using the code present in this PR any longer. Closing it down, reopen if necessary.

jimpick · 2019-12-13T20:27:12Z

I still need this for reference purposes, so I'm keeping the branch alive over at: https://github.com/jimpick/testground/tree/smlbench2

jimpick mentioned this pull request Oct 20, 2019

Plan: smlbench #60

Merged

daviddias reviewed Oct 21, 2019

View reviewed changes

jimpick mentioned this pull request Oct 28, 2019

Using testground to test Bitswap in varying leech / seed combinations #79

Closed

daviddias requested a review from raulk October 30, 2019 09:34

daviddias mentioned this pull request Oct 30, 2019

Weekly Sync Notes 2019-10-29 #89

Merged

daviddias reviewed Nov 4, 2019

View reviewed changes

daviddias assigned jimpick Nov 7, 2019

daviddias added the P0 - Critical label Nov 7, 2019

daviddias added P1 - High and removed P0 - Critical labels Nov 11, 2019

daviddias added P2 - Medium and removed P1 - High labels Dec 2, 2019

jimpick force-pushed the plan-smlbench-two-containers branch 2 times, most recently from f3746e8 to 37e5c55 Compare December 3, 2019 01:25

jimpick force-pushed the plan-smlbench-two-containers branch from 37e5c55 to 2df0768 Compare December 3, 2019 01:32

Rename smlbench2 to data-transfer-datasets-random

028cc75

Still fails at runtime.

jimpick mentioned this pull request Dec 6, 2019

Provide an option to replace the base docker image #49

Closed

daviddias changed the title ~~simplified smlbench - two containers (for comparison)~~ WIP: simplified smlbench - two containers (for comparison) Dec 10, 2019

daviddias closed this Dec 11, 2019

daviddias deleted the plan-smlbench-two-containers branch December 11, 2019 09:23

		@@ -0,0 +1,3 @@
		// Package smlbench runs small benchmarks on IPFS.

Conversation

jimpick commented Oct 20, 2019

Uh oh!

jimpick commented Oct 20, 2019

Uh oh!

jimpick commented Oct 20, 2019

Uh oh!

daviddias left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

daviddias commented Oct 22, 2019

Uh oh!

jimpick commented Oct 22, 2019

Uh oh!

jimpick commented Oct 23, 2019

Uh oh!

jimpick commented Oct 23, 2019

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

daviddias commented Nov 6, 2019

Uh oh!

daviddias commented Nov 7, 2019

Next steps for this PR:

Uh oh!

daviddias commented Dec 2, 2019

Uh oh!

jimpick commented Dec 3, 2019

Uh oh!

jimpick commented Dec 3, 2019