Skip to content

WIP: simplified smlbench - two containers (for comparison)#74

Closed
jimpick wants to merge 2 commits intomasterfrom
plan-smlbench-two-containers
Closed

WIP: simplified smlbench - two containers (for comparison)#74
jimpick wants to merge 2 commits intomasterfrom
plan-smlbench-two-containers

Conversation

@jimpick
Copy link
Copy Markdown
Contributor

@jimpick jimpick commented Oct 20, 2019

Don't merge ... for discussion

I took #73 and modified the test so it would run in 2 separate processes.

It uses the Testground Sync SDK (which uses Redis) to figure out which node is the getter and which is the setter. It uses the Sync SDK to communicate peer IDs, multiaddrs, state and to pass the CID.

testground-simple-add-get

It doesn't work in Docker yet (because it's using 127.0.0.1 for the Swarm addresses) ... I'll look into how to fix that next.

@jimpick jimpick mentioned this pull request Oct 20, 2019
@jimpick
Copy link
Copy Markdown
Contributor Author

jimpick commented Oct 20, 2019

Looks like there is the possibility of passing in a 'swarmaddr' to IPTB, so we could set an IP address that is visible from the outside of the Docker container.

https://github.com/ipfs/iptb-plugins/blob/0183b37641a0bf07aafb07dbb778060855cc3e74/local/localipfs.go#L80

@jimpick
Copy link
Copy Markdown
Contributor Author

jimpick commented Oct 20, 2019

I added some code to query the IPv4 address of eth0 and I pass that as the swarmaddr to IPTB. Now works between 2 Docker containers!

Copy link
Copy Markdown
Contributor

@daviddias daviddias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lot's of great progress here @jimpick! I made some comments/questions/observations. Would love to see this plan renamed to Plan 2 (as suggested at https://trello.com/c/6kEQv6Xn/36-impl-plan-2-data-transfer-of-random-datasets-bitswap-graphsync) and would love to see an end to end integration with Elastic Search + Kibana. Seeing that will create the template we need to then invite others to create more Test Plans :)

@@ -1,51 +1,52 @@
name = "smlbench"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jimpick mind rebasing master onto this branch so that it doens't show things that are smlbench in the smlbench2 (aka Data Transfer Plan) in this PR?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My original thinking was that this might go into the smlbench test - although I stripped out the bit where it runs multiple iterations of loading different file sizes. It's not clear to me if that should be handled internally in the test (eg. spin up the peers first, then loop and try the different file sizes) or if it might be better to loop over the various file sizes at the test runner level, and have completely independent tests. We probably want to support both patterns.

But, as you say, this will be going into "plan 2"... I can easily just rename this and we can backport what we want into smlbench later. I think smlbench (or something similar) can serve as a simpler test case to show to new developers that want to develop a new test case.

# seq 0
[[testcases]]
name = "simple-add-get"
instances = { min = 1, max = 1, default = 1 }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it runs with 2 instances, shouldn't these numbers be min=2, max=2 default = 2?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going through the code and your update. I read that you used 2 docker containers, however, I don't see anywhere where that instruction happens. I'm guessing you pass --instances=2 when running and it simply doesn't error and does indeed spawn 2 instances, is that the case?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I pass --instances=2. Maybe I don't need that if the toml file is correct? I'll try that out.

&& CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o testplan ${TESTPLAN_EXEC_PKG}

# Optionally install IPFS
RUN if [ -n "${GO_IPFS_VERSION}" ]; then echo Install IPFS ${GO_IPFS_VERSION}; bash /install-ipfs.sh ${GO_IPFS_VERSION}; fi
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if having a separate bash script (https://github.com/ipfs/testground/blob/master/pkg/build/install-ipfs.sh) that needs to be c&p into the container at build time just to do a wget is worth it. It just adds layers of indirection to be. (I know it already existed before this PR, just adding this comment for discussion)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my first attempt ... I was trying to keep a single Dockerfile.template that worked for both the DHT test and the smlbench test - Dockerfile doesn't support conditionals like "if" statements, so I needed to have a shell script for that. I'm not really happy with how it turned out, so I think it might make more sense to have multiple templates, or a template that is generated dynamically by testground.

@@ -1,11 +1,14 @@
package smlbench
package utils
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

file changes will disappear on rebase from master

@@ -1,4 +1,4 @@
package smlbench
package utils
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

file changes will disappear on rebase from master

}
addrInfo := &peer.AddrInfo{ID, addrs}

seq, err := writer.Write(sync.PeerSubtree, addrInfo)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, it gets the its sequence number when it does the first write because Redis returns the value of an incremental counter as peers appear?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct.

}
defer writer.Close()

client := localNode.Client()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is never a "remote" in this test as it always spawns a node for itself

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the original test, there's an IPTB node called "adder" and another one called "getter". In this one, the node gets spawned before it knows what it's role is going to be, so I just called it "localNode" as it's local to the Docker container ... but it could be named anything. Maybe just calling it "node" is good? Naming things is hard.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just calling it node would be better indeed

return
}

runenv.EmitMetric(utils.MetricTimeToAdd, float64(time.Now().Sub(tstarted)/time.Millisecond))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that for this Plan. The time adding the file isn't important. Only the time transferring it. (+ the other things to measure, such as number of repeated blocks, etc as described in the spec for the plan)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just copied that from the original smlbench test. I'm not even really sure yet what schema for the JSON is going to work well with graphing things in Kibana - I think we need to try to build some reports/graphing there really soon.

I'd love to dump as many metrics, events, traces as we possibly can, but tag things in an understandable way so we can easily generate reports with the most important metrics that we care about. It should be clear in the test what the primary metrics are vs. secondary metrics.

@@ -0,0 +1,77 @@
package test
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seams that this file is just c&p from smlbench as the test for smlbench2 is on main.go

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, I was going to delete that file, I just forgot to. I was copying/pasting from it.

// specified directory.
//
// It is the callers responsibility to delete this file when done.
func TempRandFile(runenv *runtime.RunEnv, dir string, size int64) *os.File {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This util will probably end up on the sdk as I'm sure it will be used many times across many plans :)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we write a few more tests, I think it's going to become clear what needs to go into the SDK. Already, I can see a lot of things that don't have to be boilerplate.

@daviddias
Copy link
Copy Markdown
Contributor

@jimpick when you refactor and rename this smlbench2 plan, please do so on top of this folder https://github.com/ipfs/testground/tree/master/plans/data-transfer-datasets-random that has the Test Plan that you are implementing.

@jimpick
Copy link
Copy Markdown
Contributor Author

jimpick commented Oct 22, 2019

@daviddias Sounds great. I should be starting that very soon.

@jimpick
Copy link
Copy Markdown
Contributor Author

jimpick commented Oct 23, 2019

I got the 2 container test to work across Docker containers running on two different EC2 machines!

The setup was:

  • setup Redis on the first machine and export the REDIS_HOST environment variable with the IP address
  • build the testground Docker image for the test, and manually push it to an AWS ECR Docker registry
  • manually create a Docker network on each machine with a different subnet
  • create GRE tunnel between the two machines, with each end of the tunnel terminated in the Docker bridge subnets https://www.tldp.org/HOWTO/Adv-Routing-HOWTO/lartc.tunnel.gre.html
  • added routing on each EC2 machine for the subnets
  • add a high-priority ACCEPT rule in the iptables FORWARD table for the tunnel
  • match the IP range in the iptables masquerade tables to prevent NAT
  • small shell script on each machine to invoke the test

Here's the demo GIF showing it working - transferring 100MB between Docker instances on two different EC2 machines:

testground-two-ec2-vms-bridged-gre

@jimpick
Copy link
Copy Markdown
Contributor Author

jimpick commented Oct 23, 2019

I wrote up an explanation of how the test works yesterday. I'll link it from here so we can find it again.

#78 (comment)

@@ -0,0 +1,3 @@
// Package smlbench runs small benchmarks on IPFS.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this file for?

}
defer writer.Close()

client := localNode.Client()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just calling it node would be better indeed

defer os.Remove(file.Name())

tstarted := time.Now()
hash, err := adder.Add(file)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
hash, err := adder.Add(file)
cidStr, err := adder.Add(file)

runenv.Abort(err)
return
}
fmt.Println("cid", hash)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fmt.Println("cid", hash)
fmt.Println("cid", cidStr)

}
fmt.Println("cid", hash)

c, err := cid.Parse(hash)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this cid. come from the go-cid package? (The lack of convention on naming imports in go breaks my brain)

runenv.Abort(err)
return
}
runenv.EmitMetric(utils.MetricTimeToGet, float64(time.Now().Sub(tstarted)/time.Millisecond))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably move the metrics to its own package so that it is metrics.TimeToGet

@daviddias
Copy link
Copy Markdown
Contributor

Once #126 is merged, let's get this one to work with the cluster:runner as well

@daviddias
Copy link
Copy Markdown
Contributor

Alright, #126 has been merged.

Next steps for this PR:

  • Rebase master onto this branch
  • Rename smlbench2 to data-transfer-of-random-datasets
  • Make this test work with the Docker Swarm cluster in AWS

@daviddias
Copy link
Copy Markdown
Contributor

@jimpick anything left to be done in this PR?

@jimpick
Copy link
Copy Markdown
Contributor Author

jimpick commented Dec 3, 2019

Previously you suggested using this as a base for: https://github.com/ipfs/testground/tree/master/plans/data-transfer-datasets-random

But it's also completely different than the test plan described in the README... not sure how things converge.

I'll go ahead and prepare the PR to apply it against that test plan as requested.

Also, now that the sidecar-based traffic shaping is working, that would go really nicely with this.

@jimpick jimpick force-pushed the plan-smlbench-two-containers branch 2 times, most recently from f3746e8 to 37e5c55 Compare December 3, 2019 01:25
Include go-ipfs in Docker image when go_ipfs_version is defined

If not defined, Docker build still works.

Get smlbench working again

Changed Docker base image to Debian because busybox or alpine don't
work with our precompiled ipfs binaries.

Also added a small delay when destroying the ensemble to avoid a
race.

Create new smlbench2 test suite

Copied from smlbench - only use simple_add_get

Use Testground Sync SDK ... create watcher, writer

Get sequence number via Testground Sync SDK API (from Redis)

Construct the AddrInfo from the PeerId from IPTB ... will add the
multiaddrs on as well.

Add random file on adder peer

Wait until 'added' state is reached on getter peer

I made a small change to the Sync SDK 'Barrier' method so that
it doesn't error out if the state doesn't exist yet in redis.

Signal and wait for "added" and "received" states

Might need to add timeouts somewhere

Add multiaddrs to redis

Connect from getter to adder node

Pass CID via subtree on Redis, download on getter node

It works!

Query for IPv4 address of eth0 and pass as swarmaddr to IPTB

Make the testcase work between 2 docker instances!

Remove unused file
@jimpick jimpick force-pushed the plan-smlbench-two-containers branch from 37e5c55 to 2df0768 Compare December 3, 2019 01:32
@jimpick
Copy link
Copy Markdown
Contributor Author

jimpick commented Dec 3, 2019

I've renamed the test and squashed/rebased to master, but it's failing when I run it:

./testground --vv run --builder docker:go --runner local:docker --build-cfg bypass_cache=true data-transfer-datasets-random/simple-add-get
"swarm/connect: connect QmRqBzsk6MWGxKp1T7uAfpvie2T4aN9XaLg8wAcWzK4c3V failure: failed to dial : all dials failed\n  * [/ip4/172.20.0.7/tcp/46459] dial tcp4 172.20.0.7:46459: connect: network is unreachable"

(possibly related to sidecar?)

The branch modifies the builder and Dockerfile under pkg/build/golang in an unclean way ... I think it needs a different approach.

@daviddias daviddias changed the title simplified smlbench - two containers (for comparison) WIP: simplified smlbench - two containers (for comparison) Dec 10, 2019
@daviddias
Copy link
Copy Markdown
Contributor

AFAIU, we won't be using the code present in this PR any longer. Closing it down, reopen if necessary.

@daviddias daviddias closed this Dec 11, 2019
@daviddias daviddias deleted the plan-smlbench-two-containers branch December 11, 2019 09:23
@jimpick
Copy link
Copy Markdown
Contributor Author

jimpick commented Dec 13, 2019

I still need this for reference purposes, so I'm keeping the branch alive over at: https://github.com/jimpick/testground/tree/smlbench2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants