Wait for bundle scale: do not immediately exit when encountering error #493

Merged: huntc merged 1 commit into typesafehub:master from fsat:scale-wait-improve on Jun 14, 2017

fsat commented Jun 14, 2017

Instead of exiting immediately when the bundle reports an error, ignore the error for the first ten seconds so that the bundle has a chance to start and rectify the error itself.

fsat commented Jun 14, 2017

Marked as wip - manual test pending.

fsat force-pushed the scale-wait-improve branch from a2025db to b45de74 on June 14, 2017 04:16

fsat commented Jun 14, 2017

Manual test completed successfully.

The wait for bundle scale behaviour now ignores the bundle error for the first 10 seconds.
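
The behaviour described above can be sketched roughly as follows. This is only an illustration and not the code from this pull request: `wait_for_scale`, `get_state`, `BundleScaleError`, and the timeout and polling defaults are hypothetical names and values chosen for the example. The only detail taken from the description is the ten second window during which a bundle error is ignored.

```python
import time
from typing import Callable, Tuple


class BundleScaleError(Exception):
    """Raised when the bundle still reports an error after the grace period."""


def wait_for_scale(
    get_state: Callable[[], Tuple[int, bool]],
    expected_scale: int,
    grace_seconds: float = 10.0,    # error is ignored during this window
    timeout_seconds: float = 60.0,  # assumed overall limit, not from the PR
    poll_interval: float = 1.0,     # assumed polling period, not from the PR
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll get_state() until the expected scale is met.

    get_state is a hypothetical callable returning (current_scale, has_error).
    """
    start = clock()
    while True:
        scale, has_error = get_state()
        if scale >= expected_scale:
            return scale
        elapsed = clock() - start
        # Only treat the error as fatal once the grace period has passed,
        # so a bundle that recovers on its own is not reported as failed.
        if has_error and elapsed >= grace_seconds:
            raise BundleScaleError("Failure to scale bundle")
        if elapsed >= timeout_seconds:
            raise TimeoutError("Timed out waiting for bundle scale")
        sleep(poll_interval)
```

Passing `clock` and `sleep` as parameters is a design choice for the sketch: it lets the grace period be exercised with a fake clock instead of a real ten second wait.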

Setup

Create the following test bundle.

192-168-1-5:test-failing-bundle felixsatyaputra$ find test-failing-bundle -type f
test-failing-bundle/bundle.conf
test-failing-bundle/one/start-one

192-168-1-5:test-failing-bundle felixsatyaputra$ cat test-failing-bundle/bundle.conf
version                  = "1"
name                     = "failing-bundle"
system                   = "failing-bundle"
systemVersion            = "0.1.0"
compatibilityVersion     = "1"
nrOfCpus                 = 0.1
memory                   = 8000000
diskSpace                = 10000000
roles                    = ["test"]
components               = {
  "one" = {
    description      = "A script that echos 5 times, and exits with code 1"
    file-system-type = "universal"
    start-command    = ["one/start-one"]
    endpoints        = {
    }
  }
}

192-168-1-5:test-failing-bundle felixsatyaputra$ cat test-failing-bundle/one/start-one
#!/usr/bin/env bash

echo "Sleep - 1"
sleep 1
echo "Bailing out"
exit 1

Package the bundle using shazar

192-168-1-5:test-failing-bundle felixsatyaputra$ sh test.sh
+ shazar test-failing-bundle
./test-failing-bundle-13eac5ec7acae4691104dfec8847a202930929e30f6c54646bfd8f54085b5a9a.zip

Start the sandbox

+ sandbox run 2.1.0-alpha.1
|------------------------------------------------|
| Stopping ConductR                              |
|------------------------------------------------|
ConductR core pid 22855 stopped
ConductR agent pid 22956 stopped
ConductR has been successfully stopped
|------------------------------------------------|
| Starting ConductR                              |
|------------------------------------------------|
Extracting ConductR core to /Users/felixsatyaputra/.conductr/images/core
Extracting ConductR agent to /Users/felixsatyaputra/.conductr/images/agent
Starting ConductR core instance on 192.168.10.1..
Waiting for ConductR to start..
Starting ConductR agent instance on 192.168.10.1..
|------------------------------------------------|
| OCI-in-Docker support unavailable.             |
|------------------------------------------------|
|------------------------------------------------|
| To provide support ensure Docker is running    |
| and restart the sandbox                        |
|------------------------------------------------|
|------------------------------------------------|
| Starting logging feature based on eslite       |
|------------------------------------------------|
Deploying bundle eslite..
Retrieving bundle..
Loading bundle from cache typesafe/bundle/eslite
Bintray credentials loaded from /Users/felixsatyaputra/.lightbend/commercial.credentials
Retrieving from cache /Users/felixsatyaputra/.conductr/cache/bundle/eslite-2.1.0-57e432d0c647be2bbc83fa8e59ee469bb59d1f72df31f3d82cab0ad396130fe7.zip
Loading bundle to ConductR..
[##################################################] 100%
Bundle 57e432d0c647be2bbc83fa8e59ee469b is installed
Bundle loaded.
Bundle run request sent.
Bundle 57e432d0c647be2bbc83fa8e59ee469b waiting to reach expected scale 1
Bundle 57e432d0c647be2bbc83fa8e59ee469b has scale 0, expected 1...
Bundle 57e432d0c647be2bbc83fa8e59ee469b expected scale 1 is met
|------------------------------------------------|
| Summary                                        |
|------------------------------------------------|
|- - - - - - - - - - - - - - - - - - - - - - - - |
| ConductR                                       |
|- - - - - - - - - - - - - - - - - - - - - - - - |
ConductR has been started:
  core instance on 192.168.10.1
  agent instance on 192.168.10.1
ConductR service locator has been started on:
  192.168.10.1:9008
|- - - - - - - - - - - - - - - - - - - - - - - - |
| Proxy                                          |
|- - - - - - - - - - - - - - - - - - - - - - - - |
HAProxy has not been started
To enable proxying ensure Docker is running and restart the sandbox
|- - - - - - - - - - - - - - - - - - - - - - - - |
| Bundles                                        |
|- - - - - - - - - - - - - - - - - - - - - - - - |
Check latest bundle status with:
  conduct info
Current bundle status:
Licensed To: cc64df31-ec6b-4e08-bb6b-3216721a56b@lightbend
Max ConductR agents: 10
ConductR Version(s): 0.1.0, 2.1.*
Grants: akka-sbr, cinnamon, conductr

ID       NAME      TAG  #REP  #STR  #RUN  ROLES
57e432d  eslite  2.1.0     1     0     1  elasticsearch

Load and run the test bundle. The run will eventually fail.

+ conduct load ./test-failing-bundle-13eac5ec7acae4691104dfec8847a202930929e30f6c54646bfd8f54085b5a9a.zip
Retrieving bundle..
Loading bundle to ConductR..
[##################################################] 100%
Bundle 13eac5ec7acae4691104dfec8847a202 is installed
Bundle loaded.
Start bundle with:        conduct run 13eac5e
Unload bundle with:       conduct unload 13eac5e
Print ConductR info with: conduct info
Print bundle info with:   conduct info 13eac5e

+ conduct run fai
Bundle run request sent.
Bundle 13eac5ec7acae4691104dfec8847a202 waiting to reach expected scale 1
Bundle 13eac5ec7acae4691104dfec8847a202 has scale 0, expected 1...................
Error: Failure to scale bundle 13eac5ec7acae4691104dfec8847a202

Check latest bundle events with:
  conduct events 13eac5ec7acae4691104dfec8847a202
Current bundle events:
TIME                          EVENT                                         DESC
Wed 2017-06-14T14:20:13+1000  conductr.scaleScheduler.scaleBundleRequested  Scale bundle requested: scale=1
Wed 2017-06-14T14:20:13+1000  conductr.launcher.bundleStarted               Bundle started: host=192.168.10.1
Wed 2017-06-14T14:20:14+1000  conductr.launcher.bundleExited                Bundle exited: host=192.168.10.1, exitValue=143 - rescheduling its execution
Wed 2017-06-14T14:20:14+1000  conductr.scaleScheduler.scaleRescheduled      Scale of 1 rescheduled
Wed 2017-06-14T14:20:15+1000  conductr.scaleScheduler.scaleBundleRequested  Scale bundle requested: scale=1
Wed 2017-06-14T14:20:15+1000  conductr.launcher.bundleStarted               Bundle started: host=192.168.10.1
Wed 2017-06-14T14:20:16+1000  conductr.launcher.bundleExited                Bundle exited: host=192.168.10.1, exitValue=143 - rescheduling its execution
Wed 2017-06-14T14:20:16+1000  conductr.scaleScheduler.scaleRescheduled      Scale of 1 rescheduled
Wed 2017-06-14T14:20:16+1000  conductr.scaleScheduler.scaleBundleRequested  Scale bundle requested: scale=1
Wed 2017-06-14T14:20:16+1000  conductr.launcher.bundleStarted               Bundle started: host=192.168.10.1

Check latest bundle logs with:
  conduct logs 13eac5ec7acae4691104dfec8847a202
Current bundle logs:
TIME                          HOST                     LOG
Wed 2017-06-14T14:20:13+1000  192-168-1-5.tpgi.com.au  Stopping bundle
Wed 2017-06-14T14:20:13+1000  192-168-1-5.tpgi.com.au  Sleep - 1
Wed 2017-06-14T14:20:14+1000  192-168-1-5.tpgi.com.au  Bailing out
Wed 2017-06-14T14:20:14+1000  192-168-1-5.tpgi.com.au  Component one exited with 1
Wed 2017-06-14T14:20:14+1000  192-168-1-5.tpgi.com.au  Stopping bundle
Wed 2017-06-14T14:20:15+1000  192-168-1-5.tpgi.com.au  Sleep - 1
Wed 2017-06-14T14:20:16+1000  192-168-1-5.tpgi.com.au  Bailing out
Wed 2017-06-14T14:20:16+1000  192-168-1-5.tpgi.com.au  Component one exited with 1
Wed 2017-06-14T14:20:16+1000  192-168-1-5.tpgi.com.au  Stopping bundle
Wed 2017-06-14T14:20:16+1000  192-168-1-5.tpgi.com.au  Sleep - 1

Error: Bundle 13eac5ec7acae4691104dfec8847a202 has error

Inspect the latest bundle events and logs using:
  conduct events 13eac5ec7acae4691104dfec8847a202
  conduct logs 13eac5ec7acae4691104dfec8847a202

The bundle is in an error state (i.e. the Error attribute equals Yes).

192-168-1-5:test-failing-bundle felixsatyaputra$ conduct info fa
BUNDLE ATTRIBUTES
-----------------
Bundle Id              ! 13eac5e
Bundle Name            failing-bundle
Compatibility Version  1
System                 failing-bundle
System Version         0.1.0
Tags
Nr of CPUs             0.1
Memory                 8000000
Disk Space             10000000
Roles                  test
Bundle Digest          13eac5ec7acae4691104dfec8847a202930929e30f6c54646bfd8f54085b5a9a
Error                  Yes

BUNDLE SCALE
------------
Nr of Reschedules  13
Scale              1

BUNDLE INSTALLATIONS
--------------------
Host    192.168.10.1
Bundle  /Users/felixsatyaputra/.conductr/images/tmp/conductr/192.168.10.1/bundles/13eac5ec7acae4691104dfec8847a202930929e30f6c54646bfd8f54085b5a9a.zip
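
For scripted checks, the Error attribute could be read from text laid out like the `conduct info` output above. The sketch below is an assumption-laden illustration: `parse_attributes` and `bundle_has_error` are hypothetical helpers, and the parsing relies only on the layout visible in this transcript (a name, two or more spaces, then a value), which may not hold for other versions of the CLI.

```python
import re
from typing import Dict


def parse_attributes(text: str) -> Dict[str, str]:
    """Parse 'Name   Value' lines into a dict.

    Assumes names and values are separated by two or more spaces, as in
    the transcript above. Lines without a value (for example 'Tags' or a
    section heading) map to an empty string.
    """
    attrs: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and the dashed underlines below section headings.
        if not line or set(line) == {"-"}:
            continue
        parts = re.split(r"\s{2,}", line, maxsplit=1)
        attrs[parts[0]] = parts[1] if len(parts) == 2 else ""
    return attrs


def bundle_has_error(text: str) -> bool:
    """Return True when the Error attribute is 'Yes' (case-insensitive)."""
    return parse_attributes(text).get("Error", "").lower() == "yes"
```

Feeding the captured output of `conduct info fa` from the transcript into `bundle_has_error` should report the error state shown there, provided the layout assumption holds.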

Test Result

Rerunning the same bundle now waits for 10 seconds before checking the error state.

192-168-1-5:test-failing-bundle felixsatyaputra$ conduct run fa
Bundle run request sent.
Bundle 13eac5ec7acae4691104dfec8847a202 waiting to reach expected scale 1
Bundle 13eac5ec7acae4691104dfec8847a202 has scale 0, expected 1....................
Error: Failure to scale bundle 13eac5ec7acae4691104dfec8847a202

Check latest bundle events with:
  conduct events 13eac5ec7acae4691104dfec8847a202
Current bundle events:
TIME                          EVENT                                         DESC
Wed 2017-06-14T14:20:40+1000  conductr.scaleScheduler.scaleBundleRequested  Scale bundle requested: scale=1
Wed 2017-06-14T14:20:40+1000  conductr.launcher.bundleStarted               Bundle started: host=192.168.10.1
Wed 2017-06-14T14:20:41+1000  conductr.launcher.bundleExited                Bundle exited: host=192.168.10.1, exitValue=143 - rescheduling its execution
Wed 2017-06-14T14:20:41+1000  conductr.scaleScheduler.scaleRescheduled      Scale of 1 rescheduled
Wed 2017-06-14T14:20:41+1000  conductr.scaleScheduler.scaleBundleRequested  Scale bundle requested: scale=1
Wed 2017-06-14T14:20:42+1000  conductr.launcher.bundleStarted               Bundle started: host=192.168.10.1
Wed 2017-06-14T14:20:43+1000  conductr.launcher.bundleExited                Bundle exited: host=192.168.10.1, exitValue=143 - rescheduling its execution
Wed 2017-06-14T14:20:43+1000  conductr.scaleScheduler.scaleRescheduled      Scale of 1 rescheduled
Wed 2017-06-14T14:20:43+1000  conductr.scaleScheduler.scaleBundleRequested  Scale bundle requested: scale=1
Wed 2017-06-14T14:20:43+1000  conductr.launcher.bundleStarted               Bundle started: host=192.168.10.1

Check latest bundle logs with:
  conduct logs 13eac5ec7acae4691104dfec8847a202
Current bundle logs:
TIME                          HOST                     LOG
Wed 2017-06-14T14:20:40+1000  192-168-1-5.tpgi.com.au  Stopping bundle
Wed 2017-06-14T14:20:40+1000  192-168-1-5.tpgi.com.au  Sleep - 1
Wed 2017-06-14T14:20:41+1000  192-168-1-5.tpgi.com.au  Bailing out
Wed 2017-06-14T14:20:41+1000  192-168-1-5.tpgi.com.au  Component one exited with 1
Wed 2017-06-14T14:20:41+1000  192-168-1-5.tpgi.com.au  Stopping bundle
Wed 2017-06-14T14:20:42+1000  192-168-1-5.tpgi.com.au  Sleep - 1
Wed 2017-06-14T14:20:43+1000  192-168-1-5.tpgi.com.au  Bailing out
Wed 2017-06-14T14:20:43+1000  192-168-1-5.tpgi.com.au  Component one exited with 1
Wed 2017-06-14T14:20:43+1000  192-168-1-5.tpgi.com.au  Stopping bundle
Wed 2017-06-14T14:20:43+1000  192-168-1-5.tpgi.com.au  Sleep - 1

Error: Bundle 13eac5ec7acae4691104dfec8847a202 has error

Inspect the latest bundle events and logs using:
  conduct events 13eac5ec7acae4691104dfec8847a202
  conduct logs 13eac5ec7acae4691104dfec8847a202

fsat force-pushed the scale-wait-improve branch from b45de74 to 248cd2c on June 14, 2017 04:25
fsat changed the title from "WIP - DON'T MERGE - Wait for bundle scale: do not immediately exit when encountering error" to "Wait for bundle scale: do not immediately exit when encountering error" on Jun 14, 2017
huntc merged commit 30063b7 into typesafehub:master on Jun 14, 2017