Skip to content

Fix the exit code return issue in shell scripts#1107

Merged
liat-grozovik merged 2 commits intosonic-net:masterfrom
wangxin:capture-shell-err
Sep 16, 2019
Merged

Fix the exit code return issue in shell scripts#1107
liat-grozovik merged 2 commits intosonic-net:masterfrom
wangxin:capture-shell-err

Conversation

@wangxin
Copy link
Copy Markdown
Collaborator

@wangxin wangxin commented Sep 10, 2019

Description of PR

Summary:
Fixes # (issue)

In shell script, by default exit code of the last command
in commands connected by pipe is returned. If previous
command returned none zero exit code, the error is
not captured. If ansible run such script in playbook, the
script issue is ignored and ansible is not aware of it.

For example, block in fdb.yml:

  - name: Remove existing IPs from PTF host
    script: roles/test/files/helpers/remove_ip.sh
    delegate_to: "{{ptf_host}}" 

fails to detect issue which looks in log like this:

02:09:48 TASK [test : Remove existing IPs from PTF host] ********************************
02:09:48 task path: /builds2/jenkins2/workspace/SONiC_Test-FDB/ansible/roles/test/tasks/fdb.yml:11
02:09:48 Thursday 25 July 2019  23:09:48 +0000 (0:00:00.297)       0:00:13.829 ********* 
02:09:48 <10.213.103.5> ESTABLISH SSH CONNECTION FOR USER: root
02:09:48 <10.213.103.5> SSH: ansible.cfg set ssh_args: (-o)(ControlMaster=auto)(-o)(ControlPersist=120s)(-o)(UserKnownHostsFile=/dev/null)
02:09:48 <10.213.103.5> SSH: ANSIBLE_HOST_KEY_CHECKING/host_key_checking disabled: (-o)(StrictHostKeyChecking=no)
02:09:48 <10.213.103.5> SSH: ANSIBLE_REMOTE_USER/remote_user/ansible_user/user/-u set: (-o)(User=root)
02:09:48 <10.213.103.5> SSH: ANSIBLE_TIMEOUT/timeout set: (-o)(ConnectTimeout=30)
02:09:48 <10.213.103.5> SSH: PlayContext set ssh_common_args: ()
02:09:48 <10.213.103.5> SSH: PlayContext set ssh_extra_args: ()
02:09:48 <10.213.103.5> SSH: found only ControlPersist; added ControlPath: (-o)(ControlPath=/root/.ansible/cp/ansible-ssh-%h-%p-%r)
02:09:48 <10.213.103.5> SSH: EXEC sshpass -d14 ssh -C -vvv -o ControlMaster=auto -o ControlPersist=120s -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o User=root -o ConnectTimeout=30 -o ControlPath=/root/.ansible/cp/ansible-ssh-%h-%p-%r -tt 10.213.103.5 'mkdir -p "$( echo $HOME/.ansible/tmp/ansible-tmp-1564096188.49-246685053741557 )" && echo "$( echo $HOME/.ansible/tmp/ansible-tmp-1564096188.49-246685053741557 )"'
02:10:11 <10.213.103.5> PUT /builds2/jenkins2/workspace/SONiC_Test-FDB/ansible/roles/test/files/helpers/remove_ip.sh TO /root/.ansible/tmp/ansible-tmp-1564096188.49-246685053741557/remove_ip.sh
02:10:11 <10.213.103.5> SSH: ansible.cfg set ssh_args: (-o)(ControlMaster=auto)(-o)(ControlPersist=120s)(-o)(UserKnownHostsFile=/dev/null)
02:10:11 <10.213.103.5> SSH: ANSIBLE_HOST_KEY_CHECKING/host_key_checking disabled: (-o)(StrictHostKeyChecking=no)
02:10:11 <10.213.103.5> SSH: ANSIBLE_REMOTE_USER/remote_user/ansible_user/user/-u set: (-o)(User=root)
02:10:11 <10.213.103.5> SSH: ANSIBLE_TIMEOUT/timeout set: (-o)(ConnectTimeout=30)
02:10:11 <10.213.103.5> SSH: PlayContext set ssh_common_args: ()
02:10:11 <10.213.103.5> SSH: PlayContext set sftp_extra_args: ()
02:10:11 <10.213.103.5> SSH: found only ControlPersist; added ControlPath: (-o)(ControlPath=/root/.ansible/cp/ansible-ssh-%h-%p-%r)
02:10:11 <10.213.103.5> SSH: EXEC sshpass -d14 sftp -b - -C -vvv -o ControlMaster=auto -o ControlPersist=120s -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o User=root -o ConnectTimeout=30 -o ControlPath=/root/.ansible/cp/ansible-ssh-%h-%p-%r '[10.213.103.5]'
02:10:11 <10.213.103.5> ESTABLISH SSH CONNECTION FOR USER: root
02:10:11 <10.213.103.5> SSH: ansible.cfg set ssh_args: (-o)(ControlMaster=auto)(-o)(ControlPersist=120s)(-o)(UserKnownHostsFile=/dev/null)
02:10:11 <10.213.103.5> SSH: ANSIBLE_HOST_KEY_CHECKING/host_key_checking disabled: (-o)(StrictHostKeyChecking=no)
02:10:11 <10.213.103.5> SSH: ANSIBLE_REMOTE_USER/remote_user/ansible_user/user/-u set: (-o)(User=root)
02:10:11 <10.213.103.5> SSH: ANSIBLE_TIMEOUT/timeout set: (-o)(ConnectTimeout=30)
02:10:11 <10.213.103.5> SSH: PlayContext set ssh_common_args: ()
02:10:11 <10.213.103.5> SSH: PlayContext set ssh_extra_args: ()
02:10:11 <10.213.103.5> SSH: found only ControlPersist; added ControlPath: (-o)(ControlPath=/root/.ansible/cp/ansible-ssh-%h-%p-%r)
02:10:11 <10.213.103.5> SSH: EXEC sshpass -d14 ssh -C -vvv -o ControlMaster=auto -o ControlPersist=120s -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o User=root -o ConnectTimeout=30 -o ControlPath=/root/.ansible/cp/ansible-ssh-%h-%p-%r -tt 10.213.103.5 'chmod +rx /root/.ansible/tmp/ansible-tmp-1564096188.49-246685053741557/remove_ip.sh'
02:10:11 <10.213.103.5> ESTABLISH SSH CONNECTION FOR USER: root
02:10:11 <10.213.103.5> SSH: ansible.cfg set ssh_args: (-o)(ControlMaster=auto)(-o)(ControlPersist=120s)(-o)(UserKnownHostsFile=/dev/null)
02:10:11 <10.213.103.5> SSH: ANSIBLE_HOST_KEY_CHECKING/host_key_checking disabled: (-o)(StrictHostKeyChecking=no)
02:10:11 <10.213.103.5> SSH: ANSIBLE_REMOTE_USER/remote_user/ansible_user/user/-u set: (-o)(User=root)
02:10:11 <10.213.103.5> SSH: ANSIBLE_TIMEOUT/timeout set: (-o)(ConnectTimeout=30)
02:10:11 <10.213.103.5> SSH: PlayContext set ssh_common_args: ()
02:10:11 <10.213.103.5> SSH: PlayContext set ssh_extra_args: ()
02:10:11 <10.213.103.5> SSH: found only ControlPersist; added ControlPath: (-o)(ControlPath=/root/.ansible/cp/ansible-ssh-%h-%p-%r)
02:10:11 <10.213.103.5> SSH: EXEC sshpass -d14 ssh -C -vvv -o ControlMaster=auto -o ControlPersist=120s -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o User=root -o ConnectTimeout=30 -o ControlPath=/root/.ansible/cp/ansible-ssh-%h-%p-%r -tt 10.213.103.5 'LANG=C LC_ALL=C LC_MESSAGES=C /root/.ansible/tmp/ansible-tmp-1564096188.49-246685053741557/remove_ip.sh '
02:10:11 <10.213.103.5> ESTABLISH SSH CONNECTION FOR USER: root
02:10:11 <10.213.103.5> SSH: ansible.cfg set ssh_args: (-o)(ControlMaster=auto)(-o)(ControlPersist=120s)(-o)(UserKnownHostsFile=/dev/null)
02:10:11 <10.213.103.5> SSH: ANSIBLE_HOST_KEY_CHECKING/host_key_checking disabled: (-o)(StrictHostKeyChecking=no)
02:10:11 <10.213.103.5> SSH: ANSIBLE_REMOTE_USER/remote_user/ansible_user/user/-u set: (-o)(User=root)
02:10:11 <10.213.103.5> SSH: ANSIBLE_TIMEOUT/timeout set: (-o)(ConnectTimeout=30)
02:10:11 <10.213.103.5> SSH: PlayContext set ssh_common_args: ()
02:10:11 <10.213.103.5> SSH: PlayContext set ssh_extra_args: ()
02:10:11 <10.213.103.5> SSH: found only ControlPersist; added ControlPath: (-o)(ControlPath=/root/.ansible/cp/ansible-ssh-%h-%p-%r)
02:10:11 <10.213.103.5> SSH: EXEC sshpass -d14 ssh -C -vvv -o ControlMaster=auto -o ControlPersist=120s -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o User=root -o ConnectTimeout=30 -o ControlPath=/root/.ansible/cp/ansible-ssh-%h-%p-%r -tt 10.213.103.5 'rm -f -r /root/.ansible/tmp/ansible-tmp-1564096188.49-246685053741557/ > /dev/null 2>&1'
02:10:11 changed: [r-anaconda-10-t0 -> 10.213.103.5] => {"changed": true, "invocation": {"module_args": {"_raw_params": "roles/test/files/helpers/remove_ip.sh"}, "module_name": "script"}, "rc": 0, "stderr": "OpenSSH_7.2p2 Ubuntu-4ubuntu2.4, OpenSSL 1.0.2g  1 Mar 2016\r\ndebug1: Reading configuration data /root/.ssh/config\r\ndebug1: /root/.ssh/config line 1: Applying options for *\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug1: /etc/ssh/ssh_config line 19: Applying options for *\r\ndebug1: auto-mux: Trying existing master\r\ndebug2: fd 3 setting O_NONBLOCK\r\ndebug2: mux_client_hello_exchange: master version 4\r\ndebug3: mux_client_forwards: request forwardings: 0 local, 0 remote\r\ndebug3: mux_client_request_session: entering\r\ndebug3: mux_client_request_alive: entering\r\ndebug3: mux_client_request_alive: done pid = 43070\r\ndebug3: mux_client_request_session: session request sent\r\ndebug1: mux_client_request_session: master session id: 2\r\ndebug3: mux_client_read_packet: read header failed: Broken pipe\r\ndebug2: Received exit status from master 0\r\nShared connection to 10.213.103.5 closed.\r\n", "stdout": "Option \"-br\" is unknown, try \"ip -help\".\r\n", "stdout_lines": ["Option \"-br\" is unknown, try \"ip -help\"."]}
02:10:11 
02:10:11 TASK [test : Set unique MACs to PTF interfaces] ********************************
02:10:11 task path: /builds2/jenkins2/workspace/SONiC_Test-FDB/ansible/roles/test/tasks/fdb.yml:15
...

Error message is "Option "-br" is unknown, try "ip -help". But test continues because return code of the script is 0.

The fix is to add "set -o pipefail" in shell scripts. And put the commands embedded in for loop statement to a separate statement. Then the script can exit early and return none zero exit code if the "ip" command failed.

Type of change

  • Bug fix
  • [] Testbed and Framework(new/improvement)
  • [] Test case(new/improvement)

Approach

How did you do it?

The fix is to add "set -o pipefail" in shell scripts. And
put the commands embedded in for loop statement
to a separate statement.

How did you verify/test it?

Tested on Mellanox platform.

Any platform specific information?

No

Supported testbed topology if it's a new test case?

NA

Documentation

In shell script, by default exit code of the last command
in commands connected by pipe is returned. If previous
command returned none zero exit code, the error is
not captured. If ansible run such script in playbook, the
script issue is ignored and ansible is not aware of it.

The fix is to add "set -o pipefail" in shell scripts. And
put the commands embedded in for loop statement
to a separate statement.

Signed-off-by: Xin Wang <xinw@mellanox.com>
@wangxin wangxin changed the title Improve exit code return in shell scripts Fix the exit code return issue in shell scripts Sep 11, 2019
Copy link
Copy Markdown
Contributor

@keboliu keboliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as comments

@liat-grozovik liat-grozovik merged commit ec67257 into sonic-net:master Sep 16, 2019
@yxieca
Copy link
Copy Markdown
Collaborator

yxieca commented Sep 17, 2019

@wangxin can you create a separate PR for 201811 branch or advise dependencies to be picked? This PR doesn't cherry-pick cleanly.

@wangxin
Copy link
Copy Markdown
Collaborator Author

wangxin commented Sep 17, 2019

@yxieca I have created a separate PR for 201811 branch: #1117

@yxieca
Copy link
Copy Markdown
Collaborator

yxieca commented Sep 17, 2019 via email

@wangxin wangxin deleted the capture-shell-err branch March 28, 2020 03:09
kazinator-arista pushed a commit to kazinator-arista/sonic-mgmt that referenced this pull request Mar 4, 2026
…omatically (#24670)

#### Why I did it
src/sonic-swss-common
```
* 36f40a1 - (HEAD -> 202405, origin/202405) Automated agent pool migration for branch 202405 (sonic-net#1107) (21 hours ago) [yijingyan2]
```
#### How I did it
#### How to verify it
#### Description for the changelog
kazinator-arista pushed a commit to kazinator-arista/sonic-mgmt that referenced this pull request Mar 4, 2026
Revert "Revert " [201911]show interface counters for multi ASIC devices
(sonic-net#1104)""
 Revert "Revert "Pfcstat (sonic-net#1097)""
  [show] Fix 'show int neighbor expected' (sonic-net#1106)
   Update argument for docker exec it->i (sonic-net#1118)
     Update to make config load/reload backward compatible. (sonic-net#1115)
     Handling deletion of Port Channel before deletion of its members
     (sonic-net#1062)
    Skip default route present in ASIC-DB but not in APP-DB. (sonic-net#1107)
     [CLI][PFCWD][Multi-ASIC] Added multi ASIC support to 'pfcwd' CLI
     (sonic-net#1102)
       [201911]  Multi asic platform config interface portchannel, show
       transceiver  (sonic-net#1087)
       [drop counters] Fix configuration for counters with lowercase
       names (sonic-net#1103)

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants