RPC execution stuck when NETCONF server closes session unexpectedly #914

@nitishv

Expected Behavior

RPC invocation must exit with an error if the underlying NETCONF session is closed by the server.

Current Behavior

A NETCONF session is established using NetconfServiceProvider, and an RPC is then invoked. If the NETCONF server is killed or crashes, the RPC call hangs indefinitely.
The Python script never exits and does not emit any error message. The last log message seen is "============= Sending RPC to device =============".
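
Until the hang is fixed in the library, a caller-side watchdog is one possible stopgap. The sketch below is not part of YDK: `call_with_timeout` and `RpcTimeout` are hypothetical names. It assumes the blocked call releases the Python GIL while waiting on the socket, which has not been verified for ydk 0.7.2; if the GIL is held, the timeout will not fire and a separate process would be needed instead.

```python
import threading


class RpcTimeout(Exception):
    """Raised when the wrapped call does not return in time (hypothetical name)."""


def call_with_timeout(func, timeout_seconds, *args, **kwargs):
    """Run func(*args, **kwargs) in a worker thread and give up after timeout_seconds.

    The worker thread is NOT cancelled on timeout; it stays blocked in the
    background, so the caller should treat the session as unusable afterwards.
    """
    outcome = {}

    def worker():
        try:
            outcome['value'] = func(*args, **kwargs)
        except Exception as exc:
            outcome['error'] = exc

    thread = threading.Thread(target=worker)
    # Daemon thread: a call that never returns must not keep the interpreter alive.
    thread.daemon = True
    thread.start()
    thread.join(timeout_seconds)

    if thread.is_alive():
        raise RpcTimeout("call did not return within %s seconds" % timeout_seconds)
    if 'error' in outcome:
        raise outcome['error']
    return outcome['value']
```

In the attached script, `result = call_with_timeout(rpc, 30, provider.get_session())` would replace the direct call to `rpc`; the 30-second limit is an arbitrary example value.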

Steps to Reproduce

In the test environment, the NCA app container connects to NSO to modify the configuration of a device managed by that NSO instance.
The NSO server is running on 192.168.123.11, port 2022.

Steps to get into the test setup:

  1. ssh cw-admin@172.20.80.59 (contact nvashish@cisco.com directly for the password)
  2. Enter the NCA container:
    kubectl exec -it $(kubectl get pods|grep robot-nca|grep Running|cut -d' ' -f1) bash
  3. Run the attached script.
  4. Kill the NSO NETCONF server while the script is sleeping.

Your Script

test_session_close.py.txt

#!/usr/bin/env python

from ydk.providers import NetconfServiceProvider
from ydk.path import Repository
from ydk.path import Codec
from ydk.types import EncodingFormat
import logging
from time import sleep

log = logging.getLogger('ydk')
log.setLevel(logging.DEBUG)
handler = logging.FileHandler('test_session_close.log')
formatter = logging.Formatter(("%(asctime)s - %(name)s - %(levelname)s - %(message)s"))
handler.setFormatter(formatter)
log.addHandler(handler)

router_ip = '192.168.123.11'
DEVICE_CONFIG_TMPL = '''<devices xmlns="http://tail-f.com/ns/ncs"><device><name>%s</name><config>%s</config></device></devices>'''
config_xml = '''<interface xmlns="http://tail-f.com/ned/cisco-ios-xr"><GigabitEthernet><id>0/0/0/1</id><description>test</description></GigabitEthernet></interface>'''
provider = NetconfServiceProvider(repo=Repository('/opt/robot/data/cache'), address='192.168.123.111', username='nso', password='cisco123', port=2022)

log.info("RUNNING start-transaction")
rpc = provider.get_session().get_root_schema().create_rpc("tailf-netconf-transactions:start-transaction")
rpc.get_input_node().create_datanode("target/running")
data = (rpc)(provider.get_session())

log.info("RUNNING edit-config")
rpc = provider.get_session().get_root_schema().create_rpc("ietf-netconf:edit-config")
rpc.get_input_node().create_datanode("target/running")
rpc.get_input_node().create_datanode("config", DEVICE_CONFIG_TMPL%(router_ip,config_xml))
data = (rpc)(provider.get_session())

# Kill the NSO NETCONF server process while this script is sleeping.
log.info("************************* SLEEP *************************")
sleep(5)


log.info("RUNNING prepare-transaction")
rpc = provider.get_session().get_root_schema().create_rpc("tailf-netconf-transactions:prepare-transaction")

# This RPC hangs once the server has been killed (the reported bug).
result = (rpc)(provider.get_session())


log.info("RUNNING commit-transaction")
rpc = provider.get_session().get_root_schema().create_rpc("tailf-netconf-transactions:commit-transaction")
result = (rpc)(provider.get_session())
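
When reproducing the problem, it can help to confirm that the NETCONF server really went down during the sleep, before prepare-transaction is sent. The helper below is a sketch using only the standard `socket` module; `is_port_open` is a hypothetical name, not a YDK function. It only shows whether a new TCP connection to the port can be opened, and says nothing about the state of the session the provider already holds.

```python
import socket


def is_port_open(host, port, timeout_seconds=3.0):
    """Return True if a new TCP connection to host:port can be opened."""
    try:
        conn = socket.create_connection((host, port), timeout_seconds)
    except (socket.error, socket.timeout):
        # Refused, unreachable, or timed out: treat the server as down.
        return False
    conn.close()
    return True
```

For example, logging `is_port_open('192.168.123.11', 2022)` right after `sleep(5)` would record whether the server was still accepting connections at that point; the address and port are the ones given in the steps above.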

Logs

test_session_close.log

System Information

Linux robot-nca-74f49fcd9-487rz 4.15.0-47-generic #50-Ubuntu SMP Wed Mar 13 10:44:52 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Python 2.7.15rc1

ydk==0.7.2.post1
ydk-models-cisco-ios-xr==6.3.2
ydk-models-ietf==0.1.5
ydk-models-ned-ios==5.9.2
ydk-models-ned-ios-xr==6.6.1
ydk-models-openconfig==0.1.2
ydk-models-tailf==6.4.1

Reported by

Nitish Vashishtha (nvashish@cisco.com)
SPNAC / Crosswork / NCA
