The more common spelling is cloneable, but we can't do anything about the crate name.
We still want dataflows to be able to exit normally, so nodes should not restart after manual stop commands or when all the node's inputs were already closed.
Starting a dataflow involves creating timer tasks etc, so we only want to do it once.
|
One breaking change that could be nice is to restart nodes that fail before the dataflow starts, as this is quite common with power cycles or occasional networking issues. We could try to respawn a failing node up to 3 times before giving up. @oortlieb pointed this issue out. |
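To make the "try 3 times before giving up" idea concrete, here is a minimal sketch of a bounded retry loop. The function name `spawn_with_retries` and its signature are hypothetical and not part of dora; the closure stands in for whatever actually spawns the node.

```rust
/// Hypothetical helper (not dora API): calls `spawn` until it succeeds
/// or `max_attempts` attempts have been made.
/// Returns the first success, or the error of the last attempt.
fn spawn_with_retries<T, E>(
    max_attempts: u32,
    mut spawn: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    assert!(max_attempts > 0, "need at least one attempt");
    let mut attempt = 1;
    loop {
        match spawn(attempt) {
            // Success: stop retrying immediately.
            Ok(value) => return Ok(value),
            // Budget used up: give up and report the last error.
            Err(err) if attempt >= max_attempts => return Err(err),
            // Otherwise try again with the next attempt number.
            Err(_) => attempt += 1,
        }
    }
}
```

Calling it with `max_attempts = 3` would cover the case described above; a real implementation would probably also want a delay between attempts, which this sketch leaves out.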
|
One idea that popped up in our meeting was to reuse the
Reusing this event to also notify nodes about restarts that already happened is not a good idea imo. |
So this sounds like errors that happen after the node is spawned, but before initializing the Dora node API? If so, they should also be restarted as part of this PR if they have a restart policy set. Or were you talking about failures to spawn, e.g. because the executable doesn't exist? I can also implement retries in that case, but I'm not sure if that is really something that is fixable by a retry. |
Yes exactly
No, I mean errors where the executable starts but something fails due to a power cycle or a USB error (fairly common with multiple USB devices), so the dataflow has not started yet. |
|
Failures after start are all treated the same by this PR, no matter if the dataflow was started already or not. Could you try whether it fixes your issues? |
|
I see!
Just tried it and I think it's awesome! I don't see the difference between on-failure and always; I guess it has something to do with the node failing before versus stopping automatically, right? On the naming, here is the Docker naming: https://github.com/compose-spec/compose-spec/blob/main/deploy.md#restart_policy Could be neat to copy it. :) |
|
Actually, after reading more documentation, I think the current policy more closely follows Kubernetes and systemctl, which makes sense, so we can keep it as is! |
|
It could be nice to have better error logging: in my test, the error log does not appear before the node restarts, but that's probably WIP. |
|
Very excited to merge this PR, as we will be able to resume working on "reloading" and "hot-reloading", but this time for custom nodes and graphs.
The difference is the exit code. |
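As a rough illustration of how the exit code separates the two policies, here is a minimal sketch of the restart decision. It also applies the rule stated earlier that nodes are not restarted after a manual stop or once all their inputs are closed. All type and function names are hypothetical, and the `Never` variant is an assumption; this is not the PR's actual code.

```rust
/// Hypothetical restart policy; `Never` is an assumed third option.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RestartPolicy {
    Never,
    OnFailure,
    Always,
}

/// Hypothetical summary of how a node stopped.
#[derive(Debug, Clone, Copy)]
struct NodeExit {
    /// True if the process exited with code 0.
    success: bool,
    /// True if the node was stopped by a manual stop command.
    stopped_manually: bool,
    /// True if all of the node's inputs were already closed.
    all_inputs_closed: bool,
}

fn should_restart(policy: RestartPolicy, exit: NodeExit) -> bool {
    // Let the dataflow exit normally: never restart in these two cases.
    if exit.stopped_manually || exit.all_inputs_closed {
        return false;
    }
    match policy {
        RestartPolicy::Never => false,
        // Only a non-zero exit code triggers a restart.
        RestartPolicy::OnFailure => !exit.success,
        // Restart regardless of the exit code.
        RestartPolicy::Always => true,
    }
}
```

Under this sketch, a node that exits with code 0 is restarted with `Always` but not with `OnFailure`, while a node that exits with a non-zero code is restarted by both.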
Could you give some more details on that? The error messages and the node failure should be reported as usual, so it should be visible in the logs. |
|
In the following example:
~/D/w/d/e/python-log ❯❯❯ dora run dataflow.yaml --uv ✘ 1 restart-failed-nodes ✭ ✱ ◼
2025-12-30T14:14:03.252219Z INFO dora_core::descriptor::validate: skipping path check for node with build command
2025-12-30T14:14:03.252245Z INFO dora_core::descriptor::validate: skipping path check for node with build command
2025-12-30T14:14:03.252408Z INFO zenoh::net::runtime: Using ZID: c53c0f992598e35924d9088a4eb38716
2025-12-30T14:14:03.253212Z INFO zenoh::net::runtime::orchestrator: Zenoh can be reached at: tcp/[2a01:cb08:67:900:1806:329:6bcf:5ea]:63504
2025-12-30T14:14:03.253222Z INFO zenoh::net::runtime::orchestrator: Zenoh can be reached at: tcp/[2a01:cb08:67:900:c47:12a2:8dd0:8d00]:63504
2025-12-30T14:14:03.253224Z INFO zenoh::net::runtime::orchestrator: Zenoh can be reached at: tcp/[fe80::1]:63504
2025-12-30T14:14:03.253225Z INFO zenoh::net::runtime::orchestrator: Zenoh can be reached at: tcp/[fe80::1c95:bf60:48f8:e4ca]:63504
2025-12-30T14:14:03.253227Z INFO zenoh::net::runtime::orchestrator: Zenoh can be reached at: tcp/[fe80::270f:606e:f50e:5599]:63504
2025-12-30T14:14:03.253228Z INFO zenoh::net::runtime::orchestrator: Zenoh can be reached at: tcp/[fe80::fe6e:ef27:c7ff:573c]:63504
2025-12-30T14:14:03.253229Z INFO zenoh::net::runtime::orchestrator: Zenoh can be reached at: tcp/[fe80::ce81:b1c:bd2c:69e]:63504
2025-12-30T14:14:03.253261Z INFO zenoh::net::runtime::orchestrator: Zenoh can be reached at: tcp/[fe80::102b:5c43:ecda:af33]:63504
2025-12-30T14:14:03.253271Z INFO zenoh::net::runtime::orchestrator: Zenoh can be reached at: tcp/[fe80::3412:6aff:fef8:619b]:63504
2025-12-30T14:14:03.253272Z INFO zenoh::net::runtime::orchestrator: Zenoh can be reached at: tcp/[fe80::3412:6aff:fef8:619b]:63504
2025-12-30T14:14:03.253274Z INFO zenoh::net::runtime::orchestrator: Zenoh can be reached at: tcp/[fe80::e293:2bbc:9833:63f2]:63504
2025-12-30T14:14:03.253276Z INFO zenoh::net::runtime::orchestrator: Zenoh can be reached at: tcp/[fe80::5208:6a08:1aa0:e35d]:63504
2025-12-30T14:14:03.253277Z INFO zenoh::net::runtime::orchestrator: Zenoh can be reached at: tcp/192.168.1.28:63504
2025-12-30T14:14:03.253354Z INFO zenoh::net::runtime::orchestrator: zenohd listening scout messages on 224.0.0.224:7446
15:14:03 DEBUG receive_data_with_sleep: daemon::spawner spawning node
15:14:03 DEBUG send_data: daemon::spawner spawning node
15:14:03 INFO receive_data_with_sleep: spawner spawning: uv run python -u /Users/xaviertao/Documents/work/dora/examples/python-log/receive_data.py
15:14:03 INFO send_data: spawner spawning: uv run python -u /Users/xaviertao/Documents/work/dora/examples/python-log/send_data.py
15:14:03 INFO dora daemon finished building nodes, spawning...
15:14:03 INFO receive_data_with_sleep: spawner spawning `uv` in `/Users/xaviertao/Documents/work/dora/examples/python-log`
15:14:03 DEBUG receive_data_with_sleep: spawner spawned node with pid 61459
15:14:03 INFO send_data: spawner spawning `uv` in `/Users/xaviertao/Documents/work/dora/examples/python-log`
15:14:03 DEBUG send_data: spawner spawned node with pid 61462
15:14:03 INFO opentelemetry Global meter provider is set. Meters can now be created using global::meter() or global::meter_with_scope().
15:14:03 INFO receive_data_with_sleep: daemon node is ready
15:14:03 INFO opentelemetry Global meter provider is set. Meters can now be created using global::meter() or global::meter_with_scope().
15:14:03 INFO send_data: daemon node is ready
15:14:03 INFO daemon all nodes are ready, starting dataflow
15:14:03 stdout receive_data_with_sleep: Traceback (most recent call last):
15:14:03 DEBUG receive_data_with_sleep: daemon skipping CloseOutputs because node might restart
15:14:03 DEBUG receive_data_with_sleep: daemon keeping outputs open because node might restart
15:14:03 stdout receive_data_with_sleep: File "/Users/xaviertao/Documents/work/dora/examples/python-log/receive_data.py", line 23, in <module>
15:14:03 stdout receive_data_with_sleep: main()
15:14:03 stdout receive_data_with_sleep: File "/Users/xaviertao/Documents/work/dora/examples/python-log/receive_data.py", line 10, in main
15:14:03 stdout receive_data_with_sleep: assert False, "This is an assertion error"
15:14:03 stdout receive_data_with_sleep: ^^^^^
15:14:03 stdout receive_data_with_sleep: AssertionError: This is an assertion error
15:14:03 stdout receive_data_with_sleep:
15:14:03 stdout receive_data_with_sleep:
15:14:03 WARN receive_data_with_sleep: daemon restarting node after failure
15:14:03 INFO receive_data_with_sleep: spawner spawning `uv` in `/Users/xaviertao/Documents/work/dora/examples/python-log`
15:14:03 DEBUG receive_data_with_sleep: spawner spawned node with pid 61467
15:14:03 INFO opentelemetry Global meter provider is set. Meters can now be created using global::meter() or global::meter_with_scope().
15:14:03 INFO receive_data_with_sleep: daemon node is ready
15:14:03 stdout receive_data_with_sleep: Traceback (most recent call last):
15:14:03 DEBUG receive_data_with_sleep: daemon skipping CloseOutputs because node might restart
15:14:03 DEBUG receive_data_with_sleep: daemon keeping outputs open because node might restart
15:14:03 stdout receive_data_with_sleep: File "/Users/xaviertao/Documents/work/dora/examples/python-log/receive_data.py", line 23, in <module>
15:14:03 stdout receive_data_with_sleep: main()
15:14:03 stdout receive_data_with_sleep: File "/Users/xaviertao/Documents/work/dora/examples/python-log/receive_data.py", line 10, in main
15:14:03 stdout receive_data_with_sleep: assert False, "This is an assertion error"
15:14:03 stdout receive_data_with_sleep: ^^^^^
15:14:03 stdout receive_data_with_sleep: AssertionError: This is an assertion error
15:14:03 stdout receive_data_with_sleep:
15:14:03 stdout receive_data_with_sleep:
15:14:03 WARN receive_data_with_sleep: daemon restarting node after failure
15:14:03 INFO receive_data_with_sleep: spawner spawning `uv` in `/Users/xaviertao/Documents/work/dora/examples/python-log`
15:14:03 DEBUG receive_data_with_sleep: spawner spawned node with pid 61471
15:14:04 INFO opentelemetry Global meter provider is set. Meters can now be created using global::meter() or global::meter_with_scope().
15:14:04 INFO receive_data_with_sleep: daemon node is ready
15:14:04 stdout receive_data_with_sleep: Traceback (most recent call last):
15:14:04 DEBUG receive_data_with_sleep: daemon skipping CloseOutputs because node might restart
15:14:04 DEBUG receive_data_with_sleep: daemon keeping outputs open because node might restart
15:14:04 stdout receive_data_with_sleep: File "/Users/xaviertao/Documents/work/dora/examples/python-log/receive_data.py", line 23, in <module>
15:14:04 stdout receive_data_with_sleep: main()
15:14:04 stdout receive_data_with_sleep: File "/Users/xaviertao/Documents/work/dora/examples/python-log/receive_data.py", line 10, in main
15:14:04 stdout receive_data_with_sleep: assert False, "This is an assertion error"
15:14:04 stdout receive_data_with_sleep: ^^^^^
15:14:04 stdout receive_data_with_sleep: AssertionError: This is an assertion error
15:14:04 stdout receive_data_with_sleep:
15:14:04 stdout receive_data_with_sleep:
15:14:04 WARN receive_data_with_sleep: daemon restarting node after failure
15:14:04 INFO receive_data_with_sleep: spawner spawning `uv` in `/Users/xaviertao/Documents/work/dora/examples/python-log`
15:14:04 DEBUG receive_data_with_sleep: spawner spawned node with pid 61475
15:14:04 INFO opentelemetry Global meter provider is set. Meters can now be created using global::meter() or global::meter_with_scope().
15:14:04 INFO receive_data_with_sleep: daemon node is ready
15:14:04 stdout receive_data_with_sleep: Traceback (most recent call last):
15:14:04 DEBUG receive_data_with_sleep: daemon skipping CloseOutputs because node might restart
15:14:04 DEBUG receive_data_with_sleep: daemon keeping outputs open because node might restart
15:14:04 stdout receive_data_with_sleep: File "/Users/xaviertao/Documents/work/dora/examples/python-log/receive_data.py", line 23, in <module>
15:14:04 stdout receive_data_with_sleep: main()
15:14:04 stdout receive_data_with_sleep: File "/Users/xaviertao/Documents/work/dora/examples/python-log/receive_data.py", line 10, in main
15:14:04 stdout receive_data_with_sleep: assert False, "This is an assertion error"
15:14:04 stdout receive_data_with_sleep: ^^^^^
15:14:04 stdout receive_data_with_sleep: AssertionError: This is an assertion error
15:14:04 stdout receive_data_with_sleep:
15:14:04 stdout receive_data_with_sleep:
15:14:04 WARN receive_data_with_sleep: daemon restarting node after failure
15:14:04 INFO receive_data_with_sleep: spawner spawning `uv` in `/Users/xaviertao/Documents/work/dora/examples/python-log`
15:14:04 DEBUG receive_data_with_sleep: spawner spawned node with pid 61479
15:14:04 INFO opentelemetry Global meter provider is set. Meters can now be created using global::meter() or global::meter_with_scope().
15:14:04 INFO receive_data_with_sleep: daemon node is ready
15:14:04 stdout receive_data_with_sleep: Traceback (most recent call last):
15:14:04 DEBUG receive_data_with_sleep: daemon skipping CloseOutputs because node might restart
15:14:04 DEBUG receive_data_with_sleep: daemon keeping outputs open because node might restart
15:14:04 stdout receive_data_with_sleep: File "/Users/xaviertao/Documents/work/dora/examples/python-log/receive_data.py", line 23, in <module>
15:14:04 stdout receive_data_with_sleep: main()
15:14:04 stdout receive_data_with_sleep: File "/Users/xaviertao/Documents/work/dora/examples/python-log/receive_data.py", line 10, in main
15:14:04 stdout receive_data_with_sleep: assert False, "This is an assertion error"
15:14:04 stdout receive_data_with_sleep: ^^^^^
15:14:04 stdout receive_data_with_sleep: AssertionError: This is an assertion error
15:14:04 stdout receive_data_with_sleep:
15:14:04 stdout receive_data_with_sleep:
15:14:04 WARN receive_data_with_sleep: daemon restarting node after failure
15:14:04 INFO receive_data_with_sleep: spawner spawning `uv` in `/Users/xaviertao/Documents/work/dora/examples/python-log`
15:14:04 DEBUG receive_data_with_sleep: spawner spawned node with pid 61485
15:14:04 INFO opentelemetry Global meter provider is set. Meters can now be created using global::meter() or global::meter_with_scope().
15:14:04 INFO receive_data_with_sleep: daemon node is ready
15:14:04 stdout receive_data_with_sleep: Traceback (most recent call last):
15:14:04 DEBUG receive_data_with_sleep: daemon skipping CloseOutputs because node might restart
15:14:04 DEBUG receive_data_with_sleep: daemon keeping outputs open because node might restart
15:14:04 stdout receive_data_with_sleep: File "/Users/xaviertao/Documents/work/dora/examples/python-log/receive_data.py", line 23, in <module>
15:14:04 stdout receive_data_with_sleep: main()
15:14:04 stdout receive_data_with_sleep: File "/Users/xaviertao/Documents/work/dora/examples/python-log/receive_data.py", line 10, in main
15:14:04 stdout receive_data_with_sleep: assert False, "This is an assertion error"
15:14:04 stdout receive_data_with_sleep: ^^^^^
15:14:04 stdout receive_data_with_sleep: AssertionError: This is an assertion error
15:14:04 stdout receive_data_with_sleep:
15:14:04 stdout receive_data_with_sleep:
15:14:04 WARN receive_data_with_sleep: daemon restarting node after failure
15:14:04 INFO receive_data_with_sleep: spawner spawning `uv` in `/Users/xaviertao/Documents/work/dora/examples/python-log`
15:14:04 DEBUG receive_data_with_sleep: spawner spawned node with pid 61489
15:14:04 INFO opentelemetry Global meter provider is set. Meters can now be created using global::meter() or global::meter_with_scope().
15:14:04 INFO receive_data_with_sleep: daemon node is ready
15:14:04 stdout receive_data_with_sleep: Traceback (most recent call last):
15:14:04 DEBUG receive_data_with_sleep: daemon skipping CloseOutputs because node might restart
15:14:04 DEBUG receive_data_with_sleep: daemon keeping outputs open because node might restart
15:14:04 stdout receive_data_with_sleep: File "/Users/xaviertao/Documents/work/dora/examples/python-log/receive_data.py", line 23, in <module>
15:14:04 stdout receive_data_with_sleep: main()
15:14:04 stdout receive_data_with_sleep: File "/Users/xaviertao/Documents/work/dora/examples/python-log/receive_data.py", line 10, in main
15:14:04 stdout receive_data_with_sleep: assert False, "This is an assertion error"
15:14:04 stdout receive_data_with_sleep: ^^^^^
15:14:04 stdout receive_data_with_sleep: AssertionError: This is an assertion error
15:14:04 stdout receive_data_with_sleep:
15:14:04 stdout receive_data_with_sleep:
15:14:04 WARN receive_data_with_sleep: daemon restarting node after failure
15:14:04 INFO receive_data_with_sleep: spawner spawning `uv` in `/Users/xaviertao/Documents/work/dora/examples/python-log`
15:14:04 DEBUG receive_data_with_sleep: spawner spawned node with pid 61493
15:14:04 INFO opentelemetry Global meter provider is set. Meters can now be created using global::meter() or global::meter_with_scope().
15:14:04 INFO receive_data_with_sleep: daemon node is ready
15:14:04 stdout receive_data_with_sleep: Traceback (most recent call last):
15:14:04 DEBUG receive_data_with_sleep: daemon skipping CloseOutputs because node might restart
15:14:04 DEBUG receive_data_with_sleep: daemon keeping outputs open because node might restart
15:14:04 stdout receive_data_with_sleep: File "/Users/xaviertao/Documents/work/dora/examples/python-log/receive_data.py", line 23, in <module>
15:14:04 stdout receive_data_with_sleep: main()
15:14:04 stdout receive_data_with_sleep: File "/Users/xaviertao/Documents/work/dora/examples/python-log/receive_data.py", line 10, in main
15:14:04 stdout receive_data_with_sleep: assert False, "This is an assertion error"
15:14:04 stdout receive_data_with_sleep: ^^^^^
15:14:04 stdout receive_data_with_sleep: AssertionError: This is an assertion error
15:14:04 stdout receive_data_with_sleep:
15:14:04 stdout receive_data_with_sleep:
15:14:04 WARN receive_data_with_sleep: daemon restarting node after failure
15:14:04 INFO receive_data_with_sleep: spawner spawning `uv` in `/Users/xaviertao/Documents/work/dora/examples/python-log`
15:14:04 DEBUG receive_data_with_sleep: spawner spawned node with pid 61497
15:14:04 INFO opentelemetry Global meter provider is set. Meters can now be created using global::meter() or global::meter_with_scope().
15:14:04 INFO receive_data_with_sleep: daemon node is ready
15:14:04 stdout receive_data_with_sleep: Traceback (most recent call last):
15:14:04 DEBUG receive_data_with_sleep: daemon skipping CloseOutputs because node might restart
15:14:04 DEBUG receive_data_with_sleep: daemon keeping outputs open because node might restart
15:14:04 stdout receive_data_with_sleep: File "/Users/xaviertao/Documents/work/dora/examples/python-log/receive_data.py", line 23, in <module>
15:14:04 stdout receive_data_with_sleep: main()
15:14:04 stdout receive_data_with_sleep: File "/Users/xaviertao/Documents/work/dora/examples/python-log/receive_data.py", line 10, in main
15:14:04 stdout receive_data_with_sleep: assert False, "This is an assertion error"
15:14:04 stdout receive_data_with_sleep: ^^^^^
15:14:04 stdout receive_data_with_sleep: AssertionError: This is an assertion error
15:14:04 stdout receive_data_with_sleep:
15:14:04 stdout receive_data_with_sleep:
15:14:04 WARN receive_data_with_sleep: daemon restarting node after failure
15:14:04 INFO receive_data_with_sleep: spawner spawning `uv` in `/Users/xaviertao/Documents/work/dora/examples/python-log`
15:14:04 DEBUG receive_data_with_sleep: spawner spawned node with pid 61501
15:14:04 INFO opentelemetry Global meter provider is set. Meters can now be created using global::meter() or global::meter_with_scope().
15:14:04 INFO receive_data_with_sleep: daemon node is ready
15:14:04 stdout receive_data_with_sleep: Traceback (most recent call last):
15:14:04 DEBUG receive_data_with_sleep: daemon skipping CloseOutputs because node might restart
15:14:04 DEBUG receive_data_with_sleep: daemon keeping outputs open because node might restart
15:14:04 stdout receive_data_with_sleep: File "/Users/xaviertao/Documents/work/dora/examples/python-log/receive_data.py", line 23, in <module>
15:14:04 stdout receive_data_with_sleep: main()
15:14:04 stdout receive_data_with_sleep: File "/Users/xaviertao/Documents/work/dora/examples/python-log/receive_data.py", line 10, in main
15:14:04 stdout receive_data_with_sleep: assert False, "This is an assertion error"
15:14:04 stdout receive_data_with_sleep: ^^^^^
15:14:04 stdout receive_data_with_sleep: AssertionError: This is an assertion error
15:14:04 stdout receive_data_with_sleep:
15:14:04 stdout receive_data_with_sleep:
15:14:04 WARN receive_data_with_sleep: daemon restarting node after failure
15:14:04 INFO receive_data_with_sleep: spawner spawning `uv` in `/Users/xaviertao/Documents/work/dora/examples/python-log`
15:14:04 DEBUG receive_data_with_sleep: spawner spawned node with pid 61505
15:14:04 INFO opentelemetry Global meter provider is set. Meters can now be created using global::meter() or global::meter_with_scope().
15:14:04 INFO receive_data_with_sleep: daemon node is ready
15:14:04 stdout receive_data_with_sleep: Traceback (most recent call last):
15:14:04 DEBUG receive_data_with_sleep: daemon skipping CloseOutputs because node might restart
15:14:04 DEBUG receive_data_with_sleep: daemon keeping outputs open because node might restart
15:14:04 stdout receive_data_with_sleep: File "/Users/xaviertao/Documents/work/dora/examples/python-log/receive_data.py", line 23, in <module>
15:14:04 stdout receive_data_with_sleep: main()
15:14:04 stdout receive_data_with_sleep: File "/Users/xaviertao/Documents/work/dora/examples/python-log/receive_data.py", line 10, in main
15:14:04 stdout receive_data_with_sleep: assert False, "This is an assertion error"
15:14:04 stdout receive_data_with_sleep: ^^^^^
15:14:04 stdout receive_data_with_sleep: AssertionError: This is an assertion error
15:14:04 stdout receive_data_with_sleep:
15:14:04 stdout receive_data_with_sleep:
15:14:04 WARN receive_data_with_sleep: daemon restarting node after failure
15:14:04 INFO receive_data_with_sleep: spawner spawning `uv` in `/Users/xaviertao/Documents/work/dora/examples/python-log`
15:14:04 DEBUG receive_data_with_sleep: spawner spawned node with pid 61509
15:14:04 INFO opentelemetry Global meter provider is set. Meters can now be created using global::meter() or global::meter_with_scope().
15:14:04 INFO receive_data_with_sleep: daemon node is ready
15:14:04 stdout send_data:
15:14:04 stdout send_data:
15:14:04 DEBUG send_data: daemon handling node stop with exit status Success
15:14:04 INFO send_data: daemon send_data finished successfully
15:14:04 stdout receive_data_with_sleep: Traceback (most recent call last):
15:14:04 stdout receive_data_with_sleep: File "/Users/xaviertao/Documents/work/dora/examples/python-log/receive_data.py", line 23, in <module>
15:14:04 stdout receive_data_with_sleep: main()
15:14:04 stdout receive_data_with_sleep: File "/Users/xaviertao/Documents/work/dora/examples/python-log/receive_data.py", line 10, in main
15:14:04 stdout receive_data_with_sleep: assert False, "This is an assertion error"
15:14:04 stdout receive_data_with_sleep: ^^^^^
15:14:04 stdout receive_data_with_sleep: AssertionError: This is an assertion error
15:14:04 stdout receive_data_with_sleep:
15:14:04 stdout receive_data_with_sleep:
15:14:04 INFO receive_data_with_sleep: daemon not restarting node because all inputs are already closed
15:14:04 DEBUG receive_data_with_sleep: daemon handling node stop with exit status ExitCode(1)
15:14:04 ERROR receive_data_with_sleep: daemon exited with code 1 with stderr output:
---------------------------------------------------------------------------------
[...]AssertionError: This is an assertion error
Traceback (most recent call last):
File "/Users/xaviertao/Documents/work/dora/examples/python-log/receive_data.py", line 23, in <module>
main()
File "/Users/xaviertao/Documents/work/dora/examples/python-log/receive_data.py", line 10, in main
assert False, "This is an assertion error"
^^^^^
AssertionError: This is an assertion error
---------------------------------------------------------------------------------
15:14:04 INFO daemon dataflow finished on machine `01d955ea-4e1b-4391-830a-03d08f5f1081`
2025-12-30T14:14:04.902506Z INFO run_inner: dora_daemon: exiting daemon because all required dataflows are finished self.daemon_id=DaemonId { machine_id: None, uuid: 01d955ea-4e1b-4391-830a-03d08f5f1081 }
2025-12-30T14:14:04.902538Z INFO run_inner: zenoh::api::session: close session zid=c53c0f992598e35924d9088a4eb38716 self.daemon_id=DaemonId { machine_id: None, uuid: 01d955ea-4e1b-4391-830a-03d08f5f1081 }
[ERROR]
Dataflow failed:
Node `receive_data_with_sleep` failed: exited with code 1 with stderr output:
---------------------------------------------------------------------------------
[...]AssertionError: This is an assertion error
Traceback (most recent call last):
File "/Users/xaviertao/Documents/work/dora/examples/python-log/receive_data.py", line 23, in <module>
main()
File "/Users/xaviertao/Documents/work/dora/examples/python-log/receive_data.py", line 10, in main
assert False, "This is an assertion error"
^^^^^
AssertionError: This is an assertion error
---------------------------------------------------------------------------------
Location:
binaries/cli/src/common.rs:33:17
I think the error message only appeared once, whereas I would have expected the daemon to log it each time the node failed. |
|
I can double check why |
We still want to see the errors in the logs
|
Thanks for clarifying! I pushed 724dc7d to log the node output as before, i.e. print the node error to the logs even if it's going to be restarted. |
|
That's great thanks! |
|
I think as a follow-up PR we could try using regexes to detect Python tracebacks, Rust panics, or Rust eyre errors and format them in a way that is easy to debug. We could then also avoid logging stderr twice:
17:49:49 stdout send_data: Traceback (most recent call last):
17:49:49 DEBUG send_data: daemon skipping CloseOutputs because node might restart
17:49:49 DEBUG send_data: daemon keeping outputs open because node might restart
17:49:49 WARN receive_data_with_sleep: dora THIS IS A WARNING
17:49:49 stdout send_data: File "/Users/xaviertao/Documents/work/dora/examples/python-log/send_data.py", line 23, in <module>
17:49:49 stdout send_data: assert False
17:49:49 stdout send_data: ^^^^^
17:49:49 stdout send_data: AssertionError
17:49:49 stdout send_data:
17:49:49 stdout send_data:
17:49:49 WARN send_data: daemon restarting node after failure
17:49:49 DEBUG send_data: daemon handling node stop with exit status ExitCode(1) (restart: true)
17:49:49 INFO send_data: spawner spawning `uv` in `/Users/xaviertao/Documents/work/dora/examples/python-log`
17:49:49 ERROR send_data: daemon exited with code 1 with stderr output:
---------------------------------------------------------------------------------
Sent data: 30304092390791
Traceback (most recent call last):
File "/Users/xaviertao/Documents/work/dora/examples/python-log/send_data.py", line 23, in <module>
assert False
^^^^^
AssertionError
--------------------------------------------------------------------------------- |
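A minimal sketch of what such detection could look like. To stay dependency-free it uses plain prefix and substring checks instead of regexes, and it only covers Python tracebacks and Rust panics; eyre reports are left out because their exact format is not shown here. The names `ErrorKind` and `classify_stderr` are hypothetical.

```rust
/// Hypothetical classification of a node's stderr output.
#[derive(Debug, PartialEq, Eq)]
enum ErrorKind {
    PythonTraceback,
    RustPanic,
    Unknown,
}

/// Scans stderr line by line and reports the first known error marker.
fn classify_stderr(stderr: &str) -> ErrorKind {
    for line in stderr.lines() {
        let line = line.trim_start();
        // Header line that Python prints before a stack trace.
        if line.starts_with("Traceback (most recent call last):") {
            return ErrorKind::PythonTraceback;
        }
        // Rust's default panic hook prints "thread '<name>' ... panicked at ...".
        if line.starts_with("thread '") && line.contains("panicked at") {
            return ErrorKind::RustPanic;
        }
    }
    ErrorKind::Unknown
}
```

Once the kind is known, the daemon could choose a dedicated formatter per kind and skip the raw stderr dump for output it has already shown.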
PreparedNode clonable and prepare for node restarting
Proposed in https://github.com/orgs/dora-rs/discussions/1181