Skip to content

Expose AllInputClosed message as a Stop message#1026

Merged
haixuanTao merged 5 commits intomainfrom
expose-allinputclosed
Jun 23, 2025
Merged

Expose AllInputClosed message as a Stop message#1026
haixuanTao merged 5 commits intomainfrom
expose-allinputclosed

Conversation

@haixuanTao
Copy link
Copy Markdown
Collaborator

@haixuanTao haixuanTao commented Jun 17, 2025

This PR makes it possible for nodes to stop when all input is closed. This is necessary because when merging two different event stream and we want to stop when all event from dora finishes we can rely on the stop messages to stop.

It also removes the complexity of the event stream code.

@haixuanTao haixuanTao force-pushed the expose-allinputclosed branch from 72e760d to b72680b Compare June 17, 2025 16:14
Copy link
Copy Markdown
Collaborator

@phil-opp phil-opp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to use a new event type for this rather than reusing the Stop event.

tracing::error!("{err:?}");
Event::Error(err.wrap_err("internal error").to_string())
}
NodeEvent::AllInputsClosed => Event::Stop,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making AllInputsClosed the same as Stop means that nodes can no longer find out what triggered the stop. So they can no longer handle 'ctrl-c' stops in a different way (and e.g. do some additional cleanup).

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the issue is that it means to have additional code for all the nodes where most of the time it's probably going to do the same thing no?

I think also that it might be confusing that the stop event does not cover all stop event but only manual stop event.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nodes that don't care about the stop reason can just wait until the event stream is closed. The stop event currently signals a manual stop (ctrl-c or dora stop), so nodes that want to do something special in this case can react to it.

With merged event streams, the waiting for stream closure does no longer work, so we need some new way to handle this. But we shouldn't change the behavior for normal dora event streams.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about:

  • we rename the Stop event to ManualStop (we keep a deprecated alias to avoid breakage)
  • we forward the AllInputsClosed event
  • we add an additional Done event that is sent just before the event stream is closed (this might depend on the node config)

This way, nodes with merged event streams could listen for the Done event and break

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry but I will not be able to remember all this logic.

I would very much prefer to have a stop event that encapsulates all logic.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the original intention for the Stop event was that it corresponds to dora stop calls. So if there is no dora stop invoked, no Stop event occurs.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, but I, myself, never knew that it was meant to be only for manual stop, and always expected it as a stop ( catch all ) messages.

I think that it is very hard without context to understand what an event done mean, as it could mean a lot of different things. Knowing that we will have Done and Stop which is very much confusing.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened this PR exactly because I thought that stop -> break as implemented within openai-proxy-server did not work.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with any new name (e.g. Done, Finished, EventStreamClosed, EndOfInput, EndOfEvents), but changing the behavior of the existing Stop event seems like unnecessary breakage. If anyone was relying on this behavior, their code wouldn't work anymore after updating dora and they wouldn't even get any error or other signal.

We can also rename the Stop event if you find it too confusing, or even deprecate and remove it if it's not needed.


For example, consider a node that writes all incoming images to disk. The disk might be slow, so it uses an internal buffer. If the dataflow is stopped early, it assumes that you're no longer interested in the remaining images and clears the buffer without writing it out (for faster abort, to avoid getting killed after the grace duration).

After updating dora, this example node would always silently discard the last few images, even when no dora stop is used.

Copy link
Copy Markdown
Collaborator Author

@haixuanTao haixuanTao Jun 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually way more worried about the guy like me who would have believe that stop is a Catch all and that assuming that I can count on it to stop is enough.

Instead, I have to add conditions for every type of stoppin and remember additional keywords... I really don't see why I have to remember to deal with both Stop AND EndOfInput AND any future event to just be able to simply deal with stopping the dataflow.

I don't think people implement Stop handler in the expectation that it is only going to be used for Ctrl-C messages, and that they assume that any Stop message can happen and that if there is data that should be kept they don't have a process after the event loop to do so.

@haixuanTao haixuanTao force-pushed the expose-allinputclosed branch from 995d6b9 to c64d664 Compare June 18, 2025 15:31
@phil-opp
Copy link
Copy Markdown
Collaborator

Meeting summary:

We didn't reach a clear consensus on this, but we all agreed that the new behavior should be as simple and intuitive as possible as possible. And that we need to provide clear documentation for the behavior.

Arguments made:

  • multiple different stop events can be confusing
    • on the other hand, users might be interested in the cause of the stop event
  • special handling of manual stop events (triggered by ctrl-c or dora stop) might result in code that is seldomly tested
    • it's better to make node stop behavior always the same, indedependent on whether the stop was manual or automatic
  • there is an open PR (Fix: Allow source nodes to wait for stop signal (Fixes #500) #992) that is related to this PR, as the stop event would mean something different after this PR
  • it might be easier to treat source nodes in a special way, rather than further complicate the YAML file with more boolean flags
  • we are still a young project, so breakage might be ok if we communicate it quickly
  • on the other hand, breakage could be easily avoided if we just choose a different name for the new event
  • the Stop event name might be also confusing because users might assume that it's related to dora stop
    • but it will also occur after all source nodes have exited with this PR

@phil-opp
Copy link
Copy Markdown
Collaborator

phil-opp commented Jun 18, 2025

New Proposal

  • Don't send AllInputsClosed to source nodes
  • Add a cause field to the Stop event
    • values can be ManualStopCommand and AllInputsClosed
    • we mark the cause field as #[non_exhaustive] to allow adding more stop types in the future (e.g. in case we want to add some kind of node migration in the future)
    • users interested in the exact cause can look at the cause field, all others can just ignore it
    • the event stream will close after any Stop command, independent of the cause
    • the new cause field will lead to a compile error in exsting Rust nodes, which makes the breaking behavior change explicit -> this is better than changing the behavior silently
    • (if desired in the future, we could add some kind of dora_ignore_stop_event function to the dora API to leave the event stream open)

@haixuanTao How does this sound to you?

@haixuanTao
Copy link
Copy Markdown
Collaborator Author

haixuanTao commented Jun 21, 2025

New Proposal

  • Don't send AllInputsClosed to source nodes

  • Add a cause field to the Stop event

    • values can be ManualStopCommand and AllInputsClosed
    • we mark the cause field as #[non_exhaustive] to allow adding more stop types in the future (e.g. in case we want to add some kind of node migration in the future)
    • users interested in the exact cause can look at the cause field, all others can just ignore it
    • the event stream will close after any Stop command, independent of the cause
    • the new cause field will lead to a compile error in exsting Rust nodes, which makes the breaking behavior change explicit -> this is better than changing the behavior silently
    • (if desired in the future, we could add some kind of dora_ignore_stop_event function to the dora API to leave the event stream open)

@haixuanTao How does this sound to you?

I'm ok with adding a string field in stop to be able to match the cause.

I would largely prefer if we could reuse some field like value so that people don't have to learn a new terminology on matching stop.

For source node, I think if the choice is between sending nothing and being empty from the beginning or having only one message which is a stop message. I would rather have only one message which is a stop message with cause no input for example as it's easier to debug. So I would not want to make it an exemption node.

On the other note, if it is possible to not close the event stream for source node in the beginning that would be great but it does not fit this PR in my opinion.

@phil-opp
Copy link
Copy Markdown
Collaborator

I would largely prefer if we could reuse some field like value so that people don't have to learn a new terminology on matching stop.

I'm not sure what you mean with reusing the value field? Which value field do you mean?

Right now our event enum looks like this: https://docs.rs/dora-node-api/latest/dora_node_api/enum.Event.html
Some events have additional fields, others don't. The field names are different for each variant. None of the fields are named value.

I don't really care about the name of the field, but it should be something that is understandable without reading the docs in my opinion. Something like cause or reason seems more intuitive than value to me.

@haixuanTao
Copy link
Copy Markdown
Collaborator Author

I have added a Cause enum with non_exhaustive pattern matching the root cause of the stopping.

Yeah sorry wasn't super clear. I was talking about how to expose it in python and what i have done is I have put it in the id key, value so that people can query the key value on it.

@phil-opp
Copy link
Copy Markdown
Collaborator

Yeah sorry wasn't super clear. I was talking about how to expose it in python and what i have done is I have put it in the id key, value so that people can query the key value on it.

Ah, sounds good to me!

@haixuanTao haixuanTao force-pushed the expose-allinputclosed branch from a5e7f1a to bb12249 Compare June 23, 2025 09:20
@haixuanTao haixuanTao force-pushed the expose-allinputclosed branch from bb12249 to c79251b Compare June 23, 2025 09:50
@haixuanTao haixuanTao force-pushed the expose-allinputclosed branch from c79251b to d155ee7 Compare June 23, 2025 11:00
@haixuanTao haixuanTao merged commit 561a319 into main Jun 23, 2025
149 checks passed
@haixuanTao haixuanTao deleted the expose-allinputclosed branch June 23, 2025 14:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants