PersistentTTLNode fails to remove node #1258

@chevaris

Description

PersistentTTLNode (Curator 5.8.0 and probably previous versions) has a corner case that prevents the ZNode from being deleted when the program running the recipe is stopped in certain situations.

This is the sequence:

  • Start the recipe with a TTL of 30 seconds -> the container node is created.
  • Stop the program that runs the recipe (or let it crash, as can happen in production) before the touch TTL node is created. The timing is NOT deterministic: a background thread is scheduled to run at TTL/2 (by default), so in the worst case the TTL node could take up to 15 seconds to be created in this example.
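To make the size of that window concrete, here is a minimal sketch of the arithmetic. It does not use Curator at all: `TouchWindow` and `firstTouchDelayMs` are hypothetical names, and the formula (first touch after TTL divided by the schedule factor) is an assumption based on the behaviour described above.

```java
// Illustrative model only; it does not call any Curator API.
final class TouchWindow {
    // Assumption: the first touch is scheduled ttlMs / touchScheduleFactor
    // after the recipe starts, so this is the worst-case window in which
    // the container node exists without a touch child.
    static long firstTouchDelayMs(long ttlMs, int touchScheduleFactor) {
        if (ttlMs <= 0 || touchScheduleFactor <= 0) {
            throw new IllegalArgumentException("ttlMs and touchScheduleFactor must be positive");
        }
        return ttlMs / touchScheduleFactor;
    }

    public static void main(String[] args) {
        long ttlMs = 30_000; // TTL of 30 seconds, as in the sequence above
        for (int factor : new int[] {2, 5, 10}) {
            System.out.println("factor=" + factor + " -> window of up to "
                + firstTouchDelayMs(ttlMs, factor) + " ms");
        }
    }
}
```

With a TTL of 30 seconds and a factor of 2 this gives 15000 ms. A larger factor shrinks the window but never removes it.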

When this happens, the container node is never deleted. One option is to increase touchScheduleFactor, but that solution still does not look correct to me.

In my view, the recipe should watch the container node itself and trigger creation of the touch node as soon as the container node is created, to minimize the window in which the problem can happen.
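The effect of touching early can be illustrated with a plain `ScheduledExecutorService`, without Curator or ZooKeeper. This is only a simplified stand-in for the idea, not the fixed recipe from the linked commit: the class `EagerTouchSketch`, its method, and the `touch` runnable are hypothetical, and a zero initial delay is used in place of a real watch on the container node.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative model only; "touch" stands in for creating the touch child node.
final class EagerTouchSketch {
    // Returns true if the first touch ran within waitMs of being scheduled.
    static boolean firstTouchRunsWithin(long initialDelayMs, long periodMs, long waitMs) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch firstTouch = new CountDownLatch(1);
        try {
            Runnable touch = firstTouch::countDown;
            executor.scheduleAtFixedRate(touch, initialDelayMs, periodMs, TimeUnit.MILLISECONDS);
            return firstTouch.await(waitMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        long periodMs = 15_000; // TTL of 30 s with a schedule factor of 2
        System.out.println("delayed first touch seen within 500 ms: "
            + firstTouchRunsWithin(periodMs, periodMs, 500));
        System.out.println("eager first touch seen within 500 ms: "
            + firstTouchRunsWithin(0, periodMs, 500));
    }
}
```

Even with a zero initial delay, a process that dies between creating the container node and running the first touch would still leave the container behind, so this narrows the window rather than closing it.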

I attach a test case that shows the problem, and I include a fixed recipe that solves it.

chevaris@6da7725

In any case, no matter how fast the touch ZNode is added, the race condition will always be there. In my view, this is a limitation of the strategy used for this recipe and should be documented.

Regards,

Cheva
