Skip to content

Phil/new pub conflicts#1623

Merged
psFried merged 6 commits intomasterfrom
phil/new-pub-conflicts
Sep 16, 2024
Merged

Phil/new pub conflicts#1623
psFried merged 6 commits intomasterfrom
phil/new-pub-conflicts

Conversation

@psFried
Copy link
Copy Markdown
Contributor

@psFried psFried commented Sep 10, 2024

Description:

Introduce "touch" publications, and use them to update specs in response to changes in their dependencies.

This implements a portion of the work that's required for #1520. The remainder of that work will come in subsequent PRs.

A "touch" is a new way to publish a spec in cases where the model itself is unchanged, and only the built_spec needs to be updated. Unlike a normal publication, a touch does not modify the last_pub_id or the spec, and no publication_specs are created. This means that touches do not appear in the publication history in the UI. Only the built_spec and last_build_id are modified, in order to embed the built_specs of dependencies.

Controllers are updated to use a touch operation when responding to updates of their dependencies. For example, a materialization controller will use a touch to update the materialization built_spec in response to a change the last_pub_id of any of it's source collections or sourceCapture.

In order to determine whether dependencies have changed, a new dependency_hash column has been added. This is needed because the mechanism we used previously could miss updates to dependencies in some cases. The hash is computed during validation by hashing the (catalog_name, last_pub_id) of each dependency of the spec. Note that this is based on last_pub_id and not last_build_id. This means that a touch of a dependency (e.g. collection) does not change the dependency_hash of any dependent specs (e.g. a materialization).1

Another benefit of touch publications to reduce the noise in the publication history. We'll no longer clog up the _/details/history UI with a bunch of in response to changes in one or more dependencies publications. The controller status does still record touch publications, though, as they can be important for debugging various scenarios. In order to cut down on the noise in the status history, consecutive touch publications are now folded into a single entry in the history as long as they have the same result. The count field tracks the number of publications in each entry, and the detail and completed fields always reflect that of the most recent publication.

Example materialization controller status showing is_touch publications being collapsed in the history.

{
  "type": "Materialization",
  "source_capture": {
    "up_to_date": true
  },
  "publications": {
    "max_observed_pub_id": "0ead0b8e1a0003dc",
    "history": [
      {
        "id": "0ead0a8e0c0003dc",
        "created": "2024-09-11T14:36:24.472050767Z",
        "completed": "2024-09-11T14:38:36.787707336Z",
        "detail": "",
        "result": {
          "type": "success"
        },
        "is_touch": true,
        "count": 6
      },
      {
        "id": "0ead082a130003dc",
        "created": "2024-09-11T14:31:11.142596754Z",
        "completed": "2024-09-11T14:31:12.461721809Z",
        "detail": "adding binding(s) to match the sourceCapture: [phil/a/wd-2]",
        "result": {
          "type": "success"
        },
        "is_touch": false
      },
      {
        "id": "0ead04d71b0003dc",
        "created": "2024-09-11T14:23:55.446274518Z",
        "completed": "2024-09-11T14:30:56.615646164Z",
        "detail": "",
        "result": {
          "type": "success"
        },
        "is_touch": true,
        "count": 7
      }
    ]
  },
  "activation": {
    "last_activated": "0ead0b53890003dc"
  }
}

Workflow steps:

I've been testing using a webhook capture and a shell script ingest documents with new properties in a loop. This generates synthetic load by causing the inferred schema to be updated repeatedly. While that's running, I've been able to publish the materialization via the UI, which previously would have failed due to the last_pub_id being updated. Publishing the Capture via the UI will still fail with an ExpectPubIdNotMatched error because the last_pub_id of the Collection is still updated whenever the inferred schema is updated. A future PR will enable us to stop setting expect_pub_id for collections, which should mitigate these errors.

I tested the upgrade path locally, and didn't find any issues with the normal process. Just apply the migration and then start deploying the new agents. The new agent processes will set controller_version = 2, which will prevent the old agents from clobbering their changes.

Documentation links affected: n/a

Notes for reviewers:


This change is Reviewable

Footnotes

  1. An early apparent complication to this was storage mappings. There's no explicit dependency between live specs and their storage mappings, and we always use the latest mapping whenever we publish a given collection, and so we might worry that storage mappings could be updated and fail to propagate through to the built_specs of dependent captures and materializations. This turned out to be a non-issue because we create a new (regular, not touch) publication of all specs whenever a storage mapping is updated, which increases the last_pub_id of all affected collections. Thus all dependent tasks will respond with a touch that updates their own built_spec to include the updated storage mappings.

Adds a `PartialEq` implementation for all models. This sets the stage
for future work in publications and validation, which needs to determine
whether models have changed.
…ames

Updates `tables::DraftCatalog::all_catalog_names` to include
materialization `sourceCapture`s. This is required in order to correctly
track spec dependencies during validation, which will be introduced in a
subsequent commit. It also makes things consistent, as now a
`sourceCapture` is always considered a dependency of a materialization,
whereas before it depended on the situation.
@psFried psFried force-pushed the phil/new-pub-conflicts branch from 8df5d9d to 22d0f96 Compare September 10, 2024 23:15
Lays some groundwork for "touch" publications and computing dependency
hashes. These changes need to be paired together, becuase the use of
touch publications is incompatible with our current approach to
dependency change detection.

**Touch operations:**

Are defined as publications where the draft row sets `is_touch: true`
and has a model that is identical to the live model. A touch does not
update the `last_pub_id`, though the agent changes for implementing that
will come in a subsequent commit. The intent is to use a touch in order
to update the `built_spec` in response to changes in dependencies.

**Dependency hash:**

A dependency hash is an opaque value that is computed from the names and
`last_pub_id`s of every spec that a given spec depends on. The concept
of dependencies was introduced along with controllers, but the
implementation of change detection was not fully correct. Using a hash
guarantees (% the infinitesimal chance of hash collision) correctnes of
change detection. This introduces the computation of the hash as part of
validation.

**Other changes:**

Threads through `last_build_id`, and lays some of the groundwork (in
`validation::walk_transition`) for using `last_build_id` instead of
`last_pub_id` for optimistic locking.

Rename `BuiltSpec::is_unchanged` to `is_passthrough`, since the
meaning of `is_unchanged` became unclear with the introduction of touch
operations, which explicitly don't change the model.
@psFried psFried force-pushed the phil/new-pub-conflicts branch 2 times, most recently from e7ebcf0 to fa9f9b0 Compare September 11, 2024 11:30
Updates the agent to support `is_touch` in publications and the
`dependency_hash` in publications and controllers.

For specs published with `is_touch=true`, the agent will _not_ add new
`publication_specs`, update the `last_pub_id`, or update the `spec`
(the drafted spec must be identical).

The dependency hash column is added, and persisted as part of committing
a publication. Controllers compute the current hash value whenever they
run, and compare it to the persisted value in order to determine whether
any dependencies have changed.

When controllers detect a change in dependencies, their behavior
generally depends on whether any dependencies have been deleted or not.
As long as no dependencies were deleted, then controllers will publish
their spec using `is_touch = true`. If any dependencies have been
deleted, then task controllers will use a normal publication and update
the model to remove the deleted dependencies, which matches the previous
behavior.

Controllers will attempt to collapse multiple subsequent touch
publications into their publication histories in an attempt to limit the
noise of many publications in response to dependency changes.

The `ActivationStatus` has been updated to use `last_build_id` instead
of `last_pub_id` as the high water mark for activations. The previous
behavior was never fully correct, but `last_build_id` must increase
with every modification of the `built_spec`, so this new behavior is
guaranteed not to miss activating any changes.

Going along with the change to `ActivationStatus`, the
`CONTROLLER_VERSION` constant was incremented in order to prevent old
agent versions from clobbering changes to the activation status during
agent rollout.
@psFried psFried force-pushed the phil/new-pub-conflicts branch from fa9f9b0 to e6f1a59 Compare September 11, 2024 12:20
@psFried psFried marked this pull request as ready for review September 11, 2024 16:44
Updates controllers to set the `detail` for touch publications, to make
debugging easier. Also cleans up a few comments, and skip serializing
`is_touch` in publication histories when it's `false`.
Copy link
Copy Markdown
Member

@jgraettinger jgraettinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! This looks great. I have quibbles below about whether dependency_hash is a String or a u64, and its Option-ality, but this looks correct & solid to me.

// Built specification of this capture as-of `last_pub_id`.
val spec: proto_flow::flow::CaptureSpec,
// Hash of the last_pub_ids of all the dependencies that were used to build the capture
val dependency_hash: Option<String>,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about a u64 digest instead of String?

Also, why is this Option? Under the control plane data-model, it seems a dependency digest is a property of every live spec that must exist.

(Yes, there's probably a migration involved, but initializing with a digest of String("") or u64(0) would work, right? As in it might update once due to "mismatch" and then be good thereafter?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking for using a string instead of a u64 is that it's more obviously something that should be treated as opaque.

The Option just seemed convenient as a way to represent a spec that has no dependencies. For example, a collection, or a capture with no bindings.

I don't feel especially strongly about either of those things, and could get behind a u64, where a 0 means it has no dependencies. It's convenient to have a simple constant representing "no dependencies", though, because for example when we validate collection specs, we know that they can't have dependencies and so don't consult the Dependencies::compute_hash function at all. That sound good to you?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Option just seemed convenient as a way to represent a spec that has no dependencies. For example, a collection, or a capture with no bindings.

I suppose I don't see "zero bindings" as different from "one or more bindings". It's all the same continuum, an empty set of bindings can be represented in the same hash space as non-empty ones, and thus why jump through the extra Some-check hoops to model the distinction. Or, if it remains a string, an empty string could also serve as the current None.

I've said my piece, though, and it's so minor I'm okay any way you want to roll, including merging as-as.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a fair point. I'm inclined to just merge as-is, though, as it doesn't seem like we'll actually regret the current representation.

// Expected last build ID for optimistic concurrency.
val expect_build_id: models::Id,
// Hash of the last_pub_ids of all the dependencies that were used to build the capture
val dependency_hash: Option<String>,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, this doesn't seem like it should be Option.

I suppose an argument would be when representing deleted specs (spec is None). Still, I think I'd expect the value to be digest([ /* empty slice of (name, last_pub_id) */])

anyhow::bail!("missing spec for expanded row: {:?}", exp.catalog_name);
};
let scope = tables::synthetic_scope(spec_type, &exp.catalog_name);
// TODO(phil): currently we set `is_touch: false`, which is consistent with the prior
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, i believe these should be touches too

@psFried psFried added control-plane change:planned This is a planned change labels Sep 16, 2024
Use consistent field ordering in `tables` and make `is_touch` part of
the return value of `into_parts`.
@psFried psFried force-pushed the phil/new-pub-conflicts branch from 03d9928 to b26c774 Compare September 16, 2024 15:22
@psFried psFried merged commit 15c7c3a into master Sep 16, 2024
@psFried psFried deleted the phil/new-pub-conflicts branch September 16, 2024 16:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

change:planned This is a planned change control-plane

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants