Phil/new pub conflicts by psFried · Pull Request #1623 · estuary/flow

psFried · 2024-09-10T22:03:55Z

Description:

Introduce "touch" publications, and use them to update specs in response to changes in their dependencies.

This implements a portion of the work that's required for #1520. The remainder of that work will come in subsequent PRs.

A "touch" is a new way to publish a spec in cases where the model itself is unchanged, and only the built_spec needs to be updated. Unlike a normal publication, a touch does not modify the last_pub_id or the spec, and no publication_specs are created. This means that touches do not appear in the publication history in the UI. Only the built_spec and last_build_id are modified, in order to embed the built_specs of dependencies.

Controllers are updated to use a touch operation when responding to updates of their dependencies. For example, a materialization controller will use a touch to update the materialization built_spec in response to a change the last_pub_id of any of it's source collections or sourceCapture.

In order to determine whether dependencies have changed, a new dependency_hash column has been added. This is needed because the mechanism we used previously could miss updates to dependencies in some cases. The hash is computed during validation by hashing the (catalog_name, last_pub_id) of each dependency of the spec. Note that this is based on last_pub_id and not last_build_id. This means that a touch of a dependency (e.g. collection) does not change the dependency_hash of any dependent specs (e.g. a materialization).¹

Another benefit of touch publications to reduce the noise in the publication history. We'll no longer clog up the _/details/history UI with a bunch of in response to changes in one or more dependencies publications. The controller status does still record touch publications, though, as they can be important for debugging various scenarios. In order to cut down on the noise in the status history, consecutive touch publications are now folded into a single entry in the history as long as they have the same result. The count field tracks the number of publications in each entry, and the detail and completed fields always reflect that of the most recent publication.

Example materialization controller status showing is_touch publications being collapsed in the history.

{
  "type": "Materialization",
  "source_capture": {
    "up_to_date": true
  },
  "publications": {
    "max_observed_pub_id": "0ead0b8e1a0003dc",
    "history": [
      {
        "id": "0ead0a8e0c0003dc",
        "created": "2024-09-11T14:36:24.472050767Z",
        "completed": "2024-09-11T14:38:36.787707336Z",
        "detail": "",
        "result": {
          "type": "success"
        },
        "is_touch": true,
        "count": 6
      },
      {
        "id": "0ead082a130003dc",
        "created": "2024-09-11T14:31:11.142596754Z",
        "completed": "2024-09-11T14:31:12.461721809Z",
        "detail": "adding binding(s) to match the sourceCapture: [phil/a/wd-2]",
        "result": {
          "type": "success"
        },
        "is_touch": false
      },
      {
        "id": "0ead04d71b0003dc",
        "created": "2024-09-11T14:23:55.446274518Z",
        "completed": "2024-09-11T14:30:56.615646164Z",
        "detail": "",
        "result": {
          "type": "success"
        },
        "is_touch": true,
        "count": 7
      }
    ]
  },
  "activation": {
    "last_activated": "0ead0b53890003dc"
  }
}

Workflow steps:

I've been testing using a webhook capture and a shell script ingest documents with new properties in a loop. This generates synthetic load by causing the inferred schema to be updated repeatedly. While that's running, I've been able to publish the materialization via the UI, which previously would have failed due to the last_pub_id being updated. Publishing the Capture via the UI will still fail with an ExpectPubIdNotMatched error because the last_pub_id of the Collection is still updated whenever the inferred schema is updated. A future PR will enable us to stop setting expect_pub_id for collections, which should mitigate these errors.

I tested the upgrade path locally, and didn't find any issues with the normal process. Just apply the migration and then start deploying the new agents. The new agent processes will set controller_version = 2, which will prevent the old agents from clobbering their changes.

Documentation links affected: n/a

Notes for reviewers:

This change is

An early apparent complication to this was storage mappings. There's no explicit dependency between live specs and their storage mappings, and we always use the latest mapping whenever we publish a given collection, and so we might worry that storage mappings could be updated and fail to propagate through to the built_specs of dependent captures and materializations. This turned out to be a non-issue because we create a new (regular, not touch) publication of all specs whenever a storage mapping is updated, which increases the last_pub_id of all affected collections. Thus all dependent tasks will respond with a touch that updates their own built_spec to include the updated storage mappings. ↩

Adds a `PartialEq` implementation for all models. This sets the stage for future work in publications and validation, which needs to determine whether models have changed.

…ames Updates `tables::DraftCatalog::all_catalog_names` to include materialization `sourceCapture`s. This is required in order to correctly track spec dependencies during validation, which will be introduced in a subsequent commit. It also makes things consistent, as now a `sourceCapture` is always considered a dependency of a materialization, whereas before it depended on the situation.

Lays some groundwork for "touch" publications and computing dependency hashes. These changes need to be paired together, becuase the use of touch publications is incompatible with our current approach to dependency change detection. **Touch operations:** Are defined as publications where the draft row sets `is_touch: true` and has a model that is identical to the live model. A touch does not update the `last_pub_id`, though the agent changes for implementing that will come in a subsequent commit. The intent is to use a touch in order to update the `built_spec` in response to changes in dependencies. **Dependency hash:** A dependency hash is an opaque value that is computed from the names and `last_pub_id`s of every spec that a given spec depends on. The concept of dependencies was introduced along with controllers, but the implementation of change detection was not fully correct. Using a hash guarantees (% the infinitesimal chance of hash collision) correctnes of change detection. This introduces the computation of the hash as part of validation. **Other changes:** Threads through `last_build_id`, and lays some of the groundwork (in `validation::walk_transition`) for using `last_build_id` instead of `last_pub_id` for optimistic locking. Rename `BuiltSpec::is_unchanged` to `is_passthrough`, since the meaning of `is_unchanged` became unclear with the introduction of touch operations, which explicitly don't change the model.

Updates the agent to support `is_touch` in publications and the `dependency_hash` in publications and controllers. For specs published with `is_touch=true`, the agent will _not_ add new `publication_specs`, update the `last_pub_id`, or update the `spec` (the drafted spec must be identical). The dependency hash column is added, and persisted as part of committing a publication. Controllers compute the current hash value whenever they run, and compare it to the persisted value in order to determine whether any dependencies have changed. When controllers detect a change in dependencies, their behavior generally depends on whether any dependencies have been deleted or not. As long as no dependencies were deleted, then controllers will publish their spec using `is_touch = true`. If any dependencies have been deleted, then task controllers will use a normal publication and update the model to remove the deleted dependencies, which matches the previous behavior. Controllers will attempt to collapse multiple subsequent touch publications into their publication histories in an attempt to limit the noise of many publications in response to dependency changes. The `ActivationStatus` has been updated to use `last_build_id` instead of `last_pub_id` as the high water mark for activations. The previous behavior was never fully correct, but `last_build_id` must increase with every modification of the `built_spec`, so this new behavior is guaranteed not to miss activating any changes. Going along with the change to `ActivationStatus`, the `CONTROLLER_VERSION` constant was incremented in order to prevent old agent versions from clobbering changes to the activation status during agent rollout.

Updates controllers to set the `detail` for touch publications, to make debugging easier. Also cleans up a few comments, and skip serializing `is_touch` in publication histories when it's `false`.

jgraettinger

LGTM! This looks great. I have quibbles below about whether dependency_hash is a String or a u64, and its Option-ality, but this looks correct & solid to me.

crates/sources/src/merge.rs

jgraettinger · 2024-09-12T19:56:43Z

crates/tables/src/lib.rs

        // Built specification of this capture as-of `last_pub_id`.
        val spec: proto_flow::flow::CaptureSpec,
+        // Hash of the last_pub_ids of all the dependencies that were used to build the capture
+        val dependency_hash: Option<String>,


What about a u64 digest instead of String?

Also, why is this Option? Under the control plane data-model, it seems a dependency digest is a property of every live spec that must exist.

(Yes, there's probably a migration involved, but initializing with a digest of String("") or u64(0) would work, right? As in it might update once due to "mismatch" and then be good thereafter?

My thinking for using a string instead of a u64 is that it's more obviously something that should be treated as opaque.

The Option just seemed convenient as a way to represent a spec that has no dependencies. For example, a collection, or a capture with no bindings.

I don't feel especially strongly about either of those things, and could get behind a u64, where a 0 means it has no dependencies. It's convenient to have a simple constant representing "no dependencies", though, because for example when we validate collection specs, we know that they can't have dependencies and so don't consult the Dependencies::compute_hash function at all. That sound good to you?

The Option just seemed convenient as a way to represent a spec that has no dependencies. For example, a collection, or a capture with no bindings.

I suppose I don't see "zero bindings" as different from "one or more bindings". It's all the same continuum, an empty set of bindings can be represented in the same hash space as non-empty ones, and thus why jump through the extra Some-check hoops to model the distinction. Or, if it remains a string, an empty string could also serve as the current None.

I've said my piece, though, and it's so minor I'm okay any way you want to roll, including merging as-as.

That's a fair point. I'm inclined to just merge as-is, though, as it doesn't seem like we'll actually regret the current representation.

jgraettinger · 2024-09-12T19:58:46Z

crates/tables/src/lib.rs

+        // Expected last build ID for optimistic concurrency.
+        val expect_build_id: models::Id,
+        // Hash of the last_pub_ids of all the dependencies that were used to build the capture
+        val dependency_hash: Option<String>,


Same as above, this doesn't seem like it should be Option.

I suppose an argument would be when representing deleted specs (spec is None). Still, I think I'd expect the value to be digest([ /* empty slice of (name, last_pub_id) */])

crates/tables/src/lib.rs

crates/tables/src/dependencies.rs

crates/validation/src/errors.rs

jgraettinger · 2024-09-12T20:40:14Z

crates/agent/src/publications/handler.rs

                anyhow::bail!("missing spec for expanded row: {:?}", exp.catalog_name);
            };
            let scope = tables::synthetic_scope(spec_type, &exp.catalog_name);
+            // TODO(phil): currently we set `is_touch: false`, which is consistent with the prior


agreed, i believe these should be touches too

Use consistent field ordering in `tables` and make `is_touch` part of the return value of `into_parts`.

psFried added 2 commits September 10, 2024 09:50

models: impl PartialEq for all models

85320fc

Adds a `PartialEq` implementation for all models. This sets the stage for future work in publications and validation, which needs to determine whether models have changed.

psFried force-pushed the phil/new-pub-conflicts branch from 8df5d9d to 22d0f96 Compare September 10, 2024 23:15

psFried force-pushed the phil/new-pub-conflicts branch 2 times, most recently from e7ebcf0 to fa9f9b0 Compare September 11, 2024 11:30

psFried force-pushed the phil/new-pub-conflicts branch from fa9f9b0 to e6f1a59 Compare September 11, 2024 12:20

psFried requested a review from jgraettinger September 11, 2024 16:44

psFried marked this pull request as ready for review September 11, 2024 16:44

agent: set detail message for touch publications

9c94e72

Updates controllers to set the `detail` for touch publications, to make debugging easier. Also cleans up a few comments, and skip serializing `is_touch` in publication histories when it's `false`.

psFried force-pushed the phil/new-pub-conflicts branch from 3f7eabd to 9c94e72 Compare September 11, 2024 17:32

psFried mentioned this pull request Sep 12, 2024

inferred schema updates blocking interactive publications of captures and materializations #1520

Closed

6 tasks

jgraettinger approved these changes Sep 12, 2024

View reviewed changes

psFried added control-plane change:planned This is a planned change labels Sep 16, 2024

tables: minor tweaks from pr review

b26c774

Use consistent field ordering in `tables` and make `is_touch` part of the return value of `into_parts`.

psFried force-pushed the phil/new-pub-conflicts branch from 03d9928 to b26c774 Compare September 16, 2024 15:22

psFried merged commit 15c7c3a into master Sep 16, 2024

psFried deleted the phil/new-pub-conflicts branch September 16, 2024 16:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Phil/new pub conflicts#1623

Phil/new pub conflicts#1623
psFried merged 6 commits intomasterfrom
phil/new-pub-conflicts

psFried commented Sep 10, 2024 •

edited

Loading

Uh oh!

jgraettinger left a comment

Uh oh!

Uh oh!

jgraettinger Sep 12, 2024

Uh oh!

psFried Sep 12, 2024

Uh oh!

jgraettinger Sep 12, 2024

Uh oh!

psFried Sep 16, 2024

Uh oh!

jgraettinger Sep 12, 2024

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jgraettinger Sep 12, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

psFried commented Sep 10, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Footnotes

Uh oh!

jgraettinger left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jgraettinger Sep 12, 2024

Choose a reason for hiding this comment

Uh oh!

psFried Sep 12, 2024

Choose a reason for hiding this comment

Uh oh!

jgraettinger Sep 12, 2024

Choose a reason for hiding this comment

Uh oh!

psFried Sep 16, 2024

Choose a reason for hiding this comment

Uh oh!

jgraettinger Sep 12, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jgraettinger Sep 12, 2024

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

psFried commented Sep 10, 2024 •

edited

Loading