Pipeline inspector hook one-pager #7025
Conversation
📝 Walkthrough

Adds a design document for a Function Pipeline Inspector: an InspectingFunctionRunner wrapper that emits sanitized RunFunctionRequest/Response JSON plus StepMeta over gRPC to a sidecar PipelineInspectorService, with fail‑open behavior, credential redaction, config/Helm notes, metrics, and a phased rollout plan.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Composition as Composition
    participant IFR as InspectingFunctionRunner
    participant FR as Underlying FunctionRunner
    participant Inspector as PipelineInspector Sidecar
    Composition->>IFR: RunFunction(request)
    IFR->>IFR: Build StepMeta (trace_id, step_index, iteration, composition id, timestamp)
    IFR->>IFR: Sanitize request JSON (strip creds, redact secrets)
    IFR->>Inspector: EmitRequest(JSON payload, StepMeta) [fire-and-forget, 100ms timeout]
    IFR->>FR: RunFunction(request)
    FR-->>IFR: response
    IFR->>IFR: Sanitize response JSON
    IFR->>Inspector: EmitResponse(JSON payload, error?, StepMeta) [fire-and-forget, 100ms timeout]
    IFR-->>Composition: response
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Would you like the diagram expanded to show optional OpenTelemetry span correlation and sidecar health/fail‑open branching?

🚥 Pre-merge checks: ✅ 4 passed
jbw976
left a comment
This looks great @phisco! I like how it will finally provide a foundation for function pipeline debugging, and I like how it's using a proven pattern we've already rolled out with change logs.
A few comments for your consideration 🙇♂️
> ### Security: Credential Stripping
>
> The `RunFunctionRequest` includes a `credentials` field that may contain sensitive data (secrets, connection details). **This field must be cleared before emission**:
another possible source of sensitive data to strip would be the observed connection details of the resources. Those get passed in the request too, and we should consider stripping them as well.
Looks like you've also added stripping out any Secret composed resources, which makes sense given that is the recommended path for exposing XR connection details in v2.
However, I was referring to the connection details that are included for every observed resource, e.g. the connection_details field in the protobuf. That should get stripped too to be consistent, right?
```protobuf
service PipelineInspectorService {
  // EmitRequest receives the function request before execution.
  // This is a fire-and-forget call; errors do not affect pipeline execution.
  rpc EmitRequest(EmitRequestRequest) returns (EmitRequestResponse);
```
This will be using unary RPC right? https://grpc.io/docs/what-is-grpc/core-concepts/#unary-rpc
I think the EmitRequest() call in the InspectingFunctionRunner would technically be blocking then - the client gRPC call will block until the server sends a response or the context times out.
This likely will not cause any problems (haven't heard of it causing any problems in changelogs that are in the middle of the managed reconciler), because it is just over a local unix socket and responses should have very minimal latency. But it may still be good to acknowledge that this isn't technically non-blocking.
we could consider using a pretty short context/timeout then, just for the gRPC calls, so they can't ever have too much of a slowdown effect on the overall function pipeline.
do i have that right about the blocking nature of unary gRPC? what do you think about a shorter context for them?
Good catch! I initially thought of handling them from a separate goroutine, but I ended up deciding not to. Just continuing on error could be enough indeed, I'll update the doc accordingly.
yep that makes sense to continue on error. i did want to point out, just to be sure you've seen it, the approach taken for change logs that uses a separate shorter context instead of the outer parent context:
https://github.com/crossplane/crossplane-runtime/blob/main/pkg/reconciler/managed/changelogger.go#L114-L117
Perhaps this is what you were referring to later in the doc with this?
> The default emit timeout is 100ms. If the sidecar doesn't respond within this time, the emit fails and pipeline execution continues.
Yes, that's what I meant 👍
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@design/one-pager-pipeline-inspector.md`:
- Around line 234-236: Update the wording for clarity: replace the phrase “On
top of that” with a shorter alternative like “Additionally” or “Also”, and fix
the sentence fragment at the later mention of RunFunctionRequest by adding a
subject (e.g., “It can be used…” or “This field can be used…”) so the sentence
about clearing composed Secret resource `data` reads as a complete sentence;
ensure references to `RunFunctionRequest`, its `credentials` and connection
fields, and composed Secret `data` are preserved and unchanged except for the
clarified wording.
- Around line 9-22: Reflow the long Markdown paragraphs in the "Crossplane
Compositions are defined as pipelines" section and the "Current State" section
(including the bullet points for `crossplane render`, `crossplane beta trace`,
and "Function outputs (logs, events, conditions)") to wrap lines at
approximately 100 columns; ensure sentences are broken between words (not
mid-word), preserve existing punctuation, lists, and code spans, and apply the
same wrapping style to the other long paragraphs noted in the review so the file
reads consistently.
🧹 Nitpick comments (1)
design/one-pager-pipeline-inspector.md (1)
234-249: Ensure secret data is stripped from both observed and desired resources.
`RunFunctionRequest` can carry desired resources too, which may include Secret `data` or connection details. The doc and sample code only mention observed resources; please clarify that desired resources are sanitized as well (or explain why not).
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@design/one-pager-pipeline-inspector.md`:
- Around line 408-416: The text currently claims splitting RPCs keeps messages
within the default 4MB gRPC limit but omits that JSON encoding can inflate
payload size; update the paragraph referencing RunFunctionRequest,
RunFunctionResponse and the EmitRequest/EmitResponse examples to explicitly note
that JSON marshaling may increase message size versus protobuf, and recommend
that operators either increase the sidecar flag (--max-recv-msg-size) or enable
compression on the sidecar/clients as mitigation; adjust the “Sidecar
configuration” sentence to mention both bumping --max-recv-msg-size and optional
compression as recommended defenses.
- Around line 394-395: The documentation line for `composite_resource_uid` is a
fragment; update the description to a full sentence by adding an explicit
subject (e.g., change "Can be used to group all function calls for the same
resource." to "It can be used to group all function calls for the same
resource."), modifying the `composite_resource_uid` bullet in the doc to improve
readability.
- Around line 303-317: Update SocketPipelineInspector.EmitRequest to sanitize
the desired state as well as observed: after cloning RunFunctionRequest and
clearing Credentials and observed resources' ConnectionDetails/Secret.data, also
clear sanitizedReq.GetDesired().GetComposite().GetResource().ConnectionDetails
and iterate sanitizedReq.GetDesired().GetResources() to set each
Resource.ConnectionDetails = nil and, if the resource Kind/Type indicates a
Secret, clear its data field. Ensure the documentation text mentions sanitizing
both observed and desired resources and Secrets before emission.
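The sanitization asked for above could look roughly like this. The `resource` type and its fields are simplified stand-ins for the proto types, and the Secret check is illustrative; applying the same function to both the observed and desired maps keeps the two sides consistent:

```go
package main

import "fmt"

// resource is a simplified stand-in for a composed-resource entry in the
// function protos; field names here are illustrative.
type resource struct {
	Object            map[string]any
	ConnectionDetails map[string][]byte
}

// sanitize clears connection details on every resource and drops the payload
// of any composed Secret before emission.
func sanitize(resources map[string]*resource) {
	for _, r := range resources {
		r.ConnectionDetails = nil
		if kind, _ := r.Object["kind"].(string); kind == "Secret" {
			delete(r.Object, "data")
			delete(r.Object, "stringData")
		}
	}
}

func main() {
	desired := map[string]*resource{
		"creds": {
			Object:            map[string]any{"kind": "Secret", "data": map[string]any{"password": "aHVudGVyMg=="}},
			ConnectionDetails: map[string][]byte{"endpoint": []byte("db.example.org")},
		},
	}
	sanitize(desired) // in the real runner: sanitize(observed); sanitize(desired)
	fmt.Println(desired["creds"].Object["data"], desired["creds"].ConnectionDetails)
}
```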
Signed-off-by: Philippe Scorsolini <p.scorsolini@gmail.com>
Signed-off-by: Philippe Scorsolini <p.scorsolini@gmail.com>
Force-pushed 4f3adf4 to 9d6e440
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@design/one-pager-pipeline-inspector.md`:
- Around line 107-127: When step.BuildMetadata(ctx, name, req) returns an error
in Runner.RunFunction, ensure we preserve the fail‑open behavior by populating a
safe, non‑nil StepMeta rather than leaving meta nil: after the error path set
meta to a minimal partial step.StepMeta (e.g., with FunctionName/name and any
safe defaults or empty values) and keep logging the BuildMetadata error; then
call inspector.EmitRequest and inspector.EmitResponse with that safe meta (or
alternatively make EmitRequest/EmitResponse tolerate nil by adding guards),
ensuring no nil deref occurs while preserving the current proceed-on-error
semantics.
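A fail-open fallback along the lines the prompt above describes could be sketched as follows; `StepMeta` and `buildMetadata` are simplified stand-ins for the real types, and `buildMetadata` is rigged to fail so the fallback path is visible:

```go
package main

import (
	"errors"
	"fmt"
)

// StepMeta is an illustrative stand-in for the metadata attached to each emit.
type StepMeta struct {
	FunctionName string
	TraceID      string
}

// buildMetadata stands in for step.BuildMetadata; it always fails here so the
// example exercises the fallback.
func buildMetadata(name string) (*StepMeta, error) {
	return nil, errors.New("no composite in request")
}

// metadataOrFallback preserves fail-open semantics: on error it logs and
// returns a minimal, non-nil StepMeta instead of nil, so later emits can't
// dereference a nil pointer.
func metadataOrFallback(name string) *StepMeta {
	meta, err := buildMetadata(name)
	if err != nil {
		fmt.Println("cannot build step metadata:", err) // stand-in for a debug log
		meta = &StepMeta{FunctionName: name}            // safe partial metadata
	}
	return meta
}

func main() {
	m := metadataOrFallback("function-patch-and-transform")
	fmt.Println(m.FunctionName)
}
```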
♻️ Duplicate comments (5)
design/one-pager-pipeline-inspector.md (5)
9-22: Wrap long Markdown lines to ~100 columns.

A few paragraphs and bullets here exceed the 100‑column guideline; could we reflow them for readability? Thank you! As per coding guidelines, please wrap Markdown lines at ~100 columns unless a link would be less readable.

303-317: Sanitize desired resources (and Secret data) before emission.

The snippet strips observed resources only; desired resources can also carry connection details and Secret data. Could we extend the sanitization text and example to cover desired state as well? Thanks!

408-416: Note JSON expansion vs the 4MB gRPC limit.

Even with split RPCs, JSON marshaling can inflate payloads beyond the default 4MB limit. Could we mention this and suggest max‑recv‑msg‑size or compression as mitigation? Appreciate it!

303-306: Minor wording polish for clarity.

Could we replace “On top of that” with a tighter phrasing (e.g., “Also” or “Additionally”)? Thanks!

394-395: Complete the sentence for `composite_resource_uid`.

Line 394 reads as a fragment; consider adding a subject (e.g., “It can be used to group…”). Thanks!
Signed-off-by: Philippe Scorsolini <p.scorsolini@gmail.com>
Force-pushed 9d6e440 to 39d74c8
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@design/one-pager-pipeline-inspector.md`:
- Around line 285-287: The `iteration` field is ambiguous; pick a single
semantics and update the doc comment and any related descriptions to match:
either make `iteration` a reconciliation-wide monotonically increasing counter
(shared by all steps in the same reconciliation) or make it a per-step retry
counter (unique to each step). Update the comment on the `iteration` field (and
any mentions elsewhere in the document) to clearly state which choice was made,
e.g., "Monotonically increasing reconciliation-wide counter..." or "Per-step
retry counter, reset for each step...", and ensure all examples and references
to iteration semantics (shared vs per-step) consistently use that definition.
- Around line 302-307: Update the "Security: Credential Stripping" section to
explicitly require sanitization of the pipeline RunFunctionRequest.context in
addition to clearing RunFunctionRequest.credentials and Secret.data: define the
chosen strategy (allowlist, denylist, or opt‑in emission) and the exact behavior
(which keys are emitted, which are redacted, and default policy), state who
controls the policy and how to opt in/out, and add a brief example of
sanitization rules so implementers know to filter context keys before emitting
RunFunctionRequest.
🧹 Nitpick comments (1)
design/one-pager-pipeline-inspector.md (1)
79-82: Would a cache‑hit indicator help downstream debugging?

Since the wrapper runs before the cached runner, a cache hit will still emit req/resp without an actual function call. Would it be useful to include a `cache_hit` (or `response_source`) field in `StepMeta` to help consumers distinguish cached vs executed steps?
> ### Security: Credential Stripping
>
> The `RunFunctionRequest` includes `credentials` and connection details fields that may contain sensitive data, **this field must be cleared before emission**.
>
> On top of that, if any composed resource is a Secret, its `data` field must also be cleared before emission.
Consider redacting/controlling pipeline context data.
Functions can write arbitrary data into the pipeline context, which is included in RunFunctionRequest. If that data can be sensitive, emitting it verbatim risks leaking secrets/PII. Should we define a sanitization strategy (e.g., allowlist, denylist, or opt‑in emission for context keys) in this section? Thanks for considering.
✏️ Suggested doc addition

```diff
-The `RunFunctionRequest` includes `credentials` and connection details fields that may contain sensitive data, **this field must be cleared before emission**.
+The `RunFunctionRequest` includes `credentials` and connection details fields that may contain sensitive data, **this field must be cleared before emission**.
+Additionally, the pipeline `context` can contain arbitrary data written by functions; consider a
+clear policy for sanitizing or selectively emitting context keys to avoid leaking secrets/PII.
```

🧰 Tools
🪛 LanguageTool
[style] ~305-~305: ‘On top of that’ might be wordy. Consider a shorter alternative.
Context: ...eld must be cleared before emission**. On top of that, if any composed resource is a Secret, ...
(EN_WORDINESS_PREMIUM_ON_TOP_OF_THAT)
🤖 Prompt for AI Agents
In `@design/one-pager-pipeline-inspector.md` around lines 302 - 307, Update the
"Security: Credential Stripping" section to explicitly require sanitization of
the pipeline RunFunctionRequest.context in addition to clearing
RunFunctionRequest.credentials and Secret.data: define the chosen strategy
(allowlist, denylist, or opt‑in emission) and the exact behavior (which keys are
emitted, which are redacted, and default policy), state who controls the policy
and how to opt in/out, and add a brief example of sanitization rules so
implementers know to filter context keys before emitting RunFunctionRequest.
Signed-off-by: Philippe Scorsolini <p.scorsolini@gmail.com>
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@design/one-pager-pipeline-inspector.md`:
- Around line 347-349: The design currently emits the pipeline "context" field
as-is which creates a security risk; update the document to treat context as
potentially sensitive by (1) proposing an allowlist approach for safe context
keys and an explicit API/configuration flag to control context emission, (2)
requiring that the sidecar or pipeline implementation sanitize or redact context
entries (esp. user-written values) before emitting composed resources, and (3)
adding clear guidance for authors about what may never be written to context
(secrets, tokens, PII) and documenting default behavior and override mechanism;
reference the "context" field, composed resource handling for Secrets (clear
`data`), and the sidecar/pipeline emission path in the text so reviewers can
find and apply these changes.
- Around line 204-217: The sanitizer must detect Kubernetes Secrets by checking
the extracted metadata: if meta.CompositeResourceKind (from BuildMetadata)
equals "Secret" (compare case-insensitively and optionally verify
apiVersion/group indicates core/v1), then treat it as sensitive; implement the
sanitizer branch to "drop data" by clearing the Secret payload fields (set the
Secret's data and stringData fields to an empty map or nil) and avoid returning
any secret values (you can also remove binaryData if present), and
record/annotate that data was stripped so callers know content was redacted.
Ensure the check references meta.CompositeResourceKind and
meta.CompositeResourceAPIVersion and the sanitizer function that currently
contains the Secret placeholders so the logic replaces the // ... with these
explicit checks and field clears.
🧹 Nitpick comments (2)
design/one-pager-pipeline-inspector.md (2)
403-406: Consider expanding observability metrics for the inspector itself.

Currently only one error counter is defined. For operational visibility, would it be helpful to add:

- Success metrics: `function_run_function_pipeline_inspector_emits_total` (to see if it's working at all)
- Latency metrics: a histogram of emit duration to detect when sidecars are slow
- Sidecar health: a gauge indicating whether the sidecar connection is healthy
These would help operators distinguish between "feature disabled," "sidecar down," and "sidecar slow" scenarios. Since this is an observability feature, having good observability of the feature itself seems valuable!
What do you think?
479-513: Consider adding operational guidance for sidecar lifecycle.

The Helm configuration looks good! One thing I'm curious about: what happens if the sidecar crashes or isn't ready when Crossplane starts emitting data?
The fail-open behavior handles runtime failures well (continues on timeout), but should the design document mention:
- Whether the sidecar needs readiness/liveness probes
- How Crossplane handles initial connection failures (retry logic?)
- Graceful behavior during sidecar restarts
This might be worth a brief note in the "Fail-Open Behavior" section or here, to help operators understand the operational characteristics. Not a blocker, just wondering if we should set expectations!
```go
	// Extract XR metadata from the request
	xr := req.GetObserved().GetComposite().GetResource()
	if xr != nil {
		meta.CompositeResourceAPIVersion = getStringField(xr, "apiVersion")
		meta.CompositeResourceKind = getStringField(xr, "kind")
		if metadata := getStructField(xr, "metadata"); metadata != nil {
			meta.CompositeResourceName = getStringField(metadata, "name")
			meta.CompositeResourceNamespace = getStringField(metadata, "namespace")
			meta.CompositeResourceUID = getStringField(metadata, "uid")
		}
	}

	return &meta, nil
}
```
🧩 Analysis chain

🏁 Scripts executed:

```shell
# Search for existing Secret kind detection patterns in the codebase
rg -nP --type=go -C3 '\bkind.*[Ss]ecret|[Ss]ecret.*kind' | head -100

# Also search for how resources are typically checked for their kind/type in Go
rg -nP --type=go -C2 'GetKind\(\)|\.Kind' | head -80

# Read the specific lines in the design document to understand the context
cat -n design/one-pager-pipeline-inspector.md | sed -n '200,220p'

# Read the sanitization section mentioned in the review
cat -n design/one-pager-pipeline-inspector.md | sed -n '360,380p'

# Read the BuildMetadata section completely to understand what it extracts
cat -n design/one-pager-pipeline-inspector.md | sed -n '150,230p'
```
Clarify the Secret detection mechanism in the sanitization section.
Good catch that BuildMetadata extracts the CompositeResourceKind at line 208! However, the sanitization logic at lines 362-363 and 371-372 is shown as placeholders (// ...) for detecting and clearing Secret data. Could you briefly explain how the sanitizer will identify which resources are Secrets (e.g., by checking kind == "Secret")? Also, clarifying what "drop data" means for Secrets would be helpful—does it mean clearing the entire data field in the Kubernetes Secret structure, or specific sub-fields?
🤖 Prompt for AI Agents
In `@design/one-pager-pipeline-inspector.md` around lines 204 - 217, The sanitizer
must detect Kubernetes Secrets by checking the extracted metadata: if
meta.CompositeResourceKind (from BuildMetadata) equals "Secret" (compare
case-insensitively and optionally verify apiVersion/group indicates core/v1),
then treat it as sensitive; implement the sanitizer branch to "drop data" by
clearing the Secret payload fields (set the Secret's data and stringData fields
to an empty map or nil) and avoid returning any secret values (you can also
remove binaryData if present), and record/annotate that data was stripped so
callers know content was redacted. Ensure the check references
meta.CompositeResourceKind and meta.CompositeResourceAPIVersion and the
sanitizer function that currently contains the Secret placeholders so the logic
replaces the // ... with these explicit checks and field clears.
the implementation will address that.
@phisco, understood! It makes sense to work out those implementation details when coding the actual sanitizer. The design document establishes the intent to redact Secret data, and the specifics of detection and field handling can be refined during development.
Thanks for clarifying! 👍
🧠 Learnings used
Learnt from: CR
Repo: crossplane/crossplane PR: 0
File: contributing/README.md:0-0
Timestamp: 2025-12-02T14:41:54.485Z
Learning: Review code for functional correctness, proper documentation and intent explanation, adherence to Coding Style guidelines, idiomatic alignment with existing Crossplane APIs, sufficient test coverage, DCO sign-off, and passing CI jobs
Learnt from: CR
Repo: crossplane/crossplane PR: 0
File: contributing/README.md:0-0
Timestamp: 2025-12-02T14:41:54.485Z
Learning: Discuss changes in a GitHub issue before starting implementation; new features can only be merged during the active development period of a Crossplane release cycle
> Additionally, if any composed resource (observed or desired) is a Secret, its `data` field must
> also be cleared before emission. The `context` field is passed through as-is, since functions
> already have access to it and it typically contains non-sensitive configuration data.
Reconsider the security boundary for the context field.
The current design passes the pipeline `context` field through as-is with the reasoning that "functions already have access to it and it typically contains non-sensitive configuration data." However, I'm concerned about this for a few reasons:

- Different trust boundaries: Functions run as part of the pipeline with certain security constraints, while the sidecar is a separate component that may have different access patterns or be implemented by third parties.
- User-defined data: Functions can write arbitrary data to the context, and we can't guarantee it's non-sensitive. For example, a custom function might cache intermediate computed values, tokens, or other data.
- "Typically" isn't "always": The phrase "typically contains non-sensitive configuration data" acknowledges there could be exceptions.
Could you clarify the decision here? Options might include:
- Define an allowlist of context keys that are safe to emit
- Add a configuration option to control context emission
- Document that implementations should sanitize context
- Provide guidance on what should/shouldn't be written to context
Past reviews flagged this concern; I want to make sure we've thought through the security implications. Thanks!
Based on learnings, security review is critical for Crossplane.
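One of the options listed above (an allowlist of context keys) could be sketched as follows; the key names in the example are only illustrative, and the redaction marker is an arbitrary choice:

```go
package main

import "fmt"

// redactContext keeps context keys the operator marked safe verbatim and
// replaces everything else with a redaction marker, so emitted payloads never
// carry unreviewed user-written context values.
func redactContext(pipelineCtx map[string]any, allowed map[string]bool) map[string]any {
	out := make(map[string]any, len(pipelineCtx))
	for k, v := range pipelineCtx {
		if allowed[k] {
			out[k] = v
		} else {
			out[k] = "[REDACTED]"
		}
	}
	return out
}

func main() {
	ctx := map[string]any{
		"example.org/environment":  map[string]any{"region": "eu-west-1"},
		"example.org/cached-token": "s3cr3t",
	}
	allowed := map[string]bool{"example.org/environment": true}
	fmt.Println(redactContext(ctx, allowed)["example.org/cached-token"])
}
```

A denylist or a config flag disabling context emission entirely would be structurally similar; the choice is policy, not mechanism.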
🤖 Prompt for AI Agents
In `@design/one-pager-pipeline-inspector.md` around lines 347 - 349, The design
currently emits the pipeline "context" field as-is which creates a security
risk; update the document to treat context as potentially sensitive by (1)
proposing an allowlist approach for safe context keys and an explicit
API/configuration flag to control context emission, (2) requiring that the
sidecar or pipeline implementation sanitize or redact context entries (esp.
user-written values) before emitting composed resources, and (3) adding clear
guidance for authors about what may never be written to context (secrets,
tokens, PII) and documenting default behavior and override mechanism; reference
the "context" field, composed resource handling for Secrets (clear `data`), and
the sidecar/pipeline emission path in the text so reviewers can find and apply
these changes.
That's a fair concern, but there is little we can do here unfortunately. I think that's where we should draw the boundary between what we need to solve on the Crossplane side and what downstream implementations should solve.
@phisco, that makes sense—I appreciate the clarification on where the boundary lies. Following the pattern of minimal upstream hooks with downstream implementations handling the specifics is architecturally consistent with other Crossplane designs like change logs.
Since the responsibility for sanitizing the context field falls to downstream implementations, would it make sense to add a note in the design document highlighting this? Something like:
> Note on Context Security: The pipeline `context` field is emitted as-is. Downstream sidecar implementations are responsible for sanitizing or filtering context entries based on their security requirements and trust boundaries. Platform teams should review what data functions write to the context and configure their inspector implementations accordingly.
This would help ensure downstream implementers are aware of the security considerations when building their sidecars. But if you feel the current documentation is sufficient, I'm good with the decision as-is. Thanks for thinking through the design boundary with me! 🙏
✏️ Learnings added
Learnt from: phisco
Repo: crossplane/crossplane PR: 7025
File: design/one-pager-pipeline-inspector.md:347-349
Timestamp: 2026-01-22T17:27:31.137Z
Learning: In the Pipeline Inspector design for Crossplane, the pipeline `context` field is intentionally emitted as-is to downstream implementations. The security boundary is drawn such that Crossplane provides the minimal hook mechanism, and downstream sidecar implementations are responsible for sanitizing or filtering the context based on their security requirements.
@haarchri suggested that it'd be useful to also know whether the response was cached. Unfortunately, at a quick look it doesn't look so easy: the context only flows down, not up, so we can't leverage that directly to pass the information up from the cached runner to the inspecting wrapping runner. So I'd prefer to proceed with this and then add that in the future.
```go
// ...

// Inject step metadata into context
stepCtx := step.ContextWithStepMeta(ctx, traceID, stepIndex, compositionName, 0)
```
The iteration index starts from zero; it's increased if unsatisfied requirements are received and another call is needed.
dropped from the signature, given that it's always going to be initialized to 0 there
```go
}

// Generate a unique span_id for this function invocation
meta.SpanID = uuid.NewString()
```
Worth using ulid or some other ordered UID for span and trace IDs?
They have no such guarantee in OTEL, and although I agree it would be nice, I don't want to give more guarantees than required. It should be completely opaque for now.
```protobuf
// Name of the Composition defining this pipeline.
string composition_name = 6;

// UID of the composite resource being reconciled.
string composite_resource_uid = 7;

// Name of the composite resource being reconciled.
string composite_resource_name = 8;

// Namespace of the composite resource (empty for cluster-scoped resources).
string composite_resource_namespace = 9;

// API version of the composite resource (e.g., "example.org/v1").
string composite_resource_api_version = 10;

// Kind of the composite resource (e.g., "XDatabase").
string composite_resource_kind = 11;
```
What if it's an operation function? Worth having distinct sub-messages for composition and operation?
```protobuf
// The original function request as JSON bytes (with credentials stripped for security).
// This allows consumers to parse the request without needing the proto schema.
bytes request = 1;
```
The tradeoff of JSON bytes is more space on the wire and potentially on disk, right? Is expecting consumers to have the protobuf schema unreasonable?
I don't think they'd need the same schema as Crossplane. Though to be fair if they didn't they might miss fields that were actually in the request (which JSON wouldn't).
This is going to the sidecar, so the wire size is less of an issue. Then if the sidecar wants to convert it back to a more storage/network optimized encoding for its own purposes, it can. I think it makes sense to use plain json here and have as little assumptions as possible for sidecars.
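The "plain JSON, minimal assumptions for sidecars" point can be illustrated with stdlib JSON. In the real design the runner would presumably produce the bytes with protojson from the proto request; here a plain map stands in so the example stays dependency-free, and the sidecar-side decoding is the part that matters:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// emitAsJSON stands in for marshaling the (sanitized) request; a map replaces
// the real proto message for illustration.
func emitAsJSON(req map[string]any) ([]byte, error) {
	return json.Marshal(req)
}

// sidecarDecode shows that a consumer can parse the payload generically,
// without the proto schema; fields it doesn't know about survive as map
// entries rather than being silently dropped.
func sidecarDecode(payload []byte) (map[string]any, error) {
	var out map[string]any
	err := json.Unmarshal(payload, &out)
	return out, err
}

func main() {
	payload, _ := emitAsJSON(map[string]any{
		"meta": map[string]any{"tag": "v1"},
		"observed": map[string]any{
			"composite": map[string]any{"resource": map[string]any{"kind": "XDatabase"}},
		},
	})
	decoded, _ := sidecarDecode(payload)
	fmt.Println(decoded["meta"].(map[string]any)["tag"]) // v1
}
```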
> ### gRPC Message Size Limits
>
> By default, each `EmitRequest` and `EmitResponse` call stays within the 4MB gRPC limit. However, users who have increased the `--max-recv-msg-size` on their functions may have larger payloads.
Does it? It's more likely to be within the limit, but I imagine a request very close to the limit + the step meta could push it over.
Description of your changes

This PR adds the Pipeline Inspector design document, which proposes a mechanism to capture `RunFunctionRequest` and `RunFunctionResponse` data for debugging and observability of function pipeline executions.

Fixes #

I have:

- Run `earthly +reviewable` to ensure this PR is ready for review. (design doc only)
- Added or updated unit tests. (design doc only)
- Added or updated e2e tests. (design doc only)
- Linked a PR or a docs tracking issue to document this change. (design doc only)
- Added `backport release-x.y` labels to auto-backport this PR. (design doc only)
- Followed the API promotion workflow if this PR introduces, removes, or promotes an API. (design doc only)

Need help with this checklist? See the cheat sheet.