Skip to content

fix(field-mask): Allow FieldMasking on oneof variants#43924

Merged
julianwiedmann merged 2 commits intocilium:mainfrom
mereta:fix/fieldmask
Jan 23, 2026
Merged

fix(field-mask): Allow FieldMasking on oneof variants#43924
julianwiedmann merged 2 commits intocilium:mainfrom
mereta:fix/fieldmask

Conversation

@mereta
Copy link
Copy Markdown
Contributor

@mereta mereta commented Jan 22, 2026

Problem

While testing field aggregation with source.namespace, destination.namespace, and L4/L7 protocol fields, I discovered that flows were not aggregating correctly. When a field mask included multiple oneof variants (e.g., l4.TCP.destination_port and l4.UDP.destination_port), the field mask Copy() method would create spurious empty structures for ALL specified variants, not just the active one.

This caused TCP flows to incorrectly include empty UDP structures in their aggregation keys, preventing proper aggregation even when the actual flow data was identical.

Solution

1st commit:
Modified FieldMask.Copy() to detect oneof fields using protobuf reflection (fd.ContainingOneof()) and only copy the active variant by checking WhichOneof(). This ensures that only the actual protocol present in the source flow is copied to the destination, preventing spurious structures.
This enables users to specify fine-grained oneof variant fields (e.g., l4.TCP.destination_port, l4.UDP.destination_port) in field masks for flow aggregation and export without creating incorrect aggregation keys.

Addition in 2nd commit:
Enrich Aggregated hubble flow logs with timestamp to have temporal context.
Aggregation only keeps and aggregates on the fields specified in the fieldAggregate.
There is no way to preserve a timestamp, and specifying time in fieldAggretate defeats aggregation.

The solution is to preserve the 1st occurring timestamp in the processedFlow after the aggregation key is generated.
The aggregation logic is not affected and temporal context is preserved.

Testing

  • Added unit tests in fieldmask_test.go demonstrating the fix works for L4 (TCP/UDP/SCTP/ICMP) and L7 (HTTP/DNS) oneof variants
  • Added integration test in aggregator_test.go showing TCP+HTTP flows now correctly aggregate when the field mask includes both TCP/UDP and HTTP/DNS paths
  • All existing tests continue to pass

Manual Testing Steps:

Manual Verrification:
BYO CNI Cilium Cluster

  • helm upgrade cilium cilium/cilium --version 1.19-dev
  • Build local images and update cluster
  • Deploy a workload that produces alot of pings to itself
  • Test aggreagtion via various combinations or below.
   --set "hubble.export.dynamic.config.content[1].aggregationInterval=25s" \
   --set "hubble.export.dynamic.config.content[1].fieldMask={time,source.namespace,source.pod_name,destination.namespace,destination.pod_name,l4,IP,node_name,is_reply,verdict,drop_reason_desc}" \
   --set "hubble.export.dynamic.config.content[1].fieldAggregate={l4.TCP.destination_port,l4.UDP.destination_port,l7.http.code,l7.dns.rcode,source.workloads,destination.workloads}" \
   --set "hubble.export.dynamic.config.content[1].name=fmfix" \
   --set "hubble.export.dynamic.config.content[1].filePath=/var/run/cilium/hubble/events-fmfix.log" \
   --set "hubble.export.dynamic.config.content[1].includeFilters[0].source_pod[0]=default/" \
   --set "hubble.export.dynamic.config.content[1].includeFilters[1].destination_pod[0]=default/"
  • Verification of cilium-flowlog-config ConfigMap
  • Verification of logs in multiple files via node-shell
    • var/run/cilium/hubble
    • check that file names match CM
    • check for aggregation counts
    • check for config reloads
    • check setting to 0s disables aggregation

Both UDP and TCP are logged:
image

{"flow":{"time":"2026-01-22T16:47:11.987497620Z","l4":{"TCP":{"destination_port":40876}},"source":{"workloads":[{"name":"kapinger-bad","kind":"Deployment"}]},"destination":{"workloads":[{"name":"kapinger-good","kind":"Deployment"}]},"l7":{},"aggregate":{"egress_flow_count":3}},"time":"2026-01-22T16:47:11.987497620Z"}
{"flow":{"time":"2026-01-22T16:47:14.677500407Z","l4":{"UDP":{"destination_port":42749}},"source":{"workloads":[{"name":"coredns","kind":"Deployment"}]},"destination":{"workloads":[{"name":"kapinger-good","kind":"Deployment"}]},"l7":{},"aggregate":{"egress_flow_count":1}},"time":"2026-01-22T16:47:14.677500407Z"}

FieldMask Changes:

New ability to mask nested oneof fields.
No other changes in FM functionality.

Hubble Export FieldMask - Introduce functionality to specify multiple 'oneof' variants like l4.TCP/l4.UDP
Hubble Export Aggregation - Enrich aggregated flow logs with timestamp to preserve temporal context

@mereta mereta requested a review from a team as a code owner January 22, 2026 11:57
@mereta mereta requested a review from devodev January 22, 2026 11:57
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jan 22, 2026
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Jan 22, 2026
@mereta mereta force-pushed the fix/fieldmask branch 2 times, most recently from f615425 to 6ffc875 Compare January 22, 2026 12:18
@maintainer-s-little-helper
Copy link
Copy Markdown

Commit ae47b24 does not match "(?m)^Signed-off-by:".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Jan 22, 2026
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Jan 22, 2026
@mereta
Copy link
Copy Markdown
Contributor Author

mereta commented Jan 22, 2026

@devodev Hey! Could you have a look at this? I ran in to an awkward scenario while testing! :)

@mereta mereta changed the title Fix/fieldmask fix(field-mask): Allow FieldMasking on oneof variants Jan 22, 2026
@mereta mereta force-pushed the fix/fieldmask branch 3 times, most recently from 6027796 to 3eef569 Compare January 22, 2026 13:12
@devodev devodev self-assigned this Jan 22, 2026
@devodev devodev added release-note/bug This PR fixes an issue in a previous release of Cilium. area/hubble Impacts hubble server or relay and removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Jan 22, 2026
@devodev devodev added this to the 1.20 milestone Jan 22, 2026
@devodev devodev added needs-backport/1.19 This PR / issue needs backporting to the v1.19 branch kind/bug This is a bug in the Cilium logic. labels Jan 22, 2026
Copy link
Copy Markdown
Contributor

@devodev devodev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks great, and awesome catch BTW!
Especially if we can make it before the v1.19.0 release 💪

I left some comments but nothing major.
One thing I may ask is to update your commit messages to describe the issue at hand and the actual fix.

The PR description already explains very well the issue regarding fieldmask. As for the timestamp, I would explain why timestamping at Add time vs Export time matters and also why the field should not be part of the aggregationKey (essentially that we always want to have the time field in aggregated events).

Lastly we should update the release-notes to be brief but descriptive of what we fixed and why instead of the actual change made. What I mean is its not clear to me as a lambda user of Hubble export what "Allow FieldMasking on oneof variants" means. I would also try and include context such as "Hubble export aggregation" so its easier to grok than simply "aggregation" or "aggregated flows".

Thank you again for the contribution :)

@mereta mereta force-pushed the fix/fieldmask branch 3 times, most recently from 7dee7af to 219151f Compare January 22, 2026 16:22
@mereta mereta requested a review from devodev January 22, 2026 16:49
@anubhabMajumdar
Copy link
Copy Markdown
Contributor

/test

@devodev
Copy link
Copy Markdown
Contributor

devodev commented Jan 22, 2026

@mereta I'm not sure the commit messages are descriptive enough. If I don't have the PR in front of me, I would need to analyze the added tests to figure out what was the issue and how it was fixed exactly. Could you explain in the body of the commit what was the issue observed and how it was fixed (the actual change to the fieldmask copy logic).

Similarly, I'd like to see more meat for "aggregation: Enrich aggregated flows with timestamp for temporal ctx".

Also make sure the title provides enough context. I would prefix both with hubble.

@mereta
Copy link
Copy Markdown
Contributor Author

mereta commented Jan 23, 2026

Also make sure the title provides enough context. I would prefix both with hubble.

Oh sorry, yes i can add that I misunderstood. I'll replace the fix() & chore() with 'hubble' as the commit message is limited to 75 chars. :)

….TCP)

When multiple oneof variants are specified in a field mask (e.g., both
TCP and UDP), the previous implementation would copy all variants,
creating spurious structures for inactive ones. This caused identical
flows to generate different aggregation keys.

Added oneof-aware copying using WhichOneof() to only copy the active
variant, preventing spurious structures and ensuring consistent keys.

Signed-off-by: mereta <mereta.degutyte@hotmail.co.uk>
…ontext

Aggregation only keeps and aggregates on the fields specified in the fieldAggregate.
There is no way to preserve a timestamp, and specifying time in fieldAggregate defeats aggregation.
The solution is to preserve the 1st occurring timestamp in the processedFlow after the aggregation key is generated.
The aggregation logic is not affected and temporal context is preserved.

Signed-off-by: mereta <mereta.degutyte@hotmail.co.uk>
@devodev
Copy link
Copy Markdown
Contributor

devodev commented Jan 23, 2026

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jan 23, 2026
@julianwiedmann julianwiedmann added this pull request to the merge queue Jan 23, 2026
Merged via the queue into cilium:main with commit 4549ee8 Jan 23, 2026
76 checks passed
@mhofstetter mhofstetter mentioned this pull request Jan 26, 2026
9 tasks
@mhofstetter mhofstetter added backport-pending/1.19 The backport for Cilium 1.19.x for this PR is in progress. and removed needs-backport/1.19 This PR / issue needs backporting to the v1.19 branch labels Jan 26, 2026
@github-actions github-actions bot added backport-done/1.19 The backport for Cilium 1.19.x for this PR is done. and removed backport-pending/1.19 The backport for Cilium 1.19.x for this PR is in progress. labels Jan 26, 2026
@cilium-release-bot cilium-release-bot bot moved this to Released in cilium v1.19.0 Feb 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/hubble Impacts hubble server or relay backport-done/1.19 The backport for Cilium 1.19.x for this PR is done. kind/bug This is a bug in the Cilium logic. kind/community-contribution This was a contribution made by a community member. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.

Projects

No open projects
Status: Released

Development

Successfully merging this pull request may close these issues.

5 participants