Capture partial frames for syscalls which exceed bpf LOOP/CHUNK limits #1755

@benkilimnik

Problem Description
Pixie is unable to capture syscalls with an iovcnt > 42 (LOOP_LIMIT) or a message size > 120 KiB (CHUNK_LIMIT).

These limits are set conservatively to keep the instruction count below BPF's limit for version 4 kernels (4096 instructions per probe). They result, however, in data loss and incomplete syscall tracing. For example, in a community-shared NodeJS application transferring just 10 kB of data, the iovec array contained 257 entries, well beyond the current LOOP_LIMIT of 42. We've also seen the message size limit (CHUNK_LIMIT) exceeded in k8ssandra deployments. This is very likely an issue across all protocols.
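To make the scale of the loss concrete, here is a minimal sketch that models a bounded walk over an iovec array. It is not Pixie's BPF code: `SimulateCapture`, `CaptureResult`, and the constant names are hypothetical, and the 40-byte entry size used in the usage note is an assumption chosen so that 257 entries add up to roughly 10 kB.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical model of the two limits described above (not Pixie's actual BPF code).
constexpr std::size_t kLoopLimit = 42;                // max iovec entries visited per syscall
constexpr std::size_t kMaxCaptureBytes = 120 * 1024;  // max bytes captured per syscall

struct CaptureResult {
  std::size_t bytes_captured = 0;
  std::size_t bytes_missed = 0;
};

// Walks the iovec lengths as a bounded loop would, stopping at whichever limit hits first.
inline CaptureResult SimulateCapture(const std::vector<std::size_t>& iov_lens) {
  CaptureResult result;
  std::size_t total = 0;
  for (std::size_t len : iov_lens) total += len;

  for (std::size_t i = 0; i < iov_lens.size() && i < kLoopLimit; ++i) {
    const std::size_t room = kMaxCaptureBytes - result.bytes_captured;
    if (room == 0) break;  // byte limit reached before the loop limit
    result.bytes_captured += std::min(iov_lens[i], room);
  }
  result.bytes_missed = total - result.bytes_captured;
  return result;
}
```

Under these assumptions, `SimulateCapture(std::vector<std::size_t>(257, 40))` captures only the first 42 entries (1680 bytes) and misses the remaining 8600 bytes, even though the whole message is far below 120 KiB.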

Proposed Solutions

  1. Dynamically increase the loop limit on newer kernels, which have higher instruction limits (1 million for kernels > 5.1). This could mitigate the issue, though it would likely persist for very large messages/iovecs. This approach could be combined with option 2 to increase both the amount of data and the number of frames Pixie can trace.
  • Each PEM now tracks its kernel version due to changes introduced in #1685. We could pass this version in as a compile time flag / preprocessor directive.
  • Note that even if bpf were able to trace everything for large messages, Pixie would still truncate the data (e.g. based on FLAGS_max_body_bytes for HTTP). However, capturing complete metadata could still be invaluable, conveying headers, response codes, and other important information.
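As a rough sketch of how option 1 might look on the userspace side, the snippet below picks a loop limit from the kernel version and renders it as a preprocessor define. All names here (`KernelVersion`, `LoopLimitFor`, `LoopLimitDefine`) are hypothetical, and the extended value of 1024 is purely illustrative: the real value would have to be tuned against the verifier. Only `LOOP_LIMIT`, the value 42, and the "> 5.1" threshold come from the description above.

```cpp
#include <cstdint>
#include <string>

// Hypothetical representation of the kernel version a PEM already tracks.
struct KernelVersion {
  uint32_t major;
  uint32_t minor;
};

constexpr uint32_t kLegacyLoopLimit = 42;      // current conservative limit
constexpr uint32_t kExtendedLoopLimit = 1024;  // illustrative only; must be tuned empirically

// Kernels newer than 5.1 allow about 1 million verified instructions instead of 4096.
constexpr bool HasRaisedInsnLimit(KernelVersion v) {
  return v.major > 5 || (v.major == 5 && v.minor > 1);
}

constexpr uint32_t LoopLimitFor(KernelVersion v) {
  return HasRaisedInsnLimit(v) ? kExtendedLoopLimit : kLegacyLoopLimit;
}

// Renders the limit as a compile-time flag for the BPF program.
inline std::string LoopLimitDefine(KernelVersion v) {
  return "-DLOOP_LIMIT=" + std::to_string(LoopLimitFor(v));
}
```

With this shape, a 4.14 or 5.1 kernel keeps today's behaviour (`-DLOOP_LIMIT=42`), while anything from 5.2 onward gets the larger bound.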

  2. For each event where data loss occurs due to LOOP/CHUNK limits, pass metadata to the event parser, which then attempts to process a partial frame. For this to work, protocol parsers must be modified to work lazily, parsing as far as possible and returning a new parse state, kPartialFrame, once they've processed enough bytes to capture essential metadata.

[Figure: parse_call_stack]

  • After our LOOP/CHUNK limit is reached, the event parser will eventually receive a contiguous head of the data stream buffer that ends with a gap representing the bytes we missed. Note that there could be any number of valid frames before the gap because Pixie's sampling frequency is greater than its push frequency (sampling is used loosely here, as Pixie receives every event and not a subset of them). Moreover, the application itself could be batching messages such that an incomplete chunk could contain a number of valid frames before the gap.

[Figure: contiguous_head_with_incomplete_chunk]

  • In BPF, we can determine the full message size and keep track of how many bytes were missed if the LOOP/CHUNK limit is reached. We can pass this information through the event to the datastream buffer, so that the event parser knows when to expect an incomplete chunk. A DCHECK would enforce that for a given call to ParseFramesLoop with a contiguous head, a partial frame is pushed at most once since we expect to only reach the gap once.

  • To avoid potential side effects from using the kPartialFrame state (i.e. to prevent it from masking other errors), we could use a heuristic to determine whether the partial frame was caused by a lack of bytes. We could store, in the metadata of a frame, the maximum size of the fields we could possibly parse. If this is greater than the number of bytes remaining, then we hit our gap, so it makes sense to push a partial frame. If, however, we have sufficient bytes remaining to parse these fields, then a different error likely occurred and we don't want to push the partial frame.
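The gap metadata, the heuristic, and the at-most-once check described in these bullets could fit together roughly as follows. This is a sketch under stated assumptions, not Stirling's parser code: `ParseState` here is a simplified stand-in, `GapInfo`, `HeadParseContext`, and `ResolveParseState` are hypothetical names, and a plain `assert` stands in for the DCHECK.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-in for the parser's state enum, plus the proposed kPartialFrame.
enum class ParseState { kInvalid, kNeedsMoreData, kSuccess, kPartialFrame };

// Hypothetical metadata passed from BPF with the event.
struct GapInfo {
  std::size_t bytes_missed = 0;  // 0 means no LOOP/CHUNK limit was hit
};

// Hypothetical state for one pass over a contiguous head.
struct HeadParseContext {
  GapInfo gap;
  bool partial_frame_pushed = false;
};

// Decides whether an unsuccessful parse should be reported as a partial frame.
// max_field_bytes: largest number of bytes the frame's essential fields could occupy.
// bytes_remaining: bytes left in the contiguous head before the gap.
inline ParseState ResolveParseState(ParseState original, std::size_t max_field_bytes,
                                    std::size_t bytes_remaining, HeadParseContext* ctx) {
  if (original == ParseState::kSuccess) return original;
  if (ctx->gap.bytes_missed == 0) return original;  // no gap expected, keep the real error

  if (max_field_bytes > bytes_remaining) {
    // We ran into the gap. A contiguous head should reach its gap only once.
    assert(!ctx->partial_frame_pushed);
    ctx->partial_frame_pushed = true;
    return ParseState::kPartialFrame;
  }
  // Enough bytes were available, so something else went wrong; do not mask it.
  return original;
}
```

The point of returning `original` in the last branch is that kPartialFrame is only ever used when the byte count itself explains the failure, so genuine parse errors stay visible.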


  3. An alternative is to use tail calls to start a new BPF program where the previous one left off. This might be an invasive solution with some performance trade-offs (the upper nesting limit is 33 calls).
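For a sense of how far tail calls could stretch the loop limit, the arithmetic can be sketched as below. This assumes each program in the chain handles up to LOOP_LIMIT iovec entries and that the initial program is counted separately from the 33 tail calls; both are assumptions, as are the names `TailCallsNeeded` and `MaxEntriesWithTailCalls`. The byte limit is ignored here.

```cpp
#include <cstddef>
#include <optional>

constexpr std::size_t kLoopLimit = 42;     // iovec entries handled per program (assumed)
constexpr std::size_t kMaxTailCalls = 33;  // nesting limit quoted above

// Upper bound on entries reachable by the initial program plus a full tail-call chain.
constexpr std::size_t MaxEntriesWithTailCalls() {
  return kLoopLimit * (1 + kMaxTailCalls);
}

// Number of tail calls needed to cover iovcnt entries, or nullopt if the chain is too long.
inline std::optional<std::size_t> TailCallsNeeded(std::size_t iovcnt) {
  if (iovcnt == 0) return std::size_t{0};
  const std::size_t programs = (iovcnt + kLoopLimit - 1) / kLoopLimit;  // ceiling division
  const std::size_t tail_calls = programs - 1;  // the first program is not a tail call
  if (tail_calls > kMaxTailCalls) return std::nullopt;
  return tail_calls;
}
```

Under these assumptions, the 257-entry NodeJS case needs 7 programs (6 tail calls), and the chain tops out at 1428 entries, so very large iovec arrays would still be truncated.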

Labels: area/datacollector (Issues related to Stirling (datacollector)), kind/feature (New feature or request)