Skip to content

fix(helm): Enable log_ingestor by default (fixes #1912); gate log_ingestor templates to S3 logs_input only.#1913

Merged
junhaoliao merged 6 commits into
y-scope:mainfrom
junhaoliao:helm-enable-log-ingestor
Jan 29, 2026
Merged

fix(helm): Enable log_ingestor by default (fixes #1912); gate log_ingestor templates to S3 logs_input only.#1913
junhaoliao merged 6 commits into
y-scope:mainfrom
junhaoliao:helm-enable-log-ingestor

Conversation

@junhaoliao

@junhaoliao junhaoliao commented Jan 26, 2026

Copy link
Copy Markdown
Member

Description

Note

This PR is part of the ongoing work for #1309. More PRs will be submitted to complete the Helm chart.

Enables default clpConfig.log_ingestor values in Helm values.yaml so the chart aligns with the
default CLP configs and avoids the API crash loop when log_ingestor is null. Log-ingestor Helm
resources now render only when logs_input.type is s3.

This PR updates:

  • Default clpConfig.log_ingestor values in tools/deployment/package-helm/values.yaml.
  • Log-ingestor Deployment/Service/PV/PVC templates to require logs_input.type == "s3".

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Test with fs input

cd tools/deployment/package-helm
./set-up-test.sh

Observed:

...
# No log-ingestor pods were started
...
All jobs completed and services are ready.

Test with S3 input

Updated tools/deployment/package-helm/values.yaml to set:

logs_input:
  type: "s3"
  aws_authentication:
    type: "credentials"
    credentials:
      access_key_id: "xxx"
      secret_access_key: "xxxx"

Ran:

cd tools/deployment/package-helm
./set-up-test.sh

Observed:

...
test-clp-log-ingestor-774f89bd89-dfmw4           1/1     Running           0          63s
...
All jobs completed and services are ready.

Ran:

curl --location 'localhost:30302/s3_scanner' \
--header 'Content-Type: application/json' \
--data '{"bucket_name": "xxxxx", "key_prefix": "/", "region": "us-east-2"}'

Observed log-ingestor logs:

[2026-01-26  17:00:54]	{"timestamp":"2026-01-26T22:00:54.489283Z","level":"INFO","fields":{"message":"Server started at 0.0.0.0:3002"},"filename":"components/log-ingestor/src/bin/log_ingestor.rs","line_number":103}
[2026-01-26  17:10:42]	{"timestamp":"2026-01-26T22:10:42.156910Z","level":"INFO","fields":{"message":"Create S3 scanner ingestion job.","config":"S3ScannerConfig { base: BaseConfig { bucket_name: NonEmptyString(\"xxxxx\"), key_prefix: NonEmptyString(\"/\"), region: Some(NonEmptyString(\"us-east-2\")), endpoint_url: None, dataset: None, timestamp_key: None, unstructured: false }, scanning_interval_sec: 30, start_after: None }"},"filename":"components/log-ingestor/src/routes.rs","line_number":167}
# redacted some seemingly unrelated AWS SDK warnings
[2026-01-26  17:10:45]	{"timestamp":"2026-01-26T22:10:45.165321Z","level":"INFO","fields":{"message":"Created S3 scanner ingestion job.","job_id":"9169b873-e6e4-4400-8017-735db7c61fb8"},"filename":"components/log-ingestor/src/routes.rs","line_number":175}
[2026-01-26  17:10:45]	{"timestamp":"2026-01-26T22:10:45.421531Z","level":"INFO","fields":{"message":"Scanned new object metadata on S3.","object":"ObjectMetadata { bucket: NonEmptyString(\"xxxxx\"), key: NonEmptyString(\"/0316c8cf-0a37-4fb6-a0ef-29ef14258498_0_30.jsonl\"), size: 8164 }"},"filename":"components/log-ingestor/src/ingestion_job/s3_scanner.rs","line_number":121}
...

Summary by CodeRabbit

  • Chores

    • Updated Helm chart version to 0.1.3-dev.2.
  • Bug Fixes

    • Log ingestor now only deploys when S3 is explicitly selected, preventing unintended deployments.
    • Tightened validation across log ingestor components to require proper configuration before rendering resources.
    • Added explicit default configuration values for log ingestor settings (port, buffers, channels, logging level).

✏️ Tip: You can customize this high-level summary in your review settings.

…e `log_ingestor` templates to S3 `logs_input` only.
@junhaoliao junhaoliao requested a review from a team as a code owner January 26, 2026 22:57
@coderabbitai

coderabbitai Bot commented Jan 26, 2026

Copy link
Copy Markdown
Contributor

Walkthrough

This pull request increments the Helm chart version, tightens rendering conditions for log-ingestor resources to require both clpConfig.log_ingestor and clpConfig.logs_input.type == "s3", and replaces the null log_ingestor values with a populated settings block.

Changes

Cohort / File(s) Summary
Version bump
tools/deployment/package-helm/Chart.yaml
Helm chart version updated from 0.1.3-dev.1 to 0.1.3-dev.2.
Template condition tightening
tools/deployment/package-helm/templates/log-ingestor-deployment.yaml, .../log-ingestor-logs-pv.yaml, .../log-ingestor-logs-pvc.yaml, .../log-ingestor-service.yaml
Each template's top-level conditional changed from a single log_ingestor/enable check to a conjunction requiring clpConfig.log_ingestor (truthy) AND clpConfig.logs_input.type == "s3". Review rendering guards and any downstream references to logs_input.
Configuration update
tools/deployment/package-helm/values.yaml
clpConfig.logs_input (previously null/commented) replaced with an explicit log_ingestor/logs_input block: port: 30302, buffer_flush_timeout: 300, buffer_flush_threshold: 268435456, channel_capacity: 10, logging_level: "INFO". Verify defaults align with templates.

Sequence Diagram(s)

mermaid
sequenceDiagram
participant Dev as Developer (values.yaml)
participant Helm as Helm template renderer
participant Templates as log-ingestor templates
participant K8s as Kubernetes API
participant S3cfg as clpConfig.logs_input (type)

Dev->>Helm: supply values (including clpConfig.logs_input.type)
Helm->>Templates: evaluate condition (clpConfig.log_ingestor AND logs_input.type == "s3")
alt condition true
    Templates->>K8s: render and apply Deployment, Service, PV, PVC
else condition false
    Templates-->>Helm: skip rendering log-ingestor resources
end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and accurately summarizes the main changes: enabling log_ingestor by default and gating templates to S3 input only, matching the core modifications in values.yaml and template files.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@tools/deployment/package-helm/values.yaml`:
- Line 166: The inline comment on the buffer_flush_threshold setting currently
has only one space before the comment; update the line defining
buffer_flush_threshold so there are exactly two spaces before the inline comment
(e.g., ensure "buffer_flush_threshold: 268435456  # 256 MiB") to satisfy
linting; edit the value line in values.yaml where buffer_flush_threshold is
declared.

Comment thread tools/deployment/package-helm/values.yaml Outdated

@hoophalab hoophalab left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. One comment

Validation: log ingestor starts correctly

Comment on lines +160 to +170
log_ingestor:
port: 30302
# The timeout (in seconds) after which the log buffer is flushed for compression if no new input
# arrives.
buffer_flush_timeout: 300
# The log buffer size (in bytes) that triggers a flush for compression.
buffer_flush_threshold: 268435456 # 256 MiB
# The capacity of the internal channel used for communication between an ingestion job and the
# log buffer.
channel_capacity: 10
logging_level: "INFO"

@hoophalab hoophalab Jan 28, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting up log_ingestor and setting log_inputs as s3 might cause confusion. Maybe we should still keep those values as commented?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we want to change the default config of the log ingestor in the Helm chart, it might be better to change the config in the package (docker compose flow) as well.

i think we can add some log message in the Helm chart with some printf template function. do you think it will help with the confusion that a user has configured the log ingestor but didn't end up seeing any log ingestor deployment?

either way, i think we can merge this PR first and add more such log messages in a future PR. @coderabbitai please create a feature request issue about enhancing the logging in the Helm chart with printf and fail template functions

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

@junhaoliao, I'll create a feature request issue for enhancing the Helm chart logging.


🏁 Script executed:

#!/bin/bash
# Create a GitHub issue for Helm chart logging enhancement
gh issue create \
  --repo y-scope/clp \
  --title "Enhance Helm chart with template logging using printf and fail functions" \
  --body "## Description

Add template logging and validation messages to the Helm chart using \`printf\` and \`fail\` template functions to provide better user feedback and prevent configuration confusion.

## Context

Currently, when a user configures \`log_ingestor\` in the Helm chart but sets \`logs_input.type\` to something other than \"s3\", no log-ingestor deployment is created. This can cause confusion as users may expect to see the log-ingestor pods running.

## Proposed Enhancement

Add informative messages using Helm template functions:
- Use \`printf\` to log warnings or informational messages when certain configurations are set but their corresponding resources won't be deployed
- Use \`fail\` to validate incompatible configurations and provide clear error messages

Example scenarios:
- Warn when \`log_ingestor\` is configured but \`logs_input.type\` is not \"s3\"
- Validate other configuration dependencies and provide helpful error messages

## Related Discussion

- PR: https://github.com/y-scope/clp/pull/1913
- Comment: https://github.com/y-scope/clp/pull/1913#discussion_r2739010265
- Requested by: `@junhaoliao`

## Benefits

- Improved user experience with clearer feedback
- Reduced confusion about why certain resources aren't deployed
- Better configuration validation at deployment time" \
  --assignee junhaoliao

Length of output: 42


@junhaoliao, I've created issue #1941 to track the enhancement of Helm chart logging with printf and fail template functions. The issue includes the context from this discussion and has been assigned to you.


🧠 Learnings used
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1856
File: tools/deployment/package-helm/values.yaml:94-94
Timestamp: 2026-01-26T22:19:16.092Z
Learning: In the Helm chart's `values.yaml`, the default configuration has `log_ingestor: null`, which causes the v0.8.0 chart to deploy an API server that continuously crashes and restarts due to Rust parsing issues.

Learnt from: junhaoliao
Repo: y-scope/clp PR: 1818
File: tools/deployment/package-helm/templates/configmap.yaml:12-12
Timestamp: 2025-12-19T05:03:42.629Z
Learning: In the y-scope/clp Helm chart, the `api_server.host` configuration field in the ConfigMap (tools/deployment/package-helm/templates/configmap.yaml) is unused because no other k8s internal services need to reach the API server—it's only accessed from outside the cluster via NodePort.

@junhaoliao junhaoliao merged commit e8aa09c into y-scope:main Jan 29, 2026
21 checks passed
@junhaoliao junhaoliao deleted the helm-enable-log-ingestor branch May 7, 2026 19:46
junhaoliao added a commit to junhaoliao/clp that referenced this pull request May 17, 2026
…e `log_ingestor` templates to S3 `logs_input` only. (y-scope#1913)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants