feat(clp-rust-utils): Define all CLP package config required by log-ingestor; Define base config for log-ingestor.#1729
Conversation
WalkthroughAdds log ingestion and archive output configuration types, extends Config with those fields, enables serde deserialization for AWS auth structs, and introduces a CompressionJobStatus enum plus job_config module re-exports. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20–25 minutes
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (12)
components/clp-rust-utils/src/clp_config/package/config.rs(4 hunks)components/clp-rust-utils/src/clp_config/s3_config.rs(2 hunks)components/clp-rust-utils/src/job_config.rs(1 hunks)components/clp-rust-utils/src/job_config/compression.rs(1 hunks)components/clp-rust-utils/src/job_config/ingestion.rs(1 hunks)components/log-ingestor/Cargo.toml(1 hunks)components/log-ingestor/src/ingestion_job.rs(1 hunks)components/log-ingestor/src/ingestion_job/s3_scanner.rs(4 hunks)components/log-ingestor/src/ingestion_job/sqs_listener.rs(4 hunks)components/log-ingestor/src/ingestion_job_manager.rs(1 hunks)components/log-ingestor/src/lib.rs(1 hunks)components/log-ingestor/tests/test_ingestion_job.rs(3 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-01T07:59:11.208Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
📚 Learning: 2025-11-03T16:17:40.223Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.
Applied to files:
components/clp-rust-utils/src/clp_config/package/config.rs
📚 Learning: 2025-10-22T21:02:31.113Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).
Applied to files:
components/log-ingestor/Cargo.toml
🧬 Code graph analysis (2)
components/log-ingestor/src/ingestion_job.rs (2)
components/log-ingestor/src/ingestion_job/s3_scanner.rs (2)
shutdown_and_join(184-187)get_id(193-195)components/log-ingestor/src/ingestion_job/sqs_listener.rs (2)
shutdown_and_join(219-222)get_id(228-230)
components/log-ingestor/src/ingestion_job_manager.rs (3)
components/log-ingestor/src/ingestion_job.rs (2)
from(47-49)from(53-55)components/log-ingestor/src/ingestion_job/s3_scanner.rs (1)
spawn(152-171)components/log-ingestor/src/ingestion_job/sqs_listener.rs (1)
spawn(187-206)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: package-image
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
- GitHub Check: build (macos-15)
- GitHub Check: build (ubuntu-24.04)
- GitHub Check: rust-checks
🔇 Additional comments (32)
components/log-ingestor/src/lib.rs (1)
4-4: LGTM!The new module exposure properly expands the public API surface for ingestion job management.
components/clp-rust-utils/src/job_config.rs (1)
4-10: LGTM!The new module organization and re-exports follow the established pattern and properly expand the job_config API surface for compression and ingestion types.
components/log-ingestor/src/ingestion_job.rs (1)
8-56: LGTM!The
IngestionJobenum provides a clean abstraction overS3ScannerandSqsListenerwith proper delegation, ergonomicFromconversions, and thorough documentation. The implementation correctly forwards lifecycle methods to the underlying variants.components/log-ingestor/src/ingestion_job/sqs_listener.rs (2)
14-22: LGTM!Consolidating the S3-related configuration into
S3IngestionBaseConfigreduces duplication betweenSqsListenerConfigandS3ScannerConfigwhile maintaining clear separation of concerns.
144-163: LGTM!All references to bucket name and key prefix have been correctly updated to access the fields via
base.bucket_nameandbase.key_prefix.components/log-ingestor/src/ingestion_job/s3_scanner.rs (2)
12-18: LGTM!Consistent with the refactoring in
SqsListenerConfig, this consolidation reduces configuration duplication.
92-120: LGTM!All references have been correctly updated to use
base.bucket_nameandbase.key_prefix.components/log-ingestor/Cargo.toml (1)
22-22: Thethiserror = "2.0.17"version is valid and current.Version 2.0.17 is the latest stable release as of December 2025. Verify that
Cargo.lockremains in sync withCargo.tomlusing a non-mutating check (e.g.,cargo check --locked) to maintain deterministic builds, rather than allowing dependency updates during validation.components/log-ingestor/tests/test_ingestion_job.rs (3)
7-7: LGTM!Import correctly updated to include
S3IngestionBaseConfigfrom thejob_configmodule.
166-185: LGTM!The
SqsListenerConfigcorrectly uses the new consolidatedS3IngestionBaseConfigstructure. All required fields are properly initialized, and the optional fields are appropriately set toNoneor default values for testing purposes.
252-264: LGTM!The
S3ScannerConfigcorrectly uses the consolidatedS3IngestionBaseConfigstructure, consistent with theSqsListenerConfigtest above.components/clp-rust-utils/src/clp_config/s3_config.rs (6)
1-2: LGTM!Good use of the
secrecycrate for handling sensitive credentials.
13-19: LGTM!Adding
DeserializetoAwsAuthenticationenables proper configuration deserialization while maintaining the existing serialization capability.
21-26: Good security practice usingSecretStringfor credentials.This ensures the secret access key is automatically redacted in debug output and requires explicit exposure when needed.
28-37: LGTM!Custom
PartialEqandEqimplementations are necessary sinceSecretStringdoesn't implement these traits by default. The implementation correctly exposes the secret only for comparison purposes.
39-49: Review the security implications of exposing secrets during serialization.The custom
Serializeimplementation exposes the secret access key in the serialized output. While this may be intentional for configuration persistence, ensure that:
- Serialized output is never logged or stored in plaintext.
- This aligns with your security requirements for credential handling.
If serialization of credentials to external systems is not required, consider removing
SerializefromAwsCredentialsto prevent accidental secret exposure.
51-79: LGTM!Test correctly validates the custom
Serializeimplementation forAwsCredentials.components/clp-rust-utils/src/clp_config/package/config.rs (6)
3-3: LGTM!Public re-export of
AwsAuthenticationmakes it accessible through this module's API.
14-42: LGTM!The
Configstruct is properly extended with the new ingestion-related fields, and theDefaultimplementation correctly initializes them.
192-212: LGTM!
LogIngestorstruct has sensible default values. The buffer size threshold of 50MB and channel capacity of 10 appear reasonable for ingestion workloads.
242-264: LGTM!
S3IngestionandFsIngestionstructs are well-defined. The default directory of"/"forFsIngestionaligns with the note about syncing with Python defaults.
266-281: LGTM!The
LogsInputenum correctly uses serde's tagged enum with#[serde(flatten)]for a cleaner JSON structure, consistent withStreamOutputStoragein this file.
283-342: LGTM!Tests provide good coverage for both
LogsInputvariants, validating the JSON deserialization paths for S3 and filesystem configurations.components/log-ingestor/src/ingestion_job_manager.rs (9)
19-30: LGTM!Error enum is well-defined with clear, descriptive variants using
thiserrorfor automatic error message formatting.
41-63: LGTM!The method correctly extracts the base config, creates the appropriate client manager, and delegates to
create_s3_ingestion_jobwith a closure for job creation.
65-91: LGTM!Consistent with
create_s3_scanner_job, this method properly handles SQS listener creation while reusing the common job registration logic.
106-119: Consider error handling for partial shutdown failures.The job is removed from the table before
shutdown_and_joinis called. Ifshutdown_and_joinfails on either the ingestion job or listener, the job entry is already removed, potentially leaving resources in an inconsistent state.Depending on your requirements:
- If removal-before-shutdown is intentional (e.g., to prevent double-removal attempts), document this behavior.
- If transactional semantics are needed, consider removing only after successful shutdown.
147-205: LGTM!The prefix conflict detection logic is correct:
- Conflicts are detected when region, bucket, and dataset all match AND the prefixes overlap.
- The UUID generation loop is safe in practice (UUID v4 collision is astronomically unlikely).
- Lock management is appropriate since client creation happens before this method is called.
207-235: Hardcoded AWS endpoint format may limit flexibility.The endpoint URLs are hardcoded to
https://s3.{region}.amazonaws.comandhttps://sqs.{region}.amazonaws.com. This works for standard AWS commercial regions but won't support:
- LocalStack or other mock services for testing
- AWS GovCloud or China regions (different endpoint formats)
- Custom VPC endpoints
Consider making the endpoint configurable or deriving it from the AWS SDK's region resolver.
237-243: LGTM!The unused
_ingestion_job_configparameter is intentional per the comment at line 187-188, which notes that the listener is designed to be shared among multiple ingestion jobs in the future.
246-254: LGTM!
IngestionJobTableEntrycorrectly captures all metadata needed for conflict detection and job lifecycle management.
266-272: LGTM!The
is_mutually_prefix_freefunction correctly handles all edge cases:
- Empty strings (correctly returns
falseas empty string is a prefix of everything)- Equal strings (correctly returns
falseas they conflict)- Disjoint prefixes (correctly returns
true)
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
components/clp-rust-utils/src/clp_config/package/config.rs (1)
214-240: Missing#[serde(default)]attribute onArchiveOutput.Unlike
LogIngestor,Database, and other config structs in this file,ArchiveOutputdoesn't have#[serde(default)]. This means deserialization will fail if any field is missing in the input configuration.If partial configuration with defaults is intended, add the attribute:
/// Mirror of `clp_py_utils.clp_config.ArchiveOutput`. /// /// # NOTE /// /// * This type is partially defined: unused fields are omitted and discarded through /// deserialization. /// * The default values must be kept in sync with the Python definition. #[derive(Clone, Debug, Deserialize, Eq, PartialEq)] +#[serde(default)] pub struct ArchiveOutput {
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
components/clp-rust-utils/src/clp_config/package/config.rs(4 hunks)components/clp-rust-utils/src/clp_config/s3_config.rs(2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.
Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 1125
File: components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py:267-291
Timestamp: 2025-09-15T22:20:40.750Z
Learning: For CLP compression jobs, the team has decided to fail the entire job immediately upon encountering any invalid input path, rather than continuing to process valid paths. This decision was made during PR #1125 development.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-01T07:59:11.208Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
📚 Learning: 2025-11-03T16:17:40.223Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.
Applied to files:
components/clp-rust-utils/src/clp_config/package/config.rs
🧬 Code graph analysis (1)
components/clp-rust-utils/src/clp_config/package/config.rs (1)
components/clp-py-utils/clp_py_utils/clp_config.py (3)
AwsAuthentication(547-577)StreamOutput(709-717)ArchiveOutput(693-706)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: package-image
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: build (ubuntu-24.04)
- GitHub Check: build (macos-15)
- GitHub Check: rust-checks
🔇 Additional comments (6)
components/clp-rust-utils/src/clp_config/s3_config.rs (1)
1-1: LGTM! Deserialize support added for AWS authentication types.The addition of
DeserializetoAwsAuthenticationandAwsCredentialsproperly enables deserialization of these types for the new log ingestion configuration.Also applies to: 13-13, 21-21
components/clp-rust-utils/src/clp_config/package/config.rs (5)
3-3: LGTM! Config structure properly extended with new fields.The new
log_ingestor,logs_input, andarchive_outputfields are correctly added toConfigwith appropriate default initializations.Also applies to: 19-19, 22-23, 33-33, 36-39
192-212: LGTM! LogIngestor configuration properly defined.The struct follows the established pattern with
#[serde(default)]and aDefaultimplementation. The default values are sensible for a log ingestion service.
266-281: LGTM! LogsInput enum properly defined with tagged deserialization.The enum structure correctly uses
#[serde(tag = "type")]for tagged deserialization and#[serde(flatten)]to inline configuration fields. This pattern is consistent withStreamOutputStoragein the same file.
283-338: LGTM! Comprehensive tests for LogsInput deserialization.The tests adequately cover both
S3andFsvariants ofLogsInput, verifying that the tagged deserialization with flattened configuration works correctly. The assertions properly validate the deserialized values.
242-246: Verify AWS authentication type compatibility with Python definition.The Rust
AwsAuthenticationimplementation should be confirmed to handle all authentication types that the Pythonclp_py_utils.clp_config.AwsAuthenticationcan serialize. If Python supports authentication types beyondcredentials(such asprofile,ec2, orenv_vars), the Rust deserialization will fail if these types are sent from Python. Verify the enum variants in the Rust implementation and test the actual serialization/deserialization flow end-to-end.
hoophalab
left a comment
There was a problem hiding this comment.
lgtm. one minor comment
how about the following title
feat(clp-rust-utils): Define all CLP package config required by log-ingestor; Add config for log-ingestor.
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (1)
components/clp-rust-utils/src/clp_config/package/config.rs (1)
249-264: Revisit/as the defaultFsIngestiondirectory; it is overly broad and risky.Using the filesystem root (
"/") as the defaultFsIngestion::directoryeffectively makes the defaultlogs_inputinConfig::default()point at the entire filesystem, which is a substantial risk surface (unexpected data ingestion, permission issues, and potential leakage). A more constrained default (for example a log‑specific directory) or forcing explicit configuration viaOption<String>would be safer and more predictable.Since this was already raised earlier, consider either:
- Changing the default to a log‑scoped path consistent with the rest of the config, or
- Making
directorymandatory (noDefaultforFsIngestion) so callers must opt into a path consciously.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/clp-rust-utils/src/clp_config/package/config.rs(4 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-01T07:59:11.208Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
📚 Learning: 2025-11-03T16:17:40.223Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.
Applied to files:
components/clp-rust-utils/src/clp_config/package/config.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: package-image
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: rust-checks
- GitHub Check: build (macos-15)
- GitHub Check: build (ubuntu-24.04)
🔇 Additional comments (4)
components/clp-rust-utils/src/clp_config/package/config.rs (4)
3-4: AwsAuthentication import is consistent with new S3Ingestion usage.Pulling
AwsAuthenticationfromcrate::clp_configmatches the JSON shape used in the tests and keeps the S3-specific config centralised; no issues here.
19-24: Confirm newConfigdefaults match PythonClpConfigsemantics.The additions of
log_ingestor,logs_input(defaulting toFsIngestion::default()), andarchive_outputintoConfig::default()look structurally sound and will make these blocks optional when deserialisingConfig. Please double‑check that:
- The default host/port for
log_ingestor.- The choice of
LogsInput::Fsas the implicit default.- The archive sizing and compression defaults
all match the corresponding Python
clp_py_utils.clp_config.ClpConfigdefaults, so cross‑language config behaviour stays in sync.Also applies to: 33-40
214-241:ArchiveOutputdefault and serde configuration now look correct.Adding
#[serde(default)]plus the explicitDefaultimplementation bringsArchiveOutputin line with other nested config structs and allows partially specified archive configs to deserialize cleanly while filling in sensible defaults. No further issues from my side.
243-247: S3Ingestion keeps S3 concerns focused on authentication.Having
S3Ingestiononly carryaws_authenticationand leaving other S3 details to higher‑level config mirrors a clean separation of concerns, especially withAwsAuthenticationreused fromcrate::clp_config. The struct is minimal but coherent given the test coverage for the credentials path.
| #[derive(Clone, Debug, Deserialize, Eq, PartialEq)] | ||
| #[serde(default)] | ||
| pub struct LogIngestor { | ||
| pub host: String, | ||
| pub port: u16, | ||
| pub buffer_timeout_sec: u64, | ||
| pub buffer_size_threshold: u64, | ||
| pub channel_capacity: usize, | ||
| } | ||
|
|
||
| impl Default for LogIngestor { | ||
| fn default() -> Self { | ||
| Self { | ||
| host: "localhost".to_owned(), | ||
| port: 3333, | ||
| buffer_timeout_sec: 300, | ||
| buffer_size_threshold: 50 * 1024 * 1024, | ||
| channel_capacity: 10, | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Clarify whether LogIngestor mirrors a Python config or is Rust‑only.
Other structs in this file explicitly state when they mirror Python config types, while ApiServer is documented as Rust‑native. For LogIngestor, it would help future maintainers if you add a short doc comment making clear whether it mirrors a Python definition (and should stay in sync) or is a Rust‑only config surface, similar to ApiServer. Based on learnings, this distinction has mattered before.
🤖 Prompt for AI Agents
In components/clp-rust-utils/src/clp_config/package/config.rs around lines 192
to 212, add a short doc comment above the LogIngestor struct that explicitly
states whether this config mirrors a Python counterpart (and must be kept in
sync) or is Rust-native only (like ApiServer); follow the same wording and style
used for other structs in this file (e.g., mention "Mirrors Python config:
<path>" if it mirrors, or "Rust-native config, not present in Python" if not),
and place the comment immediately above the #[derive(...)] so future maintainers
can quickly see the intent.
| /// Mirror of `clp_py_utils.clp_config.ClpConfig.logs_input`. | ||
| #[derive(Debug, Clone, PartialEq, Eq, Deserialize)] | ||
| #[serde(tag = "type")] | ||
| pub enum LogsInput { | ||
| #[serde(rename = "fs")] | ||
| Fs { | ||
| #[serde(flatten)] | ||
| config: FsIngestion, | ||
| }, | ||
|
|
||
| #[serde(rename = "s3")] | ||
| S3 { | ||
| #[serde(flatten)] | ||
| config: S3Ingestion, | ||
| }, | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Serde setup for LogsInput is sound; consider a Default impl as a convenience.
The #[serde(tag = "type")] plus #[serde(flatten)] pattern cleanly matches the JSON in the tests for both fs and s3 variants. Given that Config::default() already chooses LogsInput::Fs { config: FsIngestion::default() }, you may want to add:
impl Default for LogsInput {
fn default() -> Self {
LogsInput::Fs {
config: FsIngestion::default(),
}
}
}This would make LogsInput easier to use standalone and documents the intended default more explicitly, but it’s not strictly required.
🤖 Prompt for AI Agents
In components/clp-rust-utils/src/clp_config/package/config.rs around lines 267
to 282, add a Default impl for LogsInput so it returns LogsInput::Fs { config:
FsIngestion::default() }; implement this by adding an impl Default for LogsInput
with fn default() -> Self { LogsInput::Fs { config: FsIngestion::default() } }
to make LogsInput usable standalone and align with Config::default(); ensure
FsIngestion implements Default (or adjust accordingly).
| #[test] | ||
| fn deserialize_logs_input_s3_config() { | ||
| const ACCESS_KEY_ID: &str = "YSCOPE"; | ||
| const SECRET_ACCESS_KEY: &str = "IamSecret"; | ||
| let logs_input_config_json = serde_json::json!({ | ||
| "type": "s3", | ||
| "aws_authentication": { | ||
| "type": "credentials", | ||
| "credentials": { | ||
| "access_key_id": ACCESS_KEY_ID, | ||
| "secret_access_key": SECRET_ACCESS_KEY, | ||
| } | ||
| } | ||
| }); | ||
|
|
||
| let deserialized = | ||
| serde_json::from_str::<LogsInput>(logs_input_config_json.to_string().as_str()) | ||
| .expect("failed to deserialize `LogsInput` from JSON"); | ||
|
|
||
| match deserialized { | ||
| LogsInput::S3 { config } => match config.aws_authentication { | ||
| crate::clp_config::AwsAuthentication::Credentials { credentials } => { | ||
| assert_eq!(credentials.access_key_id, ACCESS_KEY_ID); | ||
| assert_eq!(credentials.secret_access_key, SECRET_ACCESS_KEY); | ||
| } | ||
| }, | ||
| LogsInput::Fs { .. } => panic!("Expected S3"), | ||
| } | ||
| } | ||
|
|
||
| #[test] | ||
| fn deserialize_logs_input_fs_config() { | ||
| const DIRECTORY: &str = "/var/logs"; | ||
|
|
||
| let logs_input_config_json = serde_json::json!({ | ||
| "type": "fs", | ||
| "directory": DIRECTORY, | ||
| }); | ||
|
|
||
| let deserialized = | ||
| serde_json::from_str::<LogsInput>(logs_input_config_json.to_string().as_str()) | ||
| .expect("failed to deserialize `LogsInput` from JSON"); | ||
|
|
||
| match deserialized { | ||
| LogsInput::Fs { config } => { | ||
| assert_eq!(config.directory, DIRECTORY); | ||
| } | ||
| LogsInput::S3 { .. } => panic!("Expected Fs"), | ||
| } | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Tests exercise both fs and s3 paths; minor ergonomic improvements possible.
The tests correctly validate deserialisation of LogsInput for both S3 credentials and fs directory cases, which gives good coverage for the new enum wiring. To simplify them slightly, you could avoid the stringify/parse round‑trip by using serde_json::from_value(logs_input_config_json) directly instead of to_string().as_str(), but that’s purely a style/ergonomics tweak.
🤖 Prompt for AI Agents
In components/clp-rust-utils/src/clp_config/package/config.rs around lines 289
to 338, the tests build JSON values then convert to string and parse them back;
simplify by calling serde_json::from_value(logs_input_config_json) directly
instead of serde_json::from_str(logs_input_config_json.to_string().as_str()),
i.e., pass the existing serde_json::Value into from_value to avoid the needless
stringify/parse round-trip while keeping behavior identical.
…ngestor; Define base config for log-ingestor. (y-scope#1729)
Description
NOTE: This PR depends on #1704.
This PR:
The use of these configs can be found in the dev branch LinZhihao-723#29.
Checklist
breaking change.
Validation performed
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.