-
Notifications
You must be signed in to change notification settings - Fork 371
feat: Add SSL/TLS Configuration Support to ModelConfig CRD #1059
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add SSL/TLS Configuration Support to ModelConfig CRD #1059
Conversation
4f19ca1 to
d912d76
Compare
|
Note, I built this so that we could get kagent to work so that we could analyze whether it fit our requirements and demo it to our architect here at Ancestry. I have tested all possible TLS configurations manually, in addition to the tests included in this PR. |
d912d76 to
b780998
Compare
Add comprehensive SSL/TLS configuration capabilities to Kagent's ModelConfig custom resource, enabling agents to securely connect to internal LiteLLM gateways and model providers that use self-signed certificates or custom certificate authorities. This is a production-ready, Kubernetes-native implementation that follows security best practices and maintains full backward compatibility with existing ModelConfig resources. Changes by Component: Go Backend (Kubernetes CRD & Controller): - Added TLSConfig struct to v1alpha1 and v1alpha2 CRD schemas - Implemented controller logic to mount CA certificates as volumes - Extended HTTP API to include TLS configuration in responses - Added comprehensive validation tests and controller mounting tests Python Runtime (kagent-adk): - Created SSL utilities module with create_ssl_context() supporting 3 modes - Extended OpenAI and AzureOpenAI clients with TLS configuration support - Added type-safe TLS fields to model configuration classes - Comprehensive test coverage with 33 test functions and test fixtures Key Features: 1. Kubernetes-native design using Secrets and volume mounts 2. Three TLS modes: disabled, custom CA only, system + custom CA 3. Security-focused with validation, warnings, and RBAC docs 4. Production-ready with error handling and extensive testing 5. Fully backward compatible (no breaking changes) Documentation: - User guide: docs/user-guide/modelconfig-tls.md - RBAC guide: docs/user-guide/tls-rbac.md - Troubleshooting: docs/troubleshooting/ssl-errors.md - Examples: examples/modelconfig-with-tls.yaml All tests pass (14 Go tests, 33 Python tests with ~62 test cases). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Collin Walker <cwalker@ancestry.com>
b780998 to
3134db4
Compare
That's awesome, I'll take a look!! Just FYI at a brief glance, I saw you added docs. The docs for kagent are actually located at https://github.com/kagent-dev/website, could you move those there? |
EItanya
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is looking great overall, I have some pretty meaty but not foundational comments which I would love addressed before continuing the review.
| verify_disabled: bool, | ||
| ca_cert_path: str | None, | ||
| use_system_cas: bool, | ||
| ) -> ssl.SSLContext | bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this just be ssl.SSLContext | None? The value is always false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #1059 (comment)
|
@EItanya these are great points. I only have a follow up to 1 question before I implement these fixes |
0a0306d to
6178784
Compare
inFocus7
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall feature looks good! I had some feedback, primarily related to tests so they're nothing major. They are also non-blocking, so feel free to resolve as you please 👍🏼
I haven't had a chance to test this out locally, but I'm planning on trying it out tomorrow morning (EST)
python/packages/kagent-adk/tests/unittests/models/test_tls_e2e.py
Outdated
Show resolved
Hide resolved
python/packages/kagent-adk/tests/unittests/models/test_tls_integration.py
Outdated
Show resolved
Hide resolved
python/packages/kagent-adk/tests/unittests/models/test_openai.py
Outdated
Show resolved
Hide resolved
Implements all 16 review comments from inFocus7's code review to improve code quality, test consistency, and validation reliability for the TLS configuration feature. Changes: 1. Fix CEL validation syntax (comment kagent-dev#16 - CRITICAL) - Replace != "" with size(field) > 0 for non-empty checks - Replace == "" with size(field) == 0 for empty checks - Fixes validation syntax errors that blocked CRD deployment 2. Remove task tracking comments (comments #1, kagent-dev#13, kagent-dev#14, kagent-dev#15) - Remove "(Task X.Y)" references from test docstrings - Remove obsolete implementation notes about env vars vs agent config - Remove test_openai_client_tls_parameters_override_environment (obsolete) 3. Fix copyright headers (comment #3) - Replace incorrect "Google LLC" copyright with Kagent project copyright - Apply consistent headers across test_ssl.py, test_tls_e2e.py, test_tls_integration.py 4. Migrate Go tests to testify (comments kagent-dev#5, kagent-dev#6) - Add testify/assert and testify/require imports - Replace manual error checks with testify assertions - Add envVarToMapHelper() for O(n) environment variable validation 5. Add golden tests for TLS scenarios (comment kagent-dev#12) - Create tls-with-custom-ca.yaml input - Create tls-with-disabled-verify.yaml input - Create tls-with-system-cas-disabled.yaml input - Generate golden outputs to catch TLS mounting regressions 6. Improve Python test quality (comments #2, kagent-dev#4, kagent-dev#9) - Remove redundant test case from test_ssl.py - Add test_e2e_openai_client_fails_without_custom_ca (negative test) - Simplify E2E_TEST_SUMMARY.md (72% reduction, remove task references) 7. Use OpenAI SDK's DefaultAsyncHttpxClient (comments kagent-dev#7, kagent-dev#8) - Replace custom httpx.AsyncClient with DefaultAsyncHttpxClient - Preserves OpenAI SDK defaults for timeout, pooling, and redirects - Add tests to verify SDK defaults are maintained 8. Fix documentation links (comment kagent-dev#10) - Update broken troubleshooting links to https://kagent.dev/docs 9. Document future enhancement (comment kagent-dev#11) - Created GitHub issue kagent-dev#1091 for automatic agent redeployment on secret changes Test results: - All Go tests pass (11 golden tests including 3 new TLS scenarios) - All Python tests pass (15 tests including 2 new tests) - CRD validation working correctly with proper error messages 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Implements all 16 review comments from inFocus7's code review to improve code quality, test consistency, and validation reliability for the TLS configuration feature. Changes: 1. Fix CEL validation syntax (comment kagent-dev#16 - CRITICAL) - Replace != "" with size(field) > 0 for non-empty checks - Replace == "" with size(field) == 0 for empty checks - Fixes validation syntax errors that blocked CRD deployment 2. Remove task tracking comments (comments #1, kagent-dev#13, kagent-dev#14, kagent-dev#15) - Remove "(Task X.Y)" references from test docstrings - Remove obsolete implementation notes about env vars vs agent config - Remove test_openai_client_tls_parameters_override_environment (obsolete) 3. Fix copyright headers (comment #3) - Replace incorrect "Google LLC" copyright with Kagent project copyright - Apply consistent headers across test_ssl.py, test_tls_e2e.py, test_tls_integration.py 4. Migrate Go tests to testify (comments kagent-dev#5, kagent-dev#6) - Add testify/assert and testify/require imports - Replace manual error checks with testify assertions - Add envVarToMapHelper() for O(n) environment variable validation 5. Add golden tests for TLS scenarios (comment kagent-dev#12) - Create tls-with-custom-ca.yaml input - Create tls-with-disabled-verify.yaml input - Create tls-with-system-cas-disabled.yaml input - Generate golden outputs to catch TLS mounting regressions 6. Improve Python test quality (comments #2, kagent-dev#4, kagent-dev#9) - Remove redundant test case from test_ssl.py - Add test_e2e_openai_client_fails_without_custom_ca (negative test) - Simplify E2E_TEST_SUMMARY.md (72% reduction, remove task references) 7. Use OpenAI SDK's DefaultAsyncHttpxClient (comments kagent-dev#7, kagent-dev#8) - Replace custom httpx.AsyncClient with DefaultAsyncHttpxClient - Preserves OpenAI SDK defaults for timeout, pooling, and redirects - Add tests to verify SDK defaults are maintained 8. Fix documentation links (comment kagent-dev#10) - Update broken troubleshooting links to https://kagent.dev/docs 9. Document future enhancement (comment kagent-dev#11) - Created GitHub issue kagent-dev#1091 for automatic agent redeployment on secret changes Test results: - All Go tests pass (11 golden tests including 3 new TLS scenarios) - All Python tests pass (15 tests including 2 new tests) - CRD validation working correctly with proper error messages
934fda4 to
8ff7c6d
Compare
Signed-off-by: Collin Walker <cwalker@ancestry.com>
8ff7c6d to
9b7fddc
Compare
Signed-off-by: Collin Walker <cwalker@ancestry.com>
f5b4990 to
6178784
Compare
Signed-off-by: Collin Walker <cwalker@ancestry.com>
Signed-off-by: Fabian Gonzalez <fabian.gonzalez@solo.io>
|
Heyyo @lets-call-n-walk! Another ping in case it's been a busy week for you, I was wondering if you'll have time to carry this over the finish line (going through feedback). If not, no biggie and I will carry this forward on Monday! In the meantime I have created this branch/PR that branched off your work, in addition to resolving:
If you can carry this over, I'm hoping it will be easy to cherry-pick my commits 👍🏼 If not, I'll look into cherry picking those changes into this branch if possible. |
Signed-off-by: Fabian Gonzalez <fabian.gonzalez@solo.io>
…t failures Signed-off-by: Fabian Gonzalez <fabian.gonzalez@solo.io>
Signed-off-by: Fabian Gonzalez <fabian.gonzalez@solo.io>
Signed-off-by: Fabian Gonzalez <fabian.gonzalez@solo.io>
Signed-off-by: Fabian Gonzalez <fabian.gonzalez@solo.io>
Signed-off-by: Fabian Gonzalez <fabian.gonzalez@solo.io>
Signed-off-by: Fabian Gonzalez <fabian.gonzalez@solo.io>
Signed-off-by: Fabian Gonzalez <fabian.gonzalez@solo.io>
… target to create certs for tests as-needed Signed-off-by: Fabian Gonzalez <fabian.gonzalez@solo.io>
|
@inFocus7 Hey, thanks for looking at this! Kubecon and my team had kept me pretty busy this last week. I had discovered some of this and had the fixes locally, but was unable to rebuild to test one last thing before pushing due to a combination of the bad internet at kubecon and the finnickyness of my corporate VPN. Anyways, I will look into cherry picking those commits. Thanks! |
|
@lets-call-n-walk No worries, thanks for the update! Hope Kubecon was a blast (if you attended)! I haven't done a final validation of the changes in my branch/PR, but I'll hold off verifying until after you introduce the changes here 🫡. There were only two minor logic changes (it was primarily test updates for CI,) so I'm expecting it to continue working smoothly. |
78bad63 to
ec01cf8
Compare
|
@inFocus7 How does this look now? I just merged your pr into this one. |
|
thanks peter! I have a PR up to clean up the CI env which saves ~19GB. Once that merges to update after this pr merges lastest main updates, the test-e2e workflow issue should be resolved |
|
Hey @lets-call-n-walk! I was wondering if you'd have some time to merge in the changes from main 👀 After |
|
@inFocus7 Looks like all the checks passed. Thanks! |
yuval-k
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. Thank you very much for the contribution!
Summary
This PR adds comprehensive SSL/TLS configuration support to Kagent's ModelConfig CRD, enabling agents to securely connect to internal LiteLLM gateways and model providers that use self-signed certificates or custom certificate authorities.
Note: TLS configuration is currently only implemented for OpenAI-compatible model types (OpenAI and AzureOpenAI providers). This design specifically targets internal LiteLLM gateway deployments. The field structure is intentionally generic to facilitate future implementations for other model types that require custom certificate handling.
This is a production-ready, Kubernetes-native implementation that follows security best practices and maintains full backward compatibility with existing ModelConfig resources.
Problem Statement
Organizations running Kagent often need to connect agents to:
Previously, there was no way to configure custom CA certificates or disable SSL verification for these scenarios, forcing users to:
Solution
This PR introduces a new
tlsfield in the ModelConfig spec that supports three modes:1. Disabled Verification (Development/Testing Only)
Disables SSL verification entirely. Includes security warnings in logs.
2. Custom CA Only
Trust only the specified CA certificate from a Kubernetes Secret.
3. System + Custom CA (Recommended)
Trust both system CAs (for public services) and custom CAs (for internal services). This is the recommended approach for hybrid environments.
Changes Made
Go Backend (Kubernetes CRD & Controller)
CRD Schema (v1alpha2 only)
TLSConfigstruct with four fields:disableVerify(bool): Disable SSL verification (default: false)caCertSecretRef(string): Reference to Secret containing CA certcaCertSecretKey(string): Key within Secret (default: "ca.crt")disableSystemCAs(bool): When true, only trust custom CAs (default: false)false= safe/secure behaviorFiles changed:
go/api/v1alpha2/modelconfig_types.gogo/config/crd/bases/kagent.dev_modelconfigs.yamlKubernetes Controller
/config/config.jsoninstead of environment variablesaddTLSConfiguration()function to mount TLS certificates/etc/ssl/certs/custom/tls_disable_verify,tls_ca_cert_path,tls_disable_system_cas0444Files changed:
go/internal/controller/translator/agent/adk_api_translator.gogo/internal/adk/types.goTest Coverage (7 test functions)
Test files:
go/internal/controller/translator/agent/tls_mounting_test.goPython Runtime (kagent-adk)
SSL Utilities Module
_ssl.pywithcreate_ssl_context()functionFalse, logs security warnings)SSLContext)File:
python/packages/kagent-adk/src/kagent/adk/models/_ssl.pyOpenAI SDK Integration (OpenAI/AzureOpenAI Only)
BaseOpenAIandAzureOpenAIclasses with TLS fields:tls_disable_verify,tls_ca_cert_path,tls_disable_system_cas_get_tls_config()to read from agent config_create_http_client()to build customhttpx.AsyncClientwith SSL contextAsyncOpenAIandAsyncAzureOpenAIuse custom http_client when TLS configuredFiles changed:
python/packages/kagent-adk/src/kagent/adk/models/_openai.pyType System
BaseLLM(available to all model types for future extensibility)OpenAIandAzureOpenAIPydantic modelsAgentConfig.to_agent()to propagate TLS config to model instancesFiles changed:
python/packages/kagent-adk/src/kagent/adk/types.pyTest Coverage (26 tests passing)
test_ssl.py: SSL context creation, certificate loading, error handlingtest_openai.py: OpenAI client instantiation with TLStest_tls_integration.py: End-to-end OpenAI/Azure integrationtest_tls_e2e.py: Full workflow with mock HTTPS serversTest files:
python/packages/kagent-adk/tests/unittests/models/test_ssl.pypython/packages/kagent-adk/tests/unittests/models/test_openai.pypython/packages/kagent-adk/tests/unittests/models/test_tls_integration.pypython/packages/kagent-adk/tests/unittests/models/test_tls_e2e.pypython/packages/kagent-adk/tests/fixtures/certs/Examples
YAML Examples (
examples/modelconfig-with-tls.yaml):provider: OpenAIrequirementKey Features
1. Kubernetes-Native Design
2. Security-Focused
0444)3. Production-Ready
4. Developer-Friendly
Provider Support
Currently Supported:
Not Yet Supported:
The TLS configuration fields are defined in
BaseLLMto facilitate future implementations, but only OpenAI and AzureOpenAI model types currently use them. If custom certificate handling is needed for other providers, implementations can reuse the same field structure.Testing
All tests pass:
Run tests:
Usage Example
1. Create a Secret with your CA certificate:
2. Create a ModelConfig with TLS configuration:
3. Use the ModelConfig in your Agent:
The agent will now be able to connect to the internal LiteLLM gateway using the custom CA certificate!
Breaking Changes
None. This is a purely additive feature.
tlsfield continue to work unchangedMigration
No migration required. The
tlsfield is optional with safe defaults:disableVerifydefaults tofalse(verification enabled - secure)disableSystemCAsdefaults tofalse(trust system CAs - safe)tlsconfiguration use standard SSL verificationSecurity Considerations
Best Practices
disableVerify: trueonly for development/testingdisableSystemCAs: false- Recommended (default) to maintain trust in public CAsSecurity Features
false= secure behaviorField Naming Rationale
All boolean fields follow the falsey-by-default pattern:
disableVerify: false= verification enabled (secure) ✅disableSystemCAs: false= system CAs enabled (safe) ✅This ensures that omitting fields or using default values results in the most secure configuration.
Review Checklist
Next Steps
After this PR is merged:
kubectl apply -f go/config/crd/bases/)