Releases: kserve/kserve
v0.16.0
What's Changed
- chore: remove 'default' suffix compatibility by @sivanantha321 in #4178
- Upgrade Torch to v2.6.0 everywhere by @ashahba in #4450
- chore: drop pydantic v1 support by @sivanantha321 in #4353
- fix: Update TextIteratorStreamer to skip special tokens by @sivanantha321 in #4490
- Add Jooho to approvers in OWNERS file by @terrytangyuan in #4504
- Rename CRD file to reflect all KServe CRDs (Fixes #4396) by @WHITE-ICE-BOX in #4494
- Update kserve-resources helm chart to disable desired servingruntimes by @jmlaubach in #4485
- upgrade vllm to v0.9.0 and Torch to v2.7.0 by @ashahba in #4501
- Upgrade vLLM to v0.9.0.1 by @ashahba in #4507
- Initial segregation of the storage module from KServe SDK by @spolti in #4391
- Fix pss restricted warnings by @akagami-harsh in #4327
- Fix: do not update poetry dependency when install hf cpu deps by @yuzisun in #4516
- [Bug] Fixes error in trace logging by @gavrissh in #4514
- Stop and resume a model [Raw Deployment] by @hdefazio in #4455
- Resolve inference endpoint using runtime protocol when applicable by @israel-hdez in #4527
- fix(codegen): pins code-generator binaries version by @bartoszmajsak in #4533
- Allow to set custom timeouts for InferenceGraph router by @lifo9 in #4218
- feat: support remote storage URI injection for serving runtimes by @de0725 in #4492
- [API] Define LLMInferenceService and LLMInferenceServiceConfig types and CRDs by @pierDipi in #4522
- Stop and Resume a transformer by @hdefazio in #4534
- Allow OCI for multi-node/multi-gpu by @israel-hdez in #4441
- 4380 - Inference logging to blob storage by @cjohannsen-cloudera in #4473
- Fix outdated BentoML import in sample code (BentoService no longer available in v1.x) by @YehCC52 in #4540
- Auto-update annotation for isvc. by @andresllh in #4342
- Stop and resume an explainer by @hdefazio in #4546
- fix: unset clampMax and clampMin, since they are not for replicas by @houshengbo in #4556
- refactor: Enhance HTTPRoute readiness checks by @sivanantha321 in #4543
- Refactor KServe to use global context for PredictorConfig by @sivanantha321 in #4526
- feat: refactor storage initializer resources configuration by @takamai06 in #4411
- feat(envtest): simplifies CRD lookup by @bartoszmajsak in #4564
- llmisvc: Initial controller scaffold and helm chart by @sivanantha321 in #4557
- Add logic to merge specs for LLMInferenceService by @VedantMahabaleshwarkar in #4563
- fix: Allow CA bundle path without config map by @fabiendupont in #4451
- docs: fixes invalid openshift subscription by @bartoszmajsak in #4572
- Add Code Coverage change report for PRs by @andyi2it in #4487
- feat: support secure access to prometheus in keda by @andyi2it in #4384
- feat: switch kserve from poetry to uv by @andyi2it in #4407
- chore(utils): simplifies code using generics by @bartoszmajsak in #4578
- Add AOT flashinfer build to huggingfaceserver dockerfile to precompil… by @AyushSawant18588 in #4567
- feat: improves collection helpers by @bartoszmajsak in #4579
- Add Missing config file for code coverage by @andyi2it in #4581
- Fix: Support multiple metrics in OpenTelemetryCollector for autoscaling by @houshengbo in #4591
- Upgrade vllm to v0.9.2 by @AyushSawant18588 in #4586
- Remove unused Strategy interface from sharding package by @houshengbo in #4590
- fix(test): uses local httptest server urls for downloader tests by @bartoszmajsak in #4601
- Fixed the issue of the same metrics across different deployments under different namespaces by @houshengbo in #4593
- feat: Add disable postprocessing option for raw logits by @sivanantha321 in #4566
- Stop and resume an inference graph by @hdefazio in #4588
- Disallow name field in standard predictor by @HutakiHare in #4535
- fix: missing pytest-asyncio session scope marker in test_kserve_logger_cipn by @sivanantha321 in #4607
- update CRDs for LWS based multi node support by @VedantMahabaleshwarkar in #4596
- Add LLM InferenceService base configurations by @VedantMahabaleshwarkar in #4613
- fix: fixes misleading print for parallelism when $# > 2 (#784) by @bartoszmajsak in #4619
- Remove storage spec from LLMIsvc by @israel-hdez in #4622
- feat: speed up ci by @ls-2018 in #4600
- fix(deps): adds missing LLMInferenceService types for python imports by @bartoszmajsak in #4604
- Add llm-d scheduler reconciler by @andresllh in #4614
- [Followup] Improve the stop/resume Inference Service tests and status by @hdefazio in #4636
- Added advanced config for ScaleUp and ScaleDown by @andyi2it in #4570
- add resource default for otel collector container limit in configmap by @andyi2it in #4633
- Protobuf version bump by @karolpustelnik in #4634
- Stop and resume an inference graph [Raw Deployment] by @hdefazio in #4637
- Fix typos in the repo by @mwaykole in #4641
- llm-d workload reconciliation by @brettmthompson in #4616
- llm-d HTTP route reconciler by @andresllh in #4617
- feat(webhook): introduces Validating Webhook for LLMInferenceServiceConfig by @VedantMahabaleshwarkar in #4630
- fix: Adds support for NVIDIA MIG GPU resource detection by @sivanantha321 in #4642
- feat(webhook): introduces Validating Webhook for LLMInferenceService by @VedantMahabaleshwarkar in #4631
- Add logic for LLMISVC controller by @VedantMahabaleshwarkar in #4632
- Fix quickstart link by @thesteve0 in #4654
- ci: skip workflows when only markdown files are changed by @Jooho in #4650
- refactor: 4602 - Refactor "RawDeployment" to "Standard" and "Serverless" to "KNative" to clarify usage by @cjohannsen-cloudera in #4608
- fix: README links to KServe website updated by @dominikkawka in #4656
- fix: allow digests in runtimes' images by @tmvfb in #4653
- [llmisvc] Improve config merge and update well-known presets by @pierDipi in #4663
- [llmisvc] Support cluster-scoped objects in generic CRUD functions by @pierDipi in #4664
- fix: fix snyk scan sarif file upload by @sivanantha321 in #4660
- fix: defaults GITHUB_SHA for graph images by @bartoszmajsak in #4620
- Promote new KServe Storage module by @spolti in #4625
- fix: escape HTML characters in api comments to fix syntax errors in website docs by @sivanantha321 in #4662
- Implement the progressive rollout for raw deployment by @houshengbo in #4623
- fix: dry run update for keda autoscaling loop by @andyi2it in #4587
- Fix HF Token Vulnerability in Storage Initializer Container by @brettmthompson in #4677
- docs: update Kafka sample path after file relocation by @1lyvianis in #4680
- Last Deployment Status Should Reflect Deployment Status by @HotsauceLee in #4667
- Fix: Update ModelCopies.TotalCopies for all model states by @hardik-menger in #4676
- Prepare for 0.16.0-rc0 release by @houshengbo in https://github.com/kserve/kserve/pull...
v0.16.0-rc1
What's Changed
- deprecate: remove EnableDirectPvcVolumeMount flag by @anurags25 in #4694
- Fix autoscaling tests duration and crashes by @andyi2it in #4688
- Avoid Pervasive Logging of SA Not Found Errors by Credential Builder by @brettmthompson in #4696
- fix: correct llmisvc Dockerfile reference in image publish workflow by @sivanantha321 in #4705
- llmisvc: fix RBAC, templating, and adds quick install script by @sivanantha321 in #4698
- Fixed the panic nil pointer issue componentExt being nil by @houshengbo in #4704
- fix: Add disk space cleanup step to Docker publish workflows by @sivanantha321 in #4717
- Add Star History section to README by @terrytangyuan in #4719
- docs: Mention CNCF in project README by @terrytangyuan in #4718
- Enabled the configuration options in Helm for opentelemetryCollector and autoscaler by @houshengbo in #4725
- Time Series Forecast API Endpoint by @jinan-zhou in #4615
- llmisvc dev & kustomize setup, add webhook config to helm chart by @sivanantha321 in #4712
- Revise KServe overview and enhance features section by @terrytangyuan in #4721
- Fix incorrect entrypoint in llmisvc Dockerfile by @Jooho in #4730
- Configure HF Downloads to Lower Memory Usage by @brettmthompson in #4726
- Temporarily disable SSL to unblock e2e tests by @Jooho in #4731
- Use a wrapper struct to accept resource.Quantity and keep the original input by @houshengbo in #4699
- Support Multiple Storage URIs for InferenceServices by @anurags25 in #4702
- Fix: prepend the KO_DOCKER_REPOSITORY to the base docker build to allow local publishing of the controller by @cjohannsen-cloudera in #4736
- Add Cert Manager installation to llmisvc quick install script by @sivanantha321 in #4733
- Injecting CA Bundle Into Storage Initializer Container for S3 Storage on LLMISVC Reconciliation by @brettmthompson in #4728
- Add metadata propagation for Kueue configurations to both Deployment and LeaderWorkerSet workloads by @hdefazio in #4747
- Add Support for Configuring S3 Storage via Secret Data by @brettmthompson in #4727
- 4739 - Fix: blob storage for inference logging recognizes embedded spec by @cjohannsen-cloudera in #4740
- Add llmd e2e tests by @andresllh in #4729
- Prepare for 0.16.0-rc1 release by @houshengbo in #4732
- Update the kserve-storage module to the latest version by @spolti in #4754
- feature: 4553 - Support inference logging to GCS and Azure by @cjohannsen-cloudera in #4582
- fix(helm-chart): expose uidModelcar in the chart by @maciej-tatarski in #4689
New Contributors
- @anurags25 made their first contribution in #4694
- @jinan-zhou made their first contribution in #4615
- @maciej-tatarski made their first contribution in #4689
Full Changelog: v0.16.0-rc0...v0.16.0-rc1
v0.16.0-rc0
What's Changed
- chore: remove 'default' suffix compatibility by @sivanantha321 in #4178
- Upgrade Torch to v2.6.0 everywhere by @ashahba in #4450
- chore: drop pydantic v1 support by @sivanantha321 in #4353
- fix: Update TextIteratorStreamer to skip special tokens by @sivanantha321 in #4490
- Add Jooho to approvers in OWNERS file by @terrytangyuan in #4504
- Rename CRD file to reflect all KServe CRDs (Fixes #4396) by @WHITE-ICE-BOX in #4494
- Update kserve-resources helm chart to disable desired servingruntimes by @jmlaubach in #4485
- upgrade vllm to v0.9.0 and Torch to v2.7.0 by @ashahba in #4501
- Upgrade vLLM to v0.9.0.1 by @ashahba in #4507
- Initial segregation of the storage module from KServe SDK by @spolti in #4391
- Fix pss restricted warnings by @akagami-harsh in #4327
- Fix: do not update poetry dependency when install hf cpu deps by @yuzisun in #4516
- [Bug] Fixes error in trace logging by @gavrissh in #4514
- Stop and resume a model [Raw Deployment] by @hdefazio in #4455
- Resolve inference endpoint using runtime protocol when applicable by @israel-hdez in #4527
- fix(codegen): pins code-generator binaries version by @bartoszmajsak in #4533
- Allow to set custom timeouts for InferenceGraph router by @lifo9 in #4218
- feat: support remote storage URI injection for serving runtimes by @de0725 in #4492
- [API] Define LLMInferenceService and LLMInferenceServiceConfig types and CRDs by @pierDipi in #4522
- Stop and Resume a transformer by @hdefazio in #4534
- Allow OCI for multi-node/multi-gpu by @israel-hdez in #4441
- 4380 - Inference logging to blob storage by @cjohannsen-cloudera in #4473
- Fix outdated BentoML import in sample code (BentoService no longer available in v1.x) by @YehCC52 in #4540
- Auto-update annotation for isvc. by @andresllh in #4342
- Stop and resume an explainer by @hdefazio in #4546
- fix: unset clampMax and clampMin, since they are not for replicas by @houshengbo in #4556
- refactor: Enhance HTTPRoute readiness checks by @sivanantha321 in #4543
- Refactor KServe to use global context for PredictorConfig by @sivanantha321 in #4526
- feat: refactor storage initializer resources configuration by @takamai06 in #4411
- feat(envtest): simplifies CRD lookup by @bartoszmajsak in #4564
- llmisvc: Initial controller scaffold and helm chart by @sivanantha321 in #4557
- Add logic to merge specs for LLMInferenceService by @VedantMahabaleshwarkar in #4563
- fix: Allow CA bundle path without config map by @fabiendupont in #4451
- docs: fixes invalid openshift subscription by @bartoszmajsak in #4572
- Add Code Coverage change report for PRs by @andyi2it in #4487
- feat: support secure access to prometheus in keda by @andyi2it in #4384
- feat: switch kserve from poetry to uv by @andyi2it in #4407
- chore(utils): simplifies code using generics by @bartoszmajsak in #4578
- Add AOT flashinfer build to huggingfaceserver dockerfile to precompil… by @AyushSawant18588 in #4567
- feat: improves collection helpers by @bartoszmajsak in #4579
- Add Missing config file for code coverage by @andyi2it in #4581
- Fix: Support multiple metrics in OpenTelemetryCollector for autoscaling by @houshengbo in #4591
- Upgrade vllm to v0.9.2 by @AyushSawant18588 in #4586
- Remove unused Strategy interface from sharding package by @houshengbo in #4590
- fix(test): uses local httptest server urls for downloader tests by @bartoszmajsak in #4601
- Fixed the issue of the same metrics across different deployments under different namespaces by @houshengbo in #4593
- feat: Add disable postprocessing option for raw logits by @sivanantha321 in #4566
- Stop and resume an inference graph by @hdefazio in #4588
- Disallow name field in standard predictor by @HutakiHare in #4535
- fix: missing pytest-asyncio session scope marker in test_kserve_logger_cipn by @sivanantha321 in #4607
- update CRDs for LWS based multi node support by @VedantMahabaleshwarkar in #4596
- Add LLM InferenceService base configurations by @VedantMahabaleshwarkar in #4613
- fix: fixes misleading print for parallelism when $# > 2 (#784) by @bartoszmajsak in #4619
- Remove storage spec from LLMIsvc by @israel-hdez in #4622
- feat: speed up ci by @ls-2018 in #4600
- fix(deps): adds missing LLMInferenceService types for python imports by @bartoszmajsak in #4604
- Add llm-d scheduler reconciler by @andresllh in #4614
- [Followup] Improve the stop/resume Inference Service tests and status by @hdefazio in #4636
- Added advanced config for ScaleUp and ScaleDown by @andyi2it in #4570
- add resource default for otel collector container limit in configmap by @andyi2it in #4633
- Protobuf version bump by @karolpustelnik in #4634
- Stop and resume an inference graph [Raw Deployment] by @hdefazio in #4637
- Fix typos in the repo by @mwaykole in #4641
- llm-d workload reconciliation by @brettmthompson in #4616
- llm-d HTTP route reconciler by @andresllh in #4617
- feat(webhook): introduces Validating Webhook for LLMInferenceServiceConfig by @VedantMahabaleshwarkar in #4630
- fix: Adds support for NVIDIA MIG GPU resource detection by @sivanantha321 in #4642
- feat(webhook): introduces Validating Webhook for LLMInferenceService by @VedantMahabaleshwarkar in #4631
- Add logic for LLMISVC controller by @VedantMahabaleshwarkar in #4632
- Fix quickstart link by @thesteve0 in #4654
- ci: skip workflows when only markdown files are changed by @Jooho in #4650
- refactor: 4602 - Refactor "RawDeployment" to "Standard" and "Serverless" to "KNative" to clarify usage by @cjohannsen-cloudera in #4608
- fix: README links to KServe website updated by @dominikkawka in #4656
- fix: allow digests in runtimes' images by @tmvfb in #4653
- [llmisvc] Improve config merge and update well-known presets by @pierDipi in #4663
- [llmisvc] Support cluster-scoped objects in generic CRUD functions by @pierDipi in #4664
- fix: fix snyk scan sarif file upload by @sivanantha321 in #4660
- fix: defaults GITHUB_SHA for graph images by @bartoszmajsak in #4620
- Promote new KServe Storage module by @spolti in #4625
- fix: escape HTML characters in api comments to fix syntax errors in website docs by @sivanantha321 in #4662
- Implement the progressive rollout for raw deployment by @houshengbo in #4623
- fix: dry run update for keda autoscaling loop by @andyi2it in #4587
- Fix HF Token Vulnerability in Storage Initializer Container by @brettmthompson in #4677
- docs: update Kafka sample path after file relocation by @1lyvianis in #4680
- Last Deployment Status Should Reflect Deployment Status by @HotsauceLee in #4667
- Fix: Update ModelCopies.TotalCopies for all model states by @hardik-menger in #4676
- Prepare for 0.16.0-rc0 release by @houshengbo in https://github.com/kserve/kserve/pull...
v0.15.2
What's Changed
- Fixes CVE-2025-43859 by @spolti in #4468
- config: enable ModelCar by default by @tarilabs in #4316
- fix: huggingface e2e test output mismatch and add tests for stream requests by @sivanantha321 in #4482
- Rework the order in which the knative autoscaler configmap is read during reconciliation by @brettmthompson in #4471
- Add predictor_config to ModelServer init function by @greenmoon55 in #4491
- docs: enhance security documentation with detailed reporting and prevention mechanisms by @sivanantha321 in #4495
- fix: update workflow to use ubuntu-latest for rerun PR tests by @sivanantha321 in #4496
- Generate Release 0.15.2 by @yuzisun in #4497
New Contributors
Full Changelog: v0.15.1...v0.15.2
v0.15.1
What's Changed
- fix typo on inferenceservice-config by @spolti in #4244
- Bump Go version to 1.24 by @sivanantha321 in #4321
- CI: Increase timeout for REST client connections to improve reliability by @sivanantha321 in #4355
- Localmodel agent can watch node groups by @greenmoon55 in #4362
- Fixing flake8 linter error. by @andresllh in #4354
- Update Huggingface Transformer to 4.50.3 by @rajatvig in #4351
- fix: Register LoRA model name in model registry to avoid not found error by @sivanantha321 in #4352
- Update Huggingface Transformer to 4.51.0 and huggingface-hub for kserve by @rajatvig in #4364
- Router config fixes in configmap template by @tmbochenski in #4369
- Chore: Deprecate Openvino support in HF runtime by @gavrissh in #4379
- vLLM V1 support for HF Server Runtime by @gavrissh in #4368
- Support Numpy 2.x by @sivanantha321 in #4386
- [Model cache] Do not remove PVC and PV after isvc deletion by @greenmoon55 in #4390
- Fix Flaky multi processing tests by @andyi2it in #4383
- Rerank support for vLLM in HuggingFace Serving Runtime by @AyushSawant18588 in #4376
- Upgrade vllm to support Llama4 by @gavrissh in #4388
- Adding bitsandbytes package for 4 bit support by @johnugeorge in #4406
- Fix: Isvc matched with wrong ModelCache by @HotsauceLee in #4398
- Fix: remove duplicated OpenAIGenerativeModel in init by @huazq in #4399
- Feat: Allow inference service metadata injection at agent sidecar level for payload logging by @tylerhyang in #4325
- Fix for KEDA scaledobject target value is set to pointer instead of specified value by @andyi2it in #4373
- MultiNode change a logic to calculate ray node count and gpu count by @Jooho in #4356
- Remove internal annotations when no cache resource is matched by @greenmoon55 in #4412
- Add DeploymentMode to InferenceService and InferenceGraph status and prevent deploymentMode change by @israel-hdez in #4423
- Feat: should not NewHPAReconcile for external HPA by @johnzheng1975 in #4363
- chore: Upgrade prow-github-actions to version 2 in workflow files by @sivanantha321 in #4417
- Fix: update external autoscaler tests missed from PR 4363 by @Jooho in #4438
- Update OWNERS by @terrytangyuan in #4442
- Upgrade vLLM to support Qwen3 by @gavrissh in #4434
- InferenceGraph: Fix response code when condition step is not fulfilled by @israel-hdez in #4429
- Stop and resume a model by adding a new annotation [Serverless] by @hdefazio in #4337
- Improve Handling of Knative Autoscaler Configuration by @brettmthompson in #4394
- chore: adds CNCF Code of Conduct by @bartoszmajsak in #4458
- chore: Reenable Docker workflows to support arm64 build by @sivanantha321 in #4446
- Fix issue with precommit hook by @andyi2it in #4456
- Fix raw deployment update by @andyi2it in #4445
- update golangci-lint to 1.64.8 by @ashahba in #4459
- fix: corrects links to translations by @bartoszmajsak in #4461
- chore: Include third party licenses, Add license checker, Enable SBOM Generation for images by @sivanantha321 in #4416
- LMCache Integration with vLLM runtime by @sivanantha321 in #4320
- Publish 0.15.1 release by @greenmoon55 in #4466
- Fix: add type specification for nthread argument in argument parser by @sivanantha321 in #4410
- Improve code coverage by @andyi2it in #4385
- Fixes vLLM V1 failures: Revert back the approach to initiate the background engine task by @gavrissh in #4470
New Contributors
- @andresllh made their first contribution in #4354
- @tmbochenski made their first contribution in #4369
- @huazq made their first contribution in #4399
- @johnzheng1975 made their first contribution in #4363
- @brettmthompson made their first contribution in #4394
- @bartoszmajsak made their first contribution in #4458
Full Changelog: v0.15.0...v0.15.1
v0.15.0
What's Changed
- bump to vLLM 0.6.2 and add explicit chat template by @hustxiayang in #3964
- bump to vLLM 0.6.3 by @hustxiayang in #4001
- Feature: Add hf transfer by @tjandy98 in #4000
- Fix snyk scan null error by @sivanantha321 in #3974
- Update quick install script by @johnugeorge in #4005
- Local Model Node CR by @HotsauceLee in #3978
- Reduce E2Es dependency on CI environment (2) by @israel-hdez in #4008
- Allow GCS to download single file by @spolti in #4015
- bump to vLLM 0.6.3.post1 by @hustxiayang in #4023
- Set default for SamplingParams.max_tokens in OpenAI requests if unset by @kevinmingtarja in #4020
- Add tools functionality to vLLM by @ArjunBhalla98 in #4033
- For vllm users, our parser should be able to support both - and _ by @hustxiayang in #3933
- Add tools unpacking for vLLM by @ArjunBhalla98 in #4035
- Multi-Node Inference Implementation by @Jooho in #3972
- Enhance InjectAgent to Handle Only HTTPGet, TCP Readiness Probes by @LOADBC in #4012
- Feat: Fix memory issue by replacing io.ReadAll with io.Copy (#4017) by @ops-jaeha in #4018
- Update alibiexplainer example by @spolti in #4004
- Fix huggingface build runs out of storage in CI by @sivanantha321 in #4044
- Update snyk scan to include new images by @sivanantha321 in #4042
- Introducing KServe Guru on Gurubase.io by @kursataktas in #4038
- Fix Hugging Face server EncoderModel not returning probabilities by correctly passing --return_probabilities flag (#3958) by @oplushappy in #4024
- Add deeper readiness check for transformer by @sivanantha321 in #3348
- Fix Starlette Denial of service (DoS) via multipart/form-data by @spolti in #4006
- remove duplicated import "github.com/onsi/gomega" by @carlory in #4051
- Fix localmodel controller name in snyk scan workflow by @sivanantha321 in #4054
- Fix azure blob storage access key env not mounted by @bentohset in #4064
- Storage Initializer support single digit azure DNS zone ID by @bentohset in #4070
- Fix trust remote code encoder model by @sivanantha321 in #4043
- introduce the prepare-for-release.sh script by @spolti in #3993
- Model cache controller and node agent by @yuzisun in #4089
- Storage containers typo fix for Huggingface Storage type by @andyi2it in #4098
- Support datetime object serialization in v1/v2 response by @sivanantha321 in #4099
- Replace klog with klog/v2 by @sivanantha321 in #4093
- Add exception handling and logging for grpc server by @sivanantha321 in #4066
- Update ClusterLocalModel to LocalModelCache by @yuzisun in #4105
- Fix LocalModelCache controller reconciles deleted resource by @sivanantha321 in #4106
- Fix InferenceService state when Predictor pod in CrashLoopBackOff by @hdefazio in #4003
- LocalModelCache Admission Webhook by @HotsauceLee in #4102
- Add namespace to localmodel and localmodelnode ServiceAccount helm chart by @ritzdevp in #4111
- KServe VLLM cpu image by @AyushSawant18588 in #4049
- Update max_model_len calculation and fixup encoder pooling by @Datta0 in #4055
- chore: use patch instead of update for finalizer changes by @whynowy in #4072
- Fix isvc role localmodelcache permission by @sivanantha321 in #4131
- Detect missing models and redownload models by @greenmoon55 in #4095
- introduce service configuration at configmap level by @spolti in #3672
- Allow multiple node groups in the model cache CR by @greenmoon55 in #4134
- Annotation to disable model cache by @greenmoon55 in #4118
- Clean up jobs in model cache agent by @greenmoon55 in #4140
- Ensure Model root folder exists by @greenmoon55 in #4142
- Add NodeGroup Name Into PVC Name by @HotsauceLee in #4141
- Make LocalModel Agent reconciliation frequency configurable by @greenmoon55 in #4143
- Remove deepcopy-gen in favour of controller-gen by @sivanantha321 in #4109
- Add ability to set annotations on controller/webhook service and expose metrics bind port and address in helm chart by @mhowell24 in #4127
- Fix EOF error for downloading zip files by @Jonas-Bruns in #4082
- Remove redundant namespace yaml by @greenmoon55 in #4148
- Fix Localmodel agent build by @greenmoon55 in #4150
- Fix model server fails to gracefully shutdown by @sivanantha321 in #4116
- Ensure root model directory exists and add protection for jobs created by @yuzisun in #4152
- Enable transformer deeper readiness check tests by @sivanantha321 in #4121
- Update HuggingFace server dependencies versions by @AyushSawant18588 in #4147
- Add workflow for verifying go mod by @sivanantha321 in #4137
- Fix for CVE-2024-52304 - aiohttp upgrade by @andyi2it in #4113
- Allow other engine builders other than docker by @spolti in #3906
- Add localmodelnode crd to helm chart by @greenmoon55 in #4161
- Fixes Non-linear parsing of case-insensitive content by @spolti in #4158
- Helm chart - option to run daemonset as root by @greenmoon55 in #4164
- Replace nodeGroup with nodeGroups in charts/kserve-crd by @ritzdevp in #4166
- Add affinity and tolerations to localmodel daemonset by @greenmoon55 in #4173
- Fix s3 download PermanentRedirectError for legacy s3 endpoint by @bentohset in #4157
- Make label and annotation propagation configurable by @spolti in #4030
- Add ModelCache e2e test by @sivanantha321 in #4136
- Update vllm to 0.6.6 by @rajatvig in #4176
- [bugfix] fix s3 storage download filename bug by @anencore94 in #4162
- Add hf to storageuri prefix list by @tjandy98 in #4184
- Add Support for OpenAI-compatible Embeddings API by @FabianScheidt in #4129
- fix: typo in _construct_http_status_error method by @Mgla96 in #4190
- Fix raw logger e2e test by @sivanantha321 in #4185
- Feat: Support configuring isvc resource defaults by @andyi2it in #4032
- keep replicas when autoscaler set external by @Jooho in #4196
- Increase kserve controller readiness probe time period by @sivanantha321 in #4200
- Fix golangci-lint binary path selection based on GOBIN by @Jooho in #4198
- Add option to disable volume management in localModel config by @ritzdevp in #4186
- set MaxUnavailable(0%)/MaxSurge(100%) for rollingUpdate in multinode case by @Jooho in #4188
- Gracefully shutdown the router server by @sivanantha321 in #3367
- Add workflow for manual huggingface vLLM image publish by @sivanantha321 in #4092
- Feat: Gateway API Support - Raw Deployment by @sivanantha321 in #3952
- add make goal to build huggingface cpu image by @spolti in #4202
- Cleanup the filepath in createNewFile to avoid path traversal issue by @hdefazio in #4205
- Enhance multinode health_check python and manifests by @Jooho in #4197
- Publish 0.15-rc0 release by @yuzisun in #4213
- Fix Gateway API flaky test by @...
v0.15.0-rc1
What's Changed
- Fix Gateway API flaky test by @sivanantha321 in #4214
- Remove linux/arm64/v8 as platform option to fix build errors by @gavrissh in #4217
- Fix: typo in inferenceservice configmap by @sukumargaonkar in #4215
- Fix CI not using localmodelnode agent dev image by @sivanantha321 in #4221
- Fix model download path by @hakuro95 in #4112
- Support Multiple NodeGroups In LocalModelCache by @HotsauceLee in #4170
- Inference Graph: use plain text HTTP when part of Istio Mesh by @israel-hdez in #4031
- Better compatibility with in-place upgrades by @israel-hdez in #4234
- Increase request timeout seconds for art explainer by @sivanantha321 in #4241
- fix: add trainedmodels custom resource to kubeflow-kserve clusterroles by @gigabyte132 in #4225
- Fix CVE-2025-24357 and Bump vLLM to 0.7.2 by @sivanantha321 in #4223
- Use Go 1.23 to build kserve and update mod versions by @rajatvig in #4239
- install: Remove modelmesh installation from helm chart by @sivanantha321 in #4243
- Bump golang-lint to 1.63 and fix all linter errors by @sivanantha321 in #3967
- Issue 4248: Request Logger with Multiple Metadata Headers fail by @tylerhyang in #4249
- Add predictor healthcheck to OpenAIProxyModel by @greenmoon55 in #4250
- Expose podSpec fields for Inferencegraph by @sivanantha321 in #4091
- Fix localmodel test by @greenmoon55 in #4268
- Force symlink for ModelCar by @pmtk in #4274
- Refactor vLLM + Embed support by @gavrissh in #4177
- Fix triton health check by @greenmoon55 in #4277
- Upgrade vLLM version to 0.7.3 by @gavrissh in #4281
- 0.15.0-rc1 release by @greenmoon55 in #4285
- Add model_version field to InferRequest by @greenmoon55 in #4287
- (Bug #4273) quick_install.sh failed to uninstall incomplete installation and has small syntax bug by @zozowell in #4275
- update openshift guide by @spolti in #4210
- Collocation transformer and predictor spec by @sivanantha321 in #4255
- Move arguments from 'args' to 'command' for huggingface server multinode SR by @Jooho in #4289
- Include reasoning parser option in vLLM for reasoning models by @gavrissh in #4282
- KServe Keda Integration by @andyi2it in #3652
- add huggingfaceserver-multinode to helm chart by @Jooho in #4293
- Add missing CRDs for Keda by @andyi2it in #4296
New Contributors
- @hakuro95 made their first contribution in #4112
- @gigabyte132 made their first contribution in #4225
- @tylerhyang made their first contribution in #4249
- @pmtk made their first contribution in #4274
- @zozowell made their first contribution in #4275
Full Changelog: v0.15.0-rc0...v0.15.0-rc1
v0.15.0-rc0
What's Changed
- bump to vLLM 0.6.2 and add explicit chat template by @hustxiayang in #3964
- bump to vLLM 0.6.3 by @hustxiayang in #4001
- Feature: Add hf transfer by @tjandy98 in #4000
- Fix snyk scan null error by @sivanantha321 in #3974
- Update quick install script by @johnugeorge in #4005
- Local Model Node CR by @HotsauceLee in #3978
- Reduce E2Es dependency on CI environment (2) by @israel-hdez in #4008
- Allow GCS to download single file by @spolti in #4015
- bump to vLLM 0.6.3.post1 by @hustxiayang in #4023
- Set default for SamplingParams.max_tokens in OpenAI requests if unset by @kevinmingtarja in #4020
- Add tools functionality to vLLM by @ArjunBhalla98 in #4033
- For vllm users, our parser should be able to support both - and _ by @hustxiayang in #3933
- Add tools unpacking for vLLM by @ArjunBhalla98 in #4035
- Multi-Node Inference Implementation by @Jooho in #3972
- Enhance InjectAgent to Handle Only HTTPGet, TCP Readiness Probes by @LOADBC in #4012
- Feat: Fix memory issue by replacing io.ReadAll with io.Copy (#4017) by @ops-jaeha in #4018
- Update alibiexplainer example by @spolti in #4004
- Fix huggingface build runs out of storage in CI by @sivanantha321 in #4044
- Update snyk scan to include new images by @sivanantha321 in #4042
- Introducing KServe Guru on Gurubase.io by @kursataktas in #4038
- Fix Hugging Face server EncoderModel not returning probabilities by correctly passing --return_probabilities flag (#3958) by @oplushappy in #4024
- Add deeper readiness check for transformer by @sivanantha321 in #3348
- Fix Starlette Denial of service (DoS) via multipart/form-data by @spolti in #4006
- remove duplicated import "github.com/onsi/gomega" by @carlory in #4051
- Fix localmodel controller name in snyk scan workflow by @sivanantha321 in #4054
- Fix azure blob storage access key env not mounted by @bentohset in #4064
- Storage Initializer support single digit azure DNS zone ID by @bentohset in #4070
- Fix trust remote code encoder model by @sivanantha321 in #4043
- introduce the prepare-for-release.sh script by @spolti in #3993
- Model cache controller and node agent by @yuzisun in #4089
- Storage containers typo fix for Huggingface Storage type by @andyi2it in #4098
- Support datetime object serialization in v1/v2 response by @sivanantha321 in #4099
- Replace klog with klog/v2 by @sivanantha321 in #4093
- Add exception handling and logging for grpc server by @sivanantha321 in #4066
- Update ClusterLocalModel to LocalModelCache by @yuzisun in #4105
- Fix LocalModelCache controller reconciles deleted resource by @sivanantha321 in #4106
- Fix InferenceService state when Predictor pod in CrashLoopBackOff by @hdefazio in #4003
- LocalModelCache Admission Webhook by @HotsauceLee in #4102
- Add namespace to localmodel and localmodelnode ServiceAccount helm chart by @ritzdevp in #4111
- KServe VLLM cpu image by @AyushSawant18588 in #4049
- Update max_model_len calculation and fixup encoder pooling by @Datta0 in #4055
- chore: use patch instead of update for finalizer changes by @whynowy in #4072
- Fix isvc role localmodelcache permission by @sivanantha321 in #4131
- Detect missing models and redownload models by @greenmoon55 in #4095
- introduce service configuration at configmap level by @spolti in #3672
- Allow multiple node groups in the model cache CR by @greenmoon55 in #4134
- Annotation to disable model cache by @greenmoon55 in #4118
- Clean up jobs in model cache agent by @greenmoon55 in #4140
- Ensure Model root folder exists by @greenmoon55 in #4142
- Add NodeGroup Name Into PVC Name by @HotsauceLee in #4141
- Make LocalModel Agent reconciliation frequency configurable by @greenmoon55 in #4143
- Remove deepcopy-gen in favour of controller-gen by @sivanantha321 in #4109
- Add ability to set annotations on controller/webhook service and expose metrics bind port and address in helm chart by @mhowell24 in #4127
- Fix EOF error for downloading zip files by @Jonas-Bruns in #4082
- Remove redundant namespace yaml by @greenmoon55 in #4148
- Fix Localmodel agent build by @greenmoon55 in #4150
- Fix model server fails to gracefully shutdown by @sivanantha321 in #4116
- Ensure root model directory exists and add protection for jobs created by @yuzisun in #4152
- Enable transformer deeper readiness check tests by @sivanantha321 in #4121
- Update HuggingFace server dependencies versions by @AyushSawant18588 in #4147
- Add workflow for verifying go mod by @sivanantha321 in #4137
- Fix for CVE-2024-52304 - aiohttp upgrade by @andyi2it in #4113
- Allow other engine builders other than docker by @spolti in #3906
- Add localmodelnode crd to helm chart by @greenmoon55 in #4161
- Fixes Non-linear parsing of case-insensitive content by @spolti in #4158
- Helm chart - option to run daemonset as root by @greenmoon55 in #4164
- Replace nodeGroup with nodeGroups in charts/kserve-crd by @ritzdevp in #4166
- Add affinity and tolerations to localmodel daemonset by @greenmoon55 in #4173
- Fix s3 download PermanentRedirectError for legacy s3 endpoint by @bentohset in #4157
- Make label and annotation propagation configurable by @spolti in #4030
- Add ModelCache e2e test by @sivanantha321 in #4136
- Update vllm to 0.6.6 by @rajatvig in #4176
- [bugfix] fix s3 storage download filename bug by @anencore94 in #4162
- Add hf to storageuri prefix list by @tjandy98 in #4184
- Add Support for OpenAI-compatible Embeddings API by @FabianScheidt in #4129
- fix: typo in _construct_http_status_error method by @Mgla96 in #4190
- Fix raw logger e2e test by @sivanantha321 in #4185
- Feat: Support configuring isvc resource defaults by @andyi2it in #4032
- keep replicas when autoscaler set external by @Jooho in #4196
- Increase kserve controller readiness probe time period by @sivanantha321 in #4200
- Fix golangci-lint binary path selection based on GOBIN by @Jooho in #4198
- Add option to disable volume management in localModel config by @ritzdevp in #4186
- set MaxUnavailable(0%)/MaxSurge(100%) for rollingUpdate in multinode case by @Jooho in #4188
- Gracefully shutdown the router server by @sivanantha321 in #3367
- Add workflow for manual huggingface vLLM image publish by @sivanantha321 in #4092
- Feat: Gateway API Support - Raw Deployment by @sivanantha321 in #3952
- add make goal to build huggingface cpu image by @spolti in #4202
- Cleanup the filepath in createNewFile to avoid path traversal issue by @hdefazio in #4205
- Enhance multinode health_check python and manifests by @Jooho in #4197
- Publish 0.15-rc0 release by @yuzisun in #4213
New Contributors
- @ArjunBH...
v0.14.1
What's Changed
- Support datetime object serialization for v1/v2 response by @sivanantha321 in #4123
- Introduce LocalModelNode CR by @HotsauceLee in #3978
- Update Model Cache controller for LocalModelNode and implement LocalModel node agent by @HotsauceLee and @greenmoon55 in #4089
- Rename ClusterLocalModel to LocalModelCache by @yuzisun in #4105
- Detect missing models and redownload models by @greenmoon55 in #4095
- Allow multiple node groups in the model cache CR by @greenmoon55 in #4134
- Annotation to disable model cache by @greenmoon55 in #4118
- Clean up jobs in local model agent by @greenmoon55 in #4140
- Add node group to PVC name by @HotsauceLee in #4141
- Make local node agent reconciliation frequency configurable by @greenmoon55 in #4143
- Add LocalModelCache admission webhook by @HotsauceLee in #4102
- Fix model server fails to gracefully shutdown by @sivanantha321 in #4116
- Ensure root model directory exists and add protection for jobs created by @yuzisun in #4152
Full Changelog: v0.14.0...v0.14.1
v0.14.0
What's Changed
- Prevent the PassthroughCluster for clients/workloads in the service mesh by @israel-hdez in #3711
- Extract openai predict logic into smaller methods by @grandbora in #3716
- Bump MLServer to 1.5.0 by @sivanantha321 in #3740
- Refactor storage initializer to log model download time for all storage types by @sivanantha321 in #3735
- inferenceservice controller: fix error check in Serverless mode by @dtrifiro in #3753
- Add nccl package and Bump vLLM to 0.4.3 for huggingface runtime by @sivanantha321 in #3723
- Propagate `trust_remote_code` flag throughout vLLM startup by @calwoo in #3729
- Fix dead links on PyPI by @kevinbazira in #3754
- Fix model is ready even if there is no model by @HAO2167 in #3275
- Fix No model ready error in multi model serving by @sivanantha321 in #3758
- Initial implementation of Inference client by @sivanantha321 in #3401
- Fix logprobs for vLLM by @sivanantha321 in #3738
- Fix model name not properly parsed by inference graph by @sivanantha321 in #3746
- pillow - Buffer Overflow by @spolti in #3598
- Use add_generation_prompt while creating chat template by @Datta0 in #3775
- Deduplicate the names for the additional domain names by @houshengbo in #3773
- Make Virtual Service case-insensitive by @andyi2it in #3779
- Install packages needed for vllm model load by @gavrissh in #3802
- Make gRPC max message length configurable by @sivanantha321 in #3741
- Add readiness probe for MLServer and Increase memory for pmml in CI by @sivanantha321 in #3789
- Several bug fixes for vLLM completion endpoint by @sivanantha321 in #3788
- Increase timeout to make unit test stable by @Jooho in #3808
- Upgrade CI deps by @sivanantha321 in #3822
- Add tests for vLLM by @sivanantha321 in #3771
- Bump python to 3.11 for serving runtime images and Bump poetry to 1.8.3 by @sivanantha321 in #3812
- Bump vLLM to 0.5.3.post1 by @sivanantha321 in #3828
- Refactor the ModelServer to let uvicorn handle multiple workers and use 'spawn' for multiprocessing by @sivanantha321 in #3757
- Update golang for docs/Dockerfile to 1.21 by @spolti in #3761
- Make ray an optional dependency by @sivanantha321 in #3834
- Update aif example by @spolti in #3765
- Use helm for quick installation by @sivanantha321 in #3813
- Allow KServe to have its own local gateways for Serverless mode by @israel-hdez in #3737
- Add support for Azure DNS zone endpoints by @tjandy98 in #3819
- Fix failed build for knativeLocalGatewayService by @yuzisun in #3866
- Add logging request feature for vLLM backend by @sivanantha321 in #3849
- Bump vLLM to 0.5.4 by @sivanantha321 in #3874
- Fix: Add workaround for snyk image scan failure by @sivanantha321 in #3880
- Fix trust_remote_code not working with huggingface backend by @sivanantha321 in #3879
- Update KServe 2024-2025 Roadmap by @yuzisun in #3810
- Configurable image pull secrets in Helm charts by @saileshd1402 in #3838
- Fix issue with rolling update behavior by @andyi2it in #3786
- Fix the 'tokens exceeding model limit' error response in vllm server by @saileshd1402 in #3886
- Add support for binary data extension protocol and FP16 datatype by @sivanantha321 in #3685
- Protobuf version upgrade 4.25.4 by @andyi2it in #3881
- Adds optional labels and annotations to the controller by @guitouni in #3366
- Enable Server-Side Apply for Kustomize Overlays in Test Environment by @Jooho in #3877
- bugfix: update image_transformer.py to handle changes in input structure by @zwong91 in #3830
- support text embedding task in hugging face server by @kevinmingtarja in #3743
- Rename max_length parameter to max_model_len to be in sync with vLLM by @Datta0 in #3827
- [Upstream] - Update-istio version based on go version 1.21 by @mholder6 in #3825
- Enrich isvc NotReady events for failed conditions by @asdqwe123zxc in #3303
- adding metadata on requests by @gcemaj in #3635
- Publish 0.14.0-rc0 release by @yuzisun in #3867
- Use API token for publishing package to PyPI by @sivanantha321 in #3896
- Fix sdlc broken when kserve installed using helm by @sivanantha321 in #3890
- Add Security Context and Resources to RBAC Proxy by @HotsauceLee in #3898
- Remove unwanted cluster scope secret permissions by @sivanantha321 in #3893
- bump to vllm 0.5.5 by @lizzzcai in #3911
- pin gosec to 2.20.0 by @greenmoon55 in #3921
- add a new doc 'common issues and solutions' by @Jooho in #3878
- Implement health endpoint for vLLM backend by @sivanantha321 in #3850
- Add security best practices for inferenceservice, inferencegraph, servingruntimes by @sivanantha321 in #3917
- Bump Go to 1.22 by @sivanantha321 in #3912
- bump to vllm 0.6.0 by @hustxiayang in #3934
- Set the volume mount's readonly annotation based on the ISVC annotation by @hdefazio in #3885
- mount /dev/shm volume to huggingfaceserver by @lizzzcai in #3910
- Fix permission error in snyk scan by @sivanantha321 in #3889
- Cluster Local Model CR by @greenmoon55 in #3839
- added http headers to inbound request by @andyi2it in #3895
- Add prow-github-action by @sivanantha321 in #3888
- Add TLS support for Inference Loggers by @ruivieira in #3863
- Fix explainer endpoint not working with path based routing by @sivanantha321 in #3257
- Fix ingress configuration for path based routing and update go mod by @sivanantha321 in #3944
- Add HostIPC field to ServingRuntimePodSpec by @greenmoon55 in #3943
- remove conversion webhook part from self-signed-ca.sh by @Jooho in #3941
- update fluid kserve sample to use huggingface servingruntime by @lizzzcai in #3907
- bump to vLLM 0.6.1.post2 by @hustxiayang in #3948
- Add NodeDownloadPending status to ClusterLocalModel by @greenmoon55 in #3955
- add tags to rest server timing logs to differentiate cpu and wall time by @gfkeith in #3954
- Implement Huggingface model download in storage initializer by @andyi2it in #3584
- Update OWNERS file by @yuzisun in #3966
- Cluster local model controller by @greenmoon55 in #3860
- Prepare for 0.14.0-rc1 release and automate sync process by @sivanantha321 in #3970
- add a new API for multi-node/multi-gpu by @Jooho in #3871
- Fix update-openapigen.sh that can be executed from kserve dir by @Jooho in #3924
- Add python 3.12 support and remove python 3.8 support by @sivanantha321 in #3645
- Fix openssl vulnerability CWE-1395 by @sivanantha321 in #3975
- Fix Kubernetes Doc Links by @jyono in #3670
- Fix kserve local testing env by @yuzisun in #3981
- Fix streaming response not working properly with logger by @sivanantha321 in #3847
- Add a flag for automount serviceaccount token by @greenmoon55 in https://github.com/kserve/ks...