Releases: kserve/kserve
v0.16.0
What's Changed
- chore: remove 'default' suffix compatibility by @sivanantha321 in #4178
- Upgrade Torch to v2.6.0 everywhere by @ashahba in #4450
- chore: drop pydantic v1 support by @sivanantha321 in #4353
- fix: Update TextIteratorStreamer to skip special tokens by @sivanantha321 in #4490
- Add Jooho to approvers in OWNERS file by @terrytangyuan in #4504
- Rename CRD file to reflect all KServe CRDs (Fixes #4396) by @WHITE-ICE-BOX in #4494
- Update kserve-resources helm chart to disable desired servingruntimes by @jmlaubach in #4485
- upgrade vllm to v0.9.0 and Torch to v2.7.0 by @ashahba in #4501
- Upgrade vLLM to v0.9.0.1 by @ashahba in #4507
- Initial segregation of the storage module from KServe SDK by @spolti in #4391
- Fix pss restricted warnings by @akagami-harsh in #4327
- Fix: do not update poetry dependency when install hf cpu deps by @yuzisun in #4516
- [Bug] Fixes error in trace logging by @gavrissh in #4514
- Stop and resume a model [Raw Deployment] by @hdefazio in #4455
- Resolve inference endpoint using runtime protocol when applicable by @israel-hdez in #4527
- fix(codegen): pins code-generator binaries version by @bartoszmajsak in #4533
- Allow to set custom timeouts for InferenceGraph router by @lifo9 in #4218
- feat: support remote storage URI injection for serving runtimes by @de0725 in #4492
- [API] Define LLMInferenceService and LLMInferenceServiceConfig types and CRDs by @pierDipi in #4522
- Stop and Resume a transformer by @hdefazio in #4534
- Allow OCI for multi-node/multi-gpu by @israel-hdez in #4441
- 4380 - Inference logging to blob storage by @cjohannsen-cloudera in #4473
- Fix outdated BentoML import in sample code (BentoService no longer available in v1.x) by @YehCC52 in #4540
- Auto-update annotation for isvc. by @andresllh in #4342
- Stop and resume an explainer by @hdefazio in #4546
- fix: unset clampMax and clampMin, since they are not for replicas by @houshengbo in #4556
- refactor: Enhance HTTPRoute readiness checks by @sivanantha321 in #4543
- Refactor KServe to use global context for PredictorConfig by @sivanantha321 in #4526
- feat: refactor storage initializer resources configuration by @takamai06 in #4411
- feat(envtest): simplifies CRD lookup by @bartoszmajsak in #4564
- llmisvc: Initial controller scaffold and helm chart by @sivanantha321 in #4557
- Add logic to merge specs for LLMInferenceService by @VedantMahabaleshwarkar in #4563
- fix: Allow CA bundle path without config map by @fabiendupont in #4451
- docs: fixes invalid openshift subscription by @bartoszmajsak in #4572
- Add Code Coverage change report for PRs by @andyi2it in #4487
- feat: support secure access to prometheus in keda by @andyi2it in #4384
- feat: switch kserve from poetry to uv by @andyi2it in #4407
- chore(utils): simplifies code using generics by @bartoszmajsak in #4578
- Add AOT flashinfer build to huggingfaceserver dockerfile to precompil… by @AyushSawant18588 in #4567
- feat: improves collection helpers by @bartoszmajsak in #4579
- Add Missing config file for code coverage by @andyi2it in #4581
- Fix: Support multiple metrics in OpenTelemetryCollector for autoscaling by @houshengbo in #4591
- Upgrade vllm to v0.9.2 by @AyushSawant18588 in #4586
- Remove unused Strategy interface from sharding package by @houshengbo in #4590
- fix(test): uses local httptest server urls for downloader tests by @bartoszmajsak in #4601
- Fixed the issue of the same metrics across different deployments under different namespaces by @houshengbo in #4593
- feat: Add disable postprocessing option for raw logits by @sivanantha321 in #4566
- Stop and resume an inference graph by @hdefazio in #4588
- Disallow name field in standard predictor by @HutakiHare in #4535
- fix: missing pytest-asyncio session scope marker in test_kserve_logger_cipn by @sivanantha321 in #4607
- update CRDs for LWS based multi node support by @VedantMahabaleshwarkar in #4596
- Add LLM InferenceService base configurations by @VedantMahabaleshwarkar in #4613
- fix: fixes misleading print for parallelism when $# > 2 (#784) by @bartoszmajsak in #4619
- Remove storage spec from LLMIsvc by @israel-hdez in #4622
- feat: speed up ci by @ls-2018 in #4600
- fix(deps): adds missing LLMInferenceService types for python imports by @bartoszmajsak in #4604
- Add llm-d scheduler reconciler by @andresllh in #4614
- [Followup] Improve the stop/resume Inference Service tests and status by @hdefazio in #4636
- Added advanced config for ScaleUp and ScaleDown by @andyi2it in #4570
- add resource default for otel collector container limit in configmap by @andyi2it in #4633
- Protobuf version bump by @karolpustelnik in #4634
- Stop and resume an inference graph [Raw Deployment] by @hdefazio in #4637
- Fix typos in the repo by @mwaykole in #4641
- llm-d workload reconciliation by @brettmthompson in #4616
- llm-d HTTP route reconciler by @andresllh in #4617
- feat(webhook): introduces Validating Webhook for LLMInferenceServiceConfig by @VedantMahabaleshwarkar in #4630
- fix: Adds support for NVIDIA MIG GPU resource detection by @sivanantha321 in #4642
- feat(webhook): introduces Validating Webhook for LLMInferenceService by @VedantMahabaleshwarkar in #4631
- Add logic for LLMISVC controller by @VedantMahabaleshwarkar in #4632
- Fix quickstart link by @thesteve0 in #4654
- ci: skip workflows when only markdown files are changed by @Jooho in #4650
- refactor: 4602 - Refactor "RawDeployment" to "Standard" and "Serverless" to "KNative" to clarify usage by @cjohannsen-cloudera in #4608
- fix: README links to KServe website updated by @dominikkawka in #4656
- fix: allow digests in runtimes' images by @tmvfb in #4653
- [llmisvc] Improve config merge and update well-known presets by @pierDipi in #4663
- [llmisvc] Support cluster-scoped objects in generic CRUD functions by @pierDipi in #4664
- fix: fix snyk scan sarif file upload by @sivanantha321 in #4660
- fix: defaults GITHUB_SHA for graph images by @bartoszmajsak in #4620
- Promote new KServe Storage module by @spolti in #4625
- fix: escape HTML characters in api comments to fix syntax errors in website docs by @sivanantha321 in #4662
- Implement the progressive rollout for raw deployment by @houshengbo in #4623
- fix: dry run update for keda autoscaling loop by @andyi2it in #4587
- Fix HF Token Vulnerability in Storage Initializer Container by @brettmthompson in #4677
- docs: update Kafka sample path after file relocation by @1lyvianis in #4680
- Last Deployment Status Should Reflect Deployment Status by @HotsauceLee in #4667
- Fix: Update ModelCopies.TotalCopies for all model states by @hardik-menger in #4676
- Prepare for 0.16.0-rc0 release by @houshengbo in https://github.com/kserve/kserve/pull...
v0.16.0-rc1
What's Changed
- deprecate: remove EnableDirectPvcVolumeMount flag by @anurags25 in #4694
- Fix autoscaling tests duration and crashes by @andyi2it in #4688
- Avoid Pervasive Logging of SA Not Found Errors by Credential Builder by @brettmthompson in #4696
- fix: correct llmisvc Dockerfile reference in image publish workflow by @sivanantha321 in #4705
- llmisvc: fix RBAC, templating, and adds quick install script by @sivanantha321 in #4698
- Fixed the panic nil pointer issue componentExt being nil by @houshengbo in #4704
- fix: Add disk space cleanup step to Docker publish workflows by @sivanantha321 in #4717
- Add Star History section to README by @terrytangyuan in #4719
- docs: Mention CNCF in project README by @terrytangyuan in #4718
- Enabled the configuration options in Helm for opentelemetryCollector and autoscaler by @houshengbo in #4725
- Time Series Forecast API Endpoint by @jinan-zhou in #4615
- llmisvc dev & kustomize setup, add webhook config to helm chart by @sivanantha321 in #4712
- Revise KServe overview and enhance features section by @terrytangyuan in #4721
- Fix incorrect entrypoint in llmisvc Dockerfile by @Jooho in #4730
- Configure HF Downloads to Lower Memory Usage by @brettmthompson in #4726
- Temporarily disable SSL to unblock e2e tests by @Jooho in #4731
- Use a wrapper struct to accept resource.Quantity and keep the original input by @houshengbo in #4699
- Support Multiple Storage URIs for InferenceServices by @anurags25 in #4702
- Fix: prepend the KO_DOCKER_REPOSITORY to the base docker build to allow local publishing of the controller by @cjohannsen-cloudera in #4736
- Add Cert Manager installation to llmisvc quick install script by @sivanantha321 in #4733
- Injecting CA Bundle Into Storage Initializer Container for S3 Storage on LLMISVC Reconciliation by @brettmthompson in #4728
- Add metadata propagation for Kueue configurations to both Deployment and LeaderWorkerSet workloads by @hdefazio in #4747
- Add Support for Configuring S3 Storage via Secret Data by @brettmthompson in #4727
- 4739 - Fix: blob storage for inference logging recognizes embedded spec by @cjohannsen-cloudera in #4740
- Add llmd e2e tests by @andresllh in #4729
- Prepare for 0.16.0-rc1 release by @houshengbo in #4732
- Update the kserve-storage module to the latest version by @spolti in #4754
- feature: 4553 - Support inference logging to GCS and Azure by @cjohannsen-cloudera in #4582
- fix(helm-chart): expose uidModelcar in the chart by @maciej-tatarski in #4689
New Contributors
- @anurags25 made their first contribution in #4694
- @jinan-zhou made their first contribution in #4615
- @maciej-tatarski made their first contribution in #4689
Full Changelog: v0.16.0-rc0...v0.16.0-rc1
v0.16.0-rc0
What's Changed
- chore: remove 'default' suffix compatibility by @sivanantha321 in #4178
- Upgrade Torch to v2.6.0 everywhere by @ashahba in #4450
- chore: drop pydantic v1 support by @sivanantha321 in #4353
- fix: Update TextIteratorStreamer to skip special tokens by @sivanantha321 in #4490
- Add Jooho to approvers in OWNERS file by @terrytangyuan in #4504
- Rename CRD file to reflect all KServe CRDs (Fixes #4396) by @WHITE-ICE-BOX in #4494
- Update kserve-resources helm chart to disable desired servingruntimes by @jmlaubach in #4485
- upgrade vllm to v0.9.0 and Torch to v2.7.0 by @ashahba in #4501
- Upgrade vLLM to v0.9.0.1 by @ashahba in #4507
- Initial segregation of the storage module from KServe SDK by @spolti in #4391
- Fix pss restricted warnings by @akagami-harsh in #4327
- Fix: do not update poetry dependency when install hf cpu deps by @yuzisun in #4516
- [Bug] Fixes error in trace logging by @gavrissh in #4514
- Stop and resume a model [Raw Deployment] by @hdefazio in #4455
- Resolve inference endpoint using runtime protocol when applicable by @israel-hdez in #4527
- fix(codegen): pins code-generator binaries version by @bartoszmajsak in #4533
- Allow to set custom timeouts for InferenceGraph router by @lifo9 in #4218
- feat: support remote storage URI injection for serving runtimes by @de0725 in #4492
- [API] Define LLMInferenceService and LLMInferenceServiceConfig types and CRDs by @pierDipi in #4522
- Stop and Resume a transformer by @hdefazio in #4534
- Allow OCI for multi-node/multi-gpu by @israel-hdez in #4441
- 4380 - Inference logging to blob storage by @cjohannsen-cloudera in #4473
- Fix outdated BentoML import in sample code (BentoService no longer available in v1.x) by @YehCC52 in #4540
- Auto-update annotation for isvc. by @andresllh in #4342
- Stop and resume an explainer by @hdefazio in #4546
- fix: unset clampMax and clampMin, since they are not for replicas by @houshengbo in #4556
- refactor: Enhance HTTPRoute readiness checks by @sivanantha321 in #4543
- Refactor KServe to use global context for PredictorConfig by @sivanantha321 in #4526
- feat: refactor storage initializer resources configuration by @takamai06 in #4411
- feat(envtest): simplifies CRD lookup by @bartoszmajsak in #4564
- llmisvc: Initial controller scaffold and helm chart by @sivanantha321 in #4557
- Add logic to merge specs for LLMInferenceService by @VedantMahabaleshwarkar in #4563
- fix: Allow CA bundle path without config map by @fabiendupont in #4451
- docs: fixes invalid openshift subscription by @bartoszmajsak in #4572
- Add Code Coverage change report for PRs by @andyi2it in #4487
- feat: support secure access to prometheus in keda by @andyi2it in #4384
- feat: switch kserve from poetry to uv by @andyi2it in #4407
- chore(utils): simplifies code using generics by @bartoszmajsak in #4578
- Add AOT flashinfer build to huggingfaceserver dockerfile to precompil… by @AyushSawant18588 in #4567
- feat: improves collection helpers by @bartoszmajsak in #4579
- Add Missing config file for code coverage by @andyi2it in #4581
- Fix: Support multiple metrics in OpenTelemetryCollector for autoscaling by @houshengbo in #4591
- Upgrade vllm to v0.9.2 by @AyushSawant18588 in #4586
- Remove unused Strategy interface from sharding package by @houshengbo in #4590
- fix(test): uses local httptest server urls for downloader tests by @bartoszmajsak in #4601
- Fixed the issue of the same metrics across different deployments under different namespaces by @houshengbo in #4593
- feat: Add disable postprocessing option for raw logits by @sivanantha321 in #4566
- Stop and resume an inference graph by @hdefazio in #4588
- Disallow name field in standard predictor by @HutakiHare in #4535
- fix: missing pytest-asyncio session scope marker in test_kserve_logger_cipn by @sivanantha321 in #4607
- update CRDs for LWS based multi node support by @VedantMahabaleshwarkar in #4596
- Add LLM InferenceService base configurations by @VedantMahabaleshwarkar in #4613
- fix: fixes misleading print for parallelism when $# > 2 (#784) by @bartoszmajsak in #4619
- Remove storage spec from LLMIsvc by @israel-hdez in #4622
- feat: speed up ci by @ls-2018 in #4600
- fix(deps): adds missing LLMInferenceService types for python imports by @bartoszmajsak in #4604
- Add llm-d scheduler reconciler by @andresllh in #4614
- [Followup] Improve the stop/resume Inference Service tests and status by @hdefazio in #4636
- Added advanced config for ScaleUp and ScaleDown by @andyi2it in #4570
- add resource default for otel collector container limit in configmap by @andyi2it in #4633
- Protobuf version bump by @karolpustelnik in #4634
- Stop and resume an inference graph [Raw Deployment] by @hdefazio in #4637
- Fix typos in the repo by @mwaykole in #4641
- llm-d workload reconciliation by @brettmthompson in #4616
- llm-d HTTP route reconciler by @andresllh in #4617
- feat(webhook): introduces Validating Webhook for LLMInferenceServiceConfig by @VedantMahabaleshwarkar in #4630
- fix: Adds support for NVIDIA MIG GPU resource detection by @sivanantha321 in #4642
- feat(webhook): introduces Validating Webhook for LLMInferenceService by @VedantMahabaleshwarkar in #4631
- Add logic for LLMISVC controller by @VedantMahabaleshwarkar in #4632
- Fix quickstart link by @thesteve0 in #4654
- ci: skip workflows when only markdown files are changed by @Jooho in #4650
- refactor: 4602 - Refactor "RawDeployment" to "Standard" and "Serverless" to "KNative" to clarify usage by @cjohannsen-cloudera in #4608
- fix: README links to KServe website updated by @dominikkawka in #4656
- fix: allow digests in runtimes' images by @tmvfb in #4653
- [llmisvc] Improve config merge and update well-known presets by @pierDipi in #4663
- [llmisvc] Support cluster-scoped objects in generic CRUD functions by @pierDipi in #4664
- fix: fix snyk scan sarif file upload by @sivanantha321 in #4660
- fix: defaults GITHUB_SHA for graph images by @bartoszmajsak in #4620
- Promote new KServe Storage module by @spolti in #4625
- fix: escape HTML characters in api comments to fix syntax errors in website docs by @sivanantha321 in #4662
- Implement the progressive rollout for raw deployment by @houshengbo in #4623
- fix: dry run update for keda autoscaling loop by @andyi2it in #4587
- Fix HF Token Vulnerability in Storage Initializer Container by @brettmthompson in #4677
- docs: update Kafka sample path after file relocation by @1lyvianis in #4680
- Last Deployment Status Should Reflect Deployment Status by @HotsauceLee in #4667
- Fix: Update ModelCopies.TotalCopies for all model states by @hardik-menger in #4676
- Prepare for 0.16.0-rc0 release by @houshengbo in https://github.com/kserve/kserve/pull...
v0.15.2
What's Changed
- Fixes CVE-2025-43859 by @spolti in #4468
- config: enable ModelCar by default by @tarilabs in #4316
- fix: huggingface e2e test output mismatch and add tests for stream requests by @sivanantha321 in #4482
- Rework the order in which the knative autoscaler configmap is read during reconciliation by @brettmthompson in #4471
- Add predictor_config to ModelServer init function by @greenmoon55 in #4491
- docs: enhance security documentation with detailed reporting and prevention mechanisms by @sivanantha321 in #4495
- fix: update workflow to use ubuntu-latest for rerun PR tests by @sivanantha321 in #4496
- Generate Release 0.15.2 by @yuzisun in #4497
New Contributors
Full Changelog: v0.15.1...v0.15.2
v0.15.1
What's Changed
- fix typo on inferenceservice-config by @spolti in #4244
- Bump Go version to 1.24 by @sivanantha321 in #4321
- CI: Increase timeout for REST client connections to improve reliability by @sivanantha321 in #4355
- Localmodel agent can watch node groups by @greenmoon55 in #4362
- Fixing flake8 linter error. by @andresllh in #4354
- Update Huggingface Transformer to 4.50.3 by @rajatvig in #4351
- fix: Register LoRA model name in model registry to avoid not found error by @sivanantha321 in #4352
- Update Huggingface Transformer to 4.51.0 and huggingface-hub for kserve by @rajatvig in #4364
- Router config fixes in configmap template by @tmbochenski in #4369
- Chore: Deprecate Openvino support in HF runtime by @gavrissh in #4379
- vLLM V1 support for HF Server Runtime by @gavrissh in #4368
- Support Numpy 2.x by @sivanantha321 in #4386
- [Model cache] Do not remove PVC and PV after isvc deletion by @greenmoon55 in #4390
- Fix Flaky multi processing tests by @andyi2it in #4383
- Rerank support for vLLM in HuggingFace Serving Runtime by @AyushSawant18588 in #4376
- Upgrade vllm to support Llama4 by @gavrissh in #4388
- Adding bitsandbytes package for 4 bit support by @johnugeorge in #4406
- Fix: Isvc matched with wrong ModelCache by @HotsauceLee in #4398
- Fix: remove duplicated OpenAIGenerativeModel in init by @huazq in #4399
- Feat: Allow inference service metadata injection at agent sidecar level for payload logging by @tylerhyang in #4325
- Fix for KEDA scaledobject target value is set to pointer instead of specified value by @andyi2it in #4373
- MultiNode change a logic to calculate ray node count and gpu count by @Jooho in #4356
- Remove internal annotations when no cache resource is matched by @greenmoon55 in #4412
- Add DeploymentMode to InferenceService and InferenceGraph status and prevent deploymentMode change by @israel-hdez in #4423
- Feat: should not NewHPAReconcile for external HPA by @johnzheng1975 in #4363
- chore: Upgrade prow-github-actions to version 2 in workflow files by @sivanantha321 in #4417
- Fix: update external autoscaler tests missed from PR 4363 by @Jooho in #4438
- Update OWNERS by @terrytangyuan in #4442
- Upgrade vLLM to support Qwen3 by @gavrissh in #4434
- InferenceGraph: Fix response code when condition step is not fulfilled by @israel-hdez in #4429
- Stop and resume a model by adding a new annotation [Serverless] by @hdefazio in #4337
- Improve Handling of Knative Autoscaler Configuration by @brettmthompson in #4394
- chore: adds CNCF Code of Conduct by @bartoszmajsak in #4458
- chore: Reenable Docker workflows to support arm64 build by @sivanantha321 in #4446
- Fix issue with precommit hook by @andyi2it in #4456
- Fix raw deployment update by @andyi2it in #4445
- update golangci-lint to 1.64.8 by @ashahba in #4459
- fix: corrects links to translations by @bartoszmajsak in #4461
- chore: Include third party licenses, Add license checker, Enable SBOM Generation for images by @sivanantha321 in #4416
- LMCache Integration with vLLM runtime by @sivanantha321 in #4320
- Publish 0.15.1 release by @greenmoon55 in #4466
- Fix: add type specification for nthread argument in argument parser by @sivanantha321 in #4410
- Improve code coverage by @andyi2it in #4385
- Fixes vLLM V1 failures: Revert back the approach to initiate the background engine task by @gavrissh in #4470
New Contributors
- @andresllh made their first contribution in #4354
- @tmbochenski made their first contribution in #4369
- @huazq made their first contribution in #4399
- @johnzheng1975 made their first contribution in #4363
- @brettmthompson made their first contribution in #4394
- @bartoszmajsak made their first contribution in #4458
Full Changelog: v0.15.0...v0.15.1
v0.15.0
What's Changed
- bump to vLLM 0.6.2 and add explicit chat template by @hustxiayang in #3964
- bump to vLLM 0.6.3 by @hustxiayang in #4001
- Feature: Add hf transfer by @tjandy98 in #4000
- Fix snyk scan null error by @sivanantha321 in #3974
- Update quick install script by @johnugeorge in #4005
- Local Model Node CR by @HotsauceLee in #3978
- Reduce E2Es dependency on CI environment (2) by @israel-hdez in #4008
- Allow GCS to download single file by @spolti in #4015
- bump to vLLM 0.6.3.post1 by @hustxiayang in #4023
- Set default for SamplingParams.max_tokens in OpenAI requests if unset by @kevinmingtarja in #4020
- Add tools functionality to vLLM by @ArjunBhalla98 in #4033
- For vllm users, our parser should be able to support both - and _ by @hustxiayang in #3933
- Add tools unpacking for vLLM by @ArjunBhalla98 in #4035
- Multi-Node Inference Implementation by @Jooho in #3972
- Enhance InjectAgent to Handle Only HTTPGet, TCP Readiness Probes by @LOADBC in #4012
- Feat: Fix memory issue by replacing io.ReadAll with io.Copy (#4017) by @ops-jaeha in #4018
- Update alibiexplainer example by @spolti in #4004
- Fix huggingface build runs out of storage in CI by @sivanantha321 in #4044
- Update snyk scan to include new images by @sivanantha321 in #4042
- Introducing KServe Guru on Gurubase.io by @kursataktas in #4038
- Fix Hugging Face server EncoderModel not returning probabilities by correctly passing --return_probabilities flag (#3958) by @oplushappy in #4024
- Add deeper readiness check for transformer by @sivanantha321 in #3348
- Fix Starlette Denial of service (DoS) via multipart/form-data by @spolti in #4006
- remove duplicated import "github.com/onsi/gomega" by @carlory in #4051
- Fix localmodel controller name in snyk scan workflow by @sivanantha321 in #4054
- Fix azure blob storage access key env not mounted by @bentohset in #4064
- Storage Initializer support single digit azure DNS zone ID by @bentohset in #4070
- Fix trust remote code encoder model by @sivanantha321 in #4043
- introduce the prepare-for-release.sh script by @spolti in #3993
- Model cache controller and node agent by @yuzisun in #4089
- Storage containers typo fix for Huggingface Storage type by @andyi2it in #4098
- Support datetime object serialization in v1/v2 response by @sivanantha321 in #4099
- Replace klog with klog/v2 by @sivanantha321 in #4093
- Add exception handling and logging for grpc server by @sivanantha321 in #4066
- Update ClusterLocalModel to LocalModelCache by @yuzisun in #4105
- Fix LocalModelCache controller reconciles deleted resource by @sivanantha321 in #4106
- Fix InferenceService state when Predictor pod in CrashLoopBackOff by @hdefazio in #4003
- LocalModelCache Admission Webhook by @HotsauceLee in #4102
- Add namespace to localmodel and localmodelnode ServiceAccount helm chart by @ritzdevp in #4111
- KServe VLLM cpu image by @AyushSawant18588 in #4049
- Update max_model_len calculation and fixup encoder pooling by @Datta0 in #4055
- chore: use patch instead of update for finalizer changes by @whynowy in #4072
- Fix isvc role localmodelcache permission by @sivanantha321 in #4131
- Detect missing models and redownload models by @greenmoon55 in #4095
- introduce service configuration at configmap level by @spolti in #3672
- Allow multiple node groups in the model cache CR by @greenmoon55 in #4134
- Annotation to disable model cache by @greenmoon55 in #4118
- Clean up jobs in model cache agent by @greenmoon55 in #4140
- Ensure Model root folder exists by @greenmoon55 in #4142
- Add NodeGroup Name Into PVC Name by @HotsauceLee in #4141
- Make LocalModel Agent reconciliation frequency configurable by @greenmoon55 in #4143
- Remove deepcopy-gen in favour of controller-gen by @sivanantha321 in #4109
- Add ability to set annotations on controller/webhook service and expose metrics bind port and address in helm chart by @mhowell24 in #4127
- Fix EOF error for downloading zip files by @Jonas-Bruns in #4082
- Remove redundant namespace yaml by @greenmoon55 in #4148
- Fix Localmodel agent build by @greenmoon55 in #4150
- Fix model server fails to gracefully shutdown by @sivanantha321 in #4116
- Ensure root model directory exists and add protection for jobs created by @yuzisun in #4152
- Enable transformer deeper readiness check tests by @sivanantha321 in #4121
- Update HuggingFace server dependencies versions by @AyushSawant18588 in #4147
- Add workflow for verifying go mod by @sivanantha321 in #4137
- Fix for CVE-2024-52304 - aiohttp upgrade by @andyi2it in #4113
- Allow other engine builders other than docker by @spolti in #3906
- Add localmodelnode crd to helm chart by @greenmoon55 in #4161
- Fixes Non-linear parsing of case-insensitive content by @spolti in #4158
- Helm chart - option to run daemonset as root by @greenmoon55 in #4164
- Replace nodeGroup with nodeGroups in charts/kserve-crd by @ritzdevp in #4166
- Add affinity and tolerations to localmodel daemonset by @greenmoon55 in #4173
- Fix s3 download PermanentRedirectError for legacy s3 endpoint by @bentohset in #4157
- Make label and annotation propagation configurable by @spolti in #4030
- Add ModelCache e2e test by @sivanantha321 in #4136
- Update vllm to 0.6.6 by @rajatvig in #4176
- [bugfix] fix s3 storage download filename bug by @anencore94 in #4162
- Add hf to storageuri prefix list by @tjandy98 in #4184
- Add Support for OpenAI-compatible Embeddings API by @FabianScheidt in #4129
- fix: typo in _construct_http_status_error method by @Mgla96 in #4190
- Fix raw logger e2e test by @sivanantha321 in #4185
- Feat: Support configuring isvc resource defaults by @andyi2it in #4032
- keep replicas when autoscaler set external by @Jooho in #4196
- Increase kserve controller readiness probe time period by @sivanantha321 in #4200
- Fix golangci-lint binary path selection based on GOBIN by @Jooho in #4198
- Add option to disable volume management in localModel config by @ritzdevp in #4186
- set MaxUnavailable(0%)/MaxSurge(100%) for rollingUpdate in multinode case by @Jooho in #4188
- Gracefully shutdown the router server by @sivanantha321 in #3367
- Add workflow for manual huggingface vLLM image publish by @sivanantha321 in #4092
- Feat: Gateway API Support - Raw Deployment by @sivanantha321 in #3952
- add make goal to build huggingface cpu image by @spolti in #4202
- Cleanup the filepath in createNewFile to avoid path traversal issue by @hdefazio in #4205
- Enhance multinode health_check python and manifests by @Jooho in #4197
- Publish 0.15-rc0 release by @yuzisun in #4213
- Fix Gateway API flaky test by @...
v0.15.0-rc1
What's Changed
- Fix Gateway API flaky test by @sivanantha321 in #4214
- Remove linux/arm64/v8 as platform option to fix build errors by @gavrissh in #4217
- Fix: typo in inferenceservice configmap by @sukumargaonkar in #4215
- Fix CI not using localmodelnode agent dev image by @sivanantha321 in #4221
- Fix model download path by @hakuro95 in #4112
- Support Multiple NodeGroups In LocalModelCache by @HotsauceLee in #4170
- Inference Graph: use plain text HTTP when part of Istio Mesh by @israel-hdez in #4031
- Better compatibility with in-place upgrades by @israel-hdez in #4234
- Increase request timeout seconds for art explainer by @sivanantha321 in #4241
- fix: add trainedmodels custom resource to kubeflow-kserve clusterroles by @gigabyte132 in #4225
- Fix CVE-2025-24357 and Bump vLLM to 0.7.2 by @sivanantha321 in #4223
- Use Go 1.23 to build kserve and update mod versions by @rajatvig in #4239
- install: Remove modelmesh installation from helm chart by @sivanantha321 in #4243
- Bump golang-lint to 1.63 and fix all linter errors by @sivanantha321 in #3967
- Issue 4248: Request Logger with Multiple Metadata Headers fail by @tylerhyang in #4249
- Add predictor healthcheck to OpenAIProxyModel by @greenmoon55 in #4250
- Expose podSpec fields for Inferencegraph by @sivanantha321 in #4091
- Fix localmodel test by @greenmoon55 in #4268
- Force symlink for ModelCar by @pmtk in #4274
- Refactor vLLM + Embed support by @gavrissh in #4177
- Fix triton health check by @greenmoon55 in #4277
- Upgrade vLLM version to 0.7.3 by @gavrissh in #4281
- 0.15.0-rc1 release by @greenmoon55 in #4285
- Add model_version field to InferRequest by @greenmoon55 in #4287
- (Bug #4273) quick_install.sh failed to uninstall incomplete installation and has small syntax bug by @zozowell in #4275
- update openshift guide by @spolti in #4210
- Collocation transformer and predictor spec by @sivanantha321 in #4255
- Move arguments from 'args' to 'command' for huggingface server multinode SR by @Jooho in #4289
- Include reasoning parser option in vLLM for reasoning models by @gavrissh in #4282
- KServe Keda Integration by @andyi2it in #3652
- add huggingfaceserver-multinode to helm chart by @Jooho in #4293
- Add missing CRDs for Keda by @andyi2it in #4296
New Contributors
- @hakuro95 made their first contribution in #4112
- @gigabyte132 made their first contribution in #4225
- @tylerhyang made their first contribution in #4249
- @pmtk made their first contribution in #4274
- @zozowell made their first contribution in #4275
Full Changelog: v0.15.0-rc0...v0.15.0-rc1
v0.15.0-rc0
What's Changed
- bump to vLLM 0.6.2 and add explicit chat template by @hustxiayang in #3964
- bump to vLLM 0.6.3 by @hustxiayang in #4001
- Feature: Add hf transfer by @tjandy98 in #4000
- Fix snyk scan null error by @sivanantha321 in #3974
- Update quick install script by @johnugeorge in #4005
- Local Model Node CR by @HotsauceLee in #3978
- Reduce E2Es dependency on CI environment (2) by @israel-hdez in #4008
- Allow GCS to download single file by @spolti in #4015
- bump to vLLM 0.6.3.post1 by @hustxiayang in #4023
- Set default for SamplingParams.max_tokens in OpenAI requests if unset by @kevinmingtarja in #4020
- Add tools functionality to vLLM by @ArjunBhalla98 in #4033
- For vllm users, our parser should be able to support both - and _ by @hustxiayang in #3933
- Add tools unpacking for vLLM by @ArjunBhalla98 in #4035
- Multi-Node Inference Implementation by @Jooho in #3972
- Enhance InjectAgent to Handle Only HTTPGet, TCP Readiness Probes by @LOADBC in #4012
- Feat: Fix memory issue by replacing io.ReadAll with io.Copy (#4017) by @ops-jaeha in #4018
- Update alibiexplainer example by @spolti in #4004
- Fix huggingface build runs out of storage in CI by @sivanantha321 in #4044
- Update snyk scan to include new images by @sivanantha321 in #4042
- Introducing KServe Guru on Gurubase.io by @kursataktas in #4038
- Fix Hugging Face server EncoderModel not returning probabilities by correctly passing --return_probabilities flag (#3958) by @oplushappy in #4024
- Add deeper readiness check for transformer by @sivanantha321 in #3348
- Fix Starlette Denial of service (DoS) via multipart/form-data by @spolti in #4006
- remove duplicated import "github.com/onsi/gomega" by @carlory in #4051
- Fix localmodel controller name in snyk scan workflow by @sivanantha321 in #4054
- Fix azure blob storage access key env not mounted by @bentohset in #4064
- Storage Initializer support single digit azure DNS zone ID by @bentohset in #4070
- Fix trust remote code encoder model by @sivanantha321 in #4043
- introduce the prepare-for-release.sh script by @spolti in #3993
- Model cache controller and node agent by @yuzisun in #4089
- Storage containers typo fix for Huggingface Storage type by @andyi2it in #4098
- Support datetime object serialization in v1/v2 response by @sivanantha321 in #4099
- Replace klog with klog/v2 by @sivanantha321 in #4093
- Add exception handling and logging for grpc server by @sivanantha321 in #4066
- Update ClusterLocalModel to LocalModelCache by @yuzisun in #4105
- Fix LocalModelCache controller reconciles deleted resource by @sivanantha321 in #4106
- Fix InferenceService state when Predictor pod in CrashLoopBackOff by @hdefazio in #4003
- LocalModelCache Admission Webhook by @HotsauceLee in #4102
- Add namespace to localmodel and localmodelnode ServiceAccount helm chart by @ritzdevp in #4111
- KServe VLLM cpu image by @AyushSawant18588 in #4049
- Update max_model_len calculation and fixup encoder pooling by @Datta0 in #4055
- chore: use patch instead of update for finalizer changes by @whynowy in #4072
- Fix isvc role localmodelcache permission by @sivanantha321 in #4131
- Detect missing models and redownload models by @greenmoon55 in #4095
- introduce service configuration at configmap level by @spolti in #3672
- Allow multiple node groups in the model cache CR by @greenmoon55 in #4134
- Annotation to disable model cache by @greenmoon55 in #4118
- Clean up jobs in model cache agent by @greenmoon55 in #4140
- Ensure Model root folder exists by @greenmoon55 in #4142
- Add NodeGroup Name Into PVC Name by @HotsauceLee in #4141
- Make LocalModel Agent reconciliation frequency configurable by @greenmoon55 in #4143
- Remove deepcopy-gen in favour of controller-gen by @sivanantha321 in #4109
- Add ability to set annotations on controller/webhook service and expose metrics bind port and address in helm chart by @mhowell24 in #4127
- Fix EOF error for downloading zip files by @Jonas-Bruns in #4082
- Remove redundant namespace yaml by @greenmoon55 in #4148
- Fix Localmodel agent build by @greenmoon55 in #4150
- Fix model server fails to gracefully shutdown by @sivanantha321 in #4116
- Ensure root model directory exists and add protection for jobs created by @yuzisun in #4152
- Enable transformer deeper readiness check tests by @sivanantha321 in #4121
- Update HuggingFace server dependencies versions by @AyushSawant18588 in #4147
- Add workflow for verifying go mod by @sivanantha321 in #4137
- Fix for CVE-2024-52304 - aiohttp upgrade by @andyi2it in #4113
- Allow other engine builders other than docker by @spolti in #3906
- Add localmodelnode crd to helm chart by @greenmoon55 in #4161
- Fixes Non-linear parsing of case-insensitive content by @spolti in #4158
- Helm chart - option to run daemonset as root by @greenmoon55 in #4164
- Replace nodeGroup with nodeGroups in charts/kserve-crd by @ritzdevp in #4166
- Add affinity and tolerations to localmodel daemonset by @greenmoon55 in #4173
- Fix s3 download PermanentRedirectError for legacy s3 endpoint by @bentohset in #4157
- Make label and annotation propagation configurable by @spolti in #4030
- Add ModelCache e2e test by @sivanantha321 in #4136
- Update vllm to 0.6.6 by @rajatvig in #4176
- [bugfix] fix s3 storage download filename bug by @anencore94 in #4162
- Add hf to storageuri prefix list by @tjandy98 in #4184
- Add Support for OpenAI-compatible Embeddings API by @FabianScheidt in #4129
- fix: typo in _construct_http_status_error method by @Mgla96 in #4190
- Fix raw logger e2e test by @sivanantha321 in #4185
- Feat: Support configuring isvc resource defaults by @andyi2it in #4032
- keep replicas when autoscaler set external by @Jooho in #4196
- Increase kserve controller readiness probe time period by @sivanantha321 in #4200
- Fix golangci-lint binary path selection based on GOBIN by @Jooho in #4198
- Add option to disable volume management in localModel config by @ritzdevp in #4186
- set MaxUnavailable(0%)/MaxSurge(100%) for rollingUpdate in multinode case by @Jooho in #4188
- Gracefully shutdown the router server by @sivanantha321 in #3367
- Add workflow for manual huggingface vLLM image publish by @sivanantha321 in #4092
- Feat: Gateway API Support - Raw Deployment by @sivanantha321 in #3952
- add make goal to build huggingface cpu image by @spolti in #4202
- Cleanup the filepath in createNewFile to avoid path traversal issue by @hdefazio in #4205
- Enhance multinode health_check python and manifests by @Jooho in #4197
- Publish 0.15-rc0 release by @yuzisun in #4213
New Contributors
- @ArjunBH...
v0.14.1
What's Changed
- Support datetime object serialization for v1/v2 response by @sivanantha321 in #4123
- Introduce LocalModelNode CR by @HotsauceLee in #3978
- Update Model Cache controller for LocalModelNode and implement LocalModel node agent by @HotsauceLee and @greenmoon55 in #4089
- Rename ClusterLocalModel to LocalModelCache by @yuzisun in #4105
- Detect missing models and redownload models by @greenmoon55 in #4095
- Allow multiple node groups in the model cache CR by @greenmoon55 in #4134
- Annotation to disable model cache by @greenmoon55 in #4118
- Clean up jobs in local model agent by @greenmoon55 in #4140
- Add node group to PVC name by @HotsauceLee in #4141
- Make local node agent reconciliation frequency configurable by @greenmoon55 in #4143
- Add LocalModelCache admission webhook by @HotsauceLee in #4102
- Fix model server fails to gracefully shutdown by @sivanantha321 in #4116
- Ensure root model directory exists and add protection for jobs created by @yuzisun in #4152
Full Changelog: v0.14.0...v0.14.1
v0.14.0
What's Changed
- Prevent the PassthroughCluster for clients/workloads in the service mesh by @israel-hdez in #3711
- Extract openai predict logic into smaller methods by @grandbora in #3716
- Bump MLServer to 1.5.0 by @sivanantha321 in #3740
- Refactor storage initializer to log model download time for all storage types by @sivanantha321 in #3735
- inferenceservice controller: fix error check in Serverless mode by @dtrifiro in #3753
- Add nccl package and Bump vLLM to 0.4.3 for huggingface runtime by @sivanantha321 in #3723
- Propagate `trust_remote_code` flag throughout vLLM startup by @calwoo in #3729
- Fix dead links on PyPI by @kevinbazira in #3754
- Fix model is ready even if there is no model by @HAO2167 in #3275
- Fix No model ready error in multi model serving by @sivanantha321 in #3758
- Initial implementation of Inference client by @sivanantha321 in #3401
- Fix logprobs for vLLM by @sivanantha321 in #3738
- Fix model name not properly parsed by inference graph by @sivanantha321 in #3746
- pillow - Buffer Overflow by @spolti in #3598
- Use add_generation_prompt while creating chat template by @Datta0 in #3775
- Deduplicate the names for the additional domain names by @houshengbo in #3773
- Make Virtual Service case-insensitive by @andyi2it in #3779
- Install packages needed for vllm model load by @gavrissh in #3802
- Make gRPC max message length configurable by @sivanantha321 in #3741
- Add readiness probe for MLServer and Increase memory for pmml in CI by @sivanantha321 in #3789
- Several bug fixes for vLLM completion endpoint by @sivanantha321 in #3788
- Increase timeout to make unit test stable by @Jooho in #3808
- Upgrade CI deps by @sivanantha321 in #3822
- Add tests for vLLM by @sivanantha321 in #3771
- Bump python to 3.11 for serving runtime images and Bump poetry to 1.8.3 by @sivanantha321 in #3812
- Bump vLLM to 0.5.3.post1 by @sivanantha321 in #3828
- Refactor the ModelServer to let uvicorn handle multiple workers and use 'spawn' for multiprocessing by @sivanantha321 in #3757
- Update golang for docs/Dockerfile to 1.21 by @spolti in #3761
- Make ray an optional dependency by @sivanantha321 in #3834
- Update aif example by @spolti in #3765
- Use helm for quick installation by @sivanantha321 in #3813
- Allow KServe to have its own local gateways for Serverless mode by @israel-hdez in #3737
- Add support for Azure DNS zone endpoints by @tjandy98 in #3819
- Fix failed build for knativeLocalGatewayService by @yuzisun in #3866
- Add logging request feature for vLLM backend by @sivanantha321 in #3849
- Bump vLLM to 0.5.4 by @sivanantha321 in #3874
- Fix: Add workaround for snyk image scan failure by @sivanantha321 in #3880
- Fix trust_remote_code not working with huggingface backend by @sivanantha321 in #3879
- Update KServe 2024-2025 Roadmap by @yuzisun in #3810
- Configurable image pull secrets in Helm charts by @saileshd1402 in #3838
- Fix issue with rolling update behavior by @andyi2it in #3786
- Fix the 'tokens exceeding model limit' error response in vllm server by @saileshd1402 in #3886
- Add support for binary data extension protocol and FP16 datatype by @sivanantha321 in #3685
- Protobuf version upgrade 4.25.4 by @andyi2it in #3881
- Adds optional labels and annotations to the controller by @guitouni in #3366
- Enable Server-Side Apply for Kustomize Overlays in Test Environment by @Jooho in #3877
- bugfix: update image_transformer.py to handle changes in input structure by @zwong91 in #3830
- support text embedding task in hugging face server by @kevinmingtarja in #3743
- Rename max_length parameter to max_model_len to be in sync with vLLM by @Datta0 in #3827
- [Upstream] - Update-istio version based on go version 1.21 by @mholder6 in #3825
- Enrich isvc NotReady events for failed conditions by @asdqwe123zxc in #3303
- adding metadata on requests by @gcemaj in #3635
- Publish 0.14.0-rc0 release by @yuzisun in #3867
- Use API token for publishing package to PyPI by @sivanantha321 in #3896
- Fix sdlc broken when kserve installed using helm by @sivanantha321 in #3890
- Add Security Context and Resources to RBAC Proxy by @HotsauceLee in #3898
- Remove unwanted cluster scope secret permissions by @sivanantha321 in #3893
- bump to vllm 0.5.5 by @lizzzcai in #3911
- pin gosec to 2.20.0 by @greenmoon55 in #3921
- add a new doc 'common issues and solutions' by @Jooho in #3878
- Implement health endpoint for vLLM backend by @sivanantha321 in #3850
- Add security best practices for inferenceservice, inferencegraph, servingruntimes by @sivanantha321 in #3917
- Bump Go to 1.22 by @sivanantha321 in #3912
- bump to vllm 0.6.0 by @hustxiayang in #3934
- Set the volume mount's readonly annotation based on the ISVC annotation by @hdefazio in #3885
- mount /dev/shm volume to huggingfaceserver by @lizzzcai in #3910
- Fix permission error in snyk scan by @sivanantha321 in #3889
- Cluster Local Model CR by @greenmoon55 in #3839
- added http headers to inbound request by @andyi2it in #3895
- Add prow-github-action by @sivanantha321 in #3888
- Add TLS support for Inference Loggers by @ruivieira in #3863
- Fix explainer endpoint not working with path based routing by @sivanantha321 in #3257
- Fix ingress configuration for path based routing and update go mod by @sivanantha321 in #3944
- Add HostIPC field to ServingRuntimePodSpec by @greenmoon55 in #3943
- remove conversion webhook part from self-signed-ca.sh by @Jooho in #3941
- update fluid kserve sample to use huggingface servingruntime by @lizzzcai in #3907
- bump to vLLM 0.6.1.post2 by @hustxiayang in #3948
- Add NodeDownloadPending status to ClusterLocalModel by @greenmoon55 in #3955
- add tags to rest server timing logs to differentiate cpu and wall time by @gfkeith in #3954
- Implement Huggingface model download in storage initializer by @andyi2it in #3584
- Update OWNERS file by @yuzisun in #3966
- Cluster local model controller by @greenmoon55 in #3860
- Prepare for 0.14.0-rc1 release and automate sync process by @sivanantha321 in #3970
- add a new API for multi-node/multi-gpu by @Jooho in #3871
- Fix update-openapigen.sh that can be executed from kserve dir by @Jooho in #3924
- Add python 3.12 support and remove python 3.8 support by @sivanantha321 in #3645
- Fix openssl vulnerability CWE-1395 by @sivanantha321 in #3975
- Fix Kubernetes Doc Links by @jyono in #3670
- Fix kserve local testing env by @yuzisun in #3981
- Fix streaming response not working properly with logger by @sivanantha321 in #3847
- Add a flag for automount serviceaccount token by @greenmoon55 in https://github.com/kserve/ks...