Commit 6700f48

Merge c1b47ea into 6ecbdef
2 parents 6ecbdef + c1b47ea commit 6700f48

3 files changed

Lines changed: 9 additions & 0 deletions

docs/architecture/load_planner.md

Lines changed: 3 additions & 0 deletions
@@ -2,6 +2,9 @@
 
 This document covers load-based planner in `examples/llm/components/planner.py`.
 
+> [!WARNING]
+> Bare metal deployment with local connector is deprecated. The only option to deploy load-based planner is via k8s. We will update the examples in this document soon.
+
 ## Load-based Scaling Up/Down Prefill/Decode Workers
 
 To adjust the number of prefill/decode workers, planner monitors the following metrics:

docs/architecture/sla_planner.md

Lines changed: 3 additions & 0 deletions
@@ -7,6 +7,9 @@ The SLA (Service Level Agreement)-based planner is an intelligent autoscaling sy
 > [!NOTE]
 > Currently, SLA-based planner only supports disaggregated setup.
 
+> [!WARNING]
+> Bare metal deployment with local connector is deprecated. The only option to deploy SLA-based planner is via k8s. We will update the examples in this document soon.
+
 ## Features
 
 * **SLA-driven scaling**: Automatically scales prefill/decode workers to meet TTFT and ITL targets

docs/guides/planner_benchmark/README.md

Lines changed: 3 additions & 0 deletions
@@ -19,6 +19,9 @@ limitations under the License.
 
 This guide shows an example of benchmarking `LocalPlanner` performance with synthetic data. In this example, we focus on 8x H100 SXM GPU and `deepseek-ai/DeepSeek-R1-Distill-Llama-8B` model with TP1 prefill and decode engine.
 
+> [!WARNING]
+> Bare metal deployment with local connector is deprecated. The only option to deploy planner is via k8s. We will update the examples in this document soon.
+
 ## Synthetic Data Generation
 
 We first generate synthetic data with varying request rate from 0.75 to 3 using the provided `generate_synthetic_data.py` script.
