Merge c1b47ea into 6ecbdef

tedzhouhk · web-flow · commit 6700f48d5440 · 2025-07-16T09:07:05.000-07:00
diff --git a/docs/architecture/load_planner.md b/docs/architecture/load_planner.md
@@ -2,6 +2,9 @@
 
 This document covers load-based planner in `examples/llm/components/planner.py`.
 
+> [!WARNING]
+> Bare-metal deployment with the local connector is deprecated. The only way to deploy the load-based planner is via k8s. We will update the examples in this document soon.
+
 ## Load-based Scaling Up/Down Prefill/Decode Workers
 
 To adjust the number of prefill/decode workers, planner monitors the following metrics:
diff --git a/docs/architecture/sla_planner.md b/docs/architecture/sla_planner.md
@@ -7,6 +7,9 @@ The SLA (Service Level Agreement)-based planner is an intelligent autoscaling sy
 > [!NOTE]
 > Currently, SLA-based planner only supports disaggregated setup.
 
+> [!WARNING]
+> Bare-metal deployment with the local connector is deprecated. The only way to deploy the SLA-based planner is via k8s. We will update the examples in this document soon.
+
 ## Features
 
 * **SLA-driven scaling**: Automatically scales prefill/decode workers to meet TTFT and ITL targets
diff --git a/docs/guides/planner_benchmark/README.md b/docs/guides/planner_benchmark/README.md
@@ -19,6 +19,9 @@ limitations under the License.
 
 This guide shows an example of benchmarking `LocalPlanner` performance with synthetic data. In this example, we focus on 8x H100 SXM GPU and `deepseek-ai/DeepSeek-R1-Distill-Llama-8B` model with TP1 prefill and decode engine.
 
+> [!WARNING]
+> Bare-metal deployment with the local connector is deprecated. The only way to deploy the planner is via k8s. We will update the examples in this document soon.
+
 ## Synthetic Data Generation
 
 We first generate synthetic data with varying request rate from 0.75 to 3 using the provided `generate_synthetic_data.py` script.