
[serve] Fix buffered logging reusing request context (Fixes #55851) #56094

Merged
zcin merged 9 commits into ray-project:master from vaishdho1:serve-logging-buffer-fix
Sep 6, 2025

Conversation

@vaishdho1
Contributor

Why are these changes needed?

Currently, when Serve file logs are buffered via a MemoryHandler, ServeContextFilter fetches the Serve request context at flush time instead of when the log record is emitted. As a result, many log records flushed together can share the same request context, breaking per-request tracing.
This PR captures the request context at emit time when buffering is enabled and makes the filter idempotent so it won't overwrite pre-populated fields. This preserves the correct per-record context for buffered file logs without changing non-buffered behavior.
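The failure mode and the fix can be sketched with the standard library's MemoryHandler. The filter, handler, and request-id names below are illustrative stand-ins, not Serve's actual implementation: putting an idempotent context filter on the buffering handler stamps each record at emit time, so flushing later cannot overwrite it.

```python
import logging
import logging.handlers

_current_request_id = "req-0"  # hypothetical stand-in for Serve's request context


class ContextFilter(logging.Filter):
    """Stamps the current request id onto each record, idempotently:
    a record that already carries request_id is left untouched."""

    def filter(self, record):
        if not hasattr(record, "request_id"):
            record.request_id = _current_request_id
        return True


class CollectingHandler(logging.Handler):
    """Collects the request ids that reach the final (file-like) handler."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def emit(self, record):
        self.ids.append(record.request_id)


target = CollectingHandler()
buffered = logging.handlers.MemoryHandler(capacity=100, target=target)
# Fixed wiring: the filter runs when the MemoryHandler receives each record
# (emit time). The buggy wiring put it on `target`, so it ran only at flush
# time and stamped every buffered record with the then-current context.
buffered.addFilter(ContextFilter())

logger = logging.getLogger("ctx-demo")
logger.addHandler(buffered)
logger.setLevel(logging.INFO)

_current_request_id = "req-1"
logger.info("first request")
_current_request_id = "req-2"
logger.info("second request")
buffered.flush()
print(target.ids)  # each record keeps the context from its own emit time
```

With the filter on the target handler instead, both records would carry "req-2" after the flush, which is exactly the duplication described in #55851.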

Related issue number

Closes #55851

Performance Testing

Manual verification - benchmarked both buffered and non-buffered cases with and without the fix.
Performance: used Locust with 100 users for a duration of 3-4 minutes.

Without buffering:
With fix: Avg: 396.69 ms, P99: 580 ms, RPS: 228.4
Without fix: Avg: 391.29 ms, P99: 560 ms, RPS: 239

With buffering (RAY_SERVE_REQUEST_PATH_LOG_BUFFER_SIZE = 1000):
With fix: Avg: 400.83 ms, P99: 620 ms, RPS: 230.5
Without fix: Avg: 373.25 ms, P99: 610 ms, RPS: 249.4

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

…ct#55851)

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>
@vaishdho1 vaishdho1 requested a review from a team as a code owner August 29, 2025 20:31
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request effectively addresses the issue of buffered logs reusing request contexts by capturing the context at emit time. The approach of wrapping logger methods for buffered logs is sound, and making ServeContextFilter idempotent is a necessary change. My feedback includes a couple of suggestions to improve the robustness of the new wrap_logger_for_buffering function to prevent potential side effects and make its signature more explicit.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Vaishnavi Panchavati <38342947+vaishdho1@users.noreply.github.com>
@ray-gardener ray-gardener bot added serve Ray Serve Related Issue community-contribution Contributed by the community labels Aug 30, 2025
@abrarsheikh
Copy link
Copy Markdown
Contributor

Thanks for the analysis on this.

When I compare:

  1. Buffering with fix: Avg: 400.83 ms, P99: 620 ms, RPS: 230.5
  2. No buffering without fix: Avg: 391.29 ms, P99: 560 ms, RPS: 239

I can conclude that it's better to remove MemoryHandler than to apply this fix. The reason MemoryHandler was added was to improve performance. Can you think of a more optimal solution? If not, I suggest we drop the memory handler. @akyang-anyscale, any other ideas?

@vaishdho1
Contributor Author

I tried running these tests 4-5 more times and found the following stats:

Case                 Requests   Avg (ms)   P99 (ms)   RPS
Buffer-original      33372      345.32     540        291
Buffer-fix           31857      349.41     550        281.9
Nonbuffer-original   38024      381        560        253.6
Nonbuffer-fix        31662      376.73     550        249.7

The latency differs slightly between runs; these were the best figures after 4-5 runs.

  • There is another approach I can think of: add ServeContextFilter() to the memory handler in the buffered case instead of adding it to the file handler directly. The filter then runs when the memory handler receives each record, so the context is attached at emit time. But I need to check latency here.

  • Another approach is to use the LogRecord factory and add the context to the log record at creation. This logic is already implemented inside https://github.com/ray-project/ray/blob/master/python/ray/_private/log.py#L71 for adding a custom time to all logs. Something similar could be used inside Serve for context, but I am not sure how this will affect the flow; I need to look at this.
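The LogRecord-factory alternative can be sketched as below. The context getter and field names are hypothetical, not Serve's real API; the point is that the factory attaches context at record creation, before any buffering can happen:

```python
import logging

# Hypothetical getter standing in for Serve's request-context API;
# all names here are illustrative.
def _get_current_request_id():
    return "req-42"


_old_factory = logging.getLogRecordFactory()


def _record_factory(*args, **kwargs):
    # Context is attached when the record is *created* (i.e. at emit time),
    # so later buffering and flushing cannot change it.
    record = _old_factory(*args, **kwargs)
    record.request_id = _get_current_request_id()
    return record


# Caveat: the factory is process-wide and affects every logger, which is
# the main risk of this approach.
logging.setLogRecordFactory(_record_factory)

captured = []


class _Capture(logging.Handler):
    def emit(self, record):
        captured.append(record.request_id)


logger = logging.getLogger("factory-demo")
logger.addHandler(_Capture())
logger.setLevel(logging.INFO)
logger.info("hello")

logging.setLogRecordFactory(_old_factory)  # restore the original factory
print(captured)
```

This avoids touching handlers entirely, at the cost of being global to the process rather than scoped to Serve's handlers.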

I also have a couple of questions here

  1. The example I checked is a very small application with just 2 deployments, so for a bigger application with more deployments and more logging, latency might increase without buffering, right? The memory handler can help in that case.
  2. I also wanted to know: what is the threshold for the different parameters beyond which we decide the latency is too large?

@vaishdho1
Contributor Author

I have analyzed adding ServeContextFilter directly to the memory handler.
In this case, the filter is added to the memory handler instead of the file handler, since records always pass through the memory handler on the way to the file handler in both the buffered and non-buffered cases.

I have removed the filter from the file handler:

if (
    logging_config.enable_access_log
    or RAY_SERVE_ENABLE_JSON_LOGGING
    or logging_config.encoding == EncodingType.JSON
):
    memory_handler.addFilter(ServeContextFilter())

I feel this method is more robust since we are not explicitly adding wrappers around specific logging levels.

The latency comparison is shown below:

Method                    Avg latency (ms)   P99 (ms)   RPS
Original + buffering      324.68             550        280.9
Original + no buffering   373.72             550        250.9
Fix + buffering           330                530        307.6
Fix + no buffering        359.83             540        272.4

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>
@vaishdho1
Contributor Author

Fixed the code with the changes and benchmarked the results here.

Method                    Requests   Avg (ms)   P99 (ms)   RPS
Buffering + fix           25643      269.8      440        350.1
No buffering + fix        24844      308        460        305.4
Buffering + original      25582      271.6      460        335
No buffering + original   24818      303.94     470        303.94

@abrarsheikh
Contributor

Any explanation for why Buffering+fix performs better?

Let's add a test in test_logging_utils.py to make sure when buffering is used request_id is not duplicated.

@vaishdho1
Contributor Author

The performance of buffer+original and buffer+fix is almost identical. Sometimes one performs better than the other, but they are very close. I don't see a concrete reason for the difference, because they are effectively doing the same thing in a different order.

For the test case, I am thinking of implementing a small deployment with logging and sending more requests than the buffer size with buffering enabled, then counting the occurrences of replica_ids in the generated logs (system and application). Would this be sufficient for the use case?
I can add this directly under ray/serve/tests/test_logging.py.

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>
@vaishdho1
Contributor Author

Added a test that checks for reuse of request ids in the buffering case.

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>
Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>
@vaishdho1
Contributor Author

vaishdho1 commented Sep 5, 2025

Added (request_id, message) pairs for counting. This takes care of uniqueness: there are only three distinct messages, so if request ids repeat, this will catch any duplicates.
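The duplicate check described above can be sketched as a simple counting pass; the pair values and message strings below are made up for illustration, not taken from the actual test:

```python
from collections import Counter

# Hypothetical (request_id, message) pairs as parsed from the Serve log
# files after sending more requests than the buffer size.
pairs = [
    ("req-1", "received request"), ("req-1", "finished request"),
    ("req-2", "received request"), ("req-2", "finished request"),
]

# Each (request_id, message) pair should appear exactly once; a repeated
# pair means a buffered record was stamped with another request's context.
counts = Counter(pairs)
duplicates = [pair for pair, n in counts.items() if n > 1]
assert not duplicates, f"request context reused across records: {duplicates}"
print(len(duplicates))
```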

@abrarsheikh abrarsheikh added the go add ONLY when ready to merge, run all tests label Sep 5, 2025
added `wait_for_condition` before checking logs

Co-authored-by: Abrar Sheikh <abrar2002as@gmail.com>
Signed-off-by: Vaishnavi Panchavati <38342947+vaishdho1@users.noreply.github.com>

logs_dir = get_serve_logs_dir()

def check_logs():
Contributor


bad indentation.

Contributor Author


Yes, re-committing this.

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>
Contributor

@abrarsheikh abrarsheikh left a comment


thank you. You just fixed a high priority bug for us.

@zcin zcin merged commit 369c780 into ray-project:master Sep 6, 2025
5 checks passed
sampan-s-nayak pushed a commit to sampan-s-nayak/ray that referenced this pull request Sep 8, 2025
jugalshah291 pushed a commit to jugalshah291/ray_fork that referenced this pull request Sep 11, 2025
wyhong3103 pushed a commit to wyhong3103/ray that referenced this pull request Sep 12, 2025
ZacAttack pushed a commit to ZacAttack/ray that referenced this pull request Sep 24, 2025
dstrodtman pushed a commit that referenced this pull request Oct 6, 2025
justinyeh1995 pushed a commit to justinyeh1995/ray that referenced this pull request Oct 20, 2025
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
@vaishdho1 vaishdho1 deleted the serve-logging-buffer-fix branch December 17, 2025 00:59

Labels

community-contribution Contributed by the community go add ONLY when ready to merge, run all tests serve Ray Serve Related Issue

Development

Successfully merging this pull request may close these issues.

[Serve] Using RAY_SERVE_REQUEST_PATH_LOG_BUFFER_SIZE reuses replica context across logs

3 participants