
[serve] Fix buffered logging reusing request context (Fixes #55851) #56094

Merged
zcin merged 9 commits into ray-project:master from vaishdho1:serve-logging-buffer-fix
Sep 6, 2025

Conversation

@vaishdho1
Contributor

Why are these changes needed?

Currently, when Serve file logs are buffered via a MemoryHandler, ServeContextFilter fetches the Serve request context at flush time instead of when the log record is emitted. As a result, many log records flushed together can share the same request context, breaking per-request tracing.
This PR captures the request context at emit time when buffering is enabled and makes the filter idempotent so it won't overwrite pre-populated fields. This preserves the correct per-record context for buffered file logs without changing non-buffered behavior.
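The failure mode and the fix can be sketched with the standard library's MemoryHandler. The filter, handler, and request-id names below are illustrative stand-ins, not Serve's actual implementation: putting an idempotent context filter on the buffering handler stamps each record at emit time, so flushing later cannot overwrite it.

```python
import logging
import logging.handlers

_current_request_id = "req-0"  # hypothetical stand-in for Serve's request context


class ContextFilter(logging.Filter):
    """Stamps the current request id onto each record, idempotently:
    a record that already carries request_id is left untouched."""

    def filter(self, record):
        if not hasattr(record, "request_id"):
            record.request_id = _current_request_id
        return True


class CollectingHandler(logging.Handler):
    """Collects the request ids that reach the final (file-like) handler."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def emit(self, record):
        self.ids.append(record.request_id)


target = CollectingHandler()
buffered = logging.handlers.MemoryHandler(capacity=100, target=target)
# Fixed wiring: the filter runs when the MemoryHandler receives each record
# (emit time). The buggy wiring put it on `target`, so it ran only at flush
# time and stamped every buffered record with the then-current context.
buffered.addFilter(ContextFilter())

logger = logging.getLogger("ctx-demo")
logger.addHandler(buffered)
logger.setLevel(logging.INFO)

_current_request_id = "req-1"
logger.info("first request")
_current_request_id = "req-2"
logger.info("second request")
buffered.flush()
print(target.ids)  # each record keeps the context from its own emit time
```

With the filter on the target handler instead, both records would carry "req-2" after the flush, which is exactly the duplication described in #55851.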

Related issue number

Closes #55851

Performance Testing

Manual verification - benchmarked both buffered and non-buffered cases with and without the fix.
Performance: used Locust with 100 users for a duration of 3-4 minutes.

Without buffering:
With fix: Avg: 396.69 ms, P99: 580 ms, RPS: 228.4
Without fix: Avg: 391.29 ms, P99: 560 ms, RPS: 239

With buffering (RAY_SERVE_REQUEST_PATH_LOG_BUFFER_SIZE = 1000):
With fix: Avg: 400.83 ms, P99: 620 ms, RPS: 230.5
Without fix: Avg: 373.25 ms, P99: 610 ms, RPS: 249.4

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

…ct#55851)

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>
@vaishdho1 vaishdho1 requested a review from a team as a code owner August 29, 2025 20:31
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request effectively addresses the issue of buffered logs reusing request contexts by capturing the context at emit time. The approach of wrapping logger methods for buffered logs is sound, and making ServeContextFilter idempotent is a necessary change. My feedback includes a couple of suggestions to improve the robustness of the new wrap_logger_for_buffering function to prevent potential side effects and make its signature more explicit.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Vaishnavi Panchavati <38342947+vaishdho1@users.noreply.github.com>
@ray-gardener ray-gardener bot added serve Ray Serve Related Issue community-contribution Contributed by the community labels Aug 30, 2025
@abrarsheikh
Copy link
Copy Markdown
Contributor

Thanks for the analysis on this.

When I compare:

  1. Buffering with fix: Avg: 400.83 ms, P99: 620 ms, RPS: 230.5
  2. No buffering without fix: Avg: 391.29 ms, P99: 560 ms, RPS: 239

I can conclude that it's better to remove MemoryHandler than to apply this fix. The reason MemoryHandler was added was to improve performance. Can you think of a more optimal solution? If not, I suggest we drop the memory handler. @akyang-anyscale, any other ideas?

@vaishdho1
Contributor Author

I tried running these tests 4-5 more times and found the following stats:

Case                 Requests   Avg (ms)   P99 (ms)   RPS
Buffer-original      33372      345.32     540        291
Buffer-fix           31857      349.41     550        281.9
Nonbuffer-original   38024      381        560        253.6
Nonbuffer-fix        31662      376.73     550        249.7

The latency differs slightly between runs; these were the best figures after 4-5 runs.

  • There is another approach I can think of: add ServeContextFilter() to the memory handler in the buffered case instead of adding it to the file handler directly. The filter then runs when the memory handler receives each record, so the context is attached at emit time. But I need to check latency here.

  • Another approach is to use the LogRecord factory and add the context to the log record at creation. This logic is already implemented inside https://github.com/ray-project/ray/blob/master/python/ray/_private/log.py#L71 for adding a custom time to all logs. Something similar could be used inside Serve for context, but I am not sure how this will affect the flow; I need to look at this.
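The LogRecord-factory alternative can be sketched as below. The context getter and field names are hypothetical, not Serve's real API; the point is that the factory attaches context at record creation, before any buffering can happen:

```python
import logging

# Hypothetical getter standing in for Serve's request-context API;
# all names here are illustrative.
def _get_current_request_id():
    return "req-42"


_old_factory = logging.getLogRecordFactory()


def _record_factory(*args, **kwargs):
    # Context is attached when the record is *created* (i.e. at emit time),
    # so later buffering and flushing cannot change it.
    record = _old_factory(*args, **kwargs)
    record.request_id = _get_current_request_id()
    return record


# Caveat: the factory is process-wide and affects every logger, which is
# the main risk of this approach.
logging.setLogRecordFactory(_record_factory)

captured = []


class _Capture(logging.Handler):
    def emit(self, record):
        captured.append(record.request_id)


logger = logging.getLogger("factory-demo")
logger.addHandler(_Capture())
logger.setLevel(logging.INFO)
logger.info("hello")

logging.setLogRecordFactory(_old_factory)  # restore the original factory
print(captured)
```

This avoids touching handlers entirely, at the cost of being global to the process rather than scoped to Serve's handlers.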

I also have a couple of questions here

  1. The example I checked is a very small application with just 2 deployments, so for a bigger application with more deployments and more logging, latency might increase without buffering, right? The memory handler can help in that case.
  2. I also wanted to know: what is the threshold for the different parameters beyond which we decide the latency is too large?

@vaishdho1
Contributor Author

I have analyzed adding ServeContextFilter directly to the memory handler.
In this case, the filter is added to the memory handler instead of the file handler, since records always pass through the memory handler on the way to the file handler in both the buffered and non-buffered cases.

I have removed the filter from the file handler:

if (
    logging_config.enable_access_log
    or RAY_SERVE_ENABLE_JSON_LOGGING
    or logging_config.encoding == EncodingType.JSON
):
    memory_handler.addFilter(ServeContextFilter())

I feel this method is more robust since we are not explicitly adding wrappers around specific logging levels.

The latency comparison is shown below:

Method                    Avg latency (ms)   P99 (ms)   RPS
Original + buffering      324.68             550        280.9
Original + no buffering   373.72             550        250.9
Fix + buffering           330                530        307.6
Fix + no buffering        359.83             540        272.4

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>
@vaishdho1
Contributor Author

Fixed the code with the changes and benchmarked the results here.

Method                    Requests   Avg (ms)   P99 (ms)   RPS
Buffering + fix           25643      269.8      440        350.1
No buffering + fix        24844      308        460        305.4
Buffering + original      25582      271.6      460        335
No buffering + original   24818      303.94     470        303.94

@abrarsheikh
Contributor

Any explanation for why Buffering+fix performs better?

Let's add a test in test_logging_utils.py to make sure when buffering is used request_id is not duplicated.

@vaishdho1
Contributor Author

The performance of buffer+original and buffer+fix is almost identical. Sometimes one performs better than the other, but they are very close. I don't see a concrete reason for the difference, because they are effectively doing the same thing in a different order.

For the test case, I am thinking of implementing a small deployment with logging and sending more requests than the buffer size with buffering enabled, then counting the occurrences of replica_ids in the generated logs (system and application). Would this be sufficient for the use case?
I can add this directly under ray/serve/tests/test_logging.py.

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>
@vaishdho1
Contributor Author

Added a test that checks for reuse of request ids in the buffering case.

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>
Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>
@vaishdho1
Contributor Author

vaishdho1 commented Sep 5, 2025

Added (request_id, message) pairs for counting. This takes care of uniqueness: there are only three distinct messages, so if request ids repeat, this will catch any duplicates.
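The duplicate check described above can be sketched as a simple counting pass; the pair values and message strings below are made up for illustration, not taken from the actual test:

```python
from collections import Counter

# Hypothetical (request_id, message) pairs as parsed from the Serve log
# files after sending more requests than the buffer size.
pairs = [
    ("req-1", "received request"), ("req-1", "finished request"),
    ("req-2", "received request"), ("req-2", "finished request"),
]

# Each (request_id, message) pair should appear exactly once; a repeated
# pair means a buffered record was stamped with another request's context.
counts = Counter(pairs)
duplicates = [pair for pair, n in counts.items() if n > 1]
assert not duplicates, f"request context reused across records: {duplicates}"
print(len(duplicates))
```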

@abrarsheikh abrarsheikh added the go add ONLY when ready to merge, run all tests label Sep 5, 2025
added `wait_for_condition` before checking logs

Co-authored-by: Abrar Sheikh <abrar2002as@gmail.com>
Signed-off-by: Vaishnavi Panchavati <38342947+vaishdho1@users.noreply.github.com>

logs_dir = get_serve_logs_dir()

def check_logs():
Contributor


bad indentation.

Contributor Author


Yes, re-committing this.

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>
Contributor

@abrarsheikh abrarsheikh left a comment


thank you. You just fixed a high priority bug for us.

@zcin zcin merged commit 369c780 into ray-project:master Sep 6, 2025
5 checks passed
sampan-s-nayak pushed a commit to sampan-s-nayak/ray that referenced this pull request Sep 8, 2025
jugalshah291 pushed a commit to jugalshah291/ray_fork that referenced this pull request Sep 11, 2025
wyhong3103 pushed a commit to wyhong3103/ray that referenced this pull request Sep 12, 2025
ZacAttack pushed a commit to ZacAttack/ray that referenced this pull request Sep 24, 2025
dstrodtman pushed a commit that referenced this pull request Oct 6, 2025
justinyeh1995 pushed a commit to justinyeh1995/ray that referenced this pull request Oct 20, 2025
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
@vaishdho1 vaishdho1 deleted the serve-logging-buffer-fix branch December 17, 2025 00:59

Labels

community-contribution Contributed by the community go add ONLY when ready to merge, run all tests serve Ray Serve Related Issue

Development

Successfully merging this pull request may close these issues.

[Serve] Using RAY_SERVE_REQUEST_PATH_LOG_BUFFER_SIZE reuses replica context across logs

3 participants