test: TestStartConfigWatcher flaky test#816

Closed
googs1025 wants to merge 1 commit into envoyproxy:main from googs1025:fix/flaky

@googs1025
Contributor

Description

This commit tries to fix the flaky TestStartConfigWatcher test.

Related Issues/PRs (if applicable)
Fixes #701

Special notes for reviewers (if applicable)
None

@googs1025 googs1025 requested a review from a team as a code owner July 2, 2025 11:51
@googs1025
Contributor Author

before change:

Used stress to run the flaky test about 500 times:

3m5s: 460 runs so far, 22 failures (4.78%), 10 active
3m10s: 470 runs so far, 22 failures (4.68%), 10 active
3m15s: 490 runs so far, 22 failures (4.49%), 10 active

/var/folders/j3/b8896xf92g7d60x2ghdpcnd40000gn/T/go-stress-20250702T002815-2264074391
2025/07/02 00:31:30 INFO Serving list of declared models
2025/07/02 00:31:30 ERROR unknown request type request=<nil>
2025/07/02 00:31:30 ERROR cannot receive stream request error="some error"
2025/07/02 00:31:30 ERROR error processing request message error="rpc error: code = Internal desc = missing xds.upstream_host_metadata in request"
2025/07/02 00:31:32 ERROR cannot get processor error="no processor defined for path: /unknown"
--- FAIL: TestStartConfigWatcher (1.00s)
    watcher_test.go:171: 
                Error Trace:    /Users/zhenyu.jiang/go/src/golanglearning/new_project/ai-gateway/internal/extproc/watcher_test.go:171
                Error:          Not equal: 
                                expected: 3
                                actual  : 4
                Test:           TestStartConfigWatcher
FAIL


ERROR: exit status 1

3m20s: 500 runs so far, 23 failures (4.60%), 10 active

/var/folders/j3/b8896xf92g7d60x2ghdpcnd40000gn/T/go-stress-20250702T002815-1094693002
2025/07/02 00:31:34 INFO Serving list of declared models
2025/07/02 00:31:34 ERROR unknown request type request=<nil>
2025/07/02 00:31:34 ERROR cannot receive stream request error="some error"
2025/07/02 00:31:34 ERROR error processing request message error="rpc error: code = Internal desc = missing xds.upstream_host_metadata in request"
2025/07/02 00:31:36 ERROR cannot get processor error="no processor defined for path: /unknown"
--- FAIL: TestStartConfigWatcher (0.90s)
    watcher_test.go:171: 
                Error Trace:    /Users/zhenyu.jiang/go/src/golanglearning/new_project/ai-gateway/internal/extproc/watcher_test.go:171
                Error:          Not equal: 
                                expected: 3
                                actual  : 2
                Test:           TestStartConfigWatcher
FAIL


ERROR: exit status 1

3m25s: 510 runs so far, 24 failures (4.71%), 10 active

after this change:

...

ERROR: exit status 1

2m35s: 394 runs so far, 4 failures (1.01%), 10 active
2m40s: 404 runs so far, 4 failures (0.99%), 10 active
2m45s: 415 runs so far, 4 failures (0.96%), 10 active

/var/folders/j3/b8896xf92g7d60x2ghdpcnd40000gn/T/go-stress-20250702T194107-3561783169
2025/07/02 19:43:46 INFO Serving list of declared models
2025/07/02 19:43:46 ERROR unknown request type request=<nil>
2025/07/02 19:43:46 ERROR cannot receive stream request error="some error"
2025/07/02 19:43:46 ERROR error processing request message error="rpc error: code = Internal desc = missing xds.upstream_host_metadata in request"
2025/07/02 19:43:48 ERROR cannot get processor error="no processor defined for path: /unknown"
--- FAIL: TestStartConfigWatcher (3.71s)
    watcher_test.go:171: 
                Error Trace:    /Users/zhenyu.jiang/go/src/golanglearning/new_project/ai-gateway/internal/extproc/watcher_test.go:171
                Error:          Condition never satisfied
                Test:           TestStartConfigWatcher
FAIL


ERROR: exit status 1

2m50s: 432 runs so far, 5 failures (1.15%), 10 active
2m55s: 443 runs so far, 5 failures (1.12%), 10 active
3m0s: 454 runs so far, 5 failures (1.10%), 10 active
3m5s: 467 runs so far, 5 failures (1.07%), 10 active
3m10s: 483 runs so far, 5 failures (1.03%), 10 active
3m15s: 494 runs so far, 5 failures (1.01%), 10 active
3m20s: 506 runs so far, 5 failures (0.98%), 10 active
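
A note on the two outputs above: the failure message changes from "Not equal" (expected 3, actual 4 or 2) to "Condition never satisfied", which is consistent with replacing a one-shot equality check with a polling check. The diff is not shown in this conversation, so that is an assumption. A minimal stdlib-only Go sketch of the polling pattern follows; eventually, startFakeWatcher, demo, and the default durations are hypothetical names for illustration and are not taken from watcher_test.go.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// Hypothetical polling bounds, chosen for illustration only.
const (
	defaultWaitFor = 2 * time.Second
	defaultTick    = 10 * time.Millisecond
)

// eventually polls cond every tick until it returns true or waitFor elapses.
// It is a stdlib-only stand-in for a polling assertion helper.
func eventually(cond func() bool, waitFor, tick time.Duration) bool {
	deadline := time.Now().Add(waitFor)
	for {
		if cond() {
			return true
		}
		if time.Now().After(deadline) {
			return false
		}
		time.Sleep(tick)
	}
}

// startFakeWatcher is a hypothetical stand-in for a config watcher: a
// background goroutine increments loads once per interval, n times in total.
func startFakeWatcher(loads *atomic.Int32, n int, interval time.Duration) {
	go func() {
		for i := 0; i < n; i++ {
			time.Sleep(interval)
			loads.Add(1)
		}
	}()
}

// demo starts the fake watcher and polls until three loads are observed.
func demo() bool {
	var loads atomic.Int32
	startFakeWatcher(&loads, 3, 20*time.Millisecond)

	// A one-shot check at this point would race with the goroutine;
	// polling waits for the counter to reach the expected value instead.
	return eventually(func() bool { return loads.Load() == 3 }, defaultWaitFor, defaultTick)
}

func main() {
	fmt.Println("reached 3 loads:", demo())
}
```

Polling of this kind only helps when the counter is late (the "actual: 2" case). It cannot help if the counter can overshoot the expected value (the "actual: 4" case), which may be one reason some failures remain after the change.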

@googs1025 googs1025 changed the title fix: TestStartConfigWatcher flaky test test: TestStartConfigWatcher flaky test Jul 2, 2025
Signed-off-by: googs1025 <googs1025@gmail.com>
Member

@mathetake mathetake left a comment

I see you are still seeing flakes. You can use go test ./internal/extproc -run=TestStartConfigWatcher -count=10000 to run the test repeatedly. Let me know if the command passes locally. Until then, I am marking this PR as not ready for review (draft).

Comment on lines +82 to +83
cw.mu.Lock()
defer cw.mu.Unlock()
Member

I believe you shouldn't need a change like this whose only purpose is testing. The code path here is not called concurrently, so this mutex shouldn't exist.
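
One way to act on this feedback, sketched under the assumption that the race is between the watcher goroutine and the test's own reads, is to keep the lock inside the test double instead of the production type. Receiver, syncReceiver, LoadConfig, and loadCount are hypothetical names for illustration; this is not necessarily how the real test or the later fix is structured.

```go
package main

import (
	"fmt"
	"sync"
)

// Receiver is a hypothetical interface that a watcher calls on every reload.
type Receiver interface {
	LoadConfig(cfg string) error
}

// syncReceiver is a test double. It owns the mutex because only the test
// reads its state from a different goroutine than the one that writes it.
type syncReceiver struct {
	mu   sync.Mutex
	cfgs []string
}

// LoadConfig records the config under the lock.
func (r *syncReceiver) LoadConfig(cfg string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.cfgs = append(r.cfgs, cfg)
	return nil
}

// loadCount returns how many configs have been recorded so far.
func (r *syncReceiver) loadCount() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.cfgs)
}

func main() {
	r := &syncReceiver{}
	var rcv Receiver = r

	// Stand-in for the watcher goroutine delivering three reloads.
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for i := 0; i < 3; i++ {
			_ = rcv.LoadConfig(fmt.Sprintf("cfg-%d", i))
		}
	}()
	wg.Wait()

	fmt.Println("loads recorded:", r.loadCount())
}
```

With this layout the production watcher stays free of test-only locking, and the test double alone is responsible for safe concurrent access to what it records.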

@mathetake mathetake marked this pull request as draft July 2, 2025 15:26
@mathetake mathetake self-assigned this Jul 2, 2025
@mathetake
Member

I went ahead and fixed this in #843.

@mathetake mathetake closed this Jul 8, 2025

Development

Successfully merging this pull request may close these issues.

test flake: TestStartConfigWatcher

2 participants