Skip to content

server: close sql rows to fix unstable test#30306

Merged
ti-chi-bot merged 3 commits intopingcap:masterfrom
hawkingrei:remove_parallel
Dec 7, 2021
Merged

server: close sql rows to fix unstable test#30306
ti-chi-bot merged 3 commits intopingcap:masterfrom
hawkingrei:remove_parallel

Conversation

@hawkingrei
Copy link
Member

@hawkingrei hawkingrei commented Dec 1, 2021

Signed-off-by: Weizhen Wang wangweizhen@pingcap.com

What problem does this PR solve?

Problem Summary:

What is changed and how it works?

server: fix unstable test by removing parallel in test

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

@ti-chi-bot
Copy link
Member

ti-chi-bot commented Dec 1, 2021

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • djshow832
  • mjonss

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Details

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Dec 1, 2021
@hawkingrei
Copy link
Member Author

hawkingrei commented Dec 1, 2021

Don't worry about that this PR is too big to review. It is just remove Parallel and remove those cases to tidb_serial_test.go

@hawkingrei hawkingrei changed the title server: fix unstable test by removing parallel in test server: fix unstable test by removing parallel Dec 1, 2021
@hawkingrei
Copy link
Member Author

when Parallel run test, The port of NewServer is not enough random so that it lead to fail to startup Server.

Copy link
Contributor

@mjonss mjonss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Just one question about a potentially missed t.Parallel()

@ti-chi-bot ti-chi-bot added status/LGT1 Indicates that a PR has LGTM 1. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Dec 1, 2021
@hawkingrei hawkingrei force-pushed the remove_parallel branch 2 times, most recently from 4b9f974 to b6a2cc8 Compare December 1, 2021 08:07
@ti-chi-bot ti-chi-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 1, 2021
@hawkingrei
Copy link
Member Author

@tisonkun @kennytm @wjhuang2016 PTAL

Copy link
Member

@wjhuang2016 wjhuang2016 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please compare the test time for the server package.

@hawkingrei
Copy link
Member Author

/run-check_dev_2

@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Dec 1, 2021
@djshow832
Copy link
Contributor

There are 3 more tests not covered: #30303, #30256, #30286. Are you going to file another PR for them?

@djshow832
Copy link
Contributor

djshow832 commented Dec 1, 2021

TestTopSQLAgent fails in CI, which runs serially.

@djshow832
Copy link
Contributor

when Parallel run test, The port of NewServer is not enough random so that it lead to fail to startup Server.

createTidbTestSuite calls waitUntilServerOnline. If the server failed to startup, I don't understand why there is no log reporting failed to connect DB in every 10 ms.

@mjonss
Copy link
Contributor

mjonss commented Dec 1, 2021

when Parallel run test, The port of NewServer is not enough random so that it lead to fail to startup Server.

createTidbTestSuite calls waitUntilServerOnline. If the server failed to startup, I don't understand why there is no log reporting failed to connect DB in every 10 ms.

Oh, waitUntilServerOnline (waitUntilServerCanConnect) does only do sql.Open, which does not do the connection, it would need db.Ping() to actually make the connection!

@djshow832
Copy link
Contributor

djshow832 commented Dec 1, 2021

when Parallel run test, The port of NewServer is not enough random so that it lead to fail to startup Server.

createTidbTestSuite calls waitUntilServerOnline. If the server failed to startup, I don't understand why there is no log reporting failed to connect DB in every 10 ms.

Oh, waitUntilServerOnline (waitUntilServerCanConnect) does only do sql.Open, which does not do the connection, it would need db.Ping() to actually make the connection!

I see, I checked sql.Open and found it doesn't do the connection in the current goroutine.
I think we should modify waitUntilServerOnline to do the ping and restart a server if the ping fails. In this way, the CI still runs in parallel, which is faster.

@djshow832
Copy link
Contributor

djshow832 commented Dec 1, 2021

Will net.Listen return an error if the port is occupied? @mjonss I guess so, but the case should fail if net.Listen fails:

if s.listener, err = net.Listen(tcpProto, addr); err != nil {
	return nil, errors.Trace(err)
}
server, err := NewServer(cfg, ts.tidbdrv)
require.NoError(t, err)

@hawkingrei
Copy link
Member Author

/hold

@ti-chi-bot ti-chi-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 1, 2021
@hawkingrei hawkingrei force-pushed the remove_parallel branch 2 times, most recently from 568a6cd to a1f8c02 Compare December 6, 2021 06:39
@hawkingrei
Copy link
Member Author

@bb7133 @djshow832 PTAL

Copy link
Member Author

@hawkingrei hawkingrei Dec 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestSocket need a custom link checker.

@mjonss
Copy link
Contributor

mjonss commented Dec 6, 2021

Will net.Listen return an error if the port is occupied? @mjonss I guess so, but the case should fail if net.Listen fails:

if s.listener, err = net.Listen(tcpProto, addr); err != nil {
	return nil, errors.Trace(err)
}
server, err := NewServer(cfg, ts.tidbdrv)
require.NoError(t, err)

(for reference 😃 ) yes, if the Listen() call fail it will also fail to start the server:

[2021/12/06 13:15:45.137 +01:00] [FATAL] [main.go:660] ["failed to create the server"] [error="listen tcp 0.0.0.0:4000: bind: address already in use"] [stack="main.createServer\n\trepos/tidb/tidb-server/main.go:660\nmain.main\n\trepos/tidb/tidb-server/main.go:199\nruntime.main\n\t/usr/lib/go-1.17/src/runtime/proc.go:255"] [stack="main.createServer\n\trepos/tidb/tidb-server/main.go:660\nmain.main\n\trepos/tidb/tidb-server/main.go:199\nruntime.main\n\t/usr/lib/go-1.17/src/runtime/proc.go:255"]

Copy link
Contributor

@mjonss mjonss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestOnlySocket fails when I try to run it:

--- FAIL: TestOnlySocket (0.79s)
    server_test.go:848: 
        	Error Trace:	server_test.go:848
        	            				tidb_test.go:769
        	            				server_test.go:117
        	            				tidb_test.go:760
        	Error:      	Not equal: 
        	            	expected: "root@localhost"
        	            	actual  : ""
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1 +1 @@
        	            	-root@localhost
        	            	+
        	Test:       	TestOnlySocket

@hawkingrei
Copy link
Member Author

TestOnlySocket fails when I try to run it:

--- FAIL: TestOnlySocket (0.79s)
    server_test.go:848: 
        	Error Trace:	server_test.go:848
        	            				tidb_test.go:769
        	            				server_test.go:117
        	            				tidb_test.go:760
        	Error:      	Not equal: 
        	            	expected: "root@localhost"
        	            	actual  : ""
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1 +1 @@
        	            	-root@localhost
        	            	+
        	Test:       	TestOnlySocket

it is a bug. I have fixed.

@hawkingrei
Copy link
Member Author

hawkingrei commented Dec 7, 2021

@bb7133 @djshow832 @mjonss PTAL

Copy link
Member

@bb7133 bb7133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@bb7133 bb7133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bb7133
Copy link
Member

bb7133 commented Dec 7, 2021

/merge

@ti-chi-bot
Copy link
Member

This pull request has been accepted and is ready to merge.

DetailsCommit hash: abf158d

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants