server, sql: expose closed sessions to ListSessions endpoint#78650
server, sql: expose closed sessions to ListSessions endpoint#78650craig[bot] merged 1 commit intomasterfrom
Conversation
maryliag
left a comment
There was a problem hiding this comment.
can you add tests to verify the new values being returned?
Reviewed 13 of 16 files at r1.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @gtr)
-- commits, line 11 at r1:
nit: include
pkg/sql/closed_session_cache.go, line 31 at r1 (raw file):
"sql.closed_session_cache.capacity", "the maximum number of sessions in the cache", 1000, // TODO(gtr): Totally arbitrary for now, adjust later.
have you done any testing to adjust this number?
pkg/ui/workspaces/cluster-ui/src/sessions/sessionsPage.fixture.ts, line 50 at r1 (raw file):
max_alloc_bytes: Long.fromNumber(10240), active_queries: [], status: Status.OPEN,
can you add some examples with status closed?
Azhng
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @Azhng, @gtr, and @maryliag)
pkg/server/status.go, line 202 at r1 (raw file):
registry := b.sessionRegistry sessions := registry.SerializeAll() closedSessions := b.closedSessionCache.GetSerializedSessions()
Just want to point out a subtle concurrency issue here.
// the mutex on the registry is dropped after function returns,
// suppose we have a sessionA in `sessions` since sessionA is active.
sessions := registry.SerializeAll()
/*
* sessionA finishes. Since mutex on the active session registry is dropped, sessionA moves from
* the active registry into the closed session cache.
*/
// closedSessions now contains sessionA. The API response now will contain two entries of
// sessionA showing that sessionA is both "active" and "closed" at the same time.
closedSessions := b.closedSessionCache.GetSerializedSessions()You might want to do some manual mutex management here to prevent that from happening.
pkg/sql/conn_executor.go, line 1779 at r1 (raw file):
ex.planner.extendedEvalCtx.setSessionID(ex.sessionID) defer ex.server.cfg.SessionRegistry.deregister(ex.sessionID, ex.queryCancelKey) defer ex.server.cfg.ClosedSessionCache.add(ex.sessionID, ex.serialize())
optional style nit: i'd just put two function into a single defer block
defer func() {
ex.server.cfg.SessionRegistry.deregistry()
ex.server.cfg.ClosedSessionCache.add()
}()fcfbed3 to
14b4783
Compare
14b4783 to
62137e9
Compare
gtr
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @Azhng, @gtr, and @maryliag)
pkg/server/status.go, line 202 at r1 (raw file):
Previously, Azhng (Archer Zhang) wrote…
Just want to point out a subtle concurrency issue here.
// the mutex on the registry is dropped after function returns, // suppose we have a sessionA in `sessions` since sessionA is active. sessions := registry.SerializeAll() /* * sessionA finishes. Since mutex on the active session registry is dropped, sessionA moves from * the active registry into the closed session cache. */ // closedSessions now contains sessionA. The API response now will contain two entries of // sessionA showing that sessionA is both "active" and "closed" at the same time. closedSessions := b.closedSessionCache.GetSerializedSessions()You might want to do some manual mutex management here to prevent that from happening.
Good catch! I locked the session registry while we get the serialized sessions from the closed session cache.
pkg/sql/conn_executor.go, line 1779 at r1 (raw file):
Previously, Azhng (Archer Zhang) wrote…
optional style nit: i'd just put two function into a single defer block
defer func() { ex.server.cfg.SessionRegistry.deregistry() ex.server.cfg.ClosedSessionCache.add() }()
Done.
knz
left a comment
There was a problem hiding this comment.
I dont see any memory accounting for the new structure. Where can I find it?
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @Azhng, @gtr, and @maryliag)
62137e9 to
2e9733b
Compare
There was a problem hiding this comment.
Thanks for the heads up! I embedded a*mon.BytesMonitor object inside ClosedSessionCache.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @Azhng, @gtr, and @maryliag)
pkg/ui/workspaces/cluster-ui/src/sessions/sessionsPage.fixture.ts, line 50 at r1 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
can you add some examples with status closed?
I think the scope of the PR is getting a bit long; should these tests be added when i get to the UI part? These were added simply to get rid of compilation errors for the newly added status field in serverpb.Session. Also note the new IDLE status, which is defined in pkg/sql/conn_executor.go:
status := serverpb.Session_IDLE
if len(activeQueries) > 0 {
status = serverpb.Session_OPEN
}fc214c7 to
6f7188b
Compare
579f8b1 to
3e5d453
Compare
maryliag
left a comment
There was a problem hiding this comment.
Reviewed 1 of 11 files at r2, 1 of 12 files at r3, 17 of 22 files at r5, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @Azhng, @gtr, and @maryliag)
pkg/server/status.go, line 2757 at r5 (raw file):
} sessions := nodeResp.([]serverpb.Session) sort.Slice(sessions, func(i, j int) bool {
nit: you don't need to do this sort since you are sorting again after appending the sessions
pkg/ui/workspaces/cluster-ui/src/sessions/sessionsPage.fixture.ts, line 50 at r1 (raw file):
Previously, gtr (Gerardo Torres Castro) wrote…
I think the scope of the PR is getting a bit long; should these tests be added when i get to the UI part? These were added simply to get rid of compilation errors for the newly added
statusfield inserverpb.Session. Also note the newIDLEstatus, which is defined inpkg/sql/conn_executor.go:status := serverpb.Session_IDLE if len(activeQueries) > 0 { status = serverpb.Session_OPEN }
Sure, they can be added during the UI PR. But I would like to see on the PR description some examples of the responses that include open, idle and closed sessions.
pkg/server/status_test.go, line 3377 at r5 (raw file):
} require.Equal(t, 5, open)
Avoid using the same value for 2 different things, because they might be returned switched values and you wouldn't know. Make one value for open and another for closed
Azhng
left a comment
There was a problem hiding this comment.
Reviewed 1 of 16 files at r1, 4 of 22 files at r5.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @Azhng, @gtr, and @maryliag)
pkg/server/status.go, line 202 at r1 (raw file):
Previously, gtr (Gerardo Torres Castro) wrote…
Good catch! I locked the session registry while we get the serialized sessions from the closed session cache.
Hmm, you might want to lock the sessionRegistry before calling .SerializeAll() (this will result in deadlock, but you can create a new function called .SerializedAllLocked().
The code currently still have unprotected critical section (between .SerializeAll() and the .Lock()) which can result in race condition.
pkg/sql/closed_session_cache.go, line 176 at r5 (raw file):
size += 16 // size of ClusterWideID size += n.data.Size() // size of serverpb.Session size += 8 // size of time.Time
You sure time.Time only takes 8 bytes? Seems like it has 2 * 64 bit integer + a pointer type.
type Time struct {
wall uint64
ext int64
loc *Location
}A trick here would be
unsafe.Sizeof(time.Time{})pkg/server/status_test.go, line 3265 at r5 (raw file):
ReplicationMode: base.ReplicationManual, ServerArgs: base.TestServerArgs{ Insecure: true,
Why are we spinning up an insecure cluster here?
pkg/server/status_test.go, line 3274 at r5 (raw file):
doSessionsRequest := func(client http.Client, username string) listSessionsResponse { req, err := http.NewRequest("GET", ts1.AdminURL()+"/_status/sessions", nil)
we have serverutils.GetJSONProto for all this boilerplates
pkg/server/status_test.go, line 3302 at r5 (raw file):
RawQuery: "sslmode=disable", } db, err := gosql.Open("postgres", pgURL.String())
have you tried serverutils.OpenDBConnE :P
pkg/server/status_test.go, line 3311 at r5 (raw file):
// Create three users. users := [3]string{"test_user_a", "test_user_b", "test_user_c"} conn, err := testCluster.ServerConn(0).Conn(ctx)
Can we just use testCluster.ServerConn(0) here? iirc .Conn(ctx) may or may not create a new connection.
pkg/server/status_test.go, line 3324 at r5 (raw file):
if err != nil { t.Fatal(err) }
small nit: require.NoError
pkg/server/status_test.go, line 3349 at r5 (raw file):
targetDB := getUserConn(t, user, testCluster.Server(0)) defer targetDB.Close() sqlutils.MakeSQLRunner(targetDB).Exec(t, `SELECT version()`)
Did you have to execute something for the session to be consider active?
pkg/server/status_test.go, line 3353 at r5 (raw file):
} time.Sleep(500 * time.Millisecond)
Hmm why the half a second sleep here?
3e5d453 to
3dea7f8
Compare
gtr
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @Azhng, @gtr, and @maryliag)
pkg/server/status.go, line 202 at r1 (raw file):
Previously, Azhng (Archer Zhang) wrote…
Hmm, you might want to lock the
sessionRegistrybefore calling.SerializeAll()(this will result in deadlock, but you can create a new function called.SerializedAllLocked().The code currently still have unprotected critical section (between
.SerializeAll()and the.Lock()) which can result in race condition.
Good catch! I added a method SerializedAllLocked() to the session registry and kept the session registry locked until we serialize all the sessions.
Code quote:
// Before retrieving all the closed sessions, prevent any changes to the
// session registry by locking the session registry. Without this, a sessionpkg/server/status.go, line 2757 at r5 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
nit: you don't need to do this sort since you are sorting again after appending the sessions
Done!
Code quote:
sort.Slice(sessions, func(i, j int) bool {pkg/server/status_test.go, line 3265 at r5 (raw file):
Previously, Azhng (Archer Zhang) wrote…
Why are we spinning up an insecure cluster here?
I based this test off TestListSessionsV2 which uses an insecure connection but no reason other than that. I removed the insecure setting and added passwords for each user.
pkg/server/status_test.go, line 3274 at r5 (raw file):
Previously, Azhng (Archer Zhang) wrote…
we have
serverutils.GetJSONProtofor all this boilerplates
Thanks! Simplifies the code a lot :)
pkg/server/status_test.go, line 3302 at r5 (raw file):
Previously, Azhng (Archer Zhang) wrote…
have you tried
serverutils.OpenDBConnE:P
I might be wrong on this but it looks like serverutils.OpenDBConnE opens a gosql DB connection using the root user:
pgURL, cleanupGoDB, err := sqlutils.PGUrlE(
sqlAddr, "StartServer" /* prefix */, url.User(security.RootUser))but I want to open a connection using a specific username and password combination.
Code quote:
db, err := gosql.Open("postgres", pgURL.String())pkg/server/status_test.go, line 3311 at r5 (raw file):
Previously, Azhng (Archer Zhang) wrote…
Can we just use
testCluster.ServerConn(0)here? iirc.Conn(ctx)may or may not create a new connection.
Good catch, done!
pkg/server/status_test.go, line 3324 at r5 (raw file):
Previously, Azhng (Archer Zhang) wrote…
small nit: require.NoError
Done.
pkg/server/status_test.go, line 3349 at r5 (raw file):
Previously, Azhng (Archer Zhang) wrote…
Did you have to execute something for the session to be consider active?
If you mean "active" as in a session is started, yes. But this question made me think of the IDLE status. So I also decided to test if we get the correct response for IDLE vs OPEN sessions by running SELECT pg_sleep(5) to keep a session open. (see PR description).
pkg/server/status_test.go, line 3353 at r5 (raw file):
Previously, Azhng (Archer Zhang) wrote…
Hmm why the half a second sleep here?
No reason, deleted!
Code quote:
time.Sleep(500 * time.Millisecond)pkg/server/status_test.go, line 3377 at r5 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
Avoid using the same value for 2 different things, because they might be returned switched values and you wouldn't know. Make one value for open and another for closed
Got it! Done.
5f731e5 to
63ac9d1
Compare
2e9562c to
5f3b4bd
Compare
Azhng
left a comment
There was a problem hiding this comment.
Nice! Almost there!
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @Azhng, @gtr, and @maryliag)
pkg/sql/closed_session_cache.go, line 54 at r12 (raw file):
mu struct { syncutil.RWMutex mon *mon.BytesMonitor
nit: you can move the bytes monitor out the mu struct since it has its own mutex to protect it.
For bound accounts, you can optionally assign it an mutex to give the bound account threadsafety.
pkg/sql/closed_session_cache.go, line 91 at r12 (raw file):
node := &sessionNode{id: id, data: session, timestamp: c.timeSrc()} acc := c.mu.mon.MakeBoundAccount()
Hmm I should've caught this earlier, we don't want to make a new bound account each time .add() is called. What we want is to initialize a bound account once when we initialize ClosedSessionCache, and close it once when we are destroying the this object. (however, in this case, ClosedSessionCache has the same object lifetime as the SQL Server, so it's really getting destroyed only when the SQL Server shuts down.
The consequence of initializing bound account inside .add() is that, because of the defer acc.Close(ctx) call, the memory account becomes no-op, which defeat the purpose the memory accounting.
pkg/sql/closed_session_cache.go, line 174 at r12 (raw file):
func (n *sessionNode) size() int { return int(unsafe.Sizeof(time.Time{}))
Ah my bad, what I meant was to replace the hardcoded 8 byte with this line. What I had in mind was
size := 0
size += int(unsafe.Sizeof(ClusterWideID{}))
size += n.data.Size()
size += int(unsafe.Sizeof(time.Time{}))
return sizepkg/server/status_test.go, line 3377 at r5 (raw file):
Previously, gtr (Gerardo Torres Castro) wrote…
Got it! Done.
Hmm I don't see the SucceedSoon wraps? accidentally reverted ?
gtr
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @Azhng, @gtr, and @maryliag)
pkg/sql/closed_session_cache.go, line 54 at r12 (raw file):
Previously, Azhng (Archer Zhang) wrote…
nit: you can move the bytes monitor out the
mustruct since it has its own mutex to protect it.For bound accounts, you can optionally assign it an mutex to give the bound account threadsafety.
Done.
pkg/sql/closed_session_cache.go, line 91 at r12 (raw file):
Previously, Azhng (Archer Zhang) wrote…
Hmm I should've caught this earlier, we don't want to make a new bound account each time
.add()is called. What we want is to initialize a bound account once when we initializeClosedSessionCache, and close it once when we are destroying the this object. (however, in this case,ClosedSessionCachehas the same object lifetime as the SQL Server, so it's really getting destroyed only when the SQL Server shuts down.The consequence of initializing bound account inside
.add()is that, because of thedefer acc.Close(ctx)call, the memory account becomes no-op, which defeat the purpose the memory accounting.
Good catch, I initialized the bound account when we initialize the ClosedSessionCache and call c.mu.acc.Grow() in the .add() method.
pkg/sql/closed_session_cache.go, line 174 at r12 (raw file):
Previously, Azhng (Archer Zhang) wrote…
Ah my bad, what I meant was to replace the hardcoded
8byte with this line. What I had in mind wassize := 0 size += int(unsafe.Sizeof(ClusterWideID{})) size += n.data.Size() size += int(unsafe.Sizeof(time.Time{})) return size
My bad, fixed!
pkg/server/status_test.go, line 3377 at r5 (raw file):
Previously, Azhng (Archer Zhang) wrote…
Hmm I don't see the
SucceedSoonwraps? accidentally reverted ?
Yup, added it back.
5f3b4bd to
e68c4f1
Compare
Azhng
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @Azhng, @gtr, and @maryliag)
pkg/sql/closed_session_cache.go, line 72 at r13 (raw file):
ShouldEvict: func(size int, _, _ interface{}) bool { capacity := ClosedSessionCacheCapacity.Get(&st.SV) return int64(size) > capacity
Remember to shrink the bound account when you evicts the older data. Otherwise we would just be leaking memory here. There should be a callback here you can register.
e68c4f1 to
e0927a3
Compare
gtr
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @Azhng, @gtr, and @maryliag)
pkg/sql/closed_session_cache.go, line 72 at r13 (raw file):
Previously, Azhng (Archer Zhang) wrote…
Remember to shrink the bound account when you evicts the older data. Otherwise we would just be leaking memory here. There should be a callback here you can register.
Good catch, I used the OnEvictedEntry callback function.
Azhng
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @Azhng, @gtr, and @maryliag)
pkg/sql/closed_session_cache.go, line 174 at r12 (raw file):
Previously, gtr (Gerardo Torres Castro) wrote…
My bad, fixed!
Ugh, doesn't seem like it's fixed 😅
22c1a65 to
9ef6d32
Compare
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @Azhng, @gtr, and @maryliag)
pkg/sql/closed_session_cache.go, line 174 at r12 (raw file):
size := 0
size += int(unsafe.Sizeof(ClusterWideID{}))
size += n.data.Size()
size += int(unsafe.Sizeof(time.Time{}))
return size
Ok got it this time haha
Azhng
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @Azhng, @gtr, and @maryliag)
|
TFTR! let me just get CI to green |
This is the second phase of #67888 which is to expose closed sessions to the ListSessions endpoint. Previously, the ListSessions endpoint only returned open sessions. This change builds on top of the previous PR to add the `ClosedSessionsCache` and now allows the ListSessions endpoint to also return closed sessions in its response. The `ListSessionsRequest` object was edited to include a flag `exclude_closed_sessions` which is a boolean to exclude closed sessions. If unspecified, defaults to false and closed sessions are included in the response. Additionally, the `serverpb.Session` object was updated to include new `end` and `status` fields which specify the time the session ended and the status (active, idle, closed) of the session, respectively. Release note (api change): `ListSessions` now returns closed sessions in addition to open sessions; `ListSessionsRequest` now has a `exclude_closed_sessions` flag; `serverpb.Session` now has `end` and `status` fields.
9ef6d32 to
1c36b6e
Compare
|
pkg/sql/crdb_internal.go, line 1898 at r16 (raw file):
Are these fields just for clearer distinction now that we're saving closed sessions? I noticed in both sessions table requests you're excluding closed sessions so these tables are still just open sessions. |
gtr
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @Azhng, @gtr, @maryliag, and @xinhaoz)
pkg/sql/crdb_internal.go, line 1898 at r16 (raw file):
Previously, xinhaoz (Xin Hao Zhang) wrote…
Are these fields just for clearer distinction now that we're saving closed sessions? I noticed in both sessions table requests you're excluding closed sessions so these tables are still just open sessions.
Yes, they are for clearer distinction. Right now the tables only list open sessions but I plan on also listing closed sessions; I'm saving that for a future PR.
|
bors r+ |
|
Build succeeded: |
This is the second phase of #67888 which is to expose closed sessions to
the ListSessions endpoint.
Previously, the ListSessions endpoint only returned open sessions. This
change builds on top of the previous PR to add the
ClosedSessionsCacheand now allows the ListSessions endpoint to also return closed sessions
in its response. The
ListSessionsRequestobject was edited to include a flagexclude_closed_sessionswhich is a boolean to exclude closed sessions. Ifunspecified, defaults to false and closed sessions are included in the
response.
Additionally, the
serverpb.Sessionobject was updatedto include new
endandstatusfields which specify the time thesession ended and the status (opened, closed) of the session,
respectively.
To test the changes, I made a test
TestListClosedSessionswhich creates threeusers. For each user, we create 10 closed sessions, 5 open sessions, and 3 idle
sessions.
The closed sessions are created by opening 10 DB connections and then
closing them right after:
The open sessions are created by opening up 5 DB connections and running
pg_sleep(30)concurrently:The idle sessions are created by opening up 3 DB connections and running
SELECT version()(which finishes executing almost instantaneously) butdeferring the
db.Close()call until the end of the test:These are the results for the three users:
Release note (api change):
ListSessionsnow returns closed sessions inaddition to open sessions;
ListSessionsRequestnow has aexclude_closed_sessionsflag;serverpb.Sessionnow hasendandstatusfields.