server: new HTTP API to execute SQL statements by knz · Pull Request #79663 · cockroachdb/cockroach

knz · 2022-04-08T15:59:42Z

This commit introduces a new API endpoint /api/v2/sql/,
which can execute arbitrary SQL statements over HTTP.

The endpoint is authenticated, i.e. the client must provide
an appropriate X-Cockroach-API-Session: header on each request.

There is a Swagger spec to enable auto-documentation of the API.

The request payload should be submitted using JSON.

Summary of request fields:

database: the current database, defaults to defaultdb
application_name
timeout: max execution time, defaults to 5s
max_results_size: max size of results, defaults to ~10KB.
statements: which statements to run. An array of {"sql",
"arguments"} objects. The number of items in "arguments"
must match the number of placehoders in "sql".
execute: must be set to true to actually execute the SQL.
Otherwise, the request is merely parsed to verify the syntax is correct.

Example use:

Generate some test data and create a user with access to it:

$ cockroach gen example-data intro | cockroach sql --certs-dir=...
$ cockroach sql --certs-dir=... \
    -e 'create user myuser with password abc'
    -e 'grant select on table intro.mytable to myuser'

Generate an HTTP API key, for example using the CLI (an HTTP
request to /api/v2/login is also possible):
```
$ cockroach session-auth login myuser  --certs-dir=certs
```
(this prints out the session key, assign it manually to the $APIKEY env var.)

Run some HTTP API queries from the shell using curl. For example:

$ curl -k -H "X-Cockroach-API-Session: $APIKEY" \
    -H "Content-Type: application/json" \
    --data-binary '{"database":"intro",
    "statements":[{"sql":"SELECT v FROM mytable WHERE l%2 = 0"}],
    "execute": true}' \
    https://127.0.0.1:8080/api/v2/sql/

Example output:

{"num_statements": 1,
 "execution": {
   "retries":0,
   "txn_results": [
{
  "statement": 1,
  "tag": "SELECT",
  "start":"2022-04-08T15:55:30.659257101Z",
  "columns": [{"name":"v", "type":"STRING", "oid":25}],
  "rows": [
   {"v":"..."},
   {"v":"..."},
   {"v":"..."}
  ],
  "end":"2022-04-08T15:55:30.665283699Z"
}
  ]
}}

This currently executes using the "internal executor" and thus
does not support SQL sessions and transaction control. Some consequences:

SET statements are ineffective.
BEGIN/COMMIT/ABORT/SAVEPOINT etc are ineffective.

Security note: We place this logic under /api/v2 and not
/_admin/v1 because the latter uses session cookies and is thus
susceptible to CSRF.
The /api/v2 tree requires an explicit header and thus blocks CSRF.

Release note (api change): A new /api/v2/sql/ endpoints enable
execution of simple SQL queries over HTTP.

cockroach-teamcity · 2022-04-08T15:59:51Z

This change is

rafiss · 2022-04-08T16:42:38Z

Very exciting! Tagging CC-4166 and CC-6259 as related issues.

I wonder if the improvements we plan to make to the internal executor in #71467 and #78998 would help us be able to support SET/BEGIN/ROLLBACK.

Or (maybe this is unrealistic) maybe we could use a normal connExecutor for this, with a mandatory upper limit on idle_session_timeout. And some way of including the session ID in the request body (or if there is no session ID in the request body, then create a new one).

knz · 2022-04-08T17:12:07Z

The way sessions would work, there are two different directions:

one direction suggested by Peter: keep the session object open server-side (and keep the SQL pod alive), and then route the http queries to it using its session ID
one suggestion I came up with: in each request, at the end serialize the current session and send the serialized payload to the HTTP client. Then in the next request the client can send the serialized session in the request, and the server re-hydrates it when it receives the request. This way, we don't need the same SQL pod to handle different requests and we can also switch them off if the HTTP client is idle.

knz · 2022-04-08T17:13:56Z

Regarding CC-4166 and CC-6259:

this PR is aimed to satisfy certain needs by the observability teams, to provide an internal API for the o11y to use. We probably want some flexibility in the design here, as well as the ability to change this new SQL API at our leisure without caring about stability for end-users.

Therefore, I can't say this PR would be a good mach for the needs expressed in the above jira epics.

Nonetheless, if/when someone wishes to explore the data API described in these jira epics, we can perhaps take inspiration from the code here.

knz · 2022-04-08T17:16:49Z

Regarding how we can introduce session mgt inside a single request.

There are two ways:

continue to use InternalExecutor, but rely on sql: add support to the InternalExecutor to execute multiple SQL queries #71467 and sql: internal executor is very confusing about managing transaction state #78998 to provide session mgt "inside it".

This has my personal preference currently (at the time of this writing), because it allows us to consume result rows using tree.Datum and tree.AsJSON.

However, if we go this route we also need to teach InternalExecutor to accept a serialized session as input, and produce the serialized session at the end.
another way suggested by @ajwerner : make the API endpoint use a pgwire loopback connection in-memory, using lib/pq or pgx internally to talk to the SQL service using a regular client session.

The advantage of this is that it's rather simple to implement (simpler than sql: add support to the InternalExecutor to execute multiple SQL queries #71467 / sql: internal executor is very confusing about managing transaction state #78998) but we lose the ability to observe tree.Datum in the API code and thus the production of good row JSON will be more difficult. (But maybe it's OK for a prototype, I'm not sure).

knz · 2022-04-08T17:17:41Z

maybe we could use a normal connExecutor for this, with a mandatory upper limit on idle_session_timeout. And some way of including the session ID in the request body (or if there is no session ID in the request body, then create a new one).

Can you explain why the session ID is important?

rafiss · 2022-04-08T17:26:13Z

Can you explain why the session ID is important?

Nevermind. I missed that the X-Cockroach-API-Session header would already uniquely identify the session (I had thought it was just for auth before). I was thinking of a way to identify the session over the stateless HTTP protocol.

keep the session object open server-side (and keep the SQL pod alive), and then route the http queries to it using its session ID

I feel like a big advantage of this is that it makes the infra (within the SQL instance) of an HTTP-based SQL session quite similar to the infra of a pgwire SQL session. There is the cost of needing new routing logic in SQL proxies, but that feels do-able. We're going to be adding another form of routing there anyway as part of supporting PG cancel in serverless (CRDB-13073).

knz · 2022-04-08T17:28:20Z

Nevermind. I missed that the X-Cockroach-API-Session header would already uniquely identify the session (I had thought it was just for auth before).

It is only for authn, that's right. We're not identifying the SQL session anywhere currently, because this code does not support SQL sessions yet.

If we start implementing sessions, we need to look at the two possible approaches described in this comment:

with the session routing thing, we will need a session ID, yes.
with the session serialization thing, we don't need a session ID; we need a "serialized_session" parameter.

knz · 2022-04-08T17:30:00Z

keep the session object open server-side (and keep the SQL pod alive), and then route the http queries to it using its session ID

I feel like a big advantage of this is that it makes the infra (within the SQL instance) of an HTTP-based SQL session quite similar to the infra of a pgwire SQL session.

Can you explain more?

The big drawback (not to be underestimated!) is that it forces the SQL pod to remain alive. With a HTTP API, we don't receive a clear signal server-side when the browser closes their window. We won't know when to close the session.

This also makes it hard to switch off the sql pod when the session is idle.

In contrast, if we use session serialization, the HTTP client mainains the session state and we can route the request to any server, and switch off idle servers.

petermattis · 2022-04-08T17:56:50Z

The big drawback (not to be underestimated!) is that it forces the SQL pod to remain alive. With a HTTP API, we don't receive a clear signal server-side when the browser closes their window. We won't know when to close the session.

This also makes it hard to switch off the sql pod when the session is idle.

In contrast, if we use session serialization, the HTTP client mainains the session state and we can route the request to any server, and switch off idle servers.

You could imagine a bit of a combination of the two. My understanding is that sqlproxy session migration can only operate when a session is in certain states (i.e. there is not an active transaction). The HTTP API could do similar. If there is an active transaction the session state is a routing identifier. If there is no active transaction the session state is actually the serialized session state.

knz · 2022-04-08T17:58:35Z

We can do what you say Peter yes (although it's twice more complex than just doing 1 thing)

That said:

My understanding is that sqlproxy session migration can only operate when a session is in certain states (i.e. there is not an active transaction).

I also recall a plan to serialize active transactions. So maybe that can help too.

rafiss · 2022-04-09T16:53:23Z

I also recall a plan to serialize active transactions.

There is no plan to work on this anytime soon, but yes, some ideas had been discussed in Andrei's RFC from 2020.

We're not identifying the SQL session anywhere currently, because this code does not support SQL sessions yet.

If we start implementing sessions, we need to look at the two possible approaches described in #79663 (comment):

with the session routing thing, we will need a session ID, yes.

...

Yes, that's what I was suggesting in #79663 (comment) -- use the session ID in the HTTP request to achieve the goal of supporting SQL sessions. I think we're on the same page

The big drawback (not to be underestimated!) is that it forces the SQL pod to remain alive. With a HTTP API, we don't receive a clear signal server-side when the browser closes their window. We won't know when to close the session.

That was why I propose that if we go the route of keeping the connExecutor / SQL pod up, then we should require that the idle_session_timeout for these HTTP based sessions be set to something like 1 hour.

However, I want to discuss this "forces the SQL pod to remain alive" concern more.

In dedicated clusters, this is not a concern. The SQL instance is always on.
In serverless clusters, another way to address this concern is to have sqlproxy be the one to save the session state. It will already have the ability to serialize/deserialize sessions as part of the session migration project. A future direction (which I think has only been discussed in Slack so far), is to take this logic further, and have sqlproxy suspend idle sessions and keep them serialized in sqlproxy in-memory until the user comes back to use them later. This future work is already desirable to complete since it would take down serverless costs even further.

knz · 2022-04-10T12:13:10Z

In serverless clusters, another way to address this concern is to have sqlproxy be the one to save the session state.

So for one this idea requires the HTTP API to use sqlproxy as its back-end, which is a design constraint I don't really want to introduce - I think it would do good if the observability service (which is the consumer of the new API) does not start depending on sqlproxy.

But let's ignore that for a second.

The original argument still holds: the question is really who pays for storing the serialized session state and how long.

If we use sqlproxy to store that session, then CRL is still paying for it. If we flow the session state to the end-user's web browser, then the browser and the end user pay for it.

Then about how long, it's still the same problem. If we use sqlproxy to store the session, we still don't know when the browser has gone away and we need to use some kind of idling mechanism. And also we need to pay for that idle session after the browser is done with it, until the idle timeout expires. And also we can't scale down sqlproxy as much (it will be more disruptive). If we let the end-user's web browser do it, then the session state goes away when they close their browser's tab.

The question of cost matters at scale, when we have millions of concurrent accesses.

knz · 2022-04-13T14:27:14Z

NB: I found a way to lift the restriction so the client can use begin/commit etc, by using sql.SetupConn / sql.ServeConn with a shim that pushes the results through the http connection. will update PR

dhartunian

Reviewed 1 of 1 files at r1, 2 of 6 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @knz)

-- commits, line 86 at r2:
Why is this a concern on this endpoint in particular, and not on all the existing APIs?
Using cookies is much more ergonomic for a browser app so I'd definitely prefer for this API to support that as well.

pkg/server/api_v2_sql.go, line 190 at r2 (raw file):

	// Parse the parameters.
	if err := r.ParseForm(); err != nil {

might be nice to also take args as a JSON object.

pkg/server/api_v2_sql.go, line 237 at r2 (raw file):

	// we saw and call it a day.
	if r.PostForm.Get("execute") == "" {
		w.Header().Add("Content-Type", "text/plain; charset=utf-8")

nit: probably nicer to support JSON on all responses for consistency.

knz · 2022-04-23T10:00:22Z

Why is this a concern on this endpoint in particular, and not on all the existing APIs?

Because the other endpoints do not modify data in the database. They only retrieve it.

@bdarnell can you chime in here? the question at hand is whether we should continue to use a separate header for CSRF protection (as in the PR currently) or whether we can make-do with a session cookie.

Using cookies is much more ergonomic for a browser app so I'd definitely prefer for this API to support that as well.

The entire APIv2 tree uses a separate header (and not a cookie) to avoid CSRF. I'm following pattern here.

bdarnell

Because the other endpoints do not modify data in the database. They only retrieve it.

And this one is unrestricted in its ability to modify the database. It's the most sensitive thing we've ever exposed over HTTP.

bdarnell can you chime in here? the question at hand is whether we should continue to use a separate header for CSRF protection (as in the PR currently) or whether we can make-do with a session cookie.

I'd need someone from the security team to weigh in here. In the past, cookies were not sufficient and an HTTP header was required. The introduction of the samesite cookie attribute was intended to make it feasible to use cookies without worrying about CSRF, but I'm not sure about current real-world deployment.

The entire APIv2 tree uses a separate header (and not a cookie) to avoid CSRF. I'm following pattern here.

I think this is a sufficient justification for sticking with the separate header even if samesite cookies are also a solution. We should be consistent with the rest of APIv2 and (potentially) introduce a browser-friendly mode of API access for everything at once.

The original argument still holds: the question is really who pays for storing the serialized session state and how long.

I think there are two distinct cases: idle session state (mainly SET variables) and transactional state. For transactional state, we don't want to make it too easy to have long-running transactions that hold on to locks, so I think it's fine if open transactions keep the SQL pod alive.

For session variables, it makes sense to keep these on the client side. But it may make even more sense to model them not as a cookie-like blob that the server returns but a set of parameters to the request. If you could just pass session variables in the HTTP request you could use them the same way whether it's the first request of a "session" or not and you don't need to parse the results and retain anything.

If we let the end-user's web browser do it, then the session state goes away when they close their browser's tab.

But we don't reliably find out when they close the browser, which is important for cleaning up transactional state. An API for transactions from the browser should perhaps use websockets instead of HTTP req/resp.

Reviewed 5 of 6 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @knz)

pkg/server/api_v2_sql.go line 49 at r2 (raw file):

// implicit transaction.
//
// If multiple SQL statements are specified, they are executed using a

Multiple SQL statements can make it easier to exploit SQL injection; many drivers place restrictions on the ability to use multiple statements in one string for this reason. We may want to add an argument multi_statement= (default false) so that a client must opt in to this multi-statement mode for a little extra safety.

pkg/server/api_v2_sql.go line 59 at r2 (raw file):

// ---
// parameters:
// - name: sql

Possible future extension: allow the sql string to contain placeholders and have a separate argument array for their values, to allow for safe parameterization without client-side knowledge of the sql grammar.

pkg/server/api_v2_sql.go line 190 at r2 (raw file):

Previously, dhartunian (David Hartunian) wrote…

might be nice to also take args as a JSON object.

Be consistent with the rest of APIv2

pkg/server/api_v2_sql.go line 224 at r2 (raw file):

		dbName = "defaultdb"
	}
	showCols := r.PostForm.Get("show_columns_always") != ""

Why is this an option? Seems to me that it's better to always return the columns even if there are no rows.

Nits: return instead of show, parse it as a bool or 1/0 instead of using empty vs non-empty.

pkg/server/api_v2_sql.go line 236 at r2 (raw file):

	// If the client did not request execution, just print what
	// we saw and call it a day.
	if r.PostForm.Get("execute") == "" {

It's weird that execution defaults to off.

pkg/server/api_v2_sql.go line 237 at r2 (raw file):

Previously, dhartunian (David Hartunian) wrote…

nit: probably nicer to support JSON on all responses for consistency.

Agreed (including all errors)

pkg/server/api_v2_sql.go line 254 at r2 (raw file):

	// We can't use json.Marshal for the outer shell of the results, because
	// we want to stream the results and avoid accumulating results in RAM.
	fmt.Fprintf(w, "{\"num_statements\": %d,\n \"execution\": {\n", len(statements))

Constructing json by hand here makes me very nervous, especially when considering unwinding the stack to output errors in the json payload. You've already got a size limit with a small default; I'd prefer to buffer everything and encode normally. We can consider streaming when/if we decide we need to handle larger result sets (at which time we'll also need to consider how the client will be consuming them - many clients also have difficulties parsing json in a streaming fashion. I think a format other than json may be appropriate at that point, or maybe one of the json wrappers in https://en.wikipedia.org/wiki/JSON_streaming).

pkg/server/api_v2_sql.go line 318 at r2 (raw file):

					}()

					it, err := a.admin.ie.QueryIteratorEx(ctx, "run-query-via-api", txn,

Have we ever used the internal executor for user-specified code before? This seems like a risk of crashing because the user stumbles across something not supported by the internal executor, or invalid security assumptions, etc.

pkg/server/api_v2_sql.go line 334 at r2 (raw file):

							// NB: we need to place the number of rows affected inside
							// quotes because json integers must be 53 bits or less and
							// the row affected count may be 64 bits.

Can the row affected count really be more than 53 bits? That's an awfully big number.

bdarnell

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @knz)

pkg/server/api_v2_sql.go line 59 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Possible future extension: allow the sql string to contain placeholders and have a separate argument array for their values, to allow for safe parameterization without client-side knowledge of the sql grammar.

After a little more thought, this is really important. I wouldn't feel comfortable shipping this without supporting placeholder-based usage.

knz

I think there are two distinct cases: idle session state (mainly SET variables) and transactional state.

I agree on the distinction between idle and live txn. I also believe that the requesters of this feature only care about preserving the idle session state (the same case we already implemented session serialization for).

However, you're incorrect that idle sessions are only about SET variables. They also contain:

prepared statements.
sequence cache.

Additionally, I'm not sure all the remaining fields in the in-memory (and serializable) session state can be re-populated using SET statements. Someone from XP would need to confirm.

If you could just pass session variables in the HTTP request you could use them the same way whether it's the first request of a "session" or not and you don't need to parse the results and retain anything.

I agree we should support setting session variables in the HTTP request. It's not done here yet because we're still using the InternalExecutor, but as soon as that changes we can do session variable overrides.

However if we wish to use this interface to e.g. support a SQL shell, we need capturing more of the session state.

But we don't reliably find out when they close the browser, which is important for cleaning up transactional state.

I'm pretty sure we're not interested in offering a HTTP interface that leaves transactions open (or adds more statements on an open txn).

I've also taken note of your other review comments. I agree with all of them and hope to address them in a later revision to the PR.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @dhartunian, and @knz)

pkg/server/api_v2_sql.go line 254 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Constructing json by hand here makes me very nervous, especially when considering unwinding the stack to output errors in the json payload. You've already got a size limit with a small default; I'd prefer to buffer everything and encode normally. We can consider streaming when/if we decide we need to handle larger result sets (at which time we'll also need to consider how the client will be consuming them - many clients also have difficulties parsing json in a streaming fashion. I think a format other than json may be appropriate at that point, or maybe one of the json wrappers in https://en.wikipedia.org/wiki/JSON_streaming).

The choice I made here is because we're converting the SQL datums to pkg/util/json.JSON, which do not implement go's json.Marshaler interface. So to do what you say, we'd need to do this:

convert datum to crdbjson.JSON
serialize crdbjson.JSON to string
parse string via gojson.Unmarshal
add resulting datum into result tree
at the very end, re-serialize everything

The current implementations skip steps 3-5.

We can do what you say by adding proper json marshaling support to crdb's util/json package. I think it's out of scope for this MVP.

pkg/server/api_v2_sql.go line 318 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Have we ever used the internal executor for user-specified code before? This seems like a risk of crashing because the user stumbles across something not supported by the internal executor, or invalid security assumptions, etc.

there's no panic inside the internal executor, all errors flow out.

pkg/server/api_v2_sql.go line 334 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Can the row affected count really be more than 53 bits? That's an awfully big number.

Are you interested to make that call now and set it in stone in the API spec? I wasn't comfortable doing so and thought the string syntax was more generic.

knz

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @dhartunian, and @knz)

pkg/server/api_v2_sql.go line 49 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Multiple SQL statements can make it easier to exploit SQL injection; many drivers place restrictions on the ability to use multiple statements in one string for this reason. We may want to add an argument multi_statement= (default false) so that a client must opt in to this multi-statement mode for a little extra safety.

Instead of semicolons the interface now requires the statements to be passed as separate JSON payloads. So this point is moot.

pkg/server/api_v2_sql.go line 59 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

After a little more thought, this is really important. I wouldn't feel comfortable shipping this without supporting placeholder-based usage.

Done.

pkg/server/api_v2_sql.go line 190 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Be consistent with the rest of APIv2

Done.

pkg/server/api_v2_sql.go line 224 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Why is this an option? Seems to me that it's better to always return the columns even if there are no rows.

Nits: return instead of show, parse it as a bool or 1/0 instead of using empty vs non-empty.

Removed.

pkg/server/api_v2_sql.go line 236 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

It's weird that execution defaults to off.

I found it easier to use to troubleshoot interactively this way. I don't have hard feelings either way.

pkg/server/api_v2_sql.go line 237 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Agreed (including all errors)

Done.

knz · 2022-04-29T13:39:37Z

Example query with placeholders with the new input format:

curl -H "X-Cockroach-API-Session: ..." \
  -H "Content-Type: application/json" -k \
  --data-binary '{"statements":[{"sql":"select $1","arguments":[123]}],"execute":true}' \
  https://127.0.0.1:8080/api/v2/sql/

Each statement has its own independent list of placeholders and argument values.

knz · 2022-05-13T16:39:56Z

ok I've polished this further to avoid the custom json serialization code.

Are folk happy with the interface / UX? If so, I will implement some unit tests and mark the PR ready for review.

cc @andreimatei

dhartunian

Are folk happy with the interface / UX? If so, I will implement some unit tests and mark the PR ready for review.

👍 I'd like to merge this and then port one of our features to it. I successfully prototyped a SQL-execution UI feature in DB Console using an earlier iteration of the PR and found the API easy to work with.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell and @dhartunian)

-- commits line 86 at r2:

Previously, dhartunian (David Hartunian) wrote…

Why is this a concern on this endpoint in particular, and not on all the existing APIs?
Using cookies is much more ergonomic for a browser app so I'd definitely prefer for this API to support that as well.

Retracting this. I understand the motivation provided for using the header and we can keep it.

dhartunian

@knz just pushed tests up. Looks good to merge unless you have more cases for me to cover.

One thing that I was a bit surprised by during testing was having a top-level "error" field and a per-statement error. In the case of SQL errors, the object is repeated once at the top of the JSON blob and again inside the statement result. Feels a bit redundant. I understand the purpose of each since we can have non-statement errors and am wondering if in the case of SQL errors we should only have them within the result.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell and @dhartunian)

knz

LGTM modulo the lint nit that CI is telling you about.

In the case of SQL errors, the object is repeated once at the top of the JSON blob and again inside the statement result.

The design snag this addresses is that there can be multiple statements executed, some of them succesful followed by a problem. For example, select 1; select 2; select foo can return two results and then an error.

The current error presentation ensures that it's clear which of the statement the error pertains to; and it also lets the statements before it return an error.

I'm interested in your opinion on how to present this differently.

Reviewed 1 of 5 files at r4, 3 of 5 files at r5, 1 of 1 files at r6, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell and @dhartunian)

pkg/server/api_v2_sql_test.go line 85 at r6 (raw file):

				return fmt.Sprintf("%s|%s", er.Error.Code, er.Error.Message)
			}
			return string(r)

can I suggest you unmarshal the response in every case,
then re-marshal it with indentation for the expected output. This will make the expected output much more readable.

abarganier

- thanks for adding tests to help drive this home @dhartunian

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @bdarnell and @dhartunian)

This commit introduces a new API endpoint `/api/v2/sql/`, which can execute arbitrary SQL statements over HTTP. The endpoint is authenticated, i.e. the client must provide an appropriate `X-Cockroach-API-Session:` header on each request. There is a Swagger spec to enable auto-documentation of the API. The request payload should be submitted using JSON. Summary of request fields: - `database`: the current database, defaults to `defaultdb` - `application_name` - `timeout`: max execution time, defaults to 5s - `max_results_size`: max size of results, defaults to ~10KB. - `statements`: which statements to run. An array of {"sql", "arguments"} objects. The number of items in "arguments" must match the number of placehoders in "sql". - `execute`: must be set to true to actually execute the SQL. Otherwise, the request is merely parsed to verify the syntax is correct. Example use: 1. Generate some test data and create a user with access to it: ``` $ cockroach gen example-data intro | cockroach sql --certs-dir=... $ cockroach sql --certs-dir=... \ -e 'create user myuser with password abc' -e 'grant select on table intro.mytable to myuser' ``` 2. Generate an HTTP API key, for example using the CLI (an HTTP request to /api/v2/login is also possible): ``` $ cockroach session-auth login myuser --certs-dir=certs ``` (this prints out the session key, assign it manually to the $APIKEY env var.) 3. Run some HTTP API queries from the shell using `curl`. For example: ``` $ curl -k -H "X-Cockroach-API-Session: $APIKEY" \ -H "Content-Type: application/json" \ --data-binary '{"database":"intro", "statements":[{"sql":"SELECT v FROM mytable WHERE l%2 = 0"}], "execute": true}' \ https://127.0.0.1:8080/api/v2/sql/ ``` Example output: ```json {"num_statements": 1, "execution": { "retries":0, "txn_results": [ { "statement": 1, "tag": "SELECT", "start":"2022-04-08T15:55:30.659257101Z", "columns": [{"name":"v", "type":"STRING", "oid":25}], "rows": [ {"v":"..."}, {"v":"..."}, {"v":"..."} ], "end":"2022-04-08T15:55:30.665283699Z" } ] }} ``` This currently executes using the "internal executor" and thus does not support SQL sessions and transaction control. Some consequences: - SET statements are ineffective. - BEGIN/COMMIT/ABORT/SAVEPOINT etc are ineffective. Security note: We place this logic under `/api/v2` and not `/_admin/v1` because the latter uses session cookies and is thus susceptible to [CSRF](https://en.wikipedia.org/wiki/Cross-site_request_forgery). The `/api/v2` tree requires an explicit header and thus blocks CSRF. Release note (api change): A new `/api/v2/sql/` endpoints enable execution of simple SQL queries over HTTP.

dhartunian

I'm interested in your opinion on how to present this differently.

My only idea was to omit a top level error in the case where the errors are attributable to a statement and only show the error in one place. But I think this is fine to keep for now. Once there's more usage by teams we can see if it causes confusion.

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @bdarnell, @dhartunian, and @knz)

pkg/server/api_v2_sql_test.go line 85 at r6 (raw file):

Previously, knz (kena) wrote…

can I suggest you unmarshal the response in every case,
then re-marshal it with indentation for the expected output. This will make the expected output much more readable.

Done.

knz

👏 👏

Reviewed 1 of 5 files at r4, 3 of 3 files at r7, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @bdarnell)

dhartunian · 2022-06-23T21:06:17Z

bors r=knz,abarganier

craig · 2022-06-23T23:05:52Z

Build failed (retrying...):

GitHub CI (Cockroach)

andreimatei · 2022-06-23T23:10:52Z

bors r-

craig · 2022-06-23T23:10:54Z

Canceled.

andreimatei · 2022-06-23T23:22:50Z

I think it failed with a false positive for a new linter. Discussing here: #82324 (comment)

knz · 2022-06-24T14:17:09Z

I see on the linked issue that the linter will be improved.
In the meantime, we can improve the code as suggested here. Do you need me to do it?

dhartunian · 2022-06-28T16:13:46Z

@knz I'll update the PR today

dhartunian · 2022-06-28T18:11:50Z

I've confirmed that #83418 fixes the linter issue that kept this PR from merging as-is so I'll give that a day to get merged so we can follow.

dhartunian · 2022-07-02T18:45:13Z

bors r=knz,abarganier

craig · 2022-07-02T19:58:06Z

Build succeeded:

GitHub CI (Cockroach)

knz force-pushed the 20220407-sql-api branch from 1f07df6 to 849cb45 Compare April 8, 2022 16:02

knz force-pushed the 20220407-sql-api branch from 849cb45 to cd182f4 Compare April 8, 2022 16:58

dhartunian requested a review from a team April 11, 2022 14:39

dhartunian reviewed Apr 22, 2022

View reviewed changes

bdarnell reviewed Apr 25, 2022

View reviewed changes

knz commented Apr 27, 2022

View reviewed changes

knz force-pushed the 20220407-sql-api branch from cd182f4 to dc984be Compare April 29, 2022 13:35

knz commented Apr 29, 2022

View reviewed changes

knz force-pushed the 20220407-sql-api branch 2 times, most recently from 4bacb79 to b5bed81 Compare May 13, 2022 16:38

dhartunian reviewed May 17, 2022

View reviewed changes

xinhaoz mentioned this pull request May 18, 2022

server: create endpoint to retrieve cluster locks state #81397

Closed

dhartunian force-pushed the 20220407-sql-api branch 2 times, most recently from 8787ab1 to 0472284 Compare June 16, 2022 19:45

dhartunian approved these changes Jun 16, 2022

View reviewed changes

dhartunian marked this pull request as ready for review June 16, 2022 19:46

dhartunian requested a review from a team as a code owner June 16, 2022 19:46

dhartunian requested a review from a team June 16, 2022 19:49

knz commented Jun 16, 2022

View reviewed changes

abarganier approved these changes Jun 21, 2022

View reviewed changes

dhartunian force-pushed the 20220407-sql-api branch from 0472284 to c916cbd Compare June 23, 2022 16:36

dhartunian reviewed Jun 23, 2022

View reviewed changes

knz commented Jun 23, 2022

View reviewed changes

craig bot merged commit 3c9b171 into cockroachdb:master Jul 2, 2022

cockroach-teamcity mentioned this pull request Jul 2, 2022

server: new HTTP API to execute SQL statements cockroachdb/docs#14440

Closed

This was referenced Jul 6, 2022

ui: expose sql-over-http /api/v2 endpoint for sql tenants #83925

Closed

api: support cookie-based auth for read-only queries for sql-over-http #84311

Closed

dhartunian mentioned this pull request Jul 13, 2022

release-22.1: server: new HTTP API to execute SQL statements #84374

Merged

This was referenced Oct 20, 2022

server: make Databases endpoint tenant-scoped #90257

Closed

server: make CreateStatementDiagnosticsReport tenant-scoped #90274

Closed

Conversation

knz commented Apr 8, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cockroach-teamcity commented Apr 8, 2022

Uh oh!

rafiss commented Apr 8, 2022

Uh oh!

knz commented Apr 8, 2022

Uh oh!

knz commented Apr 8, 2022

Uh oh!

knz commented Apr 8, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

knz commented Apr 8, 2022

Uh oh!

rafiss commented Apr 8, 2022

Uh oh!

knz commented Apr 8, 2022

Uh oh!

knz commented Apr 8, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

petermattis commented Apr 8, 2022

Uh oh!

knz commented Apr 8, 2022

Uh oh!

rafiss commented Apr 9, 2022

Uh oh!

knz commented Apr 10, 2022

Uh oh!

knz commented Apr 13, 2022

Uh oh!

dhartunian left a comment

Choose a reason for hiding this comment

Uh oh!

knz commented Apr 23, 2022

Uh oh!

bdarnell left a comment

Choose a reason for hiding this comment

Uh oh!

bdarnell left a comment

Choose a reason for hiding this comment

Uh oh!

knz left a comment

Choose a reason for hiding this comment

Uh oh!

knz left a comment

Choose a reason for hiding this comment

Uh oh!

knz commented Apr 29, 2022

Uh oh!

knz commented May 13, 2022

Uh oh!

dhartunian left a comment

Choose a reason for hiding this comment

Uh oh!

dhartunian left a comment

Choose a reason for hiding this comment

Uh oh!

knz left a comment

Choose a reason for hiding this comment

Uh oh!

abarganier left a comment

Choose a reason for hiding this comment

Uh oh!

dhartunian left a comment

Choose a reason for hiding this comment

Uh oh!

knz left a comment

Choose a reason for hiding this comment

Uh oh!

dhartunian commented Jun 23, 2022

Uh oh!

craig bot commented Jun 23, 2022

Uh oh!

andreimatei commented Jun 23, 2022

Uh oh!

craig bot commented Jun 23, 2022

knz commented Apr 8, 2022 •

edited

Loading

knz commented Apr 8, 2022 •

edited

Loading

knz commented Apr 8, 2022 •

edited

Loading