refactor(db_api): remove parsing of insert statements#630
refactor(db_api): remove parsing of insert statements#630larkee wants to merge 13 commits intogoogleapis:mainfrom
Conversation
olavloite
left a comment
There was a problem hiding this comment.
If I understand this correctly, this change will change the behavior of dbapi execute_many is called with an INSERT statement. Previously such a call would result in a BatchUpdate RPC invocation with the SQL statement repeated multiple times, but with different parameter values. This will cause dbapi to alter the SQL statement that is executed into a single INSERT statement that inserts multiple rows (e.g. INSERT INTO Foo (c1, c2) VALUES (@p1, @p2), (@p3, @p4)). That is (probably) more efficient on Spanner than a BatchUpdate RPC with multiple INSERT statements.
In general I would say that:
- A driver should normally execute the SQL statement(s) it receives without altering them. It is the responsibility of the application or framework that uses it to generate the SQL statement that is needed.
- If the driver supports re-writing certain SQL statements it should be an op-in (or at least have an opt-out possibility) so it is clear to the user that it might happen. If it is an op-out, it should also be lenient enough to fall back to just executing the supplied SQL without returning an error (unless of course the SQL statement is invalid, in which case the backend is responsible for returning an error).
So if in any way possible, I would rather say that this change should be in SQLAlchemy and/or Django. If that is not feasible, then at least this should be optional and more lenient so it doesn't cause surprises for anyone that might be using dbapi directly.
| about the resulting table in case c). | ||
| :returns: A list of (sql, param) tuples | ||
| """ # noqa | ||
| match = RE_INSERT.search(insert_sql) |
There was a problem hiding this comment.
I know that this is not part of this change, but I think this part should be more lenient. As far as I can tell, the function will return an error if it cannot match the insert regexp, and then refuse to execute the statement(s). I would rather expect something like:
- Try to parse the insert statement, and if it does not work, fall back to a 'simple' case where the sql statement and parameter list is converted to a normal batch update RPC invocation. That prevents failures when an application tries to execute an insert statement that does not match the regexp (for example if it contains comments, hints, ...)
- If the parse succeeds, then try the different scenarios in this function.
- If the parse fails because it encounters some other non-expected scenario, then also fall back to the default of just trying to execute the statements as a batch update.
The above would ensure that the common cases of (generated) sql are parsed into a more efficient format, while other sql is still executed without any unexpected errors.
It might also be good to consider making entire parsing optional. It might be surprising for users that dbapi actually changes the SQL that is executed. Normally a driver like dbapi should execute the SQL as is. The generation of efficient SQL statements should as far as possible be a responsibility of any framework or application that uses the driver.
There was a problem hiding this comment.
After looking more closely, this parsing and rewriting of insert statements doesn't seem to add any benefits. It's only used for execute which means at best it was returning the single SQL statement and at worst it was splitting it into multiple statements.
I have removed parse_insert in favor of sending the SQL as is.
This was indeed my original intent. However, I agree with your reasoning that this type of change does not belong here. As such, I have updated this PR to instead:
PTAL :) |
| @@ -706,9 +771,6 @@ def test_setoutputsize(self): | |||
| # | |||
| # def test_do_execute_insert_heterogenous(self): | |||
| # pass | |||
There was a problem hiding this comment.
I guess you can erase other commented cases as well now
|
It'd be marvelous if we could erase all of this code, but I'm afraid something will fail somewhere. First of all, I don't think we can erase the function which adds a dummy WHERE clause to UPDATE operations. We tried it several times and always hit problems in Django and SQLAlchemy level. So, I guess, these changes should be tried with SQLAlchemy and Django tests first. |
| @@ -391,22 +389,13 @@ def run_statement(self, statement, retried=False): | |||
| self._statements.append(statement) | |||
|
|
|||
| if statement.is_insert: | |||
There was a problem hiding this comment.
Do we need this if statement at all any more? Or can it be merged with the default case for any other update statement?
google/cloud/spanner_dbapi/cursor.py
Outdated
| if classification == parse_utils.STMT_UPDATING: | ||
| sql = parse_utils.ensure_where_clause(sql) | ||
|
|
||
| if classification != parse_utils.STMT_INSERT: |
There was a problem hiding this comment.
Do we still need this? As in; do we handle an insert in any different way than for example an UPDATE DML statement?
| :rtype: str | ||
| :returns: The query type name. | ||
| """ | ||
| query = sqlparse.format(query, strip_comments=True).strip() |
There was a problem hiding this comment.
As this is an external library, it might be good to add a comment here that it has been verified that this library supports exactly the types of comments that Cloud Spanner supports, and no other comment types. That is, the library will strip the following types of comments (from what I can see in the source code of the library this holds true):
# single line comment starting with a dash-- single line comment starting with two hyphens- `/* multi line comment */
And it does not support for example the PostgreSQL style dollar quoted string constants (https://www.postgresql.org/docs/current/sql-syntax-lexical.html) (or if it does, document that the parsing would fail if a SQL string contains a comment inside what would be interpreted as a dollar quoted string)
b6ef602 to
6d479c0
Compare
|
@vi3k6i5, we have this a bit stale PR here, I'm going to take a look at it and finish it, 'cause I like these changes. They'll reduce the DB API code complexity. |
|
Closing this PR in favor of: #679, which contains almost the same changes, but without conflicts and with some errors fixed. |
No description provided.