refactor(db_api): remove parsing of insert statements by larkee · Pull Request #630 · googleapis/python-spanner

larkee · 2021-10-21T03:24:00Z

No description provided.

olavloite

If I understand this correctly, this change will change the behavior of dbapi execute_many is called with an INSERT statement. Previously such a call would result in a BatchUpdate RPC invocation with the SQL statement repeated multiple times, but with different parameter values. This will cause dbapi to alter the SQL statement that is executed into a single INSERT statement that inserts multiple rows (e.g. INSERT INTO Foo (c1, c2) VALUES (@p1, @p2), (@p3, @p4)). That is (probably) more efficient on Spanner than a BatchUpdate RPC with multiple INSERT statements.

In general I would say that:

A driver should normally execute the SQL statement(s) it receives without altering them. It is the responsibility of the application or framework that uses it to generate the SQL statement that is needed.
If the driver supports re-writing certain SQL statements it should be an op-in (or at least have an opt-out possibility) so it is clear to the user that it might happen. If it is an op-out, it should also be lenient enough to fall back to just executing the supplied SQL without returning an error (unless of course the SQL statement is invalid, in which case the backend is responsible for returning an error).

So if in any way possible, I would rather say that this change should be in SQLAlchemy and/or Django. If that is not feasible, then at least this should be optional and more lenient so it doesn't cause surprises for anyone that might be using dbapi directly.

olavloite · 2021-11-03T07:36:40Z

google/cloud/spanner_dbapi/parse_utils.py

-              about the resulting table in case c).
+    :returns: A list of (sql, param) tuples
    """  # noqa
    match = RE_INSERT.search(insert_sql)


I know that this is not part of this change, but I think this part should be more lenient. As far as I can tell, the function will return an error if it cannot match the insert regexp, and then refuse to execute the statement(s). I would rather expect something like:

Try to parse the insert statement, and if it does not work, fall back to a 'simple' case where the sql statement and parameter list is converted to a normal batch update RPC invocation. That prevents failures when an application tries to execute an insert statement that does not match the regexp (for example if it contains comments, hints, ...)

If the parse succeeds, then try the different scenarios in this function.

If the parse fails because it encounters some other non-expected scenario, then also fall back to the default of just trying to execute the statements as a batch update.

The above would ensure that the common cases of (generated) sql are parsed into a more efficient format, while other sql is still executed without any unexpected errors.

It might also be good to consider making entire parsing optional. It might be surprising for users that dbapi actually changes the SQL that is executed. Normally a driver like dbapi should execute the SQL as is. The generation of efficient SQL statements should as far as possible be a responsibility of any framework or application that uses the driver.

After looking more closely, this parsing and rewriting of insert statements doesn't seem to add any benefits. It's only used for execute which means at best it was returning the single SQL statement and at worst it was splitting it into multiple statements.

I have removed parse_insert in favor of sending the SQL as is.

larkee · 2021-11-15T05:19:30Z

If I understand this correctly, this change will change the behavior of dbapi execute_many is called with an INSERT statement. Previously such a call would result in a BatchUpdate RPC invocation with the SQL statement repeated multiple times, but with different parameter values. This will cause dbapi to alter the SQL statement that is executed into a single INSERT statement that inserts multiple rows (e.g. INSERT INTO Foo (c1, c2) VALUES (@p1, @p2), (@p3, @P4)). That is (probably) more efficient on Spanner than a BatchUpdate RPC with multiple INSERT statements.

This was indeed my original intent. However, I agree with your reasoning that this type of change does not belong here. As such, I have updated this PR to instead:

Remove unneeded mutations code
Remove insert statement parsing
Remove adding superfluous WHERE clauses
Strip out comments before classifying SQL

PTAL :)

IlyaFaer · 2021-11-15T09:15:04Z

tests/unit/spanner_dbapi/test_cursor.py

@@ -706,9 +771,6 @@ def test_setoutputsize(self):
    #
    # def test_do_execute_insert_heterogenous(self):
    #     pass


I guess you can erase other commented cases as well now

IlyaFaer · 2021-11-15T09:16:22Z

It'd be marvelous if we could erase all of this code, but I'm afraid something will fail somewhere.

First of all, I don't think we can erase the function which adds a dummy WHERE clause to UPDATE operations. We tried it several times and always hit problems in Django and SQLAlchemy level. So, I guess, these changes should be tried with SQLAlchemy and Django tests first.

olavloite · 2021-11-15T08:36:07Z

google/cloud/spanner_dbapi/connection.py

@@ -391,22 +389,13 @@ def run_statement(self, statement, retried=False):
            self._statements.append(statement)

        if statement.is_insert:


Do we need this if statement at all any more? Or can it be merged with the default case for any other update statement?

olavloite · 2021-11-15T09:24:24Z

google/cloud/spanner_dbapi/cursor.py

-                if classification == parse_utils.STMT_UPDATING:
-                    sql = parse_utils.ensure_where_clause(sql)
-
                if classification != parse_utils.STMT_INSERT:


Do we still need this? As in; do we handle an insert in any different way than for example an UPDATE DML statement?

olavloite · 2021-11-15T09:42:32Z

google/cloud/spanner_dbapi/parse_utils.py

    :rtype: str
    :returns: The query type name.
    """
+    query = sqlparse.format(query, strip_comments=True).strip()


As this is an external library, it might be good to add a comment here that it has been verified that this library supports exactly the types of comments that Cloud Spanner supports, and no other comment types. That is, the library will strip the following types of comments (from what I can see in the source code of the library this holds true):

# single line comment starting with a dash

-- single line comment starting with two hyphens

`/* multi line comment */

And it does not support for example the PostgreSQL style dollar quoted string constants (https://www.postgresql.org/docs/current/sql-syntax-lexical.html) (or if it does, document that the parsing would fail if a SQL string contains a comment inside what would be interpreted as a dollar quoted string)

… statement

IlyaFaer · 2022-02-07T07:51:16Z

@vi3k6i5, we have this a bit stale PR here, I'm going to take a look at it and finish it, 'cause I like these changes. They'll reduce the DB API code complexity.

IlyaFaer · 2022-02-11T08:02:36Z

Closing this PR in favor of: #679, which contains almost the same changes, but without conflicts and with some errors fixed.

product-auto-label bot added the api: spanner Issues related to the googleapis/python-spanner API. label Oct 21, 2021

google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Oct 21, 2021

larkee changed the title ~~perf(db_api): generate one statement for homogeneous inserts~~ perf(db_api): improve performance of insert statements Oct 25, 2021

larkee marked this pull request as ready for review November 3, 2021 03:15

larkee requested a review from a team November 3, 2021 03:15

larkee requested a review from a team as a code owner November 3, 2021 03:15

larkee requested review from IlyaFaer and olavloite November 3, 2021 03:39

olavloite reviewed Nov 3, 2021

View reviewed changes

larkee mentioned this pull request Nov 15, 2021

perf: use batch_update instead of sending a request for each statement #572

Closed

larkee changed the title ~~perf(db_api): improve performance of insert statements~~ refactor(db_api): remove parsing of insert statements Nov 15, 2021

IlyaFaer reviewed Nov 15, 2021

View reviewed changes

olavloite reviewed Nov 15, 2021

View reviewed changes

larkee added 13 commits November 16, 2021 21:59

perf: use batch_update instead of sending a separate request for each…

433ba90

… statement

fix: check status for errors

85927c9

test: add and update tests

007f8b1

refactor: remove redundant mutations code

068289a

perf: generate fewer insert statements for execute calls

f6125c5

perf: generate fewer insert statements for executemany calls

7a7aae2

test: fix executemany insert tests

501a625

fix: handle aborted errors for executemany inserts

e677526

style: fix lint

a377683

refactor: stop adding WHERE clause

02bca0c

refactor: remove unnecessary parsing

307f60c

refactor: revert executemany changes

8d771a1

refactor: revert where clause removal

6d479c0

larkee force-pushed the dbapi-bug-fix branch from b6ef602 to 6d479c0 Compare November 16, 2021 11:00

IlyaFaer mentioned this pull request Feb 10, 2022

refactor: erase SQL statements parsing #679

Merged

IlyaFaer closed this Feb 11, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(db_api): remove parsing of insert statements#630

refactor(db_api): remove parsing of insert statements#630
larkee wants to merge 13 commits intogoogleapis:mainfrom
larkee:dbapi-bug-fix

larkee commented Oct 21, 2021

Uh oh!

olavloite left a comment

Uh oh!

olavloite Nov 3, 2021

Uh oh!

larkee Nov 15, 2021

Uh oh!

larkee commented Nov 15, 2021

Uh oh!

IlyaFaer Nov 15, 2021

Uh oh!

IlyaFaer commented Nov 15, 2021

Uh oh!

olavloite Nov 15, 2021

Uh oh!

olavloite Nov 15, 2021

Uh oh!

olavloite Nov 15, 2021

Uh oh!

IlyaFaer commented Feb 7, 2022

Uh oh!

IlyaFaer commented Feb 11, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		@@ -391,22 +389,13 @@ def run_statement(self, statement, retried=False):
		self._statements.append(statement)

		if statement.is_insert:

Conversation

larkee commented Oct 21, 2021

Uh oh!

olavloite left a comment

Choose a reason for hiding this comment

Uh oh!

olavloite Nov 3, 2021

Choose a reason for hiding this comment

Uh oh!

larkee Nov 15, 2021

Choose a reason for hiding this comment

Uh oh!

larkee commented Nov 15, 2021

Uh oh!

IlyaFaer Nov 15, 2021

Choose a reason for hiding this comment

Uh oh!

IlyaFaer commented Nov 15, 2021

Uh oh!

olavloite Nov 15, 2021

Choose a reason for hiding this comment

Uh oh!

olavloite Nov 15, 2021

Choose a reason for hiding this comment

Uh oh!

olavloite Nov 15, 2021

Choose a reason for hiding this comment

Uh oh!

IlyaFaer commented Feb 7, 2022

Uh oh!

IlyaFaer commented Feb 11, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants