model/query/submissions: avoid RETURNING * where possible#1544
model/query/submissions: avoid RETURNING * where possible#1544alxndrsn wants to merge 8 commits intogetodk:masterfrom
Conversation
`INSERT...RETURNING *` is a very convenient SQL structure, but it's often used to return values which are already available to the calling code. When this happens, there is unnecessary overhead to database communications. There is risk to introducing this change, as: 1. new columns may be added to an INSERT statement without noticing that they might also be exluded from the generated list of RETURNING columns 2. a BEFORE INSERT trigger might change supplied values when returned 3. supplied value might be coerced into a different type when returned
| const _defInsert = (id, partial, formDefId, actorId, root, deviceId, userAgent) => sql`insert into submission_defs ("submissionId", xml, "formDefId", "instanceId", "instanceName", "submitterId", "localKey", "encDataAttachmentName", "signature", "createdAt", root, current, "deviceId", "userAgent") | ||
| values (${id}, ${sql.binary(partial.xml)}, ${formDefId}, ${partial.instanceId}, ${partial.def.instanceName}, ${actorId}, ${partial.def.localKey}, ${partial.def.encDataAttachmentName}, ${partial.def.signature}, clock_timestamp(), ${root}, true, ${deviceId}, ${userAgent}) | ||
| returning *`; | ||
| returning ${_defInsertReturnFields}`; |
There was a problem hiding this comment.
how about we just return submissionId and clock_timestamp - only values that are generated by database.
Generally database only generates surrogate keys using pg_get_serial_sequence and timestamp through the application
There was a problem hiding this comment.
how about we just return submissionId and clock_timestamp
That's what's happening, just in a less direct way.
Do you think the code would be simpler if we declared fields we do want returned rather than fields we don't? It would certainly make this code shorter. Maybe a neater approach would be to mark fields as autogenerated when declaring Frames? E.g.
https://github.com/getodk/central-backend/blob/master/lib/model/frames/submission.js#L89-L100
Could become:
table('submission_defs', 'def'),
'id', autogenerated, 'submissionId',
'formDefId', 'submitterId', readable,
'localKey', 'encDataAttachmentName',
'signature', 'createdAt', readable, autogenerated,There was a problem hiding this comment.
I love the idea of autogenerated attribute in the Frame.
That's true, but I was hoping to find it measurably true as well, so I set out to measure a performance difference on the submission path. |
I'd rather there's a demonstrable benefit, as it complicates the code. Maybe we should discuss scale. |
|
It should be demonstrable quite easily if application and database are running on different machines as it is for all our cloud servers. |
|
@brontolosone Can you test on dev to see if there was a demonstrable benefit? |
|
On the benchmarks I've shown you a couple of weeks ago in the pure-sql submission insertion path discussion, this branch showed reduced peak memory usage (roughly halved) inside PostgreSQL compared to mainline, but no throughput speedups. So there's a demonstrable benefit albeit in a benchmark which wasn't particularly designed to measure the impact of this exact thing, and with an N of only 2. We could wait for more & better benchmarks but... the benchmark results make logical sense (smaller resultsets to be kept in buffers in PostgreSQL -> lower memory usage) and it's generally a good idea not to do useless work and it's not a very invasive change, so... I'd be happy to approve it? But it's drafted currently. |
|
@alxndrsn Let's get this ready for review and merge. Less memory usage is a good win. |
👍 I think ideally there would be a less-manual notation required for this - see #1544 (comment) |
|
Closed for #1606; re-adding reviewers there. |
INSERT...RETURNING *is a very convenient SQL structure, but it's often used to return values which are already available to the calling code. When this happens, there is unnecessary overhead to database communications.There is risk to introducing this change, as:
INSERT/UPDATEstatement without noticing that they might also be excluded from the generated list ofRETURNINGcolumnsBEFORE INSERTtrigger might change supplied values when returnedRelated: getodk/central#1170
What has been done to verify that this works as intended?
Why is this the best possible solution? Were any other approaches considered?
How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?
Does this change require updates to the API documentation? If so, please update docs/api.yaml as part of this PR.
Before submitting this PR, please make sure you have:
make testand confirmed all checks still pass OR confirm CircleCI build passes