OpenRosa attachment update auditing by brontolosone · Pull Request #1646 · getodk/central-backend

brontolosone · 2025-10-24T19:12:23Z

In OpenRosa one can make a submission with 0 or more of its attachments attached in a single POST. In subsequent POSTs one can add the remaining attachments.

In contrast to what the API documentation states, the attachment additions borne from the first submission were actually auditlogged. But, subsequent POSTs' attachment updates were not auditlogged.

This PR makes it so that subsequent OR attachment set additions are also logged.

Additionally, it avoids a bit of needless DB churn that was taking place when binary content was already present.

What has been done to verify that this works as intended?

Tests. The first commit is a test that fails, the later commits prevent the failure.

Why is this the best possible solution? Were any other approaches considered?

The least labour intensive alternative would be to just correct the docs to reflect the current behaviour, rather than amending the behaviour. But it just so happens that I need to have attachment update audit events from OpenRosa subsequent-submissions, and it's generally a good thing that those are logged just the same as attachment additions taking place through the REST API.

How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?

I think everyone will be happy.

Does this change require updates to the API documentation? If so, please update docs/api.yaml as part of this PR.

Done.

Before submitting this PR, please make sure you have:

run make test and confirmed all checks still pass OR confirm CircleCI build passes
verified that any code from external sources are properly credited in comments or that everything is internally sourced

…tlogged

- Make inserting a blob which already exists a true no-op (saves DB churn; the sha update of the replaced approach caused a row copy) - Remove comment related to force-a-write-so-we-can-return-the-id (as we do it differently now) - Remove comment about avoiding a TOCTOU issue by sending the file over (potentially unnecessarily). 1. There was almost a TOCTOU issue anyway, but it was elided by touching every used row, which makes concurrent deleters/updaters of that row block until our transaction commit. 2. Avoiding sending the file over *and* avoiding a TOCTOU should be doable with a row lock or advisory lock.

…sequent POSTs (continuations of attachment uploads). Fixes getodk/central#1426

lib/model/query/blobs.js

matthew-white · 2025-12-02T20:39:58Z

Just checking in about this PR with an eye toward regression testing coming up. @ktuite, do you think you'll have time to review this? No worries if not, we could also find a different reviewer.

ktuite · 2025-12-02T23:06:18Z

✨ Report ✨

I wanted to see this in action on the frontend, partly to understand the problem, and partly to check that these events didn't show up in an unexpected way on frontend.

I'm not sure how to get Collect to miss attachments on purpose, so I tried a more direct approach of using python to send requests to the openrosa endpoint. (I first tried pyodk but it doesn't use openrosa.) I verified that I could trigger the "Missing attachment" state, and then get it to go away again but resending the submission with the other attachment.

Note: I haven't thought about submission attachments in many years, but I was surprised to see "Missing attachment" and both attachment checkmarks looking green and happy. Though the missing one has a 404 when you click on it. It's probably always been that way, where there are separate mechanisms for knowing if the submission intends to have an attachment vs. if the blob is actually present.

I verified that in the master branch, the audit logs look like this:

submission.create
submission.attachment.update for one provided attachment
no additional event after updating submission

I verified that in this new PR, more submission.attachment.update events are logged. I see the test of this, too.

Logs

submission.create
submission.attachment.update (original event) - name: cat1.png, newBlobId: 7138
[ after resubmitting with both attachments]
submission.attachment.update (existing attachment) - name: cat1.png, newBlobId: None
submission.attachment.update (updated attachment) - name: cat2.png, newBlobId: 7133

Note: I'll add a specific comment about how in the test, it sends the submission with one attachment and then the other, but the first thing I tried was submission with one attachment and then submission with both attachments, which added an extra and somewhat odd event for reexamining the first attachment.

My original goal was to see what the submission detail page looked like with respect to this new logging.

The answer is that there's nothing about these events on the submission detail page.

But it's not because we're explicitly suppressing these submission.attachment.update events... it's just that none of them are logged with details->submissionId, which is what is used to fetch events for a specific submission.

Sample audit events:

[
 {'acteeId': '2a73370c-abb5-410d-aa20-b8f24b269b77',
  'action': 'submission.attachment.update',
  'actorId': 5,
  'details': {'instanceId': 'uuid:ad59876f-885c-406e-9f69-d286d471414c',
              'name': 'cat1.png',
              'newBlobId': 7138,
              'submissionDefId': 74110},
  'loggedAt': '2025-12-02T21:56:58.596Z',
  'notes': None},
 {'acteeId': '2a73370c-abb5-410d-aa20-b8f24b269b77',
  'action': 'submission.create',
  'actorId': 5,
  'details': {'instanceId': 'uuid:ad59876f-885c-406e-9f69-d286d471414c',
              'submissionDefId': 74110,
              'submissionId': 74104},
  'loggedAt': '2025-12-02T21:56:58.579Z',
  'notes': None},
]

I don't know if this is on purpose (not showing the update event) because it would be show something for every attachment on a submission even when it was first created. It's probably not intentional that we are filtering it out by not logging the submissionId in the event.

Hypothetically, if we were to include the submissionId in these events but changed nothing else, they would automatically show up like this:

If we wanted to do this, we would need to go through and find every place this attachment update event is logged (I found some by searching the code for newBlobId). Then we might want to suppress these events in a different way.

Conclusion

This PR isn't going to make anything weird or unexpected show up on the frontend!

The new logging makes sense and is consistent with how the attachment events are logged elsewhere in the code.

The only issue I can see from poking at it is with the logging when the same attachment is sent in again.

ktuite · 2025-12-02T23:18:48Z

test/integration/api/audits.js

+        )
+      ));
+    });
+


This test currently fails because it has 3 events rather than 2, and in the extra event, blobNow is null.

it('should handle resubmitting partial submission with all attachments', testService((service) => service.login('alice', (asAlice) => asAlice.post('/v1/projects/1/forms?publish=true') .set('Content-Type', 'application/xml') .send(testData.forms.binaryType) .expect(200) .then(() => asAlice.post('/v1/projects/1/submission') .set('X-OpenRosa-Version', '1.0') .attach('xml_submission_file', Buffer.from(testData.instances.binaryType.both), { filename: 'data.xml' }) .attach('my_file1.mp4', Buffer.from('this is test file one'), { filename: 'my_file1.mp4' }) .expect(201) .then(() => asAlice.get('/v1/audits?action=submission.attachment.update') .expect(200)) .then(({ body }) => { body[0].details.name.should.equal('my_file1.mp4'); }) ) .then(() => asAlice.post('/v1/projects/1/submission') .set('X-OpenRosa-Version', '1.0') .attach('xml_submission_file', Buffer.from(testData.instances.binaryType.both), { filename: 'data.xml' }) .attach('my_file1.mp4', Buffer.from('this is test file one'), { filename: 'my_file1.mp4' }) .attach('here_is_file2.jpg', Buffer.from('this is test file two'), { filename: 'here_is_file2.jpg' }) .expect(201) .then(() => asAlice.get('/v1/audits?action=submission.attachment.update') .expect(200)) .then(({ body }) => { body.length.should.equal(2); body[0].details.name.should.equal('here_is_file2.jpg'); body[1].details.name.should.equal('my_file1.mp4'); }) ) ) ));

After poking around, was inspired to write another test:

it.only('should handle resubmitting partial submission with contents of attachment changed', testService((service) => service.login('alice', (asAlice) => asAlice.post('/v1/projects/1/forms?publish=true') .set('Content-Type', 'application/xml') .send(testData.forms.binaryType) .expect(200) .then(() => asAlice.post('/v1/projects/1/submission') .set('X-OpenRosa-Version', '1.0') .attach('xml_submission_file', Buffer.from(testData.instances.binaryType.both), { filename: 'data.xml' }) .attach('my_file1.mp4', Buffer.from('this is test file one'), { filename: 'my_file1.mp4' }) .expect(201) .then(() => asAlice.get('/v1/audits?action=submission.attachment.update') .expect(200)) .then(({ body }) => { body[0].details.name.should.equal('my_file1.mp4'); }) ) .then(() => asAlice.post('/v1/projects/1/submission') .set('X-OpenRosa-Version', '1.0') .attach('xml_submission_file', Buffer.from(testData.instances.binaryType.both), { filename: 'data.xml' }) .attach('my_file1.mp4', Buffer.from('file one contents have changed'), { filename: 'my_file1.mp4' }) .attach('here_is_file2.jpg', Buffer.from('this is test file two'), { filename: 'here_is_file2.jpg' }) .expect(201) .then(() => asAlice.get('/v1/audits?action=submission.attachment.update') .expect(200)) .then(({ body }) => { body.length.should.equal(3); body[0].details.name.should.equal('here_is_file2.jpg'); body[1].details.name.should.equal('my_file1.mp4'); body[2].details.name.should.equal('my_file1.mp4'); body[1].details.newBlobId.should.not.equal(body[2].details.newBlobId); }) ) ) ));

ktuite · 2025-12-02T23:20:40Z

lib/model/query/submission-attachments.js

+      ).then(linkResults =>
+        // auditlog the updates (only those which were added or of which the content changed)
+        linkResults
+          .filter(el => el.blobBefore !== el.blobNow)


If you keep the query in attach() the same, then blobs that you tried to attach but were already set will have a null blobNow, which you could then skip in the logging. Or you could tweak the query.

Suggested change

.filter(el => el.blobBefore !== el.blobNow)

.filter(el => el.blobNow && el.blobBefore !== el.blobNow)

matthew-white · 2025-12-03T05:19:17Z

Very nice, thank you for taking a look! ✨

I was surprised to see "Missing attachment" and both attachment checkmarks looking green and happy. Though the missing one has a 404 when you click on it. It's probably always been that way, where there are separate mechanisms for knowing if the submission intends to have an attachment vs. if the blob is actually present.

I also think it's always been that way, but it's definitely a bit funny. There's a related issue filed at getodk/central#706.

brontolosone added 3 commits October 24, 2025 19:19

add failing test: OpenRosa attachment addition submission is not audi…

c243e63

…tlogged

OpenRosa submissions: auditlog updates to attachment set, also on sub…

b240423

…sequent POSTs (continuations of attachment uploads). Fixes getodk/central#1426

brontolosone force-pushed the 1436_openrosa_attachment_update_auditing branch from b9cf49b to b240423 Compare October 24, 2025 19:19

brontolosone requested a review from alxndrsn October 24, 2025 19:46

brontolosone marked this pull request as ready for review October 24, 2025 19:46

alxndrsn reviewed Oct 25, 2025

View reviewed changes

lib/model/query/blobs.js Show resolved Hide resolved

ktuite self-requested a review October 30, 2025 15:26

ktuite approved these changes Dec 2, 2025

View reviewed changes

Handle resubmitting or altering attachment

ac01a9e

ktuite merged commit e819ca0 into getodk:master Dec 6, 2025
6 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

OpenRosa attachment update auditing#1646

OpenRosa attachment update auditing#1646
ktuite merged 4 commits intogetodk:masterfrom
brontolosone:1436_openrosa_attachment_update_auditing

brontolosone commented Oct 24, 2025 •

edited

Loading

Uh oh!

Uh oh!

matthew-white commented Dec 2, 2025

Uh oh!

ktuite commented Dec 2, 2025

Uh oh!

ktuite Dec 2, 2025

Uh oh!

ktuite Dec 2, 2025

Uh oh!

matthew-white commented Dec 3, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

	.filter(el => el.blobBefore !== el.blobNow)
	.filter(el => el.blobNow && el.blobBefore !== el.blobNow)

Conversation

brontolosone commented Oct 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What has been done to verify that this works as intended?

Why is this the best possible solution? Were any other approaches considered?

How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?

Does this change require updates to the API documentation? If so, please update docs/api.yaml as part of this PR.

Before submitting this PR, please make sure you have:

Uh oh!

Uh oh!

matthew-white commented Dec 2, 2025

Uh oh!

ktuite commented Dec 2, 2025

✨ Report ✨

My original goal was to see what the submission detail page looked like with respect to this new logging.

Conclusion

Uh oh!

ktuite Dec 2, 2025

Choose a reason for hiding this comment

Uh oh!

ktuite Dec 2, 2025

Choose a reason for hiding this comment

Uh oh!

matthew-white commented Dec 3, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

brontolosone commented Oct 24, 2025 •

edited

Loading