Skip to content

OpenRosa attachment update auditing#1646

Merged
ktuite merged 4 commits intogetodk:masterfrom
brontolosone:1436_openrosa_attachment_update_auditing
Dec 6, 2025
Merged

OpenRosa attachment update auditing#1646
ktuite merged 4 commits intogetodk:masterfrom
brontolosone:1436_openrosa_attachment_update_auditing

Conversation

@brontolosone
Copy link
Contributor

@brontolosone brontolosone commented Oct 24, 2025

Closes getodk/central#1436

In OpenRosa one can make a submission with 0 or more of its attachments attached in a single POST. In subsequent POSTs one can add the remaining attachments.

In contrast to what the API documentation states, the attachment additions borne from the first submission were actually auditlogged. But, subsequent POSTs' attachment updates were not auditlogged.

This PR makes it so that subsequent OR attachment set additions are also logged.

Additionally, it avoids a bit of needless DB churn that was taking place when binary content was already present.

What has been done to verify that this works as intended?

Tests. The first commit is a test that fails, the later commits prevent the failure.

Why is this the best possible solution? Were any other approaches considered?

The least labour intensive alternative would be to just correct the docs to reflect the current behaviour, rather than amending the behaviour. But it just so happens that I need to have attachment update audit events from OpenRosa subsequent-submissions, and it's generally a good thing that those are logged just the same as attachment additions taking place through the REST API.

How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?

I think everyone will be happy.

Does this change require updates to the API documentation? If so, please update docs/api.yaml as part of this PR.

Done.

Before submitting this PR, please make sure you have:

  • run make test and confirmed all checks still pass OR confirm CircleCI build passes
  • verified that any code from external sources are properly credited in comments or that everything is internally sourced

- Make inserting a blob which already exists a true no-op
  (saves DB churn; the sha update of the replaced approach caused a row copy)
- Remove comment related to force-a-write-so-we-can-return-the-id
  (as we do it differently now)
- Remove comment about avoiding a TOCTOU issue by sending the file over
  (potentially unnecessarily).
  1. There was almost a TOCTOU issue anyway, but it was elided by
     touching every used row, which makes concurrent deleters/updaters
     of that row block until our transaction commit.
  2. Avoiding sending the file over *and* avoiding a TOCTOU should be
     doable with a row lock or advisory lock.
…sequent POSTs (continuations of attachment uploads). Fixes getodk/central#1426
@brontolosone brontolosone force-pushed the 1436_openrosa_attachment_update_auditing branch from b9cf49b to b240423 Compare October 24, 2025 19:19
@brontolosone brontolosone requested a review from alxndrsn October 24, 2025 19:46
@brontolosone brontolosone marked this pull request as ready for review October 24, 2025 19:46
@ktuite ktuite self-requested a review October 30, 2025 15:26
@matthew-white
Copy link
Member

Just checking in about this PR with an eye toward regression testing coming up. @ktuite, do you think you'll have time to review this? No worries if not, we could also find a different reviewer.

@ktuite
Copy link
Member

ktuite commented Dec 2, 2025

✨ Report ✨

I wanted to see this in action on the frontend, partly to understand the problem, and partly to check that these events didn't show up in an unexpected way on frontend.

I'm not sure how to get Collect to miss attachments on purpose, so I tried a more direct approach of using python to send requests to the openrosa endpoint. (I first tried pyodk but it doesn't use openrosa.) I verified that I could trigger the "Missing attachment" state, and then get it to go away again but resending the submission with the other attachment.

Screenshot 2025-12-02 at 1 29 06 PM

Note: I haven't thought about submission attachments in many years, but I was surprised to see "Missing attachment" and both attachment checkmarks looking green and happy. Though the missing one has a 404 when you click on it. It's probably always been that way, where there are separate mechanisms for knowing if the submission intends to have an attachment vs. if the blob is actually present.

I verified that in the master branch, the audit logs look like this:

  • submission.create
  • submission.attachment.update for one provided attachment
  • no additional event after updating submission

I verified that in this new PR, more submission.attachment.update events are logged. I see the test of this, too.

Logs

  • submission.create
  • submission.attachment.update (original event) - name: cat1.png, newBlobId: 7138
    [ after resubmitting with both attachments]
  • submission.attachment.update (existing attachment) - name: cat1.png, newBlobId: None
  • submission.attachment.update (updated attachment) - name: cat2.png, newBlobId: 7133

Note: I'll add a specific comment about how in the test, it sends the submission with one attachment and then the other, but the first thing I tried was submission with one attachment and then submission with both attachments, which added an extra and somewhat odd event for reexamining the first attachment.

My original goal was to see what the submission detail page looked like with respect to this new logging.

The answer is that there's nothing about these events on the submission detail page.

But it's not because we're explicitly suppressing these submission.attachment.update events... it's just that none of them are logged with details->submissionId, which is what is used to fetch events for a specific submission.

Sample audit events:

[
 {'acteeId': '2a73370c-abb5-410d-aa20-b8f24b269b77',
  'action': 'submission.attachment.update',
  'actorId': 5,
  'details': {'instanceId': 'uuid:ad59876f-885c-406e-9f69-d286d471414c',
              'name': 'cat1.png',
              'newBlobId': 7138,
              'submissionDefId': 74110},
  'loggedAt': '2025-12-02T21:56:58.596Z',
  'notes': None},
 {'acteeId': '2a73370c-abb5-410d-aa20-b8f24b269b77',
  'action': 'submission.create',
  'actorId': 5,
  'details': {'instanceId': 'uuid:ad59876f-885c-406e-9f69-d286d471414c',
              'submissionDefId': 74110,
              'submissionId': 74104},
  'loggedAt': '2025-12-02T21:56:58.579Z',
  'notes': None},
]

I don't know if this is on purpose (not showing the update event) because it would be show something for every attachment on a submission even when it was first created. It's probably not intentional that we are filtering it out by not logging the submissionId in the event.

Hypothetically, if we were to include the submissionId in these events but changed nothing else, they would automatically show up like this:

Screenshot 2025-12-02 at 2 51 14 PM

If we wanted to do this, we would need to go through and find every place this attachment update event is logged (I found some by searching the code for newBlobId). Then we might want to suppress these events in a different way.

Conclusion

This PR isn't going to make anything weird or unexpected show up on the frontend!

The new logging makes sense and is consistent with how the attachment events are logged elsewhere in the code.

The only issue I can see from poking at it is with the logging when the same attachment is sent in again.

)
));
});

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test currently fails because it has 3 events rather than 2, and in the extra event, blobNow is null.

      it('should handle resubmitting partial submission with all attachments', testService((service) =>
        service.login('alice', (asAlice) =>
          asAlice.post('/v1/projects/1/forms?publish=true')
            .set('Content-Type', 'application/xml')
            .send(testData.forms.binaryType)
            .expect(200)
            .then(() => asAlice.post('/v1/projects/1/submission')
              .set('X-OpenRosa-Version', '1.0')
              .attach('xml_submission_file', Buffer.from(testData.instances.binaryType.both), { filename: 'data.xml' })
              .attach('my_file1.mp4', Buffer.from('this is test file one'), { filename: 'my_file1.mp4' })
              .expect(201)
              .then(() => asAlice.get('/v1/audits?action=submission.attachment.update')
                .expect(200))
              .then(({ body }) => {
                body[0].details.name.should.equal('my_file1.mp4');
              })
            )
            .then(() => asAlice.post('/v1/projects/1/submission')
              .set('X-OpenRosa-Version', '1.0')
              .attach('xml_submission_file', Buffer.from(testData.instances.binaryType.both), { filename: 'data.xml' })
              .attach('my_file1.mp4', Buffer.from('this is test file one'), { filename: 'my_file1.mp4' })
              .attach('here_is_file2.jpg', Buffer.from('this is test file two'), { filename: 'here_is_file2.jpg' })
              .expect(201)
              .then(() => asAlice.get('/v1/audits?action=submission.attachment.update')
                .expect(200))
              .then(({ body }) => {
                body.length.should.equal(2);
                body[0].details.name.should.equal('here_is_file2.jpg');
                body[1].details.name.should.equal('my_file1.mp4');
              })
            )
        )
      ));

After poking around, was inspired to write another test:

      it.only('should handle resubmitting partial submission with contents of attachment changed', testService((service) =>
        service.login('alice', (asAlice) =>
          asAlice.post('/v1/projects/1/forms?publish=true')
            .set('Content-Type', 'application/xml')
            .send(testData.forms.binaryType)
            .expect(200)
            .then(() => asAlice.post('/v1/projects/1/submission')
              .set('X-OpenRosa-Version', '1.0')
              .attach('xml_submission_file', Buffer.from(testData.instances.binaryType.both), { filename: 'data.xml' })
              .attach('my_file1.mp4', Buffer.from('this is test file one'), { filename: 'my_file1.mp4' })
              .expect(201)
              .then(() => asAlice.get('/v1/audits?action=submission.attachment.update')
                .expect(200))
              .then(({ body }) => {
                body[0].details.name.should.equal('my_file1.mp4');
              })
            )
            .then(() => asAlice.post('/v1/projects/1/submission')
              .set('X-OpenRosa-Version', '1.0')
              .attach('xml_submission_file', Buffer.from(testData.instances.binaryType.both), { filename: 'data.xml' })
              .attach('my_file1.mp4', Buffer.from('file one contents have changed'), { filename: 'my_file1.mp4' })
              .attach('here_is_file2.jpg', Buffer.from('this is test file two'), { filename: 'here_is_file2.jpg' })
              .expect(201)
              .then(() => asAlice.get('/v1/audits?action=submission.attachment.update')
                .expect(200))
              .then(({ body }) => {
                body.length.should.equal(3);
                body[0].details.name.should.equal('here_is_file2.jpg');
                body[1].details.name.should.equal('my_file1.mp4');
                body[2].details.name.should.equal('my_file1.mp4');
                body[1].details.newBlobId.should.not.equal(body[2].details.newBlobId);
              })
            )
        )
      ));

).then(linkResults =>
// auditlog the updates (only those which were added or of which the content changed)
linkResults
.filter(el => el.blobBefore !== el.blobNow)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you keep the query in attach() the same, then blobs that you tried to attach but were already set will have a null blobNow, which you could then skip in the logging. Or you could tweak the query.

Suggested change
.filter(el => el.blobBefore !== el.blobNow)
.filter(el => el.blobNow && el.blobBefore !== el.blobNow)

@matthew-white
Copy link
Member

Very nice, thank you for taking a look! ✨

I was surprised to see "Missing attachment" and both attachment checkmarks looking green and happy. Though the missing one has a 404 when you click on it. It's probably always been that way, where there are separate mechanisms for knowing if the submission intends to have an attachment vs. if the blob is actually present.

I also think it's always been that way, but it's definitely a bit funny. There's a related issue filed at getodk/central#706.

@ktuite ktuite merged commit e819ca0 into getodk:master Dec 6, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

API documentation incorrect wrt submission.attachment.update audit event

4 participants