Skip to content

Add request.form() for multipart form data and file uploads#2626

Merged
simonw merged 12 commits intomainfrom
claude/add-multipart-upload-80IgV
Jan 29, 2026
Merged

Add request.form() for multipart form data and file uploads#2626
simonw merged 12 commits intomainfrom
claude/add-multipart-upload-80IgV

Conversation

@simonw
Copy link
Owner

@simonw simonw commented Jan 22, 2026

New Request.form() method that handles both application/x-www-form-urlencoded
and multipart/form-data content types with streaming parsing.

Features:

  • Streaming multipart parser that doesn't buffer entire body in memory
  • Files spill to disk above 1MB threshold via SpooledTemporaryFile
  • files=False (default) discards file content, files=True stores them
  • Security limits: max_request_size, max_file_size, max_fields, max_files
  • FormData container with dict-like access and getlist() for multiple values
  • UploadedFile class with async read(), seek(), filename, content_type, size
  • Support for RFC 5987 filename* encoding for international filenames

Uses multipart-form-data-conformance test suite for validation.


📚 Documentation preview 📚: https://datasette--2626.org.readthedocs.build/en/2626/

New Request.form() method that handles both application/x-www-form-urlencoded
and multipart/form-data content types with streaming parsing.

Features:
- Streaming multipart parser that doesn't buffer entire body in memory
- Files spill to disk above 1MB threshold via SpooledTemporaryFile
- files=False (default) discards file content, files=True stores them
- Security limits: max_request_size, max_file_size, max_fields, max_files
- FormData container with dict-like access and getlist() for multiple values
- UploadedFile class with async read(), seek(), filename, content_type, size
- Support for RFC 5987 filename* encoding for international filenames

Uses multipart-form-data-conformance test suite for validation.
- Migrate PermissionsDebugView, MessagesDebugView, and CreateTokenView
  from post_vars() to form()
- Add documentation for request.form(), FormData, and UploadedFile classes
@simonw
Copy link
Owner Author

simonw commented Jan 26, 2026

I just tried this out in a branch of datasette-files and it worked well:

@simonw
Copy link
Owner Author

simonw commented Jan 26, 2026

I had GPT-5.2 Thinking do a code review and got some very useful feedback: https://chatgpt.com/share/6976e7c9-2914-8006-8e82-2d4816b3ced1

The summary:

12) A prioritized checklist of changes I’d make first

  1. Event loop safety: stop doing blocking disk writes on the loop thread
    • easiest: await asyncio.to_thread(parser.feed, chunk) in parse_form_data()
  2. Make UploadedFile methods either sync or thread-offloaded async (don’t pretend).
  3. Switch buffering to bytearray (both parser buffer and field bodies) to avoid quadratic behavior.
  4. Enforce max_files and max_file_size even when files=False (DoS protection).
  5. Add header limits (bytes + line count) per part.
  6. In finalize(), error if multipart didn’t reach the closing boundary.
  7. Validate ASGI message types (http.request, http.disconnect).

simonw and others added 6 commits January 28, 2026 09:50
…eption (#2627)

* Fix test isolation bug in test_startup_error_from_plugin_is_click_exception

The test creates a plugin that raises StartupError("boom") and registers it
in the global plugin manager (pm). Without cleanup, this plugin leaks to
subsequent tests, causing test_setting_boolean_validation_false_values to
fail with "Error: boom" instead of "Forbidden".

Add try/finally block to ensure the plugin is unregistered after the test
completes, following the established cleanup pattern used elsewhere in
the test suite.

* Fix blacken-docs formatting in plugin_hooks.rst

Apply blacken-docs formatting to code example that exceeded
the 60 character line limit.

---------

Co-authored-by: Claude <noreply@anthropic.com>
Centralize multipart defaults and expose stricter limits via Request.form().

Enforce header, part, file, and disk space limits even when files are discarded; detect truncated bodies and client disconnects; and move blocking work off the event loop.

Add FormData close/aclose context managers, update internals docs, and expand multipart tests (including len semantics and stricter conformance expectations).
@codecov
Copy link

codecov bot commented Jan 28, 2026

Codecov Report

❌ Patch coverage is 93.15673% with 31 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@ffadb5f). Learn more about missing BASE report.
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
datasette/utils/multipart.py 92.79% 31 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2626   +/-   ##
=======================================
  Coverage        ?   90.37%           
=======================================
  Files           ?       52           
  Lines           ?     7983           
  Branches        ?        0           
=======================================
  Hits            ?     7215           
  Misses          ?      768           
  Partials        ?        0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@simonw
Copy link
Owner Author

simonw commented Jan 28, 2026

The failing test only affects SQLite 3.25.3 and only for Python 3.10 and 3.12.

_______________ test_upsert[True-initial0-input0-expected_rows0] _______________
[gw1] linux -- Python 3.10.19 /opt/hostedtoolcache/Python/3.10.19/x64/bin/python

ds_write = <datasette.app.Datasette object at 0x7ff30b42a5c0>
initial = {'pk': 'id', 'rows': [{'id': 1, 'title': 'One'}], 'table': 'upsert_test'}
input = {'return': True, 'rows': [{'id': 1, 'title': 'Two'}]}
expected_rows = [{'id': 1, 'title': 'Two'}], should_return = True

    @pytest.mark.asyncio
    @pytest.mark.parametrize(
        "initial,input,expected_rows",
        (
            (
                # Simple primary key update
                {"rows": [{"id": 1, "title": "One"}], "pk": "id"},
                {"rows": [{"id": 1, "title": "Two"}]},
                [
                    {"id": 1, "title": "Two"},
                ],
            ),
            (
                # Multiple rows update one of them
                {
                    "rows": [{"id": 1, "title": "One"}, {"id": 2, "title": "Two"}],
                    "pk": "id",
                },
                {"rows": [{"id": 1, "title": "Three"}]},
                [
                    {"id": 1, "title": "Three"},
                    {"id": 2, "title": "Two"},
                ],
            ),
            (
                # rowid update
                {"rows": [{"title": "One"}]},
                {"rows": [{"rowid": 1, "title": "Two"}]},
                [
                    {"rowid": 1, "title": "Two"},
                ],
            ),
            (
                # Compound primary key update
                {"rows": [{"id": 1, "title": "One", "score": 1}], "pks": ["id", "score"]},
                {"rows": [{"id": 1, "title": "Two", "score": 1}]},
                [
                    {"id": 1, "title": "Two", "score": 1},
                ],
            ),
            (
                # Upsert with an alter
                {"rows": [{"id": 1, "title": "One"}], "pk": "id"},
                {"rows": [{"id": 1, "title": "Two", "extra": "extra"}], "alter": True},
                [{"id": 1, "title": "Two", "extra": "extra"}],
            ),
        ),
    )
    @pytest.mark.parametrize("should_return", (False, True))
    async def test_upsert(ds_write, initial, input, expected_rows, should_return):
        token = write_token(ds_write)
        # Insert initial data
        initial["table"] = "upsert_test"
        create_response = await ds_write.client.post(
            "/data/-/create",
            json=initial,
            headers=_headers(token),
        )
        assert create_response.status_code == 201
        if should_return:
            input["return"] = True
        response = await ds_write.client.post(
            "/data/upsert_test/-/upsert",
            json=input,
            headers=_headers(token),
        )
>       assert response.status_code == 200, response.text
E       AssertionError: {"ok": false, "errors": ["Row 0 is missing primary key column(s): \"rowid\""]}
E       assert 400 == 200
E        +  where 400 = <Response [400 Bad Request]>.status_code

@simonw
Copy link
Owner Author

simonw commented Jan 28, 2026

Retrying in CI caused it to pass, so this is a flaky test.

Comment on lines +729 to +733
async def flush_batch() -> None:
if batch:
data = bytes(batch)
batch.clear()
await asyncio.to_thread(parser.feed, data)
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the bit that runs the parsing code in a thread (the MultipartParser class uses blocking methods throughout, but gets run in a thread here.)

@simonw
Copy link
Owner Author

simonw commented Jan 29, 2026

Fingers crossed that fixed the flaky tests! Gonna land this now and ship it in an alpha.

@simonw simonw merged commit 40a3730 into main Jan 29, 2026
39 checks passed
@simonw simonw deleted the claude/add-multipart-upload-80IgV branch January 29, 2026 02:41
@simonw
Copy link
Owner Author

simonw commented Jan 29, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants