
Apply some performance optimizations based on profiling the table view #2629

@simonw

Description

I had Claude Code for web run a profiler and got some useful tips on speeding up the table view (and other parts of Datasette too): https://gisthost.github.io/?bfa82bb7a4ba7c10bdb51e5406395362/index.html

The most interesting optimizations it found:

Executive Summary

After comprehensive benchmarking of the Datasette table JSON endpoint with 100,000 rows, here are the recommended optimizations, ranked by impact:

| Rank | Optimization | Improvement | Priority | Effort |
| --- | --- | --- | --- | --- |
| 1 | `refresh_schemas()` 1s throttle | 17% | HIGH | Low |
| 2 | Skip `remove_infinites()` for non-float tables | 15% | HIGH | Medium |
| 3 | orjson for JSON serialization | 15% (1,000 rows) | MEDIUM | Low |
| 4 | Thread pool warmup | 36% (first request) | LOW | Low |

Combined Impact

Applying optimizations #1, #2, and #3 together could yield approximately a 35-45% improvement for typical JSON API requests.
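As a rough sanity check on that range, assuming the three measured savings compose multiplicatively (they likely overlap somewhat in practice, so this is an upper-bound estimate):

```python
# Compose the three measured savings from the table above multiplicatively.
savings = [0.17, 0.15, 0.15]  # optimizations 1-3
remaining = 1.0
for s in savings:
    remaining *= 1 - s
combined = 1 - remaining
print(f"{combined:.0%}")  # roughly 40%, inside the 35-45% estimate
```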


Detailed Results

1. refresh_schemas() Throttling ✅ HIGH PRIORITY

Problem: refresh_schemas() queries the internal database on every request to check for schema changes.

Solution: Add 1-second cooldown between schema checks.

| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| `refresh_schemas()` call | 0.39 ms | 0.0004 ms | 979x faster |
| JSON endpoint | 6.91 ms | 5.74 ms | 17% faster |

Implementation:

```python
async def refresh_schemas(self):
    # Throttle: skip the refresh if one ran within the last second.
    # Assumes self._last_schema_refresh is initialized to 0 and that
    # the time module is imported.
    if time.time() - self._last_schema_refresh < 1.0:
        return
    # ... existing schema refresh logic ...
    self._last_schema_refresh = time.time()
```

2. remove_infinites() Skip ✅ HIGH PRIORITY

Problem: remove_infinites() iterates every cell in every row looking for float('inf'), even when the table has no float columns.

Solution: Check column types at schema load, skip for tables without REAL/FLOAT columns.

| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| 1,000 rows (isolated) | 0.60 ms | 0.0002 ms | 3,178x faster |
| JSON endpoint (1,000 rows) | 11.87 ms | 10.08 ms | 15% faster |

Implementation: Cache float column information per table during schema refresh.
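A minimal sketch of that skip, assuming a helper that inspects the declared SQLite column types (the function names and the shape of `column_types` are hypothetical; Datasette's real schema caching works differently):

```python
import math

FLOAT_TYPES = {"REAL", "FLOAT", "DOUBLE"}


def table_has_float_columns(column_types):
    # column_types: declared SQLite types for the table's columns,
    # cached once per table during schema refresh.
    return any((t or "").upper() in FLOAT_TYPES for t in column_types)


def remove_infinites(rows, has_floats):
    # Only scan cells when the table can actually contain float values.
    if not has_floats:
        return rows
    return [
        [None if isinstance(c, float) and not math.isfinite(c) else c for c in row]
        for row in rows
    ]
```

The cached flag turns the common case (tables with no float columns) into a single boolean check instead of a full scan of every cell.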


3. orjson JSON Serialization ✅ MEDIUM PRIORITY

Problem: stdlib json.dumps() is slow for large result sets.

Solution: Use orjson as optional dependency with graceful fallback.

| Rows | json.dumps | orjson | Speedup |
| --- | --- | --- | --- |
| 100 | 0.199 ms | 0.027 ms | 7.35x |
| 1,000 | 1.902 ms | 0.267 ms | 7.12x |
| 5,000 | 12.716 ms | 1.712 ms | 7.43x |

Endpoint impact: 5-15% improvement depending on result size.

Implementation:

```python
import json

try:
    import orjson

    def JSON_DUMPS(data, **kw):
        # orjson serializes to bytes; decode so callers still get str.
        # Keyword options accepted by json.dumps() are ignored here.
        return orjson.dumps(data).decode("utf-8")
except ImportError:
    JSON_DUMPS = json.dumps
```

4. Thread Pool Warmup ⚪ LOW PRIORITY

Problem: First request is slow due to cold thread pool and connection establishment.

Finding:

  • First request: 24.6 ms (cold) → 6.5 ms (warm)
  • Steady state: 6.3 ms regardless
  • Thread pool overhead: 0.08 ms per query (acceptable)

Current state: Already well-optimized with connection caching (4.5x faster than no caching).

Optional: Add startup warmup for production deployments.
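A sketch of what such a warmup could look like, using a plain `ThreadPoolExecutor` and SQLite standing in for Datasette's internals (all names here are hypothetical, not Datasette APIs):

```python
import concurrent.futures
import sqlite3


def warm_up_pool(executor, num_workers, db_path=":memory:"):
    # Submit one trivial query per worker so threads (and any per-thread
    # SQLite connections) exist before the first real request arrives.
    def touch():
        conn = sqlite3.connect(db_path)
        try:
            return conn.execute("select 1").fetchone()[0]
        finally:
            conn.close()

    futures = [executor.submit(touch) for _ in range(num_workers)]
    # Block until every submitted warmup query has completed.
    return [f.result() for f in futures]


with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    warm_up_pool(pool, num_workers=4)
```

Running this once at startup would move the ~18 ms cold-start cost measured above out of the first user-facing request.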
