
Apply some performance optimizations based on profiling the table view #2629

@simonw

Description

I had Claude Code for web run a profiler and got some useful tips on speeding up the table view (and other parts of Datasette too): https://gisthost.github.io/?bfa82bb7a4ba7c10bdb51e5406395362/index.html

The most interesting optimizations it found:

Executive Summary

After comprehensive benchmarking of the Datasette table JSON endpoint with 100,000 rows, here are the recommended optimizations, ranked by impact:

| Rank | Optimization | Improvement | Priority | Effort |
| --- | --- | --- | --- | --- |
| 1 | `refresh_schemas()` 1s throttle | 17% | HIGH | Low |
| 2 | Skip `remove_infinites()` for non-float tables | 15% | HIGH | Medium |
| 3 | orjson for JSON serialization | 15% (1,000 rows) | MEDIUM | Low |
| 4 | Thread pool warmup | 36% (first request) | LOW | Low |

Combined Impact

Applying optimizations #1, #2, and #3 together could yield approximately a 35-45% improvement for typical JSON API requests.
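As a rough sanity check on that range, assuming the three measured savings compose multiplicatively (they likely overlap somewhat in practice, so this is an upper-bound estimate):

```python
# Compose the three measured savings from the table above multiplicatively.
savings = [0.17, 0.15, 0.15]  # optimizations 1-3
remaining = 1.0
for s in savings:
    remaining *= 1 - s
combined = 1 - remaining
print(f"{combined:.0%}")  # roughly 40%, inside the 35-45% estimate
```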


Detailed Results

1. refresh_schemas() Throttling ✅ HIGH PRIORITY

Problem: refresh_schemas() queries the internal database on every request to check for schema changes.

Solution: Add 1-second cooldown between schema checks.

| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| `refresh_schemas()` call | 0.39 ms | 0.0004 ms | 979x faster |
| JSON endpoint | 6.91 ms | 5.74 ms | 17% faster |

Implementation:

```python
async def refresh_schemas(self):
    # Throttle: skip the refresh if one ran within the last second.
    # Assumes self._last_schema_refresh is initialized to 0 and that
    # the time module is imported.
    if time.time() - self._last_schema_refresh < 1.0:
        return
    # ... existing schema refresh logic ...
    self._last_schema_refresh = time.time()
```

2. remove_infinites() Skip ✅ HIGH PRIORITY

Problem: remove_infinites() iterates every cell in every row looking for float('inf'), even when the table has no float columns.

Solution: Check column types at schema load, skip for tables without REAL/FLOAT columns.

| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| 1,000 rows (isolated) | 0.60 ms | 0.0002 ms | 3,178x faster |
| JSON endpoint (1,000 rows) | 11.87 ms | 10.08 ms | 15% faster |

Implementation: Cache float column information per table during schema refresh.
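A minimal sketch of that skip, assuming a helper that inspects the declared SQLite column types (the function names and the shape of `column_types` are hypothetical; Datasette's real schema caching works differently):

```python
import math

FLOAT_TYPES = {"REAL", "FLOAT", "DOUBLE"}


def table_has_float_columns(column_types):
    # column_types: declared SQLite types for the table's columns,
    # cached once per table during schema refresh.
    return any((t or "").upper() in FLOAT_TYPES for t in column_types)


def remove_infinites(rows, has_floats):
    # Only scan cells when the table can actually contain float values.
    if not has_floats:
        return rows
    return [
        [None if isinstance(c, float) and not math.isfinite(c) else c for c in row]
        for row in rows
    ]
```

The cached flag turns the common case (tables with no float columns) into a single boolean check instead of a full scan of every cell.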


3. orjson JSON Serialization ✅ MEDIUM PRIORITY

Problem: stdlib json.dumps() is slow for large result sets.

Solution: Use orjson as optional dependency with graceful fallback.

| Rows | json.dumps | orjson | Speedup |
| --- | --- | --- | --- |
| 100 | 0.199 ms | 0.027 ms | 7.35x |
| 1,000 | 1.902 ms | 0.267 ms | 7.12x |
| 5,000 | 12.716 ms | 1.712 ms | 7.43x |

Endpoint impact: 5-15% improvement depending on result size.

Implementation:

```python
import json

try:
    import orjson

    def JSON_DUMPS(data, **kw):
        # orjson serializes to bytes; decode so callers still get str.
        # Keyword options accepted by json.dumps() are ignored here.
        return orjson.dumps(data).decode("utf-8")
except ImportError:
    JSON_DUMPS = json.dumps
```

4. Thread Pool Warmup ⚪ LOW PRIORITY

Problem: First request is slow due to cold thread pool and connection establishment.

Finding:

  • First request: 24.6 ms (cold) → 6.5 ms (warm)
  • Steady state: 6.3 ms regardless
  • Thread pool overhead: 0.08 ms per query (acceptable)

Current state: Already well-optimized with connection caching (4.5x faster than no caching).

Optional: Add startup warmup for production deployments.
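A sketch of what such a warmup could look like, using a plain `ThreadPoolExecutor` and SQLite standing in for Datasette's internals (all names here are hypothetical, not Datasette APIs):

```python
import concurrent.futures
import sqlite3


def warm_up_pool(executor, num_workers, db_path=":memory:"):
    # Submit one trivial query per worker so threads (and any per-thread
    # SQLite connections) exist before the first real request arrives.
    def touch():
        conn = sqlite3.connect(db_path)
        try:
            return conn.execute("select 1").fetchone()[0]
        finally:
            conn.close()

    futures = [executor.submit(touch) for _ in range(num_workers)]
    # Block until every submitted warmup query has completed.
    return [f.result() for f in futures]


with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    warm_up_pool(pool, num_workers=4)
```

Running this once at startup would move the ~18 ms cold-start cost measured above out of the first user-facing request.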
