Note
This is an AI-generated research report. All text and code in this report was created by an LLM (Large Language Model). For more information on how these reports are created, see the main research repository.
A custom SQLite function for running ripgrep searches directly from SQL queries. Provides both a pure Python implementation and a C extension with table-valued function support.
- Fast code search via ripgrep from within SQL
- Configurable base directory - searches are constrained to a specific directory tree
- Time limit support - prevents runaway searches (inspired by datasette-ripgrep)
- File filtering - use glob patterns like
*.pyor*.js - Multiple arity - 1-3 arguments depending on what you need
- Two implementations:
- Pure Python (scalar function returning JSON)
- C extension (proper table-valued function)
No compilation needed - just import the module:
import sqlite3
from sqlite_ripgrep_python import register_ripgrep_function
conn = sqlite3.connect(':memory:')
register_ripgrep_function(conn, base_directory='/path/to/search')Build the shared library:
# Basic build (default base directory: /tmp)
make
# Build with custom base directory
make RIPGREP_BASE_DIR=/home/user/code
# Build with custom time limit (default: 1.0 seconds)
make RIPGREP_BASE_DIR=/home/user/code RIPGREP_TIME_LIMIT=2.0Load in Python:
import sqlite3
conn = sqlite3.connect(':memory:')
conn.enable_load_extension(True)
conn.load_extension('./sqlite_ripgrep')Or load in SQLite CLI:
.load ./sqlite_ripgrepReturns results as a JSON string:
# Single argument - search pattern only
result = conn.execute("SELECT ripgrep('hello')").fetchone()[0]
# Two arguments - pattern + glob filter
result = conn.execute("SELECT ripgrep('def ', '*.py')").fetchone()[0]
# Three arguments - pattern + glob + time limit
result = conn.execute("SELECT ripgrep('TODO', '*.rs', 5.0)").fetchone()[0]The JSON result structure:
{
"results": [
{
"path": "src/main.py",
"line_number": 42,
"lines": "def hello_world():\n",
"submatches": [{"match": "def ", "start": 0, "end": 4}]
}
],
"count": 1,
"truncated": false,
"time_limit_hit": false,
"error": null
}Use json_each() to expand results as rows:
from sqlite_ripgrep_python import register_ripgrep_with_table_helper
conn = sqlite3.connect(':memory:')
register_ripgrep_with_table_helper(conn, '/path/to/search')
# Query results as table using json_each
rows = conn.execute("""
SELECT
json_extract(value, '$.path') as path,
json_extract(value, '$.line_number') as line_number,
json_extract(value, '$.lines') as line_text
FROM json_each(ripgrep_results('pattern', '*.py'))
""").fetchall()True table-valued function with proper SQL integration:
-- Basic search
SELECT path, line_number, line_text
FROM ripgrep
WHERE pattern = 'hello';
-- With file filter
SELECT * FROM ripgrep
WHERE pattern = 'def \w+' AND glob = '*.py';
-- With time limit
SELECT * FROM ripgrep
WHERE pattern = 'TODO' AND time_limit = 5.0;
-- Override base directory
SELECT * FROM ripgrep
WHERE pattern = 'error' AND base_dir = '/var/log';The C extension creates a virtual table with these columns:
| Column | Type | Description |
|---|---|---|
path |
TEXT | File path (relative to base directory) |
line_number |
INTEGER | Line number of the match |
line_text |
TEXT | Full text of the matching line |
match_text |
TEXT | The actual matched substring |
match_start |
INTEGER | Start offset of match in line |
match_end |
INTEGER | End offset of match in line |
pattern |
TEXT HIDDEN | Search pattern (required constraint) |
glob |
TEXT HIDDEN | File glob filter (optional) |
time_limit |
REAL HIDDEN | Time limit in seconds (optional) |
base_dir |
TEXT HIDDEN | Base directory override (optional) |
Run the test suite:
python3 test_sqlite_ripgrep.pyTests cover:
- Basic search functionality
- File filtering with globs
- Time limit behavior
- Case sensitivity
- Regex patterns
- Edge cases and error handling
- Both Python and C implementations
- ripgrep (
rg) must be installed and in PATH - Python 3.7+
- For C extension: GCC and SQLite development headers
Register the ripgrep scalar function with a SQLite connection.
conn: SQLite connection objectbase_directory: Root directory for searchesfunction_name: Name of the SQL function (default: 'ripgrep')default_time_limit: Default timeout in seconds (default: 1.0)
Same as above, but also registers {function_name}_results() for use with json_each().
Low-level function to run ripgrep directly. Returns a dictionary with results.
The extension automatically creates a ripgrep virtual table when loaded. Use WHERE clauses to provide search parameters.
-
Python table-valued functions: Python's sqlite3 module doesn't support true table-valued functions. The
json_each()workaround is functional but less elegant. -
Cross-join constraints: When joining the ripgrep virtual table with other tables, constraints referencing other table columns won't be pushed through to ripgrep. Use subqueries or CTEs instead.
-
Signal handling (C extension): The C extension uses
SIGALRMfor time limits, which may interfere with other signal handlers in the process.
SELECT path, line_number, line_text
FROM ripgrep
WHERE pattern = 'TODO|FIXME|XXX' AND glob = '*.py'
ORDER BY path, line_number;SELECT path, COUNT(*) as match_count
FROM ripgrep
WHERE pattern = 'import' AND glob = '*.py'
GROUP BY path
ORDER BY match_count DESC;result = run_ripgrep(
pattern='error',
base_directory='/var/log',
context_lines=2, # Show 2 lines before/after
time_limit=5.0
)MIT License
- Time limit implementation inspired by datasette-ripgrep by Simon Willison
- Built on ripgrep by Andrew Gallant