Lexer: Fix possible OOB read in quoted strings #374
Merged
JanJakes merged 5 commits on Apr 28, 2026
Add tests to cover the backslash-counting loop in read_quoted_text(). These test cases expose an out-of-bounds string access triggered by unclosed quoted strings with trailing backslashes.

Test cases:
- Unclosed strings with odd/even trailing backslashes (single/double quotes)
- Regression tests for valid escaped strings, doubled quotes, and backticks
The backslash-counting loop in read_quoted_text() ran before the EOF check. When strcspn() reached the end of the string without finding a closing quote, the loop accessed offsets past the string boundary:

1. strcspn() sets $at = strlen($sql) (no quote found).
2. Backslash check finds '\' at $at-1 (last byte), counts it ($i=1).
3. Odd count → treats absent quote as escaped, does $at += 1 (past end).
4. Next iteration: strcspn returns 0, $at stays past end.
5. Backslash check accesses $this->sql[strlen($sql)] → PHP warning.

Fix: move the EOF check before the backslash-counting loop so unclosed strings are detected immediately. Also add a lower-bound guard to the backward walk to prevent underflow when a quote appears early in the string.
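The reordered loop can be sketched as a standalone function. This is a simplified model under stated assumptions, not the lexer's actual code: `find_closing_quote()` and its parameters are hypothetical names, doubled quotes and `NO_BACKSLASH_ESCAPES` are ignored, and the only point is that the EOF check runs before the backward backslash count.

```php
<?php
/**
 * Simplified model of the fixed scan (hypothetical helper, not the real lexer).
 * Returns the offset of the closing quote, or null if the string is unclosed.
 * Doubled quotes and NO_BACKSLASH_ESCAPES are deliberately ignored here.
 */
function find_closing_quote( string $sql, int $start, string $quote ): ?int {
	$at = $start; // Offset of the first byte after the opening quote.
	while ( true ) {
		// Skip ahead to the next quote candidate, or to the end of input.
		$at += strcspn( $sql, $quote, $at );

		// EOF check first: if there is no quote at $at, the string is unclosed.
		if ( ( $sql[ $at ] ?? null ) !== $quote ) {
			return null;
		}

		// Count backslashes directly before the candidate quote,
		// with a lower-bound guard so the offset never goes negative.
		$i = 0;
		while ( ( $at - $i - 1 ) >= 0 && '\\' === $sql[ $at - $i - 1 ] ) {
			$i++;
		}

		if ( 0 === $i % 2 ) {
			return $at; // Even count: the quote is not escaped.
		}
		$at += 1; // Odd count: the quote is escaped, keep scanning.
	}
}

// Usage: the PHP literal "'abc\\" is the five bytes ' a b c \
var_dump( find_closing_quote( "'abc'", 1, "'" ) );    // int(4)
var_dump( find_closing_quote( "'a\\'b'", 1, "'" ) );  // int(5)
var_dump( find_closing_quote( "'abc\\", 1, "'" ) );   // NULL
```

Because the function returns as soon as no quote is found, the backward walk only ever starts from an offset that holds a real quote character, so `$at` cannot move past the end of the input.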
Add a test that simulates the real-world trigger: a streaming SQL processor splitting input at a chunk boundary that falls inside a quoted string, with a backslash as the last byte. This is the exact scenario from MySQL dump processing where FROM_BASE64() is replaced with regular escaped string literals.
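To illustrate how a chunk boundary can produce this input, the sketch below splits a statement into fixed-size chunks with `str_split()` and reports the chunks whose last byte is a backslash. The statement, the chunk sizes, and the helper name `chunks_ending_in_backslash()` are made up for illustration; a real streaming processor may split its input differently.

```php
<?php
/**
 * Hypothetical helper: split $sql into $size-byte chunks and return the
 * indexes of the chunks whose last byte is a backslash.
 */
function chunks_ending_in_backslash( string $sql, int $size ): array {
	$hits = array();
	foreach ( str_split( $sql, $size ) as $index => $chunk ) {
		if ( '\\' === substr( $chunk, -1 ) ) {
			$hits[] = $index;
		}
	}
	return $hits;
}

// Illustrative statement; the backslash of the escaped quote sits at byte offset 25.
$sql = "INSERT INTO t VALUES ('it\\'s here');";

// A 26-byte chunk ends exactly on that backslash, inside the string literal.
print_r( chunks_ending_in_backslash( $sql, 26 ) ); // Chunk 0 is reported.

// A 10-byte chunk size happens to miss it for this statement.
print_r( chunks_ending_in_backslash( $sql, 10 ) ); // Nothing is reported.
```

The first chunk in the 26-byte case is an unclosed quoted string ending in a backslash, the same shape of input as the test cases for unclosed strings.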
Pull request overview
Fixes an out-of-bounds string offset access in WP_MySQL_Lexer::read_quoted_text() when lexing unclosed quoted strings that end with backslashes (common in streaming/chunked SQL ingestion), and adds regression coverage to ensure the lexer no longer emits PHP warnings/notices in these cases.
Changes:
- Reorders EOF detection in `read_quoted_text()` to bail out before backslash-escape scanning when no closing quote exists.
- Adds a lower-bound guard to the backslash-counting loop to avoid negative/underflow string offsets.
- Adds PHPUnit tests covering unclosed quoted strings with trailing backslashes and a simulated chunk-boundary scenario.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| packages/mysql-on-sqlite/src/mysql/class-wp-mysql-lexer.php | Adjusts quoted-string scanning to avoid OOB access at EOF and guards backward backslash scanning. |
| packages/mysql-on-sqlite/tests/mysql/WP_MySQL_Lexer_Tests.php | Adds regression tests ensuring no warnings/notices are emitted for unclosed escaped strings and boundary-like inputs. |
Comments suppressed due to low confidence (1)
packages/mysql-on-sqlite/src/mysql/class-wp-mysql-lexer.php:2833
`read_quoted_text()` is defined with no parameters, but the docblock still documents a `$quote` parameter, and there is at least one call site that passes an argument (e.g. handling `N'...'` in `read_next_token()`). PHP silently ignores extra arguments passed to a user-defined function, so this does not raise an `ArgumentCountError`, but the mismatch is misleading. Either remove the extra argument at the call site (the quote is already at the current offset) and delete the stale `@param`, or change `read_quoted_text()` to accept and use an explicit quote parameter consistently.
     * 2. Backslashes escape the next character, unless NO_BACKSLASH_ESCAPES is set.
     */
    private function read_quoted_text(): ?int {
        $quote = $this->sql[ $this->bytes_already_read ];
        $this->bytes_already_read += 1; // Consume the quote.
        $no_backslash_escapes = $this->is_sql_mode_active(
JanJakes added a commit that referenced this pull request on Apr 28, 2026:
## Release `3.0.0-rc.3`

Version bump and changelog update for release `3.0.0-rc.3`.

**Changelog draft:**

* Lexer: Fix possible OOB read in quoted strings ([#374](#374))
* Add support for `NO_AUTO_VALUE_ON_ZERO` SQL mode ([#366](#366))

**Full changelog:** v3.0.0-rc.2...release/v3.0.0-rc.3

## Next steps

1. **Review** the changes in this pull request.
2. **Push** any additional edits to this branch (`release/v3.0.0-rc.3`).
3. **Merge** this pull request to complete the release.

Merging will automatically build the plugin ZIP and create a [GitHub release](https://github.com/WordPress/sqlite-database-integration/releases).

> [!NOTE]
> This is a **pre-release**. It will not be deployed to [WordPress.org](https://wordpress.org/plugins/sqlite-database-integration/).
Summary
Fixes an out-of-bounds string access in `WP_MySQL_Lexer::read_quoted_text()` that produces PHP warnings when lexing unclosed quoted strings with trailing backslashes.

This appears to happen in streaming SQL processing (e.g., WordPress Playground's `runSql` blueprint step) when a buffer boundary splits a quoted string literal and a backslash falls at the end of the buffer. Standard MySQL dumps with escaped string literals contain thousands of backslashes, making this likely to hit in practice.

The bug

The backslash-counting loop in `read_quoted_text()` ran before the EOF check. When `strcspn()` reached the end of the string without finding a closing quote:

1. `$at` pointed to `strlen($sql)` (one past the last byte).
2. `$this->sql[$at - 1]`: valid, but if it was `\`: `$at += 1` (now past end).
3. `strcspn` returned 0, `$at` stayed past end.
4. `$this->sql[strlen($sql)]`: out of bounds.

Fix

Two changes to the `while (true)` loop body:

1. Move the EOF check before the backslash-counting loop. When `strcspn` reaches end-of-string without finding the quote, `$this->sql[$at] ?? null` won't match `$quote`, so we return `null` immediately and the backslash loop is never reached.
2. Add a lower-bound guard to the backslash loop. The `for` condition now includes `($at - $i - 1) >= 0` to prevent underflow when a quote appears near the start of the string. This is belt-and-suspenders: the EOF reorder already prevents the primary bug.

Tests
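The regression tests assert that no warnings or notices are emitted. As a minimal sketch of how that can be checked in plain PHP (not the PHPUnit tests from this pull request), the snippet below collects diagnostics with `set_error_handler()` and compares a guarded `?? null` read at `strlen($sql)` with a plain read at the same offset. The variable names are illustrative.

```php
<?php
$sql = "'abc\\";       // Unclosed string: the five bytes ' a b c \
$at  = strlen( $sql ); // One past the last byte.

// Collect any warnings/notices instead of printing them.
$messages = array();
set_error_handler(
	function ( int $errno, string $errstr ) use ( &$messages ): bool {
		$messages[] = $errstr;
		return true; // Mark the diagnostic as handled.
	}
);

// Guarded read, as in the fix: isset-style semantics, yields null silently.
$guarded       = $sql[ $at ] ?? null;
$guarded_count = count( $messages );

// Plain read past the end, as in the old code path: emits a diagnostic.
$raw       = $sql[ $at ];
$raw_count = count( $messages );

restore_error_handler();

var_dump( $guarded );       // NULL
var_dump( $guarded_count ); // int(0)
var_dump( $raw_count );     // int(1)
```

The guarded read is what lets the reordered EOF check compare against `$quote` safely even when `$at` equals the string length.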
Use of AI
This was diagnosed and implemented with the help of Claude Code.