Lexer: Fix possible OOB read in quoted strings#374

Merged: JanJakes merged 5 commits into WordPress:trunk from sirreal:fix/lexer-read-quoted-text-oob on Apr 28, 2026

@sirreal (Member) commented Apr 27, 2026

Summary

Fixes an out-of-bounds string access in WP_MySQL_Lexer::read_quoted_text() that produces PHP warnings when lexing unclosed quoted strings with trailing backslashes:

Uninitialized string offset N in class-wp-mysql-lexer.php on line 2855

This appears to happen in streaming SQL processing (e.g., WordPress Playground's runSql blueprint step) when a buffer boundary splits a quoted string literal and a backslash falls at the end of the buffer. Standard MySQL dumps with escaped string literals contain thousands of backslashes, making this likely to hit in practice.
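
The trigger described above can be pictured with a small sketch. The statement and the split point below are hypothetical, chosen only to show how a buffer boundary can leave a chunk that ends in a backslash while a quoted string is still open; the real chunking logic in a streaming processor may differ.

```php
<?php
// Hypothetical statement and split point, chosen only for illustration.
$statement = "INSERT INTO t VALUES ('a\\'b');"; // SQL text: INSERT INTO t VALUES ('a\'b');

// Place the buffer boundary directly after the first backslash.
$boundary = strpos( $statement, '\\' ) + 1;
$chunks   = str_split( $statement, $boundary );

$first = $chunks[0];

// The first chunk ends in a backslash while its quoted string is still open.
$ends_with_backslash = '\\' === substr( $first, -1 );
$quote_is_open       = 1 === substr_count( $first, "'" ) % 2;

var_dump( $first, $ends_with_backslash, $quote_is_open );
```

A lexer handed only the first chunk sees an unclosed string literal whose last byte is a backslash, which is the input shape this PR addresses.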

The bug

The backslash-counting loop in read_quoted_text() ran before the EOF check. When strcspn() reached the end of the string without finding a closing quote:

  1. $at pointed to strlen($sql) (one past the last byte).
  2. The backslash loop accessed $this->sql[$at - 1], which is a valid offset. But if that byte was a backslash (\):
  3. The loop treated the absent quote as escaped and did $at += 1 (now past end).
  4. Next iteration: strcspn returned 0, $at stayed past end.
  5. The backslash loop accessed $this->sql[strlen($sql)], which is out of bounds.
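
The steps above can be reproduced with a simplified model of the loop. This is a sketch under assumptions, not the lexer's source: the function name `scan_quoted_before_fix()`, its signature, and its body are hypothetical, and doubled-quote handling is left out. Only the ordering (backslash count first, EOF check second) is taken from the description.

```php
<?php
// Hypothetical, simplified model of the pre-fix scan order.
// Name, signature, and body are assumptions; doubled quotes are not handled.
function scan_quoted_before_fix( string $sql, int $start ): ?int {
	$quote = $sql[ $start ];
	$at    = $start + 1; // Skip the opening quote.

	while ( true ) {
		// Advance to the next quote character, or to the end of input.
		$at += strcspn( $sql, $quote, $at );

		// Count the backslashes directly before $at. This runs BEFORE the
		// EOF check, so $at may already be at or past strlen( $sql ).
		for ( $i = 0; '\\' === $sql[ $at - $i - 1 ]; $i++ );

		if ( 1 === $i % 2 ) {
			$at += 1; // Odd count: treat the (possibly absent) quote as escaped.
			continue;
		}

		if ( ( $sql[ $at ] ?? null ) !== $quote ) {
			return null; // Unclosed string.
		}
		return $at + 1; // Offset just past the closing quote.
	}
}

// Collect warnings/notices instead of printing them.
$warnings = array();
set_error_handler(
	function ( int $errno, string $errstr ) use ( &$warnings ): bool {
		$warnings[] = $errstr;
		return true;
	}
);
$result = scan_quoted_before_fix( "'abc\\", 0 ); // Input bytes: ' a b c \
restore_error_handler();

var_dump( $result, $warnings );
```

With this input the model still returns null, but only after the second pass through the loop has read one byte past the end of the string, which is where PHP reports the uninitialized string offset.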

Fix

Two changes to the while (true) loop body:

  1. Move the EOF check before the backslash-counting loop. When strcspn reaches end-of-string without finding the quote, $this->sql[$at] ?? null won't match $quote, so we return null immediately — the backslash loop is never reached.

  2. Add a lower-bound guard to the backslash loop. The for condition now includes ($at - $i - 1) >= 0 to prevent underflow when a quote appears near the start of the string. Belt-and-suspenders — the EOF reorder already prevents the primary bug.
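
The two changes can be sketched on a simplified model of the loop. As before, `scan_quoted_after_fix()` is a hypothetical stand-in rather than the actual method, and doubled-quote handling is omitted; the sketch only illustrates the reordered EOF check and the lower-bound guard.

```php
<?php
// Hypothetical, simplified model of the post-fix scan order.
// Name, signature, and body are assumptions; doubled quotes are not handled.
function scan_quoted_after_fix( string $sql, int $start ): ?int {
	$quote = $sql[ $start ];
	$at    = $start + 1; // Skip the opening quote.

	while ( true ) {
		$at += strcspn( $sql, $quote, $at );

		// 1. EOF check first: no quote at $at means the string is unclosed.
		if ( ( $sql[ $at ] ?? null ) !== $quote ) {
			return null;
		}

		// 2. Count preceding backslashes, never reading below offset 0.
		for ( $i = 0; ( $at - $i - 1 ) >= 0 && '\\' === $sql[ $at - $i - 1 ]; $i++ );

		if ( 1 === $i % 2 ) {
			$at += 1; // Escaped quote: keep scanning.
			continue;
		}
		return $at + 1; // Offset just past the closing quote.
	}
}

// Collect warnings/notices instead of printing them.
$warnings = array();
set_error_handler(
	function ( int $errno, string $errstr ) use ( &$warnings ): bool {
		$warnings[] = $errstr;
		return true;
	}
);
$unclosed_odd  = scan_quoted_after_fix( "'abc\\", 0 );   // Input: 'abc\
$unclosed_even = scan_quoted_after_fix( "'abc\\\\", 0 ); // Input: 'abc\\
$closed        = scan_quoted_after_fix( "'a\\'b'", 0 );  // Input: 'a\'b'
restore_error_handler();

var_dump( $unclosed_odd, $unclosed_even, $closed, $warnings );
```

Because the unclosed inputs return before the backward walk starts, the walk only ever runs with `$at` pointing at a real quote character inside the string.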

Tests

  • Unclosed strings with odd/even trailing backslashes (single and double quotes).
  • Regression tests for valid escaped strings, doubled quotes, and backtick identifiers.
  • Chunk boundary test simulating a streaming SQL processor splitting input at a backslash inside a quoted string.

Use of AI

This was diagnosed and implemented with the help of Claude Code.

sirreal added 5 commits April 24, 2026 23:16
Add tests to cover the backslash-counting loop in read_quoted_text().
These test cases expose an out-of-bounds string access triggered by
unclosed quoted strings with trailing backslashes.

Test cases:
- Unclosed strings with odd/even trailing backslashes (single/double quotes)
- Regression tests for valid escaped strings, doubled quotes, and backticks

The backslash-counting loop in read_quoted_text() ran before the EOF
check. When strcspn() reached the end of the string without finding a
closing quote, the loop accessed offsets past the string boundary:

1. strcspn() sets $at = strlen($sql) (no quote found).
2. Backslash check finds '\' at $at-1 (last byte), counts it ($i=1).
3. Odd count → treats absent quote as escaped, does $at += 1 (past end).
4. Next iteration: strcspn returns 0, $at stays past end.
5. Backslash check accesses $this->sql[strlen($sql)] → PHP warning.

Fix: move the EOF check before the backslash-counting loop so unclosed
strings are detected immediately. Also add a lower-bound guard to the
backward walk to prevent underflow when a quote appears early in the
string.

Add a test that simulates the real-world trigger: a streaming SQL
processor splitting input at a chunk boundary that falls inside a
quoted string, with a backslash as the last byte. This is the exact
scenario from MySQL dump processing where FROM_BASE64() is replaced
with regular escaped string literals.

Copilot AI (Contributor) left a comment

Pull request overview

Fixes an out-of-bounds string offset access in WP_MySQL_Lexer::read_quoted_text() when lexing unclosed quoted strings that end with backslashes (common in streaming/chunked SQL ingestion), and adds regression coverage to ensure the lexer no longer emits PHP warnings/notices in these cases.

Changes:

  • Reorders EOF detection in read_quoted_text() to bail out before backslash-escape scanning when no closing quote exists.
  • Adds a lower-bound guard to the backslash-counting loop to avoid negative/underflow string offsets.
  • Adds PHPUnit tests covering unclosed quoted strings with trailing backslashes and a simulated chunk-boundary scenario.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| packages/mysql-on-sqlite/src/mysql/class-wp-mysql-lexer.php | Adjusts quoted-string scanning to avoid OOB access at EOF and guards backward backslash scanning. |
| packages/mysql-on-sqlite/tests/mysql/WP_MySQL_Lexer_Tests.php | Adds regression tests ensuring no warnings/notices are emitted for unclosed escaped strings and boundary-like inputs. |
Comments suppressed due to low confidence (1)

packages/mysql-on-sqlite/src/mysql/class-wp-mysql-lexer.php:2833

  • read_quoted_text() is defined with no parameters, but the docblock still documents a $quote parameter, and there is at least one call site that passes an argument (e.g. handling N'...' in read_next_token()). PHP silently ignores extra arguments passed to a user-defined function, so this does not raise an ArgumentCountError (that error is thrown only when too few arguments are passed), but the argument has no effect and the docblock is misleading. Either remove the extra argument at the call site (the quote is already at the current offset) and delete the stale @param, or change read_quoted_text() to accept and use an explicit quote parameter consistently.
	 *   2. Backslashes escape the next character, unless NO_BACKSLASH_ESCAPES is set.
	 */
	private function read_quoted_text(): ?int {
		$quote                     = $this->sql[ $this->bytes_already_read ];
		$this->bytes_already_read += 1; // Consume the quote.

		$no_backslash_escapes = $this->is_sql_mode_active(

@JanJakes (Member) left a comment

@sirreal Thanks for the fix! LGTM and works great!

@JanJakes JanJakes merged commit bb576ba into WordPress:trunk Apr 28, 2026
15 checks passed
@JanJakes JanJakes mentioned this pull request Apr 28, 2026
JanJakes added a commit that referenced this pull request Apr 28, 2026
## Release `3.0.0-rc.3`

Version bump and changelog update for release `3.0.0-rc.3`.

**Changelog draft:**
* Lexer: Fix possible OOB read in quoted strings
([#374](#374))
* Add support for `NO_AUTO_VALUE_ON_ZERO` SQL mode
([#366](#366))

**Full changelog:**
v3.0.0-rc.2...release/v3.0.0-rc.3

## Next steps

1. **Review** the changes in this pull request.
2. **Push** any additional edits to this branch (`release/v3.0.0-rc.3`).
3. **Merge** this pull request to complete the release.

Merging will automatically build the plugin ZIP and create a [GitHub
release](https://github.com/WordPress/sqlite-database-integration/releases).

> [!NOTE]
> This is a **pre-release**. It will not be deployed to
[WordPress.org](https://wordpress.org/plugins/sqlite-database-integration/).
@sirreal sirreal deleted the fix/lexer-read-quoted-text-oob branch April 28, 2026 13:21
3 participants