Escape LIKE metacharacters#3013
Conversation
| $replacePairs[$metaCharacter] = $escapeChar.$metaChar; | ||
| } | ||
|
|
||
| return strtr($untrusted, $replacePairs); |
| } | ||
|
|
||
| abstract protected function getLikeEscapeChar(); | ||
| abstract protected function getLikeMetaCharacters(); |
dd1b35a to
8ed7349
Compare
|
Please tell me if you're fine with the feature, the naming, and I should add implementations, docs and tests. |
|
@greg0ire I have a feeling that in most supported DB platforms, escaping wildcard characters works almost the same. Instead of introducing new |
| */ | ||
| public function escapeStringForLike(string $untrustedString): string | ||
| { | ||
| $replacePairs = []; |
There was a problem hiding this comment.
The escape character itself should also be always escaped. Does this approach account for getLikeMetaCharacters() always returning getLikeEscapeChar() or escapeStringForLike() should handle that itself?
There was a problem hiding this comment.
I think the best is to let escapeStringForLike handle it, and I could rename "meta" to "wildcard", what do you think?
| * Escapes metacharacters in a string intended to be used with a LIKE | ||
| * operator. | ||
| */ | ||
| public function escapeStringForLike(string $untrustedString): string |
There was a problem hiding this comment.
This is not about trust. The input is an arbitrary string with no meaning while the output is a LIKE expression.
There was a problem hiding this comment.
It could indeed come from a trusted source and still need to be escaped because you genuinely want to search for that 👍
|
"Funny" thing: In MS SQL, the escape character has to be picked by the user and has no default: https://docs.microsoft.com/en-us/sql/t-sql/language-elements/like-transact-sql |
In my personal experience, it defaults to
Why is then the |
Probably because you don't always need to escape characters |
|
For PostgreSQL and MySQL, you can specify an escape char too, but it defaults to backslash |
|
Also IIRC, in IBM DB2 there's also a different default escape character. Probably, we need to gather escaping examples for all DBs and then see if the |
|
MySQL sounds like a lot of fun: |
Well the only one which seems problematic to me is MS SQL, because it seems you have to append |
8ed7349 to
ad8435c
Compare
|
Ok so let's start the gathering
|
ad8435c to
e3f6433
Compare
That's right, I think an optional |
It should be either mandatory or not configurable at all and accessible in a read-only mode (e.g. If mandatory, the caller will always manually pass the character of choice to the escaping function and add it to Otherwise, the caller doesn't know if it's needed for a given string and platform and what to pass to |
|
I think the mandatory parameter might be best: it offers the most control, and it acts as a better reminder since people not knowing what to specify will read the phpdoc, where I can document that they need to also add |
539896f to
c9cf880
Compare
| final public function escapeStringForLike(string $inputString, string $escapeChar) : string | ||
| { | ||
| /* Some RDBMS' can deal with multibyte characters, but let us assume | ||
| * that kind of will not actually be needed */ |
There was a problem hiding this comment.
Tell me if I should allow them instead, or even remove that check and let the RDBMS deal with that.
There was a problem hiding this comment.
I'd say the DB should decide what's valid and what's not. Even if it's known that it should be one character for all supported platforms, there may exist other platform-specific limitations which we'll not be able to cover. Let's not even try.
d6a3807 to
47b0a67
Compare
|
I'd also add a functional test to make sure valid SQL expressions are built by means of the new API. |
| public function testItEscapesStringsForLikeProperly() : void | ||
| { | ||
| self::assertSame( | ||
| '25\% off \\\\o/', |
There was a problem hiding this comment.
Could you add the underscore here as well?
| ); | ||
| } | ||
|
|
||
| public function testItEscapesStringsForLikeProperly() : void |
There was a problem hiding this comment.
The "properly" part seems redundant. It's implied that the test will only pass if the API is implemented properly.
| * operator. | ||
| * | ||
| * @param string $inputString a literal, unquoted string | ||
| * @param string $escapeChar should be exactly one character long and |
There was a problem hiding this comment.
If it's not enforced by the API, I'd omit the "should be exactly one character long" part. It's still a DB abstraction layer where all DB limitations are implied but don't necessarily have to be called out. The rest is valid.
47b0a67 to
85815f6
Compare
|
Not sure if it's outside of the scope but maybe the $pattern = $platform->escapeStringForLike($input, '!') . '%';
$builder->like($pattern, '!');The builder, in turn, may or may not append the Looks a bit complicated but I can't see a simpler approach. |
f8bba7f to
a9e5e5a
Compare
|
I don't really care which way it is, but And there will be lots of dragons. 🐲 That's why I proposed using simple string instead of array. |
|
Agree, let’s revert to the string. It makes more sense with the regexp approach. |
|
Agreement! Great, let's do this! |
| public function testNotLikeWithEscape() | ||
| { | ||
| self::assertEquals( | ||
| "p.description NOT LIKE '20💩%' ESCAPE '💩'", |
Not needed: |
493310f to
ddda207
Compare
|
Oh well, I see it's done anyway. Waiting for CI, then 🚢 |
ddda207 to
fddd873
Compare
A string is made of characters, an array might contain anything.
fddd873 to
f397fe0
Compare
|
Ready to 🚢 |
|
👍 |
…mySelectSQL() For testing purposes and not only, it may be needed to evaluate an arbitrary SQL expression (see doctrine#3013, doctrine#3108). Instead of doing `str_replace()` on an expected string, one could specify the expression explicitly.
…mySelectSQL() For testing purposes and not only, it may be needed to evaluate an arbitrary SQL expression (see doctrine#3013, doctrine#3108). Instead of doing `str_replace()` on an expected string, one could specify the expression explicitly.
…mySelectSQL() For testing purposes and not only, it may be needed to evaluate an arbitrary SQL expression (see doctrine#3013, doctrine#3108). Instead of doing `str_replace()` on an expected string, one could specify the expression explicitly.
There seems to be a lack of tools to fight against wildcard attacks, or to simply provide a way to let your users search for strings that include metacharacters
todo
likeandnotLikemethods.getWildcardCharactersfor RDBMSs that have more than the default wildcard characters.