Skip to content

Conversation

@Majkl578
Copy link
Contributor

@Majkl578 Majkl578 commented Jul 2, 2018

This change makes array_key_exists() actually faster than isset() by ~25% (tested with GCC 8, -O3, march=native, mtune=native).

addresses https://bugs.php.net/bug.php?id=76148
addresses https://bugs.php.net/bug.php?id=71239

With a synthetic benchmark:

<?php

$foo = [123 => 456, 'abc' => 'def'];

$iterations = 500000000;

$a = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    isset($foo[123]);
}
printf('isset (%dx): %.2fs%s', $iterations, microtime(true) - $a, PHP_EOL);

$b = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    array_key_exists(123, $foo);
}
printf('array_key_exists (%dx): %.2fs%s', $iterations, microtime(true) - $b, PHP_EOL);

It yields results like:
7.2 (unpatched):

isset (500000000x): 7.65s
array_key_exists (500000000x): 9.03s

7.3 + patch:

isset (500000000x): 7.60s
array_key_exists (500000000x): 5.65s

Copy link
Member

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on! I've left a few implementation notes, in particular in regard to how performance may be improved.

subject = GET_OP2_ZVAL_PTR(BP_VAR_R);
key = GET_OP1_ZVAL_PTR(BP_VAR_R);

if (UNEXPECTED(Z_ISREF_P(subject))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A more efficient (and for VM code, more idiomatic) way to handle this is to check for ISREF prior to the error case and use a goto try_again pattern. See https://github.com/Majkl578/php-src/blob/2265ca4f783b9e5e3a575ebe8c3b1f21f3009015/Zend/zend_vm_def.h#L4419 for an example.


SAVE_OPLINE();

subject = GET_OP2_ZVAL_PTR(BP_VAR_R);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be preferable to use GET_OP2_ZVAL_PTR_UNDEF here and check the UNDEF case for IS_CV inside the branch that generates the type error. For an example of delayed UNDEF handling see https://github.com/Majkl578/php-src/blob/2265ca4f783b9e5e3a575ebe8c3b1f21f3009015/Zend/zend_vm_def.h#L59.

zend_error(E_WARNING, "array_key_exists(): The first argument should be either a string or an integer");
result = 0;
}
ZVAL_BOOL(EX_VAR(opline->result.var), result);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code between IS_ARRAY and IS_OBJECT is duplicated here (which is actually rather dubious and will handle some cases incorrectly, but that's an existing issue and best solved by deprecating array_key_exists on objects entirely). It would be better to only extract the hashtable using Z_ARRVAL_P and Z_OBJPROP_P in these branches and have only one copy of the common key handling code afterwards.

case ZEND_ASSERT_CHECK:
case ZEND_IN_ARRAY:
case ZEND_ARRAY_KEY_EXISTS:
UPDATE_SSA_TYPE(MAY_BE_FALSE|MAY_BE_TRUE, ssa_ops[i].result_def);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

array_key_exists() can also return null for the zpp failure case.

Additionally, ARRAY_KEY_EXISTS needs to be handled in https://github.com/php/php-src/blob/master/ext/opcache/Optimizer/sccp.c#L1328. Looking at the ISSET_ISEMPTY_DIM_OBJ case should provide inspiration here.


FREE_OP2();
FREE_OP1();
ZEND_VM_NEXT_OPCODE_CHECK_EXCEPTION();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

array_key_exists() is a function that would benefit from smart branch handling. See https://github.com/Majkl578/php-src/blob/2265ca4f783b9e5e3a575ebe8c3b1f21f3009015/Zend/zend_vm_def.h#L8085 how this is done for ZEND_IN_ARRAY. You'll also have to add it to https://github.com/php/php-src/blob/master/Zend/zend_compile.c#L2181.

@Majkl578
Copy link
Contributor Author

Majkl578 commented Jul 3, 2018

@nikic Thanks for your review 🥇, I'll try to address it later this week.

@KalleZ
Copy link
Member

KalleZ commented Jul 3, 2018

Also you should just revert the change to NEWS, usually the merger of the PR will do this part =)

@cmb69
Copy link
Member

cmb69 commented Jul 3, 2018

Also you should just revert the change to NEWS, usually the merger of the PR will do this part =)

For this PR an (additional) entry in UPGRADING appears to be appropriate (UPGRADING usually serves as base for the migration guide in the PHP manual, but NEWS is likely to be overlooked, and this improvement seems to be quite noteworthy).

@laruence
Copy link
Member

laruence commented Jul 4, 2018

Although isset and array_key_exists behavior slightly different, but I think in most cases, array_key_exists can be repalced with isset..

thus, I doubt such optimization is worthy to do...

@nikic
Copy link
Member

nikic commented Jul 4, 2018

@laruence array_key_exists() is common in foundational libraries that need to support null values. These are often very hot code paths and currently the usual way to handle them is if (isset($array[$key]) || \array_key_exists($key, $array)), because directly calling array_key_exists() is too slow.

@nikic
Copy link
Member

nikic commented Jul 4, 2018

@cmb69 Currently we usually do not include performance changes in UPGRADING, unless they come with some observable behavior change. But I think it's a good idea to start doing this, maybe under a new "Performance Improvements" section. We do tend to have a fair bit of improvements in each release, but they aren't really called out anywhere.

@cmb69
Copy link
Member

cmb69 commented Jul 4, 2018

@nikic Even if we won't document performance changes, I still think this very change should be documented, since it can affect how userland code will be written. You already gave a typical example which could be improved wrt. this PR.

@hikari-no-yume
Copy link
Contributor

hikari-no-yume commented Jul 4, 2018

By the way, it would be nice to have a comment above the code for the normal function array_key_exists that mentions this special case. I added one for all the special cases there were some time ago (2015 maybe?), because I had modified a special-cased function's behaviour (in the actual function, not the compiler) and spent some time confused as to why the change wasn't taking effect.

@Majkl578 Majkl578 force-pushed the ZEND_ARRAY_KEY_EXISTS-opcode branch from 2265ca4 to 6a76ae4 Compare August 8, 2018 14:38
@Majkl578 Majkl578 changed the base branch from master to PHP-7.3 August 8, 2018 14:39
@Majkl578 Majkl578 force-pushed the ZEND_ARRAY_KEY_EXISTS-opcode branch from 6a76ae4 to 4ff60bb Compare August 8, 2018 14:40
@Majkl578
Copy link
Contributor Author

Majkl578 commented Aug 8, 2018

@nikic I adressed all your comments, except one: SCCP for this new opcode, since it's a bit beyond my current knowledge of PHP internals. May I ask you or anyone else to help me with this one? Thanks!

@Majkl578
Copy link
Contributor Author

Majkl578 commented Aug 8, 2018

gcc-8 -O0

isset (100000000x): 3.57s
array_key_exists (100000000x): 3.33s

gcc-8 -O3

isset (100000000x): 1.49s
array_key_exists (100000000x): 1.34s

@nikic Can you please another round of review? Thanks.

@Majkl578 Majkl578 force-pushed the ZEND_ARRAY_KEY_EXISTS-opcode branch from 4ff60bb to 16c5a7b Compare August 8, 2018 15:03
@nikic
Copy link
Member

nikic commented Aug 8, 2018

/cc @cmb69 for whether this can still land in 7.3. It introduces a new opcode, not sure what our policy is there. I guess as it may require further extension changes (xdebug support etc) probably too late?


ZEND_VM_C_LABEL(try_again):

if (UNEXPECTED(Z_ISREF_P(subject)) || UNEXPECTED(Z_ISREF_P(key))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't quite what I had in mind regarding the reference handling. The idea is to do something like this:

ZEND_VM_C_LABEL(try_again_subject):
	if (EXPECTED(Z_TYPE_P(subject) == IS_ARRAY)) {
		ht = Z_ARRVAL_P(subject);
	} else if (UNEXPECTED(Z_TYPE_P(subject) == IS_OBJECT)) {
		ht = Z_OBJPROP_P(subject);
	} else if (Z_TYPE_P(Z_ISREF_P(subject)) {
        subject = Z_REFVAL_P(subject);
        ZEND_VM_C_GOTO(try_again_subject);
    } else {
		if (OP1_TYPE == IS_CV && UNEXPECTED(Z_TYPE_INFO_P(key) == IS_UNDEF)) {
			key = GET_OP1_UNDEF_CV(key, BP_VAR_R);
		}
		if (OP2_TYPE == IS_CV && UNEXPECTED(Z_TYPE_INFO_P(subject) == IS_UNDEF)) {
			subject = GET_OP2_UNDEF_CV(subject, BP_VAR_R);
		}
		zend_internal_type_error(EX_USES_STRICT_TYPES(), "array_key_exists() expects parameter 2 to be array, %s given", zend_get_type_by_const(Z_TYPE_P(subject)));
		FREE_OP2();
		FREE_OP1();
		ZEND_VM_SMART_BRANCH(result, 0);
		ZVAL_NULL(EX_VAR(opline->result.var));
		ZEND_VM_NEXT_OPCODE_CHECK_EXCEPTION();
    }

The point is that the check for the reference is only done after the most likely case (subject is an array) is already handled, so that rare cases (references) don't have a runtime impact on common ones (array).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah, got it!

zend_internal_type_error(EX_USES_STRICT_TYPES(), "array_key_exists() expects parameter 2 to be array, %s given", zend_get_type_by_const(Z_TYPE_P(subject)));
FREE_OP2();
FREE_OP1();
ZEND_VM_SMART_BRANCH(result, 0);
Copy link
Member

@nikic nikic Aug 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe result is uninitialized here. It should be fine to just drop the smart branch line here, as this is a rare error case anyway.

@Majkl578 Majkl578 force-pushed the ZEND_ARRAY_KEY_EXISTS-opcode branch from 16c5a7b to bc6db1f Compare August 8, 2018 15:32
@cmb69
Copy link
Member

cmb69 commented Aug 8, 2018

Regarding PHP 7.3: I cannot assess how many extensions and SAPIs (phpdbg?) would be affected, and how much work it would be to update them, but given that Xdebug is not ready for PHP 7.3 yet, and the long UPGRADING.INTERNALS it might indeed be better to delay this to PHP 7.4. Otherwise this issue should be discussed on internals@.

@Majkl578
Copy link
Contributor Author

Majkl578 commented Aug 8, 2018

I only quickly checked code of XDebug and PHPDBG and found no usages of i.e. ZEND_COUNT or ZEND_IN_ARRAY (which is basically the same thing). Of course I don't know if there are any indirect implications for these tools, but it doesn't seem so, unlike i.e. new syntax.

@Majkl578
Copy link
Contributor Author

Majkl578 commented Aug 8, 2018

http://news.php.net/php.internals/103064

@Majkl578
Copy link
Contributor Author

Majkl578 commented Aug 9, 2018

(Waiting on author label can be removed.)

@Majkl578 Majkl578 changed the base branch from PHP-7.3 to master August 14, 2018 01:37
@Majkl578 Majkl578 force-pushed the ZEND_ARRAY_KEY_EXISTS-opcode branch from bc6db1f to c9a918b Compare August 14, 2018 01:38
@Majkl578
Copy link
Contributor Author

Alright, retargeted to master/7.4.

@Majkl578
Copy link
Contributor Author

I'll handle the feedback over next few days. 👍

@Majkl578 Majkl578 force-pushed the ZEND_ARRAY_KEY_EXISTS-opcode branch from 9d43818 to f473132 Compare December 26, 2018 17:39
@Majkl578
Copy link
Contributor Author

@dstogov Adressed your comments and also rebased again. 👍

result = zend_hash_exists_ind(ht, ZSTR_EMPTY_ALLOC());
} else if (Z_TYPE_P(key) <= IS_NULL) {
if (OP1_TYPE == IS_CV && UNEXPECTED(Z_TYPE_P(key) == IS_UNDEF)) {
ZVAL_UNDEFINED_OP1();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Must be OP2_TYPE ans ZVAL_UNDEFINED_OP2.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems, you changed OP1->OP2 in wrong place. I'll take care about this and merge.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, my bad. Everything is fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah. My mistake. It's already merged into master.

@nikic nikic closed this Dec 26, 2018
@nikic
Copy link
Member

nikic commented Dec 26, 2018

I've added the perf improvement section for UPGRADING which I mentioned above: f1c0e67

@Majkl578 Majkl578 deleted the ZEND_ARRAY_KEY_EXISTS-opcode branch December 26, 2018 21:52
@Majkl578
Copy link
Contributor Author

Thanks @nikic @dstogov @cmb69!

TysonAndre added a commit to TysonAndre/php-src that referenced this pull request Nov 17, 2019
Make opcache replace the result with false if the array argument is
known to be empty.
This may be useful when a codebase has placeholders,
e.g. `if (!in_array($method, self::ALLOWED_METHODS)) { return; }`

In zend_inference.c: In php 8, array_key_exists will throw
a TypeError instead of returning null.

I didn't see any discussion of this optimization (for/against)
after a quick search on github, e.g. phpGH-3360

Potential future optimizations:

- convert `in_array($needle, ['only one element'], true)` to `===`?
  (or `==` for strict=false)
- When the number of elements is less than 4, switch to looping instead of hash
  lookup. (exact threshold for better performance to be determined)

  Also support looping for `in_array($value, [false, 'str', 2.5], true/false)`
germanow added a commit to germanow/arrays that referenced this pull request Feb 12, 2020
TysonAndre added a commit to TysonAndre/php-src that referenced this pull request Dec 20, 2020
Make opcache replace the result with false if the array argument is
known to be empty.
This may be useful when a codebase has placeholders,
e.g. `if (!in_array($method, self::ALLOWED_METHODS)) { return; }`

In zend_inference.c: In php 8, array_key_exists will throw
a TypeError instead of returning null.

I didn't see any discussion of this optimization (for/against)
after a quick search on github, e.g. phpGH-3360

Potential future optimizations:

- convert `in_array($needle, ['only one element'], true)` to `===`?
  (or `==` for strict=false)
- When the number of elements is less than 4, switch to looping instead of hash
  lookup. (exact threshold for better performance to be determined)

  Also support looping for `in_array($value, [false, 'str', 2.5], true/false)`
Muqsit added a commit to Muqsit/VanillaGenerator that referenced this pull request Feb 19, 2021
Besides being faster (since php/php-src#3360), array_key_exists is also safer as it narrows down key existence to arrays only and performs type-checking on both parameters.
TysonAndre added a commit to TysonAndre/php-src that referenced this pull request Dec 28, 2021
Make opcache replace the result with false if the array argument is
known to be empty.
This may be useful when a codebase has placeholders,
e.g. `if (!in_array($method, self::ALLOWED_METHODS)) { return; }`

In zend_inference.c: In php 8, array_key_exists will throw
a TypeError instead of returning null.

I didn't see any discussion of this optimization (for/against)
after a quick search on github, e.g. phpGH-3360

Potential future optimizations:

- convert `in_array($needle, ['only one element'], true)` to `===`?
  (or `==` for strict=false)
- When the number of elements is less than 4, switch to looping instead of hash
  lookup. (exact threshold for better performance to be determined)

  Also support looping for `in_array($value, [false, 'str', 2.5], true/false)`
TysonAndre added a commit to TysonAndre/php-src that referenced this pull request Dec 28, 2021
Make opcache replace the result with false if the array argument is
known to be empty.
This may be useful when a codebase has placeholders,
e.g. `if (!in_array($method, self::ALLOWED_METHODS)) { return; }`

In zend_inference.c: In php 8, array_key_exists will throw
a TypeError instead of returning null.

I didn't see any discussion of this optimization (for/against)
after a quick search on github, e.g. phpGH-3360

Potential future optimizations:

- convert `in_array($needle, ['only one element'], true)` to `===`?
  (or `==` for strict=false)
- When the number of elements is less than 4, switch to looping instead of hash
  lookup. (exact threshold for better performance to be determined)

  Also support looping for `in_array($value, [false, 'str', 2.5], true/false)`
wmfgerrit pushed a commit to wikimedia/mediawiki that referenced this pull request Jan 6, 2026
This is pretty “hot” code (compare also change I906d5b3f2f, commit
9c89543), and by using \array_key_exists() instead of
array_key_exists(), we inform PHP that we’re looking for the built-in
function and not a potential \MediaWiki\Config\array_key_exists() or
\MediaWiki\array_key_exists() function. This in turn lets opcache
compile it to a custom ARRAY_KEY_EXISTS opcode (see PHP PR #3360 [1] and
commits f5044a12dd [2] + 0bbfebb6d9 [3]).

Ori Livneh benchmarked a version of this change and noticed a ~40%
improvement on getAll() for enwiki, from 0.952 ms to 0.688 ms.

[1]: php/php-src#3360
[2]: php/php-src@f5044a12dd
[3]: php/php-src@0bbfebb6d9

Change-Id: Id4e932e7c9512eba6a9843486a6539b0165a1b61
wmfgerrit pushed a commit to wikimedia/mediawiki that referenced this pull request Jan 7, 2026
This is pretty “hot” code (compare also change I906d5b3f2f, commit
9c89543), and by using \array_key_exists() instead of
array_key_exists(), we inform PHP that we’re looking for the built-in
function and not a potential \MediaWiki\Config\array_key_exists() or
\MediaWiki\array_key_exists() function. This in turn lets opcache
compile it to a custom ARRAY_KEY_EXISTS opcode (see PHP PR #3360 [1] and
commits f5044a12dd [2] + 0bbfebb6d9 [3]).

Ori Livneh benchmarked a version of this change and noticed a ~40%
improvement on getAll() for enwiki, from 0.952 ms to 0.688 ms.

[1]: php/php-src#3360
[2]: php/php-src@f5044a12dd
[3]: php/php-src@0bbfebb6d9

Change-Id: Id4e932e7c9512eba6a9843486a6539b0165a1b61
wmfgerrit pushed a commit to wikimedia/mediawiki that referenced this pull request Jan 7, 2026
This is pretty “hot” code (compare also change I906d5b3f2f, commit
9c89543), and by using \array_key_exists() instead of
array_key_exists(), we inform PHP that we’re looking for the built-in
function and not a potential \MediaWiki\Config\array_key_exists() or
\MediaWiki\array_key_exists() function. This in turn lets opcache
compile it to a custom ARRAY_KEY_EXISTS opcode (see PHP PR #3360 [1] and
commits f5044a12dd [2] + 0bbfebb6d9 [3]).

Ori Livneh benchmarked a version of this change and noticed a ~40%
improvement on getAll() for enwiki, from 0.952 ms to 0.688 ms.

[1]: php/php-src#3360
[2]: php/php-src@f5044a12dd
[3]: php/php-src@0bbfebb6d9

Change-Id: Id4e932e7c9512eba6a9843486a6539b0165a1b61
wmfgerrit pushed a commit to wikimedia/mediawiki that referenced this pull request Jan 7, 2026
This is pretty “hot” code (compare also change I906d5b3f2f, commit
9c89543), and by using \array_key_exists() instead of
array_key_exists(), we inform PHP that we’re looking for the built-in
function and not a potential \MediaWiki\Config\array_key_exists() or
\MediaWiki\array_key_exists() function. This in turn lets opcache
compile it to a custom ARRAY_KEY_EXISTS opcode (see PHP PR #3360 [1] and
commits f5044a12dd [2] + 0bbfebb6d9 [3]).

Ori Livneh benchmarked a version of this change and noticed a ~40%
improvement on getAll() for enwiki, from 0.952 ms to 0.688 ms.

[1]: php/php-src#3360
[2]: php/php-src@f5044a12dd
[3]: php/php-src@0bbfebb6d9

Change-Id: Id4e932e7c9512eba6a9843486a6539b0165a1b61
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants