Port some perf improvement commits#1315

Merged
saghul merged 18 commits into master from port-perf
Jan 29, 2026

Conversation


@saghul saghul commented Jan 12, 2026

See each individual commit for details.

Use expand_fast_array and direct array assignment instead of
JS_CreateDataPropertyUint32 for better performance when creating
arrays from value arrays.

Ref: bellard/quickjs@f4951ef
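As a rough illustration of why this is faster, here is a minimal sketch (all names invented, not the actual quickjs code) of pre-expanding a backing store once and filling it with plain assignments, instead of going through a generic per-element property-define call:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a fast array: a plain value store that is
 * grown geometrically, then filled with direct writes. */
typedef struct {
    int *values;
    size_t count, capacity;
} FastArray;

static int fast_array_expand(FastArray *a, size_t needed)
{
    if (needed <= a->capacity)
        return 0;
    size_t new_cap = a->capacity + a->capacity / 2; /* 1.5x growth */
    if (new_cap < needed)
        new_cap = needed;
    int *p = realloc(a->values, new_cap * sizeof(*p));
    if (!p)
        return -1;
    a->values = p;
    a->capacity = new_cap;
    return 0;
}

static int fast_array_from_values(FastArray *a, const int *src, size_t n)
{
    if (fast_array_expand(a, n))
        return -1;
    /* direct assignment into the store: no per-element property lookups */
    memcpy(a->values, src, n * sizeof(*src));
    a->count = n;
    return 0;
}
```

The point of the commit is the same shape of change inside quickjs: one capacity check plus direct slot writes, versus one full property-definition call per element.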
- Made JS_GetGlobalVar and JS_SetGlobalVar inline with fast paths
- Removed OP_check_var and OP_put_var_strict opcodes
- Simplified optimize_scope_make_global_ref to not use strict mode special code
- Simplified bytecode optimizations that referenced removed opcodes
- Updated microbench.js to use normal functions instead of eval
- Regenerated bytecode for updated opcode indices

Note: This removes full compliance with the spec for strict mode variable
assignment so that it is as fast as in non-strict mode (V8, SpiderMonkey
and JavaScriptCore do the same).

Ref: bellard/quickjs@2c90110
Add inline fast paths for common property access cases:
- For OP_get_field and OP_get_field2: Walk prototype chain directly
  for non-exotic objects with normal data properties
- For OP_put_field: Set property directly for writable data properties

Falls back to slow path for exotic objects and special property types.

Ref: bellard/quickjs@57f8ec0
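A minimal sketch of the fast-path shape described above (the struct and function names are invented for illustration, not the actual quickjs internals): walk the prototype chain directly for plain objects, and bail to a generic slow path as soon as anything exotic is seen.

```c
#include <stddef.h>
#include <string.h>

/* Toy object model: one data property per object, for brevity. */
typedef struct Obj {
    const struct Obj *proto;
    int is_exotic;       /* Proxy, typed array, accessor property, ... */
    const char *key;
    int value;
} Obj;

static int slow_get(const Obj *o, const char *key, int *out)
{
    (void)o; (void)key; (void)out;
    return -1; /* stand-in for the generic property machinery */
}

static int get_field(const Obj *o, const char *key, int *out)
{
    for (const Obj *p = o; p; p = p->proto) {
        if (p->is_exotic)
            return slow_get(o, key, out); /* fall back to slow path */
        if (p->key && strcmp(p->key, key) == 0) {
            *out = p->value;              /* normal data property */
            return 0;
        }
    }
    return -1; /* not found */
}
```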
Add inline fast paths for OP_post_inc and OP_post_dec when the
operand is an integer. Fall back to slow path for overflow cases
(INT32_MAX for increment, INT32_MIN for decrement) and non-integer
values.

Ref: bellard/quickjs@e5de89f
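The overflow boundary is the interesting part. A sketch of the idea (a simplified value type, not the actual JSValue representation): increment inline while the operand is an int32 below INT32_MAX, and promote to float64 on the boundary.

```c
#include <stdint.h>

/* Toy tagged value: either an int32 or a double. */
typedef struct {
    int is_int;
    union { int32_t i; double d; } u;
} Val;

static Val post_inc(Val v)
{
    if (v.is_int && v.u.i != INT32_MAX) {
        v.u.i += 1;                    /* fast path: plain integer add */
        return v;
    }
    /* slow path: overflow (or non-integer) promotes to float64 */
    double d = v.is_int ? (double)v.u.i : v.u.d;
    v.is_int = 0;
    v.u.d = d + 1.0;
    return v;
}
```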
Make string_buffer_putc() an inline function with fast paths for common
cases:
- Direct write for characters < 0x10000 in wide mode
- Surrogate pair handling for characters >= 0x10000 with buffer space
- Direct write for 8-bit characters in narrow mode

Rename string_buffer_putc_slow to string_buffer_putc16_slow and add new
string_buffer_putc_slow for full Unicode handling.

Ref: bellard/quickjs@79f3ae2
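For reference, the fast/slow split in wide mode hinges on UTF-16 surrogate encoding. A self-contained sketch (buffer management omitted, names invented): code points below 0x10000 are one 16-bit write, anything above is split into a surrogate pair.

```c
#include <stdint.h>

/* Write one code point into a 16-bit buffer at pos; return new pos. */
static int putc16(uint16_t *buf, int pos, uint32_t c)
{
    if (c < 0x10000) {
        buf[pos++] = (uint16_t)c;             /* fast path: one unit */
    } else {
        c -= 0x10000;                         /* slow path: surrogates */
        buf[pos++] = (uint16_t)(0xD800 + (c >> 10));   /* high surrogate */
        buf[pos++] = (uint16_t)(0xDC00 + (c & 0x3FF)); /* low surrogate */
    }
    return pos;
}
```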
Replace expensive prototype chain traversal with flag checking. Instead
of iterating through prototypes to verify no numeric properties exist,
the code now:

- Adds std_array_prototype field to JSContext to track whether
  Array.prototype is "normal" (no small index get/set properties)
- Adds is_prototype flag to JSObject to identify prototype objects
- Removes has_small_array_index from JSShape (now handled differently)
- Sets std_array_prototype = false when Array.prototype or
  Object.prototype is modified in relevant ways
- Uses the flag in JS_SetPropertyValue() and OP_put_array_el for fast
  path decisions

This trades one boolean flag check for iterating multiple prototype
objects during common array append operations.

Ref: bellard/quickjs@c8a8cf5
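The trade can be sketched in a few lines (the field name mirrors the commit message, but the surrounding structure is invented for illustration): one cached boolean answers "is Array.prototype still normal?", and any relevant mutation clears it.

```c
#include <stdbool.h>

/* Toy context holding just the cached flag from the commit message. */
typedef struct {
    bool std_array_prototype;  /* true while Array.prototype is pristine */
} Ctx;

static bool can_use_fast_append(const Ctx *ctx)
{
    return ctx->std_array_prototype;   /* one flag check, no chain walk */
}

static void on_prototype_modified(Ctx *ctx)
{
    ctx->std_array_prototype = false;  /* invalidate the fast path */
}
```

The cost moves from every array append (a prototype-chain traversal) to the rare event of someone adding indexed properties to Array.prototype or Object.prototype.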
Add fast path for OP_get_length that directly accesses the length
property without calling JS_GetProperty. This mirrors the optimization
already done for OP_get_field and OP_get_field2.

When the object has a simple length property (not a getter/setter),
the value is retrieved directly by walking the prototype chain.

Ref: bellard/quickjs@3e5f2bb
Add fast path for push() on fast arrays that bypasses standard object
handling and directly manipulates the array's internal value store.

The optimization activates when:
- The array is a fast array with standard prototype
- The array is extensible
- The length property is an integer and writable
- The new length doesn't overflow

When conditions are met, elements are bulk-inserted directly into
the internal values array without property lookup overhead.

Ref: bellard/quickjs@9a421b3
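The structure of that gate can be sketched as follows (field names are invented; the real checks live on JSObject/JSShape): a handful of cheap tests guard the bulk insert, and any failure falls back to the generic path.

```c
#include <stdint.h>
#include <string.h>

/* Toy fast array with the gate conditions as explicit flags. */
typedef struct {
    int *values;
    uint32_t len, capacity;
    int is_fast_array, std_proto, extensible, len_writable;
} Arr;

static int push_fast(Arr *a, const int *elems, uint32_t n)
{
    if (!a->is_fast_array || !a->std_proto ||
        !a->extensible || !a->len_writable)
        return -1;                  /* take the generic slow path */
    if (a->len > UINT32_MAX - n)
        return -1;                  /* new length would overflow */
    if (a->len + n > a->capacity)
        return -1;                  /* (real code would grow the store) */
    /* bulk insert directly into the value store, no property lookups */
    memcpy(a->values + a->len, elems, n * sizeof(*elems));
    a->len += n;
    return 0;
}
```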
Instead of using goto to jump to slow path on int32 overflow, directly
convert to float64 inline. This improves instruction cache locality
and reduces branching overhead.

The change affects:
- OP_add: inline float conversion on overflow
- OP_add_loc: inline float conversion on overflow
- OP_sub: inline float conversion on overflow

Ref: bellard/quickjs@3d0cc29
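A sketch of the pattern, assuming a simplified value type rather than the real JSValue. `__builtin_add_overflow` is a GCC/Clang extension; portable code would do the range check by hand.

```c
#include <stdint.h>

/* Toy tagged number: either an int32 or a double. */
typedef struct {
    int is_int;
    union { int32_t i; double d; } u;
} Num;

static Num add_int32(int32_t a, int32_t b)
{
    Num r;
    int32_t sum;
    if (!__builtin_add_overflow(a, b, &sum)) {
        r.is_int = 1;
        r.u.i = sum;                   /* fast path: stays an int32 */
    } else {
        r.is_int = 0;
        r.u.d = (double)a + (double)b; /* inline float64 conversion,
                                          no jump to a shared slow path */
    }
    return r;
}
```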
@saghul saghul force-pushed the port-perf branch 2 times, most recently from 154516f to 8177bd9, January 12, 2026 23:04
- Rename dbuf_realloc to dbuf_claim with clearer semantics: allocate
  'len' more bytes relative to current size instead of absolute size
- Add overflow protection in dbuf_claim
- Change allocation growth from (size * 3 / 2) to (size + size / 2)
  with overflow checks
- Remove unused dbuf_write function
- Update all call sites across quickjs.c, libregexp.c, libunicode.c

Ref: bellard/quickjs@0d4cd2d
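The growth-policy detail is worth spelling out: `size + size / 2` computes 1.5x without the intermediate `size * 3`, which could overflow first. A sketch of a checked growth computation (function name invented, not the actual dbuf API):

```c
#include <stddef.h>
#include <stdint.h>

/* Compute a new capacity >= size + extra, growing ~1.5x, with
 * explicit overflow checks. Returns -1 on overflow. */
static int grow_size(size_t size, size_t extra, size_t *out)
{
    size_t needed, new_size;
    if (extra > SIZE_MAX - size)
        return -1;                      /* requested size overflows */
    needed = size + extra;
    if (size / 2 <= SIZE_MAX - size)
        new_size = size + size / 2;     /* 1.5x, overflow-safe order */
    else
        new_size = SIZE_MAX;
    if (new_size < needed)
        new_size = needed;
    *out = new_size;
    return 0;
}
```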
Optimize destructuring by avoiding the creation of reference objects
when there are no 'with' statements in the scope chain (which is always
the case in strict mode). This uses depth=0 for direct variable access
instead of depth=2 with reference creation.

Additional optimizations:
- has_with_scope() now skips checking in strict mode (no 'with' allowed)
- In non-strict mode, modifying a function name is now ignored
  (OP_scope_put_var with JS_VAR_FUNCTION_NAME emits OP_drop)

Note: This removes full compliance with the spec for lvalue resolution
when direct eval is present in compound assignments. V8 and other
engines behave the same way.

Ref: bellard/quickjs@e015918
@saghul saghul marked this pull request as ready for review January 13, 2026 05:50
@saghul saghul requested a review from bnoordhuis January 13, 2026 05:50
@saghul saghul force-pushed the port-perf branch 2 times, most recently from 39f0d99 to 3f6275f, January 13, 2026 14:12

saghul commented Jan 13, 2026

@bnoordhuis This is now ready to review. Not sure how you want to go about it though, it grew quite a bit 😅

Replace is_local/is_arg boolean fields in JSClosureVar with a single
closure_type enum (JSClosureTypeEnum) that supports 8 distinct types:
- JS_CLOSURE_LOCAL: local variable in parent function
- JS_CLOSURE_ARG: argument variable in parent function
- JS_CLOSURE_REF: closure variable reference in parent function
- JS_CLOSURE_GLOBAL_REF: global variable reference
- JS_CLOSURE_GLOBAL_DECL: global variable declaration (eval)
- JS_CLOSURE_GLOBAL: global variable (eval)
- JS_CLOSURE_MODULE_DECL: module variable definition (eval)
- JS_CLOSURE_MODULE_IMPORT: module import definition (eval)

Ref: bellard/quickjs@a6816be
- Add pre-computed JSShape objects for arguments, mapped_arguments
- Use fast_array mode with var_refs for JS_CLASS_MAPPED_ARGUMENTS
- Arguments object elements now alias function parameters via JSVarRef
- Add js_mapped_arguments_finalizer and js_mapped_arguments_mark
- Modify JS_NewObjectFromShape to accept props parameter for initialization
- Add js_create_var_ref function for creating detached var_refs
- Add var_refs to GC immediately in get_var_ref (instead of at close time)

The mapped arguments optimization allows arguments[i] to directly reference
function parameters, enabling changes to propagate bidirectionally in
non-strict mode functions.

Ref: bellard/quickjs@9f11034

saghul commented Jan 19, 2026

Gentle ping @bnoordhuis :-)

@bnoordhuis
Contributor

Sorry, Mr. Corretgé, I hadn't forgotten, but I didn't have much time last week and don't today. Maybe tomorrow!

(Yes, I have been practicing Spanish.)


@bnoordhuis bnoordhuis left a comment


In general LGTM:

  • I'm somewhat skeptical that all that open-coding of property lookups is beneficial. Did you check how it affects benchmarks and release build size?

  • That last commit optimizing arguments is only useful in sloppy mode, right? Doesn't seem very useful, is it to game some benchmarks that don't run in strict mode?

Comment thread quickjs.c
Comment on lines +10477 to +10483
```c
/* fast path */
p = JS_VALUE_GET_OBJ(ctx->global_obj);
prs = find_own_property(&pr, p, prop);
if (prs) {
    if (likely((prs->flags & JS_PROP_TMASK) == 0))
        return js_dup(pr->u.value);
}
```
Contributor


This fast path is a deoptimization for things like getters on the global object, because they're looked up here but not acted on, then JS_GetPropertyInternal does the same lookup again.

(Also true for auto-init properties but that's a one-time cost so not so significant.)

Getting rid of two bytecode opcodes is nice though.

Contributor Author


I suppose this was more of an observation so I'm keeping it as-is.

Contributor


Well, a real-world example that's pessimized by this change is performance.now() as it's implemented in node or firefox, where performance is a getter on the global object.

The irony when a performance optimization slows down performance.now, right?

Comment thread quickjs.c Outdated

```diff
-static JSValue JS_GetGlobalVar(JSContext *ctx, JSAtom prop,
-                               bool throw_ref_error)
+static inline JSValue JS_GetGlobalVar(JSContext *ctx, JSAtom prop,
```
Contributor


  • inline is currently probably a no-op; there's only one call site so if the compiler thinks it's eligible for inlining, it's going to do it anyway

  • if additional call sites arise, you probably don't want it inlined because of code bloat/duplication (not that a compiler is obliged to actually inline it, it's only a hint)

  • if the one call site for JS_GetGlobalVar were to get removed, inline stops the compiler from issuing an unused function warning

Contributor Author


👍

Comment thread quickjs.c
Comment thread quickjs.c Outdated
Comment thread quickjs.c Outdated
Comment thread quickjs.c Outdated

saghul commented Jan 24, 2026

Thanks for the review Ben! I'll take a look!

@gengjiawen

Great job! The results look about the same:

| Benchmark (higher scores are better) | QuickJS-ng PR #1315 | QuickJS Bellard (local build) | V8 --jitless |
| --- | --- | --- | --- |
| Richards | 908 | 838 | 1157 |
| DeltaBlue | 882 | 825 | 1084 |
| Crypto | 797 | 716 | 1304 |
| RayTrace | 1243 | 1202 | 4400 |
| EarleyBoyer | 1784 | 1723 | 6072 |
| RegExp | 211 | 343 | 4187 |
| Splay | 2480 | 2451 | 8285 |
| NavierStokes | 1202 | 1263 | 1995 |
| Score | 986 | 1010 | 2723 |


saghul commented Jan 24, 2026

This is useful, thanks! How are you taking those, so I can test locally?


gengjiawen commented Jan 24, 2026

> This is useful, thanks! How are you taking those, so I can test locally?

https://github.com/gengjiawen/js-engines-playground/blob/038a911a228b1b96fa73eb92aec82a4701d3251a/benchmark/benchmark.js#L5 and replace the first two entries with your local binaries.

Use this to get all mainstream engines: `npm i -g jsvu && yes | jsvu || true`


saghul commented Jan 29, 2026

LoL, I had forgotten I adapted your runner 2 years ago already :-) https://github.com/quickjs-ng/benchmarks?tab=readme-ov-file#v8

Results (best of 3) on my machine:

| Benchmark (higher scores are better) | NG (master) | NG (perf) | bellard |
| --- | --- | --- | --- |
| Richards | 1051 | 1125 | 1040 |
| DeltaBlue | 1048 | 1116 | 1096 |
| Crypto | 944 | 1009 | 894 |
| RayTrace | 1436 | 1882 | 1784 |
| EarleyBoyer | 2344 | 2448 | 2353 |
| RegExp | 280 | 278 | 455 |
| Splay | 3154 | 3142 | 3061 |
| NavierStokes | 1531 | 1565 | 1591 |
| Score | 1215 | 1296 | 1325 |

I'm going to address Ben's points now and check the results.


saghul commented Jan 29, 2026

> I'm somewhat skeptical that all that open-coding of property lookups is beneficial. Did you check how it affects benchmarks and release build size?

Benchmarks are up (compounded as a whole) and the binary size is up by ~30KB, so I think we're good.

> That last commit optimizing arguments is only useful in sloppy mode, right? Doesn't seem very useful, is it to game some benchmarks that don't run in strict mode?

The commit did a few things: one is that, and another is making the creation of the arguments object faster, AFAICT.


saghul commented Jan 29, 2026

Benchmarks after:

| Benchmark (higher scores are better) | NG (master) | NG (perf) | NG (perf2) | bellard |
| --- | --- | --- | --- | --- |
| Richards | 1076 | 1154 | 1152 | 1063 |
| DeltaBlue | 1043 | 1118 | 1115 | 1081 |
| Crypto | 938 | 1001 | 1002 | 894 |
| RayTrace | 1395 | 1866 | 1872 | 1736 |
| EarleyBoyer | 2248 | 2365 | 2373 | 2254 |
| RegExp | 273 | 270 | 270 | 446 |
| Splay | 3154 | 3121 | 3105 | 2887 |
| NavierStokes | 1525 | 1555 | 1554 | 1574 |
| Score | 1201 | 1286 | 1285 | 1300 |

Within the noise margin (my system is not necessarily idle :-)).

I'm merging after the CI is 💚, assuming you agree here @bnoordhuis, since you approved.

@saghul saghul merged commit b66d10c into master Jan 29, 2026
131 of 132 checks passed
@saghul saghul deleted the port-perf branch January 29, 2026 10:31
@gengjiawen

> Benchmarks are up (compounded as a whole) and the binary size is up by ~30KB, so I think we're good.

30KB is actually a lot for an embedded system, can this be optional?


saghul commented Feb 1, 2026

Not really. Either we unroll or we don't.

You are welcome to try and undo the parts that Ben mentioned to see if you get the same numbers.

In the grand scheme of things, the engine will get bigger as new JS features are implemented. For such tight constraints microquickjs might be a better fit.
