-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Optimise xplat UtcTimeFromStrCore #4035
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimise xplat UtcTimeFromStrCore #4035
Conversation
|
@obastemur Not the prettiest but it does the job. With the micro benchmark from #4008 without this fix average time was around 380ms - with this fix 48ms. (testing on a 2.5ghz macbook pro intel i7 running macos Sierra) |
Penguinwizzard
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple things could be a little clearer.
| //only targets the UTF16 Latin alphabet - but that's all we need here | ||
| //used instead of the PAL _wcslwr_s which takes ~100x as long to run | ||
| int i; | ||
| for (i = 0 ; pszSrc[i] != 0; i++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We actually have a length here, I'd prefer it were used rather than the null terminator.
| int i; | ||
| for (i = 0 ; pszSrc[i] != 0; i++) | ||
| { | ||
| if(pszSrc[i] > 64 && pszSrc[i] < 91) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you change this to >= 'A' and <= 'Z', as that's a little clearer?
| { | ||
| if(pszSrc[i] > 64 && pszSrc[i] < 91) | ||
| { | ||
| pszSrc[i] += 32; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+= ('a' - 'A')
| size_t size = sizeof(char16) * (ulength + 1); | ||
| js_memcpy_s(pszSrc, size, psz, size); | ||
|
|
||
| #ifdef _WIN32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessarily platform-dependent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Native windows _wcslwr_s is already fast - whilst this alternate path would work on windows it wouldn't be an optimisation there.
I'll look at putting it into PAL wcslwr as suggested by obastemur below - just will have to add some extra checks to only put it in when wanted.
|
@rhuanjl Thanks for the effort! In order this to be merged; 1- Please make the changes under PAL wcslwr... |
|
FYI; |
|
@Penguinwizzard I've revised in line with obastemur's instruction to do the change within PAL_towlower in wchar.cpp. Couple of things I'm unsure on:
|
…s a significant optimisation for xplat in DateImplementation::UtcTimeFromStrCore relevant microbenchmark from ~350ms -> ~50ms.
|
You're fine leaving the code where it is for now; we will most likely want to revise _wsclwr_s separately based on 1. and 2. |
|
@MSLaguana: Can you pick this up for node-cc perf testing and check-in? |
|
Change looks good to me and passes tests; thanks for your contribution @rhuanjl ! |
edit: This description (other than this first paragraph) is out of date - see comments below, this is now an update to PAL_towlower to give it a fast path for ascii characters. Key outcome is the ~70% performance gain for DateImplementation::UtcTimeFromStrCore on xplat.
Added a fast inline conversion from upper case to lowercase for xplat in DateImplementation::UtcTimeFromStrCore
Note this method will not work for non-latin alphabet - but the function only uses the latin alphabet and this is ~100x faster than the PAL _wcslwr_s.
SIDENOTE: I did not revise _wcslwr_s as it appears to be used nowhere else in the source tree - I note that if it is important for any other applications in any performance critical path it should be revised, it uses a number of malloc operations equal to the length of the string + 1.
Resolves #4008