Skip to content

Conversation

@rhuanjl
Copy link
Collaborator

@rhuanjl rhuanjl commented Oct 21, 2017

edit: This description (other than this first paragraph) is out of date - see comments below, this is now an update to PAL_towlower to give it a fast path for ascii characters. Key outcome is the ~70% performance gain for DateImplementation::UtcTimeFromStrCore on xplat.

Added a fast inline conversion from upper case to lowercase for xplat in DateImplementation::UtcTimeFromStrCore

Note this method will not work for non-latin alphabet - but the function only uses the latin alphabet and this is ~100x faster than the PAL _wcslwr_s.

SIDENOTE: I did not revise _wcslwr_s as it appears to be used nowhere else in the source tree - I note that if it is important for any other applications in any performance critical path it should be revised, it uses a number of malloc operations equal to the length of the string + 1.

Resolves #4008

@rhuanjl
Copy link
Collaborator Author

rhuanjl commented Oct 21, 2017

@obastemur Not the prettiest but it does the job.

With the micro benchmark from #4008 without this fix average time was around 380ms - with this fix 48ms. (testing on a 2.5ghz macbook pro intel i7 running macos Sierra)

Copy link
Contributor

@Penguinwizzard Penguinwizzard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple things could be a little clearer.

//only targets the UTF16 Latin alphabet - but that's all we need here
//used instead of the PAL _wcslwr_s which takes ~100x as long to run
int i;
for (i = 0 ; pszSrc[i] != 0; i++)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually have a length here, I'd prefer it were used rather than the null terminator.

int i;
for (i = 0 ; pszSrc[i] != 0; i++)
{
if(pszSrc[i] > 64 && pszSrc[i] < 91)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change this to >= 'A' and <= 'Z', as that's a little clearer?

{
if(pszSrc[i] > 64 && pszSrc[i] < 91)
{
pszSrc[i] += 32;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+= ('a' - 'A')

size_t size = sizeof(char16) * (ulength + 1);
js_memcpy_s(pszSrc, size, psz, size);

#ifdef _WIN32
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessarily platform-dependent?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Native windows _wcslwr_s is already fast - whilst this alternate path would work on windows it wouldn't be an optimisation there.

I'll look at putting it into PAL wcslwr as suggested by obastemur below - just will have to add some extra checks to only put it in when wanted.

@obastemur
Copy link
Collaborator

@rhuanjl Thanks for the effort! In order this to be merged;

1- Please make the changes under PAL wcslwr...
2- your changes apply only when the char16 data is ASCII. Otherwise we use the existing path.

@obastemur
Copy link
Collaborator

FYI; PAL_towlower is the exact place for these changes.

@rhuanjl
Copy link
Collaborator Author

rhuanjl commented Oct 22, 2017

@Penguinwizzard I've revised in line with obastemur's instruction to do the change within PAL_towlower in wchar.cpp.

Couple of things I'm unsure on:

  1. Your point on doing a loop using the length; following my revision this is now using the existing loop in the function _wcslwr which doesn't receive the length parameter - I could revise to take the loop to the the function _wcslwr_s which does have the length parameter (though would require moving it to wchar.cpp or making wtolower visible in palrt.h)

  2. the wrapper of _wcslwr_s (within palrt.h) I can't work out what this is doing of any use - it copies the full buffer, sends the copy to be edited by _wcslwr then copies the result back - the overhead of these two copies and the associated malloc and free add a detectable amount of time at execution and I can't work out the benefit - if it's meant to be to avoid a buffer overflow the key check I'd want would be for the existence of a null terminator (or just iterate vs the length instead) I can't see how copying it twice helps - left in for now as I assume there is a purpose to the action that I'm just not aware of.

…s a significant optimisation for xplat in DateImplementation::UtcTimeFromStrCore relevant microbenchmark from ~350ms -> ~50ms.
@Penguinwizzard
Copy link
Contributor

You're fine leaving the code where it is for now; we will most likely want to revise _wsclwr_s separately based on 1. and 2.

@Penguinwizzard
Copy link
Contributor

@MSLaguana: Can you pick this up for node-cc perf testing and check-in?

@MSLaguana
Copy link
Contributor

Change looks good to me and passes tests; thanks for your contribution @rhuanjl !

@chakrabot chakrabot merged commit 696b4d8 into chakra-core:release/1.7 Oct 25, 2017
chakrabot pushed a commit that referenced this pull request Oct 25, 2017
Merge pull request #4035 from rhuanjl:fastXplatDateFromString

Added a fast path for standard ascii characters to PAL_tolower

Resolves #4008
chakrabot pushed a commit that referenced this pull request Oct 26, 2017
Merge pull request #4035 from rhuanjl:fastXplatDateFromString

Added a fast path for standard ascii characters to PAL_tolower

Resolves #4008
@rhuanjl rhuanjl deleted the fastXplatDateFromString branch October 26, 2017 00:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants