Allow to, optionally, keep Unicode escape sequences in `stringToPDFString` (PR 17331 follow-up) #19884
Whenever we cannot find a destination we'll fall back to checking all destinations, to account for e.g. out-of-order NameTrees. In those cases any subsequent destination lookups can be made a tiny bit more efficient by immediately checking the already cached destinations.
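To make the lookup order concrete, here is a minimal sketch of the idea; it is not the actual pdf.js code. The `DestinationIndex` class, its methods, and the `fullScans` counter are hypothetical names invented for illustration, and a binary search over a plain array stands in for a real NameTree lookup.

```javascript
// Hypothetical sketch: names are illustrative and not part of the pdf.js API.
class DestinationIndex {
  #entries; // Array of [name, dest] pairs, expected (but not guaranteed) to be sorted.
  #allDests = null; // Lazily built Map holding every destination.
  fullScans = 0; // Counts how often the complete Map had to be built.

  constructor(entries) {
    this.#entries = entries;
  }

  // Binary search; this can miss entries when the data is out of order.
  #orderedLookup(name) {
    let lo = 0,
      hi = this.#entries.length - 1;
    while (lo <= hi) {
      const mid = (lo + hi) >> 1;
      const [key, dest] = this.#entries[mid];
      if (key === name) {
        return dest;
      }
      if (key < name) {
        lo = mid + 1;
      } else {
        hi = mid - 1;
      }
    }
    return null;
  }

  #getAllDests() {
    if (!this.#allDests) {
      this.fullScans++;
      this.#allDests = new Map(this.#entries);
    }
    return this.#allDests;
  }

  getDestination(name) {
    // Once everything is cached, skip the ordered lookup entirely.
    if (this.#allDests) {
      return this.#allDests.get(name) ?? null;
    }
    // Otherwise try the fast path first, then fall back to a full scan.
    return this.#orderedLookup(name) ?? this.#getAllDests().get(name) ?? null;
  }
}

// Out-of-order data: the binary search cannot find "c".
const index = new DestinationIndex([["c", 3], ["a", 1], ["b", 2]]);
index.getDestination("c"); // Found via the fallback, which also fills the cache.
index.getDestination("a"); // Served directly from the cache.
```

The trade-off in this sketch is that the full Map is only built when the fast path has already failed once, so well-formed documents never pay for it.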
/botio test

From: Bot.io (Linux m4)
Received
Command cmd_test from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.241.84.105:8877/c3901ba98dd851b/output.txt

From: Bot.io (Windows)
Received
Command cmd_test from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.193.163.58:8877/835a3cbad009c87/output.txt

From: Bot.io (Linux m4)
Failed
Full output at http://54.241.84.105:8877/c3901ba98dd851b/output.txt
Total script time: 29.68 mins

From: Bot.io (Windows)
Failed
Full output at http://54.193.163.58:8877/835a3cbad009c87/output.txt
Total script time: 60.27 mins
…ring` (PR 17331 follow-up)

Currently *some* of the links[1] on page three of the `issue19835.pdf` test-case aren't clickable, since the destination (of the LinkAnnotation) becomes empty.
The reason is that these destinations include the character `\x1b`, which is interpreted as the start of a Unicode escape sequence specifying the language of the string; please refer to section [7.9.2.2 Text String Type](https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G6.1957385) in the PDF specification.

Hence it seems that we need a way to optionally disable that behaviour, to prevent a "badly" formatted string from becoming empty (or truncated), at least for cases where we are:
 - Parsing named destinations[2] and URLs.
 - Handling "strings" that are actually /Name-instances.
 - Building a lookup Object/Map based on some PDF data-structure.

*NOTE:* The issue that prompted this patch is obviously related to destinations; however, I've gone through the `src/core/` folder and updated various other `stringToPDFString` call-sites that (directly or indirectly) fit the categories listed above.

---
[1] Try clicking on anything on the line containing "Item 7A. Quantitative and Qualitative Disclosures About Market Risk 27".

[2] Unfortunately just skipping `stringToPDFString` in this case would cause other issues, such as the named destination becoming "unusable" in the viewer; see e.g. issues 14847 and 14864.
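As a rough illustration of why such a destination can end up empty or truncated, consider the sketch below. It is not the real `stringToPDFString`: the function name `decodeTextString` is hypothetical, the `keepEscapeSequence` parameter name is an assumption, and the sketch only handles single-byte strings without any PDFDocEncoding or UTF-16 handling.

```javascript
// Hypothetical, simplified stand-in for the single-byte path of a PDF
// text-string decoder; the function and parameter names are assumptions.
function decodeTextString(str, keepEscapeSequence = false) {
  const out = [];
  for (let i = 0, ii = str.length; i < ii; i++) {
    if (!keepEscapeSequence && str.charCodeAt(i) === 0x1b) {
      // Treat 0x1b as the start of a language escape sequence and skip
      // everything up to and including the closing 0x1b, or to the end of
      // the string when no closing 0x1b exists.
      while (++i < ii && str.charCodeAt(i) !== 0x1b) {}
      continue;
    }
    out.push(str.charAt(i));
  }
  return out.join("");
}

// A well-formed escape sequence is removed and the text survives.
decodeTextString("\x1ben-US\x1bHello"); // → "Hello"

// A stray 0x1b in a destination name swallows the rest of the string.
decodeTextString("\x1bItem 7A"); // → ""
decodeTextString("Item\x1b 7A"); // → "Item"

// Keeping escape sequences leaves the name intact, so it can still be
// used as a lookup key.
decodeTextString("\x1bItem 7A", true); // → "\x1bItem 7A"
```

With the default behaviour, a name that merely happens to contain `\x1b` no longer matches the key stored in the document, which is consistent with the non-clickable links described above.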
d763b3d to b629baf
/botio unittest

From: Bot.io (Linux m4)
Received
Command cmd_unittest from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.241.84.105:8877/468d84bcae5e873/output.txt

From: Bot.io (Windows)
Received
Command cmd_unittest from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.193.163.58:8877/17c380f3774666e/output.txt

From: Bot.io (Linux m4)
Success
Full output at http://54.241.84.105:8877/468d84bcae5e873/output.txt
Total script time: 2.41 mins

From: Bot.io (Windows)
Success
Full output at http://54.193.163.58:8877/17c380f3774666e/output.txt
Total script time: 8.19 mins