Skip to content

Encoder indent suppression: str.replace → re.sub (fixes #170)#171

Merged
dave-doty merged 1 commit intoUC-Davis-molecular-computing:devfrom
cgevans:speedup-encoder
Mar 17, 2021
Merged

Encoder indent suppression: str.replace → re.sub (fixes #170)#171
dave-doty merged 1 commit intoUC-Davis-molecular-computing:devfrom
cgevans:speedup-encoder

Conversation

@cgevans
Copy link
Copy Markdown
Contributor

@cgevans cgevans commented Mar 16, 2021

Using re.sub (re is already imported), along with a replacement function that pulls from the replacement map, avoids repeated string replacements within Python and significantly speeds up encoding for large files. Note that I think this will raise a KeyError if the user has something with a name fitting "@@(\d+)@@" where the number is larger than the largest unique_id, and will corrupt the file if it is not. However, this is similar to the current situation, where I think it will never raise an error, but could corrupt the file.

This also changes the type of _replacement_map to avoid a mypy warning: it should be a correct narrowing of the value from Any to str, however.

Trying this with the design I'm trying to repeatedly save, the change brings write time down from 20 seconds to a little over 1 second.

…cular-computing#170)

Using re.sub (re is already imported), along with a replacement
function that pulls from the replacement map, avoids repeated string
replacements within Python and significantly speeds up encoding for
large files.  Note that I think this will raise a KeyError if the user has
something with a name fitting "@@(\d+)@@" where the number is larger
than the largest unique_id, and will corrupt the file if it is not.
However, this is similar to the current situation.
@cgevans cgevans requested a review from dave-doty as a code owner March 16, 2021 23:59
Copy link
Copy Markdown
Member

@dave-doty dave-doty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand what's going on in this new line of code, but it doesn't seem to break any unit tests, so I'll take you word that it works.

@dave-doty dave-doty merged commit 5b73c2f into UC-Davis-molecular-computing:dev Mar 17, 2021
@cgevans cgevans deleted the speedup-encoder branch March 17, 2021 00:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants