Aho-Corasick string replacements to help clean up replaceText spam #82
Merged
Cyberboss merged 1 commit into tgstation:master on Dec 24, 2021
updated benchmarks, the old code was running the setup every time... and it was still faster
Mothblocks requested changes on Dec 24, 2021
     [features]
    -default = ["cellularnoise", "dmi", "file", "git", "http", "json", "log", "noise", "sql", "time", "toml", "url"]
    +default = ["cellularnoise", "dmi", "file", "git", "http", "json", "log", "noise", "sql", "time", "toml", "url", "acreplace"]

     time = []
     toml = ["serde", "serde_json", "toml-dep"]
     url = ["url-dep", "percent-encoding"]
    +acreplace = ["aho-corasick"]

     #[cfg(feature = "worleynoise")]
     pub mod worleynoise;
    +#[cfg(feature = "acreplace")]
    +pub mod acreplace;
Comment on lines +32 to +35
    match map.get(&key.to_owned()) {
        Some(replacements) => Some(replacements.automaton.replace_all(text, &replacements.replacements)),
        None => None
    }
Suggested change
    -match map.get(&key.to_owned()) {
    -    Some(replacements) => Some(replacements.automaton.replace_all(text, &replacements.replacements)),
    -    None => None
    -}
    +let replacements = map.get(&key.to_owned())?;
    +Some(replacements.automaton.replace_all(text, &replacements.replacements))
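To illustrate why the suggested change is behavior-preserving, here is a minimal, standard-library-only sketch. The `Entry` struct, the function names, and the `String`-keyed map are hypothetical stand-ins, not the PR's actual types. It shows that an explicit `match` returning `None` on a missing key and the `?` operator on an `Option` give the same result.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the PR's per-key replacement data.
struct Entry {
    suffix: String,
}

// Explicit match, in the shape of the original lines.
fn with_match(map: &HashMap<String, Entry>, key: &str, text: &str) -> Option<String> {
    match map.get(key) {
        Some(entry) => Some(format!("{}{}", text, entry.suffix)),
        None => None,
    }
}

// Same behavior with `?`, which returns None early when the key is missing.
fn with_question_mark(map: &HashMap<String, Entry>, key: &str, text: &str) -> Option<String> {
    let entry = map.get(key)?;
    Some(format!("{}{}", text, entry.suffix))
}

fn main() {
    let mut map = HashMap::new();
    map.insert("shout".to_owned(), Entry { suffix: "!".to_owned() });

    // Both forms agree for a present key and for a missing key.
    assert_eq!(with_match(&map, "shout", "hi"), with_question_mark(&map, "shout", "hi"));
    assert_eq!(with_match(&map, "missing", "hi"), with_question_mark(&map, "missing", "hi"));
    println!("{:?}", with_question_mark(&map, "shout", "hi"));
}
```

As a side note, when a `HashMap` is keyed by `String`, `get` accepts a `&str` directly, so a lookup like this does not need `to_owned()`. Whether that applies to the PR depends on its actual key type, which is not shown here.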
    struct Replacements {
        pub automaton: AhoCorasick,
        pub replacements: Vec<String>
Suggested change
    -    pub replacements: Vec<String>
    +    pub replacements: Vec<String>,
Rustfmt should catch these, but I don't think we currently enforce it in CI (though I should some day)
Comment on lines +17 to +19
    byond_fn! { setup_acreplace(key, patternsjson, replacementsjson) {
        let patterns: Vec<String> = serde_json::from_str(patternsjson.clone()).ok()?;
        let replacements: Vec<String> = serde_json::from_str(replacementsjson.clone()).ok()?;
Suggested change
    -byond_fn! { setup_acreplace(key, patternsjson, replacementsjson) {
    -    let patterns: Vec<String> = serde_json::from_str(patternsjson.clone()).ok()?;
    -    let replacements: Vec<String> = serde_json::from_str(replacementsjson.clone()).ok()?;
    +byond_fn! { setup_acreplace(key, patterns_json, replacements_json) {
    +    let patterns: Vec<String> = serde_json::from_str(patterns_json.clone()).ok()?;
    +    let replacements: Vec<String> = serde_json::from_str(replacements_json.clone()).ok()?;
On that note, why are you using .clone() instead of just &?
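For context on the `.clone()` question: `serde_json::from_str` takes a `&str`, so it only needs to borrow its input. The sketch below uses a hypothetical `count_items` function in place of the JSON parser (so it runs with the standard library alone) to show that lending out an owned `String` with `&` gives the same result as cloning it first, without the extra allocation. If the argument is already a `&str`, `.clone()` merely copies the reference and can be dropped. Which of the two cases applies inside `byond_fn!` is an assumption here, since the macro's argument types are not shown.

```rust
// Hypothetical parser standing in for a function that, like
// serde_json::from_str, only needs to borrow its input.
fn count_items(json_like: &str) -> usize {
    json_like
        .split(',')
        .filter(|item| !item.trim().is_empty())
        .count()
}

fn main() {
    let owned: String = String::from("a, b, c");

    // Cloning allocates a second String just to lend it out once.
    let via_clone = count_items(&owned.clone());

    // Borrowing reads the original buffer directly, with no allocation.
    let via_borrow = count_items(&owned);

    assert_eq!(via_clone, via_borrow);
    println!("{} items either way", via_borrow);
}
```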
WHY :NOOO:
You also did not put it in the README
Crossedfall pushed a commit to BeeStation/rust-g that referenced this pull request on Mar 11, 2023
* Updates the byond testing version to tgstation ver (tgstation#66) * runs cargo fmt (tgstation#63) * Sets the debug and test linux build to use all features (tgstation#67) Problem: PRs and all features aren't properly running created unit tests. Nor are they actually being fully compiled (only checked) on Linux. Solution: Only have the release builds that are distributed as artifacts run with the default features. This way, tests can cover all features. * Base64 encoding (tgstation#65) * Adds base64 encoding * Adds tests * Commented out import * Fixes the naming of variables * ...of course it was a typo * Update Cargo.toml Co-authored-by: William Wallace <me@wiox.me> * Clippy tweaks (tgstation#62) * this `else { if .. }` block can be collapsed * question mark operator is useless here * this call to `from_str_radix` can be replaced with a call to `str::parse` * equality checks against true are unnecessary * Revert "question mark operator is useless here" This reverts commit 01272c5. * Cargo update (tgstation#64) * updates cargo dependencies versions to latest * Fixes perlin noise generation to use an inclusive range * Update src/cellularnoise.rs Co-authored-by: William Wallace <me@wiox.me> * Updates cargo.lock Co-authored-by: William Wallace <me@wiox.me> * Basic version getter (tgstation#68) * Bump Version sto v0.4.8 * Setup cross compiling using the cross project (tgstation#70) * Make the url feature default (tgstation#71) Co-authored-by: Jared-Fogle <35135081+Jared-Fogle@users.noreply.github.com> * 0.4.9 * Runs cargo update (tgstation#72) most importantly git2, which made it error when trying to build on linux with the latest rust version * Stringify url_encode and decode input before sending it to Rust (tgstation#73) Co-authored-by: Jordan Brown <Cyberboss@users.noreply.github.com> * 0.4.10 * Add worley noise optional feature (tgstation#74) * worley_noise * a simple fix, forgot to ad dmsort as a necessary package * replaces .unwrap() with ? 
* forgot to save :cryingintoheavens: * IM SORRY I CANT USE ? IN CLOSURES * Completely reworks how the code works, massively optimizing it. Generation lost a tiny bit of it's fidelity but oh well, looks fantastic rn * one last change * tfw errors * makes the noise generate pretties noise * cargo fmt * Add TOML feature (tgstation#75) Adds rustg_read_toml_file, which takes a filepath and spits out the output after decoding (a list, probably). Uses the rust-toml lib * 0.5.0 Cargo updates * Adds TOTP generator to the hash feature (tgstation#76) * Adds TOTP generator to the hash feature * Updated TOTP based on requested changes, added a function that allow specification of a tolerance, also updated the hash.dm file to reflect the new functions * Remove debug print, convert offset output to be JSON, added a test * Added comments * Added some error handling * Improves error handling again * cargo fmt hash.rs * separate mod block for hash tests * Functions for more precise time measurement (tgstation#77) * fix linux ci by removing pkg-config requirement This isnt needed * Aho-Corasick string replacements to help clean up replaceText spam (tgstation#82) * Cleanups and additions for the Aho-Corasick replacement stuff (tgstation#83) * Fix clippy lints, Rustfmt, put both on CI, and update Cargo.lock (tgstation#85) Co-authored-by: Mothblocks <35135081+Jared-Fogle@users.noreply.github.com> * Adds Redis Pub/Sub integration (tgstation#80) dds the ability to connect to Redis, subscribe to messages and publish them as well. See here for example usage. The API, prefixed with rustg_redis_: connect(addr) - Connects to a Redis instance using the given address, for example redis://127.0.0.1/. Returns an empty string on success, returns the error otherwise. disconnect() - Closes the connection to Redis and stops the thread managing it. Call this before restarting or attempting to reconnect after an error. 
subscribe(channel) - Subscribes to a given channel and starts receiving messages from it. get_messages() - Returns all received messages as a JSON string, in the format of {"channel_1": ["msg1", "msg2", "msg3", ...], "channel_2": ["msg1", "msg2", "msg3", ...]}. Also includes errors, which appear on the channel "RUSTG_REDIS_ERROR_CHANNEL". publish(channel, message) - Publishes a message on the given channel. Remember to check the error channel every time you call get_messages(). If any occur, you need to call disconnect(), then connect() again, then resubscribe to desired channels. * fixes warning in redis (tgstation#88) * Fix clippy CI, format byond_fn (tgstation#89) * Adds alphabetical order tests for README.md, lib.rs and Cargo.toml (tgstation#84) Adds alphabetical order tests for README.md, lib.rs and Cargo.toml tgstation#84 Also renamed "non-default features" to "additional features" in Cargo.toml for consistency with readme updated the cargo.lock and ran rustfmt * Bump to 0.6.0 (tgstation#90) * Fixes redis dmsrc (tgstation#91) * Cleaner and saner http request parsing. (tgstation#93) * Add functions for seeking lines and getting line counts in functions (tgstation#95) dds rustg_file_get_line_count for getting the line count of a given filename, and rustg_file_seek_line for getting a specific line. These are useful as to not load the entire file into memory in DM, which is useful for very large files like dictionaries. * 0.7.0 (tgstation#96) * Adds a new type of noise: Discrete Batched Perlin-like Noise (DBP) (tgstation#99) * Updates WorleyNoise to be multi-threaded and faster (tgstation#102) Another noise algorithm updated in my streak, this time it's worleynoise, since i didn't know jack shit and i heavily abused Rc<> to acomplish my goals it was really shit. Now it is multi-threaded, blazing fast and really sexy lookin. NOTE: this update changes ABI of worley noise slightly. 
* Updates CellularNoise to be faster and multi-threaded (tgstation#101) Hello, I guess i'm on a streak. I reimplemented CAnoise to be multi-threaded and much faster as i removed a considerable amount of branches. No change to behaviour or ABI noted, it's just faster now. * toml2json now returns errors to byond (tgstation#98) Co-authored-by: ZeWaka <zewakagamer@gmail.com> * Updates cargo packages (tgstation#104) * 0.8.0 * Tiny README update * Small improvements for before 1.0 (tgstation#105) * 1.0.0 (tgstation#106) * Pallette Encoding Image Fix (tgstation#107) * Uploads rust_g.dm as an Actions Artifact (tgstation#108) * 1.0.1 * add note to README about not using the .so * Returns the TOML library to previous functionality (tgstation#112) * 1.0.2 * Clippy - unneeded returns (tgstation#114) * Add an a* pathfinder (tgstation#113) Returns the shortest path in a static node map. That is made for TGMC wich uses manually placed nodes for pathfinding. Benchmark : image That's the average number of path computed in one second, out of 30 runs with random nodes. Size of the node map is roughly 800 On average 32 times faster than tgmc a* implementation. Another possible comparison is with TG's JPS system, and according to the benchmark done here it's 7000 times faster (tgstation/tgstation#56780). 
Not exactly the same use cases though * Add rustg_toml_encode (tgstation#116) Add rustg_toml_encode * 1.1.0 * Fixes cellular noise not working for diff height and width values (tgstation#117) * Update README.md update readme with `pathfinder` feature and rust-analyzer note, re: tgstation#115 * Fix the `pathfinder` feature's loc in the README * Package debug symbols with rust-g (tgstation#119) * 515 Support (tgstation#121) * add a unix_timestamp extern (tgstation#122) Co-authored-by: Mothblocks <35135081+Mothblocks@users.noreply.github.com> * 1.2.0 * Remove useless features from the image crate (tgstation#128) * lmao we don't need jpeg dependencies for image * update `noise` to remove the rest of the junk deps * small clippy fixes (cherry picked from commit 65cf48a) * Update rust.yml * adds all-features flag to linux and windows build * reverts actions to tg's current * Update rust.yml * Update rust.yml * Update rust.yml * Update rust.yml --------- Co-authored-by: ZeWaka <zewakagamer@gmail.com> Co-authored-by: William Wallace <me@wiox.me> Co-authored-by: oranges <email@oranges.net.nz> Co-authored-by: Mothblocks <35135081+Mothblocks@users.noreply.github.com> Co-authored-by: Jared-Fogle <35135081+Jared-Fogle@users.noreply.github.com> Co-authored-by: Mark Suckerberg <29362068+MarkSuckerberg@users.noreply.github.com> Co-authored-by: Gamer025 <33846895+Gamer025@users.noreply.github.com> Co-authored-by: Jordan Brown <Cyberboss@users.noreply.github.com> Co-authored-by: EdgeLordExe <42111655+EdgeLordExe@users.noreply.github.com> Co-authored-by: adamsong <adamsong@users.noreply.github.com> Co-authored-by: pali <6pali6@gmail.com> Co-authored-by: AffectedArc07 <25063394+AffectedArc07@users.noreply.github.com> Co-authored-by: vuonojenmustaturska <naksu@youzen.ext.b2.fi> Co-authored-by: MCHSL <56649176+MCHSL@users.noreply.github.com> Co-authored-by: AnturK <AnturK@users.noreply.github.com> Co-authored-by: tralezab <40974010+tralezab@users.noreply.github.com> Co-authored-by: 
san7890 <the@san7890.com> Co-authored-by: BraveMole <bsouchu@gmail.com> Co-authored-by: Zephyr <12817816+ZephyrTFA@users.noreply.github.com>
Quite a few codebases have procs that look like 20 lines of `message = replacetext(message, " foo ", " ")`. With this handy trick, that code can be reduced to a one-time setup call plus a single replacement call.
The aho-corasick crate only adds `memchr` as a dependency, so the feature should be fairly slim, and I feel its inclusion in the default features is justified.
The underlying crate is very powerful and can provide both case-sensitive and case-insensitive replacing, as well as matching with multiple different match semantics. Only the bare minimum for what I needed is implemented. The good stuff is at https://docs.rs/aho-corasick/latest/aho_corasick/struct.AhoCorasickBuilder.html
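Beyond speed, a single-pass multi-pattern replace also differs in behavior from a chain of sequential replacements, because replaced text is never rescanned. The sketch below shows that difference using only the standard library. Note that `single_pass_replace` is a naive position-by-position scan written for illustration, not an Aho-Corasick automaton and not the crate's API; the function names and sample patterns are hypothetical.

```rust
// Chained replacement: each pass rescans the output of the previous one,
// which is what a run of sequential replacetext-style calls does.
fn chained_replace(text: &str, patterns: &[&str], replacements: &[&str]) -> String {
    let mut out = text.to_owned();
    for (&p, &r) in patterns.iter().zip(replacements) {
        out = out.replace(p, r);
    }
    out
}

// Single-pass replacement: at each position, the first listed pattern that
// matches wins, and replaced text is never rescanned.
// Naive O(text * patterns) scan for illustration only, NOT an automaton.
fn single_pass_replace(text: &str, patterns: &[&str], replacements: &[&str]) -> String {
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    'outer: while !rest.is_empty() {
        for (&p, &r) in patterns.iter().zip(replacements) {
            if !p.is_empty() && rest.starts_with(p) {
                out.push_str(r);
                rest = &rest[p.len()..];
                continue 'outer;
            }
        }
        // No pattern matched here: copy one character and advance.
        let ch = rest.chars().next().unwrap();
        out.push(ch);
        rest = &rest[ch.len_utf8()..];
    }
    out
}

fn main() {
    let patterns = ["you", "thee"];
    let replacements = ["thee", "you"];
    let text = "you and thee";

    // The second chained pass undoes the first one.
    assert_eq!(chained_replace(text, &patterns, &replacements), "you and you");
    // The single pass swaps the two words as intended.
    assert_eq!(single_pass_replace(text, &patterns, &replacements), "thee and you");
}
```

A real automaton gets the same single-pass behavior without trying every pattern at every position, which is where the performance win comes from.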
Replacing the constant parts of goon's shakespearify proc with this, I was able to benchmark the following:
    { "name": "/proc/shakespearify", "self": 2.319, "total": 2.319, "real": 2.322, "over": 2.297, "calls": 5000 },
    { "name": "/proc/shakespearify_new", "self": 0.356, "total": 0.356, "real": 0.358, "over": 0.353, "calls": 5000 },
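As a rough reading of those numbers, using only the "self" column and the 5000 calls reported for each proc (the profiler's time unit is not stated here, so the per-call figures are left unitless):

```latex
\[
\text{speedup} = \frac{2.319}{0.356} \approx 6.5,
\qquad
\frac{2.319}{5000} \approx 4.6 \times 10^{-4} \text{ per call (old)},
\qquad
\frac{0.356}{5000} \approx 7.1 \times 10^{-5} \text{ per call (new)}
\]
```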