Integrate optional speed and memory improvements by token merging (via dbolya/tomesd) #9256
|
Amazing one, thanks. Can I somehow subscribe to this and get notified when it's merged? |
|
The ratio thingy should be in the prompt itself, not a command arg; restarting the UI to change the ratio would be inconvenient. |
|
When I tried it (with the extension, not this PR) I couldn't use LORAs with it enabled. Does this PR fix that bug? |
I will see what everyone thinks. There are also other options from tomesd that could be included. I think it probably fits best as a separate set of options within the settings tab. Changing options for tomesd isn't an instant change like with say the batch size slider.
It's in the ballpark. According to table 4 in the paper that I linked, the author achieves a ~46% speedup using a ratio of 0.5. The effect is dependent upon the amount of tokens used, and it is noted that it is less effective on smaller resolutions in section 5. Not knowing your exact generation settings, I think that your speedup probably does fall within expectations. |
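To make the resolution dependence a bit more concrete, here is a rough sketch of how many tokens the highest-resolution attention layers see. It assumes Stable Diffusion's usual 8x latent downscale and that the ratio is the fraction of tokens merged away; `token_counts` is a made-up helper name, not something from tomesd or the webui.

```python
def token_counts(width: int, height: int, ratio: float) -> tuple:
    """Return (tokens before merging, tokens after merging) for one layer.

    Assumption: latents are 8x smaller than the image in each dimension,
    and `ratio` is the fraction of tokens removed by merging.
    """
    tokens = (width // 8) * (height // 8)
    merged = int(tokens * ratio)
    return tokens, tokens - merged


for size in (512, 768, 1024):
    before, after = token_counts(size, size, 0.5)
    print(f"{size}x{size}: {before} -> {after} tokens")
```

Since self-attention cost grows roughly quadratically with the number of tokens, larger images have more to gain from the same ratio.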
Is this the extension you're referring to? I hadn't seen this extension before making this PR but, at least for the loras I have tried, they seem to load like normal. |
|
With an AMD RX 6900 XT, torch 2.1.0.dev20230317+rocm5.4.2, using --opt-sdp-attention, generating a 768x768 image (Euler a, 20 steps), I went from 3.34 it/s to 5.44 it/s. LoRAs seem to work fine, and VRAM usage seems halved, so I can do 1024x1024 without OOM issues now. |
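For comparing numbers like these against the paper's figures, the relative gain can be worked out from the two it/s readings. This is just the arithmetic, with a hypothetical helper name:

```python
def speedup_percent(before_its: float, after_its: float) -> float:
    """Relative throughput gain in percent, given iterations per second."""
    return (after_its / before_its - 1.0) * 100.0


# The readings reported above: 3.34 it/s before, 5.44 it/s after.
print(f"{speedup_percent(3.34, 5.44):.1f}% faster")
```

That works out to roughly 63% for the 768x768 run above.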
|
Would it be better to use |
Yes, I will change that. |
It probably can't be part of the prompt because the logic is applied during model loading. However, personally I think it should be a setting rather than a command arg so that like @2blackbar said, we can change it without restarting. Thoughts? |
I think I will probably add it to the settings area and have the ratio setting there just override the |
i'd actually suggest the opposite - cmdline flags typically override settings. so use settings as default and override from cmdline (although i don't think cmdline is necessary at all if we have settings). |
Sorry, I worded that poorly. I meant that I will override the default ratio setting of 0.5. |
|
I commented in the discussion thread, but reposting it here for visibility: Hi, original author here. Some quick suggestions:
Note that the tome patch can be applied right before generation just fine, it doesn't need to be on model load. Applying the patch is free (it just sets some class variables) and it can be applied any number of times to the same model without adverse effects (e.g., to change the parameters). The way I implemented it for testing was in the txt2img processing function, right before sampling. I think the best way to implement it would be to add one of those boxes under the generation parameters, like the controlnet plugin. |
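A minimal sketch of what "patch right before sampling" could look like, not the code from this PR. The patch functions are passed in so the snippet runs without tomesd installed; with tomesd they would be `tomesd.apply_patch` and `tomesd.remove_patch`. Treating a ratio of 0 as "off" is an assumption of this sketch, and `patch_before_sampling` is a made-up name.

```python
from typing import Any, Callable


def patch_before_sampling(
    model: Any,
    ratio: float,
    apply_patch: Callable[..., Any],
    remove_patch: Callable[[Any], Any],
) -> None:
    """Set the token merging ratio on `model` just before a generation."""
    if ratio > 0:
        # Re-applying is cheap and can be repeated to change parameters.
        apply_patch(model, ratio=ratio)
    else:
        # Assumption: ratio 0 means token merging is disabled.
        remove_patch(model)


# Stand-ins that only record calls, so the flow can be seen without tomesd.
log = []


def fake_apply(model, ratio):
    log.append(("apply", ratio))


def fake_remove(model):
    log.append(("remove",))


patch_before_sampling("model", 0.5, fake_apply, fake_remove)
patch_before_sampling("model", 0.3, fake_apply, fake_remove)
patch_before_sampling("model", 0.0, fake_apply, fake_remove)
print(log)
```

Because the ratio is read at generation time, changing it would not require reloading the model or restarting the UI.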
Tome increases sampling speed at the expense of quality.
I haven't used them before, but another option might be to use a code component? I think if we put an 'advanced tweaking' toggle in the token merging settings section for it, that might be a good compromise. It would clean things so that only the ratio slider would need to be primarily displayed. Then, the box could be toggled visible for those that want it to be. For those that do want to tinker with tomesd, it offers them the full palette of options and should transparently allow them to tinker with any options you add upstream in the future. |
Hmm that could work. Another option is if I add a "settings string" option to the For instance, the default settings would be: |
|
what's the reason for installer from also, why note "compatible with xformers"? note something that is not working as expected, not general positives. |
Will change.
I think for such general speedups users expect that they won't stack. Swapping from args to settings though, it could make more sense.
I think @dbolya mentioned it being compatible in the paper. |
Yeah, just tested by upgrading to torch 2.0 for the webui and using the option |
|
Suggestion: add options to Settings tab so that user can tune the ratio on the fly |
|
OK, I had to disable it. For now it breaks ControlNet (generation errors out) and some other extensions. I suspect that when it gets merged, lots of people will complain. |
I don't know this for certain, but I'm going to go forward right now with the assumption that AUTO would not want the main enable/disable toggle in the txt2img/img2img panes themselves so I will put it in the settings section. |
I knew about that issue when making the PR. A lot of users still don't use controlnet. So, I think just adding a note about it being incompatible next to the toggle will be good enough for now. |
|
I think maybe we should mark this PR as Draft until we get more opinions on the various points above. |
|
@asagi4 Should be working now. (you'll have to swap to the dev branch to test it though) |
|
I installed the branch and am testing it on my M1 8 GB RAM MacBook. Made a few simple images, but don't see much difference in performance. Running I turned on most checkboxes without knowing what they're doing just to see if I can see a difference. How do I know tomesd is applied and that it works? |
You should see increased performance, particularly with a larger hires pass scale. There should also be info added to the image indicating that it was used. You can turn off "apply only to hr pass", which applies it to the first pass as well. However, the extra speed-up from that would be negligible. |
|
Just wondering, now that auto merged dev into master, how I would be able to use this PR on master instead of dev? |
|
@papuSpartan Thanks! @Panchvzluck Looking at https://github.com/papuSpartan/stable-diffusion-webui/tree/tomesd, it seems the branch is being updated with updates from upstream, last time yesterday. |
|
My generic task of making a 512x768 image and upscaling it 2x goes from 18.5 seconds to 14-15. I played with the settings and it seems only the main slider really has an effect on performance. All those settings can be useful for exploring the method, and they would be appropriate for an extension, but for the main repo I would like to have a simple configuration. There can be a single slider for token merging from 0 to 0.9, set to 0 by default, and another slider for hires fix, from 0 to 0.9, also at 0 by default. 0 disables token merging. No command-line settings. |
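A rough sketch of the two-slider idea, under the assumption that the hires slider is used only for the hires pass and the main slider for everything else. The names here are hypothetical and this is not the merged implementation.

```python
MAX_RATIO = 0.9  # upper bound of both sliders


def ratio_for_pass(base_ratio: float, hires_ratio: float, is_hires_pass: bool) -> float:
    """Pick the slider value for the current pass, clamped to [0, 0.9]."""
    ratio = hires_ratio if is_hires_pass else base_ratio
    return min(max(ratio, 0.0), MAX_RATIO)


def merging_enabled(ratio: float) -> bool:
    """A ratio of 0 disables token merging."""
    return ratio > 0


# Defaults: both sliders at 0, so nothing is merged in either pass.
print(merging_enabled(ratio_for_pass(0.0, 0.0, is_hires_pass=False)))
# Only the hires slider raised: the first pass stays untouched.
print(ratio_for_pass(0.0, 0.5, is_hires_pass=True))
```

With both defaults at 0, past generations stay reproducible until the user opts in.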
|
From the commits it looks like you reworked it the way I asked. If you don't plan to add more and don't have any complaints about reduction in number of options do write back and I'll merge it into dev. |
|
@AUTOMATIC1111 Go ahead and merge this much into dev. |
|
There were 3 commits made at dbolya/tomesd after the 0.1.2 release, to improve support for mps and "fix rand generator". Should a request be made for a new tomesd release at this time, to support this merge? |
|
@jrittvo The version of the package is specified loosely as |
Auto already changed it to an explicit == 0.1.2. It will need to be changed to 0.1.3 as @jrittvo suggested because of some issues MPS users were having. |
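For anyone unsure which tomesd release their environment ended up with, a quick check using only the standard library might look like this (`installed_version` is a made-up helper name):

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("tomesd:", installed_version("tomesd") or "not installed")
```

Run it with the same Python interpreter (venv) that the webui uses, otherwise it reports on a different environment.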
* updates for AUTOMATIC1111#9256
* set up a system to provide extra info for settings elements in python rather than js; add link info for token merging
* xyz token merging
* move some settings to the new Optimization page; add slider for token merging for img2img; rework StableDiffusionProcessing to have the token_merging_ratio field
|
Hello! To use token merging, do I have to install any package or extension? The reason I ask is that I've tested the token merge ratio slider (quicksetting) and it did not affect the outcome, whatever I chose there. |
The settings are in the newest update itself. Check under Optimizations. The only thing I can't find myself is the option that lets you see the images as they are being generated when you're doing multiple ones :/ |

Describe what this pull request is trying to achieve.
This pull request integrates the speed and memory improvements from the paper Token Merging for Fast Stable Diffusion via dbolya/tomesd. These changes do alter reproducibility of past generations. For that reason, I have set this as an option that is disabled by default but can be tweaked through its settings pane.
Additional notes and description of your changes
All settings can be found in their own section under the Settings tab.
From direct suggestion of the author, the default behavior is to not apply token merging except to the hi-res fix pass. From my testing so far, this generally results in the end image looking cleaner. Undoubtedly, this is due to hr-fix being handed a more coherent image to begin with. If the user still wants to apply token merging globally though, that option is made available in the settings pane.
Environment this was tested in
Windows 11
Citations
Bolya, D., & Hoffman, J. (2023). Token Merging for Fast Stable Diffusion. ArXiv.
Bolya, D., Fu, C.-Y., Dai, X., Zhang, P., Feichtenhofer, C., & Hoffman, J. (2023). Token Merging: Your ViT but Faster. International Conference on Learning Representations.