Skip to content

(fix) Use module ids for final render order#19184

Merged
alexander-akait merged 2 commits intowebpack:mainfrom
dmichon-msft:render-stable-sort
Feb 5, 2025
Merged

(fix) Use module ids for final render order#19184
alexander-akait merged 2 commits intowebpack:mainfrom
dmichon-msft:render-stable-sort

Conversation

@dmichon-msft
Copy link
Contributor

@dmichon-msft dmichon-msft commented Feb 4, 2025

Fixes #19141

Specifically, addresses determinism of sort order across platforms by leveraging the fact that a developer who desires stable output cross-platform will have already ensured that module ids are stable, so it necessarily follows that sorting by module id will result in a deterministic sort order.

What kind of change does this PR introduce?
bugfix

Did you add tests for your changes?

Added a unit test that verifies that Javascript modules are rendered in sorted order.

Does this PR introduce a breaking change?

The sort order of modules in rendered output will change, but a one-time alteration to that order is no different than changing a salt value.

What needs to be documented once your changes are merged?

@webpack-bot
Copy link
Contributor

For maintainers only:

  • This needs to be documented (issue in webpack/webpack.js.org will be filed when merged)
  • This needs to be backported to webpack 4 (issue will be created when merged)

@alexander-akait alexander-akait merged commit 4815712 into webpack:main Feb 5, 2025
63 checks passed
@alexander-akait
Copy link
Member

Thanks

@sokra
Copy link
Member

sokra commented Mar 7, 2025

This need to be solved differently. Sorting by module I'd basically puts modules in a kind of random order when deterministic module IDs is used. This hurts gzip compression quite a lot compared to ordered by identifier. By identifier puts modules with a similar path next to each other and that compresses much better.
I made a similar change a few years ago and it needed to be reverted...

@dmichon-msft
Copy link
Contributor Author

This need to be solved differently. Sorting by module I'd basically puts modules in a kind of random order when deterministic module IDs is used. This hurts gzip compression quite a lot compared to ordered by identifier. By identifier puts modules with a similar path next to each other and that compresses much better. I made a similar change a few years ago and it needed to be reverted...

I'd expect the order to only meaningfully affect compressibility in unminified builds, though? The paths shouldn't appear in code in a minified build, so I don't see how sorting by identifier would help.

The alternative we considered was to update the sort by identifier function to perform the same portabilization that the DeterministicModuleIdsPlugin does on the identifiers before comparing, but that's likely to hurt performance quite a bit, since even with a cache, it's a lot of lookups to add to the comparison.

@sokra
Copy link
Member

sokra commented Mar 8, 2025

Compression is affected by the order as it depends on which modules are placed next to each other in the chunk. Gzip only has a certain context window to copy similar sequences and copying from nearby code is cheaper. Imagine you have a folder with 100 icon components that all lock exactly the same except for the icon path. Sorting by identifier places them all next to each other, which sorting by module id does not.

@alexander-akait
Copy link
Member

@dmichon-msft @sokra What is our final solution, because the problem of non-deterministic builds on different OS also brings problems, previously, I received feedback quite often to pay attention to this issue.

@hai-x
Copy link
Member

hai-x commented May 4, 2025

It seems that the issue is caused by file path across different OS.

So can we

  1. Simply normalize the module identifier by replace(/\\/g, '/') then perform the actual sorting
  2. Use Intl.Collator.prototype.compare(),

And I tested it locally using the repo mentioned in the issue above. Sorting by module identifier compress better (71.0% vs. 69.4%), resulting in a ~2 KB smaller compressed size.

➜  gzip -l ./dist-identifiers/jszip-bundle-repro.js.gz 
  compressed uncompressed  ratio uncompressed_name
       35306       122129  71.0% ./dist-identifiers/jszip-bundle-repro.js
➜  gzip -l ./dist-moduleIds/jszip-bundle-repro.js.gz  
  compressed uncompressed  ratio uncompressed_name
       37305       122129  69.4% ./dist-moduleIds/jszip-bundle-repro.js

@dmichon-msft
Copy link
Contributor Author

It's not just the / vs \\ in path separators; identifier is also an absolute path and therefore external modules (which have identifiers like "external react" will sort differently on Windows against actual modules depending on which drive the file is on.

@hai-x
Copy link
Member

hai-x commented May 6, 2025

@dmichon-msft

Could you provide additional examples or detailed clarify? I’m sorry, but I’m not fully understanding your idea.

@dmichon-msft
Copy link
Contributor Author

c:/git/project/some-file.js
external react
g:/git/project/some-file.js

The identifier for external react sorts differently depending on the path to the folder where the project is checked out to on Windows.

Granted, the technically correct names would be C:/git/project/some-file.js and G:/git/project/some-file.js, but it is valid to have the lowercase letter.

@dmichon-msft
Copy link
Contributor Author

Most likely the correct long-term solution here is that we should use for sorting the result of getFullModuleName(module, compiler.context, compiler.root) instead of module.identifier() or the assigned id.

@dmichon-msft
Copy link
Contributor Author

Most likely the correct long-term solution here is that we should use for sorting the result of getFullModuleName(module, compiler.context, compiler.root) instead of module.identifier() or the assigned id.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[bug] Non-deterministic ordering of modules in bundle output when using deterministic module IDs on different platforms

5 participants