core: Use file stat data to fingerprint assets #21297

@bdonlan

Description

Describe the feature

When fingerprinting large files, use the file stat data (inode, mtime, size) to fingerprint, rather than file content, to reduce the amount of time spent fingerprinting.
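To make the idea concrete, here is a minimal sketch of stat-based fingerprinting in TypeScript. It is not the CDK implementation: `fingerprintFile`, `sha256`, and the `STAT_FINGERPRINT_THRESHOLD_BYTES` cutoff are hypothetical names and values chosen for illustration. Files below the cutoff are hashed by content; larger files are hashed from their inode, mtime, and size only.

```typescript
import * as crypto from "crypto";
import * as fs from "fs";

// Hypothetical cutoff: files at or above this size are fingerprinted from stat data.
const STAT_FINGERPRINT_THRESHOLD_BYTES = 16 * 1024 * 1024;

function sha256(data: string | Buffer): string {
  return crypto.createHash("sha256").update(data).digest("hex");
}

// Hypothetical helper, not part of the CDK API.
function fingerprintFile(
  filePath: string,
  thresholdBytes: number = STAT_FINGERPRINT_THRESHOLD_BYTES,
): string {
  const stat = fs.statSync(filePath);
  if (stat.size < thresholdBytes) {
    // Small file: hashing the content is cheap and exact.
    return sha256(fs.readFileSync(filePath));
  }
  // Large file: skip reading the content and hash the stat fields instead.
  return sha256(`stat:${stat.ino}:${stat.mtimeMs}:${stat.size}`);
}
```

With this shape, the cost of fingerprinting a large file is a single `stat` call rather than a full read and digest, which is where the time saving would come from.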

Use Case

We've been having issues with slow builds of our CDK projects for some time now. When profiling to identify the cause of these performance issues, I noticed that a significant portion of our execution time was going to asset fingerprinting, and specifically to the digest operation that occurs during fingerprinting. In particular, we are fingerprinting the same relatively large (>300MB) source files multiple times in both our tests and production synthesis.

Proposed Solution

I have a PR that I plan to publish for this feature request soon.

Note that this feature may cause additional false negatives in asset caching, that is, cache misses for content that has not actually changed. This affects customers who fingerprint identical large assets stored in different files, cases where the mtime changes between fingerprints, and very large assets that differ in LF/CRLF line endings. It is unlikely to cause false positives unless the mtime is deliberately manipulated.
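The false-negative case for identical content in different files can be illustrated with a short sketch. The helpers `statKey` and `contentKey` are hypothetical and exist only for this demonstration; they are not taken from the planned PR. Two byte-identical files share a content hash, but because each has its own inode, a key built from stat data differs.

```typescript
import * as crypto from "crypto";
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

const sha256 = (data: string | Buffer): string =>
  crypto.createHash("sha256").update(data).digest("hex");

// Hypothetical key built from the stat fields named in this issue.
function statKey(filePath: string): string {
  const { ino, mtimeMs, size } = fs.statSync(filePath);
  return sha256(`${ino}:${mtimeMs}:${size}`);
}

// Content-based key for comparison.
function contentKey(filePath: string): string {
  return sha256(fs.readFileSync(filePath));
}

// Create two byte-identical files in a scratch directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "stat-fp-demo-"));
const original = path.join(demoDir, "asset.bin");
const copy = path.join(demoDir, "asset-copy.bin");
fs.writeFileSync(original, "same bytes");
fs.copyFileSync(original, copy);

const sameContent = contentKey(original) === contentKey(copy);
const sameStat = statKey(original) === statKey(copy);
console.log({ sameContent, sameStat });

fs.rmSync(demoDir, { recursive: true, force: true });
```

Here the content keys match while the stat keys do not, so a stat-based cache would treat the copy as a new asset even though nothing changed.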

Other Information

No response

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

CDK version used

git-f66f94e9201b9c9d5e0f1b713a6f30194b323b28

Environment details (OS name and version, etc.)

Linux (AL2)

Metadata

Labels

  • @aws-cdk/core: Related to core CDK functionality
  • effort/small: Small work item – less than a day of effort
  • feature-request: A feature should be added or improved.
  • p2
