Optimizer: cache runtime artifacts on filesystem#747
Conversation
This will download all runtime parameters once and stores them in an in memory cache.
fstanis
left a comment
There was a problem hiding this comment.
Some additional thoughts:
FileSystemCachehas no eviction mechanism and no size limit on values you put in the map. You may run out of memory. I'd limit the number of keys at least (but better to use a proper LRU cache).- You'll make copies of duplicate files (two urls returning same data). This can be avoided you keep an index of url -> hash(data) and use hash(data) instead of hash(url) for the filename.
- If you can't write a generic
fetch-compatible wrapper, perhaps you can narrow down whatfetchdoes to something simpler, e.g. async function that only takes a URL and returns a string? That would make it easier to wrap in a cached call.
packages/core/lib/FileSystemCache.js
Outdated
| */ | ||
| 'use strict'; | ||
| const {join} = require('path'); | ||
| const {rmdir, readdir, readFile, writeFile, unlink, existsSync, mkdirSync} = require('fs'); |
There was a problem hiding this comment.
Considering you have this many imports, I'd argue destructuring isn't a good idea, some of these are more readable as part of fs (e.g. fs.unlink vs unlink).
There was a problem hiding this comment.
lol - yeah. Frog in boiling water...
packages/core/lib/FileSystemCache.js
Outdated
| 'use strict'; | ||
| const {join} = require('path'); | ||
| const {rmdir, readdir, readFile, writeFile, unlink, existsSync, mkdirSync} = require('fs'); | ||
| const {promisify} = require('util'); |
There was a problem hiding this comment.
Do we still need this? Unless you want to support Node.js 9 and below (which is probably no longer needed... oldest supported LTS is 10), use fs.promises.
There was a problem hiding this comment.
Good point. Old habit...fixed.
packages/core/lib/FileSystemCache.js
Outdated
| const unlinkAsync = promisify(unlink); | ||
|
|
||
| class FileSystemCache { | ||
| static get( |
There was a problem hiding this comment.
get to me implies a singleton-like pattern, but this actually creates a new instance (also, you used create below for a similar pattern). Also, do you really need this, since it doesn't do much other than provide a default (which you can just do in the constructor).
There was a problem hiding this comment.
Renamed to create. Leftover from a previous implementation.
packages/core/lib/FileSystemCache.js
Outdated
| * limitations under the License. | ||
| */ | ||
| 'use strict'; | ||
| const {join} = require('path'); |
There was a problem hiding this comment.
Nit: you're importing path twice. Also, I suggest not destructuring, join can be very confusing (as opposed to path.join).
packages/core/lib/FileSystemCache.js
Outdated
| await Promise.all( | ||
| entries.map((entry) => { | ||
| let fullPath = path.join(dir, entry.name); | ||
| return entry.isDirectory() ? this.deleteDir_(fullPath) : unlinkAsync(fullPath); |
There was a problem hiding this comment.
This is potentially dangerous: directories can be cyclic. That said, I guess this is not a big deal and hard to fix. I'd consider using fs.rmdir which is experimental or rimraf. Or add a max depth.
Actually, if I understand correctly, nested directories are never created by this cache. I'd throw if there are non-files in a directory.
There was a problem hiding this comment.
rmdir is unfortunately only Node >=12. rimraf has to many dependencies for a non-dev dependency.
There was a problem hiding this comment.
Thinking about this further, there's another danger here: if there's a symlink outside this directory (worst case, to ~ or /), this will delete everything it can. You can argue that's not very likely, but I'd prefer if this was impossible.
A few options:
- Ignore directories and only delete files (since directories are never created this way anyway).
- Check if a directory is a symlink first.
- Resolve symlinks and check if the directory has the initial directory as a prefix.
Regarding fs.rmdir, you can add an if to check if it's available and fallback to your own implementation if not. But probably not needed anyway. Just make sure it doesn't recursively delete my home folder. :)
There was a problem hiding this comment.
sg - changed to only delete files in cache dir
| /** | ||
| * @private | ||
| */ | ||
| async function downloadAmpRuntimeStyles_(config, runtimeCssUrl) { |
There was a problem hiding this comment.
As discussed, consider moving to a more generic fetch wrapper.
packages/optimizer/scripts/init.js
Outdated
|
|
||
| async function warmupCaches() { | ||
| // run a dummy transformation to pre-fill the caches | ||
| await ampOptimizer.transformHtml('<h1>hello world</h1>', { |
| const cacheFile = this.createCacheFileName(key); | ||
| try { | ||
| const content = await readFileAsync(cacheFile, 'utf-8'); | ||
| value = JSON.parse(content); |
There was a problem hiding this comment.
Why must it be JSON? If you use a raw buffer, your cache becomes binary-compatible (and less error prone and less code).
There was a problem hiding this comment.
Have you got an example for this?
There was a problem hiding this comment.
const content = await fs.readFile(cacheFile); // returns Buffer
this.cache.set(key, value); // Buffer in Map
...
fs.writeFile(cacheFile, value); // writes Buffer to file, no need to specify encoding
JSON.parse also works on a Buffer, by coercing it to string first (i.e. JSON.parse(str) === JSON.parse(Buffer.from(str))).
There was a problem hiding this comment.
Got it, so the advantage is that you don't need to mess with encoding?
| await initValidatorRules(runtimeParameters, customRuntimeParameters, config); | ||
| await initRuntimeVersion(runtimeParameters, customRuntimeParameters, config); | ||
| await initRuntimeStyles(runtimeParameters, config); |
There was a problem hiding this comment.
These will execute sequentially, but it can be done in parallel with one (annoying) tranformation.
There was a problem hiding this comment.
good tip! done
sebastianbenz
left a comment
There was a problem hiding this comment.
Thanks Filip! Comments:
- re LRU Cache: not if that's a realistic scenario with releases once a week, but better save than sorry. Using an LRU Cache now.
- same value for different URLs is not a use case as we only store version specific artifacts which are immutable.
- fetch wrapper: I don't see the need here. There are three different fetches happening (ValidatorRules with built-int fetch, JSON and text), some require a max-age some don't. Unifying this feels like over-engineering at this point. This might change if we need to fetch more artifacts.
| await initValidatorRules(runtimeParameters, customRuntimeParameters, config); | ||
| await initRuntimeVersion(runtimeParameters, customRuntimeParameters, config); | ||
| await initRuntimeStyles(runtimeParameters, config); |
There was a problem hiding this comment.
good tip! done
packages/core/lib/FileSystemCache.js
Outdated
| */ | ||
| 'use strict'; | ||
| const {join} = require('path'); | ||
| const {rmdir, readdir, readFile, writeFile, unlink, existsSync, mkdirSync} = require('fs'); |
There was a problem hiding this comment.
lol - yeah. Frog in boiling water...
packages/core/lib/FileSystemCache.js
Outdated
| 'use strict'; | ||
| const {join} = require('path'); | ||
| const {rmdir, readdir, readFile, writeFile, unlink, existsSync, mkdirSync} = require('fs'); | ||
| const {promisify} = require('util'); |
There was a problem hiding this comment.
Good point. Old habit...fixed.
packages/core/lib/FileSystemCache.js
Outdated
| const unlinkAsync = promisify(unlink); | ||
|
|
||
| class FileSystemCache { | ||
| static get( |
There was a problem hiding this comment.
Renamed to create. Leftover from a previous implementation.
| const cacheFile = this.createCacheFileName(key); | ||
| try { | ||
| const content = await readFileAsync(cacheFile, 'utf-8'); | ||
| value = JSON.parse(content); |
There was a problem hiding this comment.
Have you got an example for this?
packages/optimizer/scripts/init.js
Outdated
|
|
||
| async function warmupCaches() { | ||
| // run a dummy transformation to pre-fill the caches | ||
| await ampOptimizer.transformHtml('<h1>hello world</h1>', { |
packages/core/lib/FileSystemCache.js
Outdated
| * limitations under the License. | ||
| */ | ||
| 'use strict'; | ||
| const {join} = require('path'); |
packages/core/lib/MaxAge.js
Outdated
| * @param {Number} timestampInMs time when max-age value was received | ||
| * @param {Number} value max-age value in seconds | ||
| **/ | ||
| static fromJson(timestampInMs, value) { |
There was a problem hiding this comment.
You're right, renamed to from/toObject. I miss Java in these cases...
| const cacheFile = this.createCacheFileName(key); | ||
| try { | ||
| const content = await readFileAsync(cacheFile, 'utf-8'); | ||
| value = JSON.parse(content); |
There was a problem hiding this comment.
const content = await fs.readFile(cacheFile); // returns Buffer
this.cache.set(key, value); // Buffer in Map
...
fs.writeFile(cacheFile, value); // writes Buffer to file, no need to specify encoding
JSON.parse also works on a Buffer, by coercing it to string first (i.e. JSON.parse(str) === JSON.parse(Buffer.from(str))).
packages/core/lib/FileSystemCache.js
Outdated
| async set(key, value) { | ||
| const cacheFile = this.createCacheFileName(key); | ||
| try { | ||
| this.cache[key] = value; |
There was a problem hiding this comment.
I think the test worked because it was always getting a cache miss and loading from a file. You could maybe add a test that clears the folder on the filesystem after a set to make sure it loads it from the LRU.
packages/core/lib/FileSystemCache.js
Outdated
| await Promise.all( | ||
| entries.map((entry) => { | ||
| let fullPath = path.join(dir, entry.name); | ||
| return entry.isDirectory() ? this.deleteDir_(fullPath) : unlinkAsync(fullPath); |
There was a problem hiding this comment.
Thinking about this further, there's another danger here: if there's a symlink outside this directory (worst case, to ~ or /), this will delete everything it can. You can argue that's not very likely, but I'd prefer if this was impossible.
A few options:
- Ignore directories and only delete files (since directories are never created this way anyway).
- Check if a directory is a symlink first.
- Resolve symlinks and check if the directory has the initial directory as a prefix.
Regarding fs.rmdir, you can add an if to check if it's available and fallback to your own implementation if not. But probably not needed anyway. Just make sure it doesn't recursively delete my home folder. :)
2677bb4 to
8427903
Compare
|
@fstanis agree, only removing the files now Thanks everyone for the review! |
This PR addresses the problem that currently transformations rely on working access to the internet:
postinstallhook that will download the latest runtime artifacts to prime the cacheFixes #378 #719 #650
@mdmower would be great if you could check if this works on windows.