Cache compiled classes by vertex ID#11479
Cache compiled classes by vertex ID#11479colinsurprenant wants to merge 4 commits intoelastic:masterfrom
Conversation
yaauie
left a comment
There was a problem hiding this comment.
In general, I think the approach is sound, so long as a vertex's id is truly unique.
I haven't spent a lot of time in the JEE, so pardon my ignorance in the following questions.
- where does a vertex's id come from, and what is the effective risk of accidental collisions?
- would there be value in ever invalidating elements from the cache? (e.g., with frequent pipeline reloads do we orphan elements in the cache?)
logstash-core/src/main/java/org/logstash/config/ir/compiler/DatasetCompiler.java
Outdated
Show resolved
Hide resolved
logstash-core/src/main/java/org/logstash/config/ir/compiler/ComputeStepSyntaxElement.java
Outdated
Show resolved
Hide resolved
The by using the corresponding plugin id or uniquely generated by computing a hash from the source logstash/logstash-core/src/main/java/org/logstash/config/ir/graph/Vertex.java Lines 194 to 213 in 715295c for conditionals. So a pipeline plugins and conditional This raises a good question though - although plugin ids are required to be unique within a pipeline I believe identical ids can be reused across multiple pipelines which might be a problem here because the cache is a static field thus global across the whole runtime and not a per-pipeline cache... 🤔
That is a good question; there was no previous invalidation logic but I believe if a pipeline is reloaded the corresponding generated cached classes should be cleared if we assume that when reloading a pipeline we want to recompile everything and not try to, say, reuse unchanged parts of the config which seems an potential optimization we may tackle but certainly not now. Maybe we could namespace the cache per pipeline id or maybe have that cache live in the |
|
🤔 If a vertex ID can be sourced from a plugin id, and the plugin id can be specified in the pipeline configuration, this change may introduce a bug in which reloads of any plugin that explicitly give their id in the pipeline configuration will be silently ignored. Is this okay? Are the directives that are a part of that plugin's invocation also cached (e.g., if they change which host an explicitly-id'd elasticsearch output points to, will that change get picked up when recompilation is skipped)? I think you're right that there is a chance of collisions in the multiple pipeline scenario. If we move the cache up into I agree that cache eviction is beyond the scope for this PR, but we may want to file a separate issue to track it. |
Yes you are right. This will be a problem here. Good catch.
AFAICT they are not.
True, but I think this might be a good thing actually:
WDYT? |
|
I created #11482 as an alternate implementation as per above. |
|
closing in favour of #11482 |
Fixes #11105
Relates to #11175
This fixes the compiled classes cache problem where the compiled classes were cached using the
ComputeStepSyntaxElementinstance object as the cache key. The problem is that newComputeStepSyntaxElementinstances are created per worker threadsCompiledExecutionthus the cached classes were never reused.This solution proposes to use the
DataSetassociated vertex ID as the cache key. For this theComputeStepSyntaxElementclass now has a unique IDStringfield which is propagated from theCompiledPipelinewhere the vertex ID is known and used as the cache key.