Skip to content

Shapes: Add an option to share memo tables accros reductions#11020

Closed
voodoos wants to merge 2 commits intoocaml:trunkfrom
voodoos:shape-reduce-shared-memo
Closed

Shapes: Add an option to share memo tables accros reductions#11020
voodoos wants to merge 2 commits intoocaml:trunkfrom
voodoos:shape-reduce-shared-memo

Conversation

@voodoos
Copy link
Copy Markdown
Contributor

@voodoos voodoos commented Feb 16, 2022

While developing a new tool that indexes uids from cmt files we realised that performing so many shapes reductions would benefit from a shared memo table. (In the current implementation a new memo table is created for each call to reduce.)

In this PR I propose a new option that is passed during the functor instantiation to choose to use persistent memo tables instead of reduce-bound ones. In that case the memo tables are created during the functor instantiation and are thus shared by all subsequent reduce calls.

This led to a nice performance improvement when indexing the cmts of functor-hungry projects: on Irmin the indexing time for all the cmts of the main lib drops from 14.5s to 1.3s.

@voodoos voodoos force-pushed the shape-reduce-shared-memo branch from 850aa64 to 013e724 Compare February 16, 2022 15:07
Co-authored-by: Nicolás Ojeda Bär <n.oje.bar@gmail.com>
@gasche
Copy link
Copy Markdown
Member

gasche commented Feb 16, 2022

Sharing the memoization table across reductions is only valid if the global_env is the same (or, more precisely, if the subset of the global environment actually used by the term is the same). How do you know that this is valid in your Merlin use-case? Could we surface this semantic constraint in the API (for example, with a global_env comparison function, or some other way to know for sure that the tables can be reused) instead of your approach that risks producing incorrect results?

@voodoos
Copy link
Copy Markdown
Contributor Author

voodoos commented Feb 18, 2022

We discussed with @trefis and we think that:

  • inside a same compilation unit it should be safe to call reduce with various local environments since all idents are assigned a different number in increasing order.
  • across compilation units it is wrong in the general case because a memoised value could be wrong when it depends from path-based search in the environment which happens in the Var case. For instance it could happen if a functor argument had the same Ident than a recursive module in another compilation unit.

However it is not clear to us how we should make this clear from the API. Given the rare use-cases for this it might be sufficient to document that clearly in the functor's signature ?

@gasche
Copy link
Copy Markdown
Member

gasche commented Feb 18, 2022

I see three approaches that I think would be always correct:

  • In the signature of the Make_reduce functor, we could ask the user for a comparison or hashing function on global environments. Then we can always memoize reductions that share the same global environment. If I understand correctly, this corresponds to your "inside the same compilation unit" case, and possibly some more. But the question is whether you know how to compare/hash your global environments. Both proposals below assume that you don't know, and work around the problem inside the functor.
  • We could keep track in the memoization tables of which terms did not use the global environment at all, and share those across reductions. (More precisely: the module has "global" memo tables for "local" terms that don't depend on the global environment, and reduction-specific memo tables (as today) for those that do.)
  • Or we could keep track, for each value in the memo table, of the associative list of queries they made to the global environment and their result (a "global query set"). When we query a value in the memo table, there may be several matches corresponding to different global-querysets. We query our own global environment to know if any of the memoized querysets coincides with it -- if so, return the corresponding result, otherwise recompute. This is the option that allows the highest amount of sharing, but it is also more complex.

@trefis
Copy link
Copy Markdown
Contributor

trefis commented Feb 18, 2022

Let me try to rephrase a bit what @voodoos just said, because there's a potential for a vocabulary clash.
What he calls "a local environment" is not reduce's local_env but a typing environment (so a Env.t) local to the place he calls reduce from.

So, here's my version of his argument: reusing the cache across calls made with different Env.ts could produce the wrong result if the value found in cache depends on a previous environments.
A value in the cache depends on an environment if calls to Params.find_shape happened while reducing it.
This happens when a Var id shape is encountered during reduction.

A first thing to note (which is @voodoos's first point) is that Params.find_shape is given an Ident.t, which are never reused inside compilation units, meaning the cached value should never be incorrect. So reusing the cache inside a given compilation unit should always be correct.

Now, if we wanted to share the cache across compilation units, we don't have any such guarantee that a particular Ident.t might not come from two different units, and so mean different things.
So it is not, in general, safe to reuse the cache across compilation units. (Though I guess it would currently still be acceptable if neither unit defines recursive modules?)

@gasche
Copy link
Copy Markdown
Member

gasche commented Feb 18, 2022

Yes, Params.find_shape is the only way to have a reduction depend on what I call the "global environment", and you call the "local environment". If your global environment is just an Env.t, I guess we could just compare/hash them by summaries, and maybe just moving to a pair (<name of the current compilation unit>, <Env.t>) would get you excellent performance. (You could even cheat in the functor argument and only compare/hash the first argument, if you insist on living the dangerous life.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants