I'm becoming more and more convinced that identity substitutions in evars being non-constant-time is the source of the next large wave of performance issues in Coq tactics (the previous "wave" having been related to pervasive evar-normalization and having been mostly fixed by EConstr). I'm creating this issue in part as a reference, and in part so that we have a place to discuss this particular change rather than having comments scattered across a multitude of issues.
Probably related issues: #8237, #8244, #8245, #9582, #11896, #12487, #12524
Probably tangentially related issues / issues on which related discussion occurred: #4964, #8231, #10206
Here is a collection of some comments about this topic:
This seems like it might be that evar creation is linear(?) in the size of the context of the evar
An evar is represented in the evar map as a sequent: context |- ?123 : type, hence the size of these things are linear in the size of the context.
Originally posted by @gares in #8237 (comment)
One issue is that the evar context is named while the term context might
just be de Bruijn variables (usually a mix of a named goal context and a
local de Bruijn context). So there is not only a copy but a translation
involved. I think that computation is shared as much as possible now,
though.
Originally posted by @mattam82 in #8237 (comment)
In Coq evars are equipped with an explicit substitution that has the size of the context, even if this substitution is the identity one. You proof term, before pose, is
let H1 := I : True in ... ?1[H1 -> H1, ....]
and after pose they are
let H1 := I : True in ... let Hn := I : True in ?2[H1 -> H1, ...., Hn -> Hn]
and in the evar map
H1 ... |- ?1 := let Hn := I : True in ?2[H1 -> H1, ... Hn -> Hn]
The explicit substitutions are useful only if some reduction is involved ( in this case they are not the identity)
Given the current representation I really don't see how things can be made non-quadratic easily.
Sure, one could carefully share the previous explicit susbtitution and just extend it, but I don't think it is easy to do in practice (eg, they are Arrays IIRC).
In Matita we had a smarter representation for evars applied to the identity substitution, eg
type exp = Id of int | Subst of term list
Not sure this is the only root of the quadratic complexity, but for sure it is one.
Originally posted by @gares in #8237 (comment)
@gmalecha This wouldn't work, because existential substitutions represent named substitutions whereas esubst stands for de Bruijn substitutions. More generally, the problem here is not so much the fact that the substitution is delayed, because we don't perform any in practice, they are all identity. Rather, it's the fact that the representation is very redundant for substitutions that are mostly identity.
Originally posted by @ppedrot in #8237 (comment)
This is a pretty naive question whose consequences I don't evaluate well: would it be worth to introduce the long-mentioned abstract representation of the identity substitution on named variables at the same time?
Originally posted by @herbelin in #11896 (comment)
@herbelin My understanding (though I can't now find the comment where @ppedrot told this to me) is that introducing that optimization complicates things somewhat, because then we need access to the environment/context to reconstruct the substitution, so we have to thread that through many more functions.
Originally posted by @JasonGross in #11896 (comment)
@JasonGross:
because then we need access to the environment/context to reconstruct the substitution, so we have to thread that through many more functions.
Yes, I think my question indeed reduces to how "many" is the "many more". Is it mainly the functions of the form occur_var or replace_vars, as well as a few functions about evar unification, or more?
Originally posted by @herbelin in #11896 (comment)
@herbelin adding identity substitutions is much heavier in terms of API. A lot of code needs to be adapted to handle the quotient, and this requires threading additional data. Furthermore it is not even clear how to handle them in some cases, e.g. are we supposed to expand identity substitutions on-the-fly in term iterators? Also, is the quotient 'canonical' in the sense that we forbid identity substitutions to be represented by the non-compact representation? In this case we also need to change the API for mkEvar, and so on so forth.
So, I believe that the current change is much more self-contained than additionally introducing efficient identity representations.
Originally posted by @ppedrot in #11896 (comment)
@ppedrot: Indeed, that'd be much more work. Spontaneously, I would lean for a canonical representation. For iterators, maybe having two versions of them, or, maybe, passing a function which tells what to do on the evar instance (I agree, this would be a rather noticeable change, but if the gain is worth...).
My (intuitive) 2p.
Originally posted by @herbelin in #11896 (comment)
I'm becoming more and more convinced that identity substitutions in evars being non-constant-time is the source of the next large wave of performance issues in Coq tactics (the previous "wave" having been related to pervasive evar-normalization and having been mostly fixed by EConstr). I'm creating this issue in part as a reference, and in part so that we have a place to discuss this particular change rather than having comments scattered across a multitude of issues.
Probably related issues: #8237, #8244, #8245, #9582, #11896, #12487, #12524
Probably tangentially related issues / issues on which related discussion occurred: #4964, #8231, #10206
Here is a collection of some comments about this topic:
Originally posted by @gares in #8237 (comment)
Originally posted by @mattam82 in #8237 (comment)
Originally posted by @gares in #8237 (comment)
Originally posted by @ppedrot in #8237 (comment)
Originally posted by @herbelin in #11896 (comment)
Originally posted by @JasonGross in #11896 (comment)
Originally posted by @herbelin in #11896 (comment)
Originally posted by @ppedrot in #11896 (comment)
Originally posted by @herbelin in #11896 (comment)