Skip to content

printtyp: cache short path data in wrap_env#9889

Merged
gasche merged 2 commits intoocaml:trunkfrom
Octachron:pprintyp_short_path_cache
Sep 9, 2020
Merged

printtyp: cache short path data in wrap_env#9889
gasche merged 2 commits intoocaml:trunkfrom
Octachron:pprintyp_short_path_cache

Conversation

@Octachron
Copy link
Copy Markdown
Member

@Octachron Octachron commented Sep 7, 2020

This PR makes Printtyp.wrap_env cache the short-path data associated with the input environment .
Extracted from #8929 .

@gasche
Copy link
Copy Markdown
Member

gasche commented Sep 8, 2020

The gap between how simple the code looks and how delicate it is to review is quite baffling for this PR. The rest of this comment is going to be a brain dump of where I am in my review.

  1. This patch is not the best way to implement this feature (despite being the approach I recommended). It implements a cache local to wrap_env that interacts with pre-existing caching logic of set_printing_env; so now we have to check that the interaction between the two caching layers is correct. The "best" approach would be to have a single caching layer, or at least split the caching logic in subparts to avoid redundancy between the two functions. But this patch has the advantage of being local to wrap_env: it is a "quick" way to do what we want, and minimally invasive. In light of @lpw25's proposal to submit a completely new -short-paths implementation by the soon-coming May 2020 deadline, the minimal change proposed in this PR makes a lot more sense.

  2. With (1) out of the door, we have to check indeed that the two caching layers interact correctly. One difficulty is printing_old, whose invariants are unclear (what does this part of the global state mean?). I have to reverse-engineer its meaning to review the PR (maybe a comment would help); my current proposal would be that printing_{map,depth,cont} depend only on printing_env and printing_pers, and that the purpose of printing_old is to remember the last printing_env value for which {map,depth,cont} were correctly updated. (The cache-ability checks with respect to printing_pers are done in same_printing_env.) If this patch now sets printing_old without recomputing {map,dep,cont} from scratch, we have to check that they are set in the exact same way that they would have been recomputed anyway, so that the printing_old invariant still holds.

  3. Some of the ideas in (2) could have been comments in the code, before the present PR or introduced in this PR to ease reviewing and further maintenance.

  4. The key question I guess is whether printing_{map,depth,cont} really only depends on printing_env and printing_pers when they are re-computed in the set_printing_env slow path. Is this an assumption you are making, or did you check this? I'm not sure from the cont-computation code what the dependencies are, and in particular it is not obvious how the code depends on Env.used_persistent / printing_pers (I would assume that this comes from the call to normalize_type_path?).

  5. I have no idea what printing_depth means. It seems to be initialized at 0 when set_printing_env recomputes stuff, and be incremented gradually during further usage of the printing map. If I read the code correctly, before this PR it would typically be reset to 0 by wrap_env, and now it gets to keep its old value. I would optimistically guess that this is fine, but could someone explain why?

@Octachron
Copy link
Copy Markdown
Member Author

In term of dependencies, the printing_map depends on the immutable environment and the mutable persistent environment (because any request sent to env implictly depends on the persistent environment) . The printing_pers snapshot only provides an approximation for the persistent part of the dependencies of printing_map.

However, since the set of modules loaded in the persistent environment only grows with time, knowing that this snapshot is equal to the current set of persistent modules is enough to know that the persistent environment did not change between the snapshot and the exit of wrap_env.

The two variables printing_cont and printing_depth are implementation details of the lazy computation strategy for printing_map. The counter printing_depth corresponds to the length of the paths that have been explored in the environment, and printing_cont stores the continuation necessary to explore the environment one step further.

One consequence is that with the current PR, we lose some progress in the computation of the printing_map when fenv is the identity, which could be avoided, but it is fine (if pessimistic) to restore it to a previous state.

@gasche
Copy link
Copy Markdown
Member

gasche commented Sep 8, 2020

Thanks! Here is what I understand now.

We are doing key/value caching with just one cache slot, where

  • printing_old and printing_pers are the "key"
  • printing_{map, depth, cont} are the "value"

with the subtlety that the value is not constant for a given key, it changes over time, but monotonically: it always correct to reset the cached value of a key to a previous state.

set_printing_env is responsible for computing (and caching) the value for a key, but will not recompute it if already in cache.

The new wrap_env code caches the cache: it ensures that the cache is in the same state before changing the environment and after it was changed -- and we are about to call set_printing_env to set it back to its previous state.

@gasche
Copy link
Copy Markdown
Member

gasche commented Sep 8, 2020

With what I understand now, I believe the PR is correct. Still, don't you think that it would be useful to codify some of this now-shared knowledge into code comments?

@gasche
Copy link
Copy Markdown
Member

gasche commented Sep 8, 2020

Wrapping with an identity fenv is done in two places: to print signatures, and in the corner case of a functor with an unnamed parameter (functor (_ : S) -> ...). If I understand correctly, the new code actually behaves worse than the previous one in common instances of theses cases. (Previously I thought that, in those cases, they would be reset to their initial value for the key, but not if the same_printing_env criteria applies.)

Would the following avoid the reset?

let wrap_env fenv ftree arg =
  let old_env = !printing_env in
  let old_depth = !printing_depth in
  let old_cont = !printing_cont in
  let old_pers = !printing_pers in
  let old_map = !printing_map in
  let new_env = fenv old_env in
  set_printing_env new_env;
  let tree = ftree arg in
  printing_pers := old_pers;
  printing_old := old_env;
  if old_env != new_env then begin
    printing_map := old_map;
    printing_depth := old_depth;
    printing_cont := old_cont;
  end;
  (* set_printing_env checks that persistent modules did not change *)
  set_printing_env env;
  tree

I also have a new naive question: why are we calling wrap_env with the same environment in the first place? This is not a no-op, given that it synchronizes the printer state with the persistent environment, but is this the reason why we are doing it? Do we only synchronize in this way at wrap_env or set_printing_env callsites; do we know they synchronize "often enough"?

@gasche
Copy link
Copy Markdown
Member

gasche commented Sep 8, 2020

  if old_env != new_env then begin

in fact this should probably be

  if old_env != !printing_env then begin

or

  if !printing_old != !printing_env then begin

@Octachron
Copy link
Copy Markdown
Member Author

I would suggest the slightly less efficient:

  if not (same_printing_env old_env) then ...
  set_printing_env old_env

I could certainly add comments that we are snapshotting the short-path cache, with printing_old and printing_pers as keys. Did you have other points in mind?

@gasche
Copy link
Copy Markdown
Member

gasche commented Sep 8, 2020

I trust you to add the right amount of comments; I would leave it to you so that we can go back to reviewing the rest of this exciting codebase.

@gasche
Copy link
Copy Markdown
Member

gasche commented Sep 8, 2020

Above I meant that you should rebase your PR with comments as you deem them necessary, but that I'm not necessarily planning to do a careful pass of re-review on them, so you should feel free to go ahead and merge.

On the other hand, I feel like a Changes entry would be good to have. The entire reason you are doing this work is to avoid a possible performance regression. If we screwed up and a regression there is, having a Changes entry will help our angry users track us down faster.

@Octachron
Copy link
Copy Markdown
Member Author

I agree that that a changelog entry is a good idea, since the change is not that trivial.

Concerning the use of wrap_env in tree_of_signature, the motivation seems unclear since the function requires the printing environment to have been set up before being called. Moreover, removing the call to wrap_env does not trigger the ire of the test suite.

@gasche
Copy link
Copy Markdown
Member

gasche commented Sep 8, 2020

Moreover, removing the call to wrap_env does not trigger the ire of the test suite.

You spent time reading Jacques' code, and somehow it changed you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants