Skip to content

Cache all available methods list for expression completion#1583

Closed
printercu wants to merge 10 commits intopry:masterfrom
printercu:master
Closed

Cache all available methods list for expression completion#1583
printercu wants to merge 10 commits intopry:masterfrom
printercu:master

Conversation

@printercu
Copy link
Copy Markdown

@printercu printercu commented Nov 13, 2016

Idea from #1548 with specs and jruby support.

For MRI we simply use RubyVM.stat as cache version, for other rubies we count
all defined instance methods. While we still iterates over ObjectSpace
we can have improvements from not sorting methods list.

However, counting instance methods for jruby may give errors (on module has been gc'ed, and other was defined with same methods count, and we won't see this methods) it seems to be not related to the most of real apps.

@printercu
Copy link
Copy Markdown
Author

Uh. Old rubies doesn't have RubyVM.stat. I think I'll just use fallback same as jruby for them.

@printercu
Copy link
Copy Markdown
Author

@jazzonmymind according to you comment #1548 (comment), later all_available_methods_cached can be just moved to instance methods without mutex.

@0x1eef
Copy link
Copy Markdown
Contributor

0x1eef commented Nov 13, 2016

https://github.com/pry/pry/blob/master/lib/pry/pry_instance.rb#L136 has to be updated to hold onto the completer instance. something like this could work:

@completer ||= config.completer.new(config.input, self)
@completer.call(...)

… probably take a while to review and test this.

@printercu
Copy link
Copy Markdown
Author

printercu commented Nov 13, 2016

Should I add it to this PR?

@printercu
Copy link
Copy Markdown
Author

@jazzonmymind does pry creates new instance on every binding.pry?

@0x1eef
Copy link
Copy Markdown
Contributor

0x1eef commented Nov 13, 2016

yup.

@0x1eef
Copy link
Copy Markdown
Contributor

0x1eef commented Nov 13, 2016

yup to both questions ;-) locally you could create a new branch from the work you've already done, and confirmed it has no bugs or unintended consequences, then merge into this branch and push but it's your call.

@0x1eef
Copy link
Copy Markdown
Contributor

0x1eef commented Nov 13, 2016

it won't be cleared all the time. it will be cleared between pry sessions, and i think that's fine.

@printercu
Copy link
Copy Markdown
Author

Here is it, tests are green.

@0x1eef
Copy link
Copy Markdown
Contributor

0x1eef commented Nov 13, 2016

thanks, but why are all_available_methods_cached and all_available_methods still class methods? i think we can make them private instance methods, like all_available_methods is.

@printercu
Copy link
Copy Markdown
Author

printercu commented Nov 13, 2016

.methods_cache_version and .all_available_methods doesn't depend on instance's state, so it's better to have them on class level.

@0x1eef
Copy link
Copy Markdown
Contributor

0x1eef commented Nov 13, 2016

okay, but what utility do they provide as class methods? they seem to be better suited as private instance methods to me, since they serve no purpose as a public API.

@printercu
Copy link
Copy Markdown
Author

Ready

@0x1eef
Copy link
Copy Markdown
Contributor

0x1eef commented Nov 13, 2016

thanks. i will test this soon.

@printercu
Copy link
Copy Markdown
Author

@R-obert any updates on it?

@0x1eef
Copy link
Copy Markdown
Contributor

0x1eef commented Dec 3, 2016

sorry, i've been busy with work and this weekend i'm travelling. it looks good to me, i just want to clone it locally and test how it works.

# all defined instance methods. While we still iterate over ObjectSpace
# we can have improvements from not sorting methods list.
def methods_cache_version
RubyVM.stat if defined?(RubyVM.stat)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to return here?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed. Nice catch, thanks!

@dgutov
Copy link
Copy Markdown
Contributor

dgutov commented Dec 12, 2016

@printercu Could you see whether #1588 improves your use case as well, and if so, by how much?


# For MRI we simply use RubyVM.stat, for other rubies we count
# all defined instance methods. While we still iterate over ObjectSpace
# we can have improvements from not sorting methods list.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure that sorting is the slow part?

In my case (using MRI), and with my branch, the ObjectSpace scan takes like 110ms, and the sort+uniq took just 45ms. Moving uniq! forward to be called first turned that into 30ms.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are my benchmarks from benchmark-ips.

  • methods_cache_version - implementation from this PR, without first line (counts all methods, for jruby and old rubies).
  • all_available_methods - implementation from this pr
  • all_available_methods_2 - with swapped sort & uniq
  • all_available_methods_im - from your pr
  • all_available_methods_im_2 - with swapped sort and uniq back (as in master)

My app setup:

> methods_cache_version
=> 33752 # methods count
> ObjectSpace.each_object(Module).count
=> 7024
methods_cache_version:      27.9 i/s
all_available_methods_im_2: 11.6 i/s - 2.41x  slower
all_available_methods_im:   10.8 i/s - 2.57x  slower
all_available_methods:      0.6 i/s  - 42.90x  slower
all_available_methods_2:    0.6 i/s  - 44.13x  slower

Adding RubyVM.stat if defined?(RubyVM.stat) as :mri:

                 mri:   905997.4 i/s
methods_cache_version:      27.0 i/s - 33540.07x  slower
all_available_methods_im:   12.7 i/s - 71228.09x  slower
all_available_methods_im_2: 11.4 i/s - 79257.78x  slower
all_available_methods_2:    0.6 i/s  - 1577378.45x  slower
all_available_methods:      0.6 i/s  - 1635400.51x  slower

So:

  • Caching is faster than not-caching x2.4 times on old rubies, and jruby.
  • and x33540 times on modern mri.
  • I'll add stuff from your pr.

Really great, thanks! Do you agree we should use cache?

@printercu
Copy link
Copy Markdown
Author

Copied #1583 (comment)

Here are my benchmarks from benchmark-ips.

  • methods_cache_version - implementation from this PR, without first line (counts all methods, for jruby and old rubies).
  • all_available_methods - implementation from this pr
  • all_available_methods_2 - with swapped sort & uniq
  • all_available_methods_im - from your pr
  • all_available_methods_im_2 - with swapped sort and uniq back (as in master)

My app setup:

> methods_cache_version
=> 33752 # methods count
> ObjectSpace.each_object(Module).count
=> 7024
methods_cache_version:      27.9 i/s
all_available_methods_im_2: 11.6 i/s - 2.41x  slower
all_available_methods_im:   10.8 i/s - 2.57x  slower
all_available_methods:      0.6 i/s  - 42.90x  slower
all_available_methods_2:    0.6 i/s  - 44.13x  slower

Adding RubyVM.stat if defined?(RubyVM.stat) as :mri:

                 mri:   905997.4 i/s
methods_cache_version:      27.0 i/s - 33540.07x  slower
all_available_methods_im:   12.7 i/s - 71228.09x  slower
all_available_methods_im_2: 11.4 i/s - 79257.78x  slower
all_available_methods_2:    0.6 i/s  - 1577378.45x  slower
all_available_methods:      0.6 i/s  - 1635400.51x  slower

So:

  • Caching is faster than not-caching x2.4 times on old rubies, and jruby.
  • and x33540 times on modern mri.
  • I'll add stuff from your pr.

Really great, thanks! Do you agree we should use cache?

@dgutov
Copy link
Copy Markdown
Contributor

dgutov commented Dec 12, 2016

@printercu Could you please clarify the difference between methods_cache_version and all_available_methods?

@printercu
Copy link
Copy Markdown
Author

printercu commented Dec 12, 2016

methods_cache_version runs on every completion, and its result is used to check if cache is stale or not.
all_available_methods only runs to populate cache when its stale.

@dgutov
Copy link
Copy Markdown
Contributor

dgutov commented Dec 13, 2016

Ah, so these are the method names. Thanks, now I understand the results.

Regarding the non-MRI code path, I'm a bit uneasy that the computation of the cache key takes about 40% of the original computation. Have you tried asking JRuby guys about an alternative to RubyVM.stat? I could use it in a different project as well.

@@ -0,0 +1,64 @@
class Pry
module Helpers
module CompleterHelpers
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I'm a fan of this change, personally. Big-ish classes are not that bad.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mostly because this methods are static. I don't like the idea to have them inside instance methods. But due to #1583 (comment), I've moved them out of class << self.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd second the "private instance methods" suggestion actually, if they aren't called from anywhere else.

@dgutov
Copy link
Copy Markdown
Contributor

dgutov commented Dec 13, 2016

Do you agree we should use cache?

If 100ms is still too much for you, then sure. And on recent MRI it's blazing fast, so that's good.

But IMHO we should merge my change first: it's simpler, API-compatible, and while there's some risk in the module-traversal approach (as usual, due to misbehaving gems), it seems to hold up so far. Maybe even into the release branch?

@printercu
Copy link
Copy Markdown
Author

printercu commented Dec 13, 2016

I've included your change here. If your one would be merged first, I would rebase too much. I think it's just unnecessary work. And this one also does not provide any api change. omg

@dgutov
Copy link
Copy Markdown
Contributor

dgutov commented Dec 13, 2016

I've included your change here.

I didn't ask for that, though.

If your one would be merged first, I would rebase too much. I think it's just unnecessary work.

With some history rewriting, it shouldn't be too hard. Maybe by removing the commits that added my change, or just by taking the latest versions of the files, copying them over and making one commit. Keeping the intermediate is probably not worth it here.

Anyway, if my PR gets merged first, I can do the rebasing for you.

And this one also does not provide any api change. omg

It's a minor one, sure. But maybe there's a completer out there that depends on being instantiated anew each time, IDK.

Also, this patch might use a bit more cooking. You haven't replied to the question about JRuby's alternative to RubyVM.stat.

@printercu
Copy link
Copy Markdown
Author

Why don't you want to talk to jruby guys yourself? Anything else here I should do for you?

@0x1eef
Copy link
Copy Markdown
Contributor

0x1eef commented Apr 8, 2017

i would prefer to merge #1588 first, it includes a significant speed improvement without using a cache (5s to 0.16s), and the implementation is less complex than this one. we could add the cache in a follow up PR, and since this PR adopted the changes in #1588 it should be possible to do.

@printercu
Copy link
Copy Markdown
Author

@R-obert this pr includes #1588

@printercu
Copy link
Copy Markdown
Author

#ignored_modules is that method. I've applied it, and added specs for it.

@0x1eef
Copy link
Copy Markdown
Contributor

0x1eef commented Apr 8, 2017

i know, read my message again :)

@printercu
Copy link
Copy Markdown
Author

printercu commented Apr 8, 2017

omg. i just don't understand why this hard way is required, if you still want to accept this pr. we will get exactly the same result, but i will also resolve conflicts.

@0x1eef
Copy link
Copy Markdown
Contributor

0x1eef commented Apr 8, 2017

#1588 is preferable because its implementation is much more simple but it also improves performance significantly. we might not need a cache anymore, and if we do it's better as a PR based on that work, so we can see the added complexity and if it is worth it or not. @dgutov also made some good points about issues in this PR.

@mvz mvz mentioned this pull request May 13, 2017
@0x1eef
Copy link
Copy Markdown
Contributor

0x1eef commented Aug 20, 2017

Going to close this. #1588 was merged. If you want to add caching on top, please open another pull request.

@0x1eef 0x1eef closed this Aug 20, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants