Skip to content

WIP Lru cache#40

Merged
andrewvc merged 4 commits intologstash-plugins:masterfrom
andrewvc:lru_cache
Sep 15, 2015
Merged

WIP Lru cache#40
andrewvc merged 4 commits intologstash-plugins:masterfrom
andrewvc:lru_cache

Conversation

@andrewvc
Copy link
Copy Markdown
Contributor

No description provided.

@andrewvc
Copy link
Copy Markdown
Contributor Author

This provides a nearly 50% speedup on a test dataset.

~/D/logstash-1.5.4 $ sh -c 'sleep 120 && killall -9 java' & bin/logstash -w2 -f complete.conf | pv -l > /dev/null
 927k 0:02:00 [7.73k/s] [                     <=>                                                  ]
fish: Process 40059, 'bin/logstash' from job 2, 'bin/logstash -w2 -f complete.co…' terminated by signal SIGKILL (Forced quit)
'sh -c 'sleep 120 && killall -9…' has ended
~/D/logstash-1.5.4 $ emacs Gemfile
~/D/logstash-1.5.4 $ sh -c 'sleep 120 && killall -9 java' & bin/logstash -w2 -f complete.conf | pv -l > /dev/null
1.57M 0:02:00 [13.1k/s] [                     <=>                                                  ]
fish: Process 40108, 'bin/logstash' from job 2, 'bin/logstash -w2 -f complete.co…' terminated by signal SIGKILL (Forced quit)
'sh -c 'sleep 120 && killall -9…' has ended

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of an ivar we could use an accessor method?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can do!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to change that strategy where "3rd party" gems are actually required in the register method instead of in the module top? we all agree & understand that this is an anti-pattern and we should avoid requiring elsewhere than at module top (there are totally acceptable exceptions) but in our case, the purpose for this, IIRC was to actually requiring all 3rd party gems unless the plugins were actually specified in a config. Maybe this is not a concern anymore, not sure.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is an anti-pattern too and we should move all the require at the top of the file and stop doing lazy require of library. There is no real gain in lazy mode, since the plugins need to be installed and loaded via Plugin#lookup to actually load the required files, this also help us track errors more easily.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nonetheless I'd like to make sure we understand why this was previously needed and why it is not now, not just "guessing" it should be ok now and change a construct that was purposefully valid at some point. It is certainly the case that it is not required now but I'd like that to not be guess work.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I have a habit of fixing little things as I go. I'm fairly certain this was just 'the style at the time' (I used to wear an onion on my belt), so I don't think its dangerous.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fairly certain it was not the style at the time and it was purposely there to avoid mass requiring all 3rd party gems when all the plugins were installed but not configured or something around these lines.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

STORY TIME!

The reason Logstash historically put require "some third party thing" in the register method was because .... computers.

The libraries we used in Logstash (JRuby) were not always available or installed for MRI, and at the time, I used MRI to generate the docs, and because the docs generator calls eval on the code 😃 those top-of-the-file require calls would fail because the gems weren't available. Why MRI for docs building (at the time)? Because, way back when, the only "good" markdown libraries for Ruby only worked in MRI. Uuuuugh computers.

So that's the story. require in register because of markdown and mri and jruby.

Today? I don't think these requirements are necessarily true anymore, though it would be nice to be able to build the docs without needing the runtime dependencies, we can probably hack around that as needed.

+1 on, long term, moving requires to the top of the file.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awwwwwwww thanks for the background on this.

@untergeek
Copy link
Copy Markdown
Contributor

These LRU Cache additions are awesome, and very straightforward. 👍

@ph
Copy link
Copy Markdown
Contributor

ph commented Sep 15, 2015

I am OK with this PR! Test run!

I have looked at the geoip code, there is an option to load the entire dataset into memory, but I think the LRU approach is more memory friendly and will provide an awesome speed boost!

@ph
Copy link
Copy Markdown
Contributor

ph commented Sep 15, 2015

LGTM!

@untergeek
Copy link
Copy Markdown
Contributor

Indeed. Compressed, the entire dataset is around 20MB, I think. Loading that into memory could be a bit much.

@colinsurprenant
Copy link
Copy Markdown
Contributor

Did we measure the uncompressed db memory size? 20MB is peanuts and ~100MB would still be peanuts. it could be configurable? LRU lookups should always be faster than even an in-memory db since there is no parsing to do but a combination or LRU & in-memory could be an extreme combo!

@untergeek
Copy link
Copy Markdown
Contributor

» ls -lh GeoLiteCity-2013-01-18.dat
-rw-r--r-- 1 buh buh 18M Sep 15 10:49 GeoLiteCity-2013-01-18.dat

Not sure what it's compressed with, but there it is.

@ph
Copy link
Copy Markdown
Contributor

ph commented Sep 15, 2015

created this #41 for investigating preload of the database.

andrewvc added a commit that referenced this pull request Sep 15, 2015
@andrewvc andrewvc merged commit 3cad838 into logstash-plugins:master Sep 15, 2015
@andrewvc andrewvc deleted the lru_cache branch September 15, 2015 17:09
@andrewvc andrewvc mentioned this pull request Sep 15, 2015
@suyograo suyograo mentioned this pull request Mar 3, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants