Conversation
|
This provides a nearly 50% speedup on a test dataset. |
lib/logstash/filters/geoip.rb
Outdated
There was a problem hiding this comment.
Instead of an ivar we could use an accessor method?
There was a problem hiding this comment.
Do we want to change that strategy where "3rd party" gems are actually required in the register method instead of in the module top? we all agree & understand that this is an anti-pattern and we should avoid requiring elsewhere than at module top (there are totally acceptable exceptions) but in our case, the purpose for this, IIRC was to actually requiring all 3rd party gems unless the plugins were actually specified in a config. Maybe this is not a concern anymore, not sure.
There was a problem hiding this comment.
I believe this is an anti-pattern too and we should move all the require at the top of the file and stop doing lazy require of library. There is no real gain in lazy mode, since the plugins need to be installed and loaded via Plugin#lookup to actually load the required files, this also help us track errors more easily.
There was a problem hiding this comment.
Nonetheless I'd like to make sure we understand why this was previously needed and why it is not now, not just "guessing" it should be ok now and change a construct that was purposefully valid at some point. It is certainly the case that it is not required now but I'd like that to not be guess work.
There was a problem hiding this comment.
Sorry, I have a habit of fixing little things as I go. I'm fairly certain this was just 'the style at the time' (I used to wear an onion on my belt), so I don't think its dangerous.
There was a problem hiding this comment.
I'm fairly certain it was not the style at the time and it was purposely there to avoid mass requiring all 3rd party gems when all the plugins were installed but not configured or something around these lines.
There was a problem hiding this comment.
STORY TIME!
The reason Logstash historically put require "some third party thing" in the register method was because .... computers.
The libraries we used in Logstash (JRuby) were not always available or installed for MRI, and at the time, I used MRI to generate the docs, and because the docs generator calls eval on the code 😃 those top-of-the-file require calls would fail because the gems weren't available. Why MRI for docs building (at the time)? Because, way back when, the only "good" markdown libraries for Ruby only worked in MRI. Uuuuugh computers.
So that's the story. require in register because of markdown and mri and jruby.
Today? I don't think these requirements are necessarily true anymore, though it would be nice to be able to build the docs without needing the runtime dependencies, we can probably hack around that as needed.
+1 on, long term, moving requires to the top of the file.
There was a problem hiding this comment.
awwwwwwww thanks for the background on this.
|
These LRU Cache additions are awesome, and very straightforward. 👍 |
|
I am OK with this PR! Test run! I have looked at the geoip code, there is an option to load the entire dataset into memory, but I think the LRU approach is more memory friendly and will provide an awesome speed boost! |
|
LGTM! |
|
Indeed. Compressed, the entire dataset is around 20MB, I think. Loading that into memory could be a bit much. |
|
Did we measure the uncompressed db memory size? 20MB is peanuts and ~100MB would still be peanuts. it could be configurable? LRU lookups should always be faster than even an in-memory db since there is no parsing to do but a combination or LRU & in-memory could be an extreme combo! |
» ls -lh GeoLiteCity-2013-01-18.dat
-rw-r--r-- 1 buh buh 18M Sep 15 10:49 GeoLiteCity-2013-01-18.datNot sure what it's compressed with, but there it is. |
|
created this #41 for investigating preload of the database. |
No description provided.