Skip to content

better year handling when format does not contain year#4

Merged
jsvd merged 2 commits intologstash-plugins:masterfrom
jsvd:new_year_bug
Nov 4, 2015
Merged

better year handling when format does not contain year#4
jsvd merged 2 commits intologstash-plugins:masterfrom
jsvd:new_year_bug

Conversation

@jsvd
Copy link
Copy Markdown
Member

@jsvd jsvd commented Jan 6, 2015

if the time format does not have year then:

  • use local year if month is the same as current one
  • use previous year if current month is January but event is December
  • use next year if current month is December but event is January
  • use local year otherwise

fixes #3

has very little impact on formats that have year.
For year-less formats there's extra allocation overhead per event (Time.now, time parsing is no longer direct to long, has to go through 2 DateTime objects)

Performance impact for this change was not analyzed.

@jsvd jsvd added the bug label Jan 6, 2015
@jordansissel
Copy link
Copy Markdown
Contributor

Eyeballed the code and tests. Looks ok!

I have not tested and I have also not done perf analysis.

@suyograo
Copy link
Copy Markdown
Contributor

@purbon how can we use the new perf test suite to analyze the implications of this PR? this is a good candidate to validate our perf platform

@purbon purbon assigned purbon and unassigned jsvd Jul 27, 2015
@purbon
Copy link
Copy Markdown

purbon commented Jul 27, 2015

@suyograo makes complete sense, will setup an example test case for this.

@purbon
Copy link
Copy Markdown

purbon commented Jul 27, 2015

@jsvd I think you might need to rebase this PR 😸

@jsvd
Copy link
Copy Markdown
Member Author

jsvd commented Jul 28, 2015

@purbon done 👍

@purbon
Copy link
Copy Markdown

purbon commented Jul 28, 2015

cool, will run this throw the performance gem, kinda want to see how it looks there 😺

@purbon
Copy link
Copy Markdown

purbon commented Aug 7, 2015

@jsvd running test with this PR make this output in my machine:


Failures:

  1) LogStash::Filters::Date parsing with java SimpleDateFormat syntax "{"mydate":"Nov 24 01:29:01 -0800"}" when processed
     Failure/Error: insist { subject["@timestamp"].time } == Time.iso8601(output).utc
     Insist::Failure:
       Expected 2015-11-24 09:29:01 UTC, but got 2015-08-07 12:51:30 UTC
     # ./spec/filters/date_spec.rb:96:in `(root)'

  2) LogStash::Filters::Date fill last year if december events arrive in january "Dec 31 23:59:00" when processed
     Failure/Error: insist { subject["@timestamp"].year } == 2013
       (<Time (class)>).now(no args)
           expected: 2 times with any arguments
           received: 3 times
     # ./lib/logstash/filters/date.rb:262:in `filter'
     # ./lib/logstash/filters/date.rb:232:in `filter'
     # ./lib/logstash/filters/date.rb:225:in `filter'
     # (eval):33:in `filter_func'
     # ./spec/filters/date_spec.rb:388:in `(root)'
     # ./spec/filters/date_spec.rb:388:in `(root)'

  3) LogStash::Filters::Date fill next year if january events arrive in december "Jan 01 01:00:00" when processed
     Failure/Error: insist { subject["@timestamp"].year } == 2014
       (<Time (class)>).now(no args)
           expected: 2 times with any arguments
           received: 3 times
     # ./lib/logstash/filters/date.rb:262:in `filter'
     # ./lib/logstash/filters/date.rb:232:in `filter'
     # ./lib/logstash/filters/date.rb:225:in `filter'
     # (eval):33:in `filter_func'
     # ./spec/filters/date_spec.rb:405:in `(root)'
     # ./spec/filters/date_spec.rb:405:in `(root)'

Finished in 1.35 seconds (files took 2.18 seconds to load)
50 examples, 3 failures

Failed examples:

I guess there might be something fishy going on here.

  • purbon

@jsvd
Copy link
Copy Markdown
Member Author

jsvd commented Aug 7, 2015

Strange:

% bundle exec rspec
Using Accessor#strict_set for specs
Run options: exclude {:redis=>true, :socket=>true, :performance=>true, :couchdb=>true, :elasticsearch=>true, :elasticsearch_secure=>true, :export_cypher=>true, :integration=>true, :windows=>true}
..................................................

Finished in 1.23 seconds (files took 2.99 seconds to load)
50 examples, 0 failures

Randomized with seed 42455

Can you provide me with a seed for a failing spec run?

@purbon
Copy link
Copy Markdown

purbon commented Aug 7, 2015

Hi,
checked again, this is the seed used.

Randomized with seed 28114

  • purbon

On Fri, Aug 7, 2015 at 3:16 PM João Duarte notifications@github.com wrote:

Strange:

% bundle exec rspec
Using Accessor#strict_set for specs
Run options: exclude {:redis=>true, :socket=>true, :performance=>true, :couchdb=>true, :elasticsearch=>true, :elasticsearch_secure=>true, :export_cypher=>true, :integration=>true, :windows=>true}
..................................................

Finished in 1.23 seconds (files took 2.99 seconds to load)
50 examples, 0 failures

Randomized with seed 42455

Can you provide me with a seed for a failing spec run?


Reply to this email directly or view it on GitHub
#4 (comment)
.

@purbon
Copy link
Copy Markdown

purbon commented Aug 7, 2015

I check with your seed and I get the same error. :-(

  • purbon

On Fri, Aug 7, 2015 at 3:32 PM Pere Urbon-Bayes pere.urbon@elastic.co
wrote:

Hi,
checked again, this is the seed used.

Randomized with seed 28114

  • purbon

On Fri, Aug 7, 2015 at 3:16 PM João Duarte notifications@github.com
wrote:

Strange:

% bundle exec rspec
Using Accessor#strict_set for specs
Run options: exclude {:redis=>true, :socket=>true, :performance=>true, :couchdb=>true, :elasticsearch=>true, :elasticsearch_secure=>true, :export_cypher=>true, :integration=>true, :windows=>true}
..................................................

Finished in 1.23 seconds (files took 2.99 seconds to load)
50 examples, 0 failures

Randomized with seed 42455

Can you provide me with a seed for a failing spec run?


Reply to this email directly or view it on GitHub
#4 (comment)
.

@ph
Copy link
Copy Markdown
Contributor

ph commented Aug 7, 2015

on a clean clone, one test is failing on me.


Failures:

  1) LogStash::Filters::Date fill last year if december events arrive in january "Dec 31 23:59:00" when processed
     Failure/Error: insist { subject["@timestamp"].year } == 2013
     Insist::Failure:
       Expected 2013, but got 2014
     # ./spec/filters/date_spec.rb:388:in `(root)'

Finished in 1.56 seconds (files took 2.48 seconds to load)
50 examples, 1 failure

Failed examples:

rspec ./spec/filters/date_spec.rb:385 # LogStash::Filters::Date fill last year if december events arrive in january "Dec 31 23:59:00" when processed

Randomized with seed 54520

@ph
Copy link
Copy Markdown
Contributor

ph commented Aug 7, 2015

used @purbon's seed I have the same issue

Failures:

  1) LogStash::Filters::Date fill last year if december events arrive in january "Dec 31 23:59:00" when processed
     Failure/Error: insist { subject["@timestamp"].year } == 2013
     Insist::Failure:
       Expected 2013, but got 2014
     # ./spec/filters/date_spec.rb:388:in `(root)'

Finished in 1.69 seconds (files took 2.18 seconds to load)
50 examples, 1 failure

Failed examples:

rspec ./spec/filters/date_spec.rb:385 # LogStash::Filters::Date fill last year if december events arrive in january "Dec 31 23:59:00" when processed

Randomized with seed 28114

/t/logstash-filter-date git:pr/4 ❯❯❯ ruby -v                                                                                                                                                                                                                                                                                                                            ⏎
jruby 1.7.20 (1.9.3p551) 2015-05-04 3086e6a on Java HotSpot(TM) 64-Bit Server VM 1.8.0_20-b26 +jit [darwin-x86_64]

@purbon
Copy link
Copy Markdown

purbon commented Aug 7, 2015

@ph interesting you see one issue, I see three. Might be we should set another variables in the config to setup the same environment for everyone. I have the feeling this is working different in three completely different machines (PT/DE/Canada).

  • purbon

@ph
Copy link
Copy Markdown
Contributor

ph commented Aug 7, 2015

Yeah, could be related to some locales

@jsvd
Copy link
Copy Markdown
Member Author

jsvd commented Aug 7, 2015

Ok so making sure that we set a timezone in the test plugin instance should make this work. awesome failure !

@purbon
Copy link
Copy Markdown

purbon commented Aug 7, 2015

😺

@jsvd
Copy link
Copy Markdown
Member Author

jsvd commented Aug 7, 2015

@purbon can you check that at least 2 of the 3 errors are gone now?

@purbon
Copy link
Copy Markdown

purbon commented Aug 7, 2015

my execution looks like this now:

purbon-elastic% bundle exec rspec
Using Accessor#strict_set for specs
Run options: exclude {:redis=>true, :socket=>true, :performance=>true, :couchdb=>true, :elasticsearch=>true, :elasticsearch_secure=>true, :export_cypher=>true, :integration=>true, :windows=>true}
................................................

Finished in 1.36 seconds (files took 1.84 seconds to load)
48 examples, 0 failures

Randomized with seed 33520

errors are gone for me 😸

@jsvd
Copy link
Copy Markdown
Member Author

jsvd commented Aug 7, 2015

@purbon can you teach me to run benchmarks on this pr?

@purbon
Copy link
Copy Markdown

purbon commented Aug 7, 2015

yeah!

@purbon
Copy link
Copy Markdown

purbon commented Aug 7, 2015

I was actually going to do it myself, when I encounter this errors ... but is good if I can teach you too 😄

@purbon
Copy link
Copy Markdown

purbon commented Sep 7, 2015

Hi,
super interesting, I was running test here in order to continue doing benchmarking, however I found out that I still see:

Failures:

  1) LogStash::Filters::Date parsing with java SimpleDateFormat syntax "{"mydate":"Nov 24 01:29:01 -0800"}" when processed
     Failure/Error: insist { subject["@timestamp"].time } == Time.iso8601(output).utc
     Insist::Failure:
       Expected 2015-11-24 09:29:01 UTC, but got 2015-09-07 20:11:45 UTC
     # ./spec/filters/date_spec.rb:96:in `(root)'

  2) LogStash::Filters::Date fill next year if january events arrive in december "Jan 01 01:00:00" when processed
     Failure/Error: insist { subject["@timestamp"].year } == 2014
       (<Time (class)>).now(no args)
           expected: 2 times with any arguments
           received: 3 times
     # ./lib/logstash/filters/date.rb:262:in `filter'
     # ./lib/logstash/filters/date.rb:232:in `filter'
     # ./lib/logstash/filters/date.rb:225:in `filter'
     # (eval):33:in `filter_func'
     # ./spec/filters/date_spec.rb:407:in `(root)'
     # ./spec/filters/date_spec.rb:407:in `(root)'

  3) LogStash::Filters::Date fill last year if december events arrive in january "Dec 31 23:59:00" when processed
     Failure/Error: insist { subject["@timestamp"].year } == 2013
       (<Time (class)>).now(no args)
           expected: 2 times with any arguments
           received: 3 times
     # ./lib/logstash/filters/date.rb:262:in `filter'
     # ./lib/logstash/filters/date.rb:232:in `filter'
     # ./lib/logstash/filters/date.rb:225:in `filter'
     # (eval):33:in `filter_func'
     # ./spec/filters/date_spec.rb:389:in `(root)'
     # ./spec/filters/date_spec.rb:389:in `(root)'

Finished in 1.36 seconds (files took 1.92 seconds to load)
50 examples, 3 failures

Failed examples:

rspec ./spec/filters/date_spec.rb:94 # LogStash::Filters::Date parsing with java SimpleDateFormat syntax "{"mydate":"Nov 24 01:29:01 -0800"}" when processed
rspec ./spec/filters/date_spec.rb:404 # LogStash::Filters::Date fill next year if january events arrive in december "Jan 01 01:00:00" when processed
rspec ./spec/filters/date_spec.rb:386 # LogStash::Filters::Date fill last year if december events arrive in january "Dec 31 23:59:00" when processed

Randomized with seed 44127

when running the specs, I am missing something?

/cheers

@jsvd
Copy link
Copy Markdown
Member Author

jsvd commented Sep 8, 2015

hum..my first guess was local timezone, but I can't make it fail on others:

logstash-filter-date (git)-[new_year_bug] % TZ=Europe/Berlin bundle exec rspec --seed 44127
Using Accessor#strict_set for specs
Run options: exclude {:redis=>true, :socket=>true, :performance=>true, :couchdb=>true, :elasticsearch=>true, :elasticsearch_secure=>true, :export_cypher=>true, :integration=>true, :windows=>true}
..................................................

Finished in 1.1 seconds (files took 2.65 seconds to load)
50 examples, 0 failures

Randomized with seed 44127

@purbon
Copy link
Copy Markdown

purbon commented Nov 3, 2015

@jsvd I tested several times and everything works, also used another machine in a different time zone.

@purbon
Copy link
Copy Markdown

purbon commented Nov 3, 2015

LGTM

jsvd added 2 commits November 4, 2015 09:54
if the time format does not have year then:

* use local year if month is the same as current one
* use previous year if current month is January but event is December
* use next year if current month is December but event is January
* use local year otherwise
@jsvd jsvd merged commit cfd8949 into logstash-plugins:master Nov 4, 2015
@jordansissel
Copy link
Copy Markdown
Contributor

date filter version 2.1.0 published with this fix.

# Before
% bin/logstash -e 'input { generator { message => "Jan 01 01:12:23" count => 1 } } filter { date { match => [ "message", "MMM dd HH:mm:ss" ] } }'
{
  "message"    => "Jan 01 01:12:23",
...
  "@timestamp" => "2015-01-01T09:12:23.000Z",
}

# Upgrade
% bin/plugin update logstash-filter-date
Updating logstash-filter-date
Updated logstash-filter-date 2.0.2 to 2.1.0

# After
% bin/logstash -e 'input { generator { message => "Jan 01 01:12:23" count => 1 } } filter { date { match => [ "message", "MMM dd HH:mm:ss" ] } }'
{
  "message"    => "Jan 01 01:12:23",
...
  "@timestamp" => "2016-01-01T09:12:23.000Z",
}

@rtoma
Copy link
Copy Markdown

rtoma commented Jan 12, 2016

This bug bit me on Logstash 1.5.x - we now have ES indices dated 2015 with 2016 data.

A backport to 1.5.x is too late of course, but fixing the most popular version would have made a lot of sense.

@jsvd jsvd deleted the new_year_bug branch January 20, 2016 16:26
talevy added a commit to talevy/elasticsearch that referenced this pull request Jul 6, 2017
This is intended to keep parity of behavior between Logstash and Elasticsearch

here was the original PR: logstash-plugins/logstash-filter-date#4
@tishma
Copy link
Copy Markdown

tishma commented May 28, 2019

I know I'm quite late for this discussion, yet - I can't think of a case where we want result to appear in future unless event is in January, and we hit end-of-year clock skew.
I would consider using previous year as an ultimate default, say, whenever result is more than a few days in the future - otherwise, we end up with future timestamps in case we are indexing October or November in January or February for whatever reason.

@goller goller mentioned this pull request Jan 22, 2020
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Default year gets stale in running instance during calendar year change

7 participants