This repository was archived by the owner on Dec 23, 2023. It is now read-only.

Clock Skew(?) can cause the disruptor thread to crash #2068

@steveniemitz

Please answer these questions before submitting a bug report.

What version of OpenCensus are you using?

0.24.0, but the problem looks to be present in all versions

What JVM are you using (java -version)?

1.8.0

Occasionally, at process start, we'll see this error in our logs:

Exception in thread "OpenCensus.Disruptor-0" java.lang.RuntimeException: java.lang.IllegalArgumentException: Current time must be within or after the last bucket.
	at com.lmax.disruptor.FatalExceptionHandler.handleEventException(FatalExceptionHandler.java:45)
	at com.lmax.disruptor.dsl.ExceptionHandlerWrapper.handleEventException(ExceptionHandlerWrapper.java:18)
	at com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:187)
	at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:125)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Current time must be within or after the last bucket.
	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:141)
	at io.opencensus.implcore.stats.MutableViewData$IntervalMutableViewData.refreshBucketList(MutableViewData.java:308)
	at io.opencensus.implcore.stats.MutableViewData$IntervalMutableViewData.record(MutableViewData.java:262)
	at io.opencensus.implcore.stats.MeasureToViewMap.record(MeasureToViewMap.java:160)
	at io.opencensus.implcore.stats.StatsManager$StatsEvent.process(StatsManager.java:101)
	at io.opencensus.impl.internal.DisruptorEventQueue$DisruptorEventHandler.onEvent(DisruptorEventQueue.java:229)
	at io.opencensus.impl.internal.DisruptorEventQueue$DisruptorEventHandler.onEvent(DisruptorEventQueue.java:222)
	at com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:168)
	... 2 more

Once this happens, all publisher threads will eventually hang forever, since the LMAX Disruptor queue no longer has a consumer. The cause seems to be clock skew: a measurement gets a timestamp earlier than the start of the last bucket. There is even a TODO in the code to handle this:
https://github.com/census-instrumentation/opencensus-java/blob/master/impl_core/src/main/java/io/opencensus/implcore/stats/MutableViewData.java#L307
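
To illustrate the hang itself, here is a minimal sketch that uses a bounded `ArrayBlockingQueue` from the JDK as a stand-in for the Disruptor ring buffer. This is an assumption made for illustration only; it is not the OpenCensus or Disruptor code, and `ResilientConsumerSketch`, `startConsumer` and `publish` are hypothetical names. With an unguarded consumer, the first failing event kills the thread and publishers stall once the queue fills up. With a guarded consumer, the failure is counted and processing continues.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a single-consumer event queue; not the Disruptor API.
final class ResilientConsumerSketch {

    // Starts a daemon consumer. If guarded, a failing event is counted and skipped;
    // otherwise the exception escapes and the consumer thread terminates.
    static Thread startConsumer(BlockingQueue<Runnable> queue, boolean guarded, AtomicInteger failures) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    Runnable event = queue.take();
                    if (guarded) {
                        try {
                            event.run();
                        } catch (RuntimeException e) {
                            failures.incrementAndGet(); // keep consuming
                        }
                    } else {
                        event.run(); // an exception here ends the thread
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "consumer-sketch");
        t.setDaemon(true);
        t.start();
        return t;
    }

    // Publishes one failing event followed by n no-op events.
    // Returns how many no-op events were accepted before their timeout expired.
    static int publish(BlockingQueue<Runnable> queue, int n) {
        int accepted = 0;
        try {
            queue.put(() -> {
                throw new IllegalArgumentException("Current time must be within or after the last bucket.");
            });
            for (int i = 0; i < n; i++) {
                if (queue.offer(() -> { }, 200, TimeUnit.MILLISECONDS)) {
                    accepted++;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return accepted;
    }

    public static void main(String[] args) {
        AtomicInteger failures = new AtomicInteger();
        BlockingQueue<Runnable> unguarded = new ArrayBlockingQueue<>(2);
        startConsumer(unguarded, false, failures);
        System.out.println("unguarded accepted: " + publish(unguarded, 5));

        BlockingQueue<Runnable> guarded = new ArrayBlockingQueue<>(2);
        startConsumer(guarded, true, failures);
        System.out.println("guarded accepted: " + publish(guarded, 5));
    }
}
```

In the sketch the publisher uses a timed `offer`, so the stall shows up as rejected events rather than a permanent hang. A publisher that blocks without a timeout would wait forever, which matches the behavior described above.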

I'm not sure what the correct behavior should be when the measurement timestamp is before the bucket start, though. Possibly drop the event?
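
As a rough sketch of the "drop the event" option: the class below is a simplified, hypothetical model of an interval bucket list, not the actual `IntervalMutableViewData` implementation. The name `SkewTolerantBuckets`, its fields, and the choice to count drops are all assumptions. The point is only that the precondition becomes a branch that discards the event instead of throwing.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified model of an interval view's bucket list.
final class SkewTolerantBuckets {
    private final long bucketNanos;
    // Each entry is {bucketStartNanos, eventCount}; the newest bucket is last.
    private final Deque<long[]> buckets = new ArrayDeque<>();
    private long dropped;

    SkewTolerantBuckets(long bucketNanos, long startNanos) {
        this.bucketNanos = bucketNanos;
        buckets.addLast(new long[] {startNanos, 0});
    }

    // Records one event at nowNanos. Returns false and counts a drop when the
    // timestamp precedes the start of the last bucket, instead of throwing.
    boolean record(long nowNanos) {
        long[] last = buckets.peekLast();
        if (nowNanos < last[0]) {
            dropped++;
            return false;
        }
        // Append buckets until nowNanos falls inside the newest one.
        // (A real implementation would also cap the number of buckets.)
        while (nowNanos >= last[0] + bucketNanos) {
            last = new long[] {last[0] + bucketNanos, 0};
            buckets.addLast(last);
        }
        last[1]++;
        return true;
    }

    long droppedCount() {
        return dropped;
    }

    long lastBucketCount() {
        return buckets.peekLast()[1];
    }

    int bucketCount() {
        return buckets.size();
    }

    public static void main(String[] args) {
        SkewTolerantBuckets b = new SkewTolerantBuckets(10, 100);
        b.record(105); // lands in the first bucket
        b.record(99);  // earlier than the bucket start: dropped
        b.record(125); // advances the bucket list
        System.out.println("buckets=" + b.bucketCount() + " dropped=" + b.droppedCount());
    }
}
```

Whether dropping is acceptable depends on how much skew is expected. An alternative would be to clamp the timestamp to the start of the last bucket so the measurement is still counted.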
