Time series: concurrent single-row SQL INSERTs silently lose samples (sealed-slot lost update) #4453

@robfrank

Summary

Under many concurrent single-row INSERT INTO <timeseriesType> SET ts=…, <tag>=…, <field>=… statements (each in its own transaction, as happens over the HTTP/wire protocols), the time series engine silently drops samples. The client receives success on every insert (HTTP 200, no exception, no ConcurrentModificationException), yet the stored row count comes up short. Confirmed real data loss: a full table scan returns the same short count as count(*), and a GROUP BY sensor_id shows specific series missing rows.

The batched InfluxDB line-protocol path (POST /api/v1/ts/{db}/write, which appends many samples in one engine call/transaction) is not affected. Single-threaded inserts are not affected.

Impact

Silent, unacknowledged data loss on the time series write path under concurrency. Severity is high because there is no error surfaced to the client — an application doing concurrent per-row ingestion over SQL (HTTP or gRPC) loses samples without any signal.

Environment

  • Observed on 26.6.1-SNAPSHOT (== the arcadedata/arcadedb:latest image at time of writing).
  • Reproduced both embedded (in-process engine) and over HTTP against the Docker image.

Reproduction

A. Embedded (fast, deterministic at high concurrency)

A disabled regression test is included in PR #4450:
engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesConcurrentInsertTest.java

It creates a TIMESERIES TYPE … SHARDS 1, runs 48 threads × 5000 single-row INSERTs (each in its own database.transaction(...)), then asserts count(*) == 48*5000. Result: e.g. 239911 / 240000 — 89 samples lost, with no exception and no failed transaction.

Enable it by removing the @Disabled annotation.

B. Over HTTP (against the published image)

CREATE TIMESERIES TYPE sensor TIMESTAMP ts TAGS (sid STRING) FIELDS (v DOUBLE) SHARDS 1

Then 16 threads, each POSTing 5000 distinct single-row inserts to POST /api/v1/command/{db}:

{"language":"sql","command":"INSERT INTO sensor SET ts=<unique>, sid='s<thread>', v=<i>"}

SELECT count(*) FROM sensor returns short by 10–22 out of 80000 across runs.
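
For anyone who wants to rerun reproduction B, here is a minimal client sketch using only the JDK's java.net.http. It is not the harness that produced the numbers above. The endpoint and JSON body follow the description; the host and port (localhost:2480), the database name (tsdb), the root:playwithdata credentials, and the use of a plain integer for ts are all assumptions to adjust for your setup.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class HttpInsertRepro {
  static final int THREADS = 16;
  static final int ROWS_PER_THREAD = 5000;

  // JSON body for one single-row INSERT, matching the shape shown in the issue.
  static String insertPayload(long ts, int thread, int i) {
    return "{\"language\":\"sql\",\"command\":\"INSERT INTO sensor SET ts=" + ts
        + ", sid='s" + thread + "', v=" + i + "\"}";
  }

  // Assumption: any value unique across all threads is good enough for ts.
  static long uniqueTs(int thread, int i) {
    return (long) thread * ROWS_PER_THREAD + i;
  }

  // Sends one command and returns "<status> <body>".
  static String post(HttpClient client, String url, String auth, String body) throws Exception {
    HttpRequest req = HttpRequest.newBuilder(URI.create(url))
        .header("Content-Type", "application/json")
        .header("Authorization", auth)
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();
    HttpResponse<String> resp = client.send(req, HttpResponse.BodyHandlers.ofString());
    return resp.statusCode() + " " + resp.body();
  }

  public static void main(String[] args) throws Exception {
    // Assumed defaults; pass the command URL and user:password as arguments to override.
    String url = args.length > 0 ? args[0] : "http://localhost:2480/api/v1/command/tsdb";
    String creds = args.length > 1 ? args[1] : "root:playwithdata";
    String auth = "Basic " + Base64.getEncoder().encodeToString(creds.getBytes(StandardCharsets.UTF_8));

    HttpClient client = HttpClient.newHttpClient();
    AtomicInteger failures = new AtomicInteger();
    ExecutorService pool = Executors.newFixedThreadPool(THREADS);
    for (int t = 0; t < THREADS; t++) {
      final int thread = t;
      pool.execute(() -> {
        for (int i = 0; i < ROWS_PER_THREAD; i++) {
          try {
            String r = post(client, url, auth, insertPayload(uniqueTs(thread, i), thread, i));
            if (!r.startsWith("200 ")) failures.incrementAndGet();
          } catch (Exception e) {
            failures.incrementAndGet();
          }
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(30, TimeUnit.MINUTES);

    System.out.println("failed or non-200 requests: " + failures.get());
    System.out.println("expected rows: " + THREADS * ROWS_PER_THREAD);
    // Prints the raw response; compare the reported count with the expected rows by eye.
    System.out.println(post(client, url, auth,
        "{\"language\":\"sql\",\"command\":\"SELECT count(*) FROM sensor\"}"));
  }
}
```

Run it after creating the sensor type as shown above. The bug shows up as zero failed requests together with a count below the expected 80000.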

Observations on the trigger

  • Concurrency level (thread count) is the driver, not total row count. 5 threads often look clean; 16–48 threads reliably lose.
  • SHARDS 1 maximizes it (all appends contend on one shard).
  • Single-threaded never loses.
  • The unfiltered SELECT count(*) reflects the true (short) count; a full row scan agrees with it (so it is real loss, not a count(*) artifact).

Root cause analysis

The write position for a sample is a read-modify-write of the per-data-page sample count. In TimeSeriesBucket.appendSamples (engine/.../engine/timeseries/TimeSeriesBucket.java):

final int sampleCountInPage = dataPage.readShort(DATA_SAMPLE_COUNT_OFFSET) & 0xFFFF; // read slot
final int rowOffset = DATA_ROWS_OFFSET + sampleCountInPage * rowSize;
dataPage.writeLong(rowOffset, timestamps[i]);                                         // write row at slot
...
dataPage.writeShort(DATA_SAMPLE_COUNT_OFFSET, (short) (sampleCountInPage + 1));        // advance slot

TimeSeriesShard.appendSamples serializes appends per shard with an appendLock around a nested db.begin()/append/db.commit() on getWrappedDatabaseInstance() (and the comment there notes the lock exists specifically to avoid a commit-time ConcurrentModificationException on page 0). The append commit is nested inside the enclosing request transaction (LocalDatabase.begin() pushes a nested TransactionContext when a transaction is already active), and the durable publish is effectively deferred to the outer transaction commit (intentionally, so the bucket page writes ship to followers via the parent Raft WAL TX_ENTRY; the code references issue #4382). That outer commit happens outside appendLock.

A temporary collision detector added to appendSamples (recording fileId:pageNumber:slot per append) confirmed the mechanism: a later serialized append reads a stale sampleCountInPage because the prior append's increment of the count from N to N+1 is not yet visible, so two appends compute the same rowOffset and write the same slot N, and the later commit overwrites the earlier row. The page version advances normally, and PageManager.updatePageVersion's version check does not raise a conflict for these commits, so the existing outer-transaction retries never kick in.

Sample collision-detector output (48-thread embedded run):

DIAG COLLISION slot=1:1:0 pageVer=239 prevThread=74 curThread=88
DIAG COLLISION slot=1:1:1 pageVer=240 prevThread=68 curThread=87
DIAG COLLISION slot=1:1:2 pageVer=241 prevThread=42 curThread=68
...

(slot reused across threads; page version high and advancing while the slot/count read is stale.)
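
To make the mechanism concrete, here is a deliberately tiny, deterministic toy model of it. None of these types are ArcadeDB classes: Page, Tx and append are hypothetical stand-ins, and the model simply assumes what the analysis above describes, namely that an append reads the last published count under the lock while the publish happens later, outside the lock, with no conflict raised.

```java
/** Toy model only; it does not use or mirror the real ArcadeDB classes. */
final class SealedSlotLostUpdateSketch {

  /** Published (committed) state of one data page. */
  static final class Page {
    int sampleCount;                 // stands in for the value at DATA_SAMPLE_COUNT_OFFSET
    final long[] rows = new long[8]; // stands in for the row area
    int version;
  }

  /** One request transaction holding a single pending append. */
  static final class Tx {
    final Page page;
    final int slot;       // slot computed from the count visible at append time
    final long timestamp;

    Tx(Page page, long timestamp) {
      this.page = page;
      this.slot = page.sampleCount;  // read slot from the last published count
      this.timestamp = timestamp;
    }

    /** Outer commit: publish row and advanced count; no conflict is raised. */
    void commit() {
      page.rows[slot] = timestamp;
      page.sampleCount = slot + 1;
      page.version++;
    }
  }

  /** Stand-in for the locked append: serialized, but it publishes nothing. */
  static synchronized Tx append(Page page, long timestamp) {
    return new Tx(page, timestamp);
  }

  /** Two appends that never overlap under the lock still collide on slot 0. */
  static Page run() {
    Page page = new Page();
    Tx a = append(page, 1000L);  // sees count 0, picks slot 0
    Tx b = append(page, 2000L);  // a is not published yet, so also picks slot 0
    a.commit();                  // outside the lock
    b.commit();                  // outside the lock; overwrites a's row
    return page;
  }

  public static void main(String[] args) {
    Page p = run();
    System.out.println("stored=" + p.sampleCount + " slot0=" + p.rows[0] + " pageVer=" + p.version);
  }
}
```

Both commits succeed and the page version advances twice, yet only one sample is stored. That is the same shape as the collision-detector output: the slot is reused while pageVer keeps climbing.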

Why the batched path is safe

TimeSeriesEngine.appendBatch groups all of a shard's samples into a single transaction per shard, so there is never a second concurrent transaction reading a stale slot on the same page.

Relevant code

  • engine/.../query/sql/executor/SaveElementStep.java: saveToTimeSeries routes a SQL INSERT into engine.appendSamples.
  • engine/.../engine/timeseries/TimeSeriesEngine.java: appendSamples (per-row) vs appendBatch (line protocol).
  • engine/.../engine/timeseries/TimeSeriesShard.java: appendSamples (appendLock + nested begin/commit on the wrapped instance).
  • engine/.../engine/timeseries/TimeSeriesBucket.java: getOrCreateActiveDataPage + appendSamples (the DATA_SAMPLE_COUNT_OFFSET slot RMW).
  • engine/.../engine/PageManager.java: updatePageVersion (the version conflict check that is not catching this).

Suggested direction (for discussion)

The core tension is that appendLock serializes the in-memory append but the durable publish is deferred to the enclosing transaction's commit (kept that way so writes ship via the parent Raft WAL). Any fix needs to make the slot read-modify-write atomic with respect to the actual publish without breaking HA replication — e.g. ensuring the per-shard append publishes/serializes the slot under the lock, or ensuring the concurrent same-page commits raise a conflict so the existing outer-transaction retry path heals them. This is a delicate change to the transaction/HA path and warrants careful design plus HA + performance verification.
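
As a discussion aid for the second option (make same-page commits conflict so the retry path heals them), here is a toy sketch of the intended behavior. It is not a patch and does not use ArcadeDB classes: Page, Tx and the retry loop are hypothetical, and the sketch assumes each transaction remembers the page version it saw at append time and that commit compares it with the published version.

```java
import java.util.ConcurrentModificationException;

/** Toy model of "conflict at commit, then retry"; not ArcadeDB code. */
final class SlotConflictRetrySketch {

  /** Published (committed) state of one data page. */
  static final class Page {
    int sampleCount;
    final long[] rows = new long[8];
    int version;
  }

  /** One pending append that remembers the page version it was computed against. */
  static final class Tx {
    final Page page;
    final long timestamp;
    final int slot;
    final int seenVersion;

    Tx(Page page, long timestamp) {
      this.page = page;
      this.timestamp = timestamp;
      this.slot = page.sampleCount;
      this.seenVersion = page.version;
    }

    /** Commit refuses to publish if the page changed since the append. */
    void commit() {
      if (page.version != seenVersion)
        throw new ConcurrentModificationException("page changed since append");
      page.rows[slot] = timestamp;
      page.sampleCount = slot + 1;
      page.version++;
    }
  }

  /** Returns {stored sample count, number of retries}. */
  static int[] run() {
    Page page = new Page();
    Tx a = new Tx(page, 1000L);
    Tx b = new Tx(page, 2000L);  // same stale slot 0 as a
    a.commit();

    int retries = 0;
    Tx attempt = b;
    while (true) {
      try {
        attempt.commit();
        break;
      } catch (ConcurrentModificationException e) {
        retries++;
        attempt = new Tx(page, 2000L);  // redo the append against the published state
      }
    }
    return new int[] { page.sampleCount, retries };
  }

  public static void main(String[] args) {
    int[] r = run();
    System.out.println("stored=" + r[0] + " retries=" + r[1]);
  }
}
```

In this model the second commit fails once, recomputes its slot as 1, and both samples survive. Whether the real version check can be made to fire here without disturbing the Raft WAL path is exactly the open design question.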
