Add CRUD doc to the DistributedArchitectureGuide by inespot · Pull Request #144710 · elastic/elasticsearch

inespot · 2026-03-22T14:01:45Z

Documents the full write path under the Indexing / CRUD subsection (coordinator to primary to replication), plus seq_no / primary term and checkpoint behavior (including out-of-order replication and MSU vs LCP).

github-actions · 2026-03-22T14:03:34Z

🔍 Preview links for changed docs

docs/internal/DistributedArchitectureGuide.md

github-actions · 2026-03-22T14:03:35Z

ℹ️ Important: Docs version tagging

👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version.

We use applies_to tags to mark version-specific features and changes.

Expand for a quick overview

When to use applies_to tags:

✅ At the page level to indicate which products/deployments the content applies to (mandatory)
✅ When features change state (e.g. preview, ga) in a specific version
✅ When availability differs across deployments and environments

What NOT to do:

❌ Don't remove or replace information that applies to an older version
❌ Don't add new information that applies to a specific version without an applies_to tag
❌ Don't forget that applies_to tags can be used at the page, section, and inline level

🤔 Need help?

Check out the cumulative docs guidelines
Reach out in the #docs Slack channel

inespot · 2026-03-25T14:45:56Z

I left out Lucene and locking details here since they’re covered (or slated) in other parts of the doc. But happy to reconsider and add short subsection in this section as a FLUP if preferred.

elasticsearchmachine · 2026-03-25T14:46:46Z

Pinging @elastic/es-distributed (Team:Distributed)

szybia · 2026-03-25T15:29:02Z

docs/internal/DistributedArchitectureGuide.md

+[routedBasedOnClusterVersion](https://github.com/elastic/elasticsearch/blob/v9.3.0/server/src/main/java/org/elasticsearch/action/support/replication/TransportReplicationAction.java#L1021)
+field. Each forward updates it to the forwarding node’s cluster state version so the next receiver
+[is on at least](https://github.com/elastic/elasticsearch/blob/v9.3.0/server/src/main/java/org/elasticsearch/action/support/replication/TransportReplicationAction.java#L898)
+that version before it redirects again.


learning: if i read code correctly, it will then retry once again on next cluster update, what happens if that fails? 🤔 fail but then what happens after

not sure whether worth adding info in here

If I am not mistaken, the retries are bounded by the request's timeout. So on each retry, the node will check whether or not the timeout has expired and fail the request if so.

Can add a note about this in the doc! 196dc4

szybia · 2026-03-25T15:31:09Z

docs/internal/DistributedArchitectureGuide.md

+The primary will first try to
+[acquire](https://github.com/elastic/elasticsearch/blob/v9.3.0/server/src/main/java/org/elasticsearch/action/support/replication/TransportReplicationAction.java#L484)
+an operation permit via [IndexShardOperationPermits]. Note that recovery and relocation are operations that can 
+intentionally grab all available permits with the goal of blocking in-flight writes. If the permit is successfully 


learning: what happens if we can't grab a permit 🤔

We delay the operation until permits are again available in "expected" situations (e.g. recovery or relocation holding all permits) or throw a scary error in unexpected ones, which then fails the request.
Happy to add a note about the delayed case!

Done in 196dc4

szybia · 2026-03-25T15:32:27Z

docs/internal/DistributedArchitectureGuide.md

+
+The Distributed team owns Elasticsearch's write path. A typical write begins as an HTTP/REST request, is routed to
+the appropriate primary shard, is applied to the shard's engine and translog, and is finally replicated across replicas
+before an ack is sent back to the client.


not sure whether worth documenting stateless behavior here 🤔

public knowledge AFAIK and might be handy, up to you

Suggested change

before an ack is sent back to the client.

the appropriate primary shard, is applied to the shard's engine and translog, and is finally replicated across replicas

before an ack is sent back to the client.

On stateless ack is sent after operation is added to translog and uploaded to blob storage, no replication needed for durability.

Stateless probably deserves its own dedicated section (which I was actually planning on tackling next 😇), but happy to add a note here in the meantime too!

leaving it with yourself, looks great as is! :)

Added a note in 196dc4

szybia · 2026-03-25T15:33:13Z

these are so useful for learning btw, ty!

docs/internal/DistributedArchitectureGuide.md

fcofdez

Looks good! I left a few comments

fcofdez · 2026-03-27T11:12:47Z

docs/internal/DistributedArchitectureGuide.md

+before an ack is sent back to the client.
+
+Note that in serverless Elasticsearch, durability is instead provided by writing to the translog and uploading to blob 
+storage. There is no replication process. See this 


I would say that we rely on the blob store properties to achieve the same reliability and fault tolerance.

Added in 1a8073

fcofdez · 2026-03-27T11:15:03Z

docs/internal/DistributedArchitectureGuide.md

+
+### Coordinator: REST to Transport
+
+A user sends a `PUT` or `POST` to `/{index}/_doc/{id}` (or `POST /{index}/_doc` with no id for auto-generated ids) to


I think that it's more interesting to describe the bulk API and maybe describe a bit the ingest pipelines and all the process that happens on the coordinating layer?

And maybe add a link to https://www.elastic.co/docs/deploy-manage/distributed-architecture/reading-and-writing-documents?

Done in 61c853

fcofdez · 2026-03-27T11:18:23Z

docs/internal/DistributedArchitectureGuide.md

+If the client did not supply `_id`, one is
+[generated](https://github.com/elastic/elasticsearch/blob/v9.3.0/server/src/main/java/org/elasticsearch/action/bulk/BulkOperation.java#L333)
+before routing. The latest applied [ClusterState](#cluster-state) supplies routing for the chosen index (`routingTable`,
+[ShardRouting](#cluster-state)). A routing key is chosen (by default the document `_id` but the request may


Maybe we should mention that the default routing key is computed differently depending on the index type.

Done in 1a8073 but let me know if there are any additional details you want to add

fcofdez · 2026-03-27T11:20:40Z

docs/internal/DistributedArchitectureGuide.md

+request to the engine (typically [InternalEngine], see [Engine](#engine) section for more details) that
+[generates](https://github.com/elastic/elasticsearch/blob/v9.3.0/server/src/main/java/org/elasticsearch/index/engine/InternalEngine.java#L1239)
+the sequence number for the operation, [updates](https://github.com/elastic/elasticsearch/blob/v9.3.0/server/src/main/java/org/elasticsearch/index/engine/InternalEngine.java#L1262)
+Lucene via an `IndexWriter`, and then


maybe we can mention why we append to the translog after indexing into Lucene?

Added a tentative note in 1a8073, but please let me know if that does capture all the relevant reasons!

fcofdez · 2026-03-27T11:26:57Z

docs/internal/DistributedArchitectureGuide.md

+
+The primary term is a monotonically increasing counter per shard, recorded in cluster state metadata (see
+[IndexMetadata#primaryTerm](https://github.com/elastic/elasticsearch/blob/v9.3.0/server/src/main/java/org/elasticsearch/cluster/metadata/IndexMetadata.java#L1199)).
+It advances when a primary is (re)selected, e.g. after a full restart or when a replica is promoted. The shard stamps


when a replica is promoted can be a bit misleading as that happens during a regular relocation and in that case we don't bump the primary term. Maybe mention somehow that it's bumped after a primary crash?

Done in 1a8073

fcofdez · 2026-03-27T11:29:45Z

docs/internal/DistributedArchitectureGuide.md

+and each [ReplicaResponse](https://github.com/elastic/elasticsearch/blob/v9.3.0/server/src/main/java/org/elasticsearch/action/support/replication/TransportReplicationAction.java#L1284)
+returns the replica’s local checkpoint and last synced global checkpoint).
+
+Note that the `InternalEngine` also tracks the


maybe we should mention that when _ids are auto generated we don't need to check for duplicates under certain circumstances too?

Added a note about it in 61c853

fcofdez · 2026-03-27T11:30:43Z

docs/internal/DistributedArchitectureGuide.md

+which is the sequence number up to which all in-sync copies have
+processed every earlier operation. The [ReplicationTracker] computes it as the minimum of those per-copy local checkpoints among in-sync shard copies.
+The primary [updates](https://github.com/elastic/elasticsearch/blob/v9.3.0/server/src/main/java/org/elasticsearch/index/seqno/ReplicationTracker.java#L1375)
+the global checkpoint whenever an in-sync copy’s local checkpoint advances, when the primary activates or takes over


maybe we can mention GlobalCheckpointSyncAction too?

Done in 1a8073

michalborek

LGTM

fcofdez

LGTM. Thanks for adding this!

…rics * upstream/main: (21 commits) Mute org.elasticsearch.xpack.esql.qa.mixed.MixedClusterEsqlSpecIT test {csv-spec:external-basic.topSnippetsFunction} elastic#145353 Mute org.elasticsearch.xpack.esql.qa.mixed.MixedClusterEsqlSpecIT test {csv-spec:external-basic.scoreFunction} elastic#145352 [DiskBBQ] Fix bug in NeighborQueue#popRawAndAddRaw (elastic#145324) Fix dense_vector default index options when using BFLOAT16 (elastic#145202) Use checked exceptions in entitlement constructor rules (elastic#145234) ESQL: DS: datasource file plugins should not return TEXT types (elastic#145334) Plumb DLM error store through to DlmFrozenTransition classes (elastic#145243) Make Settings.Builder.remove() fluent (elastic#145294) Add FLS tests for METRICS_INFO and TS_INFO (elastic#145211) Fix flaky SecurityFeatureResetTests (elastic#145063) [DOCS] Fix conflict markers in ESQL processing command list (elastic#145338) Skip certain metric assertions on Windows (elastic#144933) [ES|QL] Add schema reconciliation for multi-file external sources (elastic#145220) Simplify DiskBBQ dynamic visit ratio to linear (elastic#142784) ESQL: Disallow unmapped_fields=load with partial non-KEYWORD (elastic#144109) [Transform] Track Linked Projects (elastic#144399) Fix bulk scoring to process last batch instead of falling through to scalar tail (elastic#145316) Clean up TickerScheduleEngineTests (elastic#145303) [CI] ShardBulkInferenceActionFilterIT testRestart - Ensuring that secrets-inference index is available after full restart and unmuting test (elastic#145317) Add CRUD doc to the DistributedArchitectureGuide (elastic#144710) ...

* Add CRUD doc to the DistributedArchitectureGuide * Primary routing and execution * Nit * Fill out Replication * Primary Terms & Sequence Numbers * Fill out the checkpoint and gaps section * Style nits * Nits and typos * Retries, failures and serverless notes * Refer to blog post instead of serverless section * Clarify result returned to transport caller * Some clarifications and notes * Bulk request + auto generated id optimization * Typo * Fix internal marker * Small inaccuracy

Add CRUD doc to the DistributedArchitectureGuide

b03a866

elasticsearchmachine added the v9.4.0 label Mar 22, 2026

inespot and others added 9 commits March 22, 2026 23:44

Primary routing and execution

f726311

Nit

1cb757f

Merge branch 'main' into doc/crud

6c2ac63

Fill out Replication

93568c1

Primary Terms & Sequence Numbers

b817b3a

Merge branch 'main' into doc/crud

7754ed7

Fill out the checkpoint and gaps section

c97b55f

Style nits

82ad660

Nits and typos

d0fcfa2

inespot marked this pull request as ready for review March 25, 2026 14:46

inespot added the :Distributed/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. label Mar 25, 2026

elasticsearchmachine added the Team:Distributed Meta label for distributed team. label Mar 25, 2026

inespot added documentation >non-issue and removed Team:Distributed Meta label for distributed team. labels Mar 25, 2026

elasticsearchmachine added the Team:Distributed Meta label for distributed team. label Mar 25, 2026

fcofdez self-requested a review March 25, 2026 15:28

szybia reviewed Mar 25, 2026

View reviewed changes

inespot and others added 3 commits March 25, 2026 12:52

Retries, failures and serverless notes

196dc47

Refer to blog post instead of serverless section

1301e14

Merge branch 'main' into doc/crud

57ab4a2

michalborek reviewed Mar 26, 2026

View reviewed changes

docs/internal/DistributedArchitectureGuide.md Outdated Show resolved Hide resolved

Clarify result returned to transport caller

e7de358

fcofdez reviewed Mar 27, 2026

View reviewed changes

inespot and others added 6 commits March 27, 2026 12:00

Some clarifications and notes

1a8073b

Bulk request + auto generated id optimization

61c8530

Typo

a7e0af5

Fix internal marker

5043f49

Merge branch 'main' into doc/crud

6202881

Small inaccuracy

91c96a8

inespot requested review from fcofdez, michalborek and szybia March 27, 2026 21:49

michalborek approved these changes Mar 30, 2026

View reviewed changes

fcofdez approved these changes Mar 31, 2026

View reviewed changes

Merge branch 'main' into doc/crud

b0b2bab

inespot merged commit 1b38b00 into elastic:main Mar 31, 2026
12 checks passed


		### Coordinator: REST to Transport

		A user sends a `PUT` or `POST` to `/{index}/_doc/{id}` (or `POST /{index}/_doc` with no id for auto-generated ids) to

Conversation

inespot commented Mar 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Mar 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔍 Preview links for changed docs

Uh oh!

github-actions bot commented Mar 22, 2026

ℹ️ Important: Docs version tagging

When to use applies_to tags:

What NOT to do:

🤔 Need help?

Uh oh!

inespot commented Mar 25, 2026

Uh oh!

elasticsearchmachine commented Mar 25, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

inespot Mar 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

szybia commented Mar 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

fcofdez left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

michalborek left a comment

Choose a reason for hiding this comment

Uh oh!

inespot commented Mar 22, 2026 •

edited

Loading

github-actions bot commented Mar 22, 2026 •

edited

Loading

inespot Mar 25, 2026 •

edited

Loading

szybia commented Mar 25, 2026 •

edited

Loading