Add CRUD doc to the DistributedArchitectureGuide#144710
Add CRUD doc to the DistributedArchitectureGuide#144710inespot merged 21 commits intoelastic:mainfrom
Conversation
🔍 Preview links for changed docs |
ℹ️ Important: Docs version tagging👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version. We use applies_to tags to mark version-specific features and changes. Expand for a quick overviewWhen to use applies_to tags:✅ At the page level to indicate which products/deployments the content applies to (mandatory) What NOT to do:❌ Don't remove or replace information that applies to an older version 🤔 Need help?
|
|
I left out Lucene and locking details here since they’re covered (or slated) in other parts of the doc. But happy to reconsider and add short subsection in this section as a FLUP if preferred. |
|
Pinging @elastic/es-distributed (Team:Distributed) |
| [routedBasedOnClusterVersion](https://github.com/elastic/elasticsearch/blob/v9.3.0/server/src/main/java/org/elasticsearch/action/support/replication/TransportReplicationAction.java#L1021) | ||
| field. Each forward updates it to the forwarding node’s cluster state version so the next receiver | ||
| [is on at least](https://github.com/elastic/elasticsearch/blob/v9.3.0/server/src/main/java/org/elasticsearch/action/support/replication/TransportReplicationAction.java#L898) | ||
| that version before it redirects again. |
There was a problem hiding this comment.
learning: if i read code correctly, it will then retry once again on next cluster update, what happens if that fails? 🤔 fail but then what happens after
not sure whether worth adding info in here
There was a problem hiding this comment.
If I am not mistaken, the retries are bounded by the request's timeout. So on each retry, the node will check whether or not the timeout has expired and fail the request if so.
There was a problem hiding this comment.
Can add a note about this in the doc! 196dc4
| The primary will first try to | ||
| [acquire](https://github.com/elastic/elasticsearch/blob/v9.3.0/server/src/main/java/org/elasticsearch/action/support/replication/TransportReplicationAction.java#L484) | ||
| an operation permit via [IndexShardOperationPermits]. Note that recovery and relocation are operations that can | ||
| intentionally grab all available permits with the goal of blocking in-flight writes. If the permit is successfully |
There was a problem hiding this comment.
learning: what happens if we can't grab a permit 🤔
|
|
||
| The Distributed team owns Elasticsearch's write path. A typical write begins as an HTTP/REST request, is routed to | ||
| the appropriate primary shard, is applied to the shard's engine and translog, and is finally replicated across replicas | ||
| before an ack is sent back to the client. |
There was a problem hiding this comment.
not sure whether worth documenting stateless behavior here 🤔
public knowledge AFAIK and might be handy, up to you
| before an ack is sent back to the client. | |
| the appropriate primary shard, is applied to the shard's engine and translog, and is finally replicated across replicas | |
| before an ack is sent back to the client. | |
| On stateless ack is sent after operation is added to translog and uploaded to blob storage, no replication needed for durability. |
There was a problem hiding this comment.
Stateless probably deserves its own dedicated section (which I was actually planning on tackling next 😇), but happy to add a note here in the meantime too!
There was a problem hiding this comment.
leaving it with yourself, looks great as is! :)
|
these are so useful for learning btw, ty! |
fcofdez
left a comment
There was a problem hiding this comment.
Looks good! I left a few comments
| before an ack is sent back to the client. | ||
|
|
||
| Note that in serverless Elasticsearch, durability is instead provided by writing to the translog and uploading to blob | ||
| storage. There is no replication process. See this |
There was a problem hiding this comment.
I would say that we rely on the blob store properties to achieve the same reliability and fault tolerance.
|
|
||
| ### Coordinator: REST to Transport | ||
|
|
||
| A user sends a `PUT` or `POST` to `/{index}/_doc/{id}` (or `POST /{index}/_doc` with no id for auto-generated ids) to |
There was a problem hiding this comment.
I think that it's more interesting to describe the bulk API and maybe describe a bit the ingest pipelines and all the process that happens on the coordinating layer?
There was a problem hiding this comment.
And maybe add a link to https://www.elastic.co/docs/deploy-manage/distributed-architecture/reading-and-writing-documents?
| If the client did not supply `_id`, one is | ||
| [generated](https://github.com/elastic/elasticsearch/blob/v9.3.0/server/src/main/java/org/elasticsearch/action/bulk/BulkOperation.java#L333) | ||
| before routing. The latest applied [ClusterState](#cluster-state) supplies routing for the chosen index (`routingTable`, | ||
| [ShardRouting](#cluster-state)). A routing key is chosen (by default the document `_id` but the request may |
There was a problem hiding this comment.
Maybe we should mention that the default routing key is computed differently depending on the index type.
There was a problem hiding this comment.
Done in 1a8073 but let me know if there are any additional details you want to add
| request to the engine (typically [InternalEngine], see [Engine](#engine) section for more details) that | ||
| [generates](https://github.com/elastic/elasticsearch/blob/v9.3.0/server/src/main/java/org/elasticsearch/index/engine/InternalEngine.java#L1239) | ||
| the sequence number for the operation, [updates](https://github.com/elastic/elasticsearch/blob/v9.3.0/server/src/main/java/org/elasticsearch/index/engine/InternalEngine.java#L1262) | ||
| Lucene via an `IndexWriter`, and then |
There was a problem hiding this comment.
maybe we can mention why we append to the translog after indexing into Lucene?
There was a problem hiding this comment.
Added a tentative note in 1a8073, but please let me know if that does capture all the relevant reasons!
|
|
||
| The primary term is a monotonically increasing counter per shard, recorded in cluster state metadata (see | ||
| [IndexMetadata#primaryTerm](https://github.com/elastic/elasticsearch/blob/v9.3.0/server/src/main/java/org/elasticsearch/cluster/metadata/IndexMetadata.java#L1199)). | ||
| It advances when a primary is (re)selected, e.g. after a full restart or when a replica is promoted. The shard stamps |
There was a problem hiding this comment.
when a replica is promoted can be a bit misleading as that happens during a regular relocation and in that case we don't bump the primary term. Maybe mention somehow that it's bumped after a primary crash?
| and each [ReplicaResponse](https://github.com/elastic/elasticsearch/blob/v9.3.0/server/src/main/java/org/elasticsearch/action/support/replication/TransportReplicationAction.java#L1284) | ||
| returns the replica’s local checkpoint and last synced global checkpoint). | ||
|
|
||
| Note that the `InternalEngine` also tracks the |
There was a problem hiding this comment.
maybe we should mention that when _ids are auto generated we don't need to check for duplicates under certain circumstances too?
| which is the sequence number up to which all in-sync copies have | ||
| processed every earlier operation. The [ReplicationTracker] computes it as the minimum of those per-copy local checkpoints among in-sync shard copies. | ||
| The primary [updates](https://github.com/elastic/elasticsearch/blob/v9.3.0/server/src/main/java/org/elasticsearch/index/seqno/ReplicationTracker.java#L1375) | ||
| the global checkpoint whenever an in-sync copy’s local checkpoint advances, when the primary activates or takes over |
There was a problem hiding this comment.
maybe we can mention GlobalCheckpointSyncAction too?
fcofdez
left a comment
There was a problem hiding this comment.
LGTM. Thanks for adding this!
…rics
* upstream/main: (21 commits)
Mute org.elasticsearch.xpack.esql.qa.mixed.MixedClusterEsqlSpecIT test {csv-spec:external-basic.topSnippetsFunction} elastic#145353
Mute org.elasticsearch.xpack.esql.qa.mixed.MixedClusterEsqlSpecIT test {csv-spec:external-basic.scoreFunction} elastic#145352
[DiskBBQ] Fix bug in NeighborQueue#popRawAndAddRaw (elastic#145324)
Fix dense_vector default index options when using BFLOAT16 (elastic#145202)
Use checked exceptions in entitlement constructor rules (elastic#145234)
ESQL: DS: datasource file plugins should not return TEXT types (elastic#145334)
Plumb DLM error store through to DlmFrozenTransition classes (elastic#145243)
Make Settings.Builder.remove() fluent (elastic#145294)
Add FLS tests for METRICS_INFO and TS_INFO (elastic#145211)
Fix flaky SecurityFeatureResetTests (elastic#145063)
[DOCS] Fix conflict markers in ESQL processing command list (elastic#145338)
Skip certain metric assertions on Windows (elastic#144933)
[ES|QL] Add schema reconciliation for multi-file external sources (elastic#145220)
Simplify DiskBBQ dynamic visit ratio to linear (elastic#142784)
ESQL: Disallow unmapped_fields=load with partial non-KEYWORD (elastic#144109)
[Transform] Track Linked Projects (elastic#144399)
Fix bulk scoring to process last batch instead of falling through to scalar tail (elastic#145316)
Clean up TickerScheduleEngineTests (elastic#145303)
[CI] ShardBulkInferenceActionFilterIT testRestart - Ensuring that secrets-inference index is available after full restart and unmuting test (elastic#145317)
Add CRUD doc to the DistributedArchitectureGuide (elastic#144710)
...
* Add CRUD doc to the DistributedArchitectureGuide * Primary routing and execution * Nit * Fill out Replication * Primary Terms & Sequence Numbers * Fill out the checkpoint and gaps section * Style nits * Nits and typos * Retries, failures and serverless notes * Refer to blog post instead of serverless section * Clarify result returned to transport caller * Some clarifications and notes * Bulk request + auto generated id optimization * Typo * Fix internal marker * Small inaccuracy
* Add CRUD doc to the DistributedArchitectureGuide * Primary routing and execution * Nit * Fill out Replication * Primary Terms & Sequence Numbers * Fill out the checkpoint and gaps section * Style nits * Nits and typos * Retries, failures and serverless notes * Refer to blog post instead of serverless section * Clarify result returned to transport caller * Some clarifications and notes * Bulk request + auto generated id optimization * Typo * Fix internal marker * Small inaccuracy
Documents the full write path under the Indexing / CRUD subsection (coordinator to primary to replication), plus seq_no / primary term and checkpoint behavior (including out-of-order replication and MSU vs LCP).