Add tool elasticsearch-node unsafe-bootstrap#37696
Add tool elasticsearch-node unsafe-bootstrap#37696andrershov merged 16 commits intoelastic:masterfrom
Conversation
|
Pinging @elastic/es-distributed |
|
run elasticsearch-ci/2 |
|
run elasticsearch-ci/2 |
DaveCTurner
left a comment
There was a problem hiding this comment.
I left mostly minor suggestions, but we should discuss how to deal with node ordinals >0 as I don't think we should just silently ignore them.
I will look in more depth at the docs later.
| } | ||
| } | ||
|
|
||
| public void test110RepairIndexCliPackaging() { |
There was a problem hiding this comment.
RepairIndex? I see that this was copied from the test above, which also shouldn't say Repair.
There was a problem hiding this comment.
Oops, I've changed the names of both tests in this commit e2abf3f
server/src/main/java/org/elasticsearch/cluster/coordination/NodeToolCli.java
Outdated
Show resolved
Hide resolved
| "\n" + | ||
| " WARNING: Elasticsearch MUST be stopped before running this tool.\n" + | ||
| "\n" + | ||
| "You should run this tool only if you've lost the majority of master eligible nodes.\n" + |
There was a problem hiding this comment.
Suggest:
You should run this tool only if you have permanently lost half or more of the master-eligible nodes, and you cannot restore the cluster from a snapshot. This tool can result in arbitrary data loss and should be the last resort.
I'm suggesting to avoid saying "lost a majority" because you also need this tool if you have an even number of nodes and you've lost half of them, which isn't strictly a majority.
Also could you explain how it can render the cluster "completely non-functional"? I mean I see how it can break indices, but it should allow a cluster to form and any broken indices can be deleted.
There was a problem hiding this comment.
0894fe4
I could not give you an example, I just wanted for this warning to sound as scary as possible.
| ClusterService.USER_DEFINED_META_DATA.getConcreteSetting("cluster.metadata.unsafe-bootstrap"); | ||
|
|
||
| UnsafeBootstrapMasterCommand() { | ||
| super("Unsafely bootstraps the master node if the majority of master eligible nodes is lost"); |
There was a problem hiding this comment.
Again, not strictly a majority. Maybe we should replace "bootstraps" with "forces the successful election of"?
server/src/main/java/org/elasticsearch/cluster/coordination/UnsafeBootstrapMasterCommand.java
Show resolved
Hide resolved
| try { | ||
| terminal.println(Terminal.Verbosity.VERBOSE, "Writing new global metadata to disk"); | ||
| long newGeneration = MetaData.FORMAT.write(newMetaData, dataPaths); | ||
| Manifest newManifest = new Manifest(manifest.getCurrentTerm(), manifest.getClusterStateVersion(), newGeneration, |
There was a problem hiding this comment.
I think we should increment the term, and log that we've done so.
There was a problem hiding this comment.
I've implemented increment in this commit 0e50913. For now, I'm logging new currentTerm to the terminal, will get back to it - once I figure out how our command line tools work with elasticsearch logs.
| @Override | ||
| protected void execute(Terminal terminal, OptionSet options, Environment env) throws Exception { | ||
| // We don't use classical "private static final Logger", because log4j2 initialization happens | ||
| // after UnsafeBootstrapMasterCommand instance creation. |
There was a problem hiding this comment.
Hmm, we make a private static final Logger logger in RemoveCorruptedShardDataCommand, does that not work?
Also, does this logger have the same configuration as loggers in Elasticsearch proper, i.e., it writes to the Elasticsearch log by default? If so, I think we should log more information about this tool having been run in that log.
There was a problem hiding this comment.
Ok, I have figured out how the logging works:
- Most of the commands, for example,
LoggingAwareCommandare passingCommandLoggingConfigurator::configureLoggingWithoutConfigcallback toCommandconstructor asbeforeMainargument. This is the function that will be executed whenCommand.mainis called. - From CommandLoggingConfigurator::configureLoggingWithoutConfig javadoc
Configure logging without reading a log4j2.properties file, effectively configuring the status logger and all loggers to the console.Basically it means that logs won't be appended to elasticsearch log files. - If you run
ShardToolCliindeed you getERROR StatusLogger No Log4j 2 configuration file found. Using default configuration. The reason for this is because logger initialization happens when log4j2 classes are loaded. Java Runtime loads classes when it first encounters this class, in case ofShardToolCliit will loadRemoveCorruptedShardDataCommandand transitivelyLoggerclass, because there is static reference toLoggerclass inRemoveCorruptedShardDataCommand. Since pre-main is not yet executed, logging is not configured. - It turned out that
NodeToolClistill suffers from the same warning, because there isClusterService(which in turn staticly referencesLogger) static reference inUnsafeBootstrapMasterCommand. The reason why I've not noticied it earlier, because I've added this setting last moment, after I've checked how binaries are working. - For now I've made the following fix e23f59a
- I suggest to revisit how logging works for commands in general in the follow-up PR. It seems that
beforeMaincould be completely removed and replaced by logging initialization in the command constructor. But this change touches a lot of file and I prefer separate PR.
| internalCluster().startNode( | ||
| Settings.builder() | ||
| .put(ElectMasterService.DISCOVERY_ZEN_MINIMUM_MASTER_NODES_SETTING.getKey(), Integer.MAX_VALUE) | ||
| .put(DiscoverySettings.INITIAL_STATE_TIMEOUT_SETTING.getKey(), "2s") //to ensure quick node startup |
server/src/test/java/org/elasticsearch/cluster/coordination/UnsafeBootstrapMasterIT.java
Show resolved
Hide resolved
| Settings lastNodeSettings = allNodesSettings.get(allNodesSettings.size() - 1); | ||
| List<Settings> newSettings = new ArrayList<>(); | ||
| newSettings.addAll(otherNodesSettings); | ||
| newSettings.add(Settings.builder().put(lastNodeSettings) |
There was a problem hiding this comment.
Could we bootstrap a random node rather than it always being the last of them?
Co-Authored-By: andrershov <andrershov@gmail.com>
|
@DaveCTurner thanks for your review! I've addressed all the comments except |
DaveCTurner
left a comment
There was a problem hiding this comment.
Ok, looks good apart from the removal of the docs that we discussed.
server/src/test/java/org/elasticsearch/cluster/coordination/UnsafeBootstrapMasterIT.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/coordination/UnsafeBootstrapMasterCommand.java
Show resolved
Hide resolved
| @@ -0,0 +1,119 @@ | |||
| [[node-tool]] | |||
| == elasticsearch-node | |||
There was a problem hiding this comment.
We decided to move the docs changes to a separate PR - #37812, so this should go.
|
run elasticsearch-ci/1 |
|
run elasticsearch-ci/1 |
This commit, mostly authored by @DaveCTurner, adds documentation for elasticsearch-node tool #37696.
This commit, mostly authored by @DaveCTurner, adds documentation for elasticsearch-node tool #37696. (cherry picked from commit 09425d5)
This commit, mostly authored by @DaveCTurner, adds documentation for elasticsearch-node tool #37696. (cherry picked from commit 09425d5)
This tool could be used if the majority of master-eligible nodes is lost.
Docs can be found in #37812