Fuzzy edits#1

Merged
s1monw merged 1 commit into s1monw:issues/4082 from clintongormley:issues/4082
Jan 9, 2014

Conversation

@clintongormley
Collaborator

No description provided.

Improve fuzzy docs
s1monw merged this pull request into s1monw:issues/4082 on Jan 9, 2014
s1monw pushed a commit that referenced this pull request Jan 17, 2014
clintongormley deleted the issues/4082 branch on June 6, 2014 at 16:02
s1monw pushed a commit that referenced this pull request May 28, 2015
This change adds a new "filter_path" parameter that can be used to filter and reduce the responses returned by the Elasticsearch REST API.

For example, returning only the shards that failed to be optimized:
```
curl -XPOST 'localhost:9200/beer/_optimize?filter_path=_shards.failed'
{"_shards":{"failed":0}}%
```

It supports multiple filters (separated by a comma):
```
curl -XGET 'localhost:9200/_mapping?pretty&filter_path=*.mappings.*.properties.name,*.mappings.*.properties.title'
```

It also supports the YAML response format. Here it returns only the `_id` field of a newly indexed document:
```
curl -XPOST 'localhost:9200/library/book?filter_path=_id' -d '---hello:\n  world: 1\n'
---
_id: "AU0j64-b-stVfkvus5-A"
```

It also supports wildcards. Here it returns only the host name of every node in the cluster:
```
curl -XGET 'http://localhost:9200/_nodes/stats?filter_path=nodes.*.host*'
{"nodes":{"lvJHed8uQQu4brS-SXKsNA":{"host":"portable"}}}
```

And "**" can be used to include sub fields without knowing the exact path. Here it returns only the Lucene version of every segment:
```
curl 'http://localhost:9200/_segments?pretty&filter_path=indices.**.version'
{
  "indices" : {
    "beer" : {
      "shards" : {
        "0" : [ {
          "segments" : {
            "_0" : {
              "version" : "5.2.0"
            },
            "_1" : {
              "version" : "5.2.0"
            }
          }
        } ]
      }
    }
  }
}
```

Note that Elasticsearch sometimes returns the raw value of a field directly, as with the `_source` field. If you want to filter `_source` fields, consider combining the existing `_source` parameter (see the Get API for more details) with the `filter_path` parameter, like this:

```
curl -XGET 'localhost:9200/_search?pretty&filter_path=hits.hits._source&_source=title'
{
  "hits" : {
    "hits" : [ {
      "_source":{"title":"Book #2"}
    }, {
      "_source":{"title":"Book #1"}
    }, {
      "_source":{"title":"Book #3"}
    } ]
  }
}
```
s1monw pushed a commit that referenced this pull request May 28, 2015
s1monw pushed a commit that referenced this pull request Jun 5, 2015
Google Compute Engine VM discovery allows using the Google APIs to perform automatic discovery (similar to multicast in non-hostile
multicast environments). Here is a simple sample configuration:

```yaml
  cloud:
      gce:
          project_id: <your-google-project-id>
          zone: <your-zone>
  discovery:
      type: gce
```

How to start (short story)
--------------------------

* Create Google Compute Engine instance
* Install Elasticsearch
* Install Google Compute Engine Cloud plugin
* Modify `elasticsearch.yml` file
* Start Elasticsearch
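
A rough shell sketch of the steps above; the plugin name and exact commands are assumptions for illustration, so check the cloud-gce plugin README for the install command matching your Elasticsearch version:

```sh
# Assumed commands, run from the Elasticsearch home directory on the GCE VM.
bin/plugin install cloud-gce     # plugin name/command are assumptions; see the plugin docs
vi config/elasticsearch.yml      # add the cloud.gce and discovery settings shown above
bin/elasticsearch -d             # start Elasticsearch as a daemon
```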

Closes #1.
s1monw pushed a commit that referenced this pull request Jun 10, 2015
It sounds like Jython 2.5.3 is leaking some threads.

Jython 2.5.4.rc1 has the same issue.

Jython 2.7-b3 fixes it.

Typical error when running tests:

```
ERROR   0.00s J2 | PythonScriptEngineTests (suite) <<<
   > Throwable #1: com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.elasticsearch.script.python.PythonScriptEngineTests:
   >    1) Thread[id=12, name=org.python.google.common.base.internal.Finalizer, state=WAITING, group=TGRP-PythonScriptEngineTests]
   >         at java.lang.Object.wait(Native Method)
   >         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
   >         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
   >         at org.python.google.common.base.internal.Finalizer.run(Finalizer.java:127)
   >    at __randomizedtesting.SeedInfo.seed([7A5ECFD8D0474383]:0)
   > Throwable #2: com.carrotsearch.randomizedtesting.ThreadLeakError: There are still zombie threads that couldn't be terminated:
   >    1) Thread[id=12, name=org.python.google.common.base.internal.Finalizer, state=WAITING, group=TGRP-PythonScriptEngineTests]
   >         at java.lang.Object.wait(Native Method)
   >         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
   >         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
   >         at org.python.google.common.base.internal.Finalizer.run(Finalizer.java:127)
   >    at __randomizedtesting.SeedInfo.seed([7A5ECFD8D0474383]:0)
```

Closes elastic#22.
s1monw pushed a commit that referenced this pull request Aug 4, 2015
[TESTS] cat indices times for UTC - timezones
s1monw pushed a commit that referenced this pull request Aug 13, 2015
The build fails with Maven 3.3.1 and 3.3.3. To reproduce, install one of the 3.3.x versions of Maven and run `mvn clean verify` in the root directory of the project. The build will fail in the QA: Smoke Test Shaded Jar module with the following error:

```
Started J0 PID(99979@flea.local).
Suite: org.elasticsearch.shaded.test.ShadedIT
  2> NOTE: reproduce with: ant test  -Dtestcase=ShadedIT -Dtests.method=testJodaIsNotOnTheCP -Dtests.seed=2F4D23A7462CF921 -Dtests.locale= -Dtests.timezone=Asia/Baku -Dtests.asserts=true -Dtests.file.encoding=UTF-8
FAILURE 0.06s | ShadedIT.testJodaIsNotOnTheCP <<<
  > Throwable #1: junit.framework.AssertionFailedError: Expected an exception but the test passed: java.lang.ClassNotFoundException
  > at __randomizedtesting.SeedInfo.seed([2F4D23A7462CF921:3A9404F1F69FD80]:0)
  > at junit.framework.Assert.fail(Assert.java:57)
  > at java.lang.Thread.run(Thread.java:745)
  2> NOTE: reproduce with: ant test  -Dtestcase=ShadedIT -Dtests.method=testGuavaIsNotOnTheCP -Dtests.seed=2F4D23A7462CF921 -Dtests.locale= -Dtests.timezone=Asia/Baku -Dtests.asserts=true -Dtests.file.encoding=UTF-8
FAILURE 0.01s | ShadedIT.testGuavaIsNotOnTheCP <<<
  > Throwable #1: junit.framework.AssertionFailedError: Expected an exception but the test passed: java.lang.ClassNotFoundException
  > at __randomizedtesting.SeedInfo.seed([2F4D23A7462CF921:C2502FD54D83433D]:0)
  > at junit.framework.Assert.fail(Assert.java:57)
  > at java.lang.Thread.run(Thread.java:745)
  2> NOTE: reproduce with: ant test  -Dtestcase=ShadedIT -Dtests.method=testjsr166eIsNotOnTheCP -Dtests.seed=2F4D23A7462CF921 -Dtests.locale= -Dtests.timezone=Asia/Baku -Dtests.asserts=true -Dtests.file.encoding=UTF-8
FAILURE 0.01s | ShadedIT.testjsr166eIsNotOnTheCP <<<
  > Throwable #1: junit.framework.AssertionFailedError: Expected an exception but the test passed: java.lang.ClassNotFoundException
  > at __randomizedtesting.SeedInfo.seed([2F4D23A7462CF921:35593286F4269392]:0)
  > at junit.framework.Assert.fail(Assert.java:57)
  > at java.lang.Thread.run(Thread.java:745)
  2> NOTE: leaving temporary files on disk at: /Users/Shared/Jenkins/Home/workspace/elasticsearch-master/qa/smoke-test-shaded/target/J0/temp/org.elasticsearch.shaded.test.ShadedIT_2F4D23A7462CF921-001
  2> NOTE: test params are: codec=CheapBastard, sim=DefaultSimilarity, locale=, timezone=Asia/Baku
  2> NOTE: Mac OS X 10.10.4 x86_64/Oracle Corporation 1.8.0_25 (64-bit)/cpus=8,threads=1,free=482137936,total=514850816
  2> NOTE: All tests run in this JVM: [ShadedIT]
Completed [1/1] in 6.61s, 5 tests, 3 failures <<< FAILURES!

Tests with failures:
  - org.elasticsearch.shaded.test.ShadedIT.testJodaIsNotOnTheCP
  - org.elasticsearch.shaded.test.ShadedIT.testGuavaIsNotOnTheCP
  - org.elasticsearch.shaded.test.ShadedIT.testjsr166eIsNotOnTheCP
```
Note that the build doesn't fail with Maven 3.2.x, and it doesn't fail if the mvn command is executed inside the qa/smoke-test-shaded directory. The error above can only be observed when the build is started from the root directory.

The reason is the shaded version, which depends on elasticsearch core.
When Maven builds only that module, elasticsearch core is not added to the dependency tree.

```sh
mvn dependency:tree -pl :smoke-test-shaded
```

```
[INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ smoke-test-shaded ---
[INFO] org.elasticsearch.qa:smoke-test-shaded:jar:2.0.0-beta1-SNAPSHOT
[INFO] +- org.elasticsearch.distribution.shaded:elasticsearch:jar:2.0.0-beta1-SNAPSHOT:compile
[INFO] |  +- org.apache.lucene:lucene-core:jar:5.2.1:compile
[INFO] |  +- org.apache.lucene:lucene-backward-codecs:jar:5.2.1:compile
[INFO] |  +- org.apache.lucene:lucene-analyzers-common:jar:5.2.1:compile
[INFO] |  +- org.apache.lucene:lucene-queries:jar:5.2.1:compile
[INFO] |  +- org.apache.lucene:lucene-memory:jar:5.2.1:compile
[INFO] |  +- org.apache.lucene:lucene-highlighter:jar:5.2.1:compile
[INFO] |  +- org.apache.lucene:lucene-queryparser:jar:5.2.1:compile
[INFO] |  +- org.apache.lucene:lucene-sandbox:jar:5.2.1:compile
[INFO] |  +- org.apache.lucene:lucene-suggest:jar:5.2.1:compile
[INFO] |  +- org.apache.lucene:lucene-misc:jar:5.2.1:compile
[INFO] |  +- org.apache.lucene:lucene-join:jar:5.2.1:compile
[INFO] |  +- org.apache.lucene:lucene-grouping:jar:5.2.1:compile
[INFO] |  +- org.apache.lucene:lucene-spatial:jar:5.2.1:compile
[INFO] |  \- com.spatial4j:spatial4j:jar:0.4.1:compile
[INFO] +- org.hamcrest:hamcrest-all:jar:1.3:test
[INFO] \- org.apache.lucene:lucene-test-framework:jar:5.2.1:test
[INFO]    +- org.apache.lucene:lucene-codecs:jar:5.2.1:test
[INFO]    +- com.carrotsearch.randomizedtesting:randomizedtesting-runner:jar:2.1.16:test
[INFO]    +- junit:junit:jar:4.11:test
[INFO]    \- org.apache.ant:ant:jar:1.8.2:test
```

But if the shade plugin is involved during the build, it modifies the `projectArtifactMap`:

```sh
mvn dependency:tree -pl org.elasticsearch.distribution.shaded:elasticsearch,:smoke-test-shaded
```

```
[INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ smoke-test-shaded ---
[INFO] org.elasticsearch.qa:smoke-test-shaded:jar:2.0.0-beta1-SNAPSHOT
[INFO] +- org.elasticsearch.distribution.shaded:elasticsearch:jar:2.0.0-beta1-SNAPSHOT:compile
[INFO] |  \- org.elasticsearch:elasticsearch:jar:2.0.0-beta1-SNAPSHOT:compile
[INFO] |     +- org.apache.lucene:lucene-backward-codecs:jar:5.2.1:compile
[INFO] |     +- org.apache.lucene:lucene-analyzers-common:jar:5.2.1:compile
[INFO] |     +- org.apache.lucene:lucene-queries:jar:5.2.1:compile
[INFO] |     +- org.apache.lucene:lucene-memory:jar:5.2.1:compile
[INFO] |     +- org.apache.lucene:lucene-highlighter:jar:5.2.1:compile
[INFO] |     +- org.apache.lucene:lucene-queryparser:jar:5.2.1:compile
[INFO] |     |  \- org.apache.lucene:lucene-sandbox:jar:5.2.1:compile
[INFO] |     +- org.apache.lucene:lucene-suggest:jar:5.2.1:compile
[INFO] |     |  \- org.apache.lucene:lucene-misc:jar:5.2.1:compile
[INFO] |     +- org.apache.lucene:lucene-join:jar:5.2.1:compile
[INFO] |     |  \- org.apache.lucene:lucene-grouping:jar:5.2.1:compile
[INFO] |     +- org.apache.lucene:lucene-spatial:jar:5.2.1:compile
[INFO] |     |  \- com.spatial4j:spatial4j:jar:0.4.1:compile
[INFO] |     +- com.google.guava:guava:jar:18.0:compile
[INFO] |     +- com.carrotsearch:hppc:jar:0.7.1:compile
[INFO] |     +- joda-time:joda-time:jar:2.8:compile
[INFO] |     +- org.joda:joda-convert:jar:1.2:compile
[INFO] |     +- com.fasterxml.jackson.core:jackson-core:jar:2.5.3:compile
[INFO] |     +- com.fasterxml.jackson.dataformat:jackson-dataformat-smile:jar:2.5.3:compile
[INFO] |     +- com.fasterxml.jackson.dataformat:jackson-dataformat-yaml:jar:2.5.3:compile
[INFO] |     |  \- org.yaml:snakeyaml:jar:1.12:compile
[INFO] |     +- com.fasterxml.jackson.dataformat:jackson-dataformat-cbor:jar:2.5.3:compile
[INFO] |     +- io.netty:netty:jar:3.10.3.Final:compile
[INFO] |     +- com.ning:compress-lzf:jar:1.0.2:compile
[INFO] |     +- com.tdunning:t-digest:jar:3.0:compile
[INFO] |     +- org.hdrhistogram:HdrHistogram:jar:2.1.6:compile
[INFO] |     +- org.apache.commons:commons-lang3:jar:3.3.2:compile
[INFO] |     +- commons-cli:commons-cli:jar:1.3.1:compile
[INFO] |     \- com.twitter:jsr166e:jar:1.1.0:compile
[INFO] +- org.hamcrest:hamcrest-all:jar:1.3:test
[INFO] \- org.apache.lucene:lucene-test-framework:jar:5.2.1:test
[INFO]    +- org.apache.lucene:lucene-codecs:jar:5.2.1:test
[INFO]    +- org.apache.lucene:lucene-core:jar:5.2.1:compile
[INFO]    +- com.carrotsearch.randomizedtesting:randomizedtesting-runner:jar:2.1.16:test
[INFO]    +- junit:junit:jar:4.11:test
[INFO]    \- org.apache.ant:ant:jar:1.8.2:test
```

A fix could consist of changing something on the Maven side. Something probably changed in a recent version and introduced this "issue", though it may not really be an issue but rather a fix.

There are two workarounds:

1) manually exclude elasticsearch core from the shaded version in the smoke-test-shaded module and manually add each Lucene lib needed by elasticsearch

2) add a new `elasticsearch-lucene` (lucene) POM module which simply declares all needed Lucene libs for subprojects (such as the smoke tester one).

I chose the latter.
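
A hedged sketch of what such an aggregating POM module could look like; the module coordinates and the dependency list here are illustrative, not the actual change:

```xml
<!-- Illustrative only: a plain "pom" packaging module that declares the Lucene
     artifacts elasticsearch needs, so test modules like smoke-test-shaded can
     depend on it instead of relying on the shaded jar's dependency tree. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-lucene</artifactId>
  <version>2.0.0-beta1-SNAPSHOT</version>
  <packaging>pom</packaging>
  <dependencies>
    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-core</artifactId>
      <version>5.2.1</version>
    </dependency>
    <!-- ... one <dependency> entry per Lucene artifact from the tree above ... -->
  </dependencies>
</project>
```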

Closes elastic#12791.
s1monw pushed a commit that referenced this pull request Aug 18, 2015
Indeed, we check within the test suite that we have no unassigned shards.

But when the test starts on my machine I get:

```
[elasticsearch] [2015-08-13 12:03:18,801][INFO ][org.elasticsearch.cluster.routing.allocation.decider] [Kehl of Tauran] low disk watermark [85%] exceeded on [eLujVjWAQ8OHdhscmaf0AQ][Jackhammer] free: 59.8gb[12.8%], replicas will not be assigned to this node
```

```
  2> REPRODUCE WITH: mvn verify -Pdev -Dskip.unit.tests -Dtests.seed=2AE3A3B7B13CE3D6 -Dtests.class=org.elasticsearch.smoketest.SmokeTestMultiIT -Dtests.method="test {yaml=smoke_test_multinode/10_basic/cluster health basic test, one index}" -Des.logger.level=ERROR -Dtests.assertion.disabled=false -Dtests.security.manager=true -Dtests.heap.size=512m -Dtests.locale=ar_YE -Dtests.timezone=Asia/Hong_Kong -Dtests.rest.suite=smoke_test_multinode
FAILURE 38.5s | SmokeTestMultiIT.test {yaml=smoke_test_multinode/10_basic/cluster health basic test, one index} <<<
   > Throwable #1: java.lang.AssertionError: expected [2xx] status code but api [cluster.health] returned [408 Request Timeout] [{"cluster_name":"prepare_release","status":"yellow","timed_out":true,"number_of_nodes":2,"number_of_data_nodes":2,"active_primary_shards":3,"active_shards":3,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":3,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":50.0}]
```

We no longer check whether we have unassigned shards, and we wait for `yellow` status instead of `green`.

Closes elastic#12852.
s1monw pushed a commit that referenced this pull request Aug 18, 2015
Indeed, we check within the test suite that we have no unassigned shards.

But when the test starts on my machine I get:

```
[elasticsearch] [2015-08-13 12:03:18,801][INFO ][org.elasticsearch.cluster.routing.allocation.decider] [Kehl of Tauran] low disk watermark [85%] exceeded on [eLujVjWAQ8OHdhscmaf0AQ][Jackhammer] free: 59.8gb[12.8%], replicas will not be assigned to this node
```

```
  2> REPRODUCE WITH: mvn verify -Pdev -Dskip.unit.tests -Dtests.seed=2AE3A3B7B13CE3D6 -Dtests.class=org.elasticsearch.smoketest.SmokeTestMultiIT -Dtests.method="test {yaml=smoke_test_multinode/10_basic/cluster health basic test, one index}" -Des.logger.level=ERROR -Dtests.assertion.disabled=false -Dtests.security.manager=true -Dtests.heap.size=512m -Dtests.locale=ar_YE -Dtests.timezone=Asia/Hong_Kong -Dtests.rest.suite=smoke_test_multinode
FAILURE 38.5s | SmokeTestMultiIT.test {yaml=smoke_test_multinode/10_basic/cluster health basic test, one index} <<<
   > Throwable #1: java.lang.AssertionError: expected [2xx] status code but api [cluster.health] returned [408 Request Timeout] [{"cluster_name":"prepare_release","status":"yellow","timed_out":true,"number_of_nodes":2,"number_of_data_nodes":2,"active_primary_shards":3,"active_shards":3,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":3,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":50.0}]
```

We no longer check whether we have unassigned shards, and we wait for `yellow` status instead of `green`.

Closes elastic#12852.

(cherry picked from commit da65493)
s1monw pushed a commit that referenced this pull request Dec 21, 2015
Update contributing.md to forbid `import foo.*`
s1monw pushed a commit that referenced this pull request Mar 11, 2016
Add REST test for:

* `.doc`
* `.docx`

The latter fails with:

```
==> Test Info: seed=DB93397128B876D4; jvm=1; suite=1
Suite: org.elasticsearch.ingest.attachment.IngestAttachmentRestIT
  2> REPRODUCE WITH: gradle :plugins:ingest-attachment:integTest -Dtests.seed=DB93397128B876D4 -Dtests.class=org.elasticsearch.ingest.attachment.IngestAttachmentRestIT -Dtests.method="test {yaml=ingest_attachment/30_files_supported/Test ingest attachment processor with .docx file}" -Des.logger.level=WARN -Dtests.security.manager=true -Dtests.locale=bg -Dtests.timezone=Europe/Athens
FAILURE 4.53s | IngestAttachmentRestIT.test {yaml=ingest_attachment/30_files_supported/Test ingest attachment processor with .docx file} <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: expected [2xx] status code but api [index] returned [400 Bad Request] [{"error":{"root_cause":[{"type":"parse_exception","reason":"Error parsing document in field [field1]"}],"type":"parse_exception","reason":"Error parsing document in field [field1]","caused_by":{"type":"tika_exception","reason":"Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@7f85baa5","caused_by":{"type":"illegal_state_exception","reason":"access denied (\"java.lang.RuntimePermission\" \"getClassLoader\")","caused_by":{"type":"access_control_exception","reason":"access denied (\"java.lang.RuntimePermission\" \"getClassLoader\")"}}}},"status":400}]
   >    at __randomizedtesting.SeedInfo.seed([DB93397128B876D4:53C706AB86441B2C]:0)
   >    at org.elasticsearch.test.rest.section.DoSection.execute(DoSection.java:107)
   >    at org.elasticsearch.test.rest.ESRestTestCase.test(ESRestTestCase.java:395)
   >    at java.lang.Thread.run(Thread.java:745)
```

Related to elastic#16864
s1monw pushed a commit that referenced this pull request Mar 16, 2016
Fixing some tests and compile problems in the reindex module
s1monw pushed a commit that referenced this pull request Jun 22, 2016
```
   > Throwable #1: java.lang.RuntimeException: file handle leaks: [SeekableByteChannel(/var/lib/jenkins/workspace/elastic+elasticsearch+master+g1gc/core/build/testrun/integTest/J0/temp/org.elasticsearch.search.suggest.CompletionSuggestSearch2xIT_518545A20D129C8C-001/tempDir-001/data/nodes/1/indices/4sTECv6WSJOJsw9L4CGamg/0/index/segments_1), SeekableByteChannel(/var/lib/jenkins/workspace/elastic+elasticsearch+master+g1gc/core/build/testrun/integTest/J0/temp/org.elasticsearch.search.suggest.CompletionSuggestSearch2xIT_518545A20D129C8C-001/tempDir-001/data/nodes/1/indices/4sTECv6WSJOJsw9L4CGamg/0/index/segments_1)]
   > 	at __randomizedtesting.SeedInfo.seed([518545A20D129C8C]:0)
   > 	at org.apache.lucene.mockfile.LeakFS.onClose(LeakFS.java:63)
   > 	at org.apache.lucene.mockfile.FilterFileSystem.close(FilterFileSystem.java:77)
   > 	at org.apache.lucene.mockfile.FilterFileSystem.close(FilterFileSystem.java:78)
   > 	at java.lang.Thread.run(Thread.java:745)
   > Caused by: java.lang.Exception
   > 	at org.apache.lucene.mockfile.LeakFS.onOpen(LeakFS.java:46)
   > 	at org.apache.lucene.mockfile.HandleTrackingFS.callOpenHook(HandleTrackingFS.java:81)
   > 	at org.apache.lucene.mockfile.HandleTrackingFS.newByteChannel(HandleTrackingFS.java:271)
   > 	at org.apache.lucene.mockfile.FilterFileSystemProvider.newByteChannel(FilterFileSystemProvider.java:212)
   > 	at org.apache.lucene.mockfile.HandleTrackingFS.newByteChannel(HandleTrackingFS.java:240)
   > 	at java.nio.file.Files.newByteChannel(Files.java:361)
   > 	at java.nio.file.Files.newByteChannel(Files.java:407)
   > 	at org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:77)
   > 	at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:94)
   > 	at org.apache.lucene.util.LuceneTestCase.slowFileExists(LuceneTestCase.java:2695)
   > 	at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:737)
   > 	at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:94)
   > 	at org.elasticsearch.common.lucene.Lucene$1.doBody(Lucene.java:237)
   > 	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:685)
   > 	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:637)
   > 	at org.elasticsearch.common.lucene.Lucene.checkSegmentInfoIntegrity(Lucene.java:242)
   > 	at org.elasticsearch.index.store.Store$MetadataSnapshot.loadMetadata(Store.java:847)
   > 	at org.elasticsearch.index.store.Store$MetadataSnapshot.<init>(Store.java:740)
   > 	at org.elasticsearch.index.store.Store.getMetadata(Store.java:260)
   > 	at org.elasticsearch.index.store.Store.getMetadata(Store.java:240)
   > 	at org.elasticsearch.index.shard.IndexShard.doCheckIndex(IndexShard.java:1310)
   > 	at org.elasticsearch.common.util.CancellableThreads.executeIO(CancellableThreads.java:102)
   > 	at org.elasticsearch.index.shard.IndexShard.checkIndex(IndexShard.java:1288)
   > 	at org.elasticsearch.index.shard.IndexShard.internalPerformTranslogRecovery(IndexShard.java:921)
   > 	at org.elasticsearch.index.shard.IndexShard.skipTranslogRecovery(IndexShard.java:964)
   > 	at org.elasticsearch.indices.recovery.RecoveryTarget.prepareForTranslogOperations(RecoveryTarget.java:297)
   > 	at
   ```
s1monw pushed a commit that referenced this pull request Aug 4, 2016
Old:
```
   > Throwable #1: java.lang.AssertionError: expected [2xx] status code but api [reindex] returned [400 Bad Request] [{"error":{"root_cause":[{"type":"parsing_exception","reason":"[reindex] failed to parse field [dest]","line":1,"col":25}],"type":"parsing_exception","reason":"[reindex] failed to parse field [dest]","line":1,"col":25,"caused_by":{"type":"illegal_argument_exception","reason":"[dest] unknown field [asdfadf], parser not found"}},"status":400}]
   >    at __randomizedtesting.SeedInfo.seed([9325F8C5C6F227DD:1B71C71F680E4A25]:0)
   >    at org.elasticsearch.test.rest.yaml.section.DoSection.execute(DoSection.java:119)
   >    at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.test(ESClientYamlSuiteTestCase.java:309)
   >    at java.lang.Thread.run(Thread.java:745)
```

New:
```
   > Throwable #1: java.lang.AssertionError: Failure at [reindex/10_basic:12]: expected [2xx] status code but api [reindex] returned [400 Bad Request] [{"error":{"root_cause":[{"type":"parsing_exception","reason":"[reindex] failed to parse field [dest]","line":1,"col":25}],"type":"parsing_exception","reason":"[reindex] failed to parse field [dest]","line":1,"col":25,"caused_by":{"type":"illegal_argument_exception","reason":"[dest] unknown field [asdfadf], parser not found"}},"status":400}]
   >    at __randomizedtesting.SeedInfo.seed([444DEEAF47322306:CC19D175E9CE4EFE]:0)
   >    at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.executeSection(ESClientYamlSuiteTestCase.java:329)
   >    at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.test(ESClientYamlSuiteTestCase.java:309)
   >    at java.lang.Thread.run(Thread.java:745)
   > Caused by: java.lang.AssertionError: expected [2xx] status code but api [reindex] returned [400 Bad Request] [{"error":{"root_cause":[{"type":"parsing_exception","reason":"[reindex] failed to parse field [dest]","line":1,"col":25}],"type":"parsing_exception","reason":"[reindex] failed to parse field [dest]","line":1,"col":25,"caused_by":{"type":"illegal_argument_exception","reason":"[dest] unknown field [asdfadf], parser not found"}},"status":400}]
   >    at org.elasticsearch.test.rest.yaml.section.DoSection.execute(DoSection.java:119)
   >    at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.executeSection(ESClientYamlSuiteTestCase.java:325)
   >    ... 37 more
```

Sorry for the longer stack trace, but I wanted to be sure I didn't throw
anything away by accident.
s1monw pushed a commit that referenced this pull request Jul 25, 2017
…point into lucene (elastic#25827)

When a replica processes out-of-order operations, it can drop some due to version comparisons. In the past that would have resulted in a VersionConflictException being thrown and the operation being totally ignored. With the seq# push, we started storing these operations in the translog (but not indexing them into Lucene) in order to have complete op histories to facilitate ops-based recoveries. This in turn had the undesired effect that deleted docs may be resurrected during recovery in some extreme edge situations (see the complete explanation below). This PR contains a simple fix, which is also an optimization for the recovery process: incoming operations that have a seq# lower than the current local checkpoint (i.e., have already been processed) should not be indexed into Lucene. Note that sometimes we can also skip storing them in the translog, but this is not required for the fix and is more complicated.

This is the equivalent of elastic#25592

## More details on resurrected ops 

Consider two operations: 
 - Index d1, seq no 1
 - Delete d1, seq no 3

On a replica they come out of order:
 - Translog gen 1 contains:
    - delete (seqNo 3)
 - Translog gen 2 contains:
    - index (seqNo 1) (wasn't indexed into lucene, but put into the translog)
    - another operation (seqNo 10)
 - Translog gen 3 
    - another op (seqNo 9)
 - Engine commits with:
    - local checkpoint 9
    - refers to gen 2 

If this replica becomes a primary:
    - Local recovery will replay translog gen 2 and up, causing index #1 to be re-indexed.
    - Even if recovery will start at gen 3, the translog retention policy will cause file based recovery to replay the entire translog. If it happens to start at gen 2 (but not 1), we will run into the same problem.

#### Some context - out of order delivery involving deletes:

On normal operations, this relies on the gc_deletes setting. We assume that the setting represents an upper bound on the time between the index and the delete operation. The index operation will be detected as stale based on the tombstone map in the LiveVersionMap.

Recovery presents a challenge as it can replay an old index operation that was in the translog and override a delete operation that was done when the engine was opened (and is not part of the replayed snapshot). To deal with this situation, we disable GC deletes (i.e. retain all deletes) for the duration of recoveries. This means that the delete operation will be remembered and the index operation ignored.

Both of the above scenarios (local recovery + peer recovery) create a situation where the delete operation is never replayed. It is thus "lost", as Lucene doesn't remember it happened and our LiveVersionMap is populated with it.

#### Solution:

Note that both local and peer recovery represent a scenario where we replay translog ops on top of an existing lucene index, potentially with ongoing indexing. Therefore we can treat them the same.

The local checkpoint in Lucene represents a marker indicating that all operations below it were performed on the index. This is the only form of "memory" that we have that relates to deletes. If we can achieve the following:
1) All ops below the local checkpoint are not indexed to lucene.
2) All ops above the local checkpoint are.

It will mean that all variants are covered: (i# == index op seq#, d# == delete op seq#, lc == local checkpoint in commit)
1) i# < d# <= lc - document is already deleted in lucene and stays that way.
2) i# <= lc < d# - delete is replayed on index - document is deleted
3) lc < i# < d# - index is replayed and then delete - document is deleted.

More formally - we want to make sure that for all ops o1 and o2 performed on the primary, if o2 is processed on a shard before o1, o1 will be dropped. We have the following scenarios:

1) If neither o1 nor o2 is included in the replayed snapshot and both are above it (i.e., have a higher seq#), they fall under the gc deletes assumption.
2) If o1 is part of the replayed snapshot but o2 is above it:
	- if o2 arrives first, o1 must arrive due to the recovery and potentially via replication as well. Since gc deletes is disabled, we are guaranteed to know of o2's existence.
3) If both o2 and o1 are part of the replayed snapshot:
	- we fall under the same scenarios as #2 - disabling GC deletes ensures we know of o2 if it arrives first.
4) If o1 falls before the snapshot and o2 is either part of the snapshot or higher:
	- Since the snapshot is guaranteed to contain all ops that are not part of lucene and are above the lc in the commit used, this means that o1 is part of lucene and o1 < local checkpoint. This means it won't be processed and we're not in the scenario we're discussing.
5) If o2 falls before the snapshot but o1 is part of it:
	- by the same reasoning above, o2 is < local checkpoint. Since o1 < o2, we also get o1 < local checkpoint and this will be dropped.


#### Implementation:

For local recovery, we can filter the ops we read from the translog and avoid replaying them. For peer recovery this is tricky, as we do want to send the operations in order to have some history on the target shard. Filtering operations on the engine level (i.e., not indexing into Lucene if op seq# <= lc) would work for both.
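
A minimal sketch of the resulting rule, using hypothetical names rather than the actual engine API:

```java
/**
 * Hedged sketch of the replica-side decision described above; the class,
 * enum, and method names are illustrative, not the real Elasticsearch code.
 */
class ReplicaOpPlannerSketch {
    enum Plan { PROCESS_NORMALLY, ADD_TO_TRANSLOG_ONLY }

    static Plan plan(long opSeqNo, long localCheckpoint) {
        if (opSeqNo <= localCheckpoint) {
            // Already processed on this shard: keep the op in the translog so
            // the history stays complete for ops-based recovery, but do not
            // index it into Lucene (prevents resurrecting deleted docs).
            return Plan.ADD_TO_TRANSLOG_ONLY;
        }
        return Plan.PROCESS_NORMALLY;
    }
}
```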
s1monw pushed a commit that referenced this pull request Sep 29, 2017
Even though you annotate the test class with `@ThirdParty`, the static
code is initialized.

In that case it fails with:

```
==> Test Info: seed=529C3C6977F695FC; jvms=3; suites=6
Suite: org.elasticsearch.repositories.azure.AzureSnapshotRestoreTests
ERROR   0.00s J2 | AzureSnapshotRestoreTests (suite) <<< FAILURES!
   > Throwable #1: java.lang.IllegalStateException: to run integration tests, you need to set -Dtests.thirdparty=true and -Dtests.azure.account=azure-account -Dtests.azure.key=azure-key
   >    at org.elasticsearch.cloud.azure.AzureTestUtils.generateMockSecureSettings(AzureTestUtils.java:37)
   >    at org.elasticsearch.repositories.azure.AzureSnapshotRestoreTests.generateMockSettings(AzureSnapshotRestoreTests.java:81)
   >    at org.elasticsearch.repositories.azure.AzureSnapshotRestoreTests.<clinit>(AzureSnapshotRestoreTests.java:84)
   >    at java.lang.Class.forName0(Native Method)
   >    at java.lang.Class.forName(Class.java:348)
Completed [1/6] on J2 in 2.21s, 0 tests, 1 error <<< FAILURES!
```

Closes elastic#26812.

(cherry picked from commit eb6d714 for master branch)
s1monw pushed a commit that referenced this pull request Nov 24, 2017
elastic#27515)

Previously:
```
   > Throwable #1: java.lang.AssertionError: Expected all shards successful but got successful [8] total [9]
   > Expected: <8>
   >      but: was <9>
```

Now:
```
   > Throwable #1: java.lang.AssertionError: Expected all shards successful
   > Expected: <9>
   >      but: was <8>
```
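
For context, a hedged sketch of the kind of assertion that produces the new output; the reason string becomes the first line and Hamcrest prints the expected and actual shard counts itself (names are illustrative, not the exact test code):

```java
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.equalTo;

class ShardAssertionSketch {
    // Hamcrest renders "Expected: <total>" / "but: was <successful>" below the
    // reason line, instead of a hand-built "successful [8] total [9]" message.
    static void assertAllShardsSuccessful(int successfulShards, int totalShards) {
        assertThat("Expected all shards successful", successfulShards, equalTo(totalShards));
    }
}
```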
s1monw pushed a commit that referenced this pull request Mar 23, 2018
In elastic#28350, we fixed an endless flushing loop which may happen on 
replicas by tightening the relation between the flush action and the
periodically flush condition.

1. The periodically flush condition is enabled only if it is disabled 
after a flush.

2. If the periodically flush condition is enabled then a flush will
actually happen regardless of Lucene state.

(1) and (2) guarantee that a flushing loop will be terminated. Sadly,
condition (1) can be violated in edge cases, as we used two different
algorithms to evaluate the current and future uncommitted translog size.

- We use the method `uncommittedSizeInBytes` to calculate the current
  uncommitted size. It is the sum of translogs whose generation is at least
the minGen (determined by a given seqno). We pick a continuous range of
translogs from the minGen onwards to evaluate the current uncommitted size.

- We use the method `sizeOfGensAboveSeqNoInBytes` to calculate the future
  uncommitted size. It is the sum of translogs whose maxSeqNo is at least
the given seqNo. Here we don't pick a range but select translogs one by one.

Suppose we have 3 translogs `gen1={#1,#2}, gen2={}, gen3={#3}` and
`seqno=#1`: `uncommittedSizeInBytes` is the sum of gen1, gen2, and gen3,
while `sizeOfGensAboveSeqNoInBytes` is the sum of gen1 and gen3 only. Gen2 is
excluded because its maxSeqNo is still -1.
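
A rough sketch contrasting the two calculations on that example; the `Gen` type and the byte sizes are made up for illustration, this is not the actual `Translog` code:

```java
import java.util.List;

class TranslogSizeSketch {
    record Gen(long generation, long maxSeqNo, long sizeInBytes) {}

    // continuous range: every generation from minGen onwards counts
    static long uncommittedSizeInBytes(List<Gen> gens, long minGen) {
        return gens.stream().filter(g -> g.generation() >= minGen)
                   .mapToLong(Gen::sizeInBytes).sum();
    }

    // per-generation filter: a generation counts only if its maxSeqNo >= seqNo
    static long sizeOfGensAboveSeqNoInBytes(List<Gen> gens, long seqNo) {
        return gens.stream().filter(g -> g.maxSeqNo() >= seqNo)
                   .mapToLong(Gen::sizeInBytes).sum();
    }

    public static void main(String[] args) {
        List<Gen> gens = List.of(
            new Gen(1, 2, 100),   // gen1 = {#1, #2}
            new Gen(2, -1, 50),   // gen2 = {} (empty, maxSeqNo still -1)
            new Gen(3, 3, 75));   // gen3 = {#3}
        System.out.println(uncommittedSizeInBytes(gens, 1));      // 225: gen1 + gen2 + gen3
        System.out.println(sizeOfGensAboveSeqNoInBytes(gens, 1)); // 175: gen1 + gen3 only
    }
}
```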

This commit removes both the `sizeOfGensAboveSeqNoInBytes` and
`uncommittedSizeInBytes` methods, then enforces the engine to use only the
`sizeInBytesByMinGen` method to evaluate the periodically flush condition.

Closes elastic#29097
Relates elastic#28350
s1monw pushed a commit that referenced this pull request Mar 27, 2018
In #testPruneOnlyDeletesAtMostLocalCheckpoint, we create a new engine
but mistakenly use the same translog directory as the existing engine.
This prevents translog files from being cleaned up when closing the engines.

ERROR   0.12s J2 | InternalEngineTests.testPruneOnlyDeletesAtMostLocalCheckpoint <<< FAILURES!
   > Throwable #1: java.io.IOException: could not remove the following files (in the order of attempts):
   >    translog-primary-060/translog-2.tlog:  java.io.IOException: access denied:

This commit makes sure to use a separate directory for each engine in
this test.
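
A minimal sketch of the fix, assuming the Lucene test framework's `createTempDir` helper is available to the test class (variable names are illustrative):

```java
// Inside the test (a LuceneTestCase/ESTestCase subclass): give each engine its
// own translog directory so closing one engine cannot pin the other's files.
Path translogPathForExistingEngine = createTempDir("translog-existing");
Path translogPathForNewEngine = createTempDir("translog-new");
```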
s1monw pushed a commit that referenced this pull request Jun 14, 2018
The `requires_replica` yaml test feature hasn't worked for years. This
is what happens if you try to use it:
```
   > Throwable #1: java.lang.NullPointerException
   >    at __randomizedtesting.SeedInfo.seed([E6602FB306244B12:6E341069A8D826EA]:0)
   >    at org.elasticsearch.test.rest.yaml.Features.areAllSupported(Features.java:58)
   >    at org.elasticsearch.test.rest.yaml.section.SkipSection.skip(SkipSection.java:144)
   >    at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.test(ESClientYamlSuiteTestCase.java:321)
```

None of our tests use it.
s1monw pushed a commit that referenced this pull request Jul 4, 2018
s1monw pushed a commit that referenced this pull request Nov 6, 2018
An error was thrown if the leader index did not have soft deletes enabled, but it then continued creating the follower index.

The test caught this bug, but only very rarely, due to a timing issue.

Build failure instance:

```
1> [2018-11-05T20:29:38,597][INFO ][o.e.x.c.LocalIndexFollowingIT] [testDoNotCreateFollowerIfLeaderDoesNotHaveSoftDeletes] before test
  1> [2018-11-05T20:29:38,599][INFO ][o.e.c.s.ClusterSettings  ] [node_s_0] updating [cluster.remote.local.seeds] from [[]] to [["127.0.0.1:9300"]]
  1> [2018-11-05T20:29:38,599][INFO ][o.e.c.s.ClusterSettings  ] [node_s_0] updating [cluster.remote.local.seeds] from [[]] to [["127.0.0.1:9300"]]
  1> [2018-11-05T20:29:38,609][INFO ][o.e.c.m.MetaDataCreateIndexService] [node_s_0] [leader-index] creating index, cause [api], templates [random-soft-deletes-templat
e, one_shard_index_template], shards [2]/[0], mappings []
  1> [2018-11-05T20:29:38,628][INFO ][o.e.c.r.a.AllocationService] [node_s_0] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[leader-
index][0]] ...]).
  1> [2018-11-05T20:29:38,660][INFO ][o.e.x.c.a.TransportPutFollowAction] [node_s_0] [follower-index] creating index, cause [ccr_create_and_follow], shards [2]/[0]
  1> [2018-11-05T20:29:38,675][INFO ][o.e.c.s.ClusterSettings  ] [node_s_0] updating [cluster.remote.local.seeds] from [["127.0.0.1:9300"]] to [[]]
  1> [2018-11-05T20:29:38,676][INFO ][o.e.c.s.ClusterSettings  ] [node_s_0] updating [cluster.remote.local.seeds] from [["127.0.0.1:9300"]] to [[]]
  1> [2018-11-05T20:29:38,678][INFO ][o.e.x.c.LocalIndexFollowingIT] [testDoNotCreateFollowerIfLeaderDoesNotHaveSoftDeletes] after test
  1> [2018-11-05T20:29:38,678][INFO ][o.e.x.c.LocalIndexFollowingIT] [testDoNotCreateFollowerIfLeaderDoesNotHaveSoftDeletes] [LocalIndexFollowingIT#testDoNotCreateFoll
owerIfLeaderDoesNotHaveSoftDeletes]: cleaning up after test
  1> [2018-11-05T20:29:38,678][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_s_0] [follower-index/TlWlXp0JSVasju2Kr_hksQ] deleting index
  1> [2018-11-05T20:29:38,678][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_s_0] [leader-index/FQ6EwIWcRAKD8qvOg2eS8g] deleting index
FAILURE 0.23s J0 | LocalIndexFollowingIT.testDoNotCreateFollowerIfLeaderDoesNotHaveSoftDeletes <<< FAILURES!
   > Throwable #1: java.lang.AssertionError:
   > Expected: <false>
   >      but: was <true>
   >    at __randomizedtesting.SeedInfo.seed([7A3C89DA3BCA17DD:65C26CBF6FEF0B39]:0)
   >    at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
   >    at org.elasticsearch.xpack.ccr.LocalIndexFollowingIT.testDoNotCreateFollowerIfLeaderDoesNotHaveSoftDeletes(LocalIndexFollowingIT.java:83)
   >    at java.lang.Thread.run(Thread.java:748)
```

Build failure: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.5+intake/46/console
s1monw pushed a commit that referenced this pull request Feb 27, 2019
Today this test catches an exception and asserts that its proximate cause has
the message `Random IOException`, but occasionally this exception is wrapped two
layers deep, causing the test to fail. This commit adjusts the test to look at
the root cause of the exception instead.
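
A generic sketch of the adjustment, assuming a plain cause-chain walk plus Hamcrest; the names are illustrative, not the exact test code:

```java
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.containsString;

class RootCauseAssertSketch {
    // Walk the cause chain to the deepest (root) cause instead of asserting on
    // the proximate cause, which may occasionally be wrapped an extra layer deep.
    static Throwable rootCause(Throwable t) {
        Throwable root = t;
        while (root.getCause() != null && root.getCause() != root) {
            root = root.getCause();
        }
        return root;
    }

    static void assertRandomIOException(Throwable caught) {
        assertThat(rootCause(caught).getMessage(), containsString("Random IOException"));
    }
}
```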

      1> [2019-02-25T12:31:50,837][INFO ][o.e.s.SharedClusterSnapshotRestoreIT] [testSnapshotFileFailureDuringSnapshot] --> caught a top level exception, asserting what's expected
      1> org.elasticsearch.snapshots.SnapshotException: [test-repo:test-snap/e-hn_pLGRmOo97ENEXdQMQ] Snapshot could not be read
      1> 	at org.elasticsearch.snapshots.SnapshotsService.snapshots(SnapshotsService.java:212) ~[main/:?]
      1> 	at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:135) ~[main/:?]
      1> 	at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:54) ~[main/:?]
      1> 	at org.elasticsearch.action.support.master.TransportMasterNodeAction.masterOperation(TransportMasterNodeAction.java:127) ~[main/:?]
      1> 	at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.doRun(TransportMasterNodeAction.java:208) ~[main/:?]
      1> 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751) ~[main/:?]
      1> 	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[main/:?]
      1> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_202]
      1> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_202]
      1> 	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_202]
      1> Caused by: org.elasticsearch.snapshots.SnapshotException: [test-repo:test-snap/e-hn_pLGRmOo97ENEXdQMQ] failed to get snapshots
      1> 	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.getSnapshotInfo(BlobStoreRepository.java:564) ~[main/:?]
      1> 	at org.elasticsearch.snapshots.SnapshotsService.snapshots(SnapshotsService.java:206) ~[main/:?]
      1> 	... 9 more
      1> Caused by: java.io.IOException: Random IOException
      1> 	at org.elasticsearch.snapshots.mockstore.MockRepository$MockBlobStore$MockBlobContainer.maybeIOExceptionOrBlock(MockRepository.java:275) ~[test/:?]
      1> 	at org.elasticsearch.snapshots.mockstore.MockRepository$MockBlobStore$MockBlobContainer.readBlob(MockRepository.java:317) ~[test/:?]
      1> 	at org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.readBlob(ChecksumBlobStoreFormat.java:101) ~[main/:?]
      1> 	at org.elasticsearch.repositories.blobstore.BlobStoreFormat.read(BlobStoreFormat.java:90) ~[main/:?]
      1> 	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.getSnapshotInfo(BlobStoreRepository.java:560) ~[main/:?]
      1> 	at org.elasticsearch.snapshots.SnapshotsService.snapshots(SnapshotsService.java:206) ~[main/:?]
      1> 	... 9 more

    FAILURE 0.59s J0 | SharedClusterSnapshotRestoreIT.testSnapshotFileFailureDuringSnapshot <<< FAILURES!
       > Throwable #1: java.lang.AssertionError:
       > Expected: a string containing "Random IOException"
       >      but: was "[test-repo:test-snap/e-hn_pLGRmOo97ENEXdQMQ] failed to get snapshots"
       > 	at __randomizedtesting.SeedInfo.seed([B73CA847D4B4F52D:884E042D2D899330]:0)
       > 	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
       > 	at org.elasticsearch.snapshots.SharedClusterSnapshotRestoreIT.testSnapshotFileFailureDuringSnapshot(SharedClusterSnapshotRestoreIT.java:821)
       > 	at java.lang.Thread.run(Thread.java:748)