This track contains data retrieved from PMC.
Note that we have filtered the data set so only a subset of the articles is included.
Note that the body content is actually much longer has been shortened here to increase readability.
{
"name": "3_Biotech_2015_Dec_13_5(6)_1007-1019",
"journal": "3 Biotech",
"date": "2015 Dec 13",
"volume": "5(6)",
"issue": "1007-1019",
"accession": "PMC4624133",
"timestamp": "2015-10-30 20:08:11",
"pmid": "",
"body": "\n==== Front\n3 Biotech3 Biotech3 Biotech2190-572X2190-5738Springer ..."
}This track allows to overwrite the following parameters with Rally 0.8.0+ using --track-params:
bulk_size(default: 500)bulk_indexing_clients(default: 8): Number of clients that issue bulk indexing requests.ingest_percentage(default: 100): A number between 0 and 100 that defines how much of the document corpus should be ingested.conflicts(default: "random"): Type of id conflicts to simulate. Valid values are: 'sequential' (A document id is replaced with a document id with a sequentially increasing id), 'random' (A document id is replaced with a document id with a random other id).conflict_probability(default: 25): A number between 0 and 100 that defines the probability of id conflicts. This requires to run the respective challenge. Combiningconflicts=sequentialandconflict-probability=0makes Rally generate index ids by itself, instead of relying on Elasticsearch'sautomatic id generation <https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#_automatic_id_generation>_.on_conflict(default: "index"): Whether to use an "index" or an "update" action when simulating an id conflict.recency(default: 0): A number between 0 and 1 that defines whether to bias towards more recent ids when simulating conflicts. See the Rally docs for the full definition of this parameter. This requires to run the respective challenge.max_num_segments: The number of segments to target when doing a force merge (default: -1)number_of_replicas(default: 0)number_of_shards(default: 5)source_enabled(default: true): A boolean defining whether the_sourcefield is stored in the index.index_settings: A list of index settings. Index settings defined elsewhere (e.g.number_of_replicas) need to be overridden explicitly.default_search_timeout(default: -1)cluster_health(default: "green"): The minimum required cluster health.error_level(default: "non-fatal"): Available for bulk operations only to specify ignore-response-error-level.post_ingest_sleep(default: false): Whether to pause after ingest and prior to subsequent operations.post_ingest_sleep_duration(default: 30): Sleep duration in seconds.
All articles that are included are licensed as CC-BY (http://creativecommons.org/licenses/by/2.0/)
This data set is licensed under the same terms. Please refer to http://creativecommons.org/licenses/by/2.0/ for details.
Attribution hint:
You can download a full list of the author information for each included document from http://benchmarks.elasticsearch.org.s3.amazonaws.com/corpora/pmc/attribution.txt.bz2 (size: 52.2MB)