Integrate 1000 Genomes Project population definitions#45

Merged
standage merged 8 commits into master from 1kgp on Dec 7, 2019
@standage standage commented Dec 7, 2019

While preparing for #42, I discovered that the 26 global populations from the 1000 Genomes Project are included in the 96 ALFRED populations, under different IDs. This update adds a new data source to the dbbuild directory and moves the 26 1KGP population definitions from dbbuild/sources/alfred/ to dbbuild/sources/1kgp/. At the moment, the marker.tsv and frequency.tsv files in dbbuild/sources/1kgp/ are empty, but frequency data will be added as part of #42.

Closes #44.

@standage standage added the datasources (References to existing data sources or proposals for new sources) and refactoring (Internal changes that don't alter behavior but make the software more robust and sustainable) labels Dec 7, 2019

- `ID`: a unique identifier for this population across all sources
- `Name`: a free-text description of the population, intended to be human readable
- `Xref`: optional cross-reference

The Xref column was added back as a requirement for population tables.
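
For illustration, here is a minimal sketch of how a population table with these three columns could be checked. It assumes a tab-separated file with a header row; the function name `check_population_table` and the example rows are hypothetical and not taken from the dbbuild code. It also only checks ID uniqueness within one table, not across all sources.

```python
import csv
import io

# Column names come from the list above. Treating all three as required
# headers follows the review comment; the check itself is an assumption.
REQUIRED_COLUMNS = ("ID", "Name", "Xref")


def check_population_table(handle):
    """Validate a tab-separated population table and return its rows.

    Raises ValueError if a required column is missing or an ID is repeated.
    The Xref column must be present, but individual values may be empty.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(missing)}")
    rows = list(reader)
    seen = set()
    for row in rows:
        if row["ID"] in seen:
            raise ValueError(f"duplicate population ID: {row['ID']}")
        seen.add(row["ID"])
    return rows


# Made-up rows: the IDs, names, and cross-references are placeholders.
example = (
    "ID\tName\tXref\n"
    "POP1\tExample population one\tXREF1\n"
    "POP2\tExample population two\t\n"
)
rows = check_population_table(io.StringIO(example))
print(len(rows))  # prints 2
```

Keeping the column required while allowing empty values would let a source such as 1KGP point back at its ALFRED counterpart where one exists.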

@standage standage merged commit e823c9c into master Dec 7, 2019
@standage standage deleted the 1kgp branch December 7, 2019 04:29
standage pushed a commit that referenced this pull request Sep 11, 2024
This update makes sweeping changes to the genotype simulation and sequencing code.

- The `sim` module no longer performs sequencing and focuses entirely on haplotype simulation. 
- The `seq` module now handles simulated Illumina sequencing of both simple (single-contributor) and mixture (multi-contributor) samples.
- The `mix` module merges simple genotypes into a simulated mixture sample.
- The `mixture` module has been dropped, and its functionality is covered by the more granular `sim`, `mix`, and `seq` modules.

This update also replaced all references to `microhapulator.cli.parse_args()` with `microhapulator.cli.get_parser().parse_args()`. The former is used to configure runtime logging, which for some reason causes issues in a testing environment.

Closes #44.
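
The `get_parser().parse_args()` pattern described in the commit message can be sketched roughly as follows. This is not the actual `microhapulator.cli` code: the function bodies, the `--seed` and `--logfile` options, and the program name are assumptions made up for the example. It only shows why splitting parser construction from the logging side effect can help in tests.

```python
import argparse
import logging


def get_parser():
    """Build the argument parser without touching any global state."""
    parser = argparse.ArgumentParser(prog="demo")  # hypothetical program name
    parser.add_argument("--seed", type=int, default=None)  # hypothetical option
    parser.add_argument("--logfile", default=None)  # hypothetical option
    return parser


def parse_args(arglist=None):
    """Parse arguments and, as a side effect, configure runtime logging."""
    args = get_parser().parse_args(arglist)
    logging.basicConfig(filename=args.logfile, level=logging.INFO)
    return args


# In a test, parse with the bare parser so logging configuration is untouched.
args = get_parser().parse_args(["--seed", "42"])
print(args.seed)  # prints 42
```

Calling the parser directly keeps the test free of the logging setup, while the command-line entry point can still go through the wrapper.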
standage added a commit that referenced this pull request Sep 11, 2024
Oops. These changes were supposed to be part of #45.
Development

Successfully merging this pull request may close these issues.

Map 1KG pop IDs to ALFRED pop IDs
