Integrate 1000 Genomes Project population definitions by standage · Pull Request #45 · bioforensics/MicroHapDB

standage · 2019-12-07T03:52:13Z

While preparing for #42, I discovered that the 26 global populations from the 1000 Genomes Project (1KGP) are included among the 96 ALFRED populations under different IDs. This update adds 1KGP as a new data source in the dbbuild directory and moves the 26 1KGP population definitions from dbbuild/sources/alfred/ to dbbuild/sources/1kgp/. At the moment, the marker.tsv and frequency.tsv files in dbbuild/sources/1kgp/ are empty, but frequency data will be added as part of #42.

Closes #44.

standage · 2019-12-07T03:53:03Z

dbbuild/README.md


 - `ID`: a unique identifier for this population across all sources
 - `Name`: a free-text description of the population, intended to be human readable
+- `Xref`: optional cross-reference


The Xref column was added back as a requirement for population tables.
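As a rough illustration of the column requirement described above, the following sketch checks that a population table carries the `ID`, `Name`, and `Xref` columns. It assumes the table is tab-separated with a header row, as the other `.tsv` files in `dbbuild/sources/` suggest. The function name `read_population_table` and the sample rows are hypothetical and are not taken from the MicroHapDB build scripts.

```python
import csv
import io

# Column names taken from the README excerpt; Xref values may be empty.
REQUIRED_COLUMNS = ("ID", "Name", "Xref")


def read_population_table(handle):
    """Parse a tab-separated population table and check its columns.

    Hypothetical helper: assumes a header row and tab delimiters.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(missing)}")
    rows = []
    seen = set()
    for row in reader:
        if not row["ID"]:
            raise ValueError("empty population ID")
        if row["ID"] in seen:
            raise ValueError(f"duplicate population ID: {row['ID']}")
        seen.add(row["ID"])
        # Short rows yield None for trailing fields; normalize to "".
        rows.append({col: (row[col] or "") for col in REQUIRED_COLUMNS})
    return rows


# Made-up rows for illustration only (not real population IDs).
example = "ID\tName\tXref\nPOP1\tExample population A\tXREF1\nPOP2\tExample population B\t\n"
populations = read_population_table(io.StringIO(example))
```

Here the `Xref` column must be present in the header, but an individual row may leave its value blank, which is one way to reconcile a required column with an optional cross-reference.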

This update makes sweeping changes to the genotype simulation and sequencing code.

- The `sim` module no longer performs sequencing and focuses entirely on haplotype simulation.
- The `seq` module now handles simulated Illumina sequencing of both simple (single contributor) and mixture (multi contributor) samples.
- The `mix` module merges simple genotypes into a simulated mixture sample.
- The `mixture` module has been dropped, and its functionality is covered by the more granular `sim`, `mix`, and `seq` modules.

This update also replaced all references to `microhapulator.cli.parse_args()` with `microhapulator.cli.get_parser().parse_args()`. The former is used to configure runtime logging, which for some reason causes issues in a testing environment. Closes #44.
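The difference between the two entry points can be sketched with a small stand-in module. This is not the actual `microhapulator.cli` code. The `--seed` and `--logfile` options are invented for illustration, and the sketch only assumes that `parse_args()` wraps the parser and configures logging as a side effect, as the comment describes.

```python
import argparse
import logging


def get_parser():
    """Build the argument parser without touching any global state."""
    parser = argparse.ArgumentParser(prog="tool")
    parser.add_argument("--seed", type=int, default=None)  # hypothetical option
    parser.add_argument("--logfile", default=None)  # hypothetical option
    return parser


def parse_args(arglist=None):
    """Parse arguments and, as a side effect, configure runtime logging."""
    args = get_parser().parse_args(arglist)
    # Global side effect: installs a handler on the root logger.
    logging.basicConfig(filename=args.logfile, level=logging.INFO)
    return args


# In a test, parsing with the bare parser leaves logging untouched.
args = get_parser().parse_args(["--seed", "42"])
```

Calling `get_parser().parse_args()` directly yields the same namespace of options while skipping the logging setup, which may be why it behaves better under a test runner that manages logging itself.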

Oops. These changes were supposed to be part of #45.

Daniel Standage and others added 4 commits December 6, 2019 13:48

Adding 1000 Genomes Project data

8df63b4

Fix data and build scripts

421f2e8

Starting to fix tests

55b55e2

Fix tests

aa6c6bf

standage added the datasources (References to existing data sources or proposals for new sources) and refactoring (Internal changes that don't alter behavior but make the software more robust and sustainable) labels Dec 7, 2019

standage commented Dec 7, 2019


standage added 4 commits December 6, 2019 23:10

Removing unnecessary additions

3c89512

Merge branch 'master' into 1kgp

d1b9b3a

Fix tests

5c809fc

Reverting more unnecessary changes

27ac80b

standage merged commit e823c9c into master Dec 7, 2019

standage deleted the 1kgp branch December 7, 2019 04:29

standage added a commit that referenced this pull request Sep 11, 2024

Refactor/sim (#46)

3d95e0e


standage merged 8 commits into master from 1kgp
