
MH frequency estimates based on 1KGP data #55

Merged
standage merged 20 commits into master from 1kgpfreqs
Feb 13, 2020


@standage (Member) commented Jan 15, 2020

MicroHapDB is intended, among other things, to enable the design of panels that include microhaplotypes from disparate published sources. One obstacle to this, especially when it comes to interpretation, is the lack of frequency estimates from a single, common set of population samples. This update uses 2,504 fully phased genomes from the 1000 Genomes Project (1KGP) to estimate microhaplotype frequencies across 26 global populations for all microhaplotypes defined in MicroHapDB.
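Because the 1KGP genomes are already phased, each individual contributes two haplotypes, and microhaplotype frequencies can be tallied directly by reading the alleles at a marker's variant positions off each haplotype. The sketch below illustrates that counting under a hypothetical in-memory data layout; it is not MicroHapDB's actual code or file format.

```python
from collections import Counter


def microhap_frequencies(phased_samples, positions):
    """Estimate microhaplotype frequencies for one population.

    phased_samples: hypothetical dict mapping sample ID -> (hap1, hap2), where
        each haplotype is a dict mapping variant position -> allele.
    positions: the variant positions that define the marker.
    """
    counts = Counter()
    for hap1, hap2 in phased_samples.values():
        for hap in (hap1, hap2):
            # Phased data: alleles on one haplotype belong together, so the
            # microhaplotype allele is simply the alleles at each position.
            allele = ",".join(hap[pos] for pos in positions)
            counts[allele] += 1
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


samples = {
    "S1": ({100: "A", 150: "C"}, {100: "G", 150: "C"}),
    "S2": ({100: "A", 150: "C"}, {100: "A", 150: "T"}),
}
print(microhap_frequencies(samples, [100, 150]))
# {'A,C': 0.5, 'G,C': 0.25, 'A,T': 0.25}
```

Running this per population (26 times for 1KGP) gives a population-by-allele frequency table without any statistical phasing step.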

1KGP-based frequencies published by ALFRED agree well with these new estimates: many agree perfectly, and most others differ only slightly. The differences are likely due to the ALFRED curators' use of PHASE to statistically phase all of their aggregated microhap data. In this update, ALFRED's 1KGP-based frequency estimates are superseded by the frequency estimates obtained directly from the 1KGP phased haplotypes.

Frequency estimates could not be computed for 5 markers (mh06PK-24844, mh11PK-63643, mh15PK-75170, mh22PK-104638, and mh0XUSC-XqD), since they are defined using variants that were not genotyped in the 1KGP Phase 3 data.
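A marker's frequencies can only be counted if every variant defining it is present in the genotype data. A minimal sketch of that check follows; the marker and position structures are hypothetical, and the example marker names are placeholders rather than real MicroHapDB markers.

```python
def markers_missing_variants(markers, genotyped):
    """Return names of markers that use at least one ungenotyped variant.

    markers: hypothetical dict mapping marker name -> list of (chrom, pos).
    genotyped: set of (chrom, pos) tuples present in the genotype data.
    """
    return sorted(
        name
        for name, variants in markers.items()
        if any(v not in genotyped for v in variants)
    )


markers = {"mhA": [("1", 100), ("1", 150)], "mhB": [("X", 500)]}
print(markers_missing_variants(markers, {("1", 100), ("1", 150)}))  # ['mhB']
```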

Closes #42.

@standage added the enhancement (New feature or request) label Jan 15, 2020
@codecov-io commented Feb 13, 2020

Codecov Report

Merging #55 into master will not change coverage.
The diff coverage is n/a.


@@          Coverage Diff          @@
##           master    #55   +/-   ##
=====================================
  Coverage     100%   100%           
=====================================
  Files           9      8    -1     
  Lines         247    209   -38     
  Branches       41     31   -10     
=====================================
- Hits          247    209   -38

| Impacted Files | Coverage Δ | |
|---|---|---|
| microhapdb/retrieve.py | 100% <ø> (ø) | ⬆️ |
| microhapdb/conftest.py | 100% <0%> (ø) | ⬆️ |
| microhapdb/panel.py | | |


Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 102fc7f...90cfddc.

@standage merged commit 378aed8 into master Feb 13, 2020
@standage deleted the 1kgpfreqs branch Feb 13, 2020 16:27
standage pushed a commit that referenced this pull request Sep 11, 2024
Previously, the `usa` panel included 100 loci that were *supposed* to have allele frequency data for all 19 sub-populations of a mock population roughly matching the demographics of the United States. Due to a bug in the code that selected these loci, some loci in the panel lack allele frequency data for one or more populations. Also, since the initial panel slated for evaluation on real data (`beta`) contains 50 loci, it makes sense to restrict the `usa` panel to 50 loci as well.

This update fixes the bug and limits the panel to the top 50 loci ranked by Ae (effective number of alleles).
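Ae for a locus is conventionally 1 / Σ pᵢ², where pᵢ are the allele frequencies. The sketch below shows the intended selection logic: first discard any locus lacking frequency data for some population, then rank the rest by Ae and keep the top N. The data layout is hypothetical, and averaging Ae across populations is an assumption made for illustration, not necessarily how the panel code aggregates it.

```python
def effective_alleles(freqs):
    """Ae = 1 / sum(p_i^2) for one population's allele frequencies."""
    return 1.0 / sum(p * p for p in freqs.values())


def select_panel(locus_freqs, populations, size=50):
    """Return the top `size` loci by mean Ae among loci with complete data.

    locus_freqs: hypothetical dict locus -> population -> allele -> frequency.
    """
    # Keep only loci that have frequencies for *every* population.
    complete = {
        locus: popfreqs
        for locus, popfreqs in locus_freqs.items()
        if all(pop in popfreqs for pop in populations)
    }

    def mean_ae(popfreqs):
        return sum(effective_alleles(popfreqs[p]) for p in populations) / len(populations)

    ranked = sorted(complete, key=lambda locus: mean_ae(complete[locus]), reverse=True)
    return ranked[:size]
```

Filtering before ranking matters: a highly informative locus that is missing data for even one population is excluded rather than silently entering the panel.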

Related to #55.
standage pushed a commit that referenced this pull request Sep 11, 2024
This update fixes a bug in `notebook/usa8k/Snakefile` that resulted in two self-comparisons in the **samepop** data set. Fortunately, the error was in the comparison code rather than the simulation parameters, so the scope of the changes was very small. See #55.
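One generic way to rule out self-comparisons when pairing individuals from the same population is to generate pairs with `itertools.combinations`, which yields each unordered pair of distinct positions exactly once, and to reject duplicated IDs up front. This is a hypothetical sketch, not the actual logic in the Snakefile.

```python
from itertools import combinations


def samepop_pairs(individuals):
    """Return every unordered pair of distinct individuals (hypothetical helper)."""
    # combinations() pairs distinct *positions*, so a duplicated ID in the
    # input would still produce a self-comparison; reject that case explicitly.
    if len(set(individuals)) != len(individuals):
        raise ValueError("duplicate individual IDs")
    return list(combinations(individuals, 2))


print(samepop_pairs(["i1", "i2", "i3"]))
# [('i1', 'i2'), ('i1', 'i3'), ('i2', 'i3')]
```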
Linked issue: Estimate allele frequencies for 26 populations from 1000 Genomes Project data
