
YSEQ Haplogroup Predictor (beta version)


Random Forest
Developed by Hunter Provyn with input and support from Thomas Krahn (2019).
The haplogroup prediction is backed by a Random Forest model implemented in Python using scikit-learn (sklearn).

This YSEQ haplogroup predictor is open source software and can be cloned from GitHub: https://github.com/hprovyn/str-to-haplogroup-predictor

Please always give a link to this original website as a reference.



Enter STR Alleles for Y-Haplogroup Prediction


Enter STRs in one of two formats, then press ENTER.

Standard format recognizes the following STRs: DYS391, DYS389I, DYS437, DYS439, DYS389II, DYS438, DYS426, DYS393, YCAII, DYS390, DYS385, Y-GATA-H4, DYS388, DYS447, DYS19, DYS392, DYS458, DYS455, DYS454, DYS464, DYS448, DYS449, DYS456, DYS576, CDY, DYS460, DYS459, DYS570, DYS607, DYS442, DYS728, DYS723, DYS711, DYR76, DYR33, DYS727, DYR157, DYS713, DYS531, DYS578, DYF395, DYS590, DYS537, DYS641, DYS472, DYF406S1, DYS511, DYS557, DYS490, DYS446, DYS481, DYS413, DYS534, DYS450, DYS425, DYS594, DYS444, DYS520, DYS436, DYS565, DYS572, DYS617, DYS568, DYS487, DYS640, DYS492, DYR112, DYS518, DYS614, DYS626, DYS644, DYS684, DYS710, DYS485, DYS632, DYS495, DYS540, DYS714, DYS716, DYS717, DYS505, DYS556, DYS549, DYS589, DYS522, DYS494, DYS533, DYS636, DYS575, DYS638, DYS462, DYS452, DYS445, Y-GATA-A10, DYS463, DYS441, Y-GGAAT-1B07, DYS525, DYS712, DYS593, DYS650, DYS532, DYS715, DYS504, DYS513, DYS561, DYS552, DYS726, DYS635, DYS587, DYS643, DYS497, DYS510, DYS434, DYS461, DYS435

If you tested with FTDNA, you must check the "FTDNA Format" box and copy-paste the row from your STR results table.
It will be tab-separated, like this: 12 15 10 14-17 11...
(Otherwise the necessary conversion of the Y-GATA-H4 value to NIST format will not take place.)

In either case, use hyphens to separate the values of palindromic (multi-copy) markers, like 13-15-15-18

Standard Format: $STR1=$ALLELE1,$STR2=$ALLELE2, ... OR $STR1 $ALLELE1 $STR2 $ALLELE2 ...
FTDNA Tab Separated Format: $ALLELE1 TAB $ALLELE2 TAB $ALLELE3... in default FTDNA order containing 12, 25, 37, 67 or 111 STR Alleles
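
As an illustration of the two input formats above, here is a minimal Python sketch of how such input could be parsed. It is not the code used by this predictor; the function names (parse_allele, parse_standard, split_ftdna_row) are hypothetical. It assumes allele values are numeric and that hyphens only separate the copies of a multi-copy marker. Mapping FTDNA columns to STR names and converting Y-GATA-H4 are deliberately left out.

```python
def parse_allele(value):
    """Split a hyphenated multi-copy value such as '13-15-15-18' into numbers."""
    return tuple(float(part) for part in value.strip().split("-"))


def parse_standard(text):
    """Parse 'STR=ALLELE,STR=ALLELE' or 'STR ALLELE STR ALLELE' input."""
    if "=" in text:
        # Comma-separated NAME=VALUE items; empty items (trailing commas) are skipped.
        pairs = [item.split("=", 1) for item in text.split(",") if item.strip()]
    else:
        # Whitespace-separated tokens alternating between name and value.
        tokens = text.split()
        if len(tokens) % 2:
            raise ValueError("expected alternating STR names and alleles")
        pairs = list(zip(tokens[0::2], tokens[1::2]))
    return {name.strip(): parse_allele(value) for name, value in pairs}


def split_ftdna_row(row):
    """Split a tab-separated FTDNA row into one parsed value per column."""
    return [parse_allele(cell) for cell in row.strip().split("\t") if cell.strip()]


print(parse_standard("DYS393=13,DYS385=11-14"))
print(parse_standard("DYS393 13 DYS464 13-15-15-18"))
print(split_ftdna_row("12\t15\t10\t14-17\t11"))
```

Returning each value as a tuple keeps single-copy and multi-copy markers in the same shape, so later code does not need to special-case hyphenated entries.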


Experiment Information


The YSEQ Haplogroup Predictor was influenced by the ideas of Whit Athey's Haplogroup Predictor (2006) and the NevGen Predictor from Milos Cetkovic Gentula and Aco Nevski (2014).

While both of these predictors use a Bayesian allele-frequency approach, the YSEQ predictor uses machine learning, specifically the random forest technique.
Machine learning is still in its infancy, so this predictor is unlikely to give you better or more precise results than the Bayesian predictors, but you can at least treat it as a second opinion from an independent method.
Note that the YSEQ predictor is based on the results of YSEQ customers, but it does not reveal the underlying STR profiles or the original sample donors (these are not stored on this web server at all). A computer-based random number generator is used to create random decision trees, which are then tested against a real-life truth set. The training process simply selects the best decision trees and uses them for prediction.
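
The idea described above (generate random trees, test them against a truth set, keep the best ones) can be sketched in a few lines of plain Python. This is a toy illustration only, not the predictor's actual code, which according to this page relies on sklearn. All names here are hypothetical, the "trees" are single-split stumps, and the truth set consists of made-up values with placeholder labels rather than real STR profiles or haplogroups.

```python
import random
from collections import Counter

# Made-up toy truth set (NOT real YSEQ data): STR profile -> placeholder label.
TRUTH = [
    ({"DYS393": 13, "DYS390": 24, "DYS391": 11}, "group_A"),
    ({"DYS393": 13, "DYS390": 23, "DYS391": 11}, "group_A"),
    ({"DYS393": 13, "DYS390": 25, "DYS391": 10}, "group_A"),
    ({"DYS393": 12, "DYS390": 22, "DYS391": 10}, "group_B"),
    ({"DYS393": 12, "DYS390": 21, "DYS391": 10}, "group_B"),
    ({"DYS393": 12, "DYS390": 22, "DYS391": 11}, "group_B"),
]


def random_stump(rng, markers, labels, low, high):
    """Build one random single-split tree: (marker, threshold, left label, right label)."""
    return (rng.choice(markers), rng.randint(low, high),
            rng.choice(labels), rng.choice(labels))


def predict_stump(stump, profile):
    """Return the left label if the marker value is <= threshold, else the right label."""
    marker, threshold, left, right = stump
    return left if profile[marker] <= threshold else right


def accuracy(stump, truth):
    """Fraction of truth-set profiles that this stump labels correctly."""
    hits = sum(predict_stump(stump, profile) == label for profile, label in truth)
    return hits / len(truth)


def train(truth, n_candidates=200, keep=15, seed=0):
    """Generate random stumps and keep the ones that score best on the truth set."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    markers = sorted(truth[0][0])
    labels = sorted({label for _, label in truth})
    values = [v for profile, _ in truth for v in profile.values()]
    candidates = [random_stump(rng, markers, labels, min(values), max(values))
                  for _ in range(n_candidates)]
    candidates.sort(key=lambda stump: accuracy(stump, truth), reverse=True)
    return candidates[:keep]


def predict(forest, profile):
    """Majority vote over the kept stumps."""
    votes = Counter(predict_stump(stump, profile) for stump in forest)
    return votes.most_common(1)[0][0]


forest = train(TRUTH)
print(predict(forest, {"DYS393": 13, "DYS390": 24, "DYS391": 11}))
```

Scoring the trees on the same profiles that were used to select them overstates how well they work; a realistic setup would hold back part of the truth set for testing.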
Please consider this beta version an experiment that is largely untested and needs many improvements before it can give reliable haplogroup predictions. We hope that as the number of samples increases, more and more outlier cases will be covered by the prediction, and that this tool will become useful for the genetic genealogy community. The YSEQ Haplogroup Predictor comes with no warranty, explicit or implied.