25 April 2014

Family Tree DNA 2014 Y-DNA Haplogroup Tree Update

Family Tree DNA released the following announcement regarding the Y-DNA Haplogroup Tree update, but note that the website has a message delaying the upload.  With such a massive change, it is always possible there are last-minute surprises.  However, it will no doubt arrive shortly.  FTDNA is to be commended for taking on this daunting task!

Direct from FTDNA:

Family Tree DNA created the 2014 Y-DNA Haplotree in partnership with the National Geographic Genographic Project using the proprietary GenoChip. Launched publicly in late 2012, the chip tests approximately 10,000 Y-DNA SNPs that had not, at the time, been phylogenetically classified.

The team used the first 50,000 male samples with the highest quality results to determine SNP positions. Using only tests with the highest possible “call rate” meant more available data, since those samples had the highest percentage of SNPs that produced results, or “calls.” 
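
To make the "call rate" idea concrete, here is a minimal Python sketch. The data layout, the kit IDs, and the function names `call_rate` and `highest_quality` are assumptions made for illustration; the announcement does not describe how the team actually stored or ranked samples.

```python
from typing import Dict, List, Optional

# Hypothetical layout: each sample maps a SNP name to its call
# ("+" or "-"), or to None when the chip produced no result.
Sample = Dict[str, Optional[str]]


def call_rate(sample: Sample) -> float:
    """Fraction of SNPs in the sample that produced a result (a "call")."""
    if not sample:
        return 0.0
    called = sum(1 for call in sample.values() if call is not None)
    return called / len(sample)


def highest_quality(samples: Dict[str, Sample], keep: int) -> List[str]:
    """Return the IDs of the `keep` samples with the highest call rate."""
    ranked = sorted(samples, key=lambda sid: call_rate(samples[sid]), reverse=True)
    return ranked[:keep]


# Made-up example kits, for illustration only.
samples = {
    "kit1": {"M269": "+", "L21": "+", "U106": "-", "P312": "+"},    # 4 of 4 called
    "kit2": {"M269": "+", "L21": None, "U106": None, "P312": "+"},  # 2 of 4 called
    "kit3": {"M269": "+", "L21": "-", "U106": None, "P312": "+"},   # 3 of 4 called
}

print(highest_quality(samples, keep=2))  # ['kit1', 'kit3']
```

Samples with more calls contribute more usable SNP results, which is why restricting the set to high call rates yields more data per sample.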

In some cases, SNPs that were on the 2010 Y-DNA Haplotree didn’t work well on the GenoChip, so the team used Sanger sequencing on anonymous samples to test those SNPs and to confirm ambiguous locations.

For example, if it wasn’t clear if a clade was a brother (parallel) clade, or a downstream clade, they tested for it.

The scope of the project did not include going farther than SNPs currently on the GenoChip in order to base the tree on the most data available at the time, with the cutoff for inclusion being about November of 2013.

Where data were clearly missing or underrepresented, the team curated additional data from the chip where it was available in later samples. For example, there were very few Haplogroup M samples in the original dataset of 50,000, so to ensure coverage, the team went through eligible Geno 2.0 samples submitted after November, 2013, to pull additional Haplogroup M data. That additional research was not necessary on, for example, the robust Haplogroup R dataset, for which they had a significant number of samples.

Family Tree DNA, again in partnership with the Genographic Project, is committed to releasing at least one update to the tree this year. The next iteration will be more comprehensive, including data from external sources such as known Sanger data, Big Y testing, and publications. If the team gets direct access to raw data from other large companies’ tests, then that information will be included as well. We are also committed to at least one update per year in the future.

Known SNPs will not intentionally be renamed. Their original names will be used since they represent the original discoverers of the SNP. If there are two names, one will be chosen to be displayed and the additional name will be available in the additional data, but the team is taking care not to make synonymous SNPs seem as if they are two separate SNPs. Some examples of that may exist initially, but as more SNPs are vetted, and as the team learns more, those examples will be removed.

In addition, positions or markers within STRs, as they are discovered, or large insertion/deletion events inside homopolymers, potentially may also be curated from additional data because the event cannot accurately be proven. A homopolymer is a sequence of identical bases, such as AAAAAAAAA or TTTTTTTTT. In such cases it’s impossible to tell which of the bases the insertion is, or if/where one was deleted. With technology such as Next Generation Sequencing, trying to get SNPs in regions such as STRs or homopolymers doesn’t make sense because we’re discovering non-ambiguous SNPs that define the same branches, so we can use the non-ambiguous SNPs instead.

Some SNPs from the 2010 tree have been intentionally removed. In some cases, those were SNPs for which the team never saw a positive result, so while it may be a legitimate SNP, even haplogroup defining, it was outside of the current scope of the tree. In other cases, the SNP was found in so many locations that it could cause the orientation of the tree to be drawn in more than one way. If the SNP could legitimately be positioned in more than one haplogroup, the team deemed that SNP to not be haplogroup defining, but rather a highly polymorphic location.
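
The homopolymer ambiguity described above can be shown with a short Python sketch. The sequence is made up for illustration and `delete_base` is a hypothetical helper; the point is only that removing any one base from a run of identical bases gives the same result, so the observed sequence cannot say which base was lost.

```python
def delete_base(sequence: str, index: int) -> str:
    """Return the sequence with the base at `index` removed."""
    return sequence[:index] + sequence[index + 1:]


# Made-up sequence: a run of nine A's flanked by other bases.
homopolymer = "GC" + "A" * 9 + "TG"
run_start, run_end = 2, 11  # the A run occupies indexes 2 through 10

# Delete each A in turn and collect the distinct resulting sequences.
outcomes = {delete_base(homopolymer, i) for i in range(run_start, run_end)}

print(len(outcomes))  # 1: every deletion position gives the same sequence

# For contrast, deletions in a sequence with no repeated neighbors are
# distinguishable from one another.
mixed = "GCATCGTAG"
distinct = {delete_base(mixed, i) for i in range(len(mixed))}
print(len(distinct))  # 9: each deletion leaves a different sequence
```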

To that end, SNPs no longer have .1, .2, or .3 designations. For example, J-L147.1 is simply J-L147, and I-L147.2 is simply I-L147.  Those SNPs are positioned in the same place, but back-end programming will assign the appropriate haplogroup using other available information such as additional SNPs tested or haplogroup origins listed. If other SNPs have been tested and can unambiguously prove the location of the multi-locus SNP for the sample, then that data is used. If not, matching haplogroup origin information is used.
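
The two-step placement rule just described (other tested SNPs first, haplogroup origin information second) might look something like the Python sketch below. This is not FTDNA's back-end code, which the announcement does not show; the lookup tables, the function name `place_multi_locus`, and the choice of M304 and M170 as the unambiguous markers for J and I are all assumptions for illustration.

```python
from typing import Dict, Optional, Set

# Hypothetical table: haplogroups in which a multi-locus SNP can occur.
MULTI_LOCUS: Dict[str, Set[str]] = {"L147": {"J", "I"}}

# Hypothetical table: unambiguous SNPs and the haplogroup each one proves.
UNAMBIGUOUS: Dict[str, str] = {"M304": "J", "M170": "I"}


def place_multi_locus(
    snp: str,
    positives: Set[str],
    origin_haplogroup: Optional[str] = None,
) -> Optional[str]:
    """Pick the branch label for a multi-locus SNP, or None if undecidable."""
    candidates = MULTI_LOCUS[snp]

    # Step 1: use another positive SNP that unambiguously proves the location.
    for other in sorted(positives):
        haplogroup = UNAMBIGUOUS.get(other)
        if haplogroup in candidates:
            return f"{haplogroup}-{snp}"

    # Step 2: fall back to matching haplogroup origin information.
    if origin_haplogroup in candidates:
        return f"{origin_haplogroup}-{snp}"

    return None


print(place_multi_locus("L147", {"M304"}))                   # J-L147
print(place_multi_locus("L147", set(), origin_haplogroup="I"))  # I-L147
```

With neither kind of evidence the sketch returns `None` rather than guessing; how the real system handles that case is not stated in the announcement.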

We will also move to shorthand haplogroup designations exclusively. Since we’re committing to at least one iteration of the tree per year, using longhand that could change with each update would be too confusing.  For example, Haplogroup O used to have three branches: O1, O2, and O3. A SNP was discovered that combined O1 and O2, so they became O1a and O1b.

There are over 1200 branches on the 2014 Y Haplogroup tree, as compared to about 400 on the 2010 tree. Those branches contain over 6200 SNPs, so we’ve chosen to display select SNPs as “active” with an adjacent “More” button to show the synonymous SNPs if you choose.  

In addition to the updates, any sample tested with the Genographic Project's Geno 2.0 DNA Ancestry Kit and then transferred to FTDNA will automatically be re-synched on the Geno side. The Genographic Project is currently integrating the new data into their system and will announce on their website when the process is complete in the coming weeks.  At that time, all Geno 2.0 participants' results will be updated accordingly and will be accessible via the Genographic Project website. 

2014 Haplotree Fast Facts

*  Created in partnership with National Geographic’s Genographic Project
*  Used GenoChip containing ~10,000 previously unclassified Y-SNPs
*  Some of those SNPs came from Walk Through the Y and the 1000 Genome Project
*  Used first 50,000 high-quality male Geno 2.0 samples 
*  Verified positions from 2010 YCC by Sanger sequencing additional anonymous samples
*  Filled in data on rare haplogroups using later Geno 2.0 samples


*  Expanded from approximately 400 to over 1200 terminal branches
*  Increased from around 850 SNPs to over 6200 SNPs
*  Cut-off date for inclusion for most haplogroups was November 2013

Total number of SNPs broken down by haplogroup

A 406
B 69
BT 8
C 371
CT 64
D 208
DE 16
E 1028
F 90
G 401
H 18
I 455
IJ 29
J 707
K 11
LT 12
M 17
N 168
NO 16
O 936
P 81
Q 198
R 724
S 5
T 148
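
As a quick check on the breakdown above, the following Python sketch totals the listed counts and ranks the largest haplogroups. The numbers are copied from the list; the variable names are just for illustration.

```python
# SNP counts per haplogroup, copied from the list above.
snp_counts = {
    "A": 406, "B": 69, "BT": 8, "C": 371, "CT": 64,
    "D": 208, "DE": 16, "E": 1028, "F": 90, "G": 401,
    "H": 18, "I": 455, "IJ": 29, "J": 707, "K": 11,
    "LT": 12, "M": 17, "N": 168, "NO": 16, "O": 936,
    "P": 81, "Q": 198, "R": 724, "S": 5, "T": 148,
}

total = sum(snp_counts.values())

# Rank haplogroups by SNP count, largest first, and keep the top three.
top3 = sorted(snp_counts.items(), key=lambda item: item[1], reverse=True)[:3]

print(total)  # 6186
print(top3)   # [('E', 1028), ('O', 936), ('R', 724)]
```

The listed figures add up to 6,186, slightly under the "over 6200" quoted in the announcement; the announcement does not explain the difference.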


myFTDNA Interface

*  Existing customers receive free update to predictions and confirmed branches based on existing SNP test results.
*  Haplogroup badge updated if new terminal branch is available
*  Updated haplotree design displays new SNPs and branches for your haplogroup
*  Branch names now listed in shorthand using terminal SNPs
*  For SNPs with more than one name, in most cases the original name for SNP was used, with synonymous SNPs listed when you click “More…”
*  No longer using SNP names with .1, .2, .3 suffixes. Back-end programming will place SNP in correct haplogroup using available data.
*  SNPs recommended for additional testing are pre-populated in the cart for your convenience. Just click to remove those you don’t want to test.
*  SNPs recommended for additional testing are based on 37-marker haplogroup origins data where possible, 25- or 12-marker data where 37 markers weren’t available.
*  Once you’ve tested additional SNPs, that information will be used to automatically recommend additional SNPs for you if they’re available.
*  If you remove those prepopulated SNPs from the cart, but want to re-add them, just refresh your page or close the page and return.  
*  Only one SNP per branch can be ordered at one time – synonymous SNPs can possibly be ordered from the Advanced Orders section on the Upgrade Order page.
*  Tests taken have moved to the bottom of the haplogroup page.

Coming attractions

*  Group Administrator Pages will have longhand removed.
*  At least one update to the tree to be released this year.
*  Update will include: data from Big Y, relevant publications, other companies’ tests from raw data. 
*  We’ll set up a system for those who have tested with other big data companies to contribute their raw data file to future versions of the tree.
*  We’re committed to releasing at least one update per year.
*  The Genographic Project is currently integrating the new data into their system and will announce on their website when the process is complete in the coming weeks. At that time, all the Geno 2.0 participants' results will be updated accordingly and accessible via the Genographic Project website.

A few added details from today's Webinar by Elise Friedman of Family Tree DNA:

*  The personal webpages for testers have been upgraded to the shorthand SNP version whereas the GAP pages and a few others will be upgraded soon
*  Personal pages list the Haplogroup Badges.  If yours does not appear, go to your Y-DNA Haplogroup page and then back to your home page.  This should add the badge.
*  Your personal Haplogroup page goes automatically to your haplogroup.  A legend lists the categories:  Tested Positive; Tested Negative; Presumed Positive; Test Available Presumed Negative; Test in Progress.
*  Presumed SNPs are those listed on the same SNP line as a SNP the tester tested positive for, but which the tester has not actually tested. Because the tester is positive for the others on that line, it is presumed the tester would test positive for this one as well; these are shown in brown.
*  Synonymous SNPs are additional SNPs that define the same branch of the tree.  Click on the MORE link to see those additional SNPs.
*  Your terminal SNP (the last SNP for which you tested positive) could be way down on the SNP chart
*  The SNP tests you have taken are listed at the bottom of the page, whereas they were formerly at the top.
*  ADVANCED SNP Order Form allows you to order any SNPs that are not yet on the tree but which are available through Family Tree DNA.  Consult your haplogroup administrator to guide your purchase.
*  Recommended SNPs (in blue on the Haplogroup chart) are determined from an algorithm similar to that used by FTDNA for predicting a SNP.  Recommended SNPs are based on who you match and their haplogroup results at a 37 marker level.  If no recommendations are possible, then 25 markers are used; if none there, then 12 markers.  As always, no one can be certain you will test positive for a SNP you order so talk with your haplogroup administrator to get recommendations there.  Also consider ordering the Geno 2.0 test or the Big Y test if there are too many recommended SNPs and you feel you may end up spending a similar amount of money buying SNPs individually.  If someone in your group within your project has ordered the Big Y or Geno 2.0, you might contact them to see their terminal SNP and order that one.
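
The 37, then 25, then 12 marker fallback for recommended SNPs can be sketched in Python as follows. FTDNA's actual algorithm is not published in the announcement, so the input layout, the function name `recommend_snps`, and the idea of ranking by how often a SNP appears among matches are assumptions for illustration only.

```python
from collections import Counter
from typing import Dict, List


def recommend_snps(matches_by_level: Dict[int, List[str]]) -> List[str]:
    """Recommend SNPs from the most detailed marker level that has matches.

    `matches_by_level` is a hypothetical structure mapping a marker level
    (37, 25, or 12) to the SNP results reported by matches at that level.
    """
    for level in (37, 25, 12):
        snps = matches_by_level.get(level, [])
        if snps:
            # Assumed ranking: most frequently seen SNP among matches first.
            return [snp for snp, _ in Counter(snps).most_common()]
    return []


# Made-up example: no usable 37-marker matches, so 25 markers are used.
example = {37: [], 25: ["L21", "L21", "U106"], 12: ["M269"]}
print(recommend_snps(example))  # ['L21', 'U106']
```

As the webinar notes stress, a recommendation is not a guarantee of a positive result, so a sketch like this only narrows down which SNPs are worth discussing with a haplogroup administrator.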

See the Family Tree DNA Learning Center for more details.  If those pages have not yet been updated, they will be, so check back.  AND...do attend the FREE Family Tree DNA Webinars!

Whew!  That will keep those wonderful Haplogroup geeks busy for a time!  We love them and their dedication to working with ancestral DNA!

