01 March 2014

Big Y is rolling on...


      My, oh my!  The Big Y is here!

Since the Family Tree DNA Conference last November, thousands of Big Y tests have been ordered, but only 100 Big Y tests were released on February 27th, with the remaining initial orders to be delivered over the next month.  At that point the backlog should be resolved, and the Big Y will keep rolling on...

The Big Y explores your deep ancestral lines and is intended for those who are interested in discovering what SNPs appear on their Y-chromosome. This extensive test uses next-generation sequencing and maps the SNP positions. The Y-chromosome has about 60 million bases, but a large percentage is inaccessible or cannot be reliably tested with today’s technology.  The gold standard is said to be 10 million bases, but the Big Y targets over 13 million and gets from 11.5 to 12.5 million reads on average.  The Big Y tests over 25,000 SNPs out of the 36,562 known Y-SNPs in the FTDNA database.

First, let’s back up and clarify what a SNP is:
      A single nucleotide polymorphism (SNP; pronounced snip) is the most common type of genetic variation in humans, according to the Genetics Home Reference website.  SNPs occur throughout a person’s DNA, roughly once in every 300 nucleotides on average. With 3 billion base pairs in the human genome, that is about 10 million SNPs per human. Each SNP is a form of mutation (change) and represents a difference in a single DNA building block, called a nucleotide. For example, a SNP may result in the replacement of the base cytosine (C) or guanine (G) with the base thymine (T) or adenine (A) at a certain location on a chromosome.  Generally, these markers mutate only once, although back mutations or reversions (a situation where a mutation results in the nucleotide being restored to its previous condition at that location) do occur.
      As a mutation is carried forward through successive generations, SNP markers help determine haplogroups for Y-chromosome DNA. For genealogists, narrowing those haplogroups to detailed subclades (sub-branches) is beneficial.  It helps them determine with whom in the general population they have a closer genetic relationship (a match).
      When a new SNP is found, a new haplogroup subclade is determined, but before the geneticists declare a new haplogroup subclade, a required minimum number of people must have the new SNP. This is the major reason why some haplogroup project administrators will ask testers from projects to test for a certain SNP. These haplogroup administrators wish to increase the number of known testers for that SNP so the geneticists will place it on the phylogenetic tree. All the descendants of the person with this newly discovered SNP will carry that mutation, and that mutation defines that new group.

Big Y Results
The customer’s personal web pages will contain two tabs.  The first is for reporting Known SNPs while the second is for Novel Variants; that is, the list of SNPs not on the list of 36,582 known and previously named SNPs.  So clearly, this test may turn up SNPs within some families or clans that are not currently recognized.

Known SNPs
The customer’s webpage for Known SNPs includes several columns: SNP Name, Derived?, On Y-Tree?, Reference, Genotype, Confidence.  The default is 10 items shown at a time, but more can be viewed.  The lower left tells you how many entries are in your result, stating something similar to “Showing 1 of 10 of 25,000 of 36,564”.  All the results are shown, both positive and negative so there is no question about whether a specific location was tested, or its results.  There is also the option of a no-call or poor confidence call at that location.

The SNP Name column is just the names of the SNPs listed alphabetically. You can search on the full SNP name or a partial name. 

The Derived? column indicates whether the particular genotype is ancestral or derived.  Derived means the individual is positive for the SNP.  The filter options are SHOW ALL, YES (+), NO (-) or ?.  (The question mark means the SNP is a no-call, the SNP is not in the coverage region, or there is not enough data to determine its value.)  The default is YES.

The On Y-Tree? column indicates whether the SNP exists on the Y-chromosome phylogenetic tree.  The options to filter this column are: SHOW ALL, YES, NO.

The Reference column provides the nucleotide base (adenine, cytosine, thymine or guanine) as indicated by the GRCh37 human reference genome, which is maintained by the Genome Reference Consortium.  Those options include: SHOW ALL, A, C, T, G.

The Genotype column is the tester’s nucleotide base at a given SNP position.  Those options include:  SHOW ALL, A, C, T, G, ?.  The question mark means the result is not known.

The Confidence column is a score that represents how confident FTDNA is in the accuracy of the data for that SNP.  Options include:  SHOW ALL, HIGH (no question at that location), MEDIUM (acceptable, but just short of the borderline for a HIGH call) and UNKNOWN (not clear enough to call or no data).
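
One way to picture the column filters described above is as simple conditions applied to each row.  The short Python sketch below is only an illustration: the KnownSnp record, its field names, the filter_rows helper and the three example rows (SNP-A, SNP-B, SNP-C) are all made up for this purpose and are not FTDNA's data model or anyone's actual results.

```python
from dataclasses import dataclass


@dataclass
class KnownSnp:
    """One row of a Known SNPs page (hypothetical field names)."""
    name: str
    derived: str      # "YES", "NO" or "?"
    on_y_tree: str    # "YES" or "NO"
    reference: str    # base in the reference genome
    genotype: str     # tester's base, or "?" if not known
    confidence: str   # "HIGH", "MEDIUM" or "UNKNOWN"


def filter_rows(rows, **filters):
    """Keep rows whose fields equal every filter given.

    Leaving a filter out behaves like SHOW ALL for that column.
    """
    return [
        row for row in rows
        if all(getattr(row, field) == wanted for field, wanted in filters.items())
    ]


# Made-up example rows, for illustration only.
rows = [
    KnownSnp("SNP-A", "YES", "YES", "C", "T", "HIGH"),
    KnownSnp("SNP-B", "NO", "YES", "G", "G", "HIGH"),
    KnownSnp("SNP-C", "?", "NO", "A", "?", "UNKNOWN"),
]

default_view = filter_rows(rows, derived="YES")   # the page's default filter
no_calls = filter_rows(rows, genotype="?")        # results that are not known

print([row.name for row in default_view])
print([row.name for row in no_calls])
```

Combining filters, for example filter_rows(rows, derived="NO", confidence="HIGH"), narrows the list the same way that setting two column filters at once would.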

Novel Variants
The second results page covers Novel Variants, which are the SNPs that are not on the list of 36,582 known and previously named SNPs.  These SNPs will be analyzed on an ongoing basis in case a SNP is significant enough to be given a name and added to the Y-phylogenetic tree (Y-haplotree).  However, some SNPs may remain novel and could signify a private, family or clan variant.  Project administrators may submit these novel SNPs to be considered for naming via the SNP Request form on their GAP (Group Administrator Pages) website. However, getting these new SNPs approved will take time.

This page provides columns for Position, Reference, Genotype and Confidence.  These columns are the same categories as above, but are for the SNPs not on the Y-haplotree.  All novel mutations are being reported by a reference number which can be compared to like data from any source.

There are also the following links on each page: 
Help (refers you to the FAQs on the topic, but FTDNA is transitioning to their Learning Center)
Haplotree (links to your haplotree page)
Y Reset (allows you to reset the filters to the default filters)
Download Raw Data

Raw data downloads will not be available until sometime next week.  The downloadable files, which can be used with 3rd party tools, are a VCF (Variant Call Format) file, which is a tab-delimited file that can be imported to Excel (a sample VCF file can be found in the 1000 Genomes Wiki), and a BED file, which is a text file that shows a range of positions.  BAM files will be available soon; the delivery method is still being finalized because the file is so large.  Insertions and deletions are included in the download files, but not reported on the customer’s results page.
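
For readers who would rather not use Excel, here is a minimal sketch in Python of how the two text files could be read.  It assumes the standard layouts: a VCF file with header lines that start with "#" and tab-separated columns beginning CHROM, POS, ID, REF, ALT, and a BED file with chromosome, start and end columns where the start is 0-based and the end is exclusive.  The function names and the sample lines are made up for illustration and do not come from an actual FTDNA download, whose exact contents may differ.

```python
import io


def read_vcf_variants(handle):
    """Yield (chrom, pos, ref, alt) from VCF text, skipping '#' header lines."""
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        chrom, pos, _snp_id, ref, alt = line.split("\t")[:5]
        yield chrom, int(pos), ref, alt


def read_bed_ranges(handle):
    """Return (chrom, start, end) tuples from BED text (0-based start, exclusive end)."""
    ranges = []
    for line in handle:
        fields = line.split()
        # Skip blank lines, comments and optional track/browser header lines.
        if len(fields) < 3 or fields[0].startswith("#") or fields[0] in ("track", "browser"):
            continue
        ranges.append((fields[0], int(fields[1]), int(fields[2])))
    return ranges


def is_covered(chrom, pos, ranges):
    """True if a 1-based VCF position falls inside any BED range."""
    # BED [start, end) in 0-based terms equals 1-based positions start+1 .. end.
    return any(c == chrom and start < pos <= end for c, start, end in ranges)


# Made-up sample content, for illustration only.
sample_vcf = io.StringIO(
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chrY\t1000\t.\tC\tT\t50\tPASS\t.\n"
    "chrY\t5000\t.\tG\tA\t50\tPASS\t.\n"
)
sample_bed = io.StringIO("chrY\t900\t1100\n")

variants = list(read_vcf_variants(sample_vcf))
ranges = read_bed_ranges(sample_bed)
covered = [v for v in variants if is_covered(v[0], v[1], ranges)]
print(covered)
```

With real files, io.StringIO would be replaced by open() on the downloaded file names.  Checking each variant against the BED ranges is one way to tell a position that was read and found unchanged apart from one that was never covered at all.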

Once all the customer data from this initial sale is loaded into a huge database, the new SNPs that are found in enough of the population to be named will be added to the Y-haplotree, and at that time, a customer who has not taken the Big Y could order that particular SNP. Novel variants will remain that, and will continue to be reported on client pages.

In about six weeks FTDNA will publish a paper on the average number of novel SNPs per person which are being found, the mutation rates, and findings.

HaploTree version 2014
The new Y-tree is coming, honest!  The reasons for the delay include the “SNP tsunami”: the Geno 2.0 has helped increase the size of the tree to about 8 times larger than it was four years ago. The Big Y will increase it even more.  Negative results for SNPs that are downstream of the most terminal positive SNP will be included.  For example, L21+ on the R tree has 9 downstream SNPs, which will be reported on the tree even when a tester's result for them is negative. However, if there are no negative SNPs for an already terminal SNP, there can be no further reporting, as in the case of M222+.  Future updates to the tree will record the positive SNPs on your webpage.

Suggestions Welcomed!
If you have a suggestion for additional Big Y tools or features, email Elise or Rebekah and they will forward the request to management for consideration.

To learn more about the Big Y consult the following, as I did.
Ordering Individual SNPs or the Big Y
Once the results of the thousands of tests which have been ordered are available, others who appear to have the same Y-SNP signature may wish to order individual SNPs to determine their terminal SNP (last currently known SNP on the Y-haplotree for which you test positive), but remember that 17 SNPs at $39 each is the cost of the Big Y, so confer with your haplogroup administrator first. Once the backlog is cleared, the results can be expected in about 8-10 weeks. Order from Family Tree DNA.


