My, oh my! The Big Y is here!
Since the Family Tree DNA Conference last November, thousands of Big Y tests have been ordered, but only 100 Big Y results were released on February 27th,
with the remaining initial orders to be delivered over the next month. At that point the backlog should be resolved,
and the Big Y will keep rolling on...
The Big Y explores your deep ancestral lines and is intended
for those who are interested in discovering what SNPs appear on their
Y-chromosome. This extensive test uses next-generation sequencing and maps the SNP
positions. The Y-chromosome has about 60 million bases, but a large percentage
is inaccessible or cannot be reliably tested with today’s technology. The gold standard is said to be 10 million
bases, but the Big Y targets over 13 million and gets from 11.5 to 12.5 million
reads on average. The Big Y tests
over 25,000 SNPs out of the 36,562 known Y-SNPs in the FTDNA database.
First, let’s back up
and clarify what a SNP is:
A single nucleotide polymorphism (SNP; pronounced snip) is the most common type of genetic variation in humans,
according to the Genetics Home Reference website.
SNPs can be found throughout a person’s
DNA, roughly one SNP in every 300 nucleotides on average. With 3 billion base
pairs in the human genome, that is about 10 million SNPs per human. Each SNP is
a form of mutation (change) and represents a difference in a single DNA
building block, called a nucleotide. For example, a SNP may result in the
replacement of the base cytosine (C) or guanine (G) with the base thymine (T)
or adenine (A) at a certain location on a chromosome. Generally, these
markers mutate only once, although back mutations or reversions (a situation
where a mutation results in the nucleotide being restored to its previous
condition at that location) do occur.
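The arithmetic behind that 10 million figure is easy to check. This is just a quick sketch using the two round numbers quoted above (3 billion base pairs and one SNP per 300 nucleotides); the variable names are mine.

```python
# Round figures quoted in the paragraph above
GENOME_BASE_PAIRS = 3_000_000_000  # about 3 billion base pairs in the human genome
NUCLEOTIDES_PER_SNP = 300          # roughly one SNP in every 300 nucleotides

# Expected number of SNPs if they were spread evenly across the genome
estimated_snps = GENOME_BASE_PAIRS // NUCLEOTIDES_PER_SNP

print(f"{estimated_snps:,}")  # prints 10,000,000
```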
As a mutation is carried forward through consecutive
generations, SNP markers help determine haplogroups for Y-chromosome DNA. For
genealogists, narrowing those haplogroups to detailed subclades (sub-branches)
is beneficial. It helps them determine
with whom in the general population they have a closer genetic relationship (a
match).
When a new SNP is found, it may define a new haplogroup
subclade, but before the geneticists declare a new haplogroup
subclade, a required minimum number of people must have the new SNP. This is
the major reason why some haplogroup project administrators will ask testers
from projects to test for a certain SNP. These haplogroup administrators wish
to increase the number of known testers for that SNP so the geneticists will
place it on the phylogenetic tree. All the descendants of the person with this
newly discovered SNP will carry that mutation, and that mutation defines that
new group.
Big Y Results
The customer’s personal web pages will contain two
tabs. The first is for reporting Known SNPs, while the second is for Novel Variants; that is, the list of SNPs
not on the list of 36,582 known and previously named SNPs. So clearly, this test may turn up
SNPs among some families or clans which are not currently recognized.
Known SNPs
The customer’s webpage for Known SNPs includes several columns: SNP Name, Derived?, On Y-Tree?, Reference, Genotype, Confidence. The default is 10 items shown at a time, but
more can be viewed. The lower left tells
you how many entries are in your result, stating something similar to “Showing
1 of 10 of 25,000 of 36,564”. All the
results are shown, both positive and negative, so there is no question about
whether a specific location was tested or what its result was. A location may also
be reported as a no-call or a poor-confidence call.
The SNP Name column
is just the names of the SNPs listed alphabetically. You can search on the full
SNP name or a partial name.
The Derived? column
indicates whether the particular genotype is ancestral or derived. Derived means the individual is positive for
the SNP. There are options to SHOW ALL, YES
(+), NO (-) or ?. (The question mark
means the SNP is a no-call, the SNP is not in the coverage region or there is not
enough data to determine its value.) The
default is YES.
The On Y-Tree? column
indicates whether the SNP exists on the Y-chromosome phylogenetic tree. The options to filter this column are: SHOW
ALL, YES, NO.
The Reference column
provides the nucleotide base (adenine, cytosine, thymine or guanine) as
indicated by the GRCh37 human reference genome which is maintained by the
Genome Reference Consortium. The filter
options include: SHOW ALL, A, C, T, G.
The Genotype column
is the tester’s nucleotide base at a given SNP position. Those options include: SHOW ALL, A, C, T, G, ?. The question mark means the result is not
known.
The Confidence column
is a score that represents how confident FTDNA is in the accuracy of the data
for that SNP. Options include: SHOW ALL, HIGH (no question at that
location), MEDIUM (acceptable, but just below the borderline to be called high) and
UNKNOWN (not clear enough to call or no data).
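If it helps to picture how those column filters combine, here is a small sketch of my own. It is not FTDNA's code, and the SNP names (EX1 to EX4) and their values are made up purely for illustration. The idea is simply that each filter left at SHOW ALL (here, None) lets every row through, and a row must pass every filter that is set.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KnownSnpRow:
    # One row of the Known SNPs page, using the column names described above
    snp_name: str
    derived: str      # "YES", "NO" or "?"
    on_y_tree: str    # "YES" or "NO"
    reference: str    # A, C, T or G
    genotype: str     # A, C, T, G or "?"
    confidence: str   # "HIGH", "MEDIUM" or "UNKNOWN"


# Hypothetical rows: names and values are invented for this example
ROWS = [
    KnownSnpRow("EX1", "YES", "YES", "C", "T", "HIGH"),
    KnownSnpRow("EX2", "NO", "YES", "G", "G", "HIGH"),
    KnownSnpRow("EX3", "?", "NO", "A", "?", "UNKNOWN"),
    KnownSnpRow("EX4", "YES", "NO", "T", "C", "MEDIUM"),
]


def filter_rows(rows: List[KnownSnpRow],
                derived: Optional[str] = None,
                on_y_tree: Optional[str] = None,
                confidence: Optional[str] = None) -> List[KnownSnpRow]:
    """Keep rows matching every filter that is set; None means SHOW ALL."""
    selected = []
    for row in rows:
        if derived is not None and row.derived != derived:
            continue
        if on_y_tree is not None and row.on_y_tree != on_y_tree:
            continue
        if confidence is not None and row.confidence != confidence:
            continue
        selected.append(row)
    return selected


# The page's default view: Derived? = YES, everything else SHOW ALL
default_view = [row.snp_name for row in filter_rows(ROWS, derived="YES")]
print(default_view)  # ['EX1', 'EX4']
```

Adding on_y_tree="YES" to the same call would narrow the list to the derived SNPs that are already on the tree.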
Novel Variants
The second results page covers Novel Variants, which are the SNPs that are not on the list of
36,582 known and previously named SNPs.
These SNPs will be analyzed on an ongoing basis in case a SNP is significant
enough to be given a name and added to the Y-phylogenetic tree (Y-haplotree). However, some SNPs may remain novel and could
signify a private, family or clan variant.
Project administrators may submit these novel SNPs to be considered for
naming via the SNP Request form on
their GAP (Group Administrator Pages) website. However, getting these new SNPs
approved will take time.
This page provides columns for Position, Reference, Genotype and Confidence. These columns
are the same categories as above, but are for the SNPs not on the Y-haplotree. All novel mutations are being reported by a
reference number which can be compared to like data from any source.
There are also the following links on each page:
Help (refers you to the FAQs on the topic, but FTDNA is transitioning to their Learning Center)
Haplotree (links to your haplotree page)
Y Reset (allows you to reset the filters to the default filters)
Download Raw Data
Downloading the raw data cannot be done until sometime next
week. The files will come in formats that can be used with 3rd
party tools: a VCF (Variant Call Format) file, which is tab-delimited text that can
be imported into Excel (a sample VCF file can be found in the 1000 Genomes Wiki),
and a BED file, which is a text file that shows ranges of positions. BAM files will be available soon;
the delivery method is still being finalized because the files are so large. Insertions and deletions are included in the
download files, but not reported on the customer’s results page.
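Because a VCF file is plain tab-delimited text, it can be read with a few lines of code as well as with Excel. The sketch below is only an illustration under my own assumptions: the sample lines are invented (they are not real Big Y data, and I have not seen FTDNA's actual files), and it only handles the standard fixed columns with a single alternate allele per line. It also shows one simple way to tell SNPs apart from the insertions and deletions mentioned above, by comparing the lengths of the REF and ALT values.

```python
# Invented sample in the standard VCF layout: "##" meta lines, a "#CHROM" header
# line, then one tab-delimited record per variant.
SAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.1",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]),
    "\t".join(["chrY", "1000001", ".", "C", "T", "50", "PASS", "."]),
    "\t".join(["chrY", "1000250", ".", "G", "GA", "40", "PASS", "."]),
    "\t".join(["chrY", "1000900", ".", "AT", "A", "45", "PASS", "."]),
])


def parse_vcf(text):
    """Return one dict per variant line, keyed by the header column names."""
    columns = None
    records = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            columns = [name.lstrip("#") for name in fields]
            continue
        if columns is None:
            raise ValueError("data line found before the #CHROM header line")
        records.append(dict(zip(columns, fields)))
    return records


def variant_kind(record):
    """Classify a record by comparing REF and ALT lengths (simplified)."""
    ref, alt = record["REF"], record["ALT"]
    if alt == "." or "," in alt:
        return "other"  # no alternate allele, or several; not handled here
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) < len(alt):
        return "insertion"
    if len(ref) > len(alt):
        return "deletion"
    return "other"


records = parse_vcf(SAMPLE_VCF)
kinds = [(rec["POS"], variant_kind(rec)) for rec in records]
print(kinds)
# [('1000001', 'SNP'), ('1000250', 'insertion'), ('1000900', 'deletion')]
```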
Once all the customer data from this initial sale is loaded
into a huge database, the new SNPs that are found in enough of the population to
be named will be added to the Y-haplotree, and at that time, a customer who has
not taken the Big Y could order that particular SNP. Novel variants will remain
that, and will continue to be reported on client pages.
In about six weeks FTDNA will publish a paper on the average
number of novel SNPs per person which are being found, the mutation rates, and
findings.
HaploTree version 2014
The new Y-tree is coming, honest! The reasons for the delay include the “SNP tsunami”,
as the Geno 2.0 has helped increase the size of the tree to about 8 times
larger than it was four years ago. The Big Y will increase it even more. Negative results for SNPs that are downstream
of the most terminal positive SNP will be included. For example, L21+ on the R tree has 9
downstream SNPs which will be reported on the tree, even when the result is negative
for a tester. However, if there are no negative SNPs for an already terminal
SNP, there can be no further reporting, as in the case of M222+. Future updates to the tree will record the
positive SNPs on your webpage.
Suggestions Welcomed!
If you have a suggestion for additional Big Y tools or
features, email Elise or Rebekah and they will forward the request to
management for consideration.
To learn more about the Big Y, consult the following, as I
did.
FTDNA Learning Center (to view the various pages mentioned above)
FTDNA Forums
Facebook
Genetic Genealogy Blogs
DNAeXplained – Genetic Genealogy http://dna-explained.com/2014/02/27/big-y-release/
You may always see a list of scheduled webinars on the
Family Tree DNA webinar page. http://www.familytreedna.com/learn/ftdna/webinars/
Ordering Individual SNPs or the Big Y
Once the results of the thousands of tests which have been ordered
are available, others who appear to have the same Y-SNP signature may wish to
order individual SNPs to determine their terminal SNP (last currently known SNP
on the Y-haplotree for which you test positive), but remember that 17 SNPs at
$39 each is the cost of the Big Y, so confer with your haplogroup administrator
first. Once the backlog is cleared, the results can be expected in about 8-10
weeks. Order from Family Tree DNA.
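For anyone weighing individual SNP orders, the sums are simple to run yourself. This little sketch uses only the $39 price and the count of 17 mentioned above; the function name is mine, and prices may of course change.

```python
PRICE_PER_SNP = 39  # dollars per individually ordered SNP, as quoted above


def individual_snp_cost(snp_count, price_per_snp=PRICE_PER_SNP):
    """Total cost in dollars of ordering SNPs one at a time."""
    return snp_count * price_per_snp


print(individual_snp_cost(17))  # prints 663
```

Plugging in the number of SNPs your haplogroup administrator suggests gives a quick figure to compare against the current Big Y price.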
Enjoy!
Emily
1 Mar 2014