I wrote the following article for The Bulletin, published by the Genealogical Forum of Oregon (Portland) 16 Jun 2015 as part of a series of DNA lessons for the membership. I am asked rather often about taking a SNP test.
by Emily D. Aulicino
Every plant and animal has a
phylogenetic tree, including humankind, of course. A phylogenetic tree shows
the inferred evolution of a species. Genetic genealogists often refer to the
human phylogenetic tree as the haplogroup tree. There are haplogroup trees for
the all-male and all-female lines. Haplogroups are decided through testing
either the Y-chromosome DNA or the mitochondrial DNA. Testing the full
mitochondria provides the haplogroup in detail, and no other testing is needed.
However, further testing is needed to fine-tune a haplogroup for the
Y-chromosome DNA (“Y-DNA”); therefore, additional SNP testing is done only for
the Y-chromosome.
Many people who
are new to genetic testing for genealogy are confused by the terms STR (short
tandem repeat, pronounced by the individual letters, S-T-R); SNP (single
nucleotide polymorphism, pronounced SNiP); haplotype (DNA results – explained
further below); and haplogroup (a group of related haplotypes constituting a
twig on the world family tree). That is, STR marker results make up a
Y-chromosome haplotype (the Y-test results), and a group of haplotypes that share
a common SNP forms a haplogroup. Knowing these terms will help the
researcher more clearly understand the various Y-DNA tests and how they relate
to genealogy.
STRs and Haplotypes
An STR is a short pattern of the
four bases in our DNA, namely adenine (A), cytosine (C), guanine (G), and
thymine (T) repeated in tandem. The number of times this pattern is repeated
determines a marker result for a Y-STR test. For example: GATAGATAGATA is a
pattern repeated three times. Thus, the marker result would be “3” on a report.
The repeating pattern can be two to five bases long. Each marker has a range in
which it repeats. For instance, DYS 393 is an area on the Y chromosome known to
repeat its pattern from 9 to 17 times (normally), so the result of that marker
in a tested person could be any number from 9 to 17. Y-DNA test results are
determined by the number of STRs or short tandem repeats on different places on
the Y chromosome. The test results are referred to as the DNA signature or
haplotype.
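The counting idea behind a marker result can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming the DNA is already available as a string of letters; the function name count_tandem_repeats is made up for this example and is not how a testing lab actually reads a sample.

```python
def count_tandem_repeats(sequence: str, motif: str) -> int:
    """Return the longest run of back-to-back copies of `motif` in `sequence`."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    for start in range(len(sequence)):
        run = 0
        pos = start
        # Count how many copies of the motif follow one another from here.
        while sequence.startswith(motif, pos):
            run += 1
            pos += len(motif)
        best = max(best, run)
    return best


# The example from the text: GATA repeated three times gives a marker result of 3.
print(count_tandem_repeats("GATAGATAGATA", "GATA"))  # 3
```

The same count comes back if other bases sit on either side of the repeated stretch, because only the longest unbroken run is reported.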
An example of a Y-DNA
STR Result
Note that Y-37,
Y-67, and higher number Y tests are really Y-DNA STR tests, but most people
just refer to them as Y-DNA tests, thus adding to the confusion. The number
after the “Y” indicates how many STRs are being tested.
SNPs and Haplogroups
A SNP is the most common type of
genetic variation among people. Each SNP represents a difference in a single
DNA building block, called a nucleotide, which contains, among other things, one of the
four bases in our DNA. For example, the base cytosine (C)
may be replaced with the base thymine (T) in a certain stretch of DNA (public
domain information from the National Library of Medicine [NLM]). To be
classified as a SNP, a change must be present in at least one percent of the
general population.
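To make the idea of a single-base difference concrete, here is a small Python sketch. It assumes two already-aligned stretches of DNA of equal length, written as strings; the sequences and the function name single_base_differences are invented for illustration. It does not model the one percent population requirement, which depends on data from many people.

```python
def single_base_differences(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """List (position, reference base, sample base) wherever the two differ.

    Positions are counted from 1. Both strings must have the same length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length")
    return [
        (index + 1, ref_base, sample_base)
        for index, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# A made-up stretch in which cytosine (C) is replaced by thymine (T) at position 5.
print(single_base_differences("AAGCCTA", "AAGCTTA"))  # [(5, 'C', 'T')]
```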
SNPs have
unique names such as M207 or P224. The letter indicates which lab found the SNP
(M is for Peter Underhill, Ph.D., of Stanford University and P is for Michael
Hammer, Ph.D., of the University of Arizona), while the number indicates the
order in which that lab found it. That is, M207 is the 207th SNP found by
Underhill's lab.
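The naming pattern just described can be sketched as a small lookup in Python. The table below holds only the two prefixes named in this article; every other prefix is deliberately left unmapped rather than guessed. The function name parse_snp_name is hypothetical, and names with a suffix such as ".1" are outside what this simple sketch handles.

```python
import re

# Only the two lab prefixes named in the article; others are left unmapped.
LAB_BY_PREFIX = {
    "M": "Peter Underhill, Stanford University",
    "P": "Michael Hammer, University of Arizona",
}


def parse_snp_name(name: str):
    """Split a simple SNP name into (prefix, number, lab or None)."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", name)
    if match is None:
        raise ValueError(f"not a simple letter-plus-number SNP name: {name!r}")
    prefix = match.group(1)
    number = int(match.group(2))
    return prefix, number, LAB_BY_PREFIX.get(prefix)


print(parse_snp_name("M207"))  # ('M', 207, 'Peter Underhill, Stanford University')
```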
DNA Double Helix graphic is courtesy of Apers0n, via Wikimedia Commons
A person tests
either positive or negative for a particular SNP, and this helps determine
where a tester is on the phylogenetic tree (the world’s family tree). That is,
testing SNPs helps determine the haplogroup. The more SNPs tested, the more
detailed or refined the haplogroup will be. DNA testing companies originally
used an alternating letter and number system; however, these strings of letters
and numbers became quite long as more information was acquired and tests
improved. Therefore, companies now use the terminal SNP as the haplogroup
designation. The terminal SNP is the last (as in chronologically the most
recent) SNP for which a person tests positive. Of course, as more SNPs are
discovered and more testing is done, the terminal SNP will change. No company,
lab, or organization has a full list of Y-DNA SNPs yet.
Old-style haplogroup: R1a1a1b2a2b1b
New-style haplogroup: R-F2935
The
International Society of Genetic Genealogy (ISOGG) Tree places SNPs based upon
evidence of where they belong on the haplogroup tree. ISOGG has a public
standard so people can know how the organization determines what SNPs to place
where. ISOGG attempts to update the tree and the new haplogroups frequently.
The listing criteria for inclusion of SNPs into the ISOGG Y-DNA Haplogroup Tree
are published on the ISOGG website.
The ISOGG Y-DNA SNP tree is at http://www.isogg.org/tree/
Members of a haplogroup share the same common
ancestor. Unfortunately, this common ancestor is very likely beyond genealogical
records. Therefore, haplogroup project administrators are interested in more
ancient migration patterns whereas the usual DNA tester is a genealogist trying
to further his or her family history.
After receiving
the result of a Y-DNA STR test, it is important to join the appropriate
haplogroup project as well as your surname project. Haplogroup administrators run
projects that look at ancient ancestry, which tend to be quite different from
projects for a surname. Y-DNA testers may receive a request from their
haplogroup administrator to do testing for particular SNP markers. These
requests, seemingly out of the blue, can be quite a puzzle for genealogists. So
why are they beneficial, and how can they help the tester?
WHY IS SNP TESTING BENEFICIAL?
The more refined the haplogroup,
the closer the testers of that haplogroup are to each other genetically.
Haplogroup trees
have grown immensely since the recent increased interest in SNP testing. The
following N haplogroup for Y-DNA currently seems to be one of the smallest and,
therefore, a relatively easy group to use as an example. If you think of a
haplogroup as its own tree with branches and twigs, then in this case N is the
trunk of the tree with N* and N1 being major branches. N (or any other solo
letter in the phylogenetic tree) is sometimes called the parent haplogroup.
When the parent haplogroup designator is followed by an asterisk, it is
possible that the testers who fall under that haplogroup
do not possess any additional unique markers or that those unique markers have yet
to be discovered. When (or if) such additional unique SNP markers are
discovered, the testers involved will be given a new, unique subclade
(branch of the tree).
Off the major
branch N1, there are smaller branches N1*, N1a, N1b, and N1c, as seen in the
following chart. The term subclade is used for any haplogroup that is beneath
the basic haplogroup (that is, its name contains more alternating letters and
numbers). In this case N*, N1, N1b1, etc. are all subclades of Haplogroup N.
SNPs break down
the haplogroup and subclades into smaller subsets. As previously stated, these
SNPs have unique names determined by the lab that discovered them; however, if
multiple labs discover the same SNP, each may name it, so some SNPs may have multiple
names. Notice in the following chart, some SNPs are separated by a forward
slash (/) while others are separated by a comma (,). Those with the slash were
discovered and named by multiple labs while the others were not.
The SNPs listed
on each line are those required for that subclade. For example, a person who is
in subclade N1b1 must test positive for every SNP on that line (L731 and L733)
as well as every SNP above it back to N. Of course, a person in a haplogroup
like N must also test positive for every SNP from N back to Y-DNA Adam.
Remember N is just one of the branches of the oldest known haplogroup A00
(Y-DNA Adam). (See the ISOGG Y-Haplogroup Tree as previously mentioned.)
N M231/Page91, M232/M2188
• N* -
• N1 CTS11499/L735/M2291
• • N1* -
• • N1a P189.2
• • N1b L732
• • • N1b* -
• • • N1b1 L731, L733
• • N1c L729.1/M2087.1/Z15.1/Z548.1
• • • N1c* -
• • • N1c1 M46/Page70/Tat, L395/M2080, P105
• • • • N1c1* -
• • • • N1c1a M178, P298
• • • • • N1c1a* -
• • • • • N1c1a1 L708/Z1951, F4325/L839v
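The rule for reading the chart above can be sketched in Python. The subclades and SNP names are copied from the chart; the dictionary layout and the function names (parse_snps, required_snps, belongs_to) are my own assumptions for illustration. The sketch follows the text: commas separate different SNPs that are all required, while slashes separate alternative names for one SNP, so a positive result under any one of those names counts. It only walks back as far as N, not all the way to Y-DNA Adam, and it leaves out the asterisk groups.

```python
# Chart lines copied from the article: subclade -> (parent subclade, SNP text).
N_TREE = {
    "N": (None, "M231/Page91, M232/M2188"),
    "N1": ("N", "CTS11499/L735/M2291"),
    "N1a": ("N1", "P189.2"),
    "N1b": ("N1", "L732"),
    "N1b1": ("N1b", "L731, L733"),
    "N1c": ("N1", "L729.1/M2087.1/Z15.1/Z548.1"),
    "N1c1": ("N1c", "M46/Page70/Tat, L395/M2080, P105"),
    "N1c1a": ("N1c1", "M178, P298"),
    "N1c1a1": ("N1c1a", "L708/Z1951, F4325/L839v"),
}


def parse_snps(text):
    """Commas separate distinct SNPs; slashes separate names for the same SNP."""
    return [group.strip().split("/") for group in text.split(",")]


def required_snps(subclade):
    """All SNPs (as groups of alternative names) from N down to `subclade`."""
    levels = []
    node = subclade
    while node is not None:
        parent, text = N_TREE[node]
        levels.append(parse_snps(text))
        node = parent
    # Levels were collected from the twig upward, so reverse to start at N.
    return [group for level in reversed(levels) for group in level]


def belongs_to(subclade, positives):
    """True if the tester is positive for every required SNP under some name."""
    return all(
        any(name in positives for name in group)
        for group in required_snps(subclade)
    )


print(required_snps("N1b1"))
```

A tester who is positive for M231, M2188, L735, L732, L731, and L733 would satisfy belongs_to("N1b1", ...) in this sketch, while the same tester without L733 would not.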
After more
people do SNP testing on any of these branches, more branches and twigs will
appear. These would be named N2, N3, etc., and would line up in the same
column as N1 with their own subclades and SNPs. See the contrived haplogroup
tree below.
N M231/Page91, M232/M2188
• N* -
• N1 CTS11499/L735/M2291
• • N1* -
• • N1a P189.2 etc.
• N2 (plus newly found SNPs)
• • N2* -
• • N2a (plus newly found SNPs) etc.
• N3 (plus newly found SNPs)
• • N3* -
• • N3a (plus newly found SNPs) etc.
One of the goals
of a haplogroup administrator is to narrow the distance between written records
and the ancient migration pattern(s) of their group. By doing some selective
SNP testing, the administrator can determine which groups were established more recently
than others because new SNPs arise over time. Geneticists have estimated the
periods when some particular SNPs occurred, and the data they gather from
additional SNP testing will help them refine their timelines and identify
more recent haplogroups, thus placing testers into groups that arose
closer to genealogical time.
When a
haplogroup administrator asks a tester to take a SNP test, that administrator
is trying to narrow this gap and determine which participants are more closely
related to each other than they are to the whole group. SNP testing helps the
entire haplogroup in establishing closely related testers. But how does this
benefit the tester who is more interested in his genealogy?
HOW DO SNP TESTS BENEFIT GENEALOGISTS?
Genealogists use DNA tests to
verify their lineage and to find others with whom they can research. Taking
advantage of all types of DNA testing helps all aspects of our genealogy and
ensures the accuracy and understanding of our results. The following examples
may illustrate how SNP testing is important to the genealogist.
Confirming a Haplogroup
A few years ago, a DNA testing
company reported a wrong haplogroup for an accountant from Florida, stating
that the man was a genetic descendant of Genghis Khan. Two major U.S.
newspapers reported this finding, and after Family Tree DNA (FTDNA) tested the
man, his haplogroup was clarified. The newspapers wrote retractions, and
Bennett Greenspan, President of FTDNA, began the company’s SNP assurance program,
which, in essence, states that if the haplogroup cannot be derived from the
haplotype, then the SNP testing will be performed free of charge.
With only a few
marker results, it can be difficult to assess the haplogroup, especially in the
more common haplogroups. For this reason, a tester should test at a Y-37 marker
level or higher.
Source:
http://www.isogg.org/whysnp.htm
Confirming the Paper Trail
An African American member of a
surname group was predicted by the testing company to be in Haplogroup I1b.
This haplogroup suggests that his paternal line came from Europe, rather than
Africa. The participant had traced his ancestry through traditional
genealogical research back to a slave who lived in the mid-1800s, and he wondered
if the slave might have been the son of someone in the family who owned him.
However, a descendant of the
owner’s family in the project did not match his STR profile. SNP testing was
ordered and the participant was found to be in Haplogroup B, which is found
almost exclusively in sub-Saharan Africa. Now the participant knows the real
origin of his paternal line.
- Contributed by Whit Athey
Source: http://www.isogg.org/whysnp.htm
Determining Extremely Rare DNA
Several dozen people tested
positive for M201, so they were within Haplogroup G, but they were found to be
negative for every other SNP within G then being offered commercially. Finally,
a few members of this group were tested in a small research study for what was
thought to be an extremely rare SNP, M377; this resulted in defining Haplogroup
G5, which had only been observed previously in two Pakistani men. Now the
European branch of this haplogroup has something that clearly unifies them and
adds to their sense of identity. Essentially all in this group are Ashkenazi
Jews from Eastern Europe, though some did not previously know their origin.
- Contributed by Whit Athey
Creating Subgroups within a Larger Haplogroup
SNP testing refines ancestral
origins and helps to differentiate between members of the same haplogroup.
Testing positive for additional SNPs puts a person in a more select group with
others in the same haplogroup. This means you can narrow the people with whom
you match. You have not shared a common ancestor for thousands of years with
those who do not match you on the SNPs. With each SNP for which you test positive, your DNA
signature gets closer to indicating relationships within recorded history.
The Talley
Project had three to four people whose haplogroups could not be determined
without doing SNP testing. The testing helped determine whether those with no
haplogroup predictions were related, even remotely, or not related in any recent
time frame. It
also showed if there would be a new haplogroup for the surname. SNP testing
would also indicate if these testers could be a product of convergence; that
is, they match on the haplotype but are not members of the same haplogroup and
therefore not related. The result of testing indicated that the testers were
more closely related to each other than to the entire group. They became their
own subgroup within the haplogroup.
- Contributed by Emily Aulicino - Administrator for the Talley DNA
Project
Narrowing the Gap
SNP testing narrows the gap
between written genealogy and ancient genealogy. I tested my paternal Doolin
cousin with the Y-111 test. He matches a couple of Doolins and many other
surnames, such as Lawlor, Kelley, Moore, etc. The paper trail ends about 1750
in Virginia. I know the line was Irish or Scots-Irish, but where in the native
land, I had no idea. I joined my cousin to a subclade haplogroup according to
his terminal SNP at that time.
The haplogroup
administrators e-mailed to ask him to take a SNP test when they saw that my
Doolin cousin and the six other names had common markers. I did so for the sake
of the group and because I know those administrators are trying to use the SNPs
to lessen the gap between the genealogical records timeframe and ancient migrations.
I followed their suggestions and now know that the surname was probably
O’Dowling in the mid-1600s in County Laois, Ireland. We are one of the Seven
Septs of Laois that the British tried to disband in the mid-1600s. I now have
a gap of about 100 years between my paper trail and my ancestral origins, instead
of infinity. Recent analysis by the haplogroup administrators estimates that my
surname existed about 1300 AD and that the terminal SNP L1402 began about 800
AD. I realize that my line may have lived in other locations before coming to
America, but it gives me a place to start researching, and in time, haplogroup
administrators will learn more through their SNP testing.
- Contributed by Emily Aulicino
Determining Unique Novel SNPs
With the advent of the Big Y test
at FTDNA (www.familytreedna.com), a
male can be tested for 25,000 SNPs. Although not everyone will test positive
for all 25,000, the more people who take this test, the higher the likelihood
that testers in the same haplogroup subgroup will find that they are more
closely related than they thought. A great benefit from this test is that novel
(newly found) SNPs will allow the creation of more subclades within a
haplogroup thus bringing the common ancestor nearer to genealogical time.
Private SNPs can be discovered as well. These SNPs may or may not remain
private; that is, belonging to a family for the past few generations. Over
time, some of these private SNPs may be found more extensively and thus help
narrow the subclades as well. The Big Y test is not a test to use for finding
matches within a genealogical time frame, but is for more ancient ancestry
which makes it of more interest to the haplogroup administrators. However, the
test could be of interest for those who wish to contribute to the overall
knowledge of genetic testing. Besides the Big Y,
FTDNA offers individual SNP testing along with various haplogroup SNP panels
which are being created in collaboration between haplogroup administrators and
FTDNA. See http://www.isogg.org/wiki/Y-DNA_SNP_testing_chart
SNP TESTING RESOURCES
Astrid Krahn, who, along with her
husband Thomas Krahn, owns YSEQ (http://www.yseq.net/),
states that their company “offers every public or private SNP on the male
specific region of the Y chromosome as long as it can be technically tested
with the Sanger sequencing method” and that “there is no practical limit to the
number of SNPs that YSEQ offers since every SNP can be wished for. The number
on the menu (top left) on our website only reflects the SNPs that have been
practically ordered and that we have confirmed with actual sequencing results.”
As of printing time, their website lists over 11,000 SNPs and 59 Custom SNPs.
Tests can be ordered separately or in panels.
Other companies
conducting SNP testing include Genographic Geno 2.0, although it is not used as
much as it used to be (https://genographic.nationalgeographic.com/),
and YFull, which is helpful to people with ancestry in Eastern Europe or Asia (http://www.yfull.com/). Also, both Full
Genomes (https://www.fullgenomes.com/)
and BritainsDNA Chromo 2.0 (https://www.britainsdna.com/)
are used by those very interested in SNP testing.
ISOGG has a
comparison chart for some of these companies at http://www.isogg.org/wiki/Y-DNA_SNP_testing_chart.
The ISOGG Y-DNA Haplogroup Tree is so powerful that not only do genetic
genealogists use it, but various genetic labs around the world also consult it.
SUMMARY
SNP testing can be beneficial to
the genetic genealogy community as a whole as well as to individual testers
depending upon their desire to determine who is more closely related on
the Y-chromosome and to narrow the gap between genealogical time and
ancient migrations. The exact number of SNPs for the Y-chromosome is not yet
known, but as of February 2015 Alice Fairhurst (team leader for the ISOGG Y-DNA
Haplogroup Tree) reported that there are 15,888 uniquely named SNPs whose
locations on the tree are identified. ISOGG YBrowse has more than 120,000 SNP
names, but as of this writing, the site is not operational.
Both Thomas
Krahn’s company YSEQ and the ISOGG tree show the equivalent names of SNPs that
were discovered by multiple labs and so given multiple names.
When you know a
little about a subject, it is easy to make judgements based on that knowledge.
However, as knowledge increases, beliefs change. In the early years,
geneticists discovered SNPs that helped them place testers into haplogroups.
More SNPs were discovered and those haplogroups were refined, creating many
subclades. Some testers’ haplogroups were changed completely. Now that
thousands of SNPs have been discovered, geneticists are seeing some unique
situations surrounding these special markers. Some scientists question the
quality of some SNPs, believing that they are not viable enough to use for
haplogroups while others are not in agreement with how some SNPs are placed on
the haplogroup tree. All this will take time to sort out as we gain more
knowledge in understanding these markers. And, just as scientists now believe,
based on new discoveries, that Haplogroup R is more recent than previously thought,
we may find major changes in the structure of the phylogenetic
tree as more information surfaces.
No doubt, the
SNP testing currently available is only a small step toward what the future
holds for genealogy testing as this is just scratching the surface of the
estimated 12.8 million SNPs in the human genome according to the National Center
for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/books/NBK44423/).
The decision to SNP or not to SNP should be left to the individual tester with
guidance from the haplogroup administrators.
Permission has been given by the International Society of Genetic
Genealogy (ISOGG) to use any references to their website, including the Success
Story examples.
Originally written for
the Genealogical Forum of Oregon’s Bulletin, Volume 64, No. 4, June 2015, p. 16-20.