You see, as a total sequence analysis dork, when I see a name I frequently ask whether it is made up only of letters that are used as one-letter amino acid abbreviations. I started this game when the brilliant notes/letters came out in Science in the early 90s about whether ELVIS was overrepresented in protein sequences. Of course, even though those letters are 20 years old, Science still keeps them under wraps, requiring registration to see them (see for example the Stevens letter).
Anyway, alas, three of the major candidates for the US election have names that include letters that are not traditional amino acid abbreviations, so I am stuck with analyzing Sarah Palin. But that is OK because of her professed aversion to evolution and support for Creationism (and since sequence analysis is inherently an evolutionary study).
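For anyone who wants to play the name game at home, here is a minimal sketch of the check in Python. It assumes "traditional amino acid abbreviations" means the 20 standard one-letter codes; the function names are made up for this illustration.

```python
# The 20 standard one-letter amino acid codes (assumption: this is what
# "traditional" means here). Left out are the ambiguity codes B, J, X and Z
# and the rare residues O (pyrrolysine) and U (selenocysteine).
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def non_amino_acid_letters(name):
    """Return the sorted letters of `name` that are not standard amino acid codes."""
    letters = {ch for ch in name.upper() if ch.isalpha()}
    return sorted(letters - STANDARD_AMINO_ACIDS)


def is_peptide_name(name):
    """True if every letter in `name` is a standard one-letter amino acid code."""
    return not non_amino_acid_letters(name)


if __name__ == "__main__":
    for name in ("Elvis", "Sarah Palin"):
        print(name, is_peptide_name(name), non_amino_acid_letters(name))
```

Spaces and punctuation are ignored, so "Sarah Palin" is treated as the pretend peptide SARAHPALIN.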
So – I took her name and went to the NCBI BLAST page and did some searches. And what came up? Well, here are some of the top hits from the blastp searches (which I used to compare the pretend peptide “SARAHPALIN” with all the peptides in the non-redundant collection at GenBank).
>ref|XP_001545292.1|
hypothetical protein BC1G_16161 [Botryotinia fuckeliana B05.10]
gb|EDN25226.1|
predicted protein [Botryotinia fuckeliana B05.10]
Length=383
GENE ID: 5425746 BC1G_16161 | hypothetical protein
[Botryotinia fuckeliana B05.10]
Score = 26.9 bits (56), Expect = 189
Identities = 8/9 (88%), Positives = 8/9 (88%), Gaps = 0/9 (0%)
Query 1 SARAHPALI 9
SARA PALI
Sbjct 209 SARAQPALI 217
>ref|YP_061725.1|
homoserine dehydrogenase [Leifsonia xyli subsp. xyli str. CTCB07]
gb|AAT88620.1|
homoserine dehydrogenase [Leifsonia xyli subsp. xyli str. CTCB07]
Length=451
GENE ID: 2939000 thrA | homoserine dehydrogenase
[Leifsonia xyli subsp. xyli str. CTCB07] (10 or fewer PubMed links)
Score = 26.9 bits (56), Expect = 189
Identities = 8/9 (88%), Positives = 8/9 (88%), Gaps = 0/9 (0%)
Query 1 SARAHPALI 9
SAR HPALI
Sbjct 267 SARVHPALI 275
>ref|ZP_02031476.1| hypothetical protein PARMER_01474 [Parabacteroides merdae ATCC
43184]
gb|EDN87136.1| hypothetical protein PARMER_01474 [Parabacteroides merdae ATCC
43184]
Length=299
Score = 26.1 bits (54), Expect = 340
Identities = 7/8 (87%), Positives = 8/8 (100%), Gaps = 0/8 (0%)
Query 3 RAHPALIN 10
RAHPAL+N
Sbjct 170 RAHPALVN 177
>ref|XP_567332.1|
hypothetical protein CNJ01520 [Cryptococcus neoformans var. neoformans
JEC21]
ref|XP_773201.1|
hypothetical protein CNBJ1950 [Cryptococcus neoformans var. neoformans
B-3501A]
gb|EAL18554.1|
hypothetical protein CNBJ1950 [Cryptococcus neoformans var. neoformans
B-3501A]
gb|AAW45815.1|
hypothetical protein CNJ01520 [Cryptococcus neoformans var. neoformans
JEC21]
Length=437
GENE ID: 3254188 CNJ01520 | hypothetical protein
[Cryptococcus neoformans var. neoformans JEC21] (10 or fewer PubMed links)
Score = 26.1 bits (54), Expect = 340
Identities = 8/9 (88%), Positives = 8/9 (88%), Gaps = 0/9 (0%)
Query 1 SARAHPALI 9
SAR HPALI
Sbjct 415 SARQHPALI 423
>ref|YP_001626035.1|
citrate synthase [Renibacterium salmoninarum ATCC 33209]
gb|ABY24621.1|
citrate synthase [Renibacterium salmoninarum ATCC 33209]
Length=386
GENE ID: 5822379 RSal33209_2898 | citrate synthase
[Renibacterium salmoninarum ATCC 33209]
Score = 25.7 bits (53), Expect = 456
Identities = 9/11 (81%), Positives = 9/11 (81%), Gaps = 2/11 (18%)
Query 1 SARAHP--ALI 9
SARAHP ALI
Sbjct 218 SARAHPYAALI 228
>ref|YP_001817256.1|
integral membrane sensor hybrid histidine kinase [Opitutus terrae
PB90-1]
gb|ACB73656.1|
integral membrane sensor hybrid histidine kinase [Opitutus terrae
PB90-1]
Length=936
GENE ID: 6208547 Oter_0366 | integral membrane sensor hybrid histidine kinase
[Opitutus terrae PB90-1]
Score = 25.2 bits (52), Expect = 611
Identities = 7/7 (100%), Positives = 7/7 (100%), Gaps = 0/7 (0%)
Query 3 RAHPALI 9
RAHPALI
Sbjct 256 RAHPALI 262
>ref|YP_001757871.1|
putative anti-sigma regulatory factor, serine/threonine protein
kinase [Methylobacterium radiotolerans JCM 2831]
gb|ACB27188.1|
putative anti-sigma regulatory factor, serine/threonine protein
kinase [Methylobacterium radiotolerans JCM 2831]
Length=331
GENE ID: 6141303 Mrad2831_5232 | putative anti-sigma regulatory factor,
serine/threonine protein kinase [Methylobacterium radiotolerans JCM 2831]
Score = 25.2 bits (52), Expect = 611
Identities = 7/8 (87%), Positives = 8/8 (100%), Gaps = 0/8 (0%)
Query 2 ARAHPALI 9
ARAHPAL+
Sbjct 299 ARAHPALV 306
>ref|ZP_01466013.1| hydrolase, TatD family [Stigmatella aurantiaca DW4/3-1]
gb|EAU63211.1| hydrolase, TatD family [Stigmatella aurantiaca DW4/3-1]
Length=209
Score = 25.2 bits (52), Expect = 611
Identities = 7/7 (100%), Positives = 7/7 (100%), Gaps = 0/7 (0%)
Query 3 RAHPALI 9
RAHPALI
Sbjct 79 RAHPALI 85
>ref|YP_001558323.1|
glycosyl transferase group 1 [Clostridium phytofermentans ISDg]
gb|ABX41584.1|
glycosyl transferase group 1 [Clostridium phytofermentans ISDg]
Length=357
GENE ID: 5743305 Cphy_1206 | glycosyl transferase group 1
[Clostridium phytofermentans ISDg]
Score = 25.2 bits (52), Expect = 611
Identities = 8/10 (80%), Positives = 8/10 (80%), Gaps = 0/10 (0%)
Query 1 SARAHPALIN 10
S RAHP LIN
Sbjct 113 SERAHPLLIN 122
There does not appear to be a perfect match in the NCBI NR protein database. But take a close look at the #1 scoring hit. That is right, it is from an organism called Botryotinia fuckeliana. No comment on the appropriateness of this name, but it does contain a term I will probably use a lot if she gets elected.
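If you want to check the "Identities" and "Gaps" figures in the hits above, here is a rough sketch in Python that recounts them from the Query and Sbjct lines. The function names are hypothetical, and the rounding rule is an assumption: truncating to a whole percent happens to reproduce the figures shown (8/9 as 88%, 7/8 as 87%). The "Positives" count is not reproduced, since it depends on the scoring matrix.

```python
def alignment_stats(query, subject):
    """Count identities and gap columns for two aligned strings of equal length.

    Gaps are written as '-' as in the pairwise alignments above.
    """
    if len(query) != len(subject):
        raise ValueError("aligned strings must have the same length")
    pairs = list(zip(query, subject))
    identities = sum(1 for q, s in pairs if q == s and q != "-")
    gaps = sum(1 for q, s in pairs if q == "-" or s == "-")
    return identities, gaps, len(pairs)


def percent(part, whole):
    # Integer division truncates (assumption: this matches the report's rounding).
    return 100 * part // whole


if __name__ == "__main__":
    ident, gaps, length = alignment_stats("SARAHP--ALI", "SARAHPYAALI")
    print("Identities = %d/%d (%d%%), Gaps = %d/%d (%d%%)" % (
        ident, length, percent(ident, length),
        gaps, length, percent(gaps, length)))
```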
Of course, anybody who has heard me blather on and on about evolution knows that I am always talking about how BLAST top hits are not a good measure of relatedness per se (see my NAR paper where I first talked about this in 1995). So – I decided to build a tree of Sarah Palin. I used the NCBI Distance Tree option, which you can reach from BLAST search results.

Since most likely you cannot see that in enough detail – here is a zoom in.

That one did not come through on the blog so well either, so I decided to output the tree in Newick format and then searched for a program that could draw a better figure on the web (we have tools in my lab to do this, but I am trying to do this all on the web as an exercise). I found a web site that makes drawtree available, plugged in the Newick text, and it made a nicer one.
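For those who have not met Newick format, it is just nested parentheses with optional names and branch lengths. Here is a minimal sketch in Python of reading such a string and printing it as an indented outline. This is not what drawtree does, it skips quoted labels and comments, and the example tree and its labels (query, hitA, hitB, hitC) are made up for illustration, not the actual tree from my search.

```python
def parse_newick(text):
    """Parse a Newick string into nested (name, branch_length, children) tuples."""
    text = text.strip()
    if text.endswith(";"):
        text = text[:-1]
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # skip "("
            while True:
                children.append(parse_node())
                if pos >= len(text):
                    raise ValueError("unbalanced parentheses")
                if text[pos] == ",":
                    pos += 1
                    continue
                if text[pos] == ")":
                    pos += 1
                    break
                raise ValueError("unexpected character at position %d" % pos)
        # Read the optional label and ":length" that follow the node.
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        name = text[start:pos].strip()
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        return (name, length, children)

    tree = parse_node()
    if pos != len(text):
        raise ValueError("trailing characters after tree")
    return tree


def leaf_names(node):
    """List the names of the tips in left-to-right order."""
    name, _length, children = node
    if not children:
        return [name]
    return [leaf for child in children for leaf in leaf_names(child)]


def render(node, depth=0):
    """Return the tree as indented text lines, one node per line."""
    name, length, children = node
    label = name or "(internal)"
    if length is not None:
        label += " :%g" % length
    lines = ["  " * depth + label]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    # Hypothetical tree, for illustration only.
    example = "((query:0.1,hitA:0.2):0.05,hitB:0.3,hitC:0.25);"
    print("\n".join(render(parse_newick(example))))
```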

Though making trees from really short sequences is not ideal, in this tree, Sarah Palin is shown to be at the root of a branch including a protein from the parasitic nematode Brugia malayi. So if we take an evolutionary interpretation it seems that this causative agent of filariasis (well, a protein from this agent) is descended from SarahPalin. In other words, she seems to be ancestral to this parasite.
So in conclusion – by similarity – SarahPalin is closest to a plant pathogen with an unusual name. And by phylogeny SarahPalin is ancestral to a parasitic nematode. Sounds about right.