Charles Darwin relic hidden in the chimp and human genomes

So – in honor of Charles Darwin, and as a follow-up to my analysis of Sarah Palin's name (which, amazingly, turned up a fungus called B. fuckeliana as its best hit), I decided today to do some BLAST searches with old Charlie D.'s name. You see, every letter in CHARLES DARWIN is also the one-letter abbreviation of an amino acid, so you can pretend his name is a protein and compare it to proteins from other organisms.
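A quick way to check whether a name can pass for a protein is to confirm that every letter is one of the 20 standard one-letter amino acid codes (the six letters B, J, O, U, X and Z are not among them). Here is a minimal Python sketch; the helper name `as_peptide` is hypothetical, just for illustration.

```python
# The 20 standard one-letter amino acid codes.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def as_peptide(name: str) -> str:
    """Drop spaces, uppercase the name, and raise if any letter is not an amino acid code."""
    letters = "".join(name.split()).upper()
    bad = sorted({ch for ch in letters if ch not in AMINO_ACIDS})
    if bad:
        raise ValueError(f"not amino acid codes: {''.join(bad)}")
    return letters


print(as_peptide("Charles Darwin"))  # → CHARLESDARWIN
```

A name like "Jonathan" fails the check, because J and O are not standard amino acid codes.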

So I went to the NCBI BLAST page and did a BLASTP search. BLASTP searches a peptide (known as the query) against a database of peptides and identifies any database sequences whose amino-acid sequences are similar to the query. To make this work, I had to adjust some of the default parameters so that short matches could be detected (I raised the expect threshold, i.e., the number of matches expected by chance, to 10000).
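For context on that expect threshold: in the Karlin-Altschul statistics that BLAST uses, a hit's E value is roughly the number of alignments at least that good you would expect to find by chance in a search of that size, and it shrinks exponentially with the bit score S'. Here m and n are the effective lengths of the query and the database, S is the raw score, and λ and K depend on the scoring system:

```latex
E \approx m\,n\,2^{-S'}, \qquad S' = \frac{\lambda S - \ln K}{\ln 2}
```

As a rough check against the hits listed at the end of this post, and assuming BLOSUM62 with gap costs 11/1 (λ ≈ 0.267, K ≈ 0.041; the post does not say which matrix was used), a raw score of 55 gives

```latex
S' \approx \frac{0.267 \times 55 - \ln 0.041}{\ln 2} \approx \frac{14.69 + 3.19}{0.693} \approx 25.8 \text{ bits}
```

which matches the reported 25.8 bits. With a 13-letter query, even the best hits have E values in the thousands, which is why the threshold had to be raised to 10000 for them to be reported at all.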

Alas, no convincing matches to known or predicted proteins came up. So I was sad. Then I thought: what if Darwin was hidden in the genome of some organism? So I did a "translated" BLAST search called tblastn, which takes a peptide, translates the DNA in a nucleotide database into all possible peptides it could encode, and searches the peptide against those. When one does this, one can find "hidden" proteins, or relics of proteins, in DNA that may not have been labeled as proteins by whoever analyzed the DNA data.
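The translation step at the heart of tblastn can be pictured as reading each DNA sequence in all six reading frames: three on the given strand and three on its reverse complement. Below is a minimal Python sketch of that idea, assuming the standard genetic code (NCBI table 1). It is not how tblastn is actually implemented, and the simple "+1" to "-3" frame labels used here (numbered by offset) may not line up exactly with BLAST's frame numbering.

```python
# Standard genetic code; codons are ordered TTT, TTC, TTA, TTG, TCT, ... (first base outermost).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(dna: str, offset: int) -> str:
    """Translate dna codon by codon starting at offset; '*' marks stops, 'X' unknown codons."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(offset, len(dna) - 2, 3)
    )


def six_frame_translations(dna: str) -> dict[str, str]:
    """Return the peptides encoded in the three forward and three reverse-strand frames."""
    dna = dna.upper()
    revcomp = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna, offset)
        frames[f"-{offset + 1}"] = translate(revcomp, offset)
    return frames


for frame, peptide in six_frame_translations("ATGTGCCAT").items():
    print(frame, peptide)
```

A real search would then align the query peptide against every one of these translations, which is how a peptide can be found in DNA that nobody ever annotated as a gene.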

And what did I find with this tblastn search? A jackpot to make evolutionary biologists VERY happy. The best matches for CHARLESDARWIN, the peptide? Pan troglodytes, AKA chimps. And humans (the matches were equally strong).

So – hidden in the human and chimp genomes is a relic of one Charles Darwin. Happy Birthday, Charlie.

———————————————-
See search results below:

Sequences producing significant alignments:     Score (Bits)     E Value

gb|AC199643.3| Pan troglodytes BAC clone CH251-444E8 from chr… 25.8 1930
gb|AC093749.3| Homo sapiens BAC clone RP11-30B7 from 4, compl… 25.8 1930
gb|AF250324.1|AF250324 Homo sapiens chromosome 4q35 BAC clone… 25.8 1930
gb|AC217674.3| Pan troglodytes BAC clone CH251-398H5 from chr… 25.0 3549
gb|AC195095.2| Pan troglodytes BAC clone CH251-577A14 from ch… 25.0 3549
gb|AC188794.3| Pan troglodytes BAC clone CH251-69H24 from chr… 25.0 3549
gb|AC183104.3| Pan troglodytes BAC clone CH251-567E15 from ch… 25.0 3549
gb|AF105153.3| Homo sapiens alpha-satellite centromere border… 25.0 3549
emb|AL353763.14| Human DNA sequence from clone RP11-87H9 on c… 25.0 3549
gb|AC116618.4| Homo sapiens BAC clone RP11-98L17 from 4, comp… 25.0 3549
emb|CR786580.6| Human DNA sequence from clone RP11-764K9 on c… 25.0 3549
emb|AL591385.7| Human DNA sequence from clone RP11-391M20 on … 25.0 3549
emb|AL445925.19| Human DNA sequence from clone RP11-403A15 on… 25.0 3549
emb|AL592183.10| Human DNA sequence from clone RP11-297D8 on … 25.0 3549
ref|XM_787798.2| PREDICTED: Strongylocentrotus purpuratus sim… 24.3 6861
ref|XM_001201471.1| PREDICTED: Strongylocentrotus purpuratus … 24.3 6861
gb|AC195625.1| Pan troglodytes BAC clone CH251-895L14 from ch… 23.9 7711
gb|AC175749.2| Pan troglodytes BAC clone CH251-1124N9 from ch… 23.9 7711
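Because the summary table is plain text, picking out the tied best hits is easy to script. Here is a minimal Python sketch using a few lines copied from the table; `parse_hit` is a hypothetical helper, and it assumes, as holds for these lines, that each line ends with the bit score and the E value separated by whitespace.

```python
# A few summary lines copied from the table above (BLAST truncates the descriptions).
HITS = """\
gb|AC199643.3| Pan troglodytes BAC clone CH251-444E8 from chr… 25.8 1930
gb|AC093749.3| Homo sapiens BAC clone RP11-30B7 from 4, compl… 25.8 1930
gb|AF250324.1|AF250324 Homo sapiens chromosome 4q35 BAC clone… 25.8 1930
gb|AC217674.3| Pan troglodytes BAC clone CH251-398H5 from chr… 25.0 3549
ref|XM_787798.2| PREDICTED: Strongylocentrotus purpuratus sim… 24.3 6861
"""


def parse_hit(line: str) -> tuple[str, str, float, float]:
    """Split a summary line into (accession, description, bit score, E value)."""
    head, score, evalue = line.rsplit(maxsplit=2)
    accession, _, description = head.partition(" ")
    return accession, description, float(score), float(evalue)


hits = [parse_hit(line) for line in HITS.splitlines() if line.strip()]
best = max(score for _, _, score, _ in hits)
# Keep every hit that ties for the top bit score.
top = [(acc, desc) for acc, desc, score, _ in hits if score == best]
for acc, desc in top:
    print(best, acc, desc)
```

Three hits tie at 25.8 bits (one chimp clone and two human clones), which is the sense in which the chimp and human matches were equally strong.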

gb|AC199643.3| Pan troglodytes BAC clone CH251-444E8 from chromosome 7, complete sequence
Length=155150
Score = 25.8 bits (55), Expect = 1930, Method: Composition-based stats.
Identities = 8/13 (61%), Positives = 11/13 (84%), Gaps = 0/13 (0%), Frame = -2

Query  1       CHARLESDARWIN  13
               CH RLE D+++IN
Sbjct  145762  CHVRLEQDSKYIN  145724
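The Identities and Positives figures can be recomputed from the three alignment rows. Identical residues are copied into the middle line, and a "+" marks a mismatch that still scores positively in the substitution matrix. Here is a minimal Python sketch; `alignment_stats` is a hypothetical helper, and it relies on the midline for positives rather than on a scoring matrix.

```python
def alignment_stats(query: str, sbjct: str, midline: str) -> dict[str, str]:
    """Recompute BLAST-style Identities / Positives / Gaps strings from alignment rows."""
    if not len(query) == len(sbjct) == len(midline):
        raise ValueError("alignment rows must have equal length")
    n = len(query)
    identities = sum(q == s for q, s in zip(query, sbjct))
    # Letters in the midline are identities; '+' marks positive-scoring substitutions.
    positives = identities + midline.count("+")
    gaps = query.count("-") + sbjct.count("-")

    def fmt(k: int) -> str:
        # The percentages here appear to be rounded down (8/13 = 61.5% is shown as 61%).
        return f"{k}/{n} ({100 * k // n}%)"

    return {"Identities": fmt(identities), "Positives": fmt(positives), "Gaps": fmt(gaps)}


stats = alignment_stats("CHARLESDARWIN", "CHVRLEQDSKYIN", "CH RLE D+++IN")
print(stats)
```

The eight identities are C, H, R, L, E, D, I and N; the three extra positives are A/S, R/K and W/Y.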

gb|AC093749.3| Homo sapiens BAC clone RP11-30B7 from 4, complete sequence
Length=163102
Score = 25.8 bits (55), Expect = 1930, Method: Composition-based stats.
Identities = 8/13 (61%), Positives = 11/13 (84%), Gaps = 0/13 (0%), Frame = -3

Query  1      CHARLESDARWIN  13
              CH RLE D+++IN
Sbjct  31925  CHVRLEQDSKYIN  31887
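In both alignments the Sbjct coordinates count down (145762 to 145724, and 31925 to 31887) because the hits lie on the reverse strand (frames -2 and -3). The arithmetic confirms that each hit covers 39 nucleotides, exactly 13 codons for the 13-letter query. Here is a small Python check; `hit_span` is a hypothetical helper.

```python
def hit_span(start: int, end: int) -> tuple[int, int]:
    """Return (nucleotides, codons) covered by a hit whose Sbjct coordinates run start to end."""
    nucleotides = abs(start - end) + 1  # coordinates are inclusive at both ends
    if nucleotides % 3:
        raise ValueError("span is not a whole number of codons")
    return nucleotides, nucleotides // 3


for start, end in [(145762, 145724), (31925, 31887)]:
    print(start, end, hit_span(start, end))
```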

Author: Jonathan Eisen

I am an evolutionary biologist and a Professor at U. C. Davis. My research focuses on the origin of novelty (how new processes and functions originate). To study this, I focus on sequencing and analyzing the genomes of organisms, especially microbes, and on using phylogenomic analysis.
