Genomics – Jonathan Eisen's Lab

The tale of the blue soy products – from contaminated soy milk to a new publication

A new paper is out from my lab. This one is a remarkable story of work by PhD Student Marina E. De León (https://phylogenomics.me/people/marina-de-leon/).

It started with her pouring out some soy milk from her fridge that was blue.

See her Tweet about this here: https://twitter.com/MicrobialFuture/status/1220399781165461504?s=20

There’s something about #soy that blue/violet pigmented #bacteria love! I’m gonna have to isolate this sapphire gem and find out what’s going on. @phylogenomics my 6th PHD project? pic.twitter.com/XfSeYfs1WW
— Marina De León (@MicrobialFuture) January 23, 2020

She then isolated bacteria from the soy milk and from some blue tofu in her fridge, identified them, and ran experiments to see whether the isolates could turn soy milk blue. Some did. She sequenced their genomes and analyzed them, showing that these isolates had properties similar to those of other bacteria known to cause blue discoloration of food products. A truly remarkable piece of work.

See the paper here: “Draft Genome Sequences and Genomic Analysis for Pigment Production in Bacteria Isolated from Blue Discolored Soymilk and Tofu”

And thanks to Guillaume Jospin and Harriet Wilson, who helped with the work, and to all the people in my lab and on social media who encouraged and supported Marina along the way.

And see also:

Remember that glass of blue soymilk? We published an entire #OpenAcess genome paper based on bacteria isolated from that milk! This is a purely pandemic paper and is #CitizenScience and #SciComm at it's greatest! Thanks to @phylogenomics & @Guillaumejospin https://t.co/dN3Ybnchx2
— Marina De León (@MicrobialFuture) October 2, 2021

Matt Hahn @3rdreviewer talk at #UCDavis – pen and paper notes

Matt Hahn was at UC Davis giving a talk yesterday.

Yes @3rdreviewer has arrived at #UCDavis #holobiont #epigenetics #aquarius pic.twitter.com/yirR4rWdio

— Jonathan Eisen (@phylogenomics) October 6, 2016

I did not have my laptop available so took notes with – gasp – a pen and paper. I thought it was quite a nice talk so am posting my notes here. More about Matt and his work can be found here: http://www.indiana.edu/~hahnlab/.

BLAST from the past – a bit of history behind Craig Pikaard’s discovery in 2000 of RNA Pol IV in Arabidopsis

I saw this post by Craig Pikaard on Facebook and it brought back some memories:

New paper from my lab in which we identified the RNAs made by RNA Polymerase IV, an enzyme we discovered ~15 years ago. Took us more than ten years to find the little buggers, but we finally got ’em. The paper is “open access”, meaning that anyone can read it without paying a download fee or subscription. So have at it if you need a nap.

And the post included a link to a new paper in eLife. This brought back memories because I had a small part in the discovery (or more accurately, some post-discovery analysis). So – let’s step into a time machine here provided by, well, me keeping all my email forever I guess.

It was September 2000. I was working as a faculty member at TIGR (The Institute for Genomic Research) and I was doing some evolutionary analysis of the Arabidopsis thaliana genome, for what would become my most highly cited paper: Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. And then on Sept 6 I got an email from someone I had gotten to know a little bit who was also analyzing the genome:

———————————-
9/6

Dear Jonathan,

In helping Mike Bevan search for the general transcription machinery, I’ve
stumbled across something odd that might also interest you given its
evolutionary implications.

There should be three related genes in the Arabidopsis genome (or more if
any of the genes are duplicated) encoding ~135 kd (2nd largest)
DNA-dependent RNA polymerase subunits – one each for pol I, II and III.
These subunits are similar and are clearly related to one another (also to
the B subunit of the single bacterial RNA polymerase) yet they have
distinct motifs that allow them to be placed in each class (pol I, II, or
III) based on clustal analysis with orthologs from other species. Anyway,
there ARE three distinct ~135 kd subunit genes in the thaliana genome and
based on multiple alignments vs. mouse, yeast, drosophila etc genes, and
clustal analysis to draw phylogenetic trees, one is clearly for pol II, and
one is clearly for pol III. The third paralog is strange- it does not group
with other pol I 135 kd subunits (from yeast, Drosophila, Euplotes, mouse,
C. elegans), nor with pol II or III subunits. In fact, it appears as an
outgroup even when archael subunits (e.g. Sulfolobus) are included in the
analysis: archael subunits are more closely related to the pol II second
largest subunit than the mystery subunit is to other pol I, II, or III
subunits. By BLAST searching Genbank, the mystery subunit does not match
anything better than eukaryotic 135 kd subunits and it doesn’t look like a
chloroplast or mitochondrial subunit. I’m wondering if a plant Pol I can
really be that weird.

Is this something you would be interested in looking at if I send you the
protein sequences for clustal analysis?

Cheers
Craig

Craig S. Pikaard
Associate Professor
Biology Department, Washington University
Campus Box 1137, One Brookings Drive
St. Louis, MO 63130

Now this certainly seemed interesting and as I was doing a variety of analyses of RNA polymerase homologs for some studies of the evolution of microbes, it was something I actually knew a little bit about. So I wrote back immediately:

Craig

This sounds quite interesting. I have found that for many of the DNA repair genes I have been looking at, the A. thaliana genes do show quite long branches, so long branches might be a possibility. A good phylogenetic analysis should be able to detemrine if that is the case. If you send me the sequences and/or an alignment, I would be happy to put them through a more deailed phylogenetic analysis.

Jonathan

Then, a few minutes later I got another email:

Hi Jonathan,

I’m pasting below the sequences I used for the multiple alignments (using
DNAStar), starting with the mystery gene and then known second subunits of
pol I, II, III, and archae.
Thanks for having a look at this.
Craig
———–

Arabidopsis mystery gene (from chromosome 3):
DEFINITION DNA-dependent RNA polymerase II [Arabidopsis thaliana].
ACCESSION BAB02021
A. thal chromosome III sequence. Does not group with pol I, II or III
despite its description. Two chromosome 3 P1 clones and two partial cDNAs
(that are the same)from developing seeds match it (see accessions below,
with match scores)
GSDB:S:3264005|AB020749|AB020749|Arabidopsis thaliana genomic D… 598 0.0
GSDB:S:4681131|AP000377|AP000377|Arabidopsis thaliana genomic D… 566 0.0
GSDB:S:1038672|Z19120|ATRNAPIIM|A.thaliana mRNA for RNA polymer… 504 e-142
GSDB:S:8430488|BE522782|BE522782|M28H12STM Arabidopsis developi… 171 1e-046
GSDB:S:8430529|BE522823|BE522823|M29C3STM Arabidopsis developin… 171 2e-041

mdvdeiesagqiniselgesflqtfckkaatsffeefglishqlnsynffiehglqnvfesfgdilvepsfdvikkkdgd
wryatvfkkivikhdkfktgqdeyvekeildvkkqdiligsipvmvksvlcktsekgkenckkgncafdqggyfvikgae
kvfiaqeqmctkrlwisnspwtvsfrsetkrnrfivrlsenekaedykimekvltvyflsteipvwllffalgvssdkea
mdliafdgddasitnsliasiheadavceafrcgnnaltyvehqikstkfppaesvddclrlylfpclqglkkkarflgy
mvkcllsayagkrkcenrdsfrnkrielagellereirvhlaharrkmtramqkqlsgdgdlkpiehyldasvitnglnr
afstgawshpfrkmervsgvvanlgranplqtlidlrrtrqqvlytgkvgdarhphpshwgrvcflstpdgencglvknm
sllglvstqglesvvemlftcgmeelmndtstplcgkhkvllngdwvglcadsesfvgelksrrrqselplemeikrdkd
dnevriftdagrllrpllvvenlhklkqdkptqypfkhlldqgileligieeeedcttawgikqllkepknythceldls
fllgvscaivpfanhdhgkrvlyqsqkhcqqaigfsstnpnircdtlsqqlfypqkplfktlaseclekevlfngqnaiv
avnvhlgynqedsivmnkaslergmfrseqirsykaevdtkdsekrkkmdelvqfgktyskigkvdsleddgfpfiganm
stgdivigrctesgadhsiklkhtergivqkvvlssndegknfaavslrqvrspclgdkfssmhgqkgvlgyleeqqnfp
ftiqgivpdivinphafpsrqtpgqlleaalskgiacpiqkkegssaaytkltrhatpfstpgvteiteqlhragfsrwg
nervyngrsgemmrslifmgptfyqrlvhmsenkvkfrntgpvhpltrqpvadrkrfggirfgemerdcliahgasanlh
erlftlsdssqmhicrkcktyanviertpssgrkirgpycrvcassdhvvrvyvpygakllcqelfsmgitlnfdtklc
———————————

Known Pol I ~135 kd subunits:

Yeast (S. cerevisae)
MSKVIKPPGQARTADFRTLERESRFINPPKDKSAFPLLQEAVQPHIGSFNALTEGPDGGLLNLGVKDIGEKVIFDGKPLN
SEDEISNSGYLGNKLSVSVEQVSIAKPMSNDGVSSAVERKVYPSESRQRLTSYRGKLLLKLKWSVNNGEENLFEVRDCGG
LPVMLQSNRCHLNKMSPYELVQHKEESDEIGGYFIVNGIEKLIRMLIVQRRNHPMAIIRPSFANRGASYSHYGIQIRSVR
PDQTSQTNVLHYLNDGQVTFRFSWRKNEYLVPVVMILKALCHTSDREIFDGIIGNDVKDSFLTDRLELLLRGFKKRYPHL
QNRTQVLQYLGDKFRVVFQASPDQSDLEVGQEVLDRIVLVHLGKDGSQDKFRMLLFMIRKLYSLVAGECSPDNPDATQHQ
EVLLGGFLYGMILKEKIDEYLQNIIAQVRMDINRGMAINFKDKRYMSRVLMRVNENIGSKMQYFLSTGNLVSQSGLDLQQ
VSGYTVVAEKINFYRFISHFRMVHRGSFFAQLKTTTVRKLLPESWGFLCPVHTPDGSPCGLLNHFAHKCRISTQQSDVSR
IPSILYSLGVAPASHTFAAGPSLCCVQIDGKIIGWVSHEQGKIIADTLRYWKVEGKTPGLPIDLEIGYVPPSTRGQYPGL
YLFGGHSRMLRPVRYLPLDKEDIVGPFEQVYMNIAVTPQEIQNNVHTHVEFTPTNILSILANLTPFSDFNQSPRNMYQCQ
MGKQTMGTPGVALCHRSDNKLYRLQTGQTPIVKANLYDDYGMDNFPNGFNAVVAVISYTGYDMDDAMIINKSADERGFGY
GTMYKTEKVDLALNRNRGDPITQHFGFGNDEWPKEWLEKLDEDGLPYIGTYVEEGDPICAYFDDTLNKTKIKTYHSSEPA
YIEEVNLIGDESNKFQELQTVSIKYRIRRTPQIGDKFSSRHGQKGVCSRKWPTIDMPFSETGIQPDIIINPHAFPSRMTI
GMFVESLAGKAGALHGIAQDSTPWIFNEDDTPADYFGEQLAKAGYNYHGNEPMYSGATGEELRADIYVGVVYYQRLRHMV
NDKFQVRSTGPVNSLTMQPVKGRKRHGGIRVGEMERDALIGHGTSFLLQDRLLNSSDYTQASVCRECGSILTTQQSVPRI
GSISTVCCRRCSMRFEDAKKLLTKSEDGEKIFIDDSQIWEDGQGNKFVGGNETTTVAIPFVLKYLDSELSAMGIRLRYNV
EPK

C. elegans
MDCDIASYHVDSFDFLVSKGCQFAAQAVPAEKFRLKNGDAVTMKFTSAQLHKPTLDTGAKLTSDTLPLLPAECRQRGLTY
AGNLKVGIDVHVNGSRLDIIEIILGKVPIMLRSEGCHLRGMSRKELVVAGEEPIEKGGYFIVNGSEKVIRLLIANRRNFP
IAIIRKTFKEKGKLFSEFGVMMRSVKENHTAVMMTLHYLDTGTMQLALQFRREIFYVPLMYIVKALTDKNDAVISAGFKR
GRNQDQFYSSCILNMLAQCQEEEILNQEAAIRAIGSRFRVAVSDRVAPWEDDLEAGRFIIRECVLIHLDSDEEKFHTLAY
MTQKLIALVKGECAPETPDNPQFQEASVSGHILLLILRERMENIIGMVRRKLEYMSSRKDFILTSAAILKALGNHTGGEI
TRGMAYFLATGNLVTRVGLALQQESGFSVIAERINQLRFVSHFRAIHRGAFFMEMRTTDVRKLRPEAWGFICPVHTPDGA
PCGLLNHVTASCRIVTDLSDNSNVPSLLAELGMYTHKTVALAPPGEELYPVLMNGRFLGYVPITKAASIERYLRCAKVAK
DARIPYTSEIALVRRSTDIKNIQTQYPGIYILSDAGRLIRPVRNLAMDAVEHIGTFEQVYLSVVLDPEEAEPGVTMHQEL
HPSCLFSFAGNLIPFPDHNQSPRNVYQCQMGKQTMGTAVHAWHSRADNKMYRLQFPQQPMLKLEAYEKYEMDEYPLGTNA
CVAVISYTGYDMEDAMTINKASYQRGFAHGTVIKVERINLVTERERKTIFYRNPREEIKTVGPDGLPIPGRRYFLDEVYY
VTFNMETGDFRTHKFHYAEPAYCGLVRIVEQGEGDSGAKHALIQWRIERNPIIGDKFASRHGQKGINSFLWPVESLPFSE
TGMVPDIIFNPHGFPSRMTIGMMIESMAGKAAATHGENYDASPFVFNEDNTAINHFGELLTKAGYNYYGNETFYSGVDGR
QMEMQIFFGIVYYQRLRHMIADKFQVRATGPIDPITHQPVKGRKKGGGIRFGEMERDAIIAHGTSFVLQDRLLNCSDRDV
AYACRRCGSLLSVLMSSRAGSHLLKKKRKDDEPLDYTETQRCRTCDKDDQVFLLQVPRVFRYLTAELAAMNVKIKLGIEH
PSKVTGS

D. melanogaster
MLEEMQQMKTIPVLTNSRPEFKQIPKKLSRHLANLGGPHVDSFDEMLTVGLDNSAKHMIPNHWLSPAGEKISMKVESIWI
AKPKVPQDVIDVRTREIYPTDSRQLHVSYSGMCSVRLGWSVNGVQKTPINMDLGEVPIMLRSKACNLGQATPEEMVKHGE
HDSEWGGIFVIRGNEKIVRMLIMTRRNHPICVKRSSWKDRGQNFSDLGMLVQTVREDESSLSNVVHYLNNGTAKFMFSHV
KRLSYVPVCLILKCLMDYTDEEIYNRLVQGYESDQYYVSCVQAMLREVQNENVYTHAQCKSFIGNLFRARFPEVPEWQPD
DDVTDFILRERVMIHLDTYEDKFQLIVFMIQKLFQCAQGKYKVENVDSSMMQEVLLPGHLYQKYLSERVESWVSQVRRCL
QKKLTSPDALVTSAVMTQCMRQAGGVGRAIESFLATGNIASRTGLGLMQNSGLVIMAENINRMRYMSHFRAIHRGSYFTT
MRTTEARQLLPDAWGFICPVHTPDGTPCGLLNHLTLTCEISMRPDPKLVKAIPKHLIDMGMMPLSNRRYLGEKLYVVFLD
GKHLGHIHQSEAEKIVDELRYGKIFGTLPQMMEIGFIPFKKNGQFPGLYIATGPARLMRPVWNLKWKRVEYIGTLEQLYM
EIAIDAKEMYPDFTTHLELAKTHFMSNLANLIPMPDYNQSPRNMYQCQMGKQTMGTPCLNWPKQAANKLYRLQTPGTPLF
RPVHYDIIQLDDFAMGTNAIVAVISYTGYDMEDAMIINKAAYERGFAYGSIYKTKFLTLDKKSSYFARHPHMPELIKHLD
TDGLPHPGSKLSYGSPLYCYFDGEVATYKVVKMDEKEDCIVESIRQLGSFDLSPTKMVAITLRVPRPATIGDKFASRAGQ
KGICSQKYPAEDLPFTESGLIPDIVFNPHGFPSRMTIAMMIETMAGKGAAIHGNVYDATPFRFSEENTAIDYFGKMLEAG
GYNYYGTERLYSGVDGREMTADIFFGVVHYQRLRHMVFDKWQVRSTGAVEARTHQPIKGRKRGGGVRFGEMERDALISHG
AAFLLQDRLFHNSDKTHTLVCHKCGSILAPLQRIVKRNETGGLSSQPDTCRLCGDNSSVSMIEIPFSFKYLVTELSSVNI
NARFKLNEI

mouse
MDVDGRWRNLPSGPSLKHLTDPSYGIPPEQQKAALQDLTRAHVDSFNYAALEGLSHAVQAIPPFEFAFKDERISLTIVDA
VISPPSVPKGTICKDLNVYPAECRGRKSTYRGRLTADISWAVNGVPKGIIKQFLGYVPIMVKSKLCNLYNLPPRVLIEHH
EEAEEMGGYFIINGIEKVIRMLIEPRRNFPVAMVRPKWKSRGLGYTQFGVSMRCVREEHSAVNMNLHYVENGTVMLNFIY
RKELFFLPLGFALKALVSFSDYQIFQELIKGKEEDSFFRNSVSQMLRIVIEEGCHSQKQVLNYLGECFRVKLSLPDWYPN
VEAAEFLLNQGICIHLQSNTDKFYLRCLMTRKLFALARGECMDDNPDSLVNQEVLSPGQLFLMFLKEKMENWLVSIKIVL
DKRAQKANVSINNENLMKIFSMGTELTRPFEYLLATGNLRSKTGLGFLEDSGLCVVADKLNFLRYLSHFRCVHRGAAFAK
MRTTTVRRLLPESWGFLCPVHTPDGAPCGLLNHLTAVCEVVTKFGDTASIPALLCGLGVTGADTAPCRPYSDCYPVLLDG
VMVGWVDKDLAPEVADTLRRFKVLREKRIPPWMEVALIPMTGKPSLYPGLFLFTTPCRLVRPVQNLELGREELIGTMEQL
FMNVAIFEDEVFGGISTHQELFPHSLLSVIANFIPFSDHNQSPRNMYQCQMGKQTMGFPLLTYQNRSDNKLYRLQTPQSP
LVRPCMYDFYDMDNYPIGTNAIVAVISYTGYDMEDAMIVNKASWERGFAHGSVYKSEFIDLSEKFKQGEDNLVFGVKPGD
PRVMQKLDDDGLPSIGAKLEYGDPYYSYLNLNTGEGFVVYYKSKENCVVDNIKVCSNDMGSGKFKCICITVRIPRNPTIG
DKFASRHGQKGILSRLWPAEDMPFTESGMMPDILFNPHGFPSRMTIGMLIESMAGKSAALHGLCHDATPFIFSEENSALE
YFGEMLKAAGYNFYGTERLYSGISGMELEADIFIGVVYYQRLRHMVSDKFQVRTTGARDKVTNQPLGGRNVQGGIRFGEM
ERDALLAHGTSFLLHDRLFNCSDRSVAHVCVECGSLLSPLLEKPPPSWSAMRNRKYNCTVCGRSDTIDTVSVPYVFRYFV
AELAAMNIKVKLDVI

Euplotes
MKTNAKFDRKEISKIYKNIARHHIDSFDFAMSTCLNRACEHMLPFDYIVPEESASCGFKKLTLWYDSFELGQPSLGEIDY
DSHILYPSECRQRKMTYTIPLFATIFKKFDDEMVDNFKVKLGDIPTMGRKKFCNLKGLTKKELAKRGEDMLEFGGYFIVN
GNEKVIRMLIVPKRNFPIAFKRSKFLERGKDFTDYGVQMRCVRDDFTAQTITLTYLSDGSVSLRLIYQKQEFLIPIILIL
KALKNCTDRQIYERIVKGNFNQRQISDRVEAILAVGKDLNIYDSDQSKALIGSRFRIVLAGITSETSDIDAGDLFLSKHI
CIHTDSYEAKFDTLILMIDKLYASVANEVELDNLDSVAMQDVLLGGHLYLQILSEKLFDCLHINLRARLNKELKRHNFDP
MKFRDVLTNQKINCGIGLIGKRMENFLATGNLISRTNLDLMQTSGFCIIGDKLNNIRFLSHFRSIHRGQYFAEQKTTSVR
KLLPESWGFICPVHTPDGAPCGLLNHISMSCVPIGSEEKQIDIDKFRNILGELGMNSISSDLCLNYHTGYYPVIFDGIHL
GYVEKDIGESFVEGLRYLKCTQSQPDYAIPRTLEIAFIPFSGYSRNLQWPGIFLASTPARFTRPVKNLHYNCIEWISPLE
QMNLSIACTDEDITPETTHQELDPINILSIVASVGVFAEYNQSPRNMYQCQMAKQTMGTPYHNHQFRTDNKIYRLLFPHR
PIVKTRTQVDFDIEEYPSGTNAVVAVISYTGYDLEDAMIINKSSYERGFGHGVVYKSYTHDLNESNSQSTRGIKSSVRYK
FLNNVSQKDKSKIKLENIDPDGLPKIGSQLTKGKPELCIFDTLKRGAKLSKFKDSEKARIETVRVCGNDDKNPDNLSIGY
TIRYSRIPVIGDKFSSRHGQKGVLSVLWPQVDMPFTENGITPDLIINPHAFPSRMTMGMLIQSMAAKSGSLRGEFKTVET
FQRYDDNDIVGHFGKELLDKGFNYHGNELMYSGIFGTPLKADIFIGVVYYQRLRHMVSDKSQARGTGPIDILTHQPVKGR
KKGGGIRFGEMERDSLLAHGAAYCLNDRLFRSSDYSEGFVCQNCGSILSCYVNRAIMKTQTFIPPSLDESNKDTEDKEIH
MNEKVICKVCKKNSNCKKVALPFVLRFLANELASMGIKLKFTVNDF
——————–

Pol II second largest subunits

S. cerevisae
msdlansekyydedpygfedesapitaedswavisaffrekglvsqqldsfnqfvdytlqdiicedstlileqlaqhtte
sdnisrkyeisfgkiyvtkpmvnesdgvthalypqearlrnltyssglfvdvkkrtyeaidvpgrelkyeliaeesedds
esgkvfigrlpimlrskncylseatesdlyklkecpfdmggyfiingsekvliaqersagnivqvfkkaapspishvaei
rsalekgsrfistlqvklygregssartikatlpyikqdipiviifralgiipdgeilehicydvndwqmlemlkpcved
gfviqdretaldfigrrgtalgikkekriqyakdilqkeflphitqlegfesrkafflgyminrlllcaldrkdqddrdh
fgkkrldlagpllaqlfktlfkkltkdifrymqrtveeahdfnmklainaktitsglkyalatgnwgeqkkamssragvs
qvlnrytysstlshlrrtntpigrdgklakprqlhnthwglvcpaetpegqacglvknlslmscisvgtdpmpiitflse
wgmepledyvphqspdatrvfvngvwhgvhrnparlmetlrtlrrkgdinpevsmirdirekelkiftdagrvyrplfiv
eddeslghkelkvrkghiaklmateyqdieggfedveeytwssllneglveyidaeeeesiliamqpedlepaeaneend
ldvdpakrirvshhattfthceihpsmilgvaasiipfpdhnqsprntyqsamgkqamgvfltnynvrmdtmanilyypq
kplgttrameylkfrelpagqnaivaiacysgynqedsmimnqssidrglfrslffrsymdqekkygmsitetfekpqrt
ntlrmkhgtydkldddgliapgvrvsgedviigkttpispdeeelgqrtayhskrdastplrstengivdqvlvttnqdg
lkfvkvrvrttkipqigdkfasrhgqkgtigityrredmpftaegivpdliinphaipsrmtvahliecllskvaalsgn
egdaspftditvegiskllrehgyqsrgfevmynghtgkklmaqiffgptyyqrlrhmvddkiharargpmqvltrqpve
grsrdgglrfgemerdcmiahgaasflkerlmeasdafrvhicgicglmtviaklnhnqfeckgcdnkidiyqihipyaa
kllfqelmamnitprlytdrsrdf

C. elegans
myddedemvndpmdgdyiddsdeisaeawqeacwvvisayfdekglvrqqldsfdefvqmnvqrivedsppvelqsenqh
lgtdmenpakfslkfnqiylskpthwekdgapmpmmpnearlrnltyasplyvditkvvtrddsatekvydkvfvgkvpv
mlrssycmlsnmtdrdltelnecpldpggyfvingsekvliaqekmatntvyvfsmkdgkyafktecrsclenssrptst
mwvnmlargggggkktamgqriigilpyikqeipimivfralgfvsdrdilghiiydfndpemmemvkpsldeafviqeq
nvalnfigargakpgvtreqrikyareilqkellphvgvsehcetkkaffigymvhrlllaalgrrelddrdhignkrld
lagpllaflfrslfrnllkemrmtaqkyinknddfaldvcvktstitrgltyslatgnwgdqkkahqsragvsqvlnrlt
ytatlshlrranspigregklakprqlhntqwgmvcpaetpegqavglvknlalmayisvgslpepilefleewsmenle
evspsaiadatkifvngawvgihrepdqlmttlkklrrqmdiivsevsmvrdirdreiriytdagrvcrpllivenqkla
lkkrhidqlkeaadeankytwsdlvgggvvelidsmeeetsmiammpedlrsggycdththceihpamilgvcasiipfp
dhnqsprntyqsamgkqamgvyttnfhvrmdtlahvlyypqkplvttrsmeylrfnelpaginaivailsysgynqedsv
imnnsaidrglfrsvfyrsyrdneanldnaneeliekptrekcsgmrhslydkldedgiispgmrvsgddviigktvalp
didddldasgkkypkrdastflrssetgivdqvmlslnsdgnkfvkirmrsvrlpqigdkfasrhgqkgtmgimyrqedm
pftaegltpdiiinphavpsrmtighlieclqgklsankgeigdatpfndtvnvqkisgllceygyhlrgnevmynghtg
kklttqiffgptyyqrlkhmvddkihsrargpiqmmnrqpmegrardgglrfgemerdcqishgatqflrerlfevsdpy
hvyvcnncglivvanlrtnsfeckacrnktqvsavripyackllfqelmsmsiaprlmvkprqskrskhqsea

Drosophila
msvqrivedspaielqaerqhtsgevetpprfslkfeqiylskpthwekdgspspmmpnearlrnltysaplyvditktk
nvegldpvetqhqktfigkipimlrstycllsqltdrdltelnecpldpggyfiingsekvliaqekmatntvyvfsmkd
gkyafkteirsclehssrptstlwvnmmargsqnikksaigqriiailpyikqeipimivfralgfvadrdilehiiydf
ddpemmemvkpsldeafvvqeqnvalnfigargarpgvtkdkrikyakeilqkemlphvgvsdfcetkkayflgymvhrl
llaslgrrelddrdhygnkrldlagpllaflfrglfknlmkevrmytqkfidrgkdfnlelaiktniitdglryslatgn
wgdqkkahqaragvsqvlnrltfastlshlrrvnspigrdgklakprqlhntlwgmlcpaetpegaavglvknlalmayi
svgsqpspilefleewsmenleeiapsaiadatkifvngcwvgihrdpeqlmatlrklrrqmdiivsevsmirdirdrei
riytdagricrpllivengslllkkthvemlkerdyknyswqvlvasgvveymytleeetvmiamspydlkqdkdyayct
tythceihpamilgvcasiipfpdhnqsprntyqsamgkqamgvyitnfhvrmdtlahvlyypmkplvttrsmeylrfre
lpaginsivailcytgynqedsvilnasavergffrsvfyrsykdsenkrvgdqeenfekphrgtcqgmrnahydklddd
giiapgirvsgddvvigktitlpenddeldsntkrfskrdastflrnsetgivdqvmltlnsegykfckirvrsvripqi
gdkfasrhgqkgtcgiqyrqedmaftceglapdiiinphaipsrmtighlieclqgklgsnkgeigdatpfndavnvqki
stflqeygyhlrgnevmynghtgrkinaqvflgptyyqrlkhmvddkihsrargpvqilvrqpmegrardgglrfgemer
dcqishgaaqflrerlfevsdpyrvhicnfcgliaianlrnntfeckgcknktqisqvrlpyaakllfqelmsmniaprl
mvt

Human
mydadedmqydedddeitpdlwqeacwivissyfdekglvrqqldsfdefiqmsvqrivedappidlqaeaqhasgevee
ppryllkfeqiylskpthwerdgapspmmpnearlrnltysaplyvditktvikegeeqlqtqhqktfigkipimlrsty
cllngltdrdlcelnecpldpggyfiingsekvliaqekmatntvyvfakkdskyaytgecrsclenssrptstiwvsml
arggqgakksaigqrivatlpyikqevpiiivfralgfvsdrdilehiiydfedpemmemvkpsldeafviqeqnvalnf
igsrgakpgvtkekrikyakevlqkemlphvgvsdfcetkkayflgymvhrlllaalgrrelddrdhygnkrldlagpll
aflfrgmfknllkevriyaqkfidrgkdfnlelaiktriisdglkyslatgnwgdqkkahqaragvsqvlnrltfastls
hlrrlnspigrdgklakprqlhntlwgmvcpaetpeghavglvknlalmayisvgsqpspilefleewsmenleeispaa
iadatkifvngcwvgihkdpeqlmntlrklrrqmdiivsevsmirdirereiriytdagricrpllivekqklllkkrhi
dqlkereynnyswqdlvasgvveyidtleeetvmlamtpddlqekevaycstythceihpsmilgvcasiipfpdhnqsp
rntyqsamgkqamgvyitnfhvrmdtlahvlyypqkplvttrsmeylrfrelpaginsivaiasytgynqedsvimnrsa
vdrgffrsvfyrsykeqeskkgfdqeevfekptretcqgmrhaiydkldddgliapgvrvsgddviigktvtlpenedel
estnrrytkrdcstflrtsetgivdqvmvtlnqegykfckirvrsvripqigdkfasrhgqkgtcgiqyrqedmpftceg
itpdiiinphaipsrmtighlieclqgkvsankgeigdatpfndavnvqkisnllsdygyhlrgnevlyngftgrkitsq
ifigptyyqrlkhmvddkihsrargpiqilnrqpmegrsrdgglrfgemerdcqiahgaaqflrerlfeasdpyqvhvcn
lcgimaiantrthtyecrgcrnktqislvrmpyackllfqelmsmsiaprmmsv
Peperomia (Plant)
wgmmcpaetpegqacglvknlalmvyitvgsaanpilefleewstenfeeispavipqatkifvngcwvgihrnpdllvk
tlrqlrrqidvntevgvirdirlkelrlytdygrcsrplfivenqkllikkrdiqalqqretqeegwhflvskgfieyvd
teeeettmismtindlvqarrskdaysttythceihpslilgvcasiipfpdhnqsprntyqsamgkqamgiyvtnyqlr
mdtlayvlyypqkplvttramehlhfrqlpaginaivaiacysgynqedsvimnqssidrgffrslffrsyrdeekkmgt
lvkedfgrpnrentmgmrhgsydkldddglappgtrvsgedviigktspiaqdesqgqasrynrrdhstslrhsesgmvd
qvllttnadglrfvkvrmrsvripqigdkfssrhgqkgtvgmtytqedmpwtaegitpdiivnqhaipsrmtigqlieci
mgkvaahmgkegdatpftdvtvdniskalhkcgyqmrgfetmynghtgrrlsamiflgptyyqrlkhmvddkih

Arabidopsis
Columbia; BAC clone F17L22.
Essentially identical to Larkin and Guilfoyle sequence for pol II 2nd
largest subunit
MEYNEYEPEPQYVEDDDDEEITQEDAWAVISAYFEEKGLVRQQLDSFDEFIQNTMQEIVDESADIEIRPESQHNPGHQSD
FAETIYKISFGQIYLSKPMMTESDGETATLFPKAARLRNLTYSAPLYVDVTKRVIKKGHDGEEVTETQDFTKVFIGKVPI
MLRSSYCTLFQNSEKDLTELGECPYDQGGYFIINGSEKVLIAQEKMSTNHVYVFKKRQPNKYAYVGEVRSMAENQNRPPS
TMFVRMLARASAKGGSSGQYIRCTLPYIRTEIPIIIVFRALGFVADKDILEHICYDFADTQMMELLRPSLEEAFVIQNQL
VALDYIGKRGATVGVTKEKRIKYARDILQKEMLPHVGIGEHCETKKAYYFGYIIHRLLLCALGRRPEDDRDHYGNKRLDL
AGPLLGGLFRMLFRKLTRDVRSYVQKCVDNGKEVNLQFAIKAKTITSGLKYSLATGNWGQANAAGTRAGVSQVLNRLTYA
STLSHLRRLNSPIGREGKLAKPRQLHNSQWGMMCPAETPEGQACGLVKNLALMVYITVGSAAYPILEFLEEWGTENFEEI
SPSVIPQATKIFVNGMWVGVHRDPDMLVKTLRRLRRRVDVNTEVGVVRDIRLKELRIYTDYGRCSRPLFIVDNQKLLIKK
RDIYALQQRESAEEDGWHHLVAKGFIEYIDTEEEETTMISMTISDLVQARLRPEEAYTENYTHCEIHPSLILGVCASIIP
FPDHNQSPRNTYQSAMGKQAMGIYVTNYQFRMDTLAYVLYYPQKPLVTTRAMEHLHFRQLPAGINAIVAISCYSGYNQED
SVIMNQSSIDRGFFRSLFFRSYRDEEKKMGTLVKEDFGRPDRGSTMGMRHGSYDKLDDDGLAPPGTRVSGEDVIIGKTTP
ISQDEAQGQSSRYTRRDHSISLRHSETGMVDQVLLTTNADGLRFVKVRVRSVRIPQIGDKFSSRHGQKGTVGMTYTQEDM
PWTIEGVTPDIIVNPHAIPSRMTIGQLIECIMGKVAAHMGKEGDATPFTDVTVDNISKALHKCGYQMRGFERMYNGHTGR
PLTAMIFLGPTYYQRLKHMVDDKIHSRGRGPVQILTRQPAEGRSRDGGLRFGEMERDCMIAHGAAHFLKERLFDQSDAYR
VHVCEVCGLIAIANLKKNSFECRGCKNKTDIVQVYIPYACKLLFQELMSMAIAPRMLTKHLKSAKGRQ
—————–

RNA polymerase III

Yeast (cerevisae)
mvaatkrrkthihkhvkdeafddllkpvykgkkltdeintaqdkwhllpaflkvkglvkqhldsfnyfvdtdlkkiikan
qlilsdvdpefylkyvdirvgkksssstkdyltpphecrlrdmtysapiyvdieytrgrniimhkdveigrmpimlrsnk
cilydadeskmaklnecpldpggyfivngtekvilvqeqlsknriiveadekkgivqasvtsstherksktyvitkngki
ylkhnsiaeeipiaivlkacgilsdleimqlvcgndssyqdifavnleesskldiytqqqaleyigakvktmrrqkltil
qegieaiattviahltvealdfrekalyiammtrrvvmamynpkmiddrdyvgnkrlelagqlisllfedlfkkfnndfk
lsidkvlkkpnrameydallsinvhsnnitsglnraistgnwslkrfkmeragvthvlsrlsyisalgmmtrissqfeks
rkvsgpralqpsqfgmlctadtpegeacglvknlalmthittddeeepikklcyvlgveditlidsaslhlnygvylngt
ligsirfptkfvtqfrhlrrtgkvsefisiysnshqmavhiatdggricrpliivsdgqsrvkdihlrklldgeldfddf
lklglveyldvneendsyialyekdivpsmthleiepftilgavaglipyphhnqsprntyqcamgkqaigaiaynqfkr
idtllylmtypqqpmvktktielidydklpagqnatvavmsysgydiedalvlnkssidrgfgrcetrrktttvlkryan
htqdiiggmrvdengdpiwqhqslgpdglgevgmkvqsgqiyinksvptnsadapnpnnvnvqtqyreapviyrgpepsh
idqvmmsvsdndqalikvllrqnrrpelgdkfssrhgqkgvcgiivkqedmpfndqgivpdiimnphgfpsrmtvgkmie
lisgkagvlngtleygtcfggskledmskilvdqgfnysgkdmlysgitgeclqayiffgpiyyqklkhmvldkmharar
gpravltrqptegrsrdgglrlgemerdcviaygasqlllerlmissdafevdvcdkcglmgysgwcttcksaeniikmt
ipyaakllfqellsmniaprlrledifqq
S. pombe
mgvntagdpqksqpkinkggigkdesfgalfkpvykgkkladpvptiedkwqllpaflkvkglvkqhldsynyfvdvdlk
kivqanekvtsdvepwfylkyldirvgapvrtdadaiqasisphecrlrdltyganiyvdieytrgkqvvrrrnvpigrm
pvmlrsnkcvlsgknememaalnecpldpggyfivkgtekvilvqeqlsknriiveaepkkglwqasvtsstherkskty
vitkngklylkhnsvaddipivvvlkamglqsdqeifelvagaeasyqdlfapsieecaklniytaqqaleyigarvkvn
rraganrlppheealevlaavvlahinvfnlefrpkavyigimarrvlmamvdplqvddrdyvgnkrlelagqllallfe
dlfkkfnsdlklnidkvlkkphrtqefdaynqltvhsdhitqgmvralstgnwslkrfkmeragvthvlsrlsyisalgm
mtritsqfektrkvsgprslqasqfgmlctsdtpegeacglvknlalmthittdeeeepiiklayafgiedihvisgrel
hshgtylvylngailgisrypslfvasfrklrrsgkispfigifinthqravfistdggricrpliivqnglpkveskhi
rllkegkwgfedflkqglveyvdvneendslisvyerditpdtthleiepftilgavaglipyphhnqsprntyqcamgk
qaigaiaynqlqridtllylmvypqqpmvktktieligydklpagqnatvaimsysgydiedalvlnkssidrgfgrcqv
fhkhsvivrkypngthdrigdpqrdpetgevvwkhgvveddglagvgcrvqpgqiyvnkqtptnaldnsitlghtqtves
gykatpmtykapepgyidkvmltttdsdqtlikvlmrqtrrpelgdkfssrhgqkgvcgvivqqedmpfndqgicpdiim
nphgfpsrmtvgkmiellsgkvgvlrgtleygtcfggtkvedasrilvehgynysgkdmltsgitgetleayifmgpiyy
qklkhmvmdkmharargpravltrqptegrsrdgglrlgemerdcliaygasqlllerlmissdacdvdvcgqcgllgyk
gwcnscqstrevvkmtipyaakllfqellsmnivprlaledefky

Drosophila
mvelkmgdhnveattwdpgdskdwsvpikpltekwklvpaflqvkglvkqhidsfnhfinvdikkivkanelvtsgadpl
fylkyldvrvgkpdiddgfnitkattphecrlrdttysapitvdieytrgtqrikrnnlligrmplmlrsncaltgksef
elsklnecpldpggyfvvrgqekviliqeqlswnkmltedfngvvqcqvtssthekksrtlvlskhgkyylkhnsmtddi
pivvifkalgvvsdqeiqsligidsksqnrfgaslidaynlkvftqqraleymgsklvvkrfqsattktpseearelllt
tilahvpvdnfnlqmkaiyvsmmvrrvmaaeldktlfddrdyygnkrlelagsllsmmfedlfkrmnwelktiadknipk
vkaaqfdvvkhmraaqitaglesaissgnwtikrfkmeragvtqvlsrlsyisalgmmtrvnsqfektrkvsgprslqps
qwgmlcpsytpegeacglvknlalmthitteveerpvmivafnagvedirevsgnpinnpnvflvfingnvlgltlnhkh
lvrnlrymrrkgrmgsyvsvhtsytqrciyihtdggrlcrpyvivenrrplvkqhhldelnrgirkfddflldglieyld
vneendsfiawnedqiedrtthleietftllgvcaglvpyphhnqsprntyqcamgkqamgmigynhnnridslmynlvy
phapmvksktieltnfdklpagqnatvavmsysgydiedalilnkasidrgygrclvyknskctvkryanqtfdrimgpm
kdaltnkvifkhdvldtdgivapgeqvqnkqiminkempavtsmnplqgqsaqvpytavpisykgpepsyiervmvsana
eedflikillrqtriprgdkfssrhgqkgvtgliveqedmpfndfgicpdmimnphgfpsrmtvgktlellggkaglleg
kfhygtafggskvediqaelerhgfnyvgkdffysgitgtpleayiysgpvyyqklkhmvqdkmharargpkavltrqpt
qgrsregglrlgemerdclisygasmlimerlmissdafevdvcrtcgrmaycswchfcqssanvskismpyackllfqe
ltsmnvvpkmileny

A. thaliana (chromosome 5)
DEFINITION DNA-directed RNA polymerase subunit [Arabidopsis thaliana].
ACCESSION BAB11387
mliiflhgfqitdsliaklramgldqedldltnddhfidkeklsapikstadkfqlvpeflkvrglvkqhldsfnyfinv
gihkivkansritstvdpsiylrfkkvrvgepsiinvntveninphmcrladmtyaapifvnieyvhgshgnkaksakdn
viigrmpimlrscrcvlhgkdeeelarlgecpldpggyfiikgtekvlliqeqlsknriiidsdkkgninasvtsstemt
ksktviqmekekiylflhrfvkkipiiivlkamgmesdqeivqmvgrdprfsasllpsieecvsegvntqkqaldyleak
vkkisygtppekdgralsilrdlflahvpvpdnnfrqkcfyvgvmlrrmieamlnkdamddkdyvgnkrlelsgqlisll
fedlfktmlseaiknvdhilnkpirasrfdfsqclnkdsrysislglertlstgnfdikrfrmhrkgmtqvltrlsfigs
mgfitkispqfeksrkvsgprslqpsqwgmlcpcdtpegescglvknlalmthvttdeeegplvamcyklgvtdlevlsa
eelhtpdsflvilnglilgkhsrpqyfanslrrlrragkigefvsvftnekqhcvyvasdvgrvcrplviadkgisrvkq
hhmkelqdgvrtfddfirdglieyldvneennalvclraeaakadtthieiepftilgvvaglipyphhnqsprntyqca
mgkqamgniaynqlnrmdtllyllvypqrpllttrtielvgydklgagqnatvavmsfsgydiedaivmnkssldrgfgr
civmkkivamsqkydnctadrilipqrtgpdaekmqildddglatpgeiirpndiyinkqvpvdtvtkftsalsdsqyrp
areyfkgpegetqvvdrvalcsdkkgqlcikyiirhtrrpelgdkfssrhgqkgvcgiiiqqedfpfselgicpdlimnp
hgfpsrmtvgkmiellgskagvscgrfhygsafgersghadkvetisatlvekgfsysgkdllysgisgepveayifmgp
iyyqklkhmvldkmhargsgprvmmtrqptegkskngglrvgemerdcliaygasmliyerlmissdpfevqvcracgll
gyynyklkkavcttckngdniatmklpyackllfqvktiglffklklstsshlendkiilisgykflpkisknh

———————

Archae second subunit (only one polymerase, though multi-subunit similar to
eukaryotes)

Sulfolobus
DEFINITION DNA-DIRECTED RNA POLYMERASE SUBUNIT B.
ACCESSION P11513
PID g133422
mldtesrwaiaesffktrglvrqhldsfndflrnklqqviyeqgeivtevpglkiklgkiryekpsiretdkgpmreitp
mearlrnltysspiflsmipvenniegepieiyigdlpimlksvadptsnlpidklieigedpkdpggyfivngsekmii
aqedlatnrvlvdygksgsnithvakvtssaagyrvqvmierlkdstiqisfatvpgripfaiimralgfvtdrdivyav
sldpqiqnellpsleqassitsaeealdfignrvaigqkrenriqkaeqvidkyflphlgtspedrkkkgyylasavnki
lelylgrrepddkdhyankrvrlagdlftslfrvafkafvkdlvyqlekskvrgrrlsltalvradiiterirhalatgn
wvggrtgvsqlldrtnwlsmlshlrrvvsslargqpnfeardlhgtqwgrmcpfetpegpnsglvknlallaqvsvgine
svvervayelgvvsvedvirriseqnedvekymswskvylngrllgyyedgkelakkiresrrqgklsdevnvayiatdy
lnevhincdagrvrrpliivnngtplvdtedikklkngeitfddlvkqgkiefidaeeeenayvalnpqdltpdhthlei
wpsailgiiasiipypehnqsprntyqsamakqslglyasnyqirtdtrahllhypqmplvqtrmlgvigyndrpagana
ilaimsytgynmedsiimnkssiergmyrstffrlysteevkypggqedkivtpeagvkgykgkdyyrlledngvvspev
evkggdvligkvspprflqefkelspeqakrdtsivtrhgengivdlvlitetlegnklvkvrvrdlripeigdkfatrh
gqkgvvgilidqvdmpytakgivpdiilnphalpsrmtigqimeaiggkyaalsgkpvdatpfletpklqemqkeilklg
hlpdstevvydgrtgqklksrilfgivyyqklhhmvadkmharargpvqiltrqptegraregglrfgemerdcligfgt
amlikdrlldnsykavvyicdqcgyvgwydrsknryvcpvhgdksvlhpvtvsyafklliqelmsmvisprlilgekvnl
ggasne

Well, this was helpful. Sequences and useful notes about them. So I played around with the sequences, searched for some other homologs, built a few alignments, built some masks to filter out poorly aligned regions, and then fed the data into PAUP and built a tree. (I note – I know about this because amazingly I still have all the files.)
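For anyone wondering what a "mask" means here: it is simply a per-column keep/drop decision applied to a multiple alignment before tree building. The Python below is a minimal sketch of that idea, for illustration only. The gap-fraction rule, the 0.5 threshold, the function names `build_mask` and `apply_mask`, and the toy alignment are all assumptions made for this example; they are not the masks or sequences actually used for the tree.

```python
from typing import Dict, List


def build_mask(alignment: Dict[str, str], max_gap_fraction: float = 0.5) -> List[bool]:
    """Return one boolean per alignment column; True means keep the column.

    Assumption for this sketch: a column counts as poorly aligned when more
    than max_gap_fraction of the sequences have a gap ('-') in it.
    """
    rows = list(alignment.values())
    if not rows:
        return []
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned sequences must have the same length")
    mask = []
    for col in range(length):
        gaps = sum(1 for row in rows if row[col] == "-")
        mask.append(gaps / len(rows) <= max_gap_fraction)
    return mask


def apply_mask(alignment: Dict[str, str], mask: List[bool]) -> Dict[str, str]:
    """Drop every column whose mask entry is False from each sequence."""
    return {
        name: "".join(char for char, keep in zip(seq, mask) if keep)
        for name, seq in alignment.items()
    }


# Toy, hand-made alignment (hypothetical; not a real alignment of the subunits).
toy = {
    "mystery": "MDV-EIESAG",
    "polII":   "MEYNEY-PEP",
    "polIII":  "M--AAT-RRK",
    "archaea": "ML-DTE-RWA",
}

mask = build_mask(toy, max_gap_fraction=0.5)
masked = apply_mask(toy, mask)

for name, seq in masked.items():
    print(f"{name:8s} {seq}")
```

The masked alignment, rather than the raw one, is what would then be handed to the tree-building program, so that columns dominated by gaps do not contribute to branch lengths.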

And I wrote back to Mike Bevan and Craig on Sept 8:

Mike and Craig

Attached is a phylogenetic tree of RNA polymerase subunits (Craig suggested I look at these because of an unusual protein in the A. thaliana genome). A. thaliana has representatives in five different subfamilies – Pol-I, Pol-II, Pol-III and RpoB (for the chloroplast) as would be expected and then this novel Pol which I have called Pol-IV.

I do not know much about RNA polymerase, but it seems like this is a pretty big deal and I think should be emphasized in the paper. What do you think? I could try to make a pretty tree figure to show the different families.

Jonathan

I got an email back:

Dear Jonathan (and Mike),

Many thanks for the detailed phylogenetic tree of the mystery pol subunit.
I think a figure is the only way to show clearly that this protein defines
a new clade. Is there room for such a figure, Mike?

In the lab we have also been calling it a putative pol IV subunit just for
the shock value of saying the words (a radical idea in the transcription
field), though in the absence of knowing what other subunits associate with
it, I’m not sure what to call it in the annotation or figure. Maybe
“oddpol” or “atypical polymerase 2nd subunit”. It takes more than a dozen
subunits to make a eukaryotic polymerase, so it is not clear that one
unusual subunit is enough to confer new properties-i.e. a true pol IV.
Obviously, that will require quite a bit of work.

Cheers,
Craig

Me to Craig on 9/11/00:

Yes

I agree that it is too early to call it a true polIV, and I was doing it for the shock value too

Jonathan

PS. Do you mind if I present this at the TIGR GSAC meeting later this week

Jonathan

Craig to me 9/11:

Hi Jonathan,

Feel free to show the data. In thinking more about this, it is worth also
making a phylogenetic tree for the largest pol subunit (the equivalent of
eubacterial B’) just to see if there might be a fourth class out there for
the largest subunit, too. If there is, pol IV may not be such a wild idea.

In case you are interested in giving this a try, I’m including some
sequences below. In the meantime, is there a good web site for performing
the types of extensive phylogenetic trees you’ve done for the mystery
subunit? I should do this for many of the general transcription factors
just to be sure they really group with the correct homologs, as you
suggested.

Anyway, here are some largest subunit sequences for pol I, II and III.
Vive la difference!

Craig

Pol I:
rat
mlaskhtpwrrlqgisfgmysaeelkklsvksitnpryvdslgnpsadglydlalgpadskevcstcvqdfnncsghlgh
idlpltvynpllfdklylllrgsclnchmltcpraaihllvcqlkvldvgalqavyelerilsrfleetsdpsafeiqee
leeytskilqnnllgsqgahvknvcesrsklvahfwkthmaakrcphcktgrsvvrkehnskltitypamvhkksgqkda
elpegapaapgideaqmgkrgyltpssaqehlfaiwknegfflnylfsglddigpessfnpsmffldfivvppsryrpin
rlgdqmftngqtvnlqavmkdavlirkllavmaqeqklpcemteitidkendssgaidrsflsllpgqsltdklyniwir
lqshvnivfdsdmdklmlekypgirqilekkeglfrkhmmgkrvdyaarsvicpdmyintneigipmvfatkltypqpvt
pwnvqelrqavingpnvhpgasmvinedgsrtalsavdatqreavakqlltpstgipkpqgakvvcrhvkngdilllnrq
ptlhrpsiqahrahilpeekvlrlhyanckaynadfdgdemnahfpqselgraeayvlactdqqylvpkdgqplagliqd
hmvsganmtirgcfftreqymelvyrgltdkvgrvklfppailkpfplwtgkqvvstlliniipedytplnltgkakigs
kawvkekprpvpdfdpdsmcesqviiregellcgvldkahygssayglvhccyeiyggetsgrvltclarlftaylqlyr
gftlgvedilvkpnadvmrqriieestqcgpravraalnlpeaascdeiqgkwqdaiwrkdqrdfnmidmkfkeevnhys
neinkacmpfglhrqfpennlqmmvqsgakgstvntmqiscllgqielegrrpplmasgkslpcfepyeftpraggfvtg
rfltgirppefffhcmagreglvdtavktsrsgylqrciikhleglviqydltvrdsdgsvvqflygedgldipktqflq
pkqfpflasnyevimkskhlhevlsradpqkvlrhfraikkwhhrhssallrkgaflsfsqkiqaavkalnlegktqngr
spetqqmlqmwheldeqsrrkyqkraapcpdpslsvwrpdihfasvsetfekkiddysqewaaqaekshnrselsldrlr
tllqlkwqrslcdpgeavgllaaqsigepstqmtlntfhfagrgemnvtlgiprlreilmvasaniktpmmsvpvfntkk
alrrvkslkkqltrvclgevlqkvdiqesfcmgekqnkfrvyelrfqflphayyqqekclrpedilhfmetrffkllmea
ikkknskasafrsvntrratqkdlddtedsgrnrreeerdeeeegnivdaeaeegdadasdtkrkekqeeevdyeseeeg
eeeeeedvqeeenikgegahqthepdeeegsgleeessqnppcrhsrpqgaeamerriqavreshsfiedyqydteeslw
cqvtvklplmkinfdmsslvvslahnaivyttkgitrcllnetinsknekefvlnteginlpelfkysevldlrrlysnd
ihavantygieaalrviekeikdvfavygiavdprhlslvadymcfegvykplnrfgiqssssplqqmtfetsfqflkqa
tmmgshdelkspsaclvvgkvvkggtglfelkqplr

Drosophila
mgskramdvhmfpsdlefavftdqeirklsvvkvitgitfdalghaipgglydirmgsygrcmdpcgtclklqdcpghmg
hielgtpvynpffikfvqrllcifclhcyklqmkdheceiimlqlrlidagyiieaqelelfkseivcqntenlvaikng
dmvhphiaamykllekneknssnstktscslrtaithsalqrlgkkcrhcnksmrfvrymhrrlvfyvtladikervgtg
aetggqnkvifadecrrylrqiyanypellkllvpvlglsntdltqgdrspvdlffmdtlpvtpprarplnmvgdmlkgn
pqtdiyiniiennhvlnvvlkymkggqeklteeakaayqtlkgetaheklytawlalqmsvdvlldvnmsremksgeglk
qiiekkcglirshmmgkrvnyaartvitpaypninvdeigipdifakklsypvpvtewnvtdvrkmvmngpdvhpganyi
qdkngfttyipadnaskreslaklllsnpkdgikivhrhvlngdvlllnrqpslhkpsimghkarilhgektfrlhysnc
kaynadfdgdemnahypqsevaraeaynlvnvasnylvpkdgtplggliqdhvisgvklsirgrffnredyqqlvfqgls
qlkkdikllpptilkpavlwsgkqilstiiiniipegyerinldsfakiagknwnvsrprppicgtnpegndlsesqvqi
rngellvgvldkqqygattyglihcmyelyggdvstllvtaftkvftfflqlegftlgvkdilvtdvadrkrrkiirecr
nvgnsavaaaleledepphdelvekmeaayvkdskfrvlldrkykslldgytndinstclprglitkfpsnnlqlmvlsg
akgsmvntmqiscllgqielegkrpplmisgkslpsftsfetspksggfidgrfmtgiqpqdfffhcmagreglidtavk
tsrsgylqrclikhleglsvhydltvrdsdnsvvqflygedgldilkskffndkfcadfltqnatailrpaqlqlmkdee
qlvkvqrhekhirswekkkpaklraafthfseelreevevkrpnevnsktgrrrfdegllklwkkadaedkalyrkkyar
cpdptvavykqdlyygsvsertrklitdyakrkpalketiadimrvktikslaapgepvgliaaqsigepstqmtlntfh
fagrgemnvtlgiprlreilmlassniktpsmdipikpgqqhqaeklrinlnsvtlanlleyvhvstgltldpersyeyd
mrfqflprevykedygvrpkhiikymhqtffkqlipppilkvsnasrttkivviddkkdadkdddndldngdevgrskak
andddssddnddddatgvklkqrktdekdyddpddveelhdanddddeaededdeekgqdgndndgddkaverllsndmv
kaytydkenhlwcqvklnlsvryqkpdltsiirelagksvvhqvqhikraiiykgndddqllktdginigemfqhnkild
lnrlysndihaiartygieaasqvivkevsnvfkvygitvdrrhlsliadymtfdgtfqplsrkgmehsssplqqmsfes
slqflksaagfgradelsspssrlmvglpvrngtgafelltkic

yeast (S.c.)
mdiskpvgseitsvdfgiltakeirnlsakqitnptvldnlghpvsgglydlalgaflrnlcstcgldekfcpghqghie
lpvpcynplffnqlyiylrasclfchhfrlksvevhryacklrllqyglidesykldeitlgslnssmytddeaiedted
emdgegskqskdisstllnelkskrseyvdmaiakalsdgrttergsftatvnderkklvhefhkkllsrgkcdncgmfs
pkfrkdgftkifetalnekqitnnrvkgfirqdmikkqkqakkldgsneasandeesfdvgrnpttrpktgstyilstev
knildtvfrkeqcvlqyvfhsrpnlsrklvkadsffmdvlvvpptrfrlpsklgeevhensqnqllskvlttsllirdln
ddlsklqkdkvsledrrvifsrlmnafvtiqndvnafidstkaqgrtsgkvpipgvkqalekkeglfrkhmmgkrvnyaa
rsvispdpnietneigvppvfavkltypepvtayniaelrqavingpdkwpgatqiqnedgslvsligmsveqrkalanq
lltpssnvsthtlnkkvyrhiknrdvvlmnrqptlhkasmmghkvrvlpnektlrlhyantgaynadfdgdemnmhfpqn
enaraealnlantdsqyltptsgspvrgliqdhisagvwltskdsfftreqyqqyiygcirpedghttrskivtlpptif
kpyplwtgkqiittvllnvtppdmpginlisknkikneywgkgslenevlfkdgallcgildksqygaskygivhslhev
ygpevaakvlsvlgrlftnyitataftcgmddlrltaegnkwrtdilktsvdtgreaaaevtnldkdtpaddpellkrlq
eilrdnnksgildavtsskvnaitsqvvskcvpdgtmkkfpcnsmqamalsgakgsnvnvsqimcllgqqalegrrvpvm
vsgktlpsfkpyetdamaggyvkgrfysgikpqeyyfhcmagreglidtavktsrsgylqrcltkqlegvhvsydnsird
adgtlvqfmyggdaiditkeshmtqfefcldnyyallkkynpsaliehldvesalkyskktlkyrkkhskephykqsvky
dpvlakynpakylgsvsenfqdklesfldknsklfkssdgvnekkfralmqlkymrslinpgeavgiiasqsvgepstqm
tlntfhfaghgaanvtlgiprlreivmtasaaiktpqmtlpiwndvsdeqadtfcksiskvllsevidkvivtettgtsn
taggnaarsyvihmrffdnneyseeydvskeelqnvisnqfihlleaaivkeikkqkrttgpdigvavprlqtdvansss
nskrleedndeeqshkktkqavsydepdedeietmreaekssdeegidsdkesdsdsededvdmneqinksiveannnmn
kvqrdrqsaiishhrfitkynfddesgkwcefklelaadtekllmvniveeicrksiirqiphidrcvhpepengkrvlv
tegvnfqamwdqeafidvdgitsndvaavlktygveaarntivneinnvfsryaisvsfrhldliadmmtrqgtylafnr
qgmetstssfmkmsyettcqfltkavldnereqldspsarivvgklnnvgtgsfdvlakvpnaa

Arabidopsis
MAHAQTTEVCLSFHRSLLFPMGASQVVESVRFSFMTEQDVRKHSFLKVTSPILHDNVGNPFPGGLYDLKLGPKDDKQACN
SCGQLKLACPGHCGHIELVFPIYHPLLFNLLFNFLQRACFFCHHFMAKPEDVERAVSQLKLIIKGDIVSAKQLESNTPTK
SKSSDESCESVVTTDSSEECEDSDVEDQRWTSLQFAEVTAVLKNFMRLSSKSCSRCKGINPKLEKPMFGWVRMRAMKDSD
VGANVIRGLKLKKSTSSVENPDGFDDSGIDALSEVEDGDKETREKSTEVAAEFEEHNSKRDLLPSEVRNILKHLWQNEHE
FCSFIGDLWQSGSEKIDYSMFFLESVLVPPTKFRPPTTGGDSVMEHPQTVGLNKVIESNNILGNACTNKLDQSKVIFRWR
NLQESVNVLFDSKTATVQSQRDSSGICQLLEKKEGLFRQKMMGKRVNHACRSVISPDPYIAVNDIGIPPCFALKLTYPER
VTPWNVEKLREAIINGPDIHPGATHYSDKSSTMKLPSTEKARRAIARKLLSSRGATTELGKTCDINFEGKTVHRHMRDGD
IVLVNRQPTLHKPSLMAHKVRVLKGEKTLRLHYANCSTYNADFDGDEMNVHFPQDEISRAEAYNIVNANNQYARPSNGEP
LRALIQDHIVSSVLLTKRDTFLDKDHFNQLLFSSGVTDMVLSTFSGRSGKKVMVSASDAELLTVTPAILKPVPLWTGKQV
ITAVLNQITKGHPPFTVEKATKLPVDFFKCRSREVKPNSGDLTKKKEIDESWKQNLNEDKLHIRKNEFVCGVIDKAQFAD
YGLVHTVHELYGSNAAGNLLSVFSRLFTVFLQTHGFTCGVDDLIILKDMDEERTKQLQECENVGERVLRKTFGIDVDVQI
DPQDMRSRIERILYEDGESALASLDRSIVNYLNQCSSKGVMNDLLSDGLLKTPGRNCISLMTISGAKGSKVNFQQISSHL
GQQDLEGKRVPRMVSGKTLPCFHPWDWSPRAGGFISDRFLSGLRPQEYYFHCMAGREGLVDTAVKTSRSGYLQRCLMKNL
ESLKVNYDCTVRDADGSIIQFQYGEDGVDVHRSSFIEKFKELTINQDMVLQKCSEDMLSGASSYISDLPISLKKGAEKFV
EAMPMNERIASKFVRQEELLKLVKSKFFASLAQPGEPVGVLAAQSVGEPSTQMTLNTFHLAGRGEMNVTLGIPRLQEILM
TAAANIKTPIMTCPLLKGKTKEDANDITDRLRKITVADIIKSMELSVVPYTVYENEVCSIHKLKINLYKPEHYPKHTDIT
EEDWEETMRAVFLRKLEDAIETHMKMLHRIRGIHNDVTGPIAGNETDNDDSVSGKQNEDDGDDDGEGTEVDDLGSDAQKQ
KKQETDEMDYEENSEDETNEPSSISGVEDPEMDSENEDTEVSKEDTPEPQEESMEPQKEVKGVKNVKEQSKKKRRKFVRA
KSDRHIFVKGEGEKFEVHFKFATDDPHILLAQIAQQTAQKVYIQNSGKIERCTVANCGDPQVIYHGDNPKERREISNDEK
KASPALHASGVDFPALWEFQDKLDVRYLYSNSIHDMLNIFGVEAARETIIREINHVFKSYGISVSIRHLNLIADYMTFSG
GYRPMSRMGGIAESTSPFCRMTFETATKFIVQAATYGEKDTLETPSARICLGLPALSGTGCFDLMQRVEL

Pol II:
Arabidopsis
mdtrfpfspaevskvrvvqfgilspdeirqmsvihvehsettekgkpkvgglsdtrlgtidrkvkcetcmanmaecpghf
gylelakpmyhvgfmktvlsimrcvcfncskiladeamkiknpknrlkkildacknktkcdggddiddvqshstdepvkk
srggcgaqqpkltiegmkmiaeyknskeendepdqlpepaerkqtlgadrvlsvlkrisdadcqllgfnpkfarpdwmil
evlpippppvrpsvmmdatsrseddlthqlamiirhnenlkrqekngaprhiisrftqllqfhiatyfdnelpgqpratq
ksgrpiksicsrlkakegrirgnlmgkrvdfsartvitpdptinidelgvpwsialnltypetvtpynierlkelvdygp
hpppgktgakyiirddgqrldlrylkkssdqhlelgyryvllsysihsthkrlflevvifmlswsqverhlqdgdfvlfn
rqpslhkmsimghririmpystfrlnlsvtspynadfdgdemnmhvpqsfetraevlelmmvpkcivspqanrpvmgivq
dtllgcrkitkrdtfiekdvfmntlmwwedfdgkvpapailkprplwtgkqvfnliipkqinllrysawhadtetgfitp
gdtqvriergellagtlckktlgtsngslvhviweevgpdaarkflghtqwlvnywllqngftigigdtiadsstmekin
etisnaktavkdlirqfqgkeldpepgrtmrdtfenrvnqvlnkarddagssaqkslaetnnlkamvtagskgsfinisq
mtacvgqqnvegkripfgfdgrtlphftkddygpesrgfvensylrgltpqefffhamggreglidtavktsetgyiqrr
lvkamedimvkydgtvrnslgdviqflygedgmdavwiesqkldslkmkksefdrtfkyeiddenwnptylsdehledlk
girelrdvfdaeyskletdrfqlgteiatngdstwplpvnikrhiwnaqktfkidlrkisdmhpveivdavdklqerllv
vpgddalsveaqknatlffnillrstlaskrvleeyklsreafewvigeiesrflqslvapgemigcvpaqsigepatqm
tlntfhyagvsaknvtlgvprlreiinvakriktpslsvyltpeaskskegaktvqcaleyttlrsvtqatevwydpdpm
stiieedfefvrsyyempdedvspdkispwllrielnremmvdkklsmadiaekinlefdddltcifnddnaqklilrir
imndegpkgelqdesaeddvflkkiesnmltemalrgipdinkvfikqvrksrfdeeggfktseewmldtegvnllavmc
hedvdpkrttsnhlieiievlgieavrralldelrvvisfdgsyvnyrhlailcdtmtyrghlmaitrhginrndtgplm
rcsfeetvdilldaaayaetdclrgvtenimlgqlapigtgdcelylndemlknaielqlpsymdglefgmtparspvsg
tpyhegmmspnyllspnmrlspmsdaqfspyvggmafspssspgyspsspgysptspgysptspgysptspgysptspty
spsspgysptspaysptspsysptspsysptspsysptspsysptspsysptspsysptspaysptspaysptspayspt
spsysptspsysptspsysptspsysptspsysptspaysptspgysptspsysptspsygptspsynpqsakyspsiay
spsnarlspaspysptspnysptspsysptspsyspssptyspsspyssgaspdyspsagysptlpgyspsstgqytphe
gdkkdktgkkdaskddkgnp

Drosophila
mstptdskaplrqvkrvqfgilspdeirrmsvteggvqfaetmeggrpklgglmdprqgvidrtsrcqtcagnmtecpgh
fghidlakpvfhigfitktikilrcvcfycskmlvsphnpkikeivmksrgqprkrlayvydlckgkticeggedmdltk
enqqpdpnkkpghggcghyqpsirrtgldltaewkhqnedsqekkivvsaervweilkhitdeecfilgmdpkyarpdwm
ivtvlpvpplavrpavvmfgaaknqddlthklsdiikannelrkneasgaaahviqenikmlqfhvatlvdndmpgmpra
mqksgkplkaikarlkgkegrirgnlmgkrvdfsartvitpdpnlridqvgvprsiaqnltfpelvtpfnidrmqelvrr
gnsqypgakyivrdngeridlrfhpkssdlhlqcgykverhlrdddlvifnrqptlhkmsmmghrvkvlpwstfrmnlsc
tspynadfdgdemnlhvpqsmetraevenihitprqiitpqankpvmgivqdtltavrkmtkrdvfitreqvmnllmflp
twdakmpqpcilkprplwtgkqifsliipgnvnmirthsthpdeedegpykwispgdtkvmvehgelimgilckkslgts
agsllhicflelghdiagrfygniqtvinnwllfeghsigigdtiadpqtyneiqqaikkakddvinviqkahnmelept
pgntlrqtfenkvnrilndahdktggsakkslteynnlkamvvsgskgsninisqviacvgqqnvegkripygfrkrtlp
hfikddygpesrgfvensylagltpsefyfhamggreglidtavktaetgyiqrrlikamesvmvnydgtvrnsvgqliq
lrygedglcgelvefqnmptvklsnksfekrfkfdwsnerlmkkvftddvikemtdsseaiqeleaewdrlvsdrdslrq
ifpngeskvvlpcnlqrmiwnvqkifhinkrlptdlspirvikgvktllercvivtgndriskqanenatllfqclirst
lctkyvseefrlsteafewlvgeietrfqqaqanpgemvgalaaqslgepatqmtlntfhfagvssknvtlgvprlkeii
niskkpkapsltvfltggaardaekaknvlcrlehttlrkvtantaiyydpdpqrtvisedqefvnvyyempdfdptris
pwllrieldrkrmtdkkltmeqiaekinvgfgedlncifnddnadklvlririmnneenkfqdedeavdkmeddmflrci
eanmlsdmtlqgieaigkvymhlpqtdskkrivitetgefkaigewlletdgtsmmkvlserdvdpirtssndiceifqv
lgieavrksvekemnavlqfyglyvnyrhlallcdvmtakghlmaitrhginrqdtgalmrcsfeetvdvlmdaaahaet
dpmrgvseniimgqlpkmgtgcfdllldaekcrfgieipntlgnsmlggaamfigggstpsmtppeldsawancntpryf
sppghvsamtpggpsfspsaasdasgmspswspahpgsspsspgpsmspyfpaspsvspsysptspnytasspggaspny
spsspnysptsplyaspryasttpnfnpqstgyspsssgysptspvysptvqfqsspsfagsgsniyspgnayspsssny
spnspsysptspsyspsspsysptspcysptspsysptspnytpvtpsysptspnysaspqyspaspaysqtgvkyspts
ptysppspsydgspgspqytpgspqyspaspkysptsplyspsspqhspsnqysptgstysatspryspnmsiyspsstk
ysptsptytptarnysptspmysptapshysptspayspssptfeesedvrkggrg

human
mhgggppsgdsacplrtikrvqfgvlspdelkrmsvteggikypetteggrpklgglmdprqgviertgrcqtcagnmte
cpghfghielakpvfhvgflvktmkvlrcvcffcskllvdsnnpkikdilakskgqpkkrlthvydlckgkniceggeem
dnkfgveqpegdedltkekghggcgryqprirrsglelyaewkhvnedsqekkillspervheifkrisdeecfvlgmep
ryarpewmivtvlpvpplsvrpavvmqgsarnqddlthkladivkinnqlrrneqngaaahviaedvkllqfhvatmvdn
elpglpramqksgrplkslkqrlkgkegrvrgnlmgkrvdfsartvitpdpnlsidqvgvprsiaanmtfaeivtpfnid
rlqelvrrgnsqypgakyiirdngdridlrfhpkpsdlhlqtgykverhmcdgdivifnrqptlhkmsmmghrvrilpws
tfrlnlsvttpynadfdgdemnlhlpqsletraeiqelamvprmivtpqsnrpvmgivqdtltavrkftkrdvflergev
mnllmflstwdgkvpqpailkprplwtgkqifsliipghincirthsthpddedsgpykhispgdtkvvvengelimgil
ckkslgtsagslvhisylemghditrlfysniqtvinnwllieghtigigdsiadsktyqdiqntikkakqdvievieka
hnneleptpgntlrqtfenqvnrilndardktgssaqkslseynnfksmvvsgakgskinisqviavvgqqnvegkripf
gfkhrtlphfikddygpesrgfvensylagltptefffhamggreglidtavktaetgyiqrrliksmesvmvkydatvr
nsinqvvqlrygedglagesvefqnlatlkpsnkafekkfrfdytneralrrtlqedlvkdvlsnahiqnelerefermr
edrevlrvifptgdskvvlpcnllrmiwnaqkifhinprlpsdlhpikvvegvkelskklvivngddplsrqaqenatll
fnihlrstlcsrrmaeefrlsgeafdwllgeieskfnqaiahpgemvgalaaqslgepatqmtlntfhyagvsaknvtlg
vprlkeliniskkpktpsltvfllgqsardaerakdilcrlehttlrkvtantaiyydpnpqstvvaedqewvnvyyemp
dfdvarispwllrveldrkhmtdrkltmeqiaekinagfgddlncifnddnaeklvlririmnsdenkmqeeeevvdkmd
ddvflrciesnmltdmtlqgieqiskvymhlpqtdnkkkiiitedgefkalqewiletdgvslmrvlsekdvdpvrttsn
diveiftvlgieavrkalerelyhvisfdgsyvnyrhlallcdtmtcrghlmaitrhgvnrqdtgplmkcsfeetvdvlm
eaaahgesdpmkgvsenimlgqlapagtgcfdllldaekckygmeiptnipglgaagptgmffgsapspmggispamtpw
nqgatpaygawspsvgsgmtpgaagfspsaasdasgfspgyspawsptpgspgspgpsspyipspggamspsysptspay
eprspggytpqspsysptspsysptspsysptspnysptspsysptspsysptspsysptspsysptspsysptspsysp
tspsysptspsysptspsysptspsysptspsysptspsysptspsysptspsysptspsysptspnysptspnytptsp
sysptspsysptspnytptspnysptspsysptspsysptspsyspssprytpqsptytpsspsyspsspsysptspkyt
ptspsyspsspeytpaspkysptspkysptspkysptsptyspttpkysptsptysptspvytptspkysptsptyspts
pkysptsptysptspkgstysptspgysptsptysltspaispddsdeen

yeast (S.c)
mvgqqyssaplrtvkevqfglfspeevraisvakirfpetmdetqtrakigglndprlgsidrnlkcqtcqegmnecpgh
fghidlakpvfhvgfiakikkvcecvcmhcgkllldehnelmrqalaikdskkrfaaiwtlcktkmvcetdvpseddptq
lvsrggcgntqptirkdglklvgswkkdratgdadepelrvlsteeilnifkhisvkdftslgfnevfsrpewmiltclp
vppppvrpsisfnesqrgeddltfkladilkanisletlehngaphhaieeaesllqfhvatymdndiagqpqalqksgr
pvksirarlkgkegrirgnlmgkrvdfsartvisgdpnleldqvgvpksiaktltypevvtpynidrltqlvrngpnehp
gakyvirdsgdridlryskragdiqlqygwkverhimdndpvlfnrqpslhkmsmmahrvkvipystfrlnlsvtspyna
dfdgdemnlhvpqseetraelsqlcavplqivspqsnkpcmgivqdtlcgirkltlrdtfieldqvlnmlywvpdwdgvi
ptpaiikpkplwsgkqilsvaipngihlqrfdegttllspkdngmliidgqiifgvvekktvgssngglihvvtrekgpq
vcaklfgniqkvvnfwllhngfstgigdtiadgptmreitetiaeakkkvldvtkeaqanlltakhgmtlresfednvvr
flneardkagrlaevnlkdlnnvkqmvmagskgsfiniaqmsacvgqqsvegkriafgfvdrtlphfskddyspeskgfv
ensylrgltpqefffhamggreglidtavktaetgyiqrrlvkaledimvhydnttrnslgnviqfiygedgmdaahiek
qsldtiggsdaafekryrvdllntdhtldpsllesgseilgdlklqvlldeeykqlvkdrkflrevfvdgeanwplpvni
rriiqnaqqtfhidhtkpsdltikdivlgvkdlqenllvlrgkneiiqnaqrdavtlfccllrsrlatrrvlqeyrltkq
afdwvlsnieaqflrsvvhpgemvgvlaaqsigepatqmtlntfhfagvaskkvtsgvprlkeilnvaknmktpsltvyl
epghaadqeqaklirsaiehttlksvtiaseiyydpdprstvipedeeiiqlhfslldeeaeqsfdqqspwllrleldra
amndkdltmgqvgerikqtfkndlfviwsedndekliircrvvrpksldaeteaeedhmlkkientmlenitlrgvenie
rvvmmkydrkvpsptgeyvkepewvletdgvnlsevmtvpgidptriytnsfidimevlgieagraalykevynviasdg
syvnyrhmallvdvmttqggltsvtrhgfnrsntgalmrcsfeetveilfeagasaelddcrgvsenvilgqmapigtga
fdvmideeslvkympeqkiteiedgqdggvtpysnesglvnadldvkdelmfsplvdsgsndamaggftayggadygeat
spfgaygeaptspgfgvsspgfsptsptysptspaysptspsysptspsysptspsysptspsysptspsysptspsysp
tspsysptspsysptspsysptspsysptspsysptspsysptspsysptspsysptspaysptspsysptspsysptsp
sysptspsysptspnysptspsysptspgyspgspayspkqdeqkhnenensr

pol III:
human
mvkeqfretdvakktshicfgmkspeemrqqahiqvvsknlysqdnqhapllygvldhrmgtsekdrpcetcgknladcl
ghygyidlelpcfhvgyfravigilqmicktcchimlsqeekkqfldylkrpgltylqkrglkkkisdkcrkknichhcg
afngtvkkcgllkiihekyktnkkvvdpivsnflqsfetaiehnkevepllgraqenlnplvvlnlfkripaedvplllm
npeagkpsdliltrllvpplcfrpsvvsdlksgtneddltmklteiiflndvikkhrisgaktqmimedwdflqlqcaly
inselsgiplnmapkkwtrgfvqrlkgkqgrfrgnlsgkrvdfsgrtvispdpnlridevavpvhvakiltfpekvnkan
inflrklvqngpevhpganfiqqrhtqmkrflkygnrekmaqelkygdiverhlidgdvvlfnrqpslhklsimahlarv
kphrtfrfnecvctpynadfdgdemnlhlpqteeakaealvlmgtkanlvtprngepliaaiqdfltgaylltlkdtffd
rakacqiiasilvgkdekikvrlppptilkpvtlwtgkqifsvilrpsddnpvranlrtkgkqycgkgedlcandsyvti
qnselmsgsmdkgtlgsgsknnifyillrdwgqneaadamsrlarlapvylsnrgfsigigdvtpgqgllkakyellnag
ykkcdeyiealntgklqqqpgctaeetlealilkelsvirdhagsaclreldksnspltmalcgskgsfinisqmiacvg
qqaisgsrvpdgfenrslphfekhsklpaakgfvansfysgltptefffhtmagreglvdtavktaetgymqrrlvksle
dlcsqydltvrsstgdiiqfiyggdgldpaamegkdeplefkrvldnikavfpcpsepalsknelilttesimkkseflc
cqdsflqeikkfikgvsekikktrdkygindngtteprvlyqldritptqvekfletcrdkymraqmepgsavgalcaqs
igepgtqmtlktfhfggvasmnitlgvprikeiinaskaistpiitaqldkdddadyarlvkgriektllgeiseyieev
flpddcfilvklslerirllrlevnaetvrysictsklrvkpgdvavhgeavvcvtprenskssmyyvlqflkedlpkvv
vqgipevsravihideqsgkekykllvegdnlravmathgvkgtrttsnntyevektlgieaarttiineiqytmvvnhg
msidrrhvmllsdlmtykgevlgitrfglakmkesvlmlasfektadhlfdaayfgqkdsvcgvseciimgipmnigtgl
fkllhkadrdpnppkrplifdtnefhiplvt

trypanosome
mlkgssstsfllpqqfveplphapveisalhygllsrndvhrlsvlpcrrvvgdvkeygvndarlgvcdrlsicetcgln
siecvghpghidleapvfhlgffttvlricrtickrcshvllddteidyykrrlssssleplqrtmliktiqtdayktrv
clkcgglngvvrrvrpmrlvhekyhveprrgegprenpggffdaelrtacaynkvvgecrefvhdfldpvrvrqlflavp
pgevillglapgvsptdllmttllvppvpvrprgcagtttvrdddltaqyndilvstdtmqdgsldatrytetwemlqmr
aarlldsslpgfppnvrtsdlksyaqrlkskhgrfrcnlsgkrvdysgrsvispdpnldvdelavplhvarvltypqrvf
kanhelmrrlvrngphvhpgattvylaqegskkslknerdrhrlaarlavgdiverhvmngdlvlfnrqpslhrvsmmah
rarvlpfrtfrfnecccapynadfdgdemnvhfvqtekaraealqlmstarniisakngepiiactqdflaaaylvtsrd
vffdrgefsqmvshwlgpvtqfrlpipailkpvelwtgkqlfelivrpspevdvllsfeaptkfytrkgkhdcaeegyva
fldscfisgrldkkllgggakdglfarlhtiagggytarvmsriaqftsryltnygfslglgdvaptpelnkqkaavlar
svevcdgliksaktgrmiplpgltvkqslearlntelskvrdecgtaavqtlsihnntplimvqsgskgsalniaqmmac
vgqqtvsgkrildafqdrslphfhrfeeapaargfvansfysglsptefffhtmagreglvdtavktaetgyiyrrlmka
menlsvrydgtvrntkgdviqlrfgedgldpqlmegnsgtplnleqewlsvraayarwvvgllagsktasdgnairdnen
yfnefismlptegpsfveaclngdqealkvceeqesredalhnsngktndresrprtgrlrravlishlvkvcsrkfkdd
iqdffvkkvreqqrirnllnlpntsrertegggdnsgpiankrtkkrapslkvkdskeggrvselrdlemlqtellpltr
gmvtrfiaqcaskylrkacepgtpcgaiaaqsvgepstqmtlrtfhfagvasmsitqgvprlvevinanrniatpvvtap
vllmegeenhceifrkrarfvkaqiervllrevvseivevcsdtefylrvhlnmsvitklhlpinaitvrqrilaaaght
msplrmlnedcievfsldtlavyphfqdarwvhfslrrilgllpdvvvggigginramissngtevlaegaelravmnlw
gvdstrvvcnhvavvervlgieaarrvivdeiqnilkayslsidvrhvylladlmtqrgvvlgitrygiqkmnfnvltma
sferttdhlynaaatqrvdrdlsvsdsiivgkpvplgttsfdllldgsisndilppqrcvkrgmgpnfhtakrhhlvpla
aegvfrldlf

yeast (S.c)
mkevvvsetpkrikglefsalsaadivaqsevevstrdlfdlekdrapkangaldpkmgvsssslecatchgnlaschgh
fghlklalpvfhigyfkatiqilqgickncsaillsetdkrqflhelrrpgvdnlrrmgilkkildqckkqrrclhcgal
ngvvkkaaagagsaalkiihdtfrwvgkksapekdiwvgewkevlahnpeleryvkrcmddlnplktlnlfkqiksadce
llgidatvpsgrpetyiwrylpappvcirpsvmmqdspasneddltvklteivwtsslikagldkgisinnmmehwdylq
ltvamyinsdsvnpamlpgssngggkvkpirgfcqrlkgkqgrfrgnlsgkrvdfsgrtvispdpnlsidevavpdrvak
vltypekvtrynrhklqelivngpnvhpganyllkrnedarrnlrygdrmklaknlqigdvverhledgdvvlfnrqpsl
hrlsilshyakirpwrtfrlnecvctpynadfdgdemnlhvpqteearaeainlmgvknnlltpksgepiiaatqdfitg
sylishkdsfydratltqllsmmsdgiehfdipppaimkpyylwtgkqvfsllikpnhnspvvinldaknkvfvppksks
lpnemsqndgfviirgsqilsgvmdksvlgdgkkhsvfytilrdygpqeaanamnrmaklcarflgnrgfsigindvtpa
ddlkqkkeelveiayhkcdelitlfnkgeletqpgcneeqtleakiggllskvreevgdvcineldnwnaplimatcgsk
gstlnvsqmvavvgqqiisgnrvpdgfqdrslphfpknsktpqskgfvrnsffsglsppeflfhaisgreglvdtavkta
etgymsrrlmksledlscqydntvrtsangivqftyggdgldplemegnaqpvnfnrswdhaynitfnnqdkgllpyaim
etaneilgpleerlvrydnsgclvkredlnkaeyvdqydaerdfyhslreyingkatalanlrksrgmlglleppakelq
gidpdetvpdnvktsvsqlyriseksvrkfleialfkyrkarlepgtaigaigaqsigepgtqmtlktfhfagvasmnvt
lgvprikeiinaskvistpiinavlvndnderaarvvkgrvektllsdvafyvqdvykdnlsfiqvridlgtidklqlel
tiediavaitrasklkiqasdvniigkdriainvfpegykaksistsakepsendvfyrmqqlrralpdvvvkglpdisr
avinirddgkrellvegyglrdvmctdgvigsrtttnhvlevfsvlgieaarysiireinytmsnhgmsvdprhiqllgd
vmtykgevlgitrfglskmrdsvlqlasfekttdhlfdaafymkkdavegvseciilgqtmsigtgsfkvvkgtnisekd
lvpkrclfeslsneaalkan

9/11 me to Craig

sorry .. no useful sites out there for doing phylogenetic analysis … I am working on such a type of thing right now. It is tricky because to do it correctly you need to filter out parts of a multiple sequence alignment to remove badly aligned regions as well as hypervariable regions.
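
To make the filtering step in that email concrete, here is a minimal Python sketch of dropping unreliable columns from a multiple sequence alignment. It is only an illustration, not the procedure actually used for the tree: the function name `filter_alignment`, both thresholds, and the toy sequences are assumptions. Gap fraction stands in for "badly aligned" and a low majority-residue frequency stands in for "hypervariable".

```python
from collections import Counter


def filter_alignment(seqs, max_gap_frac=0.5, min_major_frac=0.3):
    """Drop unreliable columns from an alignment (hypothetical helper).

    seqs: dict of name -> aligned sequence, all the same length, '-' for gaps.
    A column is dropped when too many sequences have a gap there (crude proxy
    for a badly aligned region) or when its most common residue is too rare
    (crude proxy for a hypervariable region). Both thresholds are assumptions.
    Returns (filtered alignment, list of kept column indices).
    """
    names = list(seqs)
    length = len(seqs[names[0]])
    if any(len(seqs[n]) != length for n in names):
        raise ValueError("sequences must be aligned to the same length")

    keep = []
    for i in range(length):
        column = [seqs[n][i] for n in names]
        gap_frac = column.count("-") / len(column)
        residues = [c.upper() for c in column if c != "-"]
        if gap_frac > max_gap_frac or not residues:
            continue  # mostly gaps: treat as badly aligned
        major_frac = Counter(residues).most_common(1)[0][1] / len(residues)
        if major_frac >= min_major_frac:
            keep.append(i)  # conserved enough to trust

    filtered = {n: "".join(seqs[n][i] for i in keep) for n in names}
    return filtered, keep


# Toy alignment (made up for illustration, not real polymerase data).
toy = {
    "a": "MKV-LAGR",
    "b": "MKI-LSGR",
    "c": "MRVWLTGR",
    "d": "MKV-LQGR",
}
filtered, kept = filter_alignment(toy)
print(kept)
print(filtered)
```

In the toy alignment the gappy fourth column and the fully variable sixth column are removed, and the remaining six columns are kept. Real tools use more careful criteria than these two cutoffs.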

9/12 Craig to Me

Dear Mike,

Yes, I can do this for the atypical RNA polymerase 2nd subunit. I have
already done multiple alignments with it against pol I, II, III subunits
and it is clear that the atypical subunit has amino acid differences that
set it apart, rather than large indels that skew the data. So I think
Jonathan is safe to go ahead and make a figure while I examine the gene
sequences and gene models more carefully.

Any comments on the tone/amount of detail in the section I wrote on the
general transcription machinery? Either way, I will add some references
and send you an updated version as soon as I can.

cheers
Craig

———————
>Speaking on behalf of the editorial committee whom I have not consulted, I
>would be delighted to have this in our section. But we need to check out the
>gene structure in detail (dodgy gene prediction, missing exons etc.). Craig,
>could you do this as you know most about these enzymes?
>
>All the best
>
>Mike

Me to Craig

Craig

I am still working on a slightly better figure … but I have attached the latest version … I think it is sufficient for submission

I have attached it in a few different formats.

I will be out of town for a few days but checking email.

Jonathan

Craig to Me:

Hi Jonathan,

The phylogenetic tree figure for the atypical pol subunit looks good
though the font size may need to be reduced to fit “Fungal Plasmids”
between the dividing lines for the adjacent categories. Have you sent a
copy to Mike?

Craig

Craig again

Hi Jonathan,

I forwarded a copy to Mike. Did you ever have a chance to do a tree for the
largest subunit to further test the hypothesis of a pol IV?

Hope you are having fun in LA

Craig

> I am not sure if I sent a copy to mike
>
>I am in LA right now and it would be easier if you could send mike a copy to
>make sure he has one. I will try and edit the figure and send one with a
>smaller font.
>
>J

10/3 Me to Craig:

Craig

Attached is a new version of the rna pol tree with fonts corrected. I am going to add a few more sequences and rerun it and make a new tree tomorrow.

Jonathan

PS Also … here is a potential figure legend

Figure. Phylogenetic tree of RNA polymerase homologs. Homologs of RNA polymerase were identified by searching sequence databases with representatives of the major known RNA polymerase subfamilies. These proteins, as well as six DNA polymerase homologs from A. thaliana, were aligned using clustalx using default settings. Phylogenetic trees were generated from the alignment (with ambiguously aligned regions and hypervariable regions excluded) using the PAUP* program. The tree shown was generated using the neighbor-joining algorithm with pairwise distances between sequences calculated with a PAM-like matrix. Numbers on the branches are bootstrap values indicating the percentage of 100 trees in which the proteins to the right of the node grouped together to the exclusion of all other proteins.

Craig 10/3

Hi Jonathan,

I will look forward to seeing the final tree, as will Mike, I’m sure. For
the legend, the fact that this is an alignment of second-largest subunits
should be made clear. Here is a stab at a minor revision:

Figure—–. Phylogenetic tree for the second-largest subunit of
DNA-dependent RNA polymerases. Homologs of RNA polymerase second-largest
subunits were identified by searching sequence databases with
representatives of the major known subfamilies (e.g. pol I, II, III and
eubacterial beta subunits). Identified proteins, including six homologs
from A. thaliana, were
aligned using clustalx using default settings. Phylogenetic trees were
generated from the alignment (with ambiguously aligned regions and
hypervariable regions excluded) using the PAUP* program. The tree
was generated using the neighbor-joining algorithm with pairwise distances
between sequences calculated with a PAM-like matrix. Numbers on the
branches are bootstrap values indicating the percentage of 100 trees in
which the proteins to the right of the node group together to the
exclusion of all other proteins.

Thanks,
Craig

Me:

much better figure legend

j
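
For readers unfamiliar with the bootstrap values mentioned in that legend, here is a rough Python sketch of the idea: resample alignment columns with replacement, redo the grouping on each replicate, and report how often a grouping recurs. This is not the PAUP* neighbor-joining analysis described in the legend. It uses a plain fraction-of-differences distance instead of a PAM-like matrix, and "mutual nearest neighbors" as a simplified stand-in for "grouped together on the tree". All function names and the toy sequences are hypothetical.

```python
import random


def p_distance(a, b):
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0  # nothing comparable: treat as maximally distant
    return sum(x != y for x, y in pairs) / len(pairs)


def bootstrap_replicate(seqs, rng):
    """Build one replicate by sampling alignment columns with replacement."""
    length = len(next(iter(seqs.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(s[i] for i in cols) for name, s in seqs.items()}


def nearest(rep, query):
    """Name of the sequence closest to `query` in this replicate."""
    others = [n for n in rep if n != query]
    return min(others, key=lambda n: p_distance(rep[query], rep[n]))


def pair_support(seqs, x, y, replicates=100, seed=0):
    """Percentage of replicates in which x and y are mutual nearest neighbors.

    A simplified stand-in for clade support; a real analysis would rebuild
    the full tree for every replicate.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        rep = bootstrap_replicate(seqs, rng)
        if nearest(rep, x) == y and nearest(rep, y) == x:
            hits += 1
    return 100.0 * hits / replicates


# Toy alignment (made up): two identical sequences and two unrelated ones.
toy = {
    "polA_1": "MKVLAGRT",
    "polA_2": "MKVLAGRT",
    "polB_1": "QRSTWYHN",
    "polB_2": "QRSTWYHC",
}
support = pair_support(toy, "polA_1", "polA_2")
print(support)
```

Because the two `polA` toy sequences are identical and differ from the others at every column, they pair up in every replicate. Real data give intermediate values, which is what the numbers on the branches report.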

Anyway – and so it went. Alas, for a variety of reasons not much made it into the final paper. What was there was this:

Unexpectedly, Arabidopsis has two genes encoding a fourth class of largest subunit and second-largest subunit (Supplementary Information Fig. 5). It will be interesting to determine whether the atypical subunits comprise a polymerase that has a plant-specific function.

And of course, this Supplemental Information is not exactly easy to find and does not actually work correctly anymore:

Downloading the Zip file and opening first page.htm gets one to this

And then clicking on the Figure 5 you get a broken page w/o the Figure.

But there, hidden in the folder with the Supplemental Information is the figure

So that is the beginning of the story on RNA Pol IV in Arabidopsis.

Go read the eLife paper and some of what it cites for the last 15 years of the story.

Free workshop at #UCDavis: “Microbial genomics and transcriptomics hands-on”, Sep 24-25

Was informed of this by Titus Brown, one of the instructors. Info copied from here.

Microbial genomics and transcriptomics hands-on, Sep 24-25

Who: Ben Johnson (Michigan State University); Tracy Teal (Data Carpentry); C. Titus Brown (UC Davis).

Host: C. Titus Brown

When: Sep 24 and 25, 2015

Times: 9am-3pm on both days

Where: TBD (UC Davis campus).

Cost: there is no fee.

This workshop is open to everyone, including graduate students, postdocs, staff, faculty, and community members. We have extra space for UC Davis VetMed affiliates; contact the host if you are an SVM affiliate.

> Register here <

Description

This two-day hands-on workshop will introduce biologists to microbial genomics and transcriptomics. The primary focus will be on genome assembly and annotation, and subsequent transcriptome analysis, of bacteria.

We will be analyzing a stock data set, and we will be using the Amazon cloud.

Topics overview

Logging into the Amazon Cloud
Short read quality and trimming
Genome assembly (with SPAdes or MEGAHIT)
Genome annotation (with Prokka)
RNAseq analysis
Differential expression analysis

Computer requirements

Attendees will need to bring a computer with a Web browser, an Internet connection, and an ssh client; Windows users should install MobaXterm before the workshop.

LICENSE: This documentation and all textual/graphic site content is licensed under the Creative Commons – 0 License (CC0) — fork @ github. Presentations (PPT/PDF) and PDFs are the property of their respective owners and are under the terms indicated within the presentation.

Cold Spring Harbor presents the men’s only view on the evolution of sequencing

On June 5 I posted a guest blog post by an anonymous person writing about the Programming for Biology workshop at Cold Spring Harbor Labs: Guest post on Yet Another Mostly Male Meeting (YAMMM) – Programming for Biology

And this post generated some responses including, yesterday, a series of responses from whoever is behind the Cold Spring Harbor Meetings Twitter account.

@phylogenomics @mike_schatz We do have a role. Course instructors develop speaker lists but we work with them, especially on diversity. 1/5

— CSHL Meetings (@cshlmeetings) June 23, 2015

@phylogenomics @mike_schatz Gender, race/ethnicity, U.S. representation, and geographic region are some ways we look at diversity. 2/5

— CSHL Meetings (@cshlmeetings) June 23, 2015

@phylogenomics @mike_schatz The gender skew in the 2014 Programming course came up last year and we talked to Simon and Sofia about it. 3/5

— CSHL Meetings (@cshlmeetings) June 23, 2015

@phylogenomics @mike_schatz Simon & Sofia teach a great course that is always very well received and evaluated by its students. 4/5

— CSHL Meetings (@cshlmeetings) June 23, 2015

@phylogenomics @mike_schatz They’ll teach a great course again this year and ensure its list of guest lecturers includes more women. 5/5

— CSHL Meetings (@cshlmeetings) June 23, 2015

Sounds great. And I retweeted all of these.

And then I got an email invite to a new Cold Spring Harbor Meeting: The Evolution of Sequencing Technology: A Half Century of Progress

With a long long list of speakers. Alas, the gender ratio of speakers here is abysmal. I have highlighted men in yellow and women in green (with the caveat that I always try to give that assigning gender from names or appearance or records is not always accurate).

Mark Adams, J. Craig Venter Institute
Gillian Air, University of Oklahoma
Shankar Balasubramanian, University of Cambridge, UK
Hagan Bayley, Oxford Nanopore Technologies, Ltd.
David Bentley, Illumina Cambridge, Ltd
Sydney Brenner, Salk Institute for Biological Studies
Nigel Brown, University of Edinburgh, UK
George Brownlee, University of Oxford, UK
Graham Cameron, Bioinformatics Resource, Australia EMBL
Piero Carninci, RIKEN Ctr. for Life Science Technologies, Japan
Norman Dovichi, University of Notre Dame
J. William Efcavitch, Molecular Assemblies, Inc.
Miguel Garcia-Sancho, University of Edinburgh, UK
Mark Gerstein, Yale University
Jack Gilbert, University of Chicago
Walter Gilbert, Harvard University
Philip Green, University of Washington
Leroy Hood, Institute for Systems Biology
Clyde Hutchison, J. Craig Venter Institute
James Kent, University of California, Santa Cruz
Jonas Korlach, Pacific Biosciences
Victor Ling, BC Cancer Agency, Canada
David Lipman, NCBI/NLM National Institutes of Health
James Lupski, Baylor College of Medicine
Thomas Maniatis, Columbia University Medical Center
W. Richard McCombie, Cold Spring Harbor Laboratory
Joachim Messing, Waksman Institute, Rutgers University
Gene Myers, Max Planck Institute of Molecular Cell Biology & Genetics, Germany
Richard Myers, HudsonAlpha Institute for Biotechnology
Debbie Nickerson, University of Washington
James Ostell, NLM/NCBI
Stephen Quake, Stanford University/HHMI
Charles Richardson, Harvard Medical School
Richard Roberts, New England BioLabs
Jane Rogers, The Genome Analysis Centre, UK
Mostafa Ronaghi, Illumina, Inc.
Yoshiyuki Sakaki, University of Tokyo
Jay Shendure, University of Washington
Melvin Simon, Caltech
Hamilton Smith, J. Craig Venter Institute
Lloyd Smith, University of Wisconsin-Madison
J. Craig Venter, J. Craig Venter Institute
Robert Waterston, University of Washington
James Watson, Cold Spring Harbor Laboratory
Jean Weissenbach, Genoscope, France
Barbara Wold, Caltech
Huanming Yang, Beijing Genomics Institute, China

That is right. 47 speakers. 4 of whom are female. For a whopping 8.5 % female speakers. This is one of the most extreme skews I have seen for any meeting. This truly makes me sick to my stomach. Since there are plenty of women who have had and still have fundamentally important roles in the field of sequencing and sequencing technology, I infer that this most likely reflects some type of bias in the meeting organization and planning process.

The meeting page lists the organizers as

Mark Adams, J. Craig Venter Institute
Nigel Brown, University of Edinburgh, UK
Mila Pollock, Cold Spring Harbor Laboratory
Robert Waterston, University of Washington

And one of the major sponsors as Illumina.

I think they all have some explaining to do.

One last note – the meeting description says “The opening session will include a tribute to Frederick Sanger, the father of DNA sequencing, and will cover the early efforts in protein, RNA and DNA sequencing.” Really? The father of DNA sequencing? Seems perfect for this meeting I guess.

UPDATE 6/29/15 7 PM PST

Apparently this meeting is part of a series on the history of molecular biology. The meeting page says

The CSHL/Genentech Center Conferences on the History of Molecular Biology & Biotechnology (http://library.cshl.edu/hosted-meetings) aim to explore important themes of discovery in the biological sciences, bringing together scientists who made many of the seminal discoveries that began the field with others whose interests may include the current status of the field, the historical progress of the field, and/or the application of these techniques and approaches in biotechnology and medicine. Previous meetings in the series have included:

Biotechnology: Past, Present & Future (2008)
History of Restriction Enzymes (2013)
Messenger RNA: From Discovery to Synthesis and Regulation in Bacteria and Eukaryotes (2014)
Plasmids: History & Biology (2014)

So I decided to take a peek at these meetings. I started with Biotechnology: Past, Present & Future (2008).

Organizers

Mila Pollock
Jan Witkowski

Advisors

Sydney Brenner
Peter Feinstein
Lee Hood
Tom Maniatis
Richard Roberts

Speakers are listed below:

Garen Bohlin
Robert Bud
Don Comb
Peter Feinstein
Maryann Feldman
Herbert Heyneker
John H. Leamon
Yuk-Lam Lo
Alan McHughen
Stelios Papadopoulos
Rich Roberts
Robert Steinbrook
Kenneth Thibodeau
Marc Van Montagu
Charles Weissmann
Julie Xing

For speakers that comes to 14:2 male:female or 12.5 % female

Next I went to History of Restriction Enzymes (2013).

Organizers

Herb Boyer, University of California, San Francisco
Stu Linn, University of California, Berkeley
Mila Pollock, Cold Spring Harbor Laboratory
Richard Roberts, New England BioLabs

Speakers are listed below:

Aneel Aggarwal, Mount Sinai School of Medicine
Werner Arber, University of Basel, Switzerland
Tom Bickle, University of Basel, Switzerland
Herb Boyer, University of California, San Francisco
Jack Chirikjian, Georgetown University
Steve Halford, Bristol University, United Kingdom
Ken Horiuchi, The Rockefeller University
Clyde Hutchison, J. Craig Venter Institute
Arvydas Janulaitis, Institute of Biotechnology, Lithuania
Stu Linn, University of California, Berkeley
Bill Linton, Promega
Arvydas Lubys, Institute of Biotechnology, Lithuania
Matthew Meselson, Harvard University
Rick Morgan, New England BioLabs
Andrzej Piekarowicz, Warsaw University, Poland
Alfred Pingoud, Institute of Biochemistry – Giessen, Germany
Mila Pollock, Cold Spring Harbor Laboratory
Rich Roberts, New England BioLabs
John Rosenberg, University of Pittsburgh
Ham Smith, J. Craig Venter Institute
Bruno Strasser, Yale University & University of Geneva
Geoff Wilson, New England BioLabs

OK that is 21:1 or 4.5 % women. Well, I guess this makes the meeting on sequencing look good.

So then I went to “Messenger RNA: From Discovery to Synthesis and Regulation in Bacteria and Eukaryotes (2014)“. Speakers are below:

Organizers:

James Darnell, The Rockefeller University
Adrian Krainer, Cold Spring Harbor Laboratory
Mila Pollock, Cold Spring Harbor Laboratory

Speakers

Arnold Berk, University of California, Los Angeles
Douglas Black, HHMI, University of California, Los Angeles
George Brawerman, Tufts University School of Medicine
Sydney Brenner, Janelia Farm Research Campus, HHMI
Stephen Buratowski, Harvard Medical School
Louise Chow, University of Alabama
Juan Pablo Couso, University of Sussex, UK
James Darnell, The Rockefeller University
Gideon Dreyfuss, HHMI, University of Pennsylvania
Grigorii Georgiev, Russian Academy of Sciences, Russia
Adrian Krainer, Cold Spring Harbor Laboratory
Tom Maniatis, Columbia University Medical Center
James Manley, Columbia University
Lynne Maquat, University of Rochester Medical Center
Matthew Meselson, Harvard University
Melissa Moore, University of Massachusetts Medical School
Bernard Moss, National Institute of Allergy & Infectious Diseases
Arthur Pardee, Dana Farber Cancer Institute
Mila Pollock, Cold Spring Harbor Laboratory
Rich Roberts, New England BioLabs
Robert Roeder, The Rockefeller University
Mike Rosbash, Brandeis University
Robert Schleif, Johns Hopkins University
Robert Singer, Albert Einstein College of Medicine
Nahum Sonenberg, McGill University, Montréal, Québec, Canada
Joan Steitz, Yale University/ HHMI
David Tollervey, Wellcome Center for Cell Biology; University of Edinburgh, UK
Jonathan Warner, Albert Einstein College of Medicine
James Watson, Cold Spring Harbor Laboratory

So so much better no? 24:5 Male: Female or 17% female (for the speakers).

Finally I checked out Plasmids: History & Biology (2014)

Organizers

Dhruba Chattoraj, National Cancer Institute, Bethesda, MD
Stanley N. Cohen, Stanford University
Stanley Falkow, Stanford University
Richard Novick, New York University
Chris Thomas, University of Birmingham, UK
Jan Witkowski, Cold Spring Harbor Laboratory, NY

Speakers

Peter Barth, Helsby, Cheshire UK
Susana Brom, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, Mexico
Ananda Chakrabarty, University of Illinois
Mike Chandler, Université Sabatier, Toulouse, France
Dhruba Chattoraj, National Cancer Institute, Bethesda, MD
Don Clewell, University of Michigan, Ann Arbor, MI
Stanley N. Cohen, Stanford University
Fernando de la Cruz, Universidad de Cantabria, Spain
R. Curtiss III, Arizona State University, Tempe, AZ
Julian Davies, University of British Columbia, Canada
Stanley Falkow, Stanford University
Laura Frost, University of Alberta, Edmonton, Alberta, Canada
Barbara Funnell, University of Toronto, Toronto, Ontario, Canada
Mathias Grote, Technische Universität Berlin, Germany
George A. Jacoby, Lahey Clinic, Burlington, MA
Mark Jones, Life Sciences Foundation, San Francisco, CA
Saleem Khan, University of Pittsburgh
Bruce Levin, Emory University, Atlanta, GA
John Mekalanos, Harvard Medical School
Marc van Montagu, Ghent University, Belgium
Richard Novick, New York University
David Sherratt, University of Oxford, UK
David Summers, University of Cambridge, UK
Chris Thomas, University of Birmingham, UK
Eva Top, University of Idaho, Moscow, ID
Gerhart Wagner, Uppsala University, Sweden
Michael Yarmolinsky, National Cancer Institute, Bethesda MD
Peter Young, University of York, UK

That comes to 24:4 for speakers or 14% female.

Notice any patterns? The totals for these meetings come to 17 women out of 142 speakers. Or ~12 %. That is a dismal record for Cold Spring Harbor Labs and certainly does not convince me that they are trying at all to have diversity represented at their meetings. I note – I truly love many things about CSHL. This is definitely not one of them.
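
For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the percentages from the tallies quoted above. The `tallies` dictionary and the `percent_female` helper are just illustrative names I made up for this sketch, and the counts are simply the ones stated in this post (the overall 17 out of 142 is taken as given rather than re-derived from the lists).

```python
# Speaker tallies (men, women) as stated in the post for each meeting.
tallies = {
    "Biotechnology: Past, Present & Future (2008)": (14, 2),
    "History of Restriction Enzymes (2013)": (21, 1),
    "Messenger RNA (2014)": (24, 5),
    "Plasmids: History & Biology (2014)": (24, 4),
}


def percent_female(men, women):
    """Return the share of women among all speakers, as a percentage."""
    return 100.0 * women / (men + women)


for meeting, (men, women) in tallies.items():
    share = percent_female(men, women)
    print(f"{meeting}: {men}:{women} male:female, {share:.1f}% female")

# Overall figure quoted in the post: 17 women out of 142 speakers.
overall = percent_female(142 - 17, 17)
print(f"Overall: {overall:.1f}% female")
```

Rounded to one decimal place, these should agree with the figures quoted for each meeting and with the ~12 % overall.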

UPDATE 2 – Some discussion of this post on Twitter

@nl_brown @phylogenomics wrote ‘plenty of women who have had & still have fundamentally important roles in the field of sequencing&seqtech’

— Geertje van Keulen (@DrGvanK) June 27, 2015

@DrGvanK Sadly, it is correct as the history of sequencing is male-dominated. Original phiX paper had 1/9 women authors and… (1/2)

— Nigel Brown (@nl_brown) June 27, 2015

@DrGvanK (2/2) …Gillian Air is on list. There will be 1 further replacement female/male. Not proud & could have done better on new techs

— Nigel Brown (@nl_brown) June 27, 2015

@DrGvanK Would be good if they were named and could be invited to speak.

— Nigel Brown (@nl_brown) June 27, 2015

@phylogenomics @DrGvanK Attempt was to represent the history of DNA sequencing. This is why I sought names we might have missed. (1/2)

— Nigel Brown (@nl_brown) June 27, 2015

@phylogenomics @DrGvanK (2/2) Taking diversity over actual history is both token & revisionist. Would have loved more equality in 1900s

— Nigel Brown (@nl_brown) June 27, 2015

@nl_brown @DrGvanK I don’t buy it; there are always different angles on history and technology and your appears severely skewed towards men

— Jonathan Eisen (@phylogenomics) June 27, 2015

@nl_brown @DrGvanK for example, though I love @gilbertjacka what role exactly did he have in history of DNA sequencing? (sorry Jack)

— Jonathan Eisen (@phylogenomics) June 27, 2015

@nl_brown @DrGvanK @gilbertjacka and if the meeting includes applications of sequencing I can think of dozens of women who could be there

— Jonathan Eisen (@phylogenomics) June 27, 2015

@nl_brown @DrGvanK @gilbertjacka many on speaker list who don’t do sequencing technology per se; if include those could include many others

— Jonathan Eisen (@phylogenomics) June 27, 2015

UPDATE 3 – Made a Storify w/ some of the discussions

View the story “Cold Spring Harbor History of Science Meetings Gender Bias” on Storify

Whole issues of Genome Biology/Genome Medicine on "Genomics of Infectious Disease"

Wow this has really got some nice papers: BioMed Central | Article collections | Genomics of infectious diseases special issue. I note – this goes well as a follow up to the series I co-coordinated in PLOS a few years back: Genomics of Emerging Infectious Disease – PLOS Collections

From their site:

Infectious diseases are major contributors to global morbidity and mortality, and have a devastating impact on public health. The World Health Organization estimates that 1 in 3 deaths worldwide are due to an infectious disease, with a disproportionate number occurring in developing regions.

While the completion of the first genome sequence of a pathogen, Haemophilus influenzae, in 1995 took decades of work, in recent years, high-throughput technologies have revolutionized the study of pathogens. Whole-genome sequences are now achievable within days and available for multiple pathogens, including those that cause neglected tropical diseases, which has advanced our understanding of the biology and evolution of pathogens. Crucially, such research has enabled important advances in the clinical management of infectious diseases, and continues to guide public health interventions worldwide.

In this cross-journal special issue, guest edited by George Weinstock (The Jackson Laboratory for Genomic Medicine, USA) and Sharon Peacock (University of Cambridge, UK), Genome Biology and Genome Medicine take stock of where we are now, with a collection of primary research and commissioned articles that discuss different aspects of the genomics of infectious diseases in human populations, including the progress made towards their eradication, and the remaining challenges in terms of both fundamental science and clinical management.

I have copied the list from their site (I am pretty sure this is OK since these are #OpenAccess journals but not 100% sure):

	Editorial: Human infectious diseases in the genomics era: where do we go from here? Ripudaman K Bains. Genome Biology 2014, 15:529 (22 November 2014)
	Review: Genomic analysis of emerging pathogens: methods, application and future trends. Lucy M Li, Nicholas C Grassly, Christophe Fraser. Genome Biology 2014, 15:541 (22 November 2014)
	Research: The genome of the sparganosis tapeworm Spirometra erinaceieuropaei isolated from the biopsy of a migrating brain lesion. Hayley M Bennett, Hoi Ping Mok, Effrossyni Gkrania-Klotsas, Eleanor J Stanley, Isheng J Tsai, Nagui M Antoun, Avril Coghlan, Bhavana Harsha, Alessandra Traini, Diogo M Ribeiro, Sascha Steinbass, Sebastian B Lucas, Kieren S.J Allinson, Stephen J Price, Thomas S Santarius, Andrew J Carmichael, Peter L Chiodini, Nancy Holroyd, Andrew F Dean, Matthew Berriman. Genome Biology 2014, 15:510 (21 November 2014)
	Research highlight: Stopping outbreaks with real-time genomic epidemiology. Patrick Tang, Jennifer L Gardy. Genome Medicine 2014, 6:104 (20 November 2014)
	Software: YMAP: a pipeline for visualization of copy number variation and loss of heterozygosity in eukaryotic pathogens. Darren Abbey, Jason Funt, Mor N Lurie-Weinberger, Dawn A Thompson, Aviv Regev, Chad L Myers, Judith Berman. Genome Medicine 2014, 6:100 (20 November 2014)
	Comment: Single cell genomics of bacterial pathogens: outlook for infectious disease research. Jeffrey S McLean, Roger S Lasken. Genome Medicine 2014, 6:108 (20 November 2014)
	Software: SRST2: Rapid genomic surveillance for public health and hospital microbiology labs. Michael Inouye, Harriet Dashnow, Lesley-Ann Raven, Mark B Schultz, Bernard J Pope, Takehiro Tomita, Justin Zobel, Kathryn E Holt. Genome Medicine 2014, 6:90 (20 November 2014)
	Research: Genomic epidemiology of a protracted hospital outbreak caused by multidrug-resistant Acinetobacter baumannii in Birmingham, England. Mihail R Halachev, Jacqueline Chan, Chrystala I Constantinidou, Nicola Cumley, Craig Bradley, Matthew Smith-Banks, Beryl Oppenheim, Mark J Pallen. Genome Medicine 2014, 6:70 (20 November 2014)
	Editorial: Next-generation pathogen genomics. George M Weinstock, Sharon J Peacock. Genome Biology 2014, 15:528 (19 November 2014)
	Software: Rapid Core-Genome Alignment and Visualization for Thousands of Intraspecific Microbial Genomes. Todd J Treangen, Brian D Ondov, Sergey Koren, Adam M Phillippy. Genome Biology 2014, 15:524 (19 November 2014)
	Method: BAsE-Seq: a method for obtaining long viral haplotypes from short sequence reads. Lewis Z Hong, Shuzhen Hong, Han Teng Wong, Pauline PK Aw, Cheng Yan, Andreas Wilm, Paola F de Sessions, Seng Gee Lim, Niranjan Nagarajan, Martin L Hibberd, Stephen R Quake, William F Burkholder. Genome Biology 2014, 15:517 (19 November 2014)
	Editorial: Microbial sequencing to improve individual and population health. Sharon J Peacock, George M Weinstock. Genome Medicine 2014, 6:103 (19 November 2014)
	Opinion: Genomics and infectious disease: a call to identify the ethical, legal and social implications for public health and clinical practice. Gail Geller, Rachel Dvoskin, Chloe L Thio, Priya Duggal, Michelle H Lewis, Theodore C Bailey, Andrea Sutherland, Daniel A Salmon, Jeffrey P Kahn. Genome Medicine 2014, 6:106 (18 November 2014)
	Review: Epidemiologic data and pathogen genome sequences: a powerful synergy for public health. Yonatan H Grad, Marc Lipsitch. Genome Biology 2014, 15:538 (18 November 2014)
	Method: Enhanced methods for unbiased deep sequencing of Lassa and Ebola RNA viruses in clinical and biological samples. Christian B Matranga, Kristian G Andersen, Sarah Winnicki, Michele Busby, Adrianne D Gladden, Ryan Tewhey, Matthew Stremlau, Aaron Berlin, Stephen K Gire, Eleina England, Lina M Moses, Tarjei S Mikkelsen, Ikponmwosa Odia, Philomena E Ehiane, Onikepe Folarin, Augustine Goba, S.Humarr Khan, Donald S Grant, Anna Honko, Lisa Hensley, Christian Happi, Robert F Garry, Christine M Malboeuf, Bruce W Birren, Andreas Gnirke, Joshua Z Levin, Pardis C Sabeti. Genome Biology 2014, 15:519 (18 November 2014)
	Research: The conjunctival microbiome in health and trachomatous disease: a case control study. Yanjiao Zhou, Martin J Holland, Pateh Makalo, Hassan Joof, Chrissy h Roberts, David Maybe, Robin L Bailey, Matthew J Burton, George M Weinstock, Sarah E Burr. Genome Medicine 2014, 6:99 (15 November 2014)
	Research: Proteomics informed by transcriptomics reveals Hendra virus sensitizes bat cells to TRAIL mediated apoptosis. James W Wynne, Brian J Shiell, Glenn A Marsh, Victoria Boyd, Jennifer A Harper, Kate Heesom, Paul Monaghan, Peng Zhou, Jean Payne, Reuben Klein, Shawn Todd, Lawrence Mok, Diane Green, John Bingham, Mary Tachedjian, Michelle L Baker, David Matthews, Lin-Fa Wang. Genome Biology 2014, 15:532 (15 November 2014)
	Method: A phylogeny-based sampling strategy and power calculator informs genome-wide associations study design for microbial pathogens. Maha R Farhat, B Shapiro, Samuel K Sheppard, Caroline Colijn, Megan Murray. Genome Medicine 2014, 6:101 (15 November 2014)
	Review: Engineering the control of mosquito-borne infectious diseases. Paolo Gabrieli, Andrea Smidler, Flaminia Catteruccia. Genome Biology 2014, 15:535 (15 November 2014)
	Research: A Genomic and Evolutionary Approach Reveals Non-Genetic Drug Resistance in Malaria. Jonathan D Herman, Daniel P Rice, Ulf Ribacke, Jacob Silterra, Amy A Deik, Eli Moss, Kate M Broadbent, Daniel E Neafsey, Michael M Desai, Clary B Clish, Ralph Mazitschek, Dyann F Wirth. Genome Biology 2014, 15:511 (14 November 2014)
	Research highlight: The road to drug resistance in Mycobacterium tuberculosis. Anastasia Koch, Robert Wilkinson. Genome Biology 2014, 15:520 (13 November 2014)
	Research highlight: A CRISPR design for next-generation antimicrobials. Chase L Beisel, Ahmed A Gomaa, Rodolphe Barrangou. Genome Biology 2014, 15:516 (8 November 2014)
	Research: Evolution of extensively drug-resistant Mycobacterium tuberculosis from a susceptible ancestor in a single patient. Vegard Eldholm, Gunnstein Norheim, Bent von der Lippe, Wibeke Kinander, Ulf R Dahle, Dominique A Caugant, Turid Mannsåker, Anne Mengshoel, Anne Dyrhol-Riise, Francois Balloux. Genome Biology 2014, 15:490 (7 November 2014)
	Opinion: Translating genomics research into control of tuberculosis: lessons learned and future prospects. Digby F Warner, Valerie Mizrahi. Genome Biology 2014, 15:514 (7 November 2014)
	Comment: Empowering African genomics for infectious disease control. Onikepe A Folarin, Anise N Happi, Christian T Happi. Genome Biology 2014, 15:515 (7 November 2014)
	Research highlight: Bringing non-human primate research into the post-genomic era: how monkeys are teaching us about elite controllers of HIV/AIDS. Eric J Vallender. Genome Biology 2014, 15:507 (7 November 2014)
	Research: Whole genome sequencing of SIV-infected macaques identifies candidate loci that may contribute to host control of virus replication. Adam J Ericsen, Gabriel J Starrett, Justin M Greene, Michael Lauck, Muthuswamy Raveendran, David Deiros, Mariel S Mohns, Nicolas Vince, Brian T Cain, Ngoc H Pham, Jason T Weinfurter, Adam L Bailey, Melisa L Budde, Roger W Wiseman, Richard Gibbs, Donna Muzny, Thomas C Friedrich, Jeffrey Rogers, David H O’Connor. Genome Biology 2014, 15:478 (7 November 2014)
	Research: Comparative analyses of Legionella species identifies genetic features of strains causing Legionnaires’ Disease. Laura Gomez Valero, Christophe Rusniok, Monica Rolando, Mario Neou, Delphine Dervins-Ravault, Jasmin Demirtas, Zoe Rouy, Robert J Moore, Honglei Chen, Nicola K Petty, Sophie Jarraud, Jerome Etienne, Michael Steinert, Klaus Heuner, Simonetta Gribaldo, Claudine Médigue, Gernot Glöckner, Elizabeth L Hartland, Carmen Buchrieser. Genome Biology 2014, 15:505 (3 November 2014)
	Research: Staphylococcus aureus gene expression in a rat model of infective endocarditis. Frank Hanses, Christelle Roux, Paul M Dunman, Bernd Salzberger, Jean C Lee. Genome Medicine 2014, 6:93 (3 November 2014)
	Research: Gene flow in environmental Legionella pneumophila leads to genetic and pathogenic heterogeneity within a Legionnaires’ disease outbreak. Paul McAdam, Charles Vander Broek, Diane Lindsay, Melissa Ward, Mary Hanson, Michael Gillies, Mike Watson, Joanne Stevens, Giles Edwards, Ross Fitzgerald. Genome Biology 2014, 15:504 (3 November 2014)
	Research: Mapping and manipulating the Mycobacterium tuberculosis transcriptome using a transcription factor overexpression-derived regulatory network. Tige R Rustad, Kyle J Minch, Shuyi Ma, Jessica K Winkler, Samuel Hobbes, Mark J Hickey, William Brabant, Serdar Turkarslan, Nathan D Price, Nitin S Baliga, David R Sherman. Genome Biology 2014, 15:502 (3 November 2014)

NIH Announces Revised Genome Data Release Policies

Just got notified of this by the UC Davis Med. School grants administration: NOT-OD-14-124: NIH Genomic Data Sharing Policy. Lots of interesting things in here including a summary of the comments that they received on the draft policy.

I have copied some of the more interesting and relevant bits below:

Sharing research data supports the NIH mission and is essential to facilitate the translation of research results into knowledge, products, and procedures that improve human health. NIH has longstanding policies to make a broad range of research data, in addition to genomic data, publicly available in a timely manner from the research activities that it funds.
The public comments have been posted on the NIH GDS website. http://gds.nih.gov/pdf/GDS_Policy_Public_Comments.PDF
The statement of scope remains intentionally general enough to accommodate the evolving nature of genomic technologies and the broad range of research that generates genomic data.
Several comments were submitted by representatives or members of tribal organizations about data access. Tribal groups expressed concerns about the ability of DACs to represent tribal preferences in the review of requests for tribal data.
The GDS Policy expects that basic sequence and certain related data made available through NIH-designated data repositories and all conclusions derived from them will be freely available. It discourages patenting of “upstream” discoveries, which are considered pre-competitive, while it encourages the patenting of “downstream” applications appropriate for intellectual property.
NIH expects investigators and their institutions to provide basic plans for following this Policy in the “Genomic Data Sharing Plan” located in the Resource Sharing Plan section of funding applications and proposals. Any resources that may be needed to support a proposed genomic data sharing plan (e.g., preparation of data for submission) should be included in the project’s budget.
Large-scale non-human genomic data, including data from microbes, microbiomes, and model organisms, as well as relevant associated data (e.g., phenotype and exposure data), are to be shared in a timely manner.

Quick post – Outbreaker and the "Bayesian Reconstruction of Disease Outbreaks by Combining Epidemiologic and Genomic Data"

Interesting new paper out: PLOS Computational Biology: Bayesian Reconstruction of Disease Outbreaks by Combining Epidemiologic and Genomic Data.

Full Citation: Jombart T, Cori A, Didelot X, Cauchemez S, Fraser C, et al. (2014) Bayesian Reconstruction of Disease Outbreaks by Combining Epidemiologic and Genomic Data. PLoS Comput Biol 10(1): e1003457. doi:10.1371/journal.pcbi.1003457

Abstract:

Recent years have seen progress in the development of statistically rigorous frameworks to infer outbreak transmission trees (“who infected whom”) from epidemiological and genetic data. Making use of pathogen genome sequences in such analyses remains a challenge, however, with a variety of heuristic approaches having been explored to date. We introduce a statistical method exploiting both pathogen sequences and collection dates to unravel the dynamics of densely sampled outbreaks. Our approach identifies likely transmission events and infers dates of infections, unobserved cases and separate introductions of the disease. It also proves useful for inferring numbers of secondary infections and identifying heterogeneous infectivity and super-spreaders. After testing our approach using simulations, we illustrate the method with the analysis of the beginning of the 2003 Singaporean outbreak of Severe Acute Respiratory Syndrome (SARS), providing new insights into the early stage of this epidemic. Our approach is the first tool for disease outbreak reconstruction from genetic data widely available as free software, the R package outbreaker. It is applicable to various densely sampled epidemics, and improves previous approaches by detecting unobserved and imported cases, as well as allowing multiple introductions of the pathogen. Because of its generality, we believe this method will become a tool of choice for the analysis of densely sampled disease outbreaks, and will form a rigorous framework for subsequent methodological developments.

Check out the nice figure on a SARS outbreak:

Figure 5. Results of the analysis of the SARS data using outbreaker. This figure summarizes the reconstruction of the outbreak, showing putative transmissions (arrows) amongst individuals (rows). Arrows represent ancestries with a least 5% of support in the posterior distributions, while boxes correspond to the posterior distributions of the infection dates. Arrows are annotated by number of mutations and posterior support of the ancestries, and colored by numbers of mutations, with lighter shades of grey for larger genetic distances. The actual sequence collection dates are plotted as plain black dots. Bubbles are used to represent the generation time distribution, with larger disks used for greater infectivity. Shades of blue indicate the degree of certainty for inferring the origin of different cases, as measured by the entropy of ancestries (see methods and equation 12): blue represents conclusive identification of the ancestor of the case (low entropy), while grey shades are uncertain (high entropy).

And then the consensus transmission tree

Figure 6. Consensus transmission tree reconstruction of the SARS outbreak. This figure indicates the most supported transmission tree reconstructed by outbreaker. Cases are represented by spheres colored according to their collection dates. Edges are colored according to the corresponding numbers of mutations, with lighter shades of grey for larger numbers. Edge annotations indicate numbers of mutations and frequencies of the ancestries in the posterior samples.

Outbreaker is available here: http://cran.r-project.org/web/packages/outbreaker/index.html

I also like the 1st line of their Acknowledgements:

We are thankful to Sourceforge (http://sourceforge.net/) and CRAN (http://cran.r-project.org/) for providing great resources for developing and hosting outbreaker.

Definitely worth checking out.

Another genomics meeting featuring men men men and men: International Forum on "Genomics, Innovation and economic growth"

Well this is just peachy. Saw this tweet

The International Forum on Genomics, Innovation & Economic Growth will be held on 25 – 27 Nov 2013 in Mexico City. http://t.co/0M54BLg1IU.
— Human Genome Org (@humangenomeorg) July 29, 2013

And my first thought was – please – please – please let this meeting have a decent gender ratio. I am so so sick of genome meetings that have gender ratio issues. Alas, then I went to their site: International Forum “Genomics, Innovation and economic growth”

11 plenary speakers. All of them men. See here.
Forum president: 1 man
Advisory Board: 5 men

Crap crap crap. What is WRONG WITH PEOPLE?

Nothing else to say really. But I will not be going. I guess I can say that.

New paper from some in the Eisen lab: phylogeny driven sequencing of cyanobacteria

Quick post here. This paper came out a few months ago but it was not freely available so I did not write about it (it is in PNAS but was not published with the PNAS Open Option; not my choice – the lead author did not choose that option and I was not really in the loop when that choice was made).

Improving the coverage of the cyanobacterial phylum using diversity-driven genome sequencing. [Proc Natl Acad Sci U S A. 2013] – PubMed – NCBI.

Anyway – it is now in PubMed Central and at least freely available so I felt OK posting about it now. It is in a way a follow up to the “A phylogeny driven genomic encyclopedia of bacteria and archaea” paper (AKA GEBA) from 2009, with this paper zooming in on the cyanobacteria.

	Research Whole genome sequencing of SIV-infected macaques identifies candidate loci that may contribute to host control of virus replication Adam J Ericsen, Gabriel J Starrett, Justin M Greene, Michael Lauck, Muthuswamy Raveendran, David Deiros, Mariel S Mohns, Nicolas Vince, Brian T Cain, Ngoc H Pham, Jason T Weinfurter, Adam L Bailey, Melisa L Budde, Roger W Wiseman, Richard Gibbs, Donna Muzny, Thomas C Friedrich, Jeffrey Rogers, David H O’Connor Genome Biology 2014, 15:478 (7 November 2014) Abstract \| Full text \| PDF\| Editor’s summary
	Research Comparative analyses of Legionella species identifies genetic features of strains causing Legionnaires’ Disease Laura Gomez Valero, Christophe Rusniok, Monica Rolando, Mario Neou, Delphine Dervins-Ravault, Jasmin Demirtas, Zoe Rouy, Robert J Moore, Honglei Chen, Nicola K Petty, Sophie Jarraud, Jerome Etienne, Michael Steinert, Klaus Heuner, Simonetta Gribaldo, Claudine Médigue, Gernot Glöckner, Elizabeth L Hartland, Carmen Buchrieser Genome Biology 2014, 15:505 (3 November 2014) Abstract \| Provisional PDF\| Editor’s summary
	Research Staphylococcus aureus gene expression in a rat model of infective endocarditis Frank Hanses, Christelle Roux, Paul M Dunman, Bernd Salzberger, Jean C Lee Genome Medicine 2014, 6:93 (3 November 2014) Abstract \| Full text \| PDF \| PubMed\| Editor’s summary
	Research Gene flow in environmental Legionella pneumophila leads to genetic and pathogenic heterogeneity within a Legionnaires’ disease outbreak Paul McAdam, Charles vander broek, Diane Lindsay, Melissa Ward, Mary Hanson, Michael Gillies, Mike Watson, Joanne Stevens, Giles Edwards, Ross Fitzgerald Genome Biology 2014, 15:504 (3 November 2014) Abstract \| Provisional PDF \| PubMed\| Editor’s summary
	Research Mapping and manipulating the Mycobacterium tuberculosis transcriptome using a transcription factor overexpression-derived regulatory network Tige R Rustad, Kyle J Minch, Shuyi Ma, Jessica K Winkler, Samuel Hobbes, Mark J Hickey, William Brabant, Serdar Turkarslan, Nathan D Price, Nitin S Baliga, David R Sherman Genome Biology 2014, 15:502 (3 November 2014) Abstract \| Provisional PDF \| PubMed\| Editor’s summary
