Horizontal gene transfer into humans? I am not convinced. Full text of my comments to reporters here

There are some news stories out about a new paper claiming evidence for horizontal gene transfer into humans and other chordates. Many reporters asked me about it, and some used parts of my email comments in their articles.

See for example

 Here is the full text of my responses:

“got asked by another reporter to comment on this

so – have seen the paper 

it is interesting .. but I am not overwhelmed by what they present in the paper itself. For example, the HAS story seems really incomplete as presented (e.g., the Figure showing the tree does not have all the HAS1, HAS2, HAS3 genes even though they imply they studied that). “

I have been looking through the supplemental information. I find it impossible to judge the quality of this paper without being able to see the alignments they used for each phylogenetic tree. I cannot find alignments for the trees even after going to their Figshare site with the trees. I therefore think there is not much to say about the paper until being able to see those. 

Without seeing the alignments, I offer multiple alternative hypotheses for their findings:

  1. They have identified genes for which they are unable to produce reasonable alignments. Alignments are central to phylogenetic analysis, and if their alignments are of poor quality then the trees will show all sorts of anomalies that have nothing to do with phylogenetic history. By scanning through 1000s of genes and flagging those with unusual patterns, they may be selectively identifying genes for which producing good alignments between species is tough. I note – clustalw is a bit notorious for not producing ideal alignments in some cases.
  2. I do not buy their arguments for why gene loss is not a possible explanation. They need to present more detail on how many gene losses would be required for each gene family under consideration and then present some evidence for why that # of gene losses is less likely than HGT.
  3. They have not even considered, as far as I can tell, the possibility of divergent evolution (as opposed to gene loss) in many taxa, which could lead to them being unable to identify homologs in some species.
  4. I am not convinced by the arguments against long branch attraction as an explanation for some of the tree patterns.
  5. Related to alignments they need to show which regions of alignments they excluded from phylo
  6. Convergent evolution could also explain some of the patterns they observe.
  7. I could go on. I am NOT saying that HGT into chordates is impossible. It seems plausible. But it is up to them to exclude other MORE plausible alternatives and I just do not think they have done that.
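To make point 2 a bit more concrete, here is a minimal sketch of how one could count the gene losses that a loss-only explanation would require. This is my own illustration, not anything from the paper. The tree, the taxon names, and the presence/absence pattern are all made up, and the function `min_losses` is a hypothetical helper. It assumes the gene was present at the root of the tree and could only be lost, never regained.

```python
# Toy species tree as nested tuples; leaves are taxon names.
# Both the topology and the names are hypothetical, for illustration only.
TOY_TREE = (
    ("bacterium_A", "bacterium_B"),
    (
        ("fungus", "plant"),
        (("fly", "worm"), ("tunicate", ("fish", "human"))),
    ),
)

# Hypothetical set of taxa in which a homolog was detected.
PRESENT = {"bacterium_A", "bacterium_B", "tunicate", "fish", "human"}


def leaves(tree):
    """Return the set of leaf names under a (sub)tree."""
    if isinstance(tree, str):
        return {tree}
    names = set()
    for child in tree:
        names |= leaves(child)
    return names


def min_losses(tree, present):
    """Minimum number of loss events needed to explain `present`,
    assuming the gene existed at the root of `tree` and was never regained."""
    if not (leaves(tree) & present):
        # No taxon in this clade has the gene: a single loss on the
        # branch leading to the clade is enough.
        return 1
    if isinstance(tree, str):
        # A leaf that has the gene needs no loss.
        return 0
    return sum(min_losses(child, present) for child in tree)


print(min_losses(TOY_TREE, PRESENT))
```

In this toy case the loss-only scenario needs two losses (one on the fungus/plant branch, one on the fly/worm branch), versus a single transfer under HGT. Whether two losses are really less likely than one transfer is exactly the kind of thing that needs an explicit argument, and the count depends heavily on which taxa are sampled and on whether homologs were simply missed.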

Reporter: asking if it was OK to quote me

Yes, it is OK to quote me. I would like to reiterate – I am not saying they are wrong. Just that I would like to see (1) all the data (e.g., alignments) that underlies their conclusions and (2) them do more to exclude other possibilities.

Reporter asking what other analyses could they do

So – I don’t want to be difficult, but it is their job to figure out how to do such tests before claiming they have strong evidence for HGT. 

In general, this is pretty typical of claims of HGT. Many researchers show evidence that is consistent with the occurrence of HGT (which they did here) but few actually explicitly test alternative hypotheses such as gene loss, bad alignments, convergence, divergence, contamination, random noise, and more. I think their work is certainly interesting, but they just have not tested all of these alternatives. And I personally have grown a bit tired of pointing out how people can do better controls for their papers.

Reporter asking about initial impressions:

I see little here that is particularly convincing evidence for HGT …

My follow up email

Note – I am not saying that this is a bad paper — just that I am not overwhelmed by their evidence and especially by what they put in the paper. 

For example, the HAS1 gene story seems incomplete.  Figure 3 seems to show just HAS1 but in the text they say they show the same thing for all HAS genes.  And the tree they show includes only a tiny subset of all the available sequences (e.g., HAS1, HAS2, HAS3, and fungal and bacterial homologs).  They claim that they now have proof that HAS1 was transferred near the base of chordates but I just don’t see how they tested alternative hypotheses …

Some related links:

Also here are some presentations from many years ago with some discussion of HGT

Quick post – new paper of interest on "The Infinitely Many Genes Model …"

This paper seems of potential interest: The Infinitely Many Genes Model for the Distributed Genome of Bacteria by Franz Baumdicker, Wolfgang R. Hess, and Peter Pfaffelhuber


The distributed genome hypothesis states that the gene pool of a bacterial taxon is much more complex than that found in a single individual genome. However, the possible fitness advantage, why such genomic diversity is maintained, whether this variation is largely adaptive or neutral, and why these distinct individuals can coexist, remains poorly understood. Here, we present the infinitely many genes (IMG) model, which is a quantitative, evolutionary model for the distributed genome. It is based on a genealogy of individual genomes and the possibility of gene gain (from an unbounded reservoir of novel genes, e.g., by horizontal gene transfer from distant taxa) and gene loss, for example, by pseudogenization and deletion of genes, during reproduction. By implementing these mechanisms, the IMG model differs from existing concepts for the distributed genome, which cannot differentiate between neutral evolution and adaptation as drivers of the observed genomic diversity. Using the IMG model, we tested whether the distributed genome of 22 full genomes of picocyanobacteria (Prochlorococcus and Synechococcus) shows signs of adaptation or neutrality. We calculated the effective population size of Prochlorococcus at 1.01 × 10^11 and predicted 18 distinct clades for this population, only six of which have been isolated and cultured thus far. We predicted that the Prochlorococcus pangenome contains 57,792 genes and found that the evolution of the distributed genome of Prochlorococcus was possibly neutral, whereas that of Synechococcus and the combined sample shows a clear deviation from neutrality.

Wish they had gone beyond these two cyanobacteria … but still seems of possible interest.

Baumdicker, F., Hess, W., & Pfaffelhuber, P. (2012). The Infinitely Many Genes Model for the Distributed Genome of Bacteria. Genome Biology and Evolution, 4 (4), 443-456. DOI: 10.1093/gbe/evs016

Some things to read in light of reported human DNA in bacterial genomes vs. contamination

Well, there are a few interesting papers out there relating to human DNA and whether or not there have been some recent lateral transfers of it into microbial genomes.  See for example

  • this paper in mBio that suggests there has been lateral transfer of LINE elements from humans to Neisseria species
  • but then see this paper suggesting massive contamination of sequence databases with LINE elements (PLoS One paper on contamination)
So what is going on?  Not clear.  If you want more detail about these papers I suggest reading one of the following
There were other stories out there … but since Hannah and Ed interviewed me, I am a bit biased about which ones are worth reading.  Here are some others to read though
Personally, I am a bit skeptical of the LGT claim because most of the evidence they present relies on amplification (i.e., PCR).  But without getting into too many of the details myself, I thought I would just post some background reading connected to some of my past work in this area for anyone interested in this type of thing.
Information about claim of HGT into humans from bacteria that was in the Lander et al Human Genome paper:
A short story I wrote in 1998 about, well, contamination in genome databases
My colleagues' assembly of nearly complete bacterial genomes from the raw sequence reads from fly genome projects
Complete mitochondrial genome(s) found in Chromosome II of Arabidopsis.  It was very difficult to sort out which reads came from the nuclear genome and which from the mitochondria