Some notes on "Citations for Sale" about King Abdulaziz University offering me $$ to become an adjunct faculty member

There is a news story in the Daily Cal titled “Citations for Sale” by Megan Messerly about King Abdulaziz University (KAU) trying to pay highly cited researchers to become adjunct professors there to boost its rankings.  This article stemmed from a blog post by Lior Pachter.  I was interviewed by the Daily Cal reporter about this because I had sent Lior some of the communications I had had with people from KAU who tried to get me to do this.
I am posting here some of the email discussions / threads that I shared with Lior and Megan.
Thread #1.
Here is one thread of emails in which KAU tried to get me to become an Adjunct Professor.  I have pulled out the text of the emails and removed the sender's identity just in case this would get him in trouble.
Received this email 3/6/14

Dear Prof Jonathan, 

How are you? I hope every thing is going well. I hope to share work with you and initiate a collaboration between you and me in biology department, Kind Abdulaziz university . My research focuses on (redacted detail). I hope that you would agree . 

Redacted Name,Ph.D
Assistant Professor, Faculty of Sciences, King Abdulaziz University, Jeddah, KSA.

My response:

What kind of collaboration are you imagining?

His response:

Hi Prof Jonathan, 

Let me to explain that the king abdulaziz university initiated a project which is called Highly Cited Professor ( HiCi). This project can provide a contract between you and our university and from this contract you can get 7000 US$ as a monthly salary .So  this project will allow you to generate two research proposal between you and a research stuff here in order to make an excellent publications in high quality journals as you always do. 

I hope that I was clear. I’ m looking forward to hear from you. Finally, I think that a very good chance to learn from you. 

 Another email from him:

Dear prof Jonathan, 

I’ d like to tell that Prof Inder Verma will come tomorrow to our university  as a highly cited professor and he also signed a contract with us. At March 28 Prof Paul Hebert will come to our university and we actually generated two projects with Prof Paul. I hope you trust me and you call Prof  Inder and  Paul to be sure. 

From me:

I trust you – just am very busy so not sure if this is right for me
Sent from my iPhone

From him:

You will come to our university for just two visits annually and each visit will take only  one week. Take you time to think. Bye

Another email from him

Seat Dr Jonathan, 

What is your decision?

My response:

You have not really provided me with enough information about this.

From him:

Well, you will sign a contract as a highly cited professor between you and KAU. if it happen you will get 7,000 US$ per month for one year as a salary.  From  this project you would be able to generate two proposal with around 200,000 US$ and you will get incentives from each one. In the further we can initiate a mega project with 1.5 million US$.   Is that clear? 

From me:

I could use a formal , legal description of the agreement that one is expected to sign 

From him:

You can ask Prof Dr. Inder Verma he is now in my department and he did two presentation today. Also you can ask my professor prof  Paul Hebert, biodiversity institute of Ontario who will come to my department in March 28,2014.

From him:

if you would agree . Coul you please provide me with your CV with list of publication? 

From him:

Are you agree or no?

From me:


You have not provided me with anywhere near enough info to evaluate this 

Do you have any legal agreement I can look at?

From him:

Agreement from KAU
without providing me with your CV I could not be able to talk to university administration. I told you before ask under verma or Paul Hebert both of them have contract. Dr verma ” editor in chief of PNAS who is left KAS since 4 hours ago. Finally, its up to.

From me:

No thanks
Not interested from what you have told me

Thread #2
Received this email on 12/17/13

Dr. Mansour Almazroui
to jaeisen
Dear Prof. Jonathan Eisen ,

I am Dr. Mansour Almazroui, Highly Cited Program Manager, at King Abdulaziz University (KAU), Jeddah, Saudi Arabia. On behalf of KAU with great pleasure, I would like to invite you to join our innovative collaboration program that is called “International Affiliation program”.

KAU is considered as the largest university in the region serving more than 150,000 students, with around 4,000 faculty members and 30 colleges. For more information please locate us at:

The envisaged program aims to elevate our local research activities in various fields. We only extend our invitation to highly ranked researchers like you, with a solid track record in research and publications to work with KAU professors.

Joining our program will immediately put you on an annual contract, as a Distinguished Adjunct Professor. In this regard, you will only be required to work at KAU premises for three weeks in each year of your contract.

We hope you to accept our invitation and looking forward to welcome you.  Please don’t hesitate to contact me for any further query or clarification.


Dr. Mansour Almazroui
Highly Cited Program Manager,
Office of the Vice President for Graduated Studies and Research,
King Abdulaziz University (KAU).
Director, Center of Excellence for Climate Change Research
King Abdulaziz University
P. O. Box 80234, Jeddah 21589,
Saudi Arabia

I wrote back

I am intrigued but need more information about the three weeks of time at KAU and the details on the contract. 

Jonathan Eisen  

Sent from my iPhone

Got this back

Dear Prof. Jonathan Eisen , 

Hope this email finds you in good health. Thank you for your interest. Please find below the information you requested to be a “Distinguished Adjunct Professor” at KAU. 

1. Joining our program will put you on an annual contract initially for one year but further renewable. However, either party can terminate its association with one month prior notice.
2. The Salary per month is $ 6000 for the period of contract.
3. You will be required to work at KAU premises for three weeks in each contract year. For this you will be accorded with expected three visits to KAU.
4. Each visit will be at least for one week long but extendable as suited for research needs.
5. Air tickets entitlement will be in Business-class and stay in Jeddah will be in a five star hotel. The KAU will cover all travel and living expenses of your visits.
6. You have to collaborate with KAU local researchers to work on KAU funded (up to $100,000.00) projects.
7. It is highly recommended to work with KAU researchers to submit an external funded project by different agencies in Saudi Arabia.
8. May submit an international patent.
9. It is expected to publish some papers in ISI journals with KAU affiliation.
10. You will be required to amend your ISI highly cited affiliation details at the ISI web site to include your employment and affiliation with KAU.

Kindly let me know your acceptance so that the official contract may be preceded.

I promptly forwarded this to my brother with a note:

One way to make some extra money … Sell your reputation / ISI index  

Sent from my iPhone

And my brother eventually shared this with Lior  …
UPDATE 1: 12/5/2014

One key question is – what are the rules, guidelines, and ethics of listing affiliations on papers?  Here are some tidbits on this:

From Nature Communications:

The primary affiliation for each author should be the institution where the majority of their work was done.

From Taylor and Francis:

The affiliations of all named co-authors should be the affiliation where the research was conducted.


Present the authors’ affiliation addresses (where the actual work was done) below the names.

UPDATE 2: Some other posts of relevance

UPDATE 3: A Storify

Today’s Open Science Reading: the Open Science Reviewer’s Oath

Well this certainly is interesting: The Open Science Peer Review Oath – F1000Research.  This emerged apparently from the AllBio: Open Science & Reproducibility Best Practice Workshop.  The “Oath” is summarized in the following text from a box in their paper:

Box 1. While reviewing this manuscript:

  1. I will sign my review in order to be able to have an open dialogue with you
  2. I will be honest at all times
  3. I will state my limits
  4. I will turn down reviews I am not qualified to provide
  5. I will not unduly delay the review process
  6. I will not scoop research that I had not planned to do before reading the manuscript
  7. I will be constructive in my criticism
  8. I will treat reviews as scientific discourses
  9. I will encourage discussion, and respond to your and/or editors’ questions
  10. I will try to assist in every way I ethically can to provide criticism and praise that is valid, relevant and cognisant of community norms
  11. I will encourage the application of any other open science best practices relevant to my field that would support transparency, reproducibility, re-use and integrity of your research
  12. If your results contradict earlier findings, I will allow them to stand, provided the methodology is sound and you have discussed them in context
  13. I will check that the data, software code and digital object identifiers are correct, and the models presented are archived, referenced, and accessible
  14. I will comment on how well you have achieved transparency, in terms of materials and methodology, data and code access, versioning, algorithms, software parameters and standards, such that your experiments can be repeated independently
  15. I will encourage deposition with long-term unrestricted access to the data that underpin the published concept, towards transparency and re-use
  16. I will encourage central long-term unrestricted access to any software code and support documentation that underpin the published concept, both for reproducibility of results and software availability
  17. I will remind myself to adhere to this oath by providing a clear statement and link to it in each review I write, hence helping to perpetuate good practice to the authors whose work I review.

I note – I reformatted the presentation a tiny bit here.   The Roman numerals in the paper annoyed me.  Regardless of the formatting, this is a pretty long oath.  I think it is probably too long.  Some of this could be reduced.  I am reposting the Oath below with some comments:

  1. I will sign my review in order to be able to have an open dialogue with you.  I think this is OK to have in the oath. 
  2. I will be honest at all times. Seems unnecessary.
  3. I will state my limits. Not sure what this means or how it differs from #4.  I would suggest deleting or merging with #4.
  4. I will turn down reviews I am not qualified to provide.  This is good though not sure how it differs from #3. 
  5. I will not unduly delay the review process. Good. 
  6. I will not scoop research that I had not planned to do before reading the manuscript. Good. 
  7. I will be constructive in my criticism. Good. 
  8. I will treat reviews as scientific discourses.  Not sure what this means or how it is different from #9. 
  9. I will encourage discussion, and respond to your and/or editors’ questions.  Good though not sure how it differs from #8. 
  10. I will try to assist in every way I ethically can to provide criticism and praise that is valid, relevant and cognisant of community norms. OK though this seems to cancel the need for #7. 
  11. I will encourage the application of any other open science best practices relevant to my field that would support transparency, reproducibility, re-use and integrity of your research.  Good.  Seems to cancel the need for #13, #14, #15, #16. 
  12. If your results contradict earlier findings, I will allow them to stand, provided the methodology is sound and you have discussed them in context. OK though I am not sure why this rises to the level of being part of the oath over other things that should be part of a review. 
  13. I will check that the data, software code and digital object identifiers are correct, and the models presented are archived, referenced, and accessible.  Seems to be covered in #11. 
  14. I will comment on how well you have achieved transparency, in terms of materials and methodology, data and code access, versioning, algorithms, software parameters and standards, such that your experiments can be repeated independently. Seems to be covered in #11. 
  15. I will encourage deposition with long-term unrestricted access to the data that underpin the published concept, towards transparency and re-use. Seems to be covered in #11. 
  16. I will encourage central long-term unrestricted access to any software code and support documentation that underpin the published concept, both for reproducibility of results and software availability. Seems to be covered in #11. 
  17. I will remind myself to adhere to this oath by providing a clear statement and link to it in each review I write, hence helping to perpetuate good practice to the authors whose work I review.  Not sure this is needed.

The paper then goes on to provide what they call a manifesto.  I very much prefer the items in the manifesto over those in the oath:

  • Principle 1: I will sign my name to my review – I will write under my own name
  • Principle 2: I will review with integrity
  • Principle 3: I will treat the review as a discourse with you; in particular, I will provide constructive criticism
  • Principle 4: I will be an ambassador for good science practice
  • Principle 5: Support other reviewers

In fact I propose here that the authors consider reversing the Oath and the Manifesto.  What they call the Manifesto should be the Oath.  It is short.  And it works as an Oath.  The longer, somewhat repetitive list of specific details would work better as the basis for a Manifesto.

Anyway – the paper is worth taking a look at.  I support the push for more consideration of Open Science in review though I am not sure if this Oath is done right at this point.

Talk for UC Davis Pre-Health Meeting (#UCDPHSA): Opening up to Diversity

Sunday I gave a talk at the “12th National UC Davis Pre-Health Student Alliance Pre-Medical and Pre-Health Professions Conference”.  I normally try to not give talks on weekends (to spend time with my family) but I made an exception here since this meeting has a strong commitment to issues relating to diversity in health and STEM fields.  The mission statement for the meeting reads:

The UC Davis Pre-Health Student Alliance’s objective is to introduce and support academic, admission, and preparatory opportunities for all students interested in health professions with a focus on those underrepresented in healthcare (with regard to gender, economic, social, educational, linguistic, cultural, racial, and ethnic background). We target universities, community colleges and high schools throughout the United States. The UC Davis Pre-Health Student Alliance aims to impact health education, increase diversity amongst the healthcare workforce, and inspire future leaders of healthcare through hosting the largest national pre-health professions conference.

It was that mission statement that got me to ditch my wife and kids Sunday AM (and also much of Saturday PM for a dinner and to work on my talk).  I went to a dinner Saturday for some of the speakers with the new Dean of the UC Davis School of Medicine Julie Freischlag.  The dinner had about 20 or so people and I met some quite interesting folks there working on various aspects of human and animal health.

And then Sunday AM I got up early, decided to use slides (was not sure) and finished off the slide set I had worked on the night before.  I decided that, in the spirit of the meeting, I would talk about two main things – diversity and access.  And I planned to tell three stories about my work in this area.  I wove in some personal stories since, at the dinner the night before Barbara Ross-Lee (who I sat next to) helped remind me of the importance of making talks personal.  So in the end I talked about myself, diabetes, diversity of microbes, antibiotics, diversity in STEM, and open science.  I came up with a title I was OK with: Opening up to Diversity.

My talk went well, I think.  I am pretty sure it was videotaped but not sure where that recording will end up. I did however post my slides to Slideshare.  See below:

Opening up to Diversity talk by @phylogenomics at #UCDPHSA from Jonathan Eisen

And I also recorded the talk using Camtasia (basically, it allows recording of the screen, the video camera on my computer, and the audio).  I posted the recording (without the video feed which shows mostly my neck) to Youtube.  See below:

UPDATE 10/16 –

I have scanned in my notes that I made in planning this talk.  Figured, why not post them.

Update: 12/10/2014 – just discovered a video of the talk was posted to Youtube 

Me: Will survey results be published openly. Them: yes. Me: OK – will do survey. #opensurveys

Got this email:

Dear Jonathan, 

Your peers at the University of California, Riverside, Stanford University, and the Coachella Valley Association of Governments are seeking participation in a survey designed to explore the relevancy and perceptions of basic natural history knowledge and skills among professionals and graduate students in environmental science-related fields.
Professionals and Faculty (including post-doctoral researchers) may access the survey through the following link: 

If you cannot complete the entire survey all at once, you can return to the survey within 30 days to complete it at your convenience. 

This survey will be distributed at universities, organizations and agencies state-wide, and we would greatly appreciate your participation! All responses will be kept completely confidential. The information collected will be used to provide summary statistics and form the basis of a peer-reviewed publication. 

Please feel free to forward this survey link to peers within California that may be interested in participating. Thank you for your assistance! 


Should you have questions about this study please contact us:

Michelle Murphy-Mariscal, M.S. (Center for Conservation Biology, UC Riverside)
Cameron W. Barrows, Ph.D. (Center for Conservation Biology, UC Riverside)
Rebecca Hernandez, Ph.D. (Stanford University, Carnegie Institution for Science)
Kathleen Fleming, M.S. (Coachella Valley Association of Governments)

I wrote back

Thanks for the invite.

Can you tell me more about what will happen to the results from the survey? I only participate in surveys if the data and publications from the survey will be released in an open access manner.

Jonathan Eisen

And then got a very pleasing response:

Hi Jonathan, 

Yes, the results and data of the survey will definitely be published. Attached is a previous, survey-based study and we published the data set in Dryad (an online data repository that you are probably familiar with). The survey that my group and I are doing currently will follow the same format and strict adherence to open access. 

Thank you very much for your participation. 


And then I did the survey and sent this email: 

Thanks so much for the response and I so pleased with your commitment to openness. I will now gladly participate and share w/ California colleagues.

Victoria Schlesinger in Al Jazeera America on Open Data Pros and Cons

Got interviewed last week by Victoria Schlesinger about open science and open data issues and she has now posted her article: Scientists threatened by demands to share data | Al Jazeera America.  The article includes a discussion primarily about the push for more open release of data (and also a bit about papers) and some of the challenges associated with this push.  There are some good quotes in the article both from Schlesinger’s text and from some key players in the field of data access including:

  • Christopher Lortie:  “There will be fantastic discoveries, and that’s all that really matters,” says Lortie.
  • From Schlesinger (a quote I do not agree with all of but some may like the metaphor): Sharing the results of scientific research is a bit like unveiling a newly built house, and scientists generally want it widely viewed, so the growth in open access publishing is a boon for most. Sharing data, on the other hand, is comparable to handing over the architectural plans and building materials used to construct the house. Others can scrutinize the quality of work and reuse the basic components to build their own house. That raises fears about discovery of errors and theft of future research ideas.
  • Heather Piwowar: “I think the public thinks that we’re all learning from everyone else’s work. That’s not true, and furthermore, it’s not true in ways that are even worse than you might think,” says Piwowar.
  • Me: “People are busy,” says Jonathan Eisen, a genetics professor at the University of California, Davis. “Everyone is overwhelmed with life and email and, in academia, trying to get funding and write papers. Whether something is open or not open is not highest on the priority list. There’s still need for making people aware of open science issues and making it easy for them to participate if they want to.”
  • Titus Brown: “My general attitude about open science is that I’d much rather be relevant. In science, that’s harder than anything else,” says Titus Brown, an assistant professor at Michigan State University who runs a genomics, evolution and development lab and practices open science. “If I make my work available, I have a higher chance of being relevant.” 
  • Carl Boettiger: “It has transformed the way we do science across biological scales, from the molecular all the way up to studying whole ecosystems,” says Carl Boettiger, a postdoctoral student at UC Santa Cruz. “The value is in enabling science to progress faster.” 
The article is worth a look …

How Open Are You? Part 1: Metrics to Measure Openness and Free Availability of Publications

For many years I have been raising a key question in relation to open access publishing – how can we measure how open someone’s publications are?  Ideally we would have a way of measuring this in some sort of index.  A few years ago I looked around and asked around and did not find anything out there of obvious direct relevance to what I wanted, so I started mapping out ways to do this.

When Aaron Swartz died I started drafting some ideas on this topic.  Here is what I wrote (in January 2013) but never posted:

With the death of Aaron Swartz on Friday there has been much talk of people posting their articles online (a short-term solution) and moving more towards open access publishing (a long-term solution).  One key component of the move to more open access publishing will be assessing people on just how good a job they are doing of sharing their academic work.

I have looked around the interwebs to see if there is some existing metric for this and I could not find one.  So I have decided to develop one – which I call the Swartz Openness Index (SOI).

Let A = # of objects being assessed (could be publications, data sets, software, or all of these together). 

Let B = # of objects that are released to the commons with a broad, open license. 

A simple (and simplistic) metric could be simply 

OI = B / A

This is a decent start but misses out on the degree of openness of different objects. So a more useful metric might be the one below.

A and B as above. 

Let C = # of objects available free of charge but not openly 

OI = ( B + (C/D) ) / A  

where D is the “penalty” for making material in C not openly available
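As a minimal sketch, the formula above is easy to put into code.  The function name, parameter names, and example numbers below are mine, purely for illustration:

```python
# Sketch of the openness index defined above:
#   OI = (B + C/D) / A
# A = total objects assessed, B = openly licensed objects,
# C = objects free of charge but not openly licensed,
# D = penalty divisor for free-but-closed objects (D > 1).

def openness_index(a_total, b_open, c_free_only, d_penalty=2.0):
    """Return OI on a 0..1 scale; a larger D punishes free-but-closed more."""
    if a_total == 0:
        raise ValueError("no objects to assess")
    return (b_open + c_free_only / d_penalty) / a_total

# 50 papers: 30 fully open, 10 free but not open, penalty of 2
# -> (30 + 10/2) / 50 = 0.7
print(openness_index(50, 30, 10, 2.0))
```

Note that with D = 1 a free-but-closed paper counts the same as an open one, and as D grows it counts for less and less.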

This still seems not detailed enough.  A more detailed approach might be to weight diverse aspects of the openness of the objects.  Consider for example the “Open Access Spectrum.”  This has divided objects (publications in this case) into six categories in terms of potential openness: reader rights, reuse rights, copyrights, author posting rights, automatic posting, and machine readability.  And each of these is given different categories that assess the level of openness.  Seems like a useful parsing in ways.  Alas, since bizarrely the OAS is released under a somewhat restrictive CC BY-NC-ND  license I cannot technically make derivatives of it.  So I will not.  Mostly because I am pissed at PLoS and SPARC for releasing something in this way.  Inane.

But I can make my own openness spectrum.

And then I stopped writing because I was so pissed off at PLOS and SPARC for making something like this and then restricting its use.  I had a heated discussion with people from PLOS and SPARC about this but am not sure if they updated their policy.  Regardless, the concept of an Openness Index of some kind fell out of my head after this buzzkill.  And it only just now came back to me. (Though I note – I did not find the draft post I made until AFTER I wrote the rest of this post below.)

To get some measure of openness in publications maybe a simple metric would be useful.  Something like the following

  • P = # of publications
  • A = # of fully open access papers
  • OI = Openness index

A simple OI would be

  • OI = 100 * A/P

However, one might want to account for relative levels of openness in this metric.  For example:

  • AR = # of papers with an open but somewhat restricted license
  • F = # of papers that are freely available but not with an open license
  • C = some measure of how cheap the non-freely-available papers are

And so on.
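One way to fold those extra categories into a score is with per-category weights.  The 0.75 and 0.5 weights below are arbitrary choices of mine for illustration, not something specified above:

```python
# Weighted publication-openness index (a sketch, with made-up weights).
# P = total papers, A = fully open, AR = open-but-restricted license,
# F = free to read but not openly licensed; closed papers score 0.

WEIGHTS = {"open": 1.0, "restricted": 0.75, "free_only": 0.5}

def weighted_oi(p_total, a_open, ar_restricted, f_free_only):
    """Return a 0..100 openness score for a set of publications."""
    if p_total == 0:
        raise ValueError("no papers to assess")
    score = (WEIGHTS["open"] * a_open
             + WEIGHTS["restricted"] * ar_restricted
             + WEIGHTS["free_only"] * f_free_only)
    return 100.0 * score / p_total

# 100 papers: 50 open, 20 restricted-open, 10 free-only, 20 closed
# -> 100 * (50 + 15 + 5) / 100 = 70.0
print(weighted_oi(100, 50, 20, 10))
```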
Given that I am not into library science myself and not really familiar with playing around with this type of data, I thought a much simpler metric would be to just go to Pubmed (which of course works only for publications in the arenas covered by Pubmed).

From Pubmed one can pull out some simple data:

  • # of publications (for a person or institution)
  • # of those publications in PubMed Central (a measure of free availability)

Thus one could easily measure the “Pubmed Central Index” as

PMCI = 100 * (# publications in PMC / # of publications in Pubmed)
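The PMCI itself is trivial to compute once the two counts are in hand.  The query strings in the comments use real PubMed search syntax (the "pubmed pmc[sb]" subset filter); the function name is my own:

```python
# PMCI = 100 * (# publications in PubMed Central / # publications in PubMed)
# The two counts come from two PubMed searches, e.g. for one author:
#   all records:   Eisen JA[Author]
#   also in PMC:   Eisen JA[Author] AND pubmed pmc[sb]

def pmc_index(n_in_pmc, n_in_pubmed):
    """PubMed Central index as a percentage of an author's PubMed records."""
    if n_in_pubmed == 0:
        raise ValueError("no PubMed records for this author")
    return 100.0 * n_in_pmc / n_in_pubmed

# e.g. 150 of 200 papers in PMC -> 75.0
print(pmc_index(150, 200))
```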
Some examples of the PMCI for various authors including some bigger names in my field, and some people I have worked with.
(Table: publication counts and PMCI values for Eisen JA, Eisen MB, Collins FS, Lander ES, Lipman DJ, Nussinov R, Mardis E, Colwell RR, Varmus H, Brown PO, Darling AE, Coop G, Salzberg SL, Venter JC, Ward NL, Fraser CM, Quackenbush J, Ghedin E, and Langille MG.)

And so on.  Obviously this is of limited value / accuracy in many ways.  Many papers are freely available but not in Pubmed Central.  Many papers are not covered by Pubmed or Pubmed Central.  Times change, so some measure of recent publications might be better than measuring all publications.  Author identification is challenging (until systems like ORCID get more use).  And so on.

Another thing one can do with Pubmed is to identify papers with free full text available somewhere (not just in PMC).  This can be useful for cases where material is not put into PMC for some reason.  And then with a similar search one can narrow this to just the last five years.  As open access has become more common, maybe some people have shifted to it more and more over time (I have — so this search should give me a better index).
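These searches could be automated via NCBI's E-utilities.  The sketch below only builds the esearch URLs — using rettype=count, which returns just the hit count, and reldate to restrict to roughly the last five years; "free full text[sb]" is a real PubMed subset filter.  Actually fetching the URLs is left out, and the helper name is my own:

```python
# Build PubMed E-utilities esearch URLs for a "Free Index" calculation:
# one URL counting all papers by an author in the window, and one counting
# those with free full text available somewhere.
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def free_index_urls(author, years=5):
    common = {
        "db": "pubmed",
        "rettype": "count",   # return only the record count
        "datetype": "pdat",   # filter on publication date
        "reldate": str(years * 365),
    }
    all_url = EUTILS + "?" + urlencode(
        {**common, "term": f"{author}[Author]"})
    free_url = EUTILS + "?" + urlencode(
        {**common, "term": f"{author}[Author] AND free full text[sb]"})
    return all_url, free_url
```

FI would then be 100 times the second count divided by the first, just as with the PMCI.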

Let’s call the % of publications with free full text somewhere the “Free Index” or FI.  Here are the values for the same authors.

Name             # pubs (5 years)   FI-5 (%)
Eisen JA
Eisen MB         83                 79.8
Collins FS       263                50.5
Lander ES        200                53.1
Lipman DJ        59                 80.8
Mardis E         135                72.2
Colwell RR       258                59.3
Varmus H         206                50.5
Brown PO         185                79.0
Darling AE       21                 77.8
Coop G           28                 71.8
Salzberg SL      128                79.0
Venter JC        85                 35.9
Ward NL          30                 51.7
Fraser CM        109                41.6
Quackenbush J    131                58.2
Ghedin E         56                 68.3
Langille MG      11                 78.6

Very happy to see that I score very well for the last five years. 180 papers in Pubmed.  178 of them with free full text somewhere that Pubmed recognizes. The large number of publications comes mostly from genome reports in the open access journals Standards in Genomic Sciences and Genome Announcements.  But most of my non genome report papers are also freely available.

I think in general it would be very useful to have measures of the degree of openness.  And such metrics should take into account sharing of other material like data, methods, etc.  In a way this could be a form of the altmetric calculations going on.

But before going any further I decided to look again into what has been done in this area. When I first thought of doing this a few years ago I searched and asked around and did not see much of anything.  (Although I do remember someone out there – maybe Carl Bergstrom – saying there were some metrics that might be relevant, but I can’t place who or what that half-remembered information is.)

So I decided to do some searching anew.  And lo and behold there was something directly relevant. There is a paper in the Journal of Librarianship and Scholarly Communication called: The Accessibility Quotient: A New Measure of Open Access.  By Mathew A. Willmott, Katharine H. Dunn, and Ellen Finnie Duranceau from MIT.

Full Citation: Willmott, MA, Dunn, KH, Duranceau, EF. (2012). The Accessibility Quotient: A New Measure of Open Access. Journal of Librarianship and Scholarly Communication 1(1):eP1025.
Here is the abstract:

INTRODUCTION The Accessibility Quotient (AQ), a new measure for assisting authors and librarians in assessing and characterizing the degree of accessibility for a group of papers, is proposed and described. The AQ offers a concise measure that assesses the accessibility of peer-reviewed research produced by an individual or group, by incorporating data on open availability to readers worldwide, the degree of financial barrier to access, and journal quality. The paper reports on the context for developing this measure, how the AQ is calculated, how it can be used in faculty outreach, and why it is a useful lens to use in assessing progress towards more open access to research.
METHODS Journal articles published in 2009 and 2010 by faculty members from one department in each of MIT’s five schools were examined. The AQ was calculated using economist Ted Bergstrom’s Relative Price Index to assess affordability and quality, and data from SHERPA/RoMEO to assess the right to share the peer-reviewed version of an article.
RESULTS The results show that 2009 and 2010 publications by the Media Lab and Physics have the potential to be more open than those of Sloan (Management), Mechanical Engineering, and Linguistics & Philosophy.
DISCUSSION Appropriate interpretation and applications of the AQ are discussed and some limitations of the measure are examined, with suggestions for future studies which may improve the accuracy and relevance of the AQ.
CONCLUSION The AQ offers a concise assessment of accessibility for authors, departments, disciplines, or universities who wish to characterize or understand the degree of access to their research output, capturing additional dimensions of accessibility that matter to faculty.

I completely love it.  After all, it is directly related to what I have been thinking about and, well, they actually did some systematic analysis of their metrics.  I hope more things like this come out and are readily available for anyone to calculate.  Just how open someone is could be yet another metric used to evaluate them …

And then I did a little more searching and found the following, which also seem directly relevant:

So – it is good to see various people working on such metrics.  And I hope there are more and more.
Anyway – I know this is a bit incomplete but I simply do not have time right now to turn this into a full study or paper and I wanted to get these ideas out there.  I hope someone finds them useful …

10 things you can do to REALLY support #OpenAccess #PDFTribute

I wrote a post earlier today in relation to the #PDFTribute movement: Ten simple ways to share PDFs of your papers #PDFtribute.  I wrote it largely to give people an outlet and information and ideas about how to better share PDFs of their academic work.  I think the more people share the better.

However, I also got shit on Twitter from my brother Michael – co-founder of PLoS – about how this is partly a “feel good” action.  I do think he underestimates the surge of anger over the death of Aaron Swartz and the momentum right now in the semi-civil disobedience being seen in the #PDFTribute movement.  But I also think he is right in part. So, I thought I would follow up with suggestions for what people should do in the future to really support full and open access to the academic literature.

  1. Only publish in fully open access journals.  See DOAJ — Directory of Open Access Journals.
  2. Do not do ANY work for non open access journals. That includes reviewing, suggesting reviewers, etc. 
  3. Cancel all subscriptions to closed access journals. The subscription model is part of the problem. 
  4. Work for open access journals. 
  5. Embrace openness in other aspects of your academic work. See for example Open science – Wikipedia, the free encyclopedia and Open Humanities Alliance
  6. Learn the difference between “open” and “freely available.” See Peter Suber, Open Access Overview (definition, introduction) and Open Access | PLOS
  7. Reward people in job hiring, merits and promotions for their level of openness.  Do not reward them for closed activities.
  8. Lobby for more open access requirements at the Federal, State, and Institutional level.  Make sure they are not mealy-mouthed or mediocre. See What the UC “open access” policy should say for example.
  9. Embrace other changes in scientific publishing such as post-publication review that enable more rapid sharing of publications (see The Glacial Pace of Change in Scientific Publishing). 
  10. Read up on what else you can do (e.g., Peter Suber, What you can do to promote open access) and come up with your own ideas.  Oh and share them.  Openly.

Related posts from The Tree of Life

Other ideas? Please post in comments.

A blast from the past: Plasmodium, plastids, phylogeny, and reproducibility

A few days ago I got an email from a colleague who I had not seen in many years.  It was from Malcolm Gardner who worked at TIGR when I was there and is now at Seattle Biomed.

His email was related to the 2002 publication of the complete genome sequence of Plasmodium falciparum – the causative agent of most human malaria cases – for which he was the lead author.   Someone had emailed Malcolm asking if he could provide details about the settings used in the blast searches that were part of the evolutionary analyses of the paper.   The paper is freely available at Nature – at least for now – every once in a while the Nature Publishing Group seems to put it behind a paywall despite their promises not to.

Malcolm was contacting me because I had run / coordinated much of the evolutionary analysis reported in that paper.  I note – as one of the only evolution-focused people at TIGR, it was pretty common for people to come to me and ask if I could help them with their genome.  I pretty much always said yes since, well, I loved doing that kind of thing and it was really exciting in the early days of genome sequencing to be the first person to ask some evolution related question about the data.

Malcolm included the email he had received (which did not have a lot of detail) and he and I wrote back and forth trying to figure out exactly what this person wanted.  And then I said, well, maybe the person should get in touch with me directly so I can figure out what they really want/need.  It seemed unusual that someone was asking about something like that from a 10 year old paper, but, whatever.  

As I was communicating with this person, I started digging through my files and my brain trying to remember exactly what had been done for this paper more than 10 years ago.  I remember Malcolm and others from the Plasmodium community organizing some “jamborees” looking at the annotation of the genome. At one of those jamborees I met with some of the folks from the Sanger Center (which was one of the big players in the P. falciparum genome sequencing) with Malcolm and – after some discussion I ended up doing three main things relating to the paper, which I describe below.

Thing 1: Conserved eukaryote genes

One of my analyses was to use the genome to look for genes conserved in eukaryotes but not present in bacteria or archaea.  I did this to try and find genes that could be considered likely to have been invented on the evolutionary branch leading up to the common ancestor of eukaryotes.

As an aside, at about the same time I was asked to write a News and Views for Nature about the publication of the Schizosaccharomyces pombe genome.  In the N&V, titled “Genome sequencing: Brouhaha over the other yeast”, I noted how the authors had used the genome to do some interesting analysis of conserved eukaryotic genes.  With the help of the Nature staff I had also made a figure which demonstrated (sort of) what they were trying to do in their analysis – which was to find genes that originated on the branch leading up to the common ancestor of the eukaryotes for which genomes were available at the time.  As another aside – the S. pombe genome paper and my News and Views article are freely available …

Figure 1: The tree of life, with the branches labelled according to Wood et al.’s analysis of genes that might be specific to eukaryotes versus prokaryotes, and to multicellular versus single-celled organisms. Bacteria and archaea are prokaryotes (they do not have nuclei). From Nature 415, 845-848 (21 February 2002) | doi:10.1038/nature725. The eukaryotic portion of the tree is based on Baldauf et al. 2000

Anyway, I did a similar analysis to what was in the S. pombe genome paper and I found a reasonable number and helped write a section for the paper on this.

Comparative genome analysis with other eukaryotes for which the complete genome is available (excluding the parasite E. cuniculi) revealed that, in terms of overall genome content, P. falciparum is slightly more similar to Arabidopsis thaliana than to other taxa. Although this is consistent with phylogenetic studies (64), it could also be due to the presence in the P. falciparum nuclear genome of genes derived from plastids or from the nuclear genome of the secondary endosymbiont. Thus the apparent affinity of Plasmodium and Arabidopsis might not reflect the true phylogenetic history of the P. falciparum lineage. Comparative genomic analysis was also used to identify genes apparently duplicated in the P. falciparum lineage since it split from the lineages represented by the other completed genomes (Supplementary Table B). 

There are 237 P. falciparum proteins with strong matches to proteins in all completed eukaryotic genomes but no matches to proteins, even at low stringency, in any complete prokaryotic proteome (Supplementary Table C). These proteins help to define the differences between eukaryotes and prokaryotes. Proteins in this list include those with roles in cytoskeleton construction and maintenance, chromatin packaging and modification, cell cycle regulation, intracellular signalling, transcription, translation, replication, and many proteins of unknown function. This list overlaps with, but is somewhat larger than, the list generated by an analysis of the S. pombe genome (65). The differences are probably due in part to the different stringencies used to identify the presence or absence of homologues in the two studies.

The list of genes is available as supplemental material on the Nature web site.  Alas it is in MS Word format which is not the most useful thing.  But more on that issue at the end of this post.
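For illustration, the screen described in this section boils down to simple set logic over BLAST results. Below is a minimal Python sketch; the function name, input format, and toy data are my own inventions – only the two E-value thresholds (a stringent 10⁻³⁰ for requiring eukaryotic matches and a permissive 10⁻¹⁵ for excluding prokaryotic matches) come from the methods described in the paper.

```python
def eukaryote_specific(best_evalue, proteins, euk_genomes, prok_genomes,
                       euk_cutoff=1e-30, prok_cutoff=1e-15):
    """Hypothetical reimplementation of the conserved-eukaryote screen.

    best_evalue: dict mapping (protein, genome) -> best BLASTP E-value
    (an absent key means no hit at all).  Returns proteins with a match
    in every complete eukaryotic genome at the strict cutoff but with no
    match in any prokaryotic genome even at the permissive cutoff.
    """
    out = []
    for p in proteins:
        in_all_euks = all(best_evalue.get((p, g), 1.0) <= euk_cutoff
                          for g in euk_genomes)
        in_no_proks = all(best_evalue.get((p, g), 1.0) > prok_cutoff
                          for g in prok_genomes)
        if in_all_euks and in_no_proks:
            out.append(p)
    return out
```

The asymmetric cutoffs matter: a gene is only called "eukaryote specific" if it fails to match prokaryotes even under a looser criterion than the one used to establish its presence in eukaryotes.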

Thing 2. Searching for lineage specific duplications

Another aspect of comparative genomic analysis that I used to do for most genomes at TIGR was to look for lineage specific duplications (i.e., genes that have undergone duplications in the lineage of the species being studied to the exclusion of the lineages for which other genomes are available).  The quick and dirty way we used to do this was to simply look for genes that had a better blast match to another gene from their own genome than to genes in any other genome.  The list of genes we identified this way is also provided as a Word document in Supplemental materials.
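The “quick and dirty” criterion just described can be sketched in a few lines. Everything here is hypothetical illustration (the input format and names are invented); the one grounded detail is the 10⁻¹⁵ E-value cutoff mentioned in the paper’s methods.

```python
def recent_duplicates(hits, self_genome, cutoff=1e-15):
    """Flag putative lineage-specific duplications.

    hits: dict mapping each query protein -> list of (subject_genome,
    evalue) pairs, with trivial self-hits (the query matching itself)
    already removed.  A query is flagged when its single best match
    below the cutoff is to another protein in its own genome.
    """
    flagged = []
    for query, matches in hits.items():
        good = [(e, g) for g, e in matches if e <= cutoff]
        if good and min(good)[1] == self_genome:
            flagged.append(query)
    return flagged
```

Note this is a heuristic, not a phylogenetic test: a better self-match is consistent with a recent duplication, but trees are needed to confirm it.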

Thing 3: Searching for organelle derived genes in the nuclear genome of P. falciparum

The third thing I did for the paper was to search for organelle derived genes in the nuclear genome of Plasmodium.  Specifically I was looking for genes derived from the mitochondrial genome and plastid genome.  For those who do not know, Plasmodium is a member of the Apicomplexa – all organisms in this group have an unusual organelle called the Apicoplast.  Though the exact nature of this organelle had been debated, its evolutionary origins were determined by none other than Malcolm Gardner many years earlier (Gardner et al. 1994). They had shown that this organelle was in fact derived from chloroplasts (which themselves are derived from cyanobacteria).  I am ashamed to say that before hanging out with Malcolm and talking about Plasmodium I did not know this.  This finding of a chloroplast in an evolutionary group of eukaryotes that are not particularly closely related to plants is one of the key pieces of evidence in the “secondary endosymbiosis” hypothesis, which proposes that some eukaryotes have brought into themselves as an endosymbiont a single-celled photosynthetic alga which had a chloroplast.
Anyway – here we were – with the first full genome of a member of the Apicomplexa.  And we could use it to discover some new details on plastid evolution and secondary endosymbioses.  So I adapted some methods I had used in analyzing the Arabidopsis genome (see Lin et al. 1999 and AGI 2000), and searched for plastid derived genes in the nuclear genome of Plasmodium.  Why look in the nuclear genome for plastid genes?  Or mitochondrial genes for that matter.  Well, it turns out that genes that were once in the organelle genomes frequently move to the nuclear genome of their “host”.  In fact, a lot of genes move.  So – if you want to study the evolution of an organism’s organelles, it is sometimes more fruitful to look in the nuclear genome than in the actual organelle’s genome.  OK – now back to the Plasmodium genome.  What I was doing was trying to find genes in the nuclear genome that had once been in the plastid genome.  How would you look for these?
To find mitochondrial-derived genes I did blast searches against the same database of genomes used to study the evolution of eukaryotes, but in this case I looked for genes in Plasmodium that had decent matches to genes in alpha proteobacteria.  For those I then built phylogenetic trees of each gene and its homologs, and screened through all the trees to look for any in which the gene from Plasmodium grouped inside a clade with sequences from alpha proteobacteria (allowing for mitochondrial genes from other eukaryotes to be in this clade as well).
To find plastid derived genes I did a similar screen except instead searched for genes that grouped in evolutionary trees with genes from cyanobacteria (or eukaryotic genes that were from plastids).  The section of the paper that I helped write is below:

A large number of nuclear-encoded genes in most eukaryotic species trace their evolutionary origins to genes from organelles that have been transferred to the nucleus during the course of eukaryotic evolution. Similarity searches against other complete genomes were used to identify P. falciparum nuclear-encoded genes that may be derived from organellar genomes. Because similarity searches are not an ideal method for inferring evolutionary relatedness (66), phylogenetic analysis was used to gain a more accurate picture of the evolutionary history of these genes. Out of 200 candidates examined, 60 genes were identified as being of probable mitochondrial origin. The proteins encoded by these genes include many with known or expected mitochondrial functions (for example, the tricarboxylic acid (TCA) cycle, protein translation, oxidative damage protection, the synthesis of haem, ubiquinone and pyrimidines), as well as proteins of unknown function. Out of 300 candidates examined, 30 were identified as being of probable plastid origin, including genes with predicted roles in transcription and translation, protein cleavage and degradation, the synthesis of isoprenoids and fatty acids, and those encoding four subunits of the pyruvate dehydrogenase complex. The origin of many candidate organelle-derived genes could not be conclusively determined, in part due to the problems inherent in analysing genes of very high (A + T) content. Nevertheless, it appears likely that the total number of plastid-derived genes in P. falciparum will be significantly lower than that in the plant A. thaliana (estimated to be over 1,000). Phylogenetic analysis reveals that, as with the A. thaliana plastid, many of the genes predicted to be targeted to the apicoplast are apparently not of plastid origin. Of 333 putative apicoplast-targeted genes for which trees were constructed, only 26 could be assigned a probable plastid origin. 
In contrast, 35 were assigned a probable mitochondrial origin and another 85 might be of mitochondrial origin but are probably not of plastid origin (they group with eukaryotes that have not had plastids in their history, such as humans and fungi, but the relationship to mitochondrial ancestors is not clear). The apparent non-plastid origin of these genes could either be due to inaccuracies in the targeting predictions or to the co-option of genes derived from the mitochondria or the nucleus to function in the plastid, as has been shown to occur in some plant species (67).
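The tree-screening step – asking whether the Plasmodium gene falls inside a clade whose other members are all from an “allowed” set (e.g., alpha proteobacteria for a mitochondrial origin, or cyanobacteria for a plastid origin) – can be mimicked programmatically. Here is a toy sketch on nested-tuple trees; in the actual work the screening was done on real phylogenies, and all names and data structures below are invented for illustration.

```python
def leaves(node):
    # Collect leaf labels from a tree represented as nested tuples,
    # where a string is a leaf and a tuple is an internal node.
    if isinstance(node, str):
        return [node]
    out = []
    for child in node:
        out += leaves(child)
    return out

def groups_with(tree, query, allowed):
    """Walk from the root toward the query leaf.  Report success at the
    first clade containing the query whose *other* members all belong
    to `allowed` – a crude stand-in for inspecting each tree by eye."""
    if isinstance(tree, str):
        return False
    for child in tree:
        tips = leaves(child)
        if query in tips:
            others = [t for t in tips if t != query]
            if others and all(t in allowed for t in others):
                return True
            return groups_with(child, query, allowed)
    return False
```

For example, a Plasmodium gene sister to a Rickettsia gene would pass a mitochondrial-origin screen, while one sister to E. coli would not.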

Thing 4: Analysis of DNA repair genes 

Arnab Pain from the Sanger Center and I analyzed genes predicted to be involved in DNA repair and recombination processes and wrote a section for the paper:

DNA repair processes are involved in maintenance of genomic integrity in response to DNA damaging agents such as irradiation, chemicals and oxygen radicals, as well as errors in DNA metabolism such as misincorporation during DNA replication. The P. falciparum genome encodes at least some components of the major DNA repair processes that have been found in other eukaryotes (111, 112). The core of eukaryotic nucleotide excision repair is present (XPB/Rad25, XPG/Rad2, XPF/Rad1, XPD/Rad3, ERCC1) although some highly conserved proteins with more accessory roles could not be found (for example, XPA/Rad4, XPC). The same is true for homologous recombinational repair with core proteins such as MRE11, DMC1, Rad50 and Rad51 present but accessory proteins such as NBS1 and XRS2 not yet found. These accessory proteins tend to be poorly conserved and have not been found outside of animals or yeast, respectively, and thus may be either absent or difficult to identify in P. falciparum. However, it is interesting that Archaea possess many of the core proteins but not the accessory proteins for these repair processes, suggesting that many of the accessory eukaryotic repair proteins evolved after P. falciparum diverged from other eukaryotes. 

The presence of MutL and MutS homologues including possible orthologues of MSH2, MSH6, MLH1 and PMS1 suggests that P. falciparum can perform post-replication mismatch repair. Orthologues of MSH4 and MSH5, which are involved in meiotic crossing over in other eukaryotes, are apparently absent in P. falciparum. The repair of at least some damaged bases may be performed by the combined action of the four base excision repair glycosylase homologues and one of the apurinic/apyrimidinic (AP) endonucleases (homologues of Xth and Nfo are present). Experimental evidence suggests that this is done by the long-patch pathway (113). 

The presence of a class II photolyase homologue is intriguing, because it is not clear whether P. falciparum is exposed to significant amounts of ultraviolet irradiation during its life cycle. It is possible that this protein functions as a blue-light receptor instead of a photolyase, as do members of this gene family in some organisms such as humans. Perhaps most interesting is the apparent absence of homologues of any of the genes encoding enzymes known to be involved in non-homologous end joining (NHEJ) in eukaryotes (for example, Ku70, Ku86, Ligase IV and XRCC1)(112). NHEJ is involved in the repair of double strand breaks induced by irradiation and chemicals in other eukaryotes (such as yeast and humans), and is also involved in a few cellular processes that create double strand breaks (for example, VDJ recombination in the immune system in humans). The role of NHEJ in repairing radiation-induced double strand breaks varies between species (114). For example, in humans, cells with defects in NHEJ are highly sensitive to γ-irradiation while yeast mutants are not. Double strand breaks in yeast are repaired primarily by homologous recombination. As NHEJ is involved in regulating telomere stability in other organisms, its apparent absence in P. falciparum may explain some of the unusual properties of the telomeres in this species (115).

Back to the story
Anyway … back to the story.  I do not have current access to all of TIGR’s old computer systems, which is where my searches for the genome paper reside.  But I figured I might have some notes somewhere on my computer about what blast parameters I used for these searches.  And amazingly I did.  As I was getting ready to write back to Malcolm and to the person who had asked for the information, I decided to double check to see what was in the paper.  And amazingly, much of the detail was right there all along.

Plasmodium falciparum proteins were searched against a database of proteins from all complete genomes as well as from a set of organelle, plasmid and viral genomes. Putative recently duplicated genes were identified as those encoding proteins with better BLASTP matches (based on E value with a 10⁻¹⁵ cutoff) to other proteins in P. falciparum than to proteins in any other species. Proteins of possible organellar descent were identified as those for which one of the top six prokaryotic matches (based on E value) was to either a protein encoded by an organelle genome or by a species related to the organelle ancestors (members of the Rickettsia subgroup of the α-Proteobacteria or cyanobacteria). Because BLAST matches are not an ideal method of inferring evolutionary history, phylogenetic analysis was conducted for all these proteins. For phylogenetic analysis, all homologues of each protein were identified by BLASTP searches of complete genomes and of a non-redundant protein database. Sequences were aligned using CLUSTALW, and phylogenetic trees were inferred using the neighbour-joining algorithms of CLUSTALW and PHYLIP. For comparative analysis of eukaryotes, the proteomes of all eukaryotes for which complete genomes are available (except the highly reduced E. cuniculi) were searched against each other. The proportion of proteins in each eukaryotic species that had a BLASTP match in each of the other eukaryotic species was determined, and used to infer a ‘whole-genome tree’ using the neighbour-joining algorithm. Possible eukaryotic conserved and specific proteins were identified as those with matches to all the complete eukaryotic genomes (10⁻³⁰ E-value cutoff) but without matches to any complete prokaryotic genome (10⁻¹⁵ cutoff).
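The “whole-genome tree” step in the quoted methods – turning the proportion of shared BLASTP matches between each pair of proteomes into a distance matrix for neighbour-joining – might be sketched as below. To be clear, this is a reconstruction under assumptions: the paper does not say how the two directional proportions (which need not be equal) were combined, so the averaging here is my guess, and all names are invented.

```python
def match_distance(match_counts, proteome_sizes):
    """Build a symmetric distance matrix from directional BLASTP match
    proportions.

    match_counts[a][b]: number of proteins from genome a with a BLASTP
    match in genome b.  Distance = 1 - mean of the two directional
    proportions.  The resulting matrix would then be fed to a
    neighbour-joining program such as those in CLUSTALW or PHYLIP.
    """
    genomes = sorted(proteome_sizes)
    dist = {a: {} for a in genomes}
    for a in genomes:
        for b in genomes:
            if a == b:
                dist[a][b] = 0.0
                continue
            p_ab = match_counts[a][b] / proteome_sizes[a]
            p_ba = match_counts[b][a] / proteome_sizes[b]
            dist[a][b] = 1.0 - (p_ab + p_ba) / 2.0
    return dist
```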

Alas, I cannot for the life of me find what other parameters I used for the blastp searches.  I am 99.9999% sure I used default settings but alas, I don’t know what default settings for blast were in that era.  And I am not even sure which version of blastp was installed on the TIGR computer systems then.  I certainly need to do a better job of making sure everything I do is truly reproducible.


This all brings me to the actual real part of this story.  Reproducibility.  It is a big deal.  Anyone should be able to reproduce what was done in a study.  And alas, it is difficult to do that when not all the methods are fully described.  And one should also provide intermediate results so that people do not have to redo everything you did in a study but can just reproduce part of it.   It would be good to have, for example, released all the phylogenetic trees from the analysis of organellar genes in Plasmodium.  Alas, I do not seem to have all of these files as they were stored in a directory at TIGR dedicated to this genome project and as I am no longer at TIGR I do not have ready access to that material.  It is probably still lounging around somewhere on the JCVI computer systems (TIGR alas, no longer officially exists … it was swallowed by the J. Craig Venter Institute …).  But I will keep digging and I will post them to some place like FigShare if/when I find them.

Perhaps more importantly, I will be working with my lab to make sure that in the future we store/record/make available EVERYTHING that would allow people to reproduce, re-analyze, re-jigger, re-whatever anything from our papers.

The key lesson – plan in advance for how you are going to share results, methods, data, etc …

Three talks, 1.5 days at #ISMB … phylogeny, phylogenomics, open science and more

Gave three talks in 1.5 days here in Long Beach as part of the satellite meetings associated with the “Intelligent Systems for Molecular Biology” (ISMB) 2012 Conference. I will write more about the meeting and the craziness of giving three very different talks in 1.5 days. But for now I wanted to at least get my talks posted here since I posted the slides to slideshare and recorded the audio in synch with the slides and posted these “slideshows” to YouTube. Here are the talks below:

Talk 1 for the “Bioinformatics Open Source Conference” BOSC2012.  Was asked to talk about Open Science … so … I did …

Slideshow with audio:

Talk 2 for the Student Council Symposium SCS2012. Sort of supposed to be a career guidance discussion so I geared my talk on the lines of “lessons learned” …

Slideshow with audio:

Talk 3 for the “Automated function prediction” AFP2012 satellite meeting.  I decided to talk about phylogenetic and phylogenomics approaches to functional prediction …

Slideshow with audio:

If the International Whaling Commission really wanted to improve cetacean-science it could require openness rather than allowing whaling

There is a bit of a kerfuffle going on over South Korea announcing plans to increase whaling for “scientific reasons.”

See for example: Grist: South Korea may start hunting whales again, for ‘science’ and CNN: South Korea says it may resume whaling, angering environment groups‘ and WSJ South Korea Whaling Plan Sparks Outcry

This seems to me to be pretty cut and dried.  The Koreans do not seem to be truly interested in the science here.  And I note – nor does the “International Whaling Commission”: Commission information.  If they really wanted to expand the scientific study of whales they would do things like foster sharing of samples, collaboration across groups, open access to data and resources, and such.  But as far as I can tell they do no such things.

The whole operation here smells fishy – or whaly.  Sounds like this is pretty much all about hunting and making money and giving in to pressure to find something other than people to blame for mismanagement of fish stocks.