Advice needed from a future reviewer…

I found myself writing this email to some collaborators, but halfway through I realized that it’d be nice to get EVERYBODY’s input. Chances are one of you is going to review my next paper, so how awesome would it be if you just told me what you think now and made both of our lives easier later?

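To test whether taxa vary significantly across groups of samples, we first need to filter the OTU table to get rid of OTUs that are absent from most of the samples and/or that do not vary across samples. This is necessary for statistical reasons: every OTU we keep is another hypothesis test, and OTUs that are mostly zeros or nearly constant carry little information about differences between groups.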

As far as I know, there are two ways to do this. One is to remove OTUs that occur in fewer than 25% of the samples (the threshold suggested by the QIIME folks). The other is to calculate each OTU’s variance across samples and remove the OTUs whose variance is less than 0.00001 (an arbitrary number thrown out there by the phyloseq developer).
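To make the two criteria concrete, here is a minimal Python sketch. It assumes the OTU table is held as a plain dictionary mapping each OTU ID to its abundances across samples, and that those abundances are relative abundances (a fixed cutoff like 0.00001 only means something on a known scale). The function names and the toy table are hypothetical; in practice this step would be done inside QIIME or phyloseq. The variance is the sample variance (n - 1 denominator), which is what R’s `var()` computes.

```python
from statistics import variance
from typing import Dict, List

# Hypothetical layout: OTU ID -> relative abundances, same sample order for every OTU.
OtuTable = Dict[str, List[float]]


def prevalence_filter(table: OtuTable, min_fraction: float = 0.25) -> OtuTable:
    """Drop OTUs detected (abundance > 0) in fewer than min_fraction of samples."""
    kept = {}
    for otu, abundances in table.items():
        prevalence = sum(1 for a in abundances if a > 0) / len(abundances)
        if prevalence >= min_fraction:
            kept[otu] = abundances
    return kept


def variance_filter(table: OtuTable, min_variance: float = 1e-5) -> OtuTable:
    """Drop OTUs whose sample variance across samples is below min_variance."""
    return {otu: abundances for otu, abundances in table.items()
            if variance(abundances) >= min_variance}


# Toy table with five samples (made-up numbers, for illustration only).
otus = {
    "OTU_1": [0.10, 0.20, 0.00, 0.30, 0.15],  # in 4/5 samples, variable
    "OTU_2": [0.00, 0.00, 0.00, 0.01, 0.00],  # in 1/5 samples, small but nonzero variance
    "OTU_3": [0.05, 0.05, 0.05, 0.05, 0.05],  # in every sample, zero variance
}
print(sorted(prevalence_filter(otus)))  # ['OTU_1', 'OTU_3']
print(sorted(variance_filter(otus)))    # ['OTU_1', 'OTU_2']
```

The toy table shows that the two criteria are not redundant: the rare OTU_2 survives the variance filter, while the constant OTU_3 survives the prevalence filter.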

A third option would be to apply both criteria.
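Applying both criteria just means keeping an OTU only if it passes each test. A sketch under the same assumptions as before (hypothetical dictionary layout of relative abundances, hypothetical function name, made-up data):

```python
from statistics import variance
from typing import Dict, List

# Hypothetical layout: OTU ID -> relative abundances, same sample order for every OTU.
OtuTable = Dict[str, List[float]]


def combined_filter(table: OtuTable, min_fraction: float = 0.25,
                    min_variance: float = 1e-5) -> OtuTable:
    """Keep only OTUs that pass BOTH the prevalence and the variance criterion."""
    kept = {}
    for otu, abundances in table.items():
        prevalence = sum(1 for a in abundances if a > 0) / len(abundances)
        if prevalence >= min_fraction and variance(abundances) >= min_variance:
            kept[otu] = abundances
    return kept


otus = {
    "OTU_1": [0.10, 0.20, 0.00, 0.30, 0.15],  # common and variable
    "OTU_2": [0.00, 0.00, 0.00, 0.01, 0.00],  # rare
    "OTU_3": [0.05, 0.05, 0.05, 0.05, 0.05],  # constant
}
print(sorted(combined_filter(otus)))  # ['OTU_1']
```

Because each criterion looks only at one OTU’s own values, the order in which the two filters are applied does not matter, as long as the abundances are not renormalized between the two steps.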

My inclination is to go with the third option, mostly because I want to limit the number of hypothesis tests we run as much as possible, in order to avoid draconian p-value corrections.
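The post does not say which correction would be used, so as an assumption here are two common ones: Bonferroni, whose per-test cutoff is simply alpha divided by the number of tests, and Benjamini–Hochberg, which adjusts each p-value by its rank. The sketch shows why fewer OTUs (fewer tests) means a less punishing cutoff; the function names and the example p-values are hypothetical.

```python
from typing import List


def bonferroni_cutoff(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate <= alpha."""
    return alpha / n_tests


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# The same raw p-value can pass or fail depending on how many OTUs were tested.
p = 0.0005
for n_tests in (50, 500):
    cutoff = bonferroni_cutoff(0.05, n_tests)
    print(n_tests, cutoff, p <= cutoff)  # significant at 50 tests, not at 500

print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Bonferroni controls the chance of any false positive and gets harsh quickly as the number of tests grows; Benjamini–Hochberg controls the false discovery rate instead and is usually less severe, but both reward filtering out OTUs that could never have been significant anyway.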

I’m not a big fan of arbitrary thresholds, but they are so frequently required that I’ve made my peace with them. However, if someone can suggest a non-arbitrary threshold, that’d be great.

But mostly, I want to make sure that everyone agrees now on the method that we use so that I only have to do this once. Thoughts?

One thought on “Advice needed from a future reviewer…”

  1. Have you looked at RNAseq analysis methods for differential gene expression analysis? Adapting edgeR (as an example) to shotgun metagenomics datasets is actually pretty interesting, and scalable. Picking an arbitrary Q1/Q4 quartile cutoff seems sort of icky, especially when you want to know which OTUs are (“statistically speaking” lol) over/under-represented in some data sets but not others. Do you have biological replicate samples or technical replicates at all? Or are you mainly looking at collections of single samples?
