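Rarefying your dataset is just the first step. Before calculating diversity metrics, it’s common practice to remove singletons: OTUs whose counts add up to exactly 1 across the whole dataset. Many of these are sequencing or PCR artifacts, and they can inflate richness estimates. Richness estimators such as Chao1 and ACE are built on singleton counts, so calculate those on the unfiltered table.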
Here’s how to do it safely using microeco:
Step 1: Clone the original microtable
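Keep the rarefied object untouched so you can compare against it later. A microtable is an R6 object, so a plain assignment such as mt_nosingletons <- mt_rarefied would only create a second name for the same object, and every change would affect both. clone() makes an independent copy:
library(microeco)
mt_nosingletons <- clone(mt_rarefied)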
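To see why the clone matters, here is a minimal sketch of R6 reference semantics. ToyTable is a hypothetical stand-in class, not part of microeco; the sketch only assumes the R6 package is installed (microeco itself depends on it).

```r
library(R6)

# Hypothetical toy class standing in for a microtable (not microeco's class)
ToyTable <- R6Class("ToyTable",
  public = list(
    otu_table = NULL,
    initialize = function(otu_table) {
      self$otu_table <- otu_table
    }
  )
)

original <- ToyTable$new(matrix(1:4, nrow = 2))

alias <- original                     # plain assignment: same object, shared state
copy  <- original$clone(deep = TRUE)  # independent copy made before any edits

alias$otu_table[1, 1] <- 100L  # edits through the alias reach the original
copy$otu_table[2, 2]  <- -1L   # edits to the clone do not

original$otu_table[1, 1]  # 100
original$otu_table[2, 2]  # 4
```

The same logic applies to mt_rarefied: filtering an alias in place would silently overwrite the table you later want to compare against.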
Step 2: Extract the OTU table
otu <- mt_nosingletons$otu_table
Step 3: Identify singletons
Find OTUs whose counts add up to exactly 1 across all samples:
otu_sums <- rowSums(otu)
singletons <- names(otu_sums[otu_sums == 1])
cat("Number of singletons:", length(singletons), "\n")
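"Appears once" can be misread as "found in only one sample", which is a different filter. The sketch below runs the same rowSums() logic on a small hypothetical table (OTUs in rows, samples in columns, 10 reads per sample as after rarefying) and contrasts the two definitions.

```r
# Hypothetical rarefied OTU table: OTUs in rows, samples in columns
otu <- matrix(
  c(6, 0, 5,
    1, 0, 0,
    3, 9, 0,
    0, 1, 0,
    0, 0, 5),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("OTU", 1:5), paste0("S", 1:3))
)

# Singletons: total count across all samples is exactly 1
otu_sums <- rowSums(otu)
singletons <- names(otu_sums[otu_sums == 1])
singletons  # "OTU2" "OTU4"

# Not the same thing: OTUs present (count > 0) in only one sample
one_sample <- names(which(rowSums(otu > 0) == 1))
one_sample  # "OTU2" "OTU4" "OTU5"
```

OTU5 has 5 reads in a single sample; it is rare in prevalence but is not a singleton, so the filter in this step keeps it.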
Step 4: Remove singletons
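Keep only OTUs with a total count greater than 1 (this also drops any OTU whose total is 0), then subset every component of the microtable to match: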
keep_otus <- rownames(otu)[otu_sums > 1]
# Subset OTU table
mt_nosingletons$otu_table <- otu[keep_otus, , drop = FALSE]
# Subset taxonomy table
mt_nosingletons$tax_table <- mt_nosingletons$tax_table[keep_otus, , drop = FALSE]
# Subset phylogenetic tree if present (keep.tip() errors on a NULL tree)
if (!is.null(mt_nosingletons$phylo_tree)) {
  mt_nosingletons$phylo_tree <- ape::keep.tip(mt_nosingletons$phylo_tree, keep_otus)
}
# Subset representative sequences if present
if (!is.null(mt_nosingletons$rep_fasta)) {
  mt_nosingletons$rep_fasta <- mt_nosingletons$rep_fasta[keep_otus]
}
# Re-synchronize all tables inside the microtable
mt_nosingletons$tidy_dataset()
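The key requirement in this step is that every table is subset with the same OTU vector, in the same order. A minimal base-R sketch of that idea, using a hypothetical helper drop_low_total() on toy data (no microeco needed), might look like this:

```r
# Hypothetical helper: keep OTUs whose total count exceeds min_total and
# subset the taxonomy table with the same IDs so rows stay aligned
drop_low_total <- function(otu, tax = NULL, min_total = 1) {
  keep <- rownames(otu)[rowSums(otu) > min_total]
  out <- list(otu = otu[keep, , drop = FALSE])
  if (!is.null(tax)) {
    out$tax <- tax[keep, , drop = FALSE]
  }
  out
}

# Hypothetical toy data
otu <- matrix(c(6, 0, 5,  1, 0, 0,  3, 9, 0,  0, 1, 0,  0, 0, 5),
              nrow = 5, byrow = TRUE,
              dimnames = list(paste0("OTU", 1:5), paste0("S", 1:3)))
tax <- data.frame(Genus = paste0("g__", LETTERS[1:5]),
                  row.names = paste0("OTU", 1:5),
                  stringsAsFactors = FALSE)

res <- drop_low_total(otu, tax)
rownames(res$otu)                                # "OTU1" "OTU3" "OTU5"
identical(rownames(res$otu), rownames(res$tax))  # TRUE
```

Indexing by OTU name rather than by position is what keeps the tables aligned even if their row order differs.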
Step 5: Verify filtering
Check taxa counts before and after:
cat("Taxa before filtering:", nrow(otu), "\n")
cat("Taxa after filtering: ", length(keep_otus), "\n")
cat("Removed:", nrow(otu) - length(keep_otus), "OTUs\n")
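Printed counts are easy to skim past. As a sketch, assuming you prefer a check that fails loudly, a hypothetical check_filtered() function can assert the filtering invariants and also report how many reads each sample lost. On rarefied data this shows that samples no longer share exactly the same depth after singleton removal.

```r
# Hypothetical toy rarefied table: every sample has exactly 10 reads
otu_before <- matrix(c(6, 0, 5,  1, 0, 0,  3, 9, 0,  0, 1, 0,  0, 0, 5),
                     nrow = 5, byrow = TRUE,
                     dimnames = list(paste0("OTU", 1:5), paste0("S", 1:3)))
otu_after <- otu_before[rowSums(otu_before) > 1, , drop = FALSE]

# Hypothetical check: stops with an error if filtering left anything inconsistent
check_filtered <- function(otu_before, otu_after) {
  stopifnot(
    all(rowSums(otu_after) > 1),                          # no singletons left
    all(rownames(otu_after) %in% rownames(otu_before)),   # no new OTUs appeared
    identical(colnames(otu_after), colnames(otu_before))  # no samples dropped
  )
  # Reads lost per sample; non-zero values mean depths are no longer equal
  colSums(otu_before) - colSums(otu_after)
}

lost <- check_filtered(otu_before, otu_after)
lost  # S1 = 1, S2 = 1, S3 = 0
```

With real rarefied data the loss per sample is usually small relative to the rarefaction depth, but it is worth knowing that it is not zero.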
Step 6: Compare alpha diversity (Observed richness & Shannon)
# Observed richness
obs_before <- colSums(mt_rarefied$otu_table > 0)
obs_after <- colSums(mt_nosingletons$otu_table > 0)
# Shannon diversity
shannon_before <- apply(mt_rarefied$otu_table, 2, vegan::diversity, index = "shannon")
shannon_after <- apply(mt_nosingletons$otu_table, 2, vegan::diversity, index = "shannon")
# Combine into tidy dataframe for plotting
alpha_df <- data.frame(
Sample = names(obs_before),
Observed_before = obs_before,
Observed_after = obs_after,
Shannon_before = shannon_before,
Shannon_after = shannon_after
)
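# Reshape to long format without relying on %>% being loaded
alpha_long <- tidyr::pivot_longer(alpha_df, -Sample,
                                  names_to = c("Metric", "Stage"),
                                  names_sep = "_", values_to = "Value")
# Put "before" left of "after" (ggplot would otherwise sort alphabetically)
alpha_long$Stage <- factor(alpha_long$Stage, levels = c("before", "after"))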
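With its default natural logarithm, vegan::diversity(index = "shannon") computes H = -Σ p_i ln p_i for each sample. The sketch below writes that formula out in base R (shannon() and observed() are hypothetical helper names) to show why the two metrics react differently: removing a singleton always lowers Observed richness by one, while its effect on Shannon depends on how large a share of the sample that single read was. The toy sample has only 10 reads, so the Shannon shift here is much larger than it would be at typical rarefaction depths.

```r
# Shannon index from raw counts; zeros are dropped first because
# 0 * log(0) evaluates to NaN in R
shannon <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# Observed richness: number of OTUs with a non-zero count
observed <- function(counts) sum(counts > 0)

# One hypothetical sample before and after its singleton read is removed
s1_before <- c(OTU1 = 6, OTU2 = 1, OTU3 = 3)
s1_after  <- c(OTU1 = 6, OTU3 = 3)

c(observed(s1_before), observed(s1_after))  # 3 2
c(shannon(s1_before), shannon(s1_after))    # approx. 0.898 and 0.637
```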
Step 7: Visualize alpha diversity before vs after
library(ggplot2)
ggplot(alpha_long, aes(x = Stage, y = Value)) +
geom_boxplot(outlier.shape = NA) +
geom_jitter(width = 0.12, size = 1.6, alpha = 0.7) +
facet_wrap(~ Metric, scales = "free_y") +
theme_minimal() +
labs(title = "Alpha diversity: before vs after removing singletons",
x = "", y = "Value")
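Because the same samples are measured twice, a paired view can show per-sample changes that side-by-side boxplots hide. This is an optional alternative, sketched on a small hypothetical data frame shaped like alpha_long (the values are the Observed richness of the toy table used in the earlier sketches, not real results):

```r
library(ggplot2)

# Hypothetical long-format data shaped like alpha_long
alpha_long_toy <- data.frame(
  Sample = rep(c("S1", "S2", "S3"), times = 2),
  Metric = "Observed",
  Stage  = factor(rep(c("before", "after"), each = 3),
                  levels = c("before", "after")),
  Value  = c(3, 2, 2, 2, 1, 2)
)

# group = Sample draws one line per sample between its two stages
p <- ggplot(alpha_long_toy, aes(x = Stage, y = Value, group = Sample)) +
  geom_line(alpha = 0.5) +
  geom_point(size = 1.6) +
  facet_wrap(~ Metric, scales = "free_y") +
  theme_minimal() +
  labs(title = "Per-sample change after removing singletons",
       x = NULL, y = "Value")

print(p)
```

On the real data, the same call works with alpha_long in place of alpha_long_toy.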
Step 8: Save the singleton-free dataset (optional)
saveRDS(mt_nosingletons, "path/mt_nosingletons.rds")
✅ Summary:
- Count singletons to see how many “one-off” OTUs exist.
- Remove singletons from a cloned copy, leaving the original microtable intact and keeping all of its tables in sync.
- Check before-and-after taxa counts.
- Compare alpha diversity (Observed richness & Shannon) to see the effect.
- Your filtered dataset is now ready for clean diversity analysis.
