Step-by-Step Guide: Removing Singletons in a microtable (microeco)

Rarefying your dataset is just the first step. Before calculating diversity metrics, it’s good practice to remove singletons: OTUs whose total count across the whole dataset is exactly one read. These are often sequencing noise and can artificially inflate richness estimates.

Here’s how to do it safely using microeco:


Step 1: Clone the original microtable

Always keep your raw data intact. Start by cloning your microtable object:

library(microeco)  # provides clone() and the microtable class

mt_nosingletons <- clone(mt_rarefied)
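
Why clone instead of just assigning? A microtable is an R6 object, and R6 objects are environments with reference semantics, so `<-` alone gives you a second name for the same data rather than a copy. The sketch below shows the difference using a bare environment as a stand-in for a microtable; the names `raw`, `alias`, and `copy` are made up for illustration and are not part of microeco.

```r
# A plain environment stands in for an R6 object (R6 objects are environments).
make_raw <- function() {
  e <- new.env()
  e$otu_table <- data.frame(S1 = c(5, 1, 0),
                            row.names = c("OTU1", "OTU2", "OTU3"))
  e
}

# Plain assignment does NOT copy: both names point to the same object.
raw <- make_raw()
alias <- raw
alias$otu_table <- alias$otu_table[1, , drop = FALSE]
rows_after_alias <- nrow(raw$otu_table)   # the "raw" data was modified too

# An explicit copy is independent of the original.
raw <- make_raw()
copy <- list2env(as.list(raw, all.names = TRUE), envir = new.env())
copy$otu_table <- copy$otu_table[1, , drop = FALSE]
rows_after_copy <- nrow(raw$otu_table)    # original is untouched

c(after_alias = rows_after_alias, after_copy = rows_after_copy)
```

With a real microtable, `clone()` plays the role of the explicit copy here.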

Step 2: Extract the OTU table

otu <- mt_nosingletons$otu_table

Step 3: Identify singletons

Find the OTUs whose total count across all samples is exactly 1:

otu_sums <- rowSums(otu)
singletons <- names(otu_sums[otu_sums == 1])
cat("Number of singletons:", length(singletons), "\n")
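
To make the definition concrete, here is a small sketch on a made-up OTU table (the object names and counts are invented for illustration). It shows that "total count of 1" is not the same thing as "present in only one sample".

```r
# Toy OTU table: rows are OTUs, columns are samples, values are read counts.
toy_otu <- data.frame(
  S1 = c(10, 1, 0, 1),
  S2 = c( 4, 0, 0, 1),
  S3 = c( 7, 0, 2, 0),
  row.names = c("OTU_a", "OTU_b", "OTU_c", "OTU_d")
)

toy_sums <- rowSums(toy_otu)

# Singleton as used in this guide: total count of exactly 1 across all samples.
toy_singletons <- names(toy_sums[toy_sums == 1])

# A different criterion: OTUs detected in exactly one sample, whatever their count.
single_sample_otus <- rownames(toy_otu)[rowSums(toy_otu > 0) == 1]

toy_singletons
single_sample_otus
```

Only `OTU_b` is a singleton under the guide's definition. `OTU_c` occurs in one sample but with two reads, so the filter in the next step would keep it.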

Step 4: Remove singletons

Keep only OTUs with total abundance >1:

keep_otus <- rownames(otu)[otu_sums > 1]

# Subset OTU table
mt_nosingletons$otu_table <- otu[keep_otus, , drop = FALSE]

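# Subset taxonomy table (taken from the clone, not from mt_rarefied, for consistency)
mt_nosingletons$tax_table <- mt_nosingletons$tax_table[keep_otus, , drop = FALSE]

# Subset phylogenetic tree if present (keep.tip() fails on a missing tree)
if (!is.null(mt_nosingletons$phylo_tree)) {
  mt_nosingletons$phylo_tree <- ape::keep.tip(mt_nosingletons$phylo_tree, keep_otus)
}

# Subset representative sequences if present
if (!is.null(mt_nosingletons$rep_fasta)) {
  mt_nosingletons$rep_fasta <- mt_nosingletons$rep_fasta[keep_otus]
}

# Optional: mt_nosingletons$tidy_dataset() trims all components so taxa and samples match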

Step 5: Verify filtering

Check taxa counts before and after:

cat("Taxa before filtering:", nrow(otu), "\n")
cat("Taxa after filtering: ", length(keep_otus), "\n")
cat("Removed:", nrow(otu) - length(keep_otus), "singletons\n")
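
If you want a stricter check than printed counts, something like the helper below can work. It is a minimal sketch: `check_singleton_filter()` is a hypothetical function written for this guide, not part of microeco, and it only looks at the OTU tables. It also reports how many reads each sample lost, which is worth knowing because samples rarefied to the same depth may no longer have identical totals after filtering.

```r
# Hypothetical helper (not part of microeco): compare OTU tables before and after filtering.
check_singleton_filter <- function(otu_before, otu_after) {
  # No remaining OTU may have a total count of 1 or less.
  stopifnot(all(rowSums(otu_after) > 1))
  # Every kept OTU must come from the original table.
  stopifnot(all(rownames(otu_after) %in% rownames(otu_before)))
  # Reads lost per sample.
  colSums(otu_before) - colSums(otu_after)
}

# Toy data, invented for illustration.
before <- data.frame(S1 = c(10, 1, 0), S2 = c(4, 0, 3),
                     row.names = c("OTU_a", "OTU_b", "OTU_c"))
after <- before[rowSums(before) > 1, , drop = FALSE]

reads_lost <- check_singleton_filter(before, after)
reads_lost
```

On the real objects the call would be `check_singleton_filter(mt_rarefied$otu_table, mt_nosingletons$otu_table)`.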

Step 6: Compare alpha diversity (Observed richness & Shannon)

# Observed richness
obs_before <- colSums(mt_rarefied$otu_table > 0)
obs_after  <- colSums(mt_nosingletons$otu_table > 0)

# Shannon diversity
shannon_before <- apply(mt_rarefied$otu_table, 2, vegan::diversity, index = "shannon")
shannon_after  <- apply(mt_nosingletons$otu_table, 2, vegan::diversity, index = "shannon")

# Combine into tidy dataframe for plotting
alpha_df <- data.frame(
  Sample = names(obs_before),
  Observed_before = obs_before,
  Observed_after  = obs_after,
  Shannon_before  = shannon_before,
  Shannon_after   = shannon_after
)

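# Reshape to long format; calling tidyr with :: avoids needing the %>% pipe loaded
alpha_long <- tidyr::pivot_longer(alpha_df, -Sample,
                                  names_to = c("Metric", "Stage"),
                                  names_sep = "_", values_to = "Value")

# Order the stages so "before" is plotted to the left of "after"
alpha_long$Stage <- factor(alpha_long$Stage, levels = c("before", "after"))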
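
For intuition about what to expect from the comparison, the sketch below computes the Shannon index by hand as H = -sum(p * log(p)) with the natural log, which is the default base in `vegan::diversity`. The function name `shannon_manual` and the counts are made up for illustration. Dropping a one-read OTU lowers observed richness by a full unit, while Shannon usually moves much less because rare taxa carry little weight.

```r
# Shannon index for one sample: H = -sum(p * log(p)) over OTUs with non-zero counts.
shannon_manual <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

sample_counts <- c(50, 30, 19, 1)   # the last OTU contributes a single read

h_with    <- shannon_manual(sample_counts)
h_without <- shannon_manual(sample_counts[sample_counts > 1])

# Observed richness with and without the one-read OTU
c(richness_with = sum(sample_counts > 0), richness_without = sum(sample_counts > 1))

# Shannon with and without the one-read OTU
c(shannon_with = h_with, shannon_without = h_without)
```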

Step 7: Visualize alpha diversity before vs after

library(ggplot2)

ggplot(alpha_long, aes(x = Stage, y = Value)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.12, size = 1.6, alpha = 0.7) +
  facet_wrap(~ Metric, scales = "free_y") +
  theme_minimal() +
  labs(title = "Alpha diversity: before vs after removing singletons",
       x = "", y = "Value")

Step 8: Save the singleton-free dataset (optional)

saveRDS(mt_nosingletons, "path/mt_nosingletons.rds")

Summary:

  • Count singletons to see how many “one-off” OTUs exist.
  • Remove singletons from a cloned microtable, leaving the original intact.
  • Check before-and-after taxa counts.
  • Compare alpha diversity (Observed richness & Shannon) to see the effect.
  • Your filtered dataset is now ready for clean diversity analysis.