1. Import data

We’ll kick things off by importing four datasets from your QIIME 2 outputs. Importing datasets and formatting them correctly matters just as much as the analysis itself: double-check each object’s structure and explicitly format it as needed to avoid issues down the line. Trust me, taking a moment now to ensure everything is properly set up can save you countless headaches later. Let’s make sure everything is solid before moving forward!

Got other variables like biomass measurements or environmental factors? Bring them along! These are the secret sauce for meaningful analysis. Linking these variables to changes in your microbiome (later in Step 1.5) is what makes the whole exercise worthwhile. After all, what’s the point of staring at pretty graphs if they don’t connect to the experimental variables you actually care about? This is where the real magic happens—tying microbiome shifts to your study’s key drivers!

Resolve data frame issues. Heads-up: I hit some bumps when working with the tidy_dataset function later in the pipeline. This function makes sure that OTU (ASV) and sample information stay consistent across all files in the dataset object, but if your data frames aren’t formatted properly, you can run into errors.

For example:

> mt$tidy_dataset() # Check if resolved
Error in private$tidy_samples(self) :
No same sample name found between rownames of sample_table and colnames of otu_table! Please first check whether the rownames of sample_table are sample names! Then check through the sample names of each table!

What this error really means is that the sample names don’t match across your data frames: the row names of sample_table must correspond to the column names of otu_table.

✅ The fix: take care of potential formatting issues right after importing the data. A bit of cleanup at the start will save you a lot of debugging headaches later.

I’ll first show you how the sample metadata dataframe (sample_info_16S) is set. But keep in mind — you’ll need to check and fix the row names/IDs for all three key inputs:

— the Sample Info table

— the Feature (ASV/OTU) table

— the Taxonomy table

This step is very important because consistency across these three data frames is what makes the downstream functions (like tidy_dataset) work smoothly.
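To see what that consistency check looks like in practice, here’s a minimal sketch with made-up sample IDs (replace them with your real tables). It compares the metadata row names against the feature-table column names, which is exactly what tidy_dataset complains about when they don’t overlap:

```r
# Toy tables with hypothetical sample IDs
sample_info <- data.frame(Plot = c("plot1", "plot1", "plot2"),
                          row.names = c("C1-2023", "G1-2023", "C2-2023"))
otu_tab <- data.frame(`C1-2023` = c(10, 0), `G1-2023` = c(3, 7),
                      check.names = FALSE,
                      row.names = c("ASV1", "ASV2"))

# Samples present in the metadata but missing from the OTU table, and vice versa
setdiff(rownames(sample_info), colnames(otu_tab))  # "C2-2023"
setdiff(colnames(otu_tab), rownames(sample_info))  # character(0)

# tidy_dataset() needs at least some overlap to keep any samples at all
length(intersect(rownames(sample_info), colnames(otu_tab))) > 0  # TRUE
```

Running this on your own objects before building the dataset pinpoints exactly which sample IDs are mismatched.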

NOTE: So, you think converting it to a data frame will magically fix everything?

Well… think again! 😅 There are still plenty of ways things can go sideways. But let’s not panic just yet. First, we’ll make sure everything holds up through the data import process. Once we’re there, I’ll guide you through troubleshooting—step by step. Stay with me! 🚀

1.1. Importing Metadata

# allow more waiting time to download each package
options(timeout = 1000)
# If a package is not installed, it will be installed from CRAN
# First select the packages of interest
tmp <- c("microeco", "mecoturn", "MASS", "GUniFrac", "ggpubr", "randomForest", "ggdendro", "ggrepel", "agricolae", "igraph", "picante", "pheatmap", "rgexf", 
         "ggalluvial", "ggh4x", "rcompanion", "FSA", "gridExtra", "aplot", "NST", "GGally", "ggraph", "networkD3", "poweRlaw", "ggtern", "SRS", "performance")
# Now check or install
for(x in tmp){
  if(!require(x, character.only = TRUE)) {
    install.packages(x, dependencies = TRUE)
  }
}
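One caveat: the CRAN loop above doesn’t cover file2meco and qiime2R, which we load below. file2meco is on CRAN, but qiime2R is distributed via GitHub (the jbisanz/qiime2R repository), so a sketch of the extra installs would look like this:

```r
# file2meco is on CRAN; qiime2R must come from GitHub, so the CRAN loop
# above will not find it -- install it with remotes instead
if (!require("file2meco", character.only = TRUE)) install.packages("file2meco")
if (!require("qiime2R", character.only = TRUE)) {
  if (!require("remotes", character.only = TRUE)) install.packages("remotes")
  remotes::install_github("jbisanz/qiime2R")
}
```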


library(file2meco)
library(microeco)
library(readr)
library(qiime2R)
library(dplyr)
library(tidyr)  # Required for `separate()`
library(tidyverse)
library(magrittr)  # Load magrittr for %<>%


sample_info_16S <- read_tsv("/path/metadata.tsv")
sample_info_16S <- as.data.frame(sample_info_16S) # convert sample_info_16S to a data.frame
rownames(sample_info_16S)

👉 Now check the class, the first 6 rows (head), and the row names of the object. Here’s the output when things don’t look right (for our row names):

> class(sample_info_16S)
[1] "data.frame"
> head(sample_info_16S)
  sample-id        Plot        Year
1 #q2:types categorical categorical
2   C1-2023       plot1       y2023
3   G1-2023       plot1       y2023
4   N1-2023       plot1       y2023
5   C2-2023       plot2       y2023
6   G2-2023       plot2       y2023
> rownames(sample_info_16S)
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26"
[27] "27" "28" "29" "30" "31" "32" "33" "34" "35" "36" "37" "38" "39" "40" "41" "42" "43" "44" "45"

Here’s the fix: remove the QIIME 2 type-declaration row (the one starting with “#q2:types”) and set the row names properly using the sample-id column from the metadata:

# remove the row where sample-id starts with "#q2:types"
sample_info_16S <- sample_info_16S[ sample_info_16S$`sample-id` != "#q2:types", ]
rownames(sample_info_16S) <- sample_info_16S$`sample-id`
rownames(sample_info_16S)

⚠️ If your column name contains special characters (e.g., #SampleID or sample-id), you need to enclose it in backticks when referencing it in R.

If your column name is a simple one (e.g., SampleID), you can drop the backticks and use it directly.
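A quick illustration of the backtick rule, using a hypothetical two-column data frame:

```r
# "sample-id" contains a hyphen, so it is not a syntactic R name;
# "SampleID" is, so it can be used bare
df <- data.frame(`sample-id` = c("C1-2023", "G1-2023"),
                 SampleID    = c("a", "b"),
                 check.names = FALSE)

df$`sample-id`  # backticks required because of the "-"
df$SampleID     # simple name, no backticks needed
```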

After making this adjustment, when you check the row names again, they should appear correctly.

> rownames(sample_info_16S)
 [1] "C1-2023"  "G1-2023"  "N1-2023"  "C2-2023"  "G2-2023"  "N2-2023"  "C3-2023"  "G3-2023"  "N3-2023"  "C4-2023"  "G4-2023"  "N4-2023"  "C5-2023"  "G5-2023"  "N5-2023"  "C6-2023"  "G6-2023" 
[18] "N6-2023"  "C7-2023"  "G7-2023"  "N7-2023"  "C8-2023"  "G8-2023"  "N8-2023"  "C9-2023"  "G9-2023"  "N9-2023"  "C10-2023" "G10-2023" "N10-2023" "C1-2022"  "G1-2022"  "N1-2022"  "C2-2022" 
[35] "G2-2022"  "N2-2022"  "C3-2022"  "G3-2022"  "N3-2022"  "C4-2022"  "G4-2022"  "N4-2022"  "C5-2022"  "G5-2022"  "N5-2022"  "C6-2022"  "G6-2022"  "N6-2022"  "C7-2022"  "G7-2022"  "N7-2022" 
[52] "C8-2022"  "G8-2022"  "N8-2022"  "C9-2022"  "G9-2022"  "N9-2022"  "C10-2022" "G10-2022" "N10-2022"

1.2. Importing Feature table (or ASV/OTU table)

otu_table_16S <- read_qza("/path/table.qza")
otu_table_16S <- as.data.frame(otu_table_16S$data) # converts to data.frame
rownames(otu_table_16S)
class(otu_table_16S) 
head(otu_table_16S)
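Before moving on, it’s worth confirming the orientation of the feature table: ASV/OTU IDs belong in the row names and sample IDs in the column names. A flipped table is a common cause of the “No same sample name found” error from earlier. A toy sketch (hypothetical IDs):

```r
# Toy feature table: ASV IDs as row names, sample IDs as column names
otu_toy <- data.frame(`C1-2023` = c(12, 0), `G1-2023` = c(5, 9),
                      check.names = FALSE,
                      row.names = c("asvhash1", "asvhash2"))

all(colnames(otu_toy) %in% c("C1-2023", "G1-2023"))  # sample IDs in columns

# If your real table is flipped (samples in rows), transpose it:
# otu_table_16S <- as.data.frame(t(otu_table_16S))
```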

1.3. Importing Taxonomy table

taxonomy_table_16s <- read_qza("/path/taxonomy.qza")
taxonomy_table_16s <- taxonomy_table_16s$data # extract the taxonomy data frame
head(rownames(taxonomy_table_16s))
rownames(taxonomy_table_16s) <- taxonomy_table_16s$"Feature.ID"
head(rownames(taxonomy_table_16s))
head(taxonomy_table_16s)

When you run the head function now, you’ll notice something interesting: all the taxonomy ranks—Kingdom, Phylum, Class, and so on—are bundled together in a single column labeled Taxon. While it’s neat to have everything in one place, it’s not quite practical for our future analyses. To make the most out of this data, we need to separate it into individual columns for each taxonomic rank. Let’s tackle that next!

# Define taxonomic ranks
tax_ranks <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")

# Process taxonomy data while keeping prefixes
taxonomy_table_16s <- taxonomy_table_16s %>%
  mutate(Taxon = str_split(Taxon, ";")) %>%  # Split into lists
  unnest_wider(Taxon, names_sep = "_") %>%  # Expand list columns into separate columns
  rename_with(~ tax_ranks, starts_with("Taxon_"))  # Rename the new columns

# Check the result
head(taxonomy_table_16s)

Troubleshooting memory issues

For some strange reason, my RStudio hit a memory cap—something that never happened in the past eight months. 🤯 I tried different fixes, but nothing worked. So instead of battling it out, I took a different approach.

Here’s the alternative code that got the job done:

library(tidyverse)

# Define taxonomic ranks
tax_ranks <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")

# Process taxonomy data while keeping prefixes
taxonomy_table_16s <- taxonomy_table_16s %>%
  mutate(Taxon = str_split(Taxon, ";")) %>%  # Split into lists
  unnest_wider(Taxon, names_sep = "_") %>%  # Expand list columns into separate columns
  rename_with(~ tax_ranks, starts_with("Taxon_"))  # Rename the new columns

# Check the result
head(taxonomy_table_16s)

taxonomy_table_16s <- as.data.frame(taxonomy_table_16s)
taxonomy_table_16s %<>% tidy_taxonomy()

Even after splitting the taxonomy into individual columns, taxonomic tables often come with hidden issues (empty slots with NA, taxa with different prefixes) that can complicate analysis. To tackle these and ensure a clean dataset, we’ll use the tidy_taxonomy function. It’s a quick and efficient fix—this command should run in just a couple of seconds.

taxonomy_table_16s %<>% tidy_taxonomy
head(taxonomy_table_16s)

At this point, you might notice something strange—some of the Feature.ID and Confidence values have prefixes like f__ or c__. I’m not exactly sure why this happens, but rather than diving into a time-consuming troubleshooting session, let’s take the simpler route and clean them up.

We can easily remove these prefixes using the following code:


taxonomy_table_16s %<>% 
  tidy_taxonomy %>%
  mutate(Feature.ID = gsub("^f__", "", Feature.ID), # Remove "f__" from Feature.ID
         Confidence = gsub("^c__", "", Confidence) # Remove "c__" from Confidence
  )
head(taxonomy_table_16s)

If everything worked as expected, running head(taxonomy_table_16s) should now show each rank in its own column, with no stray prefixes left in Feature.ID or Confidence.

The next step is to designate Feature.ID as the row names for our taxonomy table. This helps ensure that each feature is uniquely identified and aligns with downstream analyses.

We can achieve this with the following code:

rownames(taxonomy_table_16s) <- taxonomy_table_16s$Feature.ID
taxonomy_table_16s <- taxonomy_table_16s[, -1] # drop the first column (Feature.ID), which is now stored in the row names
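With Feature.ID in the row names, a useful sanity check is whether those IDs overlap the feature-table row names, since tidy_dataset keeps only the features shared by both. A toy sketch with hypothetical hash IDs:

```r
# Toy objects: taxonomy and feature table sharing only one feature ID
tax_toy <- data.frame(Kingdom = c("k__Bacteria", "k__Bacteria"),
                      row.names = c("asvhash1", "asvhash2"))
otu_toy <- data.frame(S1 = c(4, 8),
                      row.names = c("asvhash1", "asvhash3"))

intersect(rownames(tax_toy), rownames(otu_toy))  # features kept downstream
setdiff(rownames(otu_toy), rownames(tax_toy))    # features with no taxonomy
```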

1.4. Importing phylogenetic tree

phylo_tree_16S <- read_qza("path/Feature_TreeRooted.qza")
#Extract the tree data and convert it to a phylo object
phylo_tree_16S <- phylo_tree_16S$data
class(phylo_tree_16S) # Should return as "phylo"
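For intuition about what that "phylo" object looks like, here’s a hand-built stand-in (a minimal star tree with hypothetical feature IDs as tips; the real object from read_qza has the same shape). The tip labels are what must line up with your feature-table row names:

```r
# Minimal hand-built "phylo" object: 3 tips hanging off one root node
tree_toy <- structure(list(
  edge      = matrix(c(4L, 4L, 4L, 1L, 2L, 3L), ncol = 2),
  tip.label = c("asvhash1", "asvhash2", "asvhash3"),
  Nnode     = 1L
), class = "phylo")

class(tree_toy)     # "phylo"
tree_toy$tip.label  # tip labels are the ASV/OTU IDs

# On the real data, tips should match the feature table, e.g.:
# all(phylo_tree_16S$tip.label %in% rownames(otu_table_16S))
```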

1.5. Importing Environmental data (response variables such as biomass or TOC)

You can use these data to correlate with changes in the microbiome—not mandatory, but highly recommended. It’s always exciting to see how your variables align with shifts in the microbiome. Understanding these relationships can add depth to your findings and make your analysis far more insightful.

env_data_16S <- read_csv("path/env_data.csv")
colnames(env_data_16S)[colnames(env_data_16S) == "SampleID"] <- "" # blank out the SampleID column header
env_data_16S <- as.data.frame(env_data_16S)
head(env_data_16S)
class(env_data_16S)
Have you checked everything? Once you’ve completed Steps 1.1, 1.2, 1.3, and 1.5, don’t forget to test your data. This check is already included in the steps above, but consider this your friendly nudge to double-check everything before diving deeper. It’s like tuning an instrument before the big performance: small adjustments now ensure harmony later!

class(taxonomy_table_16s) # example for checking the taxonomy table
head(rownames(taxonomy_table_16s))
head(rownames(sample_info_16S))
head(rownames(otu_table_16S))

✅ Make sure the objects have the correct class after import:

  • For the objects from sections 1.1, 1.2, 1.3, and 1.5, the output should be: [1] "data.frame" and
  • For the object from section 1.4 (the phylogenetic tree), the output should be: [1] "phylo"

If everything looks good with the taxonomy table and Feature.ID has been successfully set as the row names, we’re ready to move on and create the microtable object! 🚀

However, if something doesn’t look right, don’t ignore it. It’s important to troubleshoot and resolve any issues with the tables before proceeding. Most issues arise from incorrect recognition of row names and column headers.

Using rownames(your_data_frame) and colnames(your_data_frame) is very helpful for checking and assigning the correct identifiers. You can assign row names from a specific column by using rownames(your_data_frame) <- your_data_frame$CorrectHeader.

It is important that the column name (CorrectHeader) is exactly correct, including capitalization or any special characters, to ensure the row assignments are accurate.
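A hypothetical example of the pattern described above, including what happens when the column name is misspelled:

```r
# "CorrectHeader" is a stand-in name -- use the actual ID column of your table
df <- data.frame(CorrectHeader = c("C1-2023", "G1-2023"), Value = c(1, 2))
rownames(df) <- df$CorrectHeader
rownames(df)  # "C1-2023" "G1-2023"

# Column names are case-sensitive: a typo silently returns NULL,
# and the row-name assignment would fail
df$correctheader  # NULL
```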