3. Data filtering

There may still be some artifacts you want to filter out. This is also the point where you can drop entire treatments or groups if you wish.

3.1 Filter Bacteria

I am going to filter out every domain except Bacteria. My sequencing protocol specifically amplified the V3-V4 region of Bacteria, so anything assigned to another domain is removed.

# %<>% is the magrittr assignment pipe; load magrittr first if it is not already attached.
mt$tax_table %<>% base::subset(Kingdom == "k__Bacteria")
mt
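Before filtering, it can help to see how many features fall into each kingdom, so you know what is about to be dropped. The following is a minimal base R sketch on a made-up table standing in for mt$tax_table; the taxa, the IDs and the "k__" prefix are assumptions for illustration, and the prefix in your own data depends on the database you used.

```r
# Toy stand-in for mt$tax_table; the taxa and IDs are made up for illustration.
tax <- data.frame(
  Kingdom = c("k__Bacteria", "k__Archaea", "k__Bacteria", "k__Eukaryota"),
  Phylum  = c("p__Proteobacteria", "p__Euryarchaeota", "p__Firmicutes", "p__"),
  row.names = c("ASV1", "ASV2", "ASV3", "ASV4"),
  stringsAsFactors = FALSE
)

# Count features per kingdom to see what the filter will remove.
kingdom_counts <- table(tax$Kingdom)
print(kingdom_counts)

# Keep only the rows assigned to Bacteria (same condition as the filter above).
tax_bacteria <- subset(tax, Kingdom == "k__Bacteria")
print(rownames(tax_bacteria))
```

On the real object, the equivalent check would be table(mt$tax_table$Kingdom), run before the filter.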

3.2 Remove “mitochondria” or “chloroplast”

Depending on the reference database, some sequences may have been assigned to “mitochondria” or “chloroplast” at one of the taxonomic levels. We remove all OTUs with these assignments.

# filter_pollution() removes every tax_table row that contains one of these words, at any taxonomic rank and ignoring case.
# To remove taxa that are not contaminants, filter the tax_table with subset() as in the previous step instead.
mt$filter_pollution(taxa = c("mitochondria", "chloroplast"))
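To make the "any rank, any case" behaviour concrete, here is a rough base R sketch of that kind of matching on a made-up taxonomy table. It approximates the behaviour described in the comments above and is not the package's actual implementation; the table and its contents are invented for illustration.

```r
# Toy taxonomy table; the contaminant words appear at different ranks and in different cases.
tax <- data.frame(
  Kingdom = c("k__Bacteria", "k__Bacteria", "k__Bacteria"),
  Order   = c("o__Rickettsiales", "o__Chloroplast", "o__Bacillales"),
  Family  = c("f__Mitochondria", "f__", "f__Bacillaceae"),
  row.names = c("ASV1", "ASV2", "ASV3"),
  stringsAsFactors = FALSE
)

pattern <- "mitochondria|chloroplast"

# For each row, check whether any column matches the pattern, ignoring case.
is_pollution <- apply(tax, 1, function(row) any(grepl(pattern, row, ignore.case = TRUE)))

# Keep only the rows without a match.
tax_clean <- tax[!is_pollution, , drop = FALSE]
print(rownames(tax_clean))
```

Because the match is a plain word search across all columns, a word that is too short or too general could remove more than intended, which is why subset() is the safer tool for targeted filtering.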

3.3 Trim and tidy up

To make the ASV and sample information consistent across all files in the object, we use the tidy_dataset() function to trim the data.

mt$tidy_dataset()
mt

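If the data is consistent, it should return something like the output below. Pay attention to the numbers: the otu_table and tax_table should have the same number of rows, which should match the number of tips in the phylo_tree, and the number of otu_table columns should match the number of sample_table rows.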

microtable-class object:
sample_table have 72 rows and 4 columns
otu_table have 20790 rows and 72 columns
tax_table have 20790 rows and 7 columns
phylo_tree have 20790 tips
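If you would rather test this than read it off the printout, a small check along these lines may help. This is a sketch on made-up tables; tables_are_consistent() is a hypothetical helper written for this example, not a microeco function, and it assumes features are row names of the otu_table and tax_table and samples are row names of the sample_table.

```r
# Hypothetical helper (not part of microeco): TRUE when features and samples agree across tables.
tables_are_consistent <- function(sample_table, otu_table, tax_table) {
  same_features <- setequal(rownames(otu_table), rownames(tax_table))
  same_samples  <- setequal(colnames(otu_table), rownames(sample_table))
  same_features && same_samples
}

# Toy tables with made-up values.
sample_table <- data.frame(Phase = c("Drought", "Recovery"), row.names = c("S1", "S2"))
otu_table <- data.frame(S1 = c(10, 0, 3), S2 = c(4, 7, 0),
                        row.names = c("ASV1", "ASV2", "ASV3"))
tax_table <- data.frame(Kingdom = rep("k__Bacteria", 3),
                        row.names = c("ASV1", "ASV2", "ASV3"))

ok_full <- tables_are_consistent(sample_table, otu_table, tax_table)

# Dropping a taxonomy row without trimming the otu_table breaks consistency.
ok_broken <- tables_are_consistent(sample_table, otu_table, tax_table[-1, , drop = FALSE])

print(c(full = ok_full, broken = ok_broken))
```

The second call shows the situation right after a tax_table filter and before tidying: the taxonomy has lost a feature that the abundance table still contains.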

Why the name “mt”? Well, it’s short for “micro table” – simple, right? Naming your objects thoughtfully isn’t just for show; it’s a lifesaver when you’re juggling multiple packages and analyses. Imagine having a jungle of objects like table1, data2, or final_final_v3. Good luck remembering what’s what! 🙃

Plus, not all functions work on all objects, so having a clear, intuitive naming convention helps ensure that everything works smoothly and you avoid unnecessary headaches. Keep it clean, keep it simple!

By giving your tables, objects, or variables meaningful names, you create a mental map of your workflow. For example, when you’re working with microbiome data, naming your table mt instantly tells you, “Hey, this is my micro table!” Pair this with clear package-specific naming, and you’re golden. If you’re using the phyloseq package and have rarefied your data, you might call your object physeq_rarefied. This tells you exactly what the object is.

3.4 Filter the samples

Now is the time to remove any samples that should not be in the table. For example, I have two sets of data collected at two time points (Drought phase and Recovery phase). I am going to analyse them separately, so for each analysis I filter out the phase I do not need. Before doing any of that, make sure you clone the full table and work on the clone instead of the original, and only then do the filtering.

Drought_mt <- clone(mt) # clone
Drought_mt$sample_table <- subset(Drought_mt$sample_table, Phase == "Drought") # filter by "Drought" in Phase column
Drought_mt$tidy_dataset() # to make sure consistency is established among files
Drought_mt
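Why clone first? A microtable is an R6 object, and R6 objects are built on environments, which R passes by reference: a plain assignment creates a second name for the same object, not a copy. The sketch below illustrates this with a bare environment standing in for the microtable; the object and its contents are made up for illustration.

```r
# An environment stands in for an R6 object such as a microtable.
original <- new.env()
original$samples <- c("S1", "S2", "S3", "S4")

# Plain assignment: both names point at the same underlying object.
alias <- original
alias$samples <- alias$samples[1:2]
n_after_alias <- length(original$samples)  # the "original" lost samples too

# Restore the original, then make a real copy (this is what cloning achieves).
original$samples <- c("S1", "S2", "S3", "S4")
copy <- list2env(as.list(original), envir = new.env())
copy$samples <- copy$samples[1:2]

print(n_after_alias)
print(length(original$samples))
print(length(copy$samples))
```

With the alias, filtering silently shrinks the full table as well; with the copy, the full table stays intact and can be reused for the Recovery phase.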

So far we have only prepared our data; the fun begins from here. However, if tidy_dataset() or tidy_taxonomy() failed, there is no point in going any further. Go back and sort out the issues first.

If you are successful so far, well done! Let's go to the next file, where we start the actual microbiome analysis.