One-Click Report Generation
MicrobiomeStat equips researchers with automated workflows to quickly analyze the data and report the major findings by providing a one-click report-generation function. The standardized report integrates multidimensional perspectives, including alpha diversity, beta diversity, and differential abundance analysis, for comprehensive insights. The report contains both visualizations and statistical summaries to aid in biological interpretation and facilitate result dissemination. By automating time-consuming manual analytical tasks, MicrobiomeStat enables rapid, reproducible, and robust reporting to advance microbiome research.
The mStat_generate_report_single() function generates an integrated report for cross-sectional (single time point) analysis. It performs:
Alpha diversity analysis: This calls the functions mStat_calculate_alpha_diversity(), generate_alpha_boxplot_single(), and generate_alpha_test_single().
Beta diversity analysis: This calls the functions mStat_calculate_beta_diversity(), generate_beta_ordination_single(), and generate_beta_test_single().
Feature-level analysis: This calls the functions generate_taxa_barplot_single(), generate_taxa_dotplot_single(), generate_taxa_heatmap_single(), generate_taxa_test_single(), and generate_taxa_volcano_single().
The function then compiles these analyses into a comprehensive PDF report. The report includes:
A data summary from mStat_summarize_data_obj()
Visual representations including alpha diversity boxplots, beta diversity ordination plots, and taxa composition visualizations
Tables detailing statistical test results
Commentary on key findings
Before using the function, it is important to understand the parameters:
data.obj: A list object in MicrobiomeStat data format, which includes the components feature.tab (matrix), feature.ann (matrix), meta.dat (data.frame), tree, and feature.agg.list (list). This object can be converted from other formats using several functions from the MicrobiomeStat package, or constructed manually. For more detailed information on how to convert data from other formats or how to construct the data.obj manually, please refer to the following page.
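As a rough illustration of the structure described above, the following sketch assembles a minimal data.obj by hand from toy data. The feature names, sample IDs, and metadata columns are made up. It also assumes that feature.tab has features in rows and samples in columns, that the row names of feature.tab and feature.ann match, and that the row names of meta.dat match the sample names. The optional tree and feature.agg.list components are omitted.

```r
set.seed(1)

# Toy count matrix: 4 features (rows) x 6 samples (columns). All values are made up.
feature.tab <- matrix(
  rpois(24, lambda = 50),
  nrow = 4,
  dimnames = list(paste0("ASV", 1:4), paste0("S", 1:6))
)

# Feature annotation: one row per feature, one column per taxonomic rank.
# matrix() fills column by column, so the first four entries are the Phylum column.
feature.ann <- matrix(
  c("Firmicutes", "Firmicutes", "Bacteroidetes", "Proteobacteria",
    "Lachnospiraceae", "Ruminococcaceae", "Bacteroidaceae", "Enterobacteriaceae"),
  nrow = 4,
  dimnames = list(rownames(feature.tab), c("Phylum", "Family"))
)

# Sample metadata: one row per sample, row names equal to the sample IDs.
meta.dat <- data.frame(
  group = rep(c("control", "treatment"), each = 3),
  sex = rep(c("F", "M"), times = 3),
  row.names = colnames(feature.tab),
  stringsAsFactors = FALSE
)

# Assemble the list (tree and feature.agg.list are left out in this sketch).
data.obj <- list(
  feature.tab = feature.tab,
  feature.ann = feature.ann,
  meta.dat = meta.dat
)

# Basic consistency checks between the three components.
stopifnot(
  identical(rownames(data.obj$feature.tab), rownames(data.obj$feature.ann)),
  identical(colnames(data.obj$feature.tab), rownames(data.obj$meta.dat))
)
```

Checking that the feature and sample names line up across components before running any analysis is a cheap way to catch mismatched inputs early.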
Once the data object has been successfully constructed, users may wish to preprocess the data. Typical steps include sample-level and feature-level filtering, renaming and recoding variables, changing the data types of variables (e.g., character to numeric, character to factor), and rearranging the levels of a factor variable so that the reference category appears as the first level. These preprocessing steps help the analyses proceed smoothly. We refer users to the following document for data manipulation.
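The preprocessing steps listed above can be sketched with base R on a toy metadata table. The column names, values, and the 1000-read cutoff are hypothetical, and the sketch operates on a standalone data frame rather than on a full data.obj.

```r
# Toy metadata; column names and values are hypothetical.
meta.dat <- data.frame(
  group = c("drug", "placebo", "drug", "placebo", "drug"),
  age = c("34", "51", "28", "45", "39"),
  depth = c(12000, 800, 15000, 9000, 11000),
  row.names = paste0("S", 1:5),
  stringsAsFactors = FALSE
)

# Change data types: character to numeric, character to factor.
meta.dat$age <- as.numeric(meta.dat$age)
meta.dat$group <- factor(meta.dat$group)

# Rearrange the factor levels so the reference category ("placebo") comes first.
meta.dat$group <- relevel(meta.dat$group, ref = "placebo")

# Rename a variable.
names(meta.dat)[names(meta.dat) == "depth"] <- "seq.depth"

# Sample-level filtering: keep samples with at least 1000 reads.
meta.dat <- meta.dat[meta.dat$seq.depth >= 1000, , drop = FALSE]

levels(meta.dat$group)
rownames(meta.dat)
```

When the metadata belongs to a data.obj, any sample removed here also has to be removed from the columns of feature.tab so that the components stay aligned.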
group.var: The name of the variable of primary interest. Categorical variables are currently supported; support for numeric variables is planned for future versions. For exploratory purposes, a continuous variable can be dichotomized.
test.adj.vars: Names of columns in the metadata containing covariates to be adjusted for in statistical tests and models. Default is NULL, which indicates that no covariates are adjusted for in statistical testing.
vis.adj.vars: For the alpha and beta diversity visualization functions, the vis.adj.vars parameter specifies the covariates whose effects will be removed before visualization. This is achieved by taking the residuals after regressing the alpha diversity measures and the beta diversity PCs on the variables specified by vis.adj.vars. This step is important for revealing the signal of interest when the unwanted variation associated with vis.adj.vars dominates or obscures it. A similar adjustment is equally important for feature-level data, but because of the complexity of the data characteristics (zero inflation, etc.), such linear model-based adjustment may not be sufficient.
strata.var: Variable used to stratify the data in visualization.
subject.var: Variable name used for subject identification.
time.var: Variable name used for time points.
t.level: Character string specifying the time level/value to subset the data to. Subsetting is applied only if both time.var and t.level are specified. If NULL, all data will be used.
alpha.obj: A matrix containing pre-calculated alpha diversity measures (rows: samples, columns: measures). If NULL (default), alpha diversity measures will be calculated using the mStat_calculate_alpha_diversity function after data rarefaction by mStat_rarefy_data. The rarefaction depth can be specified with the depth parameter; the minimum depth will be used if depth = NULL.
alpha.name: The alpha diversity measures to be used. Supported measures include "shannon", "simpson", "observed_species", "chao1", "ace", and "pielou". If this parameter is set to NULL, the report will not include any alpha diversity results.
depth: An integer. Rarefaction depth when rarefaction is needed. If NULL, the minimum sequencing depth will be used.
dist.name: A character vector specifying which beta diversity measures to calculate. Supported measures are "BC" (Bray-Curtis), "Jaccard", "UniFrac" (unweighted UniFrac), "GUniFrac" (generalized UniFrac), "WUniFrac" (weighted UniFrac), and "JS" (Jensen-Shannon divergence). If this parameter is set to NULL, the report will not include any beta diversity results.
dist.obj: A list of distance matrices between samples. If NULL, beta diversity distance matrices will be automatically computed from data.obj using mStat_calculate_beta_diversity after data rarefaction.
pc.obj: A list containing the principal coordinates calculated on the beta diversity distance matrices. If NULL (default), dimension reduction will be automatically performed using metric multidimensional scaling (MDS) via mStat_calculate_PC. The pc.obj list structure should contain:
$points: A matrix with samples as rows and PCs as columns containing the coordinates.
$eig: Eigenvalues for each PC dimension.
$vectors: Feature loading vectors for each PC.
Other metadata such as $method, $dist.name, etc. See the mStat_calculate_PC function for details.
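The covariate adjustment described for vis.adj.vars can be sketched in base R as follows. Simulated Shannon values are regressed on a hypothetical covariate (sex), and the residuals are kept for plotting. The variable names and effect sizes are invented for illustration, and the package's internal implementation may differ in detail, for example in whether the overall mean is added back to the residuals.

```r
set.seed(7)
n <- 40

# Toy metadata: balanced design, 10 F and 10 M in each group.
meta.dat <- data.frame(
  group = factor(rep(c("control", "treatment"), each = n / 2)),
  sex = factor(rep(c("F", "M"), times = n / 2))
)

# Simulated alpha diversity: a strong sex effect (unwanted variation)
# and a modest group effect (the signal of interest).
shannon <- 3 +
  1.5 * (meta.dat$sex == "M") +
  0.4 * (meta.dat$group == "treatment") +
  rnorm(n, sd = 0.3)

# Regress the measure on the covariate to be removed and keep the residuals.
fit <- lm(shannon ~ sex, data = meta.dat)

# Adding the overall mean back keeps the values on the original scale.
shannon.adj <- residuals(fit) + mean(shannon)

# The spread attributable to sex is gone, so the group difference is easier to see.
c(raw = sd(shannon), adjusted = sd(shannon.adj))
tapply(shannon.adj, meta.dat$group, mean)
```

The same idea applies to beta diversity: each PC axis would be regressed on the covariates in turn and replaced by its residuals before plotting.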
feature.dat.type: The type of the feature data, which determines how the data is handled in downstream analyses. Should be one of: "count": Raw count data from a sequencing experiment (e.g., ASV/OTU counts); "proportion": Data that has already been normalized to proportions/percentages (e.g., functional data); "other": Other non-compositional data types, where the data will be analyzed directly without normalization or transformation. With "other", the user is responsible for the data-specific QC, normalization, and transformation. Users who want to normalize or transform the abundance data in their own way can also use this option.
feature.analysis.rarafy: Logical, indicating whether to rarefy the data for feature-level analysis. If TRUE, the feature data will be rarefied before visualization and analysis. Default is TRUE. Note: When the majority of the features are of low abundance, their presence/absence strongly depends on the sequencing depth. Rarefaction can be used to remove the unwanted variation due to sequencing depth and could increase the power of the analysis of rare features.
vis.feature.level: The feature levels to be visualized for an overview of the data (stacked barplot, heatmap, etc.). Feature levels should correspond to the column names in the feature annotation matrix (feature.ann) of data.obj. It can also contain the "original" level, which is the raw feature level without aggregation.
bar.area.feature.no: A numeric value indicating the number of top abundant features to retain in both the barplot and the areaplot. Features with average relative abundance ranked below this number will be grouped into 'Other'. Default is 20. Only applicable to count and proportion data.
heatmap.feature.no: A numeric value indicating the number of top abundant features to retain in the heatmap. Features with average relative abundance ranked below this number will be grouped into 'Other'. Default is 20.
dotplot.feature.no: A numeric value indicating the number of top abundant features to retain in the dotplot. Features with average relative abundance ranked below this number will be grouped into 'Other'. Default is 40. Only applicable to count and proportion data.
test.feature.level: The feature levels to be tested. Specified in the same way as vis.feature.level. The significant features will be visualized collectively and individually.
feature.mt.method: Character, the multiple testing method used to identify differential features, "fdr" or "none". Default is "fdr".
feature.sig.level: Numeric, the significance cutoff for declaring differential features. Default is 0.1.
feature.box.axis.transform: A string indicating the transformation to be applied to abundance data before plotting. This parameter is only used in generate_taxa_boxplot_single and generate_taxa_indiv_boxplot_single. Options are:
"identity": No transformation (default),
"sqrt": Square root transformation,
"log": Logarithmic transformation.
base.size: Base font size for the generated plots.
theme.choice: Plot theme choice. Can be one of: "prism": ggprism::theme_prism(), "classic": theme_classic(), "gray": theme_gray(), "bw": theme_bw().
output.file: A character string specifying the output file name for the report. A full path can be specified using, for example, "path_to_your_location/report.pdf".
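To illustrate how feature.mt.method and feature.sig.level interact, the sketch below adjusts a set of made-up p-values with stats::p.adjust() and flags the features that fall below the default cutoff of 0.1. It assumes that "fdr" corresponds to the Benjamini-Hochberg procedure, as it does in p.adjust(); the package's internal handling may differ.

```r
# Made-up raw p-values for five hypothetical features.
p.values <- c(TaxonA = 0.001, TaxonB = 0.012, TaxonC = 0.030,
              TaxonD = 0.200, TaxonE = 0.650)

feature.mt.method <- "fdr"  # "fdr" or "none"
feature.sig.level <- 0.1    # default cutoff described above

# Apply the Benjamini-Hochberg adjustment, or leave the p-values as they are.
adj.p <- if (feature.mt.method == "fdr") {
  p.adjust(p.values, method = "fdr")
} else {
  p.values
}

# Features declared differential at the chosen cutoff.
sig.features <- names(adj.p)[adj.p < feature.sig.level]
sig.features
# "TaxonA" "TaxonB" "TaxonC"
```

With feature.mt.method set to "none", the raw p-values would be compared directly against the cutoff, which gives the same three features here but offers no control over false discoveries when many features are tested.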
Note: Before running the function, please be aware of potential compatibility issues between RStudio and LaTeX. These issues can lead to problems such as images from the RStudio Viewer appearing in unexpected locations in the PDF report. To avoid this, it is recommended to clear the current images in the RStudio Viewer before running the function. You can do this by clicking on the broom icon in the RStudio Viewer.
Now, let's see how we can implement the function:
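A sketch of a typical call is shown below. It assumes that the MicrobiomeStat package is installed and uses its bundled peerj32.obj example dataset. The metadata column names ("group", "sex", "subject", "time"), the time level "1", and the taxonomic ranks are assumptions about that dataset and should be replaced with the columns of your own data. The arguments are collected in a list first so they can be inspected, and the report is built only when run.report is set to TRUE, since rendering the PDF requires a working LaTeX installation and can take a while.

```r
run.report <- FALSE  # set to TRUE to actually build the PDF

# Arguments for mStat_generate_report_single(); values are illustrative.
report.args <- list(
  group.var = "group",
  test.adj.vars = "sex",
  vis.adj.vars = "sex",
  strata.var = "sex",
  subject.var = "subject",
  time.var = "time",
  t.level = "1",
  alpha.obj = NULL,            # computed internally when NULL
  alpha.name = c("shannon", "observed_species"),
  depth = NULL,                # minimum sequencing depth is used when NULL
  dist.obj = NULL,             # computed internally when NULL
  dist.name = c("BC", "Jaccard"),
  pc.obj = NULL,               # computed internally when NULL
  feature.dat.type = "count",
  vis.feature.level = c("Phylum", "Family", "Genus"),
  test.feature.level = "Family",
  feature.mt.method = "fdr",
  feature.sig.level = 0.1,
  feature.box.axis.transform = "sqrt",
  base.size = 20,
  theme.choice = "bw",
  output.file = "mStat_report_single.pdf"
)

if (run.report && requireNamespace("MicrobiomeStat", quietly = TRUE)) {
  library(MicrobiomeStat)
  data("peerj32.obj", package = "MicrobiomeStat")
  # c() on lists keeps the NULL entries, so they are passed through explicitly.
  do.call(
    mStat_generate_report_single,
    c(list(data.obj = peerj32.obj), report.args)
  )
}
```

Keeping the arguments in a named list also makes it easy to save them alongside the report, so the exact settings behind a given PDF can be recovered later.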
The automated report reduces the need for manual interaction and ensures consistency.