One-Click Reports Generation
Automate integrated analysis reporting for longitudinal microbiome studies with MicrobiomeStat.
MicrobiomeStat provides an efficient solution by generating integrated reports for cross-sectional studies. These reports include:
Visualizations to illustrate key findings
Statistical summaries that highlight significant results
The mStat_generate_report_long()
function generates an integrated reports. It performs:
Alpha diversity: Functions used include
generate_alpha_boxplot_long()
,generate_alpha_spaghettiplot_long()
,generate_alpha_trend_test_long()
, andgenerate_alpha_volatility_test_long()
.Beta diversity: Functions used include
generate_beta_ordination_long()
,generate_beta_pc_boxplot_long()
,generate_beta_change_spaghettiplot_long()
,generate_beta_trend_test_long()
, andgenerate_beta_volatility_test_long()
.Feature-level: Functions used include
generate_taxa_areaplot_long()
,generate_taxa_heatmap_long()
,generate_taxa_trend_test_long()
,generate_taxa_volatility_test_long()
,generate_taxa_boxplot_long()
,generate_taxa_spaghettiplot_long()
,generate_taxa_trend_volcano_long()
,generate_taxa_volatility_volcano_long()
,generate_taxa_barplot_long()
,generate_taxa_change_heatmap_long()
,generate_taxa_indiv_spaghettiplot_long()
, andgenerate_taxa_indiv_boxplot_long()
.
It then compiles the results into a publication-ready PDF report containing:
The function then compiles these analyses into a comprehensive PDF report. The report includes:
A data summary from
mStat_summarize_data_obj()
Visual representations like alpha diversity boxplots, beta diversity ordination plots, and taxa composition visualizations
Tables detailing statistical test results
Commentary on key findings
Before using the function, it's important to understand the parameters:
data.obj
: A list object in a specific format to MicrobiomeStat, which can include components such as feature.tab (matrix), feature.ann (matrix), meta.dat (data.frame), tree, and feature.agg.list (list). This object can be converted from other formats using several functions from the MicrobiomeStat package, or constructed manually. For more detailed information on how to convert data from other formats or how to construct thedata.obj
manually, please refer to the following page.
Once the data has been successfully constructed, users may wish to perform minor adjustments to the data, such as removing levels with few subjects from the group.var. In such cases, you can refer to this document for guidance.
group.var
: The name of the variable of primary interest.test.adj.vars
: Names of columns in the metadata containing covariates to be adjusted for in statistical tests and models. Default is NULL, which indicates no covariates are adjusted for in statistical testing.vis.adj.vars
: For alpha and beta diversity visualization functions, thevis.adj.vars
parameter designates the column names in the metadata that correspond to covariates. When covariates are specified,vis.adj.vars
will adjust thealpha.obj
anddist.obj
accordingly, resulting in an alpha diversity index and a distance matrix that have been modified to account for the additional covariates.strata.var
: Variable to stratify the analysis by (optional).subject.var
: Variable name used for subject identification.time.var
: This parameter represents the time variable in your data. It can take three forms:Numeric: When
time.var
is numeric, you don't need to sett0.level
andts.levels
, as they will be automatically arranged in ascending order.Factor: If
time.var
is a factor, and ift0.level
andts.levels
are not set, they will be automatically arranged according to the levels of the factor. However, if you have specific needs for the order of the levels, you can manually adjustt0.level
andts.levels
.Character: If
time.var
is character, it's recommended to manually sett0.level
andts.levels
, as the automatic arrangement might not reflect the correct order.
t0.level
: This parameter represents the first time point in your data.ts.levels
: This parameter represents the other time points in your data, arranged in the order you desire, excludingt0.level
.alpha.obj
: An optional list containing pre-calculated alpha diversity indices. If NULL (default), alpha diversity indices will be calculated usingmStat_calculate_alpha_diversity
function.alpha.name
: The alpha diversity index to be plotted. Supported indices include "shannon", "simpson", "observed_species", "chao1", "ace", and "pielou". If you are not interested in alpha diversity for datasets other than microbiome data, you can set this parameter to NULL. In this case, the report will not include any alpha diversity results, only feature-level analysis will be presented.depth
: An integer. The sequencing depth to be used for the "Rarefy" and "Rarefy-TSS" methods. If NULL, the smallest total count across samples is used as the rarefaction depth.dist.obj
: Distance matrix between samples, usually calculated usingmStat_calculate_beta_diversity
function. If NULL, beta diversity will be automatically computed fromdata.obj
usingmStat_calculate_beta_diversity
.dist.name
: A character vector specifying which beta diversity indices to calculate. Supported indices are "BC" (Bray-Curtis), "Jaccard", "UniFrac" (unweighted UniFrac), "GUniFrac" (generalized UniFrac), "WUniFrac" (weighted UniFrac), and "JS" (Jensen-Shannon divergence). Similar toalpha.name
, if you do not care about beta diversity for non-microbiome data, you can set this parameter to NULL. The report will then exclude beta diversity results and only provide feature-level analysis.pc.obj
: A list containing the results of dimension reduction/Principal Component Analysis. This should be the output from functions likemStat_calculate_PC
, containing the PC coordinates and other metadata. If NULL (default), dimension reduction will be automatically performed using metric multidimensional scaling (MDS) viamStat_calculate_PC
. Thepc.obj
list structure should contain:$points
: A matrix with samples as rows and PCs as columns containing the coordinates.$eig
: Eigenvalues for each PC dimension.$vectors
: Loadings vectors for features onto each PC.Other metadata like
$method
,$dist.name
, etc. SeemStat_calculate_PC
function for details on output format.
vis.feature.level
: The column name in the feature annotation matrix (feature.ann) of data.obj to use for visualization and plotting. This can be a taxonomic level like "Phylum" or "Genus" for microbiome data. For KEGG data, this could be a pathway level such as "Pathway_L1", "Pathway_L2", or "Pathway_L3". If you want to avoid aggregation, you can set it to "original", and no aggregation will be performed. The selected feature level will be used to aggregate the data at the specified level in the generated visualizations.test.feature.level
: The column name in the feature annotation matrix (feature.ann) of data.obj to use for testing or analytical purposes. This can be a taxonomic level like "Phylum" or "Genus" for microbiome data. For single-cell data, this could be a cell type identifier such as "CellType". For KEGG data, this could be a pathway level such as "Pathway_L1", "Pathway_L2", or "Pathway_L3". If you want to avoid aggregation, you can set it to "original", and no aggregation will be performed. The selected feature level will be used to aggregate the data at the specified level for statistical tests and models.feature.dat.type
: The type of the feature data, which determines how the data is handled in downstream analyses. Should be one of: "count": Raw count data, will be normalized innerally. "proportion": Data that has already been normalized to proportions/percentages. "other": Other data types such as transcriptomic and metabolomic data, where the user will decide the QC, normalization, and transformation. If the user wants to normalize/transform the abundance data on his own way, he can also use this option.feature.change.func
: A function or character string specifying how to calculate the change from baseline value. This allows flexible options:If a function is provided, it will be applied to each row to calculate change. The function should take 2 arguments: value at timepoint t and value at baseline t0.
If a character string is provided, following options are supported:
'relative change': (value_t - value_t0) / (value_t + value_t0)
'absolute change': value_t - value_t0
'log fold change': log2(value_t + 1e-5) - log2(value_t0 + 1e-5)
Default is 'relative change'.
If none of the above options are matched, an error will be thrown indicating the acceptable options or prompting the user to provide a custom function.
feature.box.axis.transform
: A string indicating the transformation to apply to the data before plotting. Options are:"identity": No transformation (default),
"sqrt": Square root transformation,
"log": Logarithmic transformation.
feature.analysis.rarafy
: Logical, indicating whether to rarefy the data at the feature-level for analysis. If TRUE, the feature data will be rarefied before analysis. Default is TRUE.bar.area.feature.no
: A numeric value indicating the number of top abundant features to retain in both barplot and areaplot. Features with average relative abundance ranked below this number will be grouped into 'Other'. Default 20.heatmap.feature.no
: A numeric value indicating the number of top abundant features to retain in the heatmap. Features with average relative abundance ranked below this number will be grouped into 'Other'. Default 20.dotplot.feature.no
: A numeric value indicating the number of top abundant features to retain in the dotplot. Features with average relative abundance ranked below this number will be grouped into 'Other'. Default 40.feature.mt.method
: Character, multiple testing method for features, "fdr" or "none", default is "fdr".feature.sig.level
: Numeric, significance level cutoff for highlighting features, default is 0.1.base.size
: Base font size for the generated plots.theme.choice
: Plot theme choice. Can be one of: "prism": ggprism::theme_prism(), "classic": theme_classic(), "gray": theme_gray(), "bw": theme_bw().output.file
: A character string specifying the output file name for the report. This will also determine the title of the generated report. For example, if you set it to "path_to_your_location/report.pdf", the title of the report will be "report".
Note: Before running the function, please be aware of potential compatibility issues between RStudio and LaTeX. These issues can lead to problems such as images from the RStudio Viewer appearing in unexpected locations in the PDF report. To avoid this, it's recommended to clear the current images in the RStudio Viewer before running the function. You can do this by clicking on the broom icon in the RStudio Viewer.
Now, let's see how we can implement the function:
The automated report reduces the need for manual analysis and ensures consistency. By integrating results from different analytical dimensions, it provides a comprehensive view of the data.
MicrobiomeStat equips researchers with automated workflows to quickly synthesize and report findings from cross-sectional studies. The standardized reports integrate multidimensional perspectives, including alpha diversity, beta diversity, and differential abundance analysis, for comprehensive insights.
The reports contain clear visualizations and statistical summaries to aid in biological interpretation and facilitate result dissemination. By automating time-consuming manual analytical tasks, MicrobiomeStat enables rapid, reproducible, and robust reporting to advance microbiome research.
Last updated