Alpha Diversity Analysis
This section introduces alpha diversity analysis in longitudinal studies, focusing on how within-sample diversity changes over time. The first step is to filter out samples with low sequencing depth, as alpha diversity analysis is sensitive to this factor.
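The depth filter can be illustrated with a small, library-independent Python sketch. The count table, the `filter_by_depth` helper, and the threshold of 1000 reads are all hypothetical and are not part of MicrobiomeStat.

```python
# Hypothetical count table: sample ID -> {taxon: read count}.
counts = {
    "S1": {"taxA": 500, "taxB": 300, "taxC": 200},
    "S2": {"taxA": 20, "taxB": 5},
    "S3": {"taxA": 900, "taxB": 50, "taxC": 50, "taxD": 100},
}


def filter_by_depth(table, min_depth):
    """Keep only samples whose total read count is at least min_depth."""
    return {
        sample: taxa
        for sample, taxa in table.items()
        if sum(taxa.values()) >= min_depth
    }


# The threshold is illustrative; choose it from your own depth distribution.
kept = filter_by_depth(counts, min_depth=1000)
print(sorted(kept))
```

Here the shallow sample S2 (25 reads) is dropped, while S1 and S3 are retained.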
MicrobiomeStat supports six alpha diversity indices: "shannon", "simpson", "observed_species", "chao1", "ace", and "pielou". Each index captures a different aspect of the species richness and evenness of a microbiome sample.
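To make the richness/evenness distinction concrete, here is a minimal Python sketch of four of these indices using their textbook definitions. It is not MicrobiomeStat's code: the function names are hypothetical, the Simpson index is assumed to be the 1 - sum(p^2) form, and Chao1 and ACE are omitted for brevity.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Simpson index, assumed here to be the 1 - sum(p^2) variant."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def observed_species(counts):
    """Richness: number of taxa with at least one read."""
    return sum(1 for c in counts if c > 0)


def pielou(counts):
    """Pielou's evenness J = H / ln(S); undefined when fewer than 2 taxa."""
    s = observed_species(counts)
    return shannon(counts) / math.log(s) if s > 1 else float("nan")


# Hypothetical sample: four equally abundant taxa and one absent taxon.
sample = [25, 25, 25, 25, 0]
print(shannon(sample), simpson(sample), observed_species(sample), pielou(sample))
```

For a perfectly even sample such as this one, Shannon equals ln(4), Simpson equals 0.75, and Pielou's evenness equals 1.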
Every function that performs alpha diversity analysis takes an alpha.obj parameter, which is the list returned by mStat_calculate_alpha_diversity. If alpha.obj is NULL, mStat_calculate_alpha_diversity is called automatically. To speed up computation, we recommend calling mStat_calculate_alpha_diversity once and storing the alpha diversity indices in alpha.obj, which can then be reused in later calls.
Note that mStat_calculate_alpha_diversity rarefies the data by default. Rarefaction equalizes the sequencing depth across samples, which makes their diversity estimates comparable. You can set the desired rarefaction depth in mStat_calculate_alpha_diversity; if it is not specified, the minimum sequencing depth is used. If you want to work with non-rarefied data, you can pre-calculate your own alpha diversity indices and pass them to the alpha.obj parameter of the alpha diversity analysis functions.
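As an illustration of what rarefying to the minimum depth means, the following Python sketch subsamples reads without replacement. It is a simplified stand-in under stated assumptions, not MicrobiomeStat's implementation: the `rarefy` helper, the fixed seed, and the toy table are hypothetical.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from a {taxon: count} dict."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError("depth exceeds the sample's sequencing depth")
    # Expand the counts into one entry per read, then draw without replacement.
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    drawn = random.Random(seed).sample(reads, depth)
    rarefied = {}
    for taxon in drawn:
        rarefied[taxon] = rarefied.get(taxon, 0) + 1
    return rarefied


# Hypothetical table: sample ID -> {taxon: read count}.
table = {
    "S1": {"taxA": 60, "taxB": 30, "taxC": 10},
    "S2": {"taxA": 15, "taxB": 25},
}

# With no depth given, fall back to the smallest sequencing depth in the table.
min_depth = min(sum(taxa.values()) for taxa in table.values())
rarefied_table = {s: rarefy(taxa, min_depth) for s, taxa in table.items()}
print(min_depth, {s: sum(t.values()) for s, t in rarefied_table.items()})
```

After rarefaction every sample has the same total count, so richness-type indices are no longer driven by differences in depth.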
To account for potential confounding factors that may influence the relationship between alpha diversity and time, the trend test function introduced below accepts additional variables (adj.vars) in the model. These variables are adjusted for in the linear mixed-effects model, so that the effect of time on alpha diversity is estimated more accurately.
After the initial preparation, we can use MicrobiomeStat's functions to test for differences in alpha diversity across time points. One such function is generate_alpha_trend_test_long, which fits a linear mixed-effects model to longitudinal alpha diversity data. It tests whether alpha diversity changes significantly over time while accounting for individual variability and other potential confounding factors. The mixed-effects model captures both within-subject and between-subject variability, which makes it suitable for unbalanced data and repeated measures. The trend test checks whether the association between alpha diversity and time is statistically significant, providing insight into the temporal dynamics of microbiome diversity.
The time.var argument of generate_alpha_trend_test_long must be numeric. If you provide a string, it is automatically converted to a factor and then to numeric, which yields the factor's level codes rather than the original values and might cause issues. Please make sure the time variable is numeric before calling the function.
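The following Python sketch mimics why a string-to-factor-to-numeric conversion can go wrong. It assumes the factor levels are the lexicographically sorted unique strings; the `factor_codes` helper and the visit values are hypothetical.

```python
def factor_codes(values):
    """Mimic string -> factor -> numeric: each value becomes the 1-based
    index of its level, where levels are the sorted unique strings."""
    levels = sorted(set(values))
    return [levels.index(v) + 1 for v in values]


# Hypothetical visit numbers stored as strings.
visits = ["1", "2", "10"]

print(factor_codes(visits))        # level codes: [1, 3, 2]
print([float(v) for v in visits])  # explicit conversion: [1.0, 2.0, 10.0]
```

Because "10" sorts before "2" as a string, visit 10 ends up coded as 2 and visit 2 as 3, so both the spacing and the order of the time points are distorted. Converting the column to numeric yourself avoids this.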
Shannon Diversity

| Term | Estimate | Std.Error | Statistic | P.Value |
| --- | --- | --- | --- | --- |
| (Intercept) | 2.59 | 0.204 | 12.7 | 5.13e-27 |
| subject_racecaucasian | 0.214 | 0.231 | 0.928 | 3.55e-1 |
| subject_racehispanic_or_latino | 0.197 | 0.441 | 0.446 | 6.56e-1 |
| visit_number_num | -0.00852 | 0.0547 | -0.156 | 8.76e-1 |
| subject_racecaucasian:visit_number_num | -0.0105 | 0.0617 | -0.170 | 8.65e-1 |
| subject_racehispanic_or_latino:visit_number_num | 0.0515 | 0.116 | 0.443 | 6.58e-1 |
| subject_race:visit_number_num | NA | NA | 0.175 | 8.40e-1 |
Observed Species Diversity

| Term | Estimate | Std.Error | Statistic | P.Value |
| --- | --- | --- | --- | --- |
| (Intercept) | 118 | 16.1 | 7.36 | 3.22e-12 |
| subject_racecaucasian | 23.2 | 18.2 | 1.28 | 2.02e-1 |
| subject_racehispanic_or_latino | 20.1 | 34.6 | 0.580 | 5.62e-1 |
| visit_number_num | -0.152 | 4.35 | -0.0349 | 9.72e-1 |
| subject_racecaucasian:visit_number_num | -0.792 | 4.90 | -0.162 | 8.72e-1 |
| subject_racehispanic_or_latino:visit_number_num | 2.71 | 9.28 | 0.292 | 7.70e-1 |
| subject_race:visit_number_num | NA | NA | 0.0910 | 9.13e-1 |
In the trend test, our primary focus is on the interaction term between group.var and time.var. When group.var has more than two levels, an ANOVA is also performed, and its result is reported in the last row of the table.
Another useful function is generate_alpha_volatility_test_long. This function calculates the volatility of alpha diversity measures in longitudinal data and tests the association between the volatility and a group variable. Volatility is calculated as the mean of absolute differences between consecutive alpha diversity measures, normalized by the time difference.
In mathematical terms, volatility (V) can be represented as follows. First, let's define a few terms:
"alpha_i" is the alpha diversity measure at time "i".
"delta_t_i" is the time difference between time "i" and "i+1".
With these terms, the volatility for a subject is calculated as:
V = (1 / (N - 1)) * sum(|alpha_(i+1) - alpha_i| / delta_t_i)
Here:
"N" is the total number of time points for the subject.
The summation "sum" is over all time points "i" for which "alpha_(i+1)" is defined, that is, i = 1, ..., N - 1.
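The volatility definition (mean absolute change between consecutive measures, divided by the elapsed time) can be sketched in Python as follows. The `volatility` helper and the example values are hypothetical, and the sketch assumes each subject has distinct time points; it is not MicrobiomeStat's code.

```python
def volatility(times, alphas):
    """Mean of |alpha_(i+1) - alpha_i| / delta_t_i over consecutive time points.

    Assumes the time points of a subject are distinct (delta_t_i > 0).
    """
    if len(times) != len(alphas):
        raise ValueError("times and alphas must have the same length")
    if len(times) < 2:
        return float("nan")  # no consecutive pair, volatility is undefined
    # Sort by time so that consecutive pairs are adjacent visits.
    pairs = sorted(zip(times, alphas))
    rates = [
        abs(a_next - a_prev) / (t_next - t_prev)
        for (t_prev, a_prev), (t_next, a_next) in zip(pairs, pairs[1:])
    ]
    return sum(rates) / len(rates)


# Hypothetical subject: Shannon values at visits 1, 2 and 4.
v = volatility([1, 2, 4], [2.0, 2.5, 1.5])
print(v)
```

In this example the two consecutive changes are 0.5 over one time unit and 1.0 over two time units, so both rates are 0.5 and the volatility is 0.5.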
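Shannon

| Term | Estimate | Std.Error | Statistic | P.Value |
| --- | --- | --- | --- | --- |
| (Intercept) | 0.544 | 0.0959 | 5.67 | 0.000000423 |
| subject_racecaucasian | 0.0222 | 0.108 | 0.205 | 0.838 |
| subject_racehispanic_or_latino | -0.0705 | 0.222 | -0.318 | 0.752 |
| subject_race | NA | NA | 0.114 | 0.893 |
| Residuals | NA | NA | NA | NA |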
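Observed Species

| Term | Estimate | Std.Error | Statistic | P.Value |
| --- | --- | --- | --- | --- |
| (Intercept) | 40.6 | 5.82 | 6.98 | 2.51e-9 |
| subject_racecaucasian | -1.95 | 6.56 | -0.297 | 7.67e-1 |
| subject_racehispanic_or_latino | -7.04 | 13.4 | -0.524 | 6.02e-1 |
| subject_race | NA | NA | 0.143 | 8.67e-1 |
| Residuals | NA | NA | NA | NA |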
Having covered generate_alpha_trend_test_long and generate_alpha_volatility_test_long, let's explore another important aspect of analyzing longitudinal alpha diversity data, using the Type 2 Diabetes (T2D) dataset.
In addition to the trend and volatility tests, MicrobiomeStat can perform detailed alpha diversity tests at each time point of a longitudinal study. This is achieved with the generate_alpha_per_time_test_long function, which examines alpha diversity measures such as Shannon, Simpson, Observed Species, Chao1, ACE, and Pielou's Evenness at each time point in the dataset.
To perform the per-time-point alpha diversity test for the T2D dataset, we apply the generate_alpha_per_time_test_long function. It requires specifying the alpha diversity measures, the time variable, the levels of the time points, the group variable, and any additional variables to adjust for.
To visualize the results of these tests, use the generate_alpha_dotplot_long function. It creates dot plots of the alpha diversity measures, which give an intuitive view of the changes and differences across time points and groups. The call specifies the group and time variables, the levels of the time points, and the desired theme and base size for the plot.
In the dot plots generated by generate_alpha_dotplot_long, you'll notice that some dots are marked with an asterisk (*). These asterisks signify statistical significance.
To extend the analysis further, we introduce the generate_alpha_change_per_time_test_long function. It assesses the change in alpha diversity for each subject at different time points relative to a baseline level (t0.level). It performs statistical tests to evaluate the significance of these changes, using measures like "log fold change" to quantify them. This approach is particularly useful in longitudinal studies that focus on how individual subjects' microbial communities evolve over time.
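As a rough illustration of a change-from-baseline measure, the Python sketch below computes a log fold change per subject and time point. The base-2 logarithm, the helper names, and the example values are assumptions made for this sketch; the exact definition used by MicrobiomeStat may differ.

```python
import math


def log_fold_change(baseline, later):
    """log2(later / baseline); both values must be positive."""
    return math.log2(later / baseline)


def changes_from_baseline(series, t0_level):
    """Map each non-baseline time level to its change relative to t0_level."""
    base = series[t0_level]
    return {
        t: log_fold_change(base, value)
        for t, value in series.items()
        if t != t0_level
    }


# Hypothetical alpha diversity values: subject -> {time level: value}.
subjects = {
    "subj1": {"t0": 2.0, "t1": 4.0, "t2": 1.0},
    "subj2": {"t0": 3.0, "t1": 3.0},
}

changes = {s: changes_from_baseline(v, "t0") for s, v in subjects.items()}
print(changes)
```

A doubling relative to baseline gives +1, a halving gives -1, and no change gives 0, so positive and negative changes are measured on a symmetric scale.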
To visualize the results from generate_alpha_change_per_time_test_long, we again use the generate_alpha_dotplot_long function. The resulting dot plots show the changes in the alpha diversity measures across time points and groups and help in visually interpreting the statistical significance and trends in the T2D data.
These functions, generate_alpha_per_time_test_long, generate_alpha_change_per_time_test_long, and generate_alpha_dotplot_long, complement the earlier functions by providing a more granular view of alpha diversity over time. They are especially useful in studies with multiple time points, like the T2D dataset, offering a comprehensive perspective on the dynamics of microbial diversity.
Before we proceed with the visualization, it's crucial to understand the time.var, t0.level, and ts.levels parameters used in the functions generate_alpha_spaghettiplot_long and generate_alpha_boxplot_long.
The time.var parameter can take three forms:
Numeric: When time.var is numeric, you don't need to set t0.level and ts.levels, as they will be automatically arranged in ascending order.
Factor: If time.var is a factor and t0.level and ts.levels are not set, they will be automatically arranged according to the levels of the factor. However, if you need a specific order of the levels, you can manually adjust t0.level and ts.levels.
Character: If time.var is character, it's recommended to manually set t0.level and ts.levels, as the automatic arrangement might not reflect the correct order.
The t0.level parameter represents the first time point, while ts.levels represents the other time points, arranged in the order you desire, excluding t0.level.
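The effect of leaving the time levels unset versus setting them explicitly can be sketched in Python. The `resolve_time_levels` helper is hypothetical and only approximates the behavior described here for numeric and character time variables (factors are not modeled); it is not MicrobiomeStat's logic.

```python
def resolve_time_levels(values, t0_level=None, ts_levels=None):
    """Return (t0, ts) from observed time values and optional explicit levels."""
    if t0_level is not None and ts_levels is not None:
        # Explicit levels win: the caller controls the order.
        return t0_level, list(ts_levels)
    # Automatic arrangement: ascending for numbers, lexicographic for strings.
    ordered = sorted(set(values))
    return ordered[0], ordered[1:]


# Numeric time: ascending order is the natural order.
print(resolve_time_levels([3, 1, 2, 1]))

# Character time: automatic ordering puts "week10" before "week2".
labels = ["week10", "week2", "baseline", "week2"]
print(resolve_time_levels(labels))

# Setting the levels explicitly restores the intended order.
print(resolve_time_levels(labels, t0_level="baseline", ts_levels=["week2", "week10"]))
```

The second call shows why character time variables are best paired with explicit t0.level and ts.levels values.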
Understanding these parameters is pivotal for effective visualization of temporal changes in alpha diversity. Now, let's proceed with creating the spaghetti plot and box plot to visualize alpha diversity changes over time for each individual in the study, grouped by race.
Before we proceed with the generate_alpha_boxplot_long() function, it's important to note that when dealing with multiple time points, a boxplot might not provide the most effective visualization. The boxplot can become cluttered and harder to interpret as more time points are added. In such cases, a spaghetti plot, as generated by the generate_alpha_spaghettiplot_long() function, is often a better choice because it illustrates individual changes over time more clearly.
However, if you still wish to use a boxplot for visualizing alpha diversity over time, you can do so with the generate_alpha_boxplot_long() function.