Alpha Diversity Analysis

This section introduces alpha diversity analysis in longitudinal studies, focusing on how within-sample diversity changes over time. The first step is to filter out samples with low sequencing depth, as alpha diversity analysis is sensitive to this factor.

MicrobiomeStat supports the following alpha diversity indices: "shannon", "simpson", "observed_species", "chao1", "ace", and "pielou". Each index captures a different aspect of the species richness and evenness of a microbiome sample.
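To make the richness and evenness distinction concrete, here is a minimal base-R sketch that computes four of these indices by hand for one toy sample. The count vector is hypothetical, and the formulas are the common textbook definitions (natural-log Shannon, Gini-Simpson); the exact variants MicrobiomeStat uses internally are an assumption here.

```r
# Toy count vector for a single sample (hypothetical data).
counts <- c(taxonA = 50, taxonB = 30, taxonC = 20, taxonD = 0)

# Keep only taxa that are present, then convert to relative abundances.
present <- counts[counts > 0]
p <- present / sum(present)

observed_species <- length(present)          # richness: number of taxa seen
shannon <- -sum(p * log(p))                  # Shannon entropy (natural log)
simpson <- 1 - sum(p^2)                      # Gini-Simpson form (assumed variant)
pielou  <- shannon / log(observed_species)   # evenness: Shannon / max Shannon

round(c(observed_species = observed_species, shannon = shannon,
        simpson = simpson, pielou = pielou), 3)
```

Richness counts only presence, while Shannon, Simpson, and Pielou also respond to how evenly the reads are spread across taxa.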

All functions that perform alpha diversity analysis take an alpha.obj parameter, which is the list returned by mStat_calculate_alpha_diversity. If alpha.obj is NULL, mStat_calculate_alpha_diversity is called automatically. To speed up computation, we recommend calling mStat_calculate_alpha_diversity once and storing the alpha diversity indices in alpha.obj, which can then be reused in later calls.

Note that mStat_calculate_alpha_diversity rarefies the data by default. Rarefaction equalizes the sequencing depth across samples, which makes their diversity estimates comparable. You can set the desired rarefaction depth in mStat_calculate_alpha_diversity; if it is not specified, the minimum sequencing depth is used. If you want to work with non-rarefied data, you can pre-calculate your own alpha diversity indices and pass them to the alpha.obj parameter of these alpha diversity analysis functions.
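As an illustration of what rarefying to the minimum depth means, the following base-R sketch subsamples a small, hypothetical feature table without replacement. It is not the package's implementation; the helper rarefy_sample is introduced here purely for illustration.

```r
set.seed(1)  # subsampling is random; fix the seed for reproducibility

# Hypothetical feature table: taxa in rows, samples in columns.
otu <- matrix(c(40, 10,  0,
                25, 60,  5,
                35, 30, 15),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("taxon", 1:3), paste0("sample", 1:3)))

# Subsample one sample's reads without replacement down to `depth`.
rarefy_sample <- function(counts, depth) {
  reads <- rep(seq_along(counts), times = counts)  # one entry per read
  kept  <- sample(reads, size = depth)
  tabulate(kept, nbins = length(counts))
}

depth <- min(colSums(otu))   # the minimum sequencing depth across samples
rarefied <- apply(otu, 2, rarefy_sample, depth = depth)
rownames(rarefied) <- rownames(otu)

colSums(rarefied)   # every sample now has exactly `depth` reads
```

Because the shallowest sample sets the depth, one very shallow sample can discard most of the data, which is why low-depth samples are filtered out first.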

To account for potential confounding factors that may influence the relationship between alpha diversity and time, the test functions accept additional covariates through the adj.vars parameter. These covariates are adjusted for in the linear mixed-effects model, so that the effect of time on alpha diversity is estimated more accurately.

After the initial preparation, we can use MicrobiomeStat's functions to test for differences in alpha diversity across timepoints. One such function is generate_alpha_trend_test_long, which uses a linear mixed-effects model to analyze longitudinal alpha diversity data. It tests whether alpha diversity changes significantly over time, while taking into account individual variability and other potential confounding factors. The mixed-effects model allows for within-subject and between-subject variability, making it suitable for unbalanced data and repeated measures. The trend test checks if the association between alpha diversity and time is statistically significant, providing insights into temporal dynamics in microbiome diversity.

The time.var argument in the function generate_alpha_trend_test_long needs to be numeric. If you provide a string, it will automatically be converted to a factor and then to numeric, which might cause issues. Please ensure that the time variable is numeric to avoid any potential problems.
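The conversion problem is easy to reproduce in base R. The sketch below uses a hypothetical metadata column named visit; it shows why a string passed through a factor yields level codes rather than the original values, and how to convert explicitly beforehand.

```r
# Hypothetical metadata where visit numbers were read in as strings.
meta <- data.frame(visit = c("1", "2", "10", "2", "1"),
                   stringsAsFactors = FALSE)

# Pitfall: going through a factor returns level codes, not the values.
# Levels sort as "1", "10", "2", so visit "10" becomes 2 and "2" becomes 3.
as.numeric(factor(meta$visit))

# Safer: convert the character values directly before calling the test.
meta$visit_num <- as.numeric(meta$visit)
stopifnot(is.numeric(meta$visit_num), !anyNA(meta$visit_num))
meta$visit_num
```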

data("subset_T2D.obj") 
alpha_trend_test_results <- generate_alpha_trend_test_long(
  data.obj = subset_T2D.obj,
  alpha.name = c("shannon","observed_species"),
  time.var = "visit_number_num",
  subject.var = "subject_id",
  group.var = "subject_race",
  adj.vars = NULL
)

Shannon Diversity

| Term | Estimate | Std.Error | Statistic | P.Value |
|---|---|---|---|---|
| (Intercept) | 2.59 | 0.204 | 12.7 | 5.13e-27 |
| subject_racecaucasian | 0.214 | 0.231 | 0.928 | 3.55e-1 |
| subject_racehispanic_or_latino | 0.197 | 0.441 | 0.446 | 6.56e-1 |
| visit_number_num | -0.00852 | 0.0547 | -0.156 | 8.76e-1 |
| subject_racecaucasian:visit_number_num | -0.0105 | 0.0617 | -0.170 | 8.65e-1 |
| subject_racehispanic_or_latino:visit_number_num | 0.0515 | 0.116 | 0.443 | 6.58e-1 |
| subject_race:visit_number_num | NA | NA | 0.175 | 8.40e-1 |

Observed Species Diversity

| Term | Estimate | Std.Error | Statistic | P.Value |
|---|---|---|---|---|
| (Intercept) | 118 | 16.1 | 7.36 | 3.22e-12 |
| subject_racecaucasian | 23.2 | 18.2 | 1.28 | 2.02e-1 |
| subject_racehispanic_or_latino | 20.1 | 34.6 | 0.580 | 5.62e-1 |
| visit_number_num | -0.152 | 4.35 | -0.0349 | 9.72e-1 |
| subject_racecaucasian:visit_number_num | -0.792 | 4.90 | -0.162 | 8.72e-1 |
| subject_racehispanic_or_latino:visit_number_num | 2.71 | 9.28 | 0.292 | 7.70e-1 |
| subject_race:visit_number_num | NA | NA | 0.0910 | 9.13e-1 |

In the trend test, our primary focus is on the interaction term between group.var and time.var, which tests whether the time trend differs between groups. When group.var has more than two levels, an ANOVA is also performed on this interaction; its result appears in the last row of the table.

Another useful function is generate_alpha_volatility_test_long. This function calculates the volatility of alpha diversity measures in longitudinal data and tests the association between the volatility and a group variable. Volatility is calculated as the mean of absolute differences between consecutive alpha diversity measures, normalized by the time difference.

In mathematical terms, volatility (V) can be represented like this:

First, let's define a few terms:

  • "alpha_i" is the alpha diversity measure at time "i".

  • "delta_t_i" is the time difference between time "i" and "i+1".

With these terms, the volatility for a subject is calculated as:

$$V = \frac{1}{N} \sum_{i} \left| \frac{\alpha_{i+1} - \alpha_i}{\Delta t_i} \right|$$

Here:

  • "N" is the total number of time points for the subject.

  • The summation "sum" is over all time points for which "alpha_(i+1)" is defined.
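The formula above can be sketched in a few lines of base R. This is a direct transcription of the definition as stated here (dividing by N, the number of time points); the helper name alpha_volatility and the example values are hypothetical, and the package's internal normalization may differ.

```r
# Volatility for one subject, following the formula as stated:
# sum of |alpha[i+1] - alpha[i]| / (t[i+1] - t[i]), divided by N time points.
alpha_volatility <- function(alpha, time) {
  ord   <- order(time)                 # make sure visits are in time order
  alpha <- alpha[ord]
  time  <- time[ord]
  rates <- abs(diff(alpha)) / diff(time)   # absolute change per unit time
  sum(rates) / length(alpha)
}

# Hypothetical Shannon values at visits 1, 2, 4 and 5.
v <- alpha_volatility(alpha = c(2.0, 2.5, 2.1, 2.4), time = c(1, 2, 4, 5))
v
```

Here the three per-interval rates are 0.5, 0.2, and 0.3, so the sum is 1.0 and dividing by the four time points gives 0.25. Dividing by the time gap keeps unevenly spaced visits from inflating the measure.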

data("subset_T2D.obj") 
alpha_volatility_test_results <- generate_alpha_volatility_test_long(
  data.obj = subset_T2D.obj,
  alpha.obj = NULL,
  alpha.name = c("shannon","observed_species"),
  time.var = "visit_number_num",
  subject.var = "subject_id",
  group.var = "subject_race",
  adj.vars = "sample_body_site"
)

Shannon

| Term | Estimate | Std.Error | Statistic | P.Value |
|---|---|---|---|---|
| (Intercept) | 0.544 | 0.0959 | 5.67 | 0.000000423 |
| subject_racecaucasian | 0.0222 | 0.108 | 0.205 | 0.838 |
| subject_racehispanic_or_latino | -0.0705 | 0.222 | -0.318 | 0.752 |
| subject_race | NA | NA | 0.114 | 0.893 |
| Residuals | NA | NA | NA | NA |

Observed Species

| Term | Estimate | Std.Error | Statistic | P.Value |
|---|---|---|---|---|
| (Intercept) | 40.6 | 5.82 | 6.98 | 2.51e-9 |
| subject_racecaucasian | -1.95 | 6.56 | -0.297 | 7.67e-1 |
| subject_racehispanic_or_latino | -7.04 | 13.4 | -0.524 | 6.02e-1 |
| subject_race | NA | NA | 0.143 | 8.67e-1 |
| Residuals | NA | NA | NA | NA |

Having covered generate_alpha_trend_test_long and generate_alpha_volatility_test_long, let's turn to another aspect of analyzing longitudinal alpha diversity data, again using the Type 2 Diabetes (T2D) dataset.

In addition to the trend and volatility tests, MicrobiomeStat provides the capability to perform detailed alpha diversity tests at each time point in a longitudinal study. This is achieved using the generate_alpha_test_long function. This function allows for a comprehensive examination of alpha diversity measures such as Shannon, Simpson, Observed Species, Chao1, ACE, and Pielou's Evenness across different time points in the dataset.

To perform the longitudinal alpha diversity test for the T2D dataset, we apply the generate_alpha_test_long function. This function requires specifying various parameters including alpha diversity measures, time variable, levels for time points, group variable, and any additional variables for adjustment. Here's an example:

alpha_test_results_T2D <- generate_alpha_test_long(
  data.obj = subset_T2D.obj,
  alpha.name = c("shannon", "simpson", "observed_species", "chao1", "ace", "pielou"),
  time.var = "visit_number",
  t0.level = unique(subset_T2D.obj$meta.dat$visit_number)[1],
  ts.levels = unique(subset_T2D.obj$meta.dat$visit_number)[-1],
  group.var = "subject_race",
  adj.vars = c("sample_body_site")
)

Subsequently, to visualize the results of these tests, the generate_alpha_dotplot_long function can be utilized. This function creates dot plots for the alpha diversity measures, allowing for an intuitive understanding of the changes and differences across time points and groups. The visualization includes specifying the group and time variables, setting the levels for time points, and choosing the desired theme and base size for the plot. The following example demonstrates how to generate dot plots for the T2D dataset results:

dot_plots_T2D <- generate_alpha_dotplot_long(
  data.obj = subset_T2D.obj,
  test.list = alpha_test_results_T2D,
  group.var = "subject_race",
  time.var = "visit_number",
  t0.level = unique(subset_T2D.obj$meta.dat$visit_number)[1],
  ts.levels = unique(subset_T2D.obj$meta.dat$visit_number)[-1],
  base.size = 16,
  theme.choice = "bw"
)

In the dot plots generated by generate_alpha_dotplot_long, you'll notice that some dots are marked with an asterisk (*). These asterisks signify statistical significance.

Further enhancing our analysis, we introduce the generate_alpha_change_test_long function. This function is specifically designed to assess the change in alpha diversity for each subject at different time points relative to a baseline level (t0.level). It performs statistical tests to evaluate the significance of changes in alpha diversity, using measures like "log fold change" to quantify these alterations. This approach is particularly insightful in longitudinal studies where the focus is on understanding how individual subjects' microbial communities evolve over time.

alpha_test_results_T2D <- generate_alpha_change_test_long(
  data.obj = subset_T2D.obj,
  alpha.name = c("shannon", "simpson", "observed_species", "chao1", "ace", "pielou"),
  time.var = "visit_number",
  t0.level = unique(subset_T2D.obj$meta.dat$visit_number)[1],
  ts.levels = unique(subset_T2D.obj$meta.dat$visit_number)[-1],
  subject.var = "subject_id",
  group.var = "subject_race",
  adj.vars = c("sample_body_site"),
  alpha.change.func = "log fold change"
)
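To give a feel for what a change relative to t0.level looks like, the sketch below computes a per-subject log fold change in base R on hypothetical data. The base-2 log ratio of follow-up to baseline is an assumed definition chosen for illustration; the package's exact formula for "log fold change" may differ.

```r
# Hypothetical long-format alpha diversity values for two subjects.
alpha_df <- data.frame(
  subject = c("s1", "s1", "s1", "s2", "s2", "s2"),
  visit   = c(1, 2, 3, 1, 2, 3),
  shannon = c(2.0, 4.0, 1.0, 3.0, 3.0, 6.0)
)

t0 <- 1  # baseline visit (plays the role of t0.level)

# Look up each subject's baseline value and attach it to the follow-up rows.
baseline <- alpha_df[alpha_df$visit == t0, c("subject", "shannon")]
names(baseline)[2] <- "shannon_t0"
merged <- merge(alpha_df[alpha_df$visit != t0, ], baseline, by = "subject")

# Assumed definition: log2 ratio of follow-up to baseline.
merged$log_fc <- log2(merged$shannon / merged$shannon_t0)
merged <- merged[order(merged$subject, merged$visit), ]
merged
```

A value of 0 means no change from baseline, positive values mean diversity increased, and negative values mean it decreased.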

To visualize the results from generate_alpha_change_test_long, we again use generate_alpha_dotplot_long, this time passing the change-test results as test.list. The plots help in visually interpreting the statistical significance and trends in the data. Here's how you can generate these plots for the T2D dataset:

dot_plots_T2D <- generate_alpha_dotplot_long(
  data.obj = subset_T2D.obj,
  test.list = alpha_test_results_T2D,
  group.var = "subject_race",
  time.var = "visit_number",
  t0.level = unique(subset_T2D.obj$meta.dat$visit_number)[1],
  ts.levels = unique(subset_T2D.obj$meta.dat$visit_number)[-1],
  base.size = 16,
  theme.choice = "bw"
)

These functions, generate_alpha_test_long, generate_alpha_change_test_long, and generate_alpha_dotplot_long, complement the earlier discussed functions by providing a more granular view of alpha diversity over time. They are especially useful in studies with multiple time points like the T2D dataset, offering a comprehensive perspective on the dynamics of microbial diversity.

Before we proceed with the visualization, it's crucial to understand the time.var, t0.level, and ts.levels parameters used in the functions generate_alpha_spaghettiplot_long and generate_alpha_boxplot_long.

The time.var parameter can take three forms:

  • Numeric: When time.var is numeric, you don't need to set t0.level and ts.levels, as they will be automatically arranged in ascending order.

  • Factor: If time.var is a factor, and if t0.level and ts.levels are not set, they will be automatically arranged according to the levels of the factor. However, if you have specific needs for the order of the levels, you can manually adjust t0.level and ts.levels.

  • Character: If time.var is character, it's recommended to manually set t0.level and ts.levels, as the automatic arrangement might not reflect the correct order.

The t0.level parameter represents the first time point, while ts.levels represents the other time points, arranged in the order you desire, excluding t0.level.
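The character case is the one most likely to go wrong, so here is a small base-R sketch of it. The visit labels are hypothetical; the point is only that alphabetical ordering is not chronological ordering, and that t0.level and ts.levels can be derived from an explicitly ordered vector.

```r
# Hypothetical character-valued visit labels.
visit <- c("week10", "baseline", "week2", "week10", "week2", "baseline")

# Automatic ordering is alphabetical, which puts "week10" before "week2".
sort(unique(visit))

# Spell out the intended chronological order instead.
time_order <- c("baseline", "week2", "week10")
stopifnot(setequal(time_order, unique(visit)))  # guard against typos

t0.level  <- time_order[1]    # first time point
ts.levels <- time_order[-1]   # remaining time points, in the desired order
```

The resulting t0.level and ts.levels can then be passed to the plotting functions unchanged.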

Understanding these parameters is pivotal for effective visualization of temporal changes in alpha diversity. Now, let's proceed with creating the spaghetti plot and box plot to visualize alpha diversity changes over time for each individual in the study, grouped by race.

# Generate a spaghetti plot of alpha diversity over time
generate_alpha_spaghettiplot_long(
  data.obj = subset_T2D.obj,
  alpha.obj = NULL,
  alpha.name = c("shannon"),
  depth = NULL,
  subject.var = "subject_id",
  time.var = "visit_number",
  t0.level = sort(unique(subset_T2D.obj$meta.dat$visit_number))[1],
  ts.levels = sort(unique(subset_T2D.obj$meta.dat$visit_number))[-1],
  group.var = "subject_race",
  strata.var = NULL,
  theme.choice = "bw",
  palette = NULL,
  pdf = TRUE,
  file.ann = NULL,
  pdf.wid = 11,
  pdf.hei = 8.5
)

Before we proceed with the generate_alpha_boxplot_long() function, it's important to note that when dealing with multiple time points, a boxplot might not provide the most effective visualization. The boxplot can become cluttered and harder to interpret with the addition of more time points. In such cases, a spaghetti plot, as generated by the generate_alpha_spaghettiplot_long() function, is often a better choice as it can more clearly illustrate the individual changes over time.

However, if you still wish to use a boxplot for visualizing alpha diversity over time, here's how you can do it with the generate_alpha_boxplot_long() function:

# Generate a boxplot of alpha diversity across the chosen time points
generate_alpha_boxplot_long(
  data.obj = subset_T2D.obj,
  alpha.obj = NULL,
  alpha.name = c("shannon"),
  depth = NULL,
  subject.var = "subject_id",
  time.var = "visit_number_num",
  t0.level = sort(unique(subset_T2D.obj$meta.dat$visit_number_num))[1],
  ts.levels = sort(unique(subset_T2D.obj$meta.dat$visit_number_num))[2:4],
  group.var = "subject_race",
  strata.var = NULL,
  base.size = 20,
  theme.choice = "bw",
  palette = NULL,
  pdf = TRUE,
  file.ann = NULL,
  pdf.wid = 20,
  pdf.hei = 8.5
)
