Last updated: 2022-12-13

Checks: 6 1

Knit directory: esoph-micro-cancer-workflow/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


The R Markdown file has unstaged changes. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20200916) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version be3dfad. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/.Rhistory
    Ignored:    data.zip
    Ignored:    data/
    Ignored:    output/Supplement Figure 2 (2).zip
    Ignored:    output/Updated Heatmaps.zip
    Ignored:    output/updated-figures-2022-12-12.zip
    Ignored:    output/updated_figures_2022_12_08.zip

Untracked files:
    Untracked:  output/nci-2022-12-12.pdf
    Untracked:  output/nci-2022-12-12.png
    Untracked:  output/slide-10-heatmap-2022-12-12.pdf
    Untracked:  output/slide-10-heatmap-2022-12-12.png
    Untracked:  output/slide-11-bar-chart-2022-12-12.pdf
    Untracked:  output/slide-11-bar-chart-2022-12-12.png
    Untracked:  output/slide-12-bar-chart-2022-12-12.pdf
    Untracked:  output/slide-12-bar-chart-2022-12-12.png
    Untracked:  output/slide-13-bar-chart-2022-12-12.pdf
    Untracked:  output/slide-13-bar-chart-2022-12-12.png
    Untracked:  output/slide-5-heatmap-001-2022-12-10.pdf
    Untracked:  output/slide-5-heatmap-001-2022-12-10.png
    Untracked:  output/slide-5-heatmap-001-2022-12-12.pdf
    Untracked:  output/slide-5-heatmap-001-2022-12-12.png
    Untracked:  output/slide-5-heatmap-01-2022-12-12.pdf
    Untracked:  output/slide-5-heatmap-01-2022-12-12.png
    Untracked:  output/slide-5-heatmap-05-2022-12-12.pdf
    Untracked:  output/slide-5-heatmap-05-2022-12-12.png
    Untracked:  output/slide-6-heatmap-001-2022-12-12.pdf
    Untracked:  output/slide-6-heatmap-001.-2022-12-12.png
    Untracked:  output/slide-6-heatmap-01-2022-12-12.pdf
    Untracked:  output/slide-6-heatmap-01-2022-12-12.png
    Untracked:  output/slide-6-heatmap-05-2022-12-12.pdf
    Untracked:  output/slide-6-heatmap-05-2022-12-12.png
    Untracked:  output/slide-7-heatmap-001-2022-12-12.pdf
    Untracked:  output/slide-7-heatmap-001-2022-12-12.png
    Untracked:  output/slide-7-heatmap-01-2022-12-12.pdf
    Untracked:  output/slide-7-heatmap-01-2022-12-12.png
    Untracked:  output/slide-7-heatmap-05-2022-12-12.pdf
    Untracked:  output/slide-7-heatmap-05-2022-12-12.png
    Untracked:  output/slide-8-heatmap-2022-12-12.pdf
    Untracked:  output/slide-8-heatmap-2022-12-12.png
    Untracked:  output/slide-9-heatmap-2022-12-12.pdf
    Untracked:  output/slide-9-heatmap-2022-12-12.png
    Untracked:  output/tcga-rna-001-2022-12-12.pdf
    Untracked:  output/tcga-rna-001.-2022-12-12.png
    Untracked:  output/tcga-rna-2022-12-12.pdf
    Untracked:  output/tcga-rna-2022-12-12.png
    Untracked:  output/updated-figures-2022-12-12/
    Untracked:  output/updated_figures_2022_12_08/

Unstaged changes:
    Modified:   analysis/slide-figures-heatmaps-dendrograms-updates.Rmd
    Modified:   output/slide-6-heatmap-001-2022-12-08.pdf
    Modified:   output/slide-6-heatmap-001.-2022-12-08.png

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/slide-figures-heatmaps-dendrograms-updates.Rmd) and HTML (docs/slide-figures-heatmaps-dendrograms-updates.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd be3dfad noah-padgett 2022-12-08 updated with new color scheme
html be3dfad noah-padgett 2022-12-08 updated with new color scheme
Rmd b1264da noah-padgett 2022-09-07 revised figures for publication
Rmd 9e1d57c noah-padgett 2022-08-19 updated and corrected heatmap color and labels
html 9e1d57c noah-padgett 2022-08-19 updated and corrected heatmap color and labels
Rmd b9a498d noah-padgett 2021-01-26 fixed issue with subsetting
html b9a498d noah-padgett 2021-01-26 fixed issue with subsetting

Intro

This page contains the updated code for generating the joint figures of heatmaps with the dendrogram. The update was needed to fix how the OTUs were subset based on average relative abundance. Prior to ESCAh heatmap will be a table of the OTUs that meet the given criteria.

I figured out that I originally subset based on the OTU relative abundance for ESCAh individual and sample, meaning that I subset according the just the raw Abundance irrespective of any OTU average abundance. This mistake is corrected in this document.

Heatmaps and Dendrograms

Slide 5 - NCI 16S Data

get counts

Barretts, blood normal, non-tumor, tumor, tumor w/ barretts

nci_sample_data <- dat.16s.s %>%
  filter(OTU == "Fusobacterium nucleatum")
  
table(nci_sample_data$tumor)

 0  1 
93 65 
table(nci_sample_data$tissue)

BO  N  T 
 5 88 65 
table(nci_sample_data$Barretts.)

 N  Y 
71 87 
table(nci_sample_data$tumor, nci_sample_data$Barretts.)
   
     N  Y
  0 42 51
  1 29 36
nrow(nci_sample_data)
[1] 158

Relative Abudance Cutoff: 0.05

analysis.dat <- dat.16s # insert dataset to be used in analysis
avgRelAbundCutoff <- 0.05 # minimum average relative abundance for OTUs


otu.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(OTU) %>%
  dplyr::summarise(AverageRelativeAbundance=mean(Abundance))%>%
  dplyr::filter(AverageRelativeAbundance>=avgRelAbundCutoff) %>%
  dplyr::arrange(desc(AverageRelativeAbundance))

kable(otu.dat[,c(2,1)], format="html", digits=3) %>%
  kable_styling(full_width = T)%>%
  scroll_box(width="100%", height="100%")
AverageRelativeAbundance OTU
0.272 Streptococcus_dentisani:Streptococcus_infantis:Streptococcus_mitis:Streptococcus_oligofermentans:Streptococcus_oralis:Streptococcus_pneumoniae:Streptococcus_pseudopneumoniae:Streptococcus_sanguinis
0.065 Pseudomonas_rhodesiae
0.054 Prevotella_melaninogenica
plot.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(OTU) %>%
  dplyr::mutate(aveAbund=mean(Abundance)) %>%
  dplyr::ungroup() %>%
  dplyr::filter(aveAbund>=avgRelAbundCutoff) %>%
  dplyr::mutate(ID = as.factor(accession.number),
                Genus = substr(Genus, 4, 1000),
                Phylum = substr(Phylum, 4, 1000)) %>%
  dplyr::select(sample_type, Phylum, Genus, ID, Abundance, aveAbund)

# widen plot.dat for dendro
dat.wide <- plot.dat %>%
  dplyr::mutate(
    ID = paste0(ID, "_",sample_type)
  ) %>%
  dplyr::select(ID, Genus, Abundance) %>%
  dplyr::group_by(ID, Genus) %>%
  dplyr::summarise(
    Abundance = mean(Abundance)
  ) %>%
  tidyr::pivot_wider(
    id_cols = Genus,
    names_from = ID,
    values_from = Abundance,
    values_fill = 0
  )
rn <- dat.wide$Genus
mat <- as.matrix(dat.wide[,-1])
rownames(mat) <- rn

sample_names <- colnames(mat)

# Obtain the dendrogram
dend <- as.dendrogram(hclust(dist(mat)))
dend_data <- dendro_data(dend)

# Setup the data, so that the layout is inverted (this is more 
# "clear" than simply using coord_flip())
segment_data <- with(
  segment(dend_data), 
  data.frame(x = y, y = x, xend = yend, yend = xend))
# Use the dendrogram label data to position the gene labels
gene_pos_table <- with(
  dend_data$labels, 
  data.frame(y_center = x, gene = as.character(label), height = 1))

# Table to position the samples
sample_pos_table <- data.frame(sample = sample_names) %>%
  dplyr::mutate(x_center = (1:n()), 
         width = 1)

# Neglecting the gap parameters
heatmap_data <- mat %>% 
  reshape2::melt(value.name = "expr", varnames = c("gene", "sample")) %>%
  left_join(gene_pos_table) %>%
  left_join(sample_pos_table)

# extract and rejoin sample IDs and sample_type names for plotting
# first for the heatmap data.frame
A <- str_split(heatmap_data$sample, "_")
heatmap_data$ID <- heatmap_data$sample_type <- "0"
for(i in 1:nrow(heatmap_data)){
  heatmap_data$ID[i] <- A[[i]][1]
  heatmap_data$sample_type[i] <- A[[i]][2]
}
# second for the sample position dataframe (dendo)
A <- str_split(sample_pos_table$sample, "_")
sample_pos_table$ID <- sample_pos_table$sample_type <- "0"
for(i in 1:nrow(sample_pos_table)){
  sample_pos_table$ID[i] <- A[[i]][1]
  sample_pos_table$sample_type[i] <- A[[i]][2]
}

# Limits for the vertical axes
gene_axis_limits <- with(
  gene_pos_table, 
  c(min(y_center - 0.5 * height), max(y_center + 0.5 * height))
) + 
  0.1 * c(-1, 1) # extra spacing: 0.1

## Build Heatmap Pieces
# by parts
hmd <- filter(heatmap_data, sample_type == "Barretts Only")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "Barretts Only")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap1 <- ggplot(hmd, 
                   aes(x = x_center, y = y_center, fill = expr, 
                       height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2(
    "Abundance",
    trans="sqrt",
    high=myColor[1],
    mid = myColor[50],
    low=myColor[100]
    , midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "BO", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        # margin: top, right, bottom, and left
        #axis.ticks.y = element_blank(),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid = element_blank(),
        legend.position = "none")

# Part 2: "ESCA-adjacent tissue w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA-adjacent tissue w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA-adjacent tissue w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap2 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("expr",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.1, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA-adj. w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank(),
        legend.position = "none")

# Part 3: "ESCA tissues w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap3 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.75, vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank())


# Dendrogram plot
plt_dendr <- ggplot(segment_data) + 
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) + 
  scale_x_reverse(expand = c(0, 0.5)) + 
  scale_y_continuous(breaks = gene_pos_table$y_center, 
                     labels = gene_pos_table$gene, 
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "", y = "", colour = "", size = "") +
  theme_classic() + 
  theme(panel.grid = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"))

prntRelAbund <- avgRelAbundCutoff*100
p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+ 
  plot_layout(
    nrow=1, widths = c(0.5, 0.2, 1, 1),
    guides="collect"
  ) +
  plot_annotation(
    title="NCI-16s Data showing average relative abundance of genera by individual",
    subtitle=paste0("Subset to OTU average relative abundance > ",prntRelAbund,"%")
  )
p

if(save.plots == T){
  ggsave(paste0("output/nci-05-",save.Date,".pdf"), plot=p, units="in", width=11, height=5)
  ggsave(paste0("output/nci-05-",save.Date,".png"), plot=p, units="in", width=11, height=5)
}

Relative Abudance Cutoff: 0.01

analysis.dat <- dat.16s # insert dataset to be used in analysis
avgRelAbundCutoff <- 0.01 # minimum average relative abundance for OTUs


otu.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(OTU) %>%
  dplyr::summarise(AverageRelativeAbundance=mean(Abundance))%>%
  dplyr::filter(AverageRelativeAbundance>=avgRelAbundCutoff) %>%
  dplyr::arrange(desc(AverageRelativeAbundance))

kable(otu.dat[,c(2,1)], format="html", digits=3) %>%
  kable_styling(full_width = T)%>%
  scroll_box(width="100%", height="400px")
AverageRelativeAbundance OTU
0.272 Streptococcus_dentisani:Streptococcus_infantis:Streptococcus_mitis:Streptococcus_oligofermentans:Streptococcus_oralis:Streptococcus_pneumoniae:Streptococcus_pseudopneumoniae:Streptococcus_sanguinis
0.065 Pseudomonas_rhodesiae
0.054 Prevotella_melaninogenica
0.049 Lactobacillus_gasseri:Lactobacillus_johnsonii
0.041 Veillonella_dispar
0.038 Stenotrophomonas_maltophilia
0.035 Fusobacterium_nucleatum
0.031 Acinetobacter_guillouiae
0.025 Granulicatella_adiacens:Granulicatella_paraadiacens
0.023 Rothia_mucilaginosa
0.022 Staphylococcus_epidermidis:Staphylococcus_hominis
0.020 Gemella_haemolysans
0.018 Haemophilus_parainfluenzae
0.016 Selenomonas_sputigena
0.014 Lachnoanaerobaculum_orale
0.014 otu19913:Actinobacillus_minor:Actinobacillus_porcinus:Actinobacillus_rossii:Haemophilus_paraphrohaemolyticus
0.011 Actinomyces_odontolyticus
0.010 Neisseria_flavescens
0.010 otu16698:Tannerella_forsythia
plot.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(OTU) %>%
  dplyr::mutate(aveAbund=mean(Abundance)) %>%
  dplyr::ungroup() %>%
  dplyr::filter(aveAbund>=avgRelAbundCutoff) %>%
  dplyr::mutate(ID = as.factor(accession.number),
                Genus = substr(Genus, 4, 1000),
                Phylum = substr(Phylum, 4, 1000)) %>%
  dplyr::select(sample_type, Phylum, Genus, ID, Abundance, aveAbund)

# widen plot.dat for dendro
dat.wide <- plot.dat %>%
  dplyr::mutate(
    ID = paste0(ID, "_",sample_type)
  ) %>%
  dplyr::select(ID, Genus, Abundance) %>%
  dplyr::group_by(ID, Genus) %>%
  dplyr::summarise(
    Abundance = mean(Abundance)
  ) %>%
  tidyr::pivot_wider(
    id_cols = Genus,
    names_from = ID,
    values_from = Abundance,
    values_fill = 0
  )
rn <- dat.wide$Genus
mat <- as.matrix(dat.wide[,-1])
rownames(mat) <- rn

sample_names <- colnames(mat)

# Obtain the dendrogram
dend <- as.dendrogram(hclust(dist(mat)))
dend_data <- dendro_data(dend)

# Setup the data, so that the layout is inverted (this is more 
# "clear" than simply using coord_flip())
segment_data <- with(
  segment(dend_data), 
  data.frame(x = y, y = x, xend = yend, yend = xend))
# Use the dendrogram label data to position the gene labels
gene_pos_table <- with(
  dend_data$labels, 
  data.frame(y_center = x, gene = as.character(label), height = 1))

# Table to position the samples
sample_pos_table <- data.frame(sample = sample_names) %>%
  dplyr::mutate(x_center = (1:n()), 
         width = 1)

# Neglecting the gap parameters
heatmap_data <- mat %>% 
  reshape2::melt(value.name = "expr", varnames = c("gene", "sample")) %>%
  left_join(gene_pos_table) %>%
  left_join(sample_pos_table)

# extract and rejoin sample IDs and sample_type names for plotting
# first for the heatmap data.frame
A <- str_split(heatmap_data$sample, "_")
heatmap_data$ID <- heatmap_data$sample_type <- "0"
for(i in 1:nrow(heatmap_data)){
  heatmap_data$ID[i] <- A[[i]][1]
  heatmap_data$sample_type[i] <- A[[i]][2]
}
# second for the sample position dataframe (dendo)
A <- str_split(sample_pos_table$sample, "_")
sample_pos_table$ID <- sample_pos_table$sample_type <- "0"
for(i in 1:nrow(sample_pos_table)){
  sample_pos_table$ID[i] <- A[[i]][1]
  sample_pos_table$sample_type[i] <- A[[i]][2]
}

# Limits for the vertical axes
gene_axis_limits <- with(
  gene_pos_table, 
  c(min(y_center - 0.5 * height), max(y_center + 0.5 * height))
) + 
  0.1 * c(-1, 1) # extra spacing: 0.1

## Build Heatmap Pieces
# by parts
hmd <- filter(heatmap_data, sample_type == "Barretts Only")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "Barretts Only")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap1 <- ggplot(hmd, 
                   aes(x = x_center, y = y_center, fill = expr, 
                       height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "BO", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        # margin: top, right, bottom, and left
        #axis.ticks.y = element_blank(),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid = element_blank(),
        legend.position = "none")

# Part 2: "ESCA-adjacent tissue w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA-adjacent tissue w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA-adjacent tissue w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap2 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("expr",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.1, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA-adj. w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank(),
        legend.position = "none")

# Part 3: "ESCA tissues w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap3 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.75, vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank(),
        legend.position = "none")

# Part 4: "ESCA tissues w/o Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/o Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/o Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap4 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA w/o Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.75, vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank())

# Dendrogram plot
plt_dendr <- ggplot(segment_data) + 
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) + 
  scale_x_reverse(expand = c(0, 0.5)) + 
  scale_y_continuous(breaks = gene_pos_table$y_center, 
                     labels = gene_pos_table$gene, 
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "", y = "", colour = "", size = "") +
  theme_classic() + 
  theme(panel.grid = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"))

prntRelAbund <- avgRelAbundCutoff*100
p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+plt_hmap4+ 
  plot_layout(
    nrow=1, widths = c(0.5, 0.2, 1, 1, 1),
    guides="collect"
  ) +
  plot_annotation(
    title="NCI-16s Data showing average relative abundance of genera by individual",
    subtitle=paste0("Subset to OTU average relative abundance > ",prntRelAbund,"%")
  )
p

if(save.plots == T){
  ggsave(paste0("output/nci-01-",save.Date,".pdf"), plot=p, units="in", width=11, height=7)
  ggsave(paste0("output/nci-01-",save.Date,".png"), plot=p, units="in", width=11, height=7)
}

Relative Abudance Cutoff: 0.001

analysis.dat <- dat.16s # insert dataset to be used in analysis
avgRelAbundCutoff <- 0.001 # minimum average relative abundance for OTUs


otu.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(OTU) %>%
  dplyr::summarise(AverageRelativeAbundance=mean(Abundance))%>%
  dplyr::filter(AverageRelativeAbundance>=avgRelAbundCutoff) %>%
  dplyr::arrange(desc(AverageRelativeAbundance))

kable(otu.dat[,c(2,1)], format="html", digits=3) %>%
  kable_styling(full_width = T)%>%
  scroll_box(width="100%", height="400px")
AverageRelativeAbundance OTU
0.272 Streptococcus_dentisani:Streptococcus_infantis:Streptococcus_mitis:Streptococcus_oligofermentans:Streptococcus_oralis:Streptococcus_pneumoniae:Streptococcus_pseudopneumoniae:Streptococcus_sanguinis
0.065 Pseudomonas_rhodesiae
0.054 Prevotella_melaninogenica
0.049 Lactobacillus_gasseri:Lactobacillus_johnsonii
0.041 Veillonella_dispar
0.038 Stenotrophomonas_maltophilia
0.035 Fusobacterium_nucleatum
0.031 Acinetobacter_guillouiae
0.025 Granulicatella_adiacens:Granulicatella_paraadiacens
0.023 Rothia_mucilaginosa
0.022 Staphylococcus_epidermidis:Staphylococcus_hominis
0.020 Gemella_haemolysans
0.018 Haemophilus_parainfluenzae
0.016 Selenomonas_sputigena
0.014 Lachnoanaerobaculum_orale
0.014 otu19913:Actinobacillus_minor:Actinobacillus_porcinus:Actinobacillus_rossii:Haemophilus_paraphrohaemolyticus
0.011 Actinomyces_odontolyticus
0.010 Neisseria_flavescens
0.010 otu16698:Tannerella_forsythia
0.009 Sphingomonas_pituitosa
0.008 Clostridium_perfringens:Clostridium_thermophilus
0.008 Bacteroides_fragilis
0.008 Atopostipes_suicloacalis
0.008 Acidovorax_temperans
0.008 otu4906:Bacillus_oleronius
0.007 Leptotrichia_wadei
0.007 Corynebacterium_kroppenstedtii
0.006 TM7_phylum
0.005 Serratia_marcescens:Serratia_nematodiphila
0.005 Campylobacter_rectus:Campylobacter_showae
0.005 Geobacillus_stearothermophilus
0.005 Sarcina_maxima
0.004 Actinobacillus_rossii
0.004 Lysinibacillus_fusiformis:Lysinibacillus_mangiferahumi:Lysinibacillus_sphaericus
0.004 Cloacibacterium_normanense
0.004 Afipia_broomeae:Bradyrhizobium_elkanii:Bradyrhizobium_genosp:Bradyrhizobium_japonicum:Bradyrhizobium_jicamae:Bradyrhizobium_lablabi:Bradyrhizobium_pachyrhizi:Bradyrhizobium_retamae
0.003 Brucella_abortus:Brucella_canis:Brucella_ceti:Brucella_inopinata:Brucella_melitensis:Brucella_microti:Brucella_ovis:Brucella_pinnipedialis:Brucella_suis:Ochrobactrum_anthropi:Ochrobactrum_cytisi:Ochrobactrum_grignonense:Ochrobactrum_haematophilum:Ochrobactrum_lupini:Ochrobactrum_pecoris:Ochrobactrum_thiophenivorans:Ochrobactrum_tritici
0.003 Bifidobacterium_longum
0.003 Aggregatibacter_segnis
0.003 Brevundimonas_diminuta:Brevundimonas_naejangsanensis:Brevundimonas_vancanneytii:Nitrobacteria_hamadaniensis:Nitrobacteria_iranicum
0.003 Proteus_mirabilis:Proteus_penneri
0.003 Capnocytophaga_sputigena
0.003 Stomatobaculum_longum
0.003 Kocuria_kristinae
0.003 otu17927:Alloprevotella_rava
0.003 Parvimonas_micra
0.002 Megasphaera_micronuciformis
0.002 Oribacterium_sinus
0.002 Helicobacter_pylori
0.002 Faecalibacterium_prausnitzii
0.002 Enhydrobacter_aerosaccus
0.002 Castellaniella_denitrificans
0.002 otu21781:Centipeda_periodontii
0.002 Escherichia_coli:Escherichia_fergusonii
0.002 Paracoccus_aestuarii:Paracoccus_beibuensis:Paracoccus_marinus
0.002 Aquabacterium_commune
0.002 Citrobacter_freundii
0.002 Micrococcus_alkanovora:Micrococcus_antarcticus:Micrococcus_endophyticus:Micrococcus_indicus:Micrococcus_luteus:Micrococcus_yunnanensis
0.002 Lachnoanaerobaculum_umeaense
0.002 otu34876:Clostridium_indolis:Fusicatenibacter_saccharivorans
0.002 Lactococcus_lactis
0.002 Sutterella_wadsworthensis
0.001 Porphyromonas_endodontalis
0.001 Dialister_pneumosintes
0.001 Chryseobacterium_proteolyticum
0.001 Zimmermannella_faecalis
0.001 Caldimonas_hydrothermale:Caldimonas_manganoxidans:Caldimonas_taiwanensis
0.001 Massilia_lurida
0.001 Shinella_kummerowiae:Shinella_zoogloeoides
0.001 Rubellimicrobium_mesophilum
0.001 Blautia_wexlerae
0.001 Abiotrophia_defectiva
0.001 Mycobacterium_diernhoferi:Mycobacterium_fluoranthenivorans:Mycobacterium_frederiksbergense:Mycobacterium_hackensackense:Mycobacterium_sacrum
0.001 Enterobacter_aerogenes
0.001 Psychromonas_arctica
0.001 otu33966:Pseudolabrys_taiwanensis
0.001 Microvirga_guangxiensis
0.001 Tepidimonas_fonticaldi
plot.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(OTU) %>%
  dplyr::mutate(aveAbund=mean(Abundance)) %>%
  dplyr::ungroup() %>%
  dplyr::filter(aveAbund>=avgRelAbundCutoff) %>%
  dplyr::mutate(ID = as.factor(accession.number),
                Genus = substr(Genus, 4, 1000),
                Phylum = substr(Phylum, 4, 1000)) %>%
  dplyr::select(sample_type, Phylum, Genus, ID, Abundance, aveAbund)

# widen plot.dat for dendro
dat.wide <- plot.dat %>%
  dplyr::mutate(
    ID = paste0(ID, "_",sample_type)
  ) %>%
  dplyr::select(ID, Genus, Abundance) %>%
  dplyr::group_by(ID, Genus) %>%
  dplyr::summarise(
    Abundance = mean(Abundance)
  ) %>%
  tidyr::pivot_wider(
    id_cols = Genus,
    names_from = ID,
    values_from = Abundance,
    values_fill = 0
  )
rn <- dat.wide$Genus
mat <- as.matrix(dat.wide[,-1])
rownames(mat) <- rn

sample_names <- colnames(mat)

# Obtain the dendrogram
dend <- as.dendrogram(hclust(dist(mat)))
dend_data <- dendro_data(dend)

# Setup the data, so that the layout is inverted (this is more 
# "clear" than simply using coord_flip())
segment_data <- with(
  segment(dend_data), 
  data.frame(x = y, y = x, xend = yend, yend = xend))
# Use the dendrogram label data to position the gene labels
gene_pos_table <- with(
  dend_data$labels, 
  data.frame(y_center = x, gene = as.character(label), height = 1))

# Table to position the samples
sample_pos_table <- data.frame(sample = sample_names) %>%
  dplyr::mutate(x_center = (1:n()), 
         width = 1)

# Neglecting the gap parameters
heatmap_data <- mat %>% 
  reshape2::melt(value.name = "expr", varnames = c("gene", "sample")) %>%
  left_join(gene_pos_table) %>%
  left_join(sample_pos_table)

# extract and rejoin sample IDs and sample_type names for plotting
# first for the heatmap data.frame
A <- str_split(heatmap_data$sample, "_")
heatmap_data$ID <- heatmap_data$sample_type <- "0"
for(i in 1:nrow(heatmap_data)){
  heatmap_data$ID[i] <- A[[i]][1]
  heatmap_data$sample_type[i] <- A[[i]][2]
}
# second for the sample position dataframe (dendo)
A <- str_split(sample_pos_table$sample, "_")
sample_pos_table$ID <- sample_pos_table$sample_type <- "0"
for(i in 1:nrow(sample_pos_table)){
  sample_pos_table$ID[i] <- A[[i]][1]
  sample_pos_table$sample_type[i] <- A[[i]][2]
}

# Limits for the vertical axes
gene_axis_limits <- with(
  gene_pos_table, 
  c(min(y_center - 0.5 * height), max(y_center + 0.5 * height))
) + 
  0.1 * c(-1, 1) # extra spacing: 0.1

## Build Heatmap Pieces
# by parts
hmd <- filter(heatmap_data, sample_type == "Barretts Only")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "Barretts Only")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap1 <- ggplot(hmd, 
                   aes(x = x_center, y = y_center, fill = expr, 
                       height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "BO", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        # margin: top, right, bottom, and left
        #axis.ticks.y = element_blank(),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid = element_blank(),
        legend.position = "none")

# Part 2: "ESCA-adjacent tissue w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA-adjacent tissue w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA-adjacent tissue w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap2 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("expr",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.1, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA-adj. w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank(),
        legend.position = "none")

# Part 3: "ESCA tissues w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap3 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.75, vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank())


# Dendrogram plot
plt_dendr <- ggplot(segment_data) + 
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) + 
  scale_x_reverse(expand = c(0, 0.5)) + 
  scale_y_continuous(breaks = gene_pos_table$y_center, 
                     labels = gene_pos_table$gene, 
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "", y = "", colour = "", size = "") +
  theme_classic() + 
  theme(panel.grid = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"))

prntRelAbund <- avgRelAbundCutoff*100
p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+ 
  plot_layout(
    nrow=1, widths = c(0.5, 0.2, 1, 1),
    guides="collect"
  ) +
  plot_annotation(
    title="NCI-16s Data showing average relative abundance of genera by individual",
    subtitle=paste0("Subset to OTU average relative abundance > ",prntRelAbund,"%")
  )
p

if(save.plots == T){
  ggsave(paste0("output/nci-001-",save.Date,".pdf"), plot=p, units="in", width=11, height=10)
  ggsave(paste0("output/nci-001-",save.Date,".png"), plot=p, units="in", width=11, height=10)
}
# melt data down for use
dat.16s <- psmelt(phylo.data.nci.umd)

# set up variables
dat.16s$sample_type <- 0
dat.16s$sample_type[dat.16s$tissue=="T" &
                      dat.16s$Histology=="ADC"] <- "ESCA tissues"
dat.16s$sample_type[dat.16s$tissue=="N" &
                      dat.16s$Histology=="ADC"] <- "ESCA-adjacent tissue"
dat.16s$sample_type[dat.16s$tissue=="BO" &
                      dat.16s$Histology=="Barrets only"] <-"Barretts Only"

# make tumor vs normal variable
dat.16s$tumor.cat <- factor(dat.16s$tissue, levels=c("BO", "N", "T"), labels = c("Non-Tumor", "Non-Tumor", "Tumor"))

# dataset id
dat.16s$source <- "16s"

# plotting ids
dat.16s$X <- paste0(dat.16s$source, "-", dat.16s$tumor.cat)

# relabel as (0/1) for analysis
dat.16s$tumor <- as.numeric(factor(dat.16s$tissue, levels=c("BO", "N", "T"), labels = c("Non-Tumor", "Non-Tumor", "Tumor"))) - 1

# presence- absence
dat.16s$pres <- ifelse(dat.16s$Abundance > 0, 1, 0)
dat.16s$pres[is.na(dat.16s$pres)] <- 0

# subset to fuso. nuc. only
# Streptococcus sanguinis 
# Campylobacter concisus
# Prevotella spp.

dat.16s.s <- filter(
  dat.16s,
  OTU %in% c(
    "Fusobacterium_nucleatum",
    unique(dat.16s$OTU[dat.16s$OTU %like% "Streptococcus_"]),
    unique(dat.16s$OTU[dat.16s$OTU %like% "Campylobacter_"]),
    "Prevotella_melaninogenica")
)

# new names
dat.16s.s$OTU1 <- factor(
  dat.16s.s$OTU,
  levels = c(
    "Fusobacterium_nucleatum",
    "Streptococcus_dentisani:Streptococcus_infantis:Streptococcus_mitis:Streptococcus_oligofermentans:Streptococcus_oralis:Streptococcus_pneumoniae:Streptococcus_pseudopneumoniae:Streptococcus_sanguinis",
    "Campylobacter_rectus:Campylobacter_showae",
    "Prevotella_melaninogenica"
  ),
  labels = c(
    "Fusobacterium nucleatum",
    "Streptococcus sanguinis",
    "Campylobacter concisus",
    "Prevotella melaninogenica"
  )
)

# rename bacteria
dat.16s.s$OTU <- factor(
  dat.16s.s$OTU,
  levels = c(
    "Fusobacterium_nucleatum",
    "Streptococcus_dentisani:Streptococcus_infantis:Streptococcus_mitis:Streptococcus_oligofermentans:Streptococcus_oralis:Streptococcus_pneumoniae:Streptococcus_pseudopneumoniae:Streptococcus_sanguinis",
    "Campylobacter_rectus:Campylobacter_showae",
    "Prevotella_melaninogenica"
  ),
  labels = c(
    "Fusobacterium nucleatum",
    "Streptococcus spp.",
    "Campylobacter concisus",
    "Prevotella melaninogenica"
  )
)

analysis.dat <- dat.16s # insert dataset to be used in analysis
avgRelAbundCutoff <- 0.001 # minimum average relative abundance for OTUs


otu.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(OTU) %>%
  dplyr::summarise(AverageRelativeAbundance=mean(Abundance))%>%
  dplyr::filter(AverageRelativeAbundance>=avgRelAbundCutoff) %>%
  dplyr::arrange(desc(AverageRelativeAbundance))

kable(otu.dat[,c(2,1)], format="html", digits=3) %>%
  kable_styling(full_width = T)%>%
  scroll_box(width="100%", height="400px")
AverageRelativeAbundance OTU
0.289 Streptococcus_dentisani:Streptococcus_infantis:Streptococcus_mitis:Streptococcus_oligofermentans:Streptococcus_oralis:Streptococcus_pneumoniae:Streptococcus_pseudopneumoniae:Streptococcus_sanguinis
0.057 Pseudomonas_rhodesiae
0.057 Prevotella_melaninogenica
0.043 Lactobacillus_gasseri:Lactobacillus_johnsonii
0.038 Veillonella_dispar
0.035 Fusobacterium_nucleatum
0.034 Stenotrophomonas_maltophilia
0.031 Acinetobacter_guillouiae
0.024 Rothia_mucilaginosa
0.023 Haemophilus_parainfluenzae
0.023 Gemella_haemolysans
0.022 Granulicatella_adiacens:Granulicatella_paraadiacens
0.021 Staphylococcus_epidermidis:Staphylococcus_hominis
0.013 Selenomonas_sputigena
0.013 Lachnoanaerobaculum_orale
0.012 otu19913:Actinobacillus_minor:Actinobacillus_porcinus:Actinobacillus_rossii:Haemophilus_paraphrohaemolyticus
0.011 Neisseria_flavescens
0.011 Lysinibacillus_fusiformis:Lysinibacillus_mangiferahumi:Lysinibacillus_sphaericus
0.011 Actinomyces_odontolyticus
0.009 otu4906:Bacillus_oleronius
0.009 otu16698:Tannerella_forsythia
0.008 Bacteroides_fragilis
0.008 Sphingomonas_pituitosa
0.007 Atopostipes_suicloacalis
0.007 Clostridium_perfringens:Clostridium_thermophilus
0.006 Acidovorax_temperans
0.006 Leptotrichia_wadei
0.006 Brucella_abortus:Brucella_canis:Brucella_ceti:Brucella_inopinata:Brucella_melitensis:Brucella_microti:Brucella_ovis:Brucella_pinnipedialis:Brucella_suis:Ochrobactrum_anthropi:Ochrobactrum_cytisi:Ochrobactrum_grignonense:Ochrobactrum_haematophilum:Ochrobactrum_lupini:Ochrobactrum_pecoris:Ochrobactrum_thiophenivorans:Ochrobactrum_tritici
0.006 Corynebacterium_kroppenstedtii
0.005 TM7_phylum
0.005 Geobacillus_stearothermophilus
0.005 Serratia_marcescens:Serratia_nematodiphila
0.005 Helicobacter_pylori
0.005 Campylobacter_rectus:Campylobacter_showae
0.004 Actinobacillus_rossii
0.004 otu17927:Alloprevotella_rava
0.004 otu17904:Deinococcus_aetherius
0.004 Sarcina_maxima
0.003 Cloacibacterium_normanense
0.003 Parvimonas_micra
0.003 Bifidobacterium_longum
0.003 Afipia_broomeae:Bradyrhizobium_elkanii:Bradyrhizobium_genosp:Bradyrhizobium_japonicum:Bradyrhizobium_jicamae:Bradyrhizobium_lablabi:Bradyrhizobium_pachyrhizi:Bradyrhizobium_retamae
0.003 Aggregatibacter_segnis
0.003 Proteus_mirabilis:Proteus_penneri
0.003 Oribacterium_sinus
0.003 Stomatobaculum_longum
0.003 Capnocytophaga_sputigena
0.003 Brevundimonas_diminuta:Brevundimonas_naejangsanensis:Brevundimonas_vancanneytii:Nitrobacteria_hamadaniensis:Nitrobacteria_iranicum
0.002 Collinsella_aerofaciens
0.002 Kocuria_kristinae
0.002 Megasphaera_micronuciformis
0.002 Castellaniella_denitrificans
0.002 Faecalibacterium_prausnitzii
0.002 otu21781:Centipeda_periodontii
0.002 Peptoniphilus_lacrimalis
0.002 Lactococcus_lactis
0.002 Sutterella_wadsworthensis
0.002 Enhydrobacter_aerosaccus
0.002 Escherichia_coli:Escherichia_fergusonii
0.002 Lachnoanaerobaculum_umeaense
0.002 Micrococcus_alkanovora:Micrococcus_antarcticus:Micrococcus_endophyticus:Micrococcus_indicus:Micrococcus_luteus:Micrococcus_yunnanensis
0.001 Dialister_pneumosintes
0.001 Paracoccus_aestuarii:Paracoccus_beibuensis:Paracoccus_marinus
0.001 Aquabacterium_commune
0.001 Citrobacter_freundii
0.001 Rubellimicrobium_mesophilum
0.001 Chryseobacterium_proteolyticum
0.001 Mycobacterium_diernhoferi:Mycobacterium_fluoranthenivorans:Mycobacterium_frederiksbergense:Mycobacterium_hackensackense:Mycobacterium_sacrum
0.001 otu34876:Clostridium_indolis:Fusicatenibacter_saccharivorans
0.001 Porphyromonas_endodontalis
0.001 otu19808:Gemmiger_formicilis
0.001 Solobacterium_moorei
0.001 Massilia_lurida
0.001 Zimmermannella_faecalis
0.001 Caldimonas_hydrothermale:Caldimonas_manganoxidans:Caldimonas_taiwanensis
0.001 Abiotrophia_defectiva
0.001 Shinella_kummerowiae:Shinella_zoogloeoides
0.001 Enterobacter_aerogenes
plot.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(OTU) %>%
  dplyr::mutate(aveAbund=mean(Abundance)) %>%
  dplyr::ungroup() %>%
  dplyr::filter(aveAbund>=avgRelAbundCutoff) %>%
  dplyr::mutate(ID = as.factor(accession.number),
                Genus = substr(Genus, 4, 1000),
                Phylum = substr(Phylum, 4, 1000)) %>%
  dplyr::select(sample_type, Phylum, Genus, ID, Abundance, aveAbund)

# widen plot.dat for dendro
dat.wide <- plot.dat %>%
  dplyr::mutate(
    ID = paste0(ID, "_",sample_type)
  ) %>%
  dplyr::select(ID, Genus, Abundance) %>%
  dplyr::group_by(ID, Genus) %>%
  dplyr::summarise(
    Abundance = mean(Abundance)
  ) %>%
  tidyr::pivot_wider(
    id_cols = Genus,
    names_from = ID,
    values_from = Abundance,
    values_fill = 0
  )
rn <- dat.wide$Genus
mat <- as.matrix(dat.wide[,-1])
rownames(mat) <- rn

sample_names <- colnames(mat)

# Obtain the dendrogram
dend <- as.dendrogram(hclust(dist(mat)))
dend_data <- dendro_data(dend)

# Setup the data, so that the layout is inverted (this is more 
# "clear" than simply using coord_flip())
segment_data <- with(
  segment(dend_data), 
  data.frame(x = y, y = x, xend = yend, yend = xend))
# Use the dendrogram label data to position the gene labels
gene_pos_table <- with(
  dend_data$labels, 
  data.frame(y_center = x, gene = as.character(label), height = 1))

# Table to position the samples
sample_pos_table <- data.frame(sample = sample_names) %>%
  dplyr::mutate(x_center = (1:n()), 
         width = 1)

# Neglecting the gap parameters
heatmap_data <- mat %>% 
  reshape2::melt(value.name = "expr", varnames = c("gene", "sample")) %>%
  left_join(gene_pos_table) %>%
  left_join(sample_pos_table)

# extract and rejoin sample IDs and sample_type names for plotting
# first for the heatmap data.frame
A <- str_split(heatmap_data$sample, "_")
heatmap_data$ID <- heatmap_data$sample_type <- "0"
for(i in 1:nrow(heatmap_data)){
  heatmap_data$ID[i] <- A[[i]][1]
  heatmap_data$sample_type[i] <- A[[i]][2]
}
# second for the sample position dataframe (dendo)
A <- str_split(sample_pos_table$sample, "_")
sample_pos_table$ID <- sample_pos_table$sample_type <- "0"
for(i in 1:nrow(sample_pos_table)){
  sample_pos_table$ID[i] <- A[[i]][1]
  sample_pos_table$sample_type[i] <- A[[i]][2]
}

# Limits for the vertical axes
gene_axis_limits <- with(
  gene_pos_table, 
  c(min(y_center - 0.5 * height), max(y_center + 0.5 * height))
) + 
  0.1 * c(-1, 1) # extra spacing: 0.1

## Build Heatmap Pieces
# by parts
hmd <- filter(heatmap_data, sample_type == "Barretts Only")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "Barretts Only")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap1 <- ggplot(hmd, 
                   aes(x = x_center, y = y_center, fill = expr, 
                       height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "BO", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        # margin: top, right, bottom, and left
        #axis.ticks.y = element_blank(),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid = element_blank(),
        legend.position = "none")

# Part 2: "ESCA-adjacent tissue
hmd <- filter(heatmap_data, sample_type == "ESCA-adjacent tissue")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA-adjacent tissue")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap2 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("expr",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.1, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA-adj.", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank(),
        legend.position = "none")

# Part 3: "ESCA tissues"
hmd <- filter(heatmap_data, sample_type == "ESCA tissues")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap3 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.75, vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank())


# Dendrogram plot
plt_dendr <- ggplot(segment_data) + 
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) + 
  scale_x_reverse(expand = c(0, 0.5)) + 
  scale_y_continuous(breaks = gene_pos_table$y_center, 
                     labels = gene_pos_table$gene, 
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "", y = "", colour = "", size = "") +
  theme_classic() + 
  theme(panel.grid = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"))

prntRelAbund <- avgRelAbundCutoff*100
p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+ 
  plot_layout(
    nrow=1, widths = c(0.5, 0.2, 1, 1),
    guides="collect"
  ) +
  plot_annotation(
    title="NCI-16s Data showing average relative abundance of genera by individual",
    subtitle=paste0("Subset to OTU average relative abundance > ",prntRelAbund,"%")
  )
p

if(save.plots == T){
  ggsave(paste0("output/nci-all-001-",save.Date,".pdf"), plot=p, units="in", width=11, height=10)
  ggsave(paste0("output/nci-all-001-",save.Date,".png"), plot=p, units="in", width=11, height=10)
}

Slide 6 - TCGA RNAseq Data

The heatmaps for slide 6 focus on the TCGA RNAseq data.

get counts

Barretts, blood normal, non-tumor, tumor, tumor w/ barretts

tcga_rna_sample_data <- dat.rna.s %>%
  filter(OTU == "Fusobacterium nucleatum")
  
table(tcga_rna_sample_data$tumor)

  0   1 
 11 162 
table(tcga_rna_sample_data$tissue)

C15.1 C15.3 C15.4 C15.5 C15.9 C16.0 
    1     5    35   122     8     2 
table(tcga_rna_sample_data$RNAseq_Solid_Tissue_Normal)

  0   1 
154  19 
table(tcga_rna_sample_data$Barrett.s.Esophagus.Reported)

           No Not Available           Yes 
          113            31            29 
table(tcga_rna_sample_data$tumor, tcga_rna_sample_data$Barrett.s.Esophagus.Reported)
   
     No Not Available Yes
  0   7             1   3
  1 106            30  26
nrow(tcga_rna_sample_data)
[1] 173

Relative Abudance Cutoff: 0.05

analysis.dat <- dat.rna # insert dataset to be used in analysis
avgRelAbundCutoff <- 0.05 # minimum average relative abundance for OTUs


otu.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(otu2) %>%
  dplyr::summarise(AverageRelativeAbundance=mean(Abundance, na.rm=T))%>%
  dplyr::filter(AverageRelativeAbundance>=avgRelAbundCutoff) %>%
  dplyr::arrange(desc(AverageRelativeAbundance))

kable(otu.dat[,c(2,1)], format="html", digits=3) %>%
  kable_styling(full_width = T)%>%
  scroll_box(width="100%", height="100%")
AverageRelativeAbundance otu2
0.426 Pseudomonas fluorescens group
0.320 Pseudomonas sp. UW4
0.075 Arthrobacter phenanthrenivorans
plot.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(OTU) %>%
  dplyr::mutate(aveAbund=mean(Abundance, na.rm=T)) %>%
  dplyr::ungroup() %>%
  dplyr::filter(aveAbund>=avgRelAbundCutoff) %>%
  dplyr::mutate(ID = as.factor(Patient_ID),
                Abundance = ifelse(is.na(Abundance), 0, Abundance)) %>%
  dplyr::select(sample_type, Phylum, Genus, ID, Abundance, aveAbund)


# widen plot.dat for dendro
dat.wide <- plot.dat %>%
  dplyr::mutate(
    ID = paste0(ID, "_",sample_type)
  ) %>%
  dplyr::select(ID, Genus, Abundance) %>%
  dplyr::group_by(ID, Genus) %>%
  dplyr::summarise(
    Abundance = mean(Abundance)
  ) %>%
  tidyr::pivot_wider(
    id_cols = Genus,
    names_from = ID,
    values_from = Abundance,
    values_fill = 0
  )
rn <- dat.wide$Genus
mat <- as.matrix(dat.wide[,-1])
rownames(mat) <- rn

sample_names <- colnames(mat)

# Obtain the dendrogram
dend <- as.dendrogram(hclust(dist(mat)))
dend_data <- dendro_data(dend)

# Setup the data, so that the layout is inverted (this is more 
# "clear" than simply using coord_flip())
segment_data <- with(
  segment(dend_data), 
  data.frame(x = y, y = x, xend = yend, yend = xend))
# Use the dendrogram label data to position the gene labels
gene_pos_table <- with(
  dend_data$labels, 
  data.frame(y_center = x, gene = as.character(label), height = 1))

# Table to position the samples
sample_pos_table <- data.frame(sample = sample_names) %>%
  dplyr::mutate(x_center = (1:n()), 
         width = 1)

# Neglecting the gap parameters
heatmap_data <- mat %>% 
  reshape2::melt(value.name = "expr", varnames = c("gene", "sample")) %>%
  left_join(gene_pos_table) %>%
  left_join(sample_pos_table)

# extract and rejoin sample IDs and sample_type names for plotting
# first for the heatmap data.frame
A <- str_split(heatmap_data$sample, "_")
heatmap_data$ID <- heatmap_data$sample_type <- "0"
for(i in 1:nrow(heatmap_data)){
  heatmap_data$ID[i] <- A[[i]][1]
  heatmap_data$sample_type[i] <- A[[i]][2]
}
# second for the sample position dataframe (dendo)
A <- str_split(sample_pos_table$sample, "_")
sample_pos_table$ID <- sample_pos_table$sample_type <- "0"
for(i in 1:nrow(sample_pos_table)){
  sample_pos_table$ID[i] <- A[[i]][1]
  sample_pos_table$sample_type[i] <- A[[i]][2]
}

# Limits for the vertical axes
gene_axis_limits <- with(
  gene_pos_table, 
  c(min(y_center - 0.5 * height), max(y_center + 0.5 * height))
) + 
  0.1 * c(-1, 1) # extra spacing: 0.1

## Build Heatmap Pieces
# ESCA w/ Barrets
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap1 <- ggplot(hmd, 
                   aes(x = x_center, y = y_center, fill = expr, 
                       height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        # margin: top, right, bottom, and left
        #axis.ticks.y = element_blank(),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid = element_blank(),
        legend.position = "none")

# Part 2: "ESCA-adjacent tissue w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA-adjacent tissue w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA-adjacent tissue w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap2 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("expr",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.1, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA-adj. w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank(),
        legend.position = "none")

# Part 3: "ESCA tissues w/o Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/o Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/o Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap3 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA w/o Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.75, vjust=0.5, angle = 90), 
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        axis.ticks.y = element_blank(),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank())


# Dendrogram plot
plt_dendr <- ggplot(segment_data) + 
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) + 
  scale_x_reverse(expand = c(0, 0.5)) + 
  scale_y_continuous(breaks = gene_pos_table$y_center, 
                     labels = gene_pos_table$gene, 
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "", y = "", colour = "", size = "") +
  theme_classic() + 
  theme(panel.grid = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"))

prntRelAbund <- avgRelAbundCutoff*100
p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+ 
  plot_layout(
    nrow=1, widths = c(0.5, 0.4, 0.15, 1),
    guides="collect"
  ) +
  plot_annotation(
    title="TCGA RNAseq Data showing average relative abundance of genera by individual",
    subtitle=paste0("Subset to OTU average relative abundance > ",prntRelAbund,"%")
  )
p

if(save.plots == T){
  ggsave(paste0("output/tcga-rnaseq-05-",save.Date,".pdf"), plot=p, units="in", width=11, height=5)
  ggsave(paste0("output/tcga-rnaseq-05-",save.Date,".png"), plot=p, units="in", width=11, height=5)
}

Relative Abudance Cutoff: 0.01

analysis.dat <- dat.rna # insert dataset to be used in analysis
avgRelAbundCutoff <- 0.01 # minimum average relative abundance for OTUs


otu.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(otu2) %>%
  dplyr::summarise(AverageRelativeAbundance=mean(Abundance, na.rm=T))%>%
  dplyr::filter(AverageRelativeAbundance>=avgRelAbundCutoff) %>%
  dplyr::arrange(desc(AverageRelativeAbundance))

kable(otu.dat[,c(2,1)], format="html", digits=3) %>%
  kable_styling(full_width = T)%>%
  scroll_box(width="100%", height="100%")
AverageRelativeAbundance otu2
0.426 Pseudomonas fluorescens group
0.320 Pseudomonas sp. UW4
0.075 Arthrobacter phenanthrenivorans
0.041 Pseudomonas sp. UK4
0.029 Bradyrhizobium sp. BTAi1
0.027 Pseudomonas putida group
0.011 Bacillus cereus group
plot.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(OTU) %>%
  dplyr::mutate(aveAbund=mean(Abundance, na.rm=T)) %>%
  dplyr::ungroup() %>%
  dplyr::filter(aveAbund>=avgRelAbundCutoff) %>%
  dplyr::mutate(ID = as.factor(Patient_ID),
                Abundance = ifelse(is.na(Abundance), 0, Abundance)) %>%
  dplyr::select(sample_type, Phylum, Genus, ID, Abundance, aveAbund)


# widen plot.dat for dendro
dat.wide <- plot.dat %>%
  dplyr::mutate(
    ID = paste0(ID, "_",sample_type)
  ) %>%
  dplyr::select(ID, Genus, Abundance) %>%
  dplyr::group_by(ID, Genus) %>%
  dplyr::summarise(
    Abundance = mean(Abundance)
  ) %>%
  tidyr::pivot_wider(
    id_cols = Genus,
    names_from = ID,
    values_from = Abundance,
    values_fill = 0
  )
rn <- dat.wide$Genus
mat <- as.matrix(dat.wide[,-1])
rownames(mat) <- rn

sample_names <- colnames(mat)

# Obtain the dendrogram
dend <- as.dendrogram(hclust(dist(mat)))
dend_data <- dendro_data(dend)

# Setup the data, so that the layout is inverted (this is more 
# "clear" than simply using coord_flip())
segment_data <- with(
  segment(dend_data), 
  data.frame(x = y, y = x, xend = yend, yend = xend))
# Use the dendrogram label data to position the gene labels
gene_pos_table <- with(
  dend_data$labels, 
  data.frame(y_center = x, gene = as.character(label), height = 1))

# Table to position the samples
sample_pos_table <- data.frame(sample = sample_names) %>%
  dplyr::mutate(x_center = (1:n()), 
         width = 1)

# Neglecting the gap parameters
heatmap_data <- mat %>% 
  reshape2::melt(value.name = "expr", varnames = c("gene", "sample")) %>%
  left_join(gene_pos_table) %>%
  left_join(sample_pos_table)

# extract and rejoin sample IDs and sample_type names for plotting
# first for the heatmap data.frame
A <- str_split(heatmap_data$sample, "_")
heatmap_data$ID <- heatmap_data$sample_type <- "0"
for(i in 1:nrow(heatmap_data)){
  heatmap_data$ID[i] <- A[[i]][1]
  heatmap_data$sample_type[i] <- A[[i]][2]
}
# second for the sample position dataframe (dendo)
A <- str_split(sample_pos_table$sample, "_")
sample_pos_table$ID <- sample_pos_table$sample_type <- "0"
for(i in 1:nrow(sample_pos_table)){
  sample_pos_table$ID[i] <- A[[i]][1]
  sample_pos_table$sample_type[i] <- A[[i]][2]
}

# Limits for the vertical axes
gene_axis_limits <- with(
  gene_pos_table, 
  c(min(y_center - 0.5 * height), max(y_center + 0.5 * height))
) + 
  0.1 * c(-1, 1) # extra spacing: 0.1

## Build Heatmap Pieces
# ESCA w/ Barrets
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap1 <- ggplot(hmd, 
                   aes(x = x_center, y = y_center, fill = expr, 
                       height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        # margin: top, right, bottom, and left
        #axis.ticks.y = element_blank(),
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid = element_blank(),
        legend.position = "none")

# Part 2: "ESCA-adjacent tissue w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA-adjacent tissue w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA-adjacent tissue w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap2 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("expr",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.1, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA-adj. w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank(),
        legend.position = "none")

# Part 3: "ESCA tissues w/o Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/o Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/o Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap3 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA w/o Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.75, vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank())


# Dendrogram plot
plt_dendr <- ggplot(segment_data) + 
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) + 
  scale_x_reverse(expand = c(0, 0.5)) + 
  scale_y_continuous(breaks = gene_pos_table$y_center, 
                     labels = gene_pos_table$gene, 
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "", y = "", colour = "", size = "") +
  theme_classic() + 
  theme(panel.grid = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"))

prntRelAbund <- avgRelAbundCutoff*100
p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+ 
  plot_layout(
    nrow=1, widths = c(0.5, 0.4, 0.15, 1),
    guides="collect"
  ) +
  plot_annotation(
    title="TCGA RNAseq Data showing average relative abundance of genera by individual",
    subtitle=paste0("Subset to OTU average relative abundance > ",prntRelAbund,"%")
  )
p

if(save.plots == T){
  ggsave(paste0("output/tcga-rnaseq-01-",save.Date,".pdf"), plot=p, units="in", width=11, height=5)
  ggsave(paste0("output/tcga-rnaseq-01-",save.Date,".png"), plot=p, units="in", width=11, height=5)
}

Relative Abudance Cutoff: 0.001

analysis.dat <- dat.rna # insert dataset to be used in analysis
avgRelAbundCutoff <- 0.001 # minimum average relative abundance for OTUs


otu.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(otu2) %>%
  dplyr::summarise(AverageRelativeAbundance=mean(Abundance, na.rm=T))%>%
  dplyr::filter(AverageRelativeAbundance>=avgRelAbundCutoff) %>%
  dplyr::arrange(desc(AverageRelativeAbundance))

kable(otu.dat[,c(2,1)], format="html", digits=3) %>%
  kable_styling(full_width = T)%>%
  scroll_box(width="100%", height="100%")
AverageRelativeAbundance otu2
0.426 Pseudomonas fluorescens group
0.320 Pseudomonas sp. UW4
0.075 Arthrobacter phenanthrenivorans
0.041 Pseudomonas sp. UK4
0.029 Bradyrhizobium sp. BTAi1
0.027 Pseudomonas putida group
0.011 Bacillus cereus group
0.009 Propionibacterium acnes
0.006 Escherichia coli
0.004 Arthrobacter sp. FB24
0.004 Bacillus megaterium
0.003 Haemophilus influenzae
0.003 Pseudomonas aeruginosa group
0.003 Pseudomonas brassicacearum
0.002 Enterobacter cloacae complex
0.001 Psychrobacter sp. PRwf-1
0.001 Enterococcus faecalis
0.001 Arthrobacter chlorophenolicus
plot.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(OTU) %>%
  dplyr::mutate(aveAbund=mean(Abundance, na.rm=T)) %>%
  dplyr::ungroup() %>%
  dplyr::filter(aveAbund>=avgRelAbundCutoff) %>%
  dplyr::mutate(ID = as.factor(Patient_ID),
                Abundance = ifelse(is.na(Abundance), 0, Abundance)) %>%
  dplyr::select(sample_type, Phylum, Genus, ID, Abundance, aveAbund)


# widen plot.dat for dendro
dat.wide <- plot.dat %>%
  dplyr::mutate(
    ID = paste0(ID, "_",sample_type)
  ) %>%
  dplyr::select(ID, Genus, Abundance) %>%
  dplyr::group_by(ID, Genus) %>%
  dplyr::summarise(
    Abundance = mean(Abundance)
  ) %>%
  tidyr::pivot_wider(
    id_cols = Genus,
    names_from = ID,
    values_from = Abundance,
    values_fill = 0
  )
rn <- dat.wide$Genus
mat <- as.matrix(dat.wide[,-1])
rownames(mat) <- rn

sample_names <- colnames(mat)

# Obtain the dendrogram
dend <- as.dendrogram(hclust(dist(mat)))
dend_data <- dendro_data(dend)

# Setup the data, so that the layout is inverted (this is more 
# "clear" than simply using coord_flip())
segment_data <- with(
  segment(dend_data), 
  data.frame(x = y, y = x, xend = yend, yend = xend))
# Use the dendrogram label data to position the gene labels
gene_pos_table <- with(
  dend_data$labels, 
  data.frame(y_center = x, gene = as.character(label), height = 1))

# Table to position the samples
sample_pos_table <- data.frame(sample = sample_names) %>%
  dplyr::mutate(x_center = (1:n()), 
         width = 1)

# Neglecting the gap parameters
heatmap_data <- mat %>% 
  reshape2::melt(value.name = "expr", varnames = c("gene", "sample")) %>%
  left_join(gene_pos_table) %>%
  left_join(sample_pos_table)

# extract and rejoin sample IDs and sample_type names for plotting
# first for the heatmap data.frame
A <- str_split(heatmap_data$sample, "_")
heatmap_data$ID <- heatmap_data$sample_type <- "0"
for(i in 1:nrow(heatmap_data)){
  heatmap_data$ID[i] <- A[[i]][1]
  heatmap_data$sample_type[i] <- A[[i]][2]
}
# second for the sample position dataframe (dendo)
A <- str_split(sample_pos_table$sample, "_")
sample_pos_table$ID <- sample_pos_table$sample_type <- "0"
for(i in 1:nrow(sample_pos_table)){
  sample_pos_table$ID[i] <- A[[i]][1]
  sample_pos_table$sample_type[i] <- A[[i]][2]
}

# Limits for the vertical axes
gene_axis_limits <- with(
  gene_pos_table, 
  c(min(y_center - 0.5 * height), max(y_center + 0.5 * height))
) + 
  0.1 * c(-1, 1) # extra spacing: 0.1

## Build Heatmap Pieces
# ESCA w/ Barrets
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap1 <- ggplot(hmd, 
                   aes(x = x_center, y = y_center, fill = expr, 
                       height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.10, breaks=c(0, 0.05, 0.10, 0.15, 0.20)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        # margin: top, right, bottom, and left
        #axis.ticks.y = element_blank(),
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid = element_blank(),
        legend.position = "none")

# Part 2: "ESCA-adjacent tissue w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA-adjacent tissue w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA-adjacent tissue w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap2 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("expr",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.10, breaks=c(0, 0.05, 0.10, 0.15, 0.20)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA-adj. w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank(),
        legend.position = "none")

# Part 3: "ESCA tissues w/o Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/o Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/o Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap3 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.10, breaks=c(0, 0.05, 0.10, 0.15, 0.20)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA w/o Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.75, vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank())


# Dendrogram plot
plt_dendr <- ggplot(segment_data) + 
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) + 
  scale_x_reverse(expand = c(0, 0.5)) + 
  scale_y_continuous(breaks = gene_pos_table$y_center, 
                     labels = gene_pos_table$gene, 
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "", y = "", colour = "", size = "") +
  theme_classic() + 
  theme(panel.grid = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"))

prntRelAbund <- avgRelAbundCutoff*100
p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+ 
  plot_layout(
    nrow=1, widths = c(0.5, 0.4, 0.15, 1),
    guides="collect"
  ) +
  plot_annotation(
    title="TCGA RNAseq Data showing average relative abundance of genera by individual",
    subtitle=paste0("Subset to OTU average relative abundance > ",prntRelAbund,"%")
  )
p

if(save.plots == T){
  ggsave(paste0("output/tcga-rnaseq-001-",save.Date,".pdf"), plot=p, units="in", width=11, height=5)
  ggsave(paste0("output/tcga-rnaseq-001.-",save.Date,".png"), plot=p, units="in", width=11, height=5)
}

Slide 7 - TCGA WGS Data

The heatmaps for slide76 focus on the TCGA WGS data.

get counts

Barretts, blood normal, non-tumor, tumor, tumor w/ barretts

tcga_wgs_sample_data <- dat.wgs.s %>%
  filter(OTU == "Fusobacterium nucleatum")
  
table(tcga_wgs_sample_data$tumor)

 0  1 
70 69 
table(tcga_wgs_sample_data$tissue)

C15.3 C15.4 C15.5 C15.9 
    2    28    72     9 
table(tcga_wgs_sample_data$WGS_Blood_Normal)

 0  1  2 
44 76 19 
table(tcga_wgs_sample_data$WGS_Solid_Tissue_Normal)

 0  1  2 
90 42  7 
table(tcga_wgs_sample_data$Barrett.s.Esophagus.Reported)

           No Not Available           Yes 
           54            47            10 
table(tcga_wgs_sample_data$tumor, tcga_wgs_sample_data$Barrett.s.Esophagus.Reported)
   
    No Not Available Yes
  0 26            24   6
  1 28            23   4
nrow(tcga_wgs_sample_data)
[1] 139

Relative Abudance Cutoff: 0.05

analysis.dat <- dat.wgs # insert dataset to be used in analysis
avgRelAbundCutoff <- 0.05 # minimum average relative abundance for OTUs


otu.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(otu2) %>%
  dplyr::summarise(AverageRelativeAbundance=mean(Abundance, na.rm=T))%>%
  dplyr::filter(AverageRelativeAbundance>=avgRelAbundCutoff) %>%
  dplyr::arrange(desc(AverageRelativeAbundance))

kable(otu.dat[,c(2,1)], format="html", digits=3) %>%
  kable_styling(full_width = T)%>%
  scroll_box(width="100%", height="100%")
AverageRelativeAbundance otu2
0.465 Coprobacillus sp. D7
0.106 Propionibacterium acnes
0.074 Haemophilus influenzae
0.057 Prevotella melaninogenica
plot.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(OTU) %>%
  dplyr::mutate(aveAbund=mean(Abundance, na.rm=T)) %>%
  dplyr::ungroup() %>%
  dplyr::filter(aveAbund>=avgRelAbundCutoff) %>%
  dplyr::mutate(ID = as.factor(Patient_ID),
                Abundance = ifelse(is.na(Abundance), 0, Abundance)) %>%
  dplyr::select(sample_type, Phylum, Genus, ID, Abundance, aveAbund)


# widen plot.dat for dendro
dat.wide <- plot.dat %>%
  dplyr::mutate(
    ID = paste0(ID, "_",sample_type)
  ) %>%
  dplyr::select(ID, Genus, Abundance) %>%
  dplyr::group_by(ID, Genus) %>%
  dplyr::summarise(
    Abundance = mean(Abundance)
  ) %>%
  tidyr::pivot_wider(
    id_cols = Genus,
    names_from = ID,
    values_from = Abundance,
    values_fill = 0
  )
rn <- dat.wide$Genus
mat <- as.matrix(dat.wide[,-1])
rownames(mat) <- rn

sample_names <- colnames(mat)

# Obtain the dendrogram
dend <- as.dendrogram(hclust(dist(mat)))
dend_data <- dendro_data(dend)

# Setup the data, so that the layout is inverted (this is more 
# "clear" than simply using coord_flip())
segment_data <- with(
  segment(dend_data), 
  data.frame(x = y, y = x, xend = yend, yend = xend))
# Use the dendrogram label data to position the gene labels
gene_pos_table <- with(
  dend_data$labels, 
  data.frame(y_center = x, gene = as.character(label), height = 1))

# Table to position the samples
sample_pos_table <- data.frame(sample = sample_names) %>%
  dplyr::mutate(x_center = (1:n()), 
         width = 1)

# Neglecting the gap parameters
heatmap_data <- mat %>% 
  reshape2::melt(value.name = "expr", varnames = c("gene", "sample")) %>%
  left_join(gene_pos_table) %>%
  left_join(sample_pos_table)

# extract and rejoin sample IDs and sample_type names for plotting
# first for the heatmap data.frame
A <- str_split(heatmap_data$sample, "_")
heatmap_data$ID <- heatmap_data$sample_type <- "0"
for(i in 1:nrow(heatmap_data)){
  heatmap_data$ID[i] <- A[[i]][1]
  heatmap_data$sample_type[i] <- A[[i]][2]
}
# second for the sample position dataframe (dendo)
A <- str_split(sample_pos_table$sample, "_")
sample_pos_table$ID <- sample_pos_table$sample_type <- "0"
for(i in 1:nrow(sample_pos_table)){
  sample_pos_table$ID[i] <- A[[i]][1]
  sample_pos_table$sample_type[i] <- A[[i]][2]
}

# Limits for the vertical axes
gene_axis_limits <- with(
  gene_pos_table, 
  c(min(y_center - 0.5 * height), max(y_center + 0.5 * height))
) + 
  0.1 * c(-1, 1) # extra spacing: 0.1

## Build Heatmap Pieces
# ESCA w/ Barrets
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap1 <- ggplot(hmd, 
                   aes(x = x_center, y = y_center, fill = expr, 
                       height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        # margin: top, right, bottom, and left
        #axis.ticks.y = element_blank(),
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid = element_blank(),
        legend.position = "none")

# Part 2: "ESCA-adjacent tissue w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA-adjacent tissue w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA-adjacent tissue w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap2 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("expr",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.1, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA-adj. w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank(),
        legend.position = "none")

# Part 3: "ESCA tissues w/o Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/o Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/o Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap3 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA w/o Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.75, vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank())


# Dendrogram plot
plt_dendr <- ggplot(segment_data) + 
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) + 
  scale_x_reverse(expand = c(0, 0.5)) + 
  scale_y_continuous(breaks = gene_pos_table$y_center, 
                     labels = gene_pos_table$gene, 
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "", y = "", colour = "", size = "") +
  theme_classic() + 
  theme(panel.grid = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"))

prntRelAbund <- avgRelAbundCutoff*100
p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+ 
  plot_layout(
    nrow=1, widths = c(0.5, 0.4, 0.15, 1),
    guides="collect"
  ) +
  plot_annotation(
    title="TCGA WGS Data showing average relative abundance of genera by individual",
    subtitle=paste0("Subset to OTU average relative abundance > ",prntRelAbund,"%")
  )
p

if(save.plots == T){
  ggsave(paste0("output/tcga-wgs-05-",save.Date,".pdf"), plot=p, units="in", width=11, height=5)
  ggsave(paste0("output/tcga-wgs-05-",save.Date,".png"), plot=p, units="in", width=11, height=5)
}

Relative Abudance Cutoff: 0.01

analysis.dat <- dat.wgs # insert dataset to be used in analysis
avgRelAbundCutoff <- 0.01 # minimum average relative abundance for OTUs


otu.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(otu2) %>%
  dplyr::summarise(AverageRelativeAbundance=mean(Abundance, na.rm=T))%>%
  dplyr::filter(AverageRelativeAbundance>=avgRelAbundCutoff) %>%
  dplyr::arrange(desc(AverageRelativeAbundance))

kable(otu.dat[,c(2,1)], format="html", digits=3) %>%
  kable_styling(full_width = T)%>%
  scroll_box(width="100%", height="100%")
AverageRelativeAbundance otu2
0.465 Coprobacillus sp. D7
0.106 Propionibacterium acnes
0.074 Haemophilus influenzae
0.057 Prevotella melaninogenica
0.028 Cyanothece sp. CCY0110
0.027 Fusobacterium nucleatum
0.022 Campylobacter concisus
0.019 Bradyrhizobium sp. BTAi1
0.016 Bradyrhizobium diazoefficiens
0.014 Streptococcus pneumoniae
0.012 Bradyrhizobium japonicum
plot.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(OTU) %>%
  dplyr::mutate(aveAbund=mean(Abundance, na.rm=T)) %>%
  dplyr::ungroup() %>%
  dplyr::filter(aveAbund>=avgRelAbundCutoff) %>%
  dplyr::mutate(ID = as.factor(Patient_ID),
                Abundance = ifelse(is.na(Abundance), 0, Abundance)) %>%
  dplyr::select(sample_type, Phylum, Genus, ID, Abundance, aveAbund)


# widen plot.dat for dendro
dat.wide <- plot.dat %>%
  dplyr::mutate(
    ID = paste0(ID, "_",sample_type)
  ) %>%
  dplyr::select(ID, Genus, Abundance) %>%
  dplyr::group_by(ID, Genus) %>%
  dplyr::summarise(
    Abundance = mean(Abundance)
  ) %>%
  tidyr::pivot_wider(
    id_cols = Genus,
    names_from = ID,
    values_from = Abundance,
    values_fill = 0
  )
rn <- dat.wide$Genus
mat <- as.matrix(dat.wide[,-1])
rownames(mat) <- rn

sample_names <- colnames(mat)

# Obtain the dendrogram
dend <- as.dendrogram(hclust(dist(mat)))
dend_data <- dendro_data(dend)

# Setup the data, so that the layout is inverted (this is more 
# "clear" than simply using coord_flip())
segment_data <- with(
  segment(dend_data), 
  data.frame(x = y, y = x, xend = yend, yend = xend))
# Use the dendrogram label data to position the gene labels
gene_pos_table <- with(
  dend_data$labels, 
  data.frame(y_center = x, gene = as.character(label), height = 1))

# Table to position the samples
sample_pos_table <- data.frame(sample = sample_names) %>%
  dplyr::mutate(x_center = (1:n()), 
         width = 1)

# Neglecting the gap parameters
heatmap_data <- mat %>% 
  reshape2::melt(value.name = "expr", varnames = c("gene", "sample")) %>%
  left_join(gene_pos_table) %>%
  left_join(sample_pos_table)

# extract and rejoin sample IDs and sample_type names for plotting
# first for the heatmap data.frame
A <- str_split(heatmap_data$sample, "_")
heatmap_data$ID <- heatmap_data$sample_type <- "0"
for(i in 1:nrow(heatmap_data)){
  heatmap_data$ID[i] <- A[[i]][1]
  heatmap_data$sample_type[i] <- A[[i]][2]
}
# second for the sample position dataframe (dendo)
A <- str_split(sample_pos_table$sample, "_")
sample_pos_table$ID <- sample_pos_table$sample_type <- "0"
for(i in 1:nrow(sample_pos_table)){
  sample_pos_table$ID[i] <- A[[i]][1]
  sample_pos_table$sample_type[i] <- A[[i]][2]
}

# Limits for the vertical axes
gene_axis_limits <- with(
  gene_pos_table, 
  c(min(y_center - 0.5 * height), max(y_center + 0.5 * height))
) + 
  0.1 * c(-1, 1) # extra spacing: 0.1

## Build Heatmap Pieces
# ESCA w/ Barrets
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap1 <- ggplot(hmd, 
                   aes(x = x_center, y = y_center, fill = expr, 
                       height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100],midpoint = 0.40,  breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        # margin: top, right, bottom, and left
        #axis.ticks.y = element_blank(),
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid = element_blank(),
        legend.position = "none")

# Part 2: "ESCA-adjacent tissue w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA-adjacent tissue w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA-adjacent tissue w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap2 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("expr",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.1, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA-adj. w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank(),
        legend.position = "none")

# Part 3: "ESCA tissues w/o Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/o Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/o Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap3 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA w/o Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.75, vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank())


# Dendrogram plot
plt_dendr <- ggplot(segment_data) + 
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) + 
  scale_x_reverse(expand = c(0, 0.5)) + 
  scale_y_continuous(breaks = gene_pos_table$y_center, 
                     labels = gene_pos_table$gene, 
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "", y = "", colour = "", size = "") +
  theme_classic() + 
  theme(panel.grid = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"))

prntRelAbund <- avgRelAbundCutoff*100
p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+ 
  plot_layout(
    nrow=1, widths = c(0.5, 0.4, 0.15, 1),
    guides="collect"
  ) +
  plot_annotation(
    title="TCGA WGS Data showing average relative abundance of genera by individual",
    subtitle=paste0("Subset to OTU average relative abundance > ",prntRelAbund,"%")
  )
p

if(save.plots == T){
  ggsave(paste0("output/tcga-wgs-01-",save.Date,".pdf"), plot=p, units="in", width=11, height=5)
  ggsave(paste0("output/tcga-wgs-01-",save.Date,".png"), plot=p, units="in", width=11, height=5)
}

Relative Abudance Cutoff: 0.001

analysis.dat <- dat.wgs # insert dataset to be used in analysis
avgRelAbundCutoff <- 0.001 # minimum average relative abundance for OTUs


otu.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(otu2) %>%
  dplyr::summarise(AverageRelativeAbundance=mean(Abundance, na.rm=T))%>%
  dplyr::filter(AverageRelativeAbundance>=avgRelAbundCutoff) %>%
  dplyr::arrange(desc(AverageRelativeAbundance))

kable(otu.dat[,c(2,1)], format="html", digits=3) %>%
  kable_styling(full_width = T)%>%
  scroll_box(width="100%", height="100%")
AverageRelativeAbundance otu2
0.465 Coprobacillus sp. D7
0.106 Propionibacterium acnes
0.074 Haemophilus influenzae
0.057 Prevotella melaninogenica
0.028 Cyanothece sp. CCY0110
0.027 Fusobacterium nucleatum
0.022 Campylobacter concisus
0.019 Bradyrhizobium sp. BTAi1
0.016 Bradyrhizobium diazoefficiens
0.014 Streptococcus pneumoniae
0.012 Bradyrhizobium japonicum
0.010 Bradyrhizobium sp. S23321
0.008 candidate division TM7 single-cell isolate TM7a
0.008 Veillonella parvula
0.008 Streptococcus parasanguinis
0.007 Rhodopseudomonas palustris
0.006 Pseudomonas fluorescens group
0.006 Bradyrhizobium sp. ORS 278
0.006 Streptococcus mitis
0.005 Beggiatoa sp. PS
0.004 Staphylococcus epidermidis
0.004 Corynebacterium tuberculostearicum
0.004 Streptococcus suis
0.003 Haemophilus parainfluenzae
0.003 Streptococcus pseudopneumoniae
0.003 Delftia sp. Cs1-4
0.003 Lactobacillus gasseri
0.003 [Eubacterium] hallii
0.002 Delftia acidovorans
0.002 Streptococcus oralis
0.002 Pseudomonas putida group
0.002 Streptococcus thermophilus
0.002 Pseudomonas aeruginosa group
0.002 Atopobium rimae
0.002 Prevotella intermedia
0.001 candidate division TM7 single-cell isolate TM7c
0.001 Acidovorax delafieldii
0.001 Staphylococcus capitis
0.001 Escherichia coli
0.001 Oligotropha carboxidovorans
0.001 Acinetobacter junii
0.001 Acidovorax ebreus
0.001 Lactococcus garvieae
0.001 Gemmata obscuriglobus
0.001 Lactobacillus crispatus
0.001 Streptococcus gordonii
0.001 Xanthomonas campestris
0.001 Bacteroides sp. 3_1_33FAA
plot.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(OTU) %>%
  dplyr::mutate(aveAbund=mean(Abundance, na.rm=T)) %>%
  dplyr::ungroup() %>%
  dplyr::filter(aveAbund>=avgRelAbundCutoff) %>%
  dplyr::mutate(ID = as.factor(Patient_ID),
                Abundance = ifelse(is.na(Abundance), 0, Abundance)) %>%
  dplyr::select(sample_type, Phylum, Genus, ID, Abundance, aveAbund)


# widen plot.dat for dendro
dat.wide <- plot.dat %>%
  dplyr::mutate(
    ID = paste0(ID, "_",sample_type)
  ) %>%
  dplyr::select(ID, Genus, Abundance) %>%
  dplyr::group_by(ID, Genus) %>%
  dplyr::summarise(
    Abundance = mean(Abundance)
  ) %>%
  tidyr::pivot_wider(
    id_cols = Genus,
    names_from = ID,
    values_from = Abundance,
    values_fill = 0
  )
rn <- dat.wide$Genus
mat <- as.matrix(dat.wide[,-1])
rownames(mat) <- rn

sample_names <- colnames(mat)

# Obtain the dendrogram
dend <- as.dendrogram(hclust(dist(mat)))
dend_data <- dendro_data(dend)

# Setup the data, so that the layout is inverted (this is more 
# "clear" than simply using coord_flip())
segment_data <- with(
  segment(dend_data), 
  data.frame(x = y, y = x, xend = yend, yend = xend))
# Use the dendrogram label data to position the gene labels
gene_pos_table <- with(
  dend_data$labels, 
  data.frame(y_center = x, gene = as.character(label), height = 1))

# Table to position the samples
sample_pos_table <- data.frame(sample = sample_names) %>%
  dplyr::mutate(x_center = (1:n()), 
         width = 1)

# Neglecting the gap parameters
heatmap_data <- mat %>% 
  reshape2::melt(value.name = "expr", varnames = c("gene", "sample")) %>%
  left_join(gene_pos_table) %>%
  left_join(sample_pos_table)

# extract and rejoin sample IDs and sample_type names for plotting
# first for the heatmap data.frame
A <- str_split(heatmap_data$sample, "_")
heatmap_data$ID <- heatmap_data$sample_type <- "0"
for(i in 1:nrow(heatmap_data)){
  heatmap_data$ID[i] <- A[[i]][1]
  heatmap_data$sample_type[i] <- A[[i]][2]
}
# second for the sample position dataframe (dendo)
A <- str_split(sample_pos_table$sample, "_")
sample_pos_table$ID <- sample_pos_table$sample_type <- "0"
for(i in 1:nrow(sample_pos_table)){
  sample_pos_table$ID[i] <- A[[i]][1]
  sample_pos_table$sample_type[i] <- A[[i]][2]
}

# Limits for the vertical axes
gene_axis_limits <- with(
  gene_pos_table, 
  c(min(y_center - 0.5 * height), max(y_center + 0.5 * height))
) + 
  0.1 * c(-1, 1) # extra spacing: 0.1

## Build Heatmap Pieces
# ESCA w/ Barrets
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap1 <- ggplot(hmd, 
                   aes(x = x_center, y = y_center, fill = expr, 
                       height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        # margin: top, right, bottom, and left
        #axis.ticks.y = element_blank(),
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid = element_blank(),
        legend.position = "none")

# Part 2: "ESCA-adjacent tissue w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA-adjacent tissue w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA-adjacent tissue w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap2 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("expr",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.1, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA-adj. w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank(),
        legend.position = "none")

# Part 3: "ESCA tissues w/o Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/o Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/o Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap3 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA w/o Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.75, vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank())


# Dendrogram plot
plt_dendr <- ggplot(segment_data) + 
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) + 
  scale_x_reverse(expand = c(0, 0.5)) + 
  scale_y_continuous(breaks = gene_pos_table$y_center, 
                     labels = gene_pos_table$gene, 
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "", y = "", colour = "", size = "") +
  theme_classic() + 
  theme(panel.grid = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"))

prntRelAbund <- avgRelAbundCutoff*100
p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+ 
  plot_layout(
    nrow=1, widths = c(0.5, 0.4, 0.15, 1),
    guides="collect"
  ) +
  plot_annotation(
    title="TCGA WGS Data showing average relative abundance of genera by individual",
    subtitle=paste0("Subset to OTU average relative abundance > ",prntRelAbund,"%")
  )
p

if(save.plots == T){
  ggsave(paste0("output/tcga-wgs-001-",save.Date,".pdf"), plot=p, units="in", width=11, height=5)
  ggsave(paste0("output/tcga-wgs-001-",save.Date,".png"), plot=p, units="in", width=11, height=5)
}

Slide 8 - NCI 16S Data with Specific OTUs

p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+ plt_hmap4+
  plot_layout(
    nrow=1, widths = c(0.5, 0.2, 1, 1, 1),
    guides="collect"
  ) +
  plot_annotation(
    title="NCI-16s Data showing average relative abundance of genera by individual"
  )
p

if(save.plots == T){
  ggsave(paste0("output/nci-",save.Date,".pdf"), plot=p, units="in", width=11, height=5)
  ggsave(paste0("output/nci-",save.Date,".png"), plot=p, units="in", width=11, height=5)
}

Slide 9 - TCGA RNAseq Data with Specific OTUs

The heatmaps for slide 9 focus on the TCGA RNAseq data. The difference here is we focus on 4 genera.

p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+ 
  plot_layout(
    nrow=1, widths = c(0.5, 0.4, 0.15, 1),
    guides="collect"
  ) +
  plot_annotation(
    title="TCGA RNAseq Data showing average relative abundance of genera by individual"
  )
p

if(save.plots == T){
  ggsave(paste0("output/tcga-rna-",save.Date,".pdf"), plot=p, units="in", width=11, height=5)
  ggsave(paste0("output/tcga-rna-",save.Date,".png"), plot=p, units="in", width=11, height=5)
}

Slide 10 - TCGA WGS Data with Specific OTUs

The heatmaps for slide76 focus on the TCGA WGS data.

analysis.dat <- dat.wgs.s # insert dataset to be used in analysis
avgRelAbundCutoff <- 0.05 # minimum average relative abundance for OTUs


otu.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(otu2) %>%
  dplyr::summarise(AverageRelativeAbundance=mean(Abundance, na.rm=T))

kable(otu.dat[,c(2,1)], format="html", digits=3) %>%
  kable_styling(full_width = T)%>%
  scroll_box(width="100%", height="100%")
AverageRelativeAbundance otu2
0.022 Campylobacter concisus
0.027 Fusobacterium nucleatum
0.057 Prevotella melaninogenica
0.006 Streptococcus mitis
0.002 Streptococcus oralis
0.008 Streptococcus parasanguinis
0.014 Streptococcus pneumoniae
0.001 Streptococcus salivarius
0.001 Streptococcus sanguinis
plot.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(OTU) %>%
  dplyr::mutate(aveAbund=mean(Abundance, na.rm=T)) %>%
  dplyr::ungroup() %>%
  dplyr::mutate(ID = as.factor(Patient_ID),
                Abundance = ifelse(is.na(Abundance), 0, Abundance)) %>%
  dplyr::select(sample_type, Phylum, Genus, ID, Abundance, aveAbund)


# widen plot.dat for dendro
dat.wide <- plot.dat %>%
  dplyr::mutate(
    ID = paste0(ID, "_",sample_type)
  ) %>%
  dplyr::select(ID, Genus, Abundance) %>%
  dplyr::group_by(ID, Genus) %>%
  dplyr::summarise(
    Abundance = mean(Abundance)
  ) %>%
  tidyr::pivot_wider(
    id_cols = Genus,
    names_from = ID,
    values_from = Abundance,
    values_fill = 0
  )
rn <- dat.wide$Genus
mat <- as.matrix(dat.wide[,-1])
rownames(mat) <- rn

sample_names <- colnames(mat)

# Obtain the dendrogram
dend <- as.dendrogram(hclust(dist(mat)))
dend_data <- dendro_data(dend)

# Setup the data, so that the layout is inverted (this is more 
# "clear" than simply using coord_flip())
segment_data <- with(
  segment(dend_data), 
  data.frame(x = y, y = x, xend = yend, yend = xend))
# Use the dendrogram label data to position the gene labels
gene_pos_table <- with(
  dend_data$labels, 
  data.frame(y_center = x, gene = as.character(label), height = 1))

# Table to position the samples
sample_pos_table <- data.frame(sample = sample_names) %>%
  dplyr::mutate(x_center = (1:n()), 
         width = 1)

# Neglecting the gap parameters
heatmap_data <- mat %>% 
  reshape2::melt(value.name = "expr", varnames = c("gene", "sample")) %>%
  left_join(gene_pos_table) %>%
  left_join(sample_pos_table)

# extract and rejoin sample IDs and sample_type names for plotting
# first for the heatmap data.frame
A <- str_split(heatmap_data$sample, "_")
heatmap_data$ID <- heatmap_data$sample_type <- "0"
for(i in 1:nrow(heatmap_data)){
  heatmap_data$ID[i] <- A[[i]][1]
  heatmap_data$sample_type[i] <- A[[i]][2]
}
# second for the sample position dataframe (dendo)
A <- str_split(sample_pos_table$sample, "_")
sample_pos_table$ID <- sample_pos_table$sample_type <- "0"
for(i in 1:nrow(sample_pos_table)){
  sample_pos_table$ID[i] <- A[[i]][1]
  sample_pos_table$sample_type[i] <- A[[i]][2]
}

# Limits for the vertical axes
gene_axis_limits <- with(
  gene_pos_table, 
  c(min(y_center - 0.5 * height), max(y_center + 0.5 * height))
) + 
  0.1 * c(-1, 1) # extra spacing: 0.1

## Build Heatmap Pieces
# ESCA w/ Barrets
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap1 <- ggplot(hmd, 
                   aes(x = x_center, y = y_center, fill = expr, 
                       height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        # margin: top, right, bottom, and left
        #axis.ticks.y = element_blank(),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid = element_blank(),
        legend.position = "none")

# Part 2: "ESCA-adjacent tissue w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA-adjacent tissue w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA-adjacent tissue w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap2 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("expr",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.1, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA-adj. w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank(),
        legend.position = "none")

# Part 3: "ESCA tissues w/o Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/o Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/o Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap3 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA w/o Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.75, vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank())


# Dendrogram plot
plt_dendr <- ggplot(segment_data) + 
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) + 
  scale_x_reverse(expand = c(0, 0.5)) + 
  scale_y_continuous(breaks = gene_pos_table$y_center, 
                     labels = gene_pos_table$gene, 
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "", y = "", colour = "", size = "") +
  theme_classic() + 
  theme(panel.grid = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"))

prntRelAbund <- avgRelAbundCutoff*100
p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+ 
  plot_layout(
    nrow=1, widths = c(0.5, 0.4, 0.15, 1),
    guides="collect"
  ) +
  plot_annotation(
    title="TCGA WGS Data showing average relative abundance of genera by individual"
  )
p

if(save.plots == T){
  ggsave(paste0("output/slide-10-heatmap-",save.Date,".pdf"), plot=p, units="in", width=11, height=5)
  ggsave(paste0("output/slide-10-heatmap-",save.Date,".png"), plot=p, units="in", width=11, height=5)
}

Stacked Bar Charts

Slide 11 - NCI 16s Data

plot.dat <- dat.16s.s %>% filter(sample_type != "0") %>%
  mutate(ID = as.factor(accession.number),
         Genus = substr(Genus, 4, 1000),
         Phylum = substr(Phylum, 4, 1000))%>%
  dplyr::group_by(sample_type, OTU1)%>%
  dplyr::summarise(
    Abundance = mean(Abundance, na.rm=T)
  )
  
p1 <- ggplot(plot.dat, aes(x=sample_type, y = Abundance, fill=OTU1)) +
  geom_bar(stat="identity")+
  labs(title="NCI 16s Data",
       x = "Tissue Group",
       y="Average Relative Abundance") +
  theme_classic()+
  theme(
    axis.text.x = element_text(angle=30, vjust=0.95, hjust=0.9),
    legend.title = element_blank()
  )
p1

if(save.plots == T){
  ggsave(paste0("output/slide-11-bar-chart-",save.Date,".pdf"), plot=p1, units="in", width=8, height=5)
  ggsave(paste0("output/slide-11-bar-chart-",save.Date,".png"), plot=p1, units="in", width=8, height=5)
}

Slide 12 - TCGA RNAseq Data

plot.dat <- dat.rna.s %>% filter(sample_type != "0", is.na(OTU1) == F)%>%
  dplyr::group_by(sample_type, OTU1)%>%
  dplyr::summarise(
    Abundance = mean(Abundance, na.rm=T)
  )
  
p1 <- ggplot(plot.dat, aes(x=sample_type, y = Abundance, fill=OTU1)) +
  geom_bar(stat="identity")+
  labs(title="TCGA RNAseq Data",
       x = "Tissue Group",
       y="Average Relative Abundance") +
  theme_classic() +
  theme(
    axis.text.x = element_text(angle=30, vjust=0.95, hjust=0.9),
    legend.title = element_blank()
  )
p1

if(save.plots == T){
  ggsave(paste0("output/slide-12-bar-chart-",save.Date,".pdf"), plot=p1, units="in", width=8, height=5)
  ggsave(paste0("output/slide-12-bar-chart-",save.Date,".png"), plot=p1, units="in", width=8, height=5)
}

Slide 13 - TCGA WGS Data

plot.dat <- dat.wgs.s %>% filter(sample_type != "0", is.na(OTU1) == F)%>%
  dplyr::group_by(sample_type, OTU1)%>%
  dplyr::summarise(
    Abundance = mean(Abundance, na.rm=T)
  )
  
p1 <- ggplot(plot.dat, aes(x=sample_type, y = Abundance, fill=OTU1)) +
  geom_bar(stat="identity")+
  labs(title="TCGA WGS Data",
       x = "Tissue Group",
       y="Average Relative Abundance") +
  scale_y_continuous(trans="sqrt", breaks=c(0,0.01, 0.1, 0.2, 0.4, 0.6))+
  theme_classic() +
  theme(
    axis.text.x = element_text(angle=30, vjust=0.95, hjust=0.9),
    legend.title = element_blank()
  )
p1

if(save.plots == T){
  ggsave(paste0("output/slide-13-bar-chart-",save.Date,".pdf"), plot=p1, units="in", width=8, height=5)
  ggsave(paste0("output/slide-13-bar-chart-",save.Date,".png"), plot=p1, units="in", width=8, height=5)
}

sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] cowplot_1.1.1     dendextend_1.16.0 ggdendro_0.1.23   reshape2_1.4.4   
 [5] car_3.1-0         carData_3.0-5     gvlma_1.0.0.3     patchwork_1.1.1  
 [9] viridis_0.6.2     viridisLite_0.4.0 gridExtra_2.3     xtable_1.8-4     
[13] kableExtra_1.3.4  MASS_7.3-58.1     data.table_1.14.2 readxl_1.4.0     
[17] forcats_0.5.1     stringr_1.4.0     dplyr_1.0.9       purrr_0.3.4      
[21] readr_2.1.2       tidyr_1.2.0       tibble_3.1.7      ggplot2_3.3.6    
[25] tidyverse_1.3.2   lmerTest_3.1-3    lme4_1.1-30       Matrix_1.5-1     
[29] vegan_2.6-2       lattice_0.20-45   permute_0.9-7     phyloseq_1.40.0  

loaded via a namespace (and not attached):
  [1] googledrive_2.0.0      minqa_1.2.4            colorspace_2.0-3      
  [4] ellipsis_0.3.2         rprojroot_2.0.3        XVector_0.36.0        
  [7] fs_1.5.2               rstudioapi_0.13        farver_2.1.1          
 [10] fansi_1.0.3            lubridate_1.8.0        xml2_1.3.3            
 [13] codetools_0.2-18       splines_4.2.2          cachem_1.0.6          
 [16] knitr_1.39             ade4_1.7-19            jsonlite_1.8.0        
 [19] workflowr_1.7.0        nloptr_2.0.3           broom_1.0.0           
 [22] cluster_2.1.4          dbplyr_2.2.1           BiocManager_1.30.18   
 [25] compiler_4.2.2         httr_1.4.3             backports_1.4.1       
 [28] assertthat_0.2.1       fastmap_1.1.0          gargle_1.2.0          
 [31] cli_3.4.1              later_1.3.0            htmltools_0.5.2       
 [34] tools_4.2.2            igraph_1.3.4           gtable_0.3.0          
 [37] glue_1.6.2             GenomeInfoDbData_1.2.8 Rcpp_1.0.8.3          
 [40] Biobase_2.56.0         cellranger_1.1.0       jquerylib_0.1.4       
 [43] vctrs_0.4.1            Biostrings_2.64.0      rhdf5filters_1.8.0    
 [46] multtest_2.52.0        svglite_2.1.0          ape_5.6-2             
 [49] nlme_3.1-160           iterators_1.0.14       xfun_0.31             
 [52] rvest_1.0.2            lifecycle_1.0.1        googlesheets4_1.0.0   
 [55] zlibbioc_1.42.0        scales_1.2.0           ragg_1.2.4            
 [58] hms_1.1.1              promises_1.2.0.1       parallel_4.2.2        
 [61] biomformat_1.24.0      rhdf5_2.40.0           yaml_2.3.5            
 [64] sass_0.4.2             stringi_1.7.6          highr_0.9             
 [67] S4Vectors_0.34.0       foreach_1.5.2          BiocGenerics_0.42.0   
 [70] boot_1.3-28            GenomeInfoDb_1.32.2    systemfonts_1.0.4     
 [73] rlang_1.0.6            pkgconfig_2.0.3        bitops_1.0-7          
 [76] evaluate_0.15          Rhdf5lib_1.18.2        labeling_0.4.2        
 [79] tidyselect_1.1.2       plyr_1.8.7             magrittr_2.0.3        
 [82] R6_2.5.1               IRanges_2.30.0         generics_0.1.3        
 [85] DBI_1.1.3              pillar_1.8.0           haven_2.5.0           
 [88] whisker_0.4            withr_2.5.0            mgcv_1.8-41           
 [91] abind_1.4-5            survival_3.4-0         RCurl_1.98-1.8        
 [94] modelr_0.1.8           crayon_1.5.1           utf8_1.2.2            
 [97] tzdb_0.3.0             rmarkdown_2.14         grid_4.2.2            
[100] git2r_0.30.1           webshot_0.5.3          reprex_2.0.1          
[103] digest_0.6.29          httpuv_1.6.5           numDeriv_2016.8-1.1   
[106] textshaping_0.3.6      stats4_4.2.2           munsell_0.5.0         
[109] bslib_0.4.0