Last updated: 2022-12-13

Checks: 6 1

Knit directory: esoph-micro-cancer-workflow/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: uncommitted changes

The R Markdown file has unstaged changes. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(20200916)

The command set.seed(20200916) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: be3dfad

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version be3dfad. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/.Rhistory
    Ignored:    data.zip
    Ignored:    data/
    Ignored:    output/Supplement Figure 2 (2).zip
    Ignored:    output/Updated Heatmaps.zip
    Ignored:    output/updated-figures-2022-12-12.zip
    Ignored:    output/updated_figures_2022_12_08.zip

Untracked files:
    Untracked:  output/nci-2022-12-12.pdf
    Untracked:  output/nci-2022-12-12.png
    Untracked:  output/slide-10-heatmap-2022-12-12.pdf
    Untracked:  output/slide-10-heatmap-2022-12-12.png
    Untracked:  output/slide-11-bar-chart-2022-12-12.pdf
    Untracked:  output/slide-11-bar-chart-2022-12-12.png
    Untracked:  output/slide-12-bar-chart-2022-12-12.pdf
    Untracked:  output/slide-12-bar-chart-2022-12-12.png
    Untracked:  output/slide-13-bar-chart-2022-12-12.pdf
    Untracked:  output/slide-13-bar-chart-2022-12-12.png
    Untracked:  output/slide-5-heatmap-001-2022-12-10.pdf
    Untracked:  output/slide-5-heatmap-001-2022-12-10.png
    Untracked:  output/slide-5-heatmap-001-2022-12-12.pdf
    Untracked:  output/slide-5-heatmap-001-2022-12-12.png
    Untracked:  output/slide-5-heatmap-01-2022-12-12.pdf
    Untracked:  output/slide-5-heatmap-01-2022-12-12.png
    Untracked:  output/slide-5-heatmap-05-2022-12-12.pdf
    Untracked:  output/slide-5-heatmap-05-2022-12-12.png
    Untracked:  output/slide-6-heatmap-001-2022-12-12.pdf
    Untracked:  output/slide-6-heatmap-001.-2022-12-12.png
    Untracked:  output/slide-6-heatmap-01-2022-12-12.pdf
    Untracked:  output/slide-6-heatmap-01-2022-12-12.png
    Untracked:  output/slide-6-heatmap-05-2022-12-12.pdf
    Untracked:  output/slide-6-heatmap-05-2022-12-12.png
    Untracked:  output/slide-7-heatmap-001-2022-12-12.pdf
    Untracked:  output/slide-7-heatmap-001-2022-12-12.png
    Untracked:  output/slide-7-heatmap-01-2022-12-12.pdf
    Untracked:  output/slide-7-heatmap-01-2022-12-12.png
    Untracked:  output/slide-7-heatmap-05-2022-12-12.pdf
    Untracked:  output/slide-7-heatmap-05-2022-12-12.png
    Untracked:  output/slide-8-heatmap-2022-12-12.pdf
    Untracked:  output/slide-8-heatmap-2022-12-12.png
    Untracked:  output/slide-9-heatmap-2022-12-12.pdf
    Untracked:  output/slide-9-heatmap-2022-12-12.png
    Untracked:  output/tcga-rna-001-2022-12-12.pdf
    Untracked:  output/tcga-rna-001.-2022-12-12.png
    Untracked:  output/tcga-rna-2022-12-12.pdf
    Untracked:  output/tcga-rna-2022-12-12.png
    Untracked:  output/updated-figures-2022-12-12/
    Untracked:  output/updated_figures_2022_12_08/

Unstaged changes:
    Modified:   analysis/slide-figures-heatmaps-dendrograms-updates.Rmd
    Modified:   output/slide-6-heatmap-001-2022-12-08.pdf
    Modified:   output/slide-6-heatmap-001.-2022-12-08.png

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/slide-figures-heatmaps-dendrograms-updates.Rmd) and HTML (docs/slide-figures-heatmaps-dendrograms-updates.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	be3dfad	noah-padgett	2022-12-08	updated with new color scheme
html	be3dfad	noah-padgett	2022-12-08	updated with new color scheme
Rmd	b1264da	noah-padgett	2022-09-07	revised figures for publication
Rmd	9e1d57c	noah-padgett	2022-08-19	updated and corrected heatmap color and labels
html	9e1d57c	noah-padgett	2022-08-19	updated and corrected heatmap color and labels
Rmd	b9a498d	noah-padgett	2021-01-26	fixed issue with subsetting
html	b9a498d	noah-padgett	2021-01-26	fixed issue with subsetting

Intro

This page contains the updated code for generating the joint figures of heatmaps with the dendrogram. The update was needed to fix how the OTUs were subset based on average relative abundance. Prior to ESCAh heatmap will be a table of the OTUs that meet the given criteria.

I figured out that I originally subset based on the OTU relative abundance for ESCAh individual and sample, meaning that I subset according the just the raw Abundance irrespective of any OTU average abundance. This mistake is corrected in this document.

Heatmaps and Dendrograms

Slide 5 - NCI 16S Data

get counts

Barretts, blood normal, non-tumor, tumor, tumor w/ barretts

nci_sample_data <- dat.16s.s %>%
  filter(OTU == "Fusobacterium nucleatum")
  
table(nci_sample_data$tumor)


 0  1 
93 65

table(nci_sample_data$tissue)


BO  N  T 
 5 88 65

table(nci_sample_data$Barretts.)


 N  Y 
71 87

table(nci_sample_data$tumor, nci_sample_data$Barretts.)

nrow(nci_sample_data)

[1] 158

Relative Abudance Cutoff: 0.05

analysis.dat <- dat.16s # insert dataset to be used in analysis
avgRelAbundCutoff <- 0.05 # minimum average relative abundance for OTUs


otu.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(OTU) %>%
  dplyr::summarise(AverageRelativeAbundance=mean(Abundance))%>%
  dplyr::filter(AverageRelativeAbundance>=avgRelAbundCutoff) %>%
  dplyr::arrange(desc(AverageRelativeAbundance))

kable(otu.dat[,c(2,1)], format="html", digits=3) %>%
  kable_styling(full_width = T)%>%
  scroll_box(width="100%", height="100%")

AverageRelativeAbundance	OTU
0.272	Streptococcus_dentisani:Streptococcus_infantis:Streptococcus_mitis:Streptococcus_oligofermentans:Streptococcus_oralis:Streptococcus_pneumoniae:Streptococcus_pseudopneumoniae:Streptococcus_sanguinis
0.065	Pseudomonas_rhodesiae
0.054	Prevotella_melaninogenica

plot.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(OTU) %>%
  dplyr::mutate(aveAbund=mean(Abundance)) %>%
  dplyr::ungroup() %>%
  dplyr::filter(aveAbund>=avgRelAbundCutoff) %>%
  dplyr::mutate(ID = as.factor(accession.number),
                Genus = substr(Genus, 4, 1000),
                Phylum = substr(Phylum, 4, 1000)) %>%
  dplyr::select(sample_type, Phylum, Genus, ID, Abundance, aveAbund)

# widen plot.dat for dendro
dat.wide <- plot.dat %>%
  dplyr::mutate(
    ID = paste0(ID, "_",sample_type)
  ) %>%
  dplyr::select(ID, Genus, Abundance) %>%
  dplyr::group_by(ID, Genus) %>%
  dplyr::summarise(
    Abundance = mean(Abundance)
  ) %>%
  tidyr::pivot_wider(
    id_cols = Genus,
    names_from = ID,
    values_from = Abundance,
    values_fill = 0
  )
rn <- dat.wide$Genus
mat <- as.matrix(dat.wide[,-1])
rownames(mat) <- rn

sample_names <- colnames(mat)

# Obtain the dendrogram
dend <- as.dendrogram(hclust(dist(mat)))
dend_data <- dendro_data(dend)

# Setup the data, so that the layout is inverted (this is more 
# "clear" than simply using coord_flip())
segment_data <- with(
  segment(dend_data), 
  data.frame(x = y, y = x, xend = yend, yend = xend))
# Use the dendrogram label data to position the gene labels
gene_pos_table <- with(
  dend_data$labels, 
  data.frame(y_center = x, gene = as.character(label), height = 1))

# Table to position the samples
sample_pos_table <- data.frame(sample = sample_names) %>%
  dplyr::mutate(x_center = (1:n()), 
         width = 1)

# Neglecting the gap parameters
heatmap_data <- mat %>% 
  reshape2::melt(value.name = "expr", varnames = c("gene", "sample")) %>%
  left_join(gene_pos_table) %>%
  left_join(sample_pos_table)

# extract and rejoin sample IDs and sample_type names for plotting
# first for the heatmap data.frame
A <- str_split(heatmap_data$sample, "_")
heatmap_data$ID <- heatmap_data$sample_type <- "0"
for(i in 1:nrow(heatmap_data)){
  heatmap_data$ID[i] <- A[[i]][1]
  heatmap_data$sample_type[i] <- A[[i]][2]
}
# second for the sample position dataframe (dendo)
A <- str_split(sample_pos_table$sample, "_")
sample_pos_table$ID <- sample_pos_table$sample_type <- "0"
for(i in 1:nrow(sample_pos_table)){
  sample_pos_table$ID[i] <- A[[i]][1]
  sample_pos_table$sample_type[i] <- A[[i]][2]
}

# Limits for the vertical axes
gene_axis_limits <- with(
  gene_pos_table, 
  c(min(y_center - 0.5 * height), max(y_center + 0.5 * height))
) + 
  0.1 * c(-1, 1) # extra spacing: 0.1

## Build Heatmap Pieces
# by parts
hmd <- filter(heatmap_data, sample_type == "Barretts Only")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "Barretts Only")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap1 <- ggplot(hmd, 
                   aes(x = x_center, y = y_center, fill = expr, 
                       height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2(
    "Abundance",
    trans="sqrt",
    high=myColor[1],
    mid = myColor[50],
    low=myColor[100]
    , midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "BO", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        # margin: top, right, bottom, and left
        #axis.ticks.y = element_blank(),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid = element_blank(),
        legend.position = "none")

# Part 2: "ESCA-adjacent tissue w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA-adjacent tissue w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA-adjacent tissue w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap2 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("expr",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.1, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA-adj. w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank(),
        legend.position = "none")

# Part 3: "ESCA tissues w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap3 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.75, vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank())


# Dendrogram plot
plt_dendr <- ggplot(segment_data) + 
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) + 
  scale_x_reverse(expand = c(0, 0.5)) + 
  scale_y_continuous(breaks = gene_pos_table$y_center, 
                     labels = gene_pos_table$gene, 
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "", y = "", colour = "", size = "") +
  theme_classic() + 
  theme(panel.grid = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"))

prntRelAbund <- avgRelAbundCutoff*100

p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+ 
  plot_layout(
    nrow=1, widths = c(0.5, 0.2, 1, 1),
    guides="collect"
  ) +
  plot_annotation(
    title="NCI-16s Data showing average relative abundance of genera by individual",
    subtitle=paste0("Subset to OTU average relative abundance > ",prntRelAbund,"%")
  )
p

if(save.plots == T){
  ggsave(paste0("output/nci-05-",save.Date,".pdf"), plot=p, units="in", width=11, height=5)
  ggsave(paste0("output/nci-05-",save.Date,".png"), plot=p, units="in", width=11, height=5)
}

Relative Abudance Cutoff: 0.01

analysis.dat <- dat.16s # insert dataset to be used in analysis
avgRelAbundCutoff <- 0.01 # minimum average relative abundance for OTUs


otu.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(OTU) %>%
  dplyr::summarise(AverageRelativeAbundance=mean(Abundance))%>%
  dplyr::filter(AverageRelativeAbundance>=avgRelAbundCutoff) %>%
  dplyr::arrange(desc(AverageRelativeAbundance))

kable(otu.dat[,c(2,1)], format="html", digits=3) %>%
  kable_styling(full_width = T)%>%
  scroll_box(width="100%", height="400px")

AverageRelativeAbundance	OTU
0.272	Streptococcus_dentisani:Streptococcus_infantis:Streptococcus_mitis:Streptococcus_oligofermentans:Streptococcus_oralis:Streptococcus_pneumoniae:Streptococcus_pseudopneumoniae:Streptococcus_sanguinis
0.065	Pseudomonas_rhodesiae
0.054	Prevotella_melaninogenica
0.049	Lactobacillus_gasseri:Lactobacillus_johnsonii
0.041	Veillonella_dispar
0.038	Stenotrophomonas_maltophilia
0.035	Fusobacterium_nucleatum
0.031	Acinetobacter_guillouiae
0.025	Granulicatella_adiacens:Granulicatella_paraadiacens
0.023	Rothia_mucilaginosa
0.022	Staphylococcus_epidermidis:Staphylococcus_hominis
0.020	Gemella_haemolysans
0.018	Haemophilus_parainfluenzae
0.016	Selenomonas_sputigena
0.014	Lachnoanaerobaculum_orale
0.014	otu19913:Actinobacillus_minor:Actinobacillus_porcinus:Actinobacillus_rossii:Haemophilus_paraphrohaemolyticus
0.011	Actinomyces_odontolyticus
0.010	Neisseria_flavescens
0.010	otu16698:Tannerella_forsythia

plot.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(OTU) %>%
  dplyr::mutate(aveAbund=mean(Abundance)) %>%
  dplyr::ungroup() %>%
  dplyr::filter(aveAbund>=avgRelAbundCutoff) %>%
  dplyr::mutate(ID = as.factor(accession.number),
                Genus = substr(Genus, 4, 1000),
                Phylum = substr(Phylum, 4, 1000)) %>%
  dplyr::select(sample_type, Phylum, Genus, ID, Abundance, aveAbund)

# widen plot.dat for dendro
dat.wide <- plot.dat %>%
  dplyr::mutate(
    ID = paste0(ID, "_",sample_type)
  ) %>%
  dplyr::select(ID, Genus, Abundance) %>%
  dplyr::group_by(ID, Genus) %>%
  dplyr::summarise(
    Abundance = mean(Abundance)
  ) %>%
  tidyr::pivot_wider(
    id_cols = Genus,
    names_from = ID,
    values_from = Abundance,
    values_fill = 0
  )
rn <- dat.wide$Genus
mat <- as.matrix(dat.wide[,-1])
rownames(mat) <- rn

sample_names <- colnames(mat)

# Obtain the dendrogram
dend <- as.dendrogram(hclust(dist(mat)))
dend_data <- dendro_data(dend)

# Setup the data, so that the layout is inverted (this is more 
# "clear" than simply using coord_flip())
segment_data <- with(
  segment(dend_data), 
  data.frame(x = y, y = x, xend = yend, yend = xend))
# Use the dendrogram label data to position the gene labels
gene_pos_table <- with(
  dend_data$labels, 
  data.frame(y_center = x, gene = as.character(label), height = 1))

# Table to position the samples
sample_pos_table <- data.frame(sample = sample_names) %>%
  dplyr::mutate(x_center = (1:n()), 
         width = 1)

# Neglecting the gap parameters
heatmap_data <- mat %>% 
  reshape2::melt(value.name = "expr", varnames = c("gene", "sample")) %>%
  left_join(gene_pos_table) %>%
  left_join(sample_pos_table)

# extract and rejoin sample IDs and sample_type names for plotting
# first for the heatmap data.frame
A <- str_split(heatmap_data$sample, "_")
heatmap_data$ID <- heatmap_data$sample_type <- "0"
for(i in 1:nrow(heatmap_data)){
  heatmap_data$ID[i] <- A[[i]][1]
  heatmap_data$sample_type[i] <- A[[i]][2]
}
# second for the sample position dataframe (dendo)
A <- str_split(sample_pos_table$sample, "_")
sample_pos_table$ID <- sample_pos_table$sample_type <- "0"
for(i in 1:nrow(sample_pos_table)){
  sample_pos_table$ID[i] <- A[[i]][1]
  sample_pos_table$sample_type[i] <- A[[i]][2]
}

# Limits for the vertical axes
gene_axis_limits <- with(
  gene_pos_table, 
  c(min(y_center - 0.5 * height), max(y_center + 0.5 * height))
) + 
  0.1 * c(-1, 1) # extra spacing: 0.1

## Build Heatmap Pieces
# by parts
hmd <- filter(heatmap_data, sample_type == "Barretts Only")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "Barretts Only")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap1 <- ggplot(hmd, 
                   aes(x = x_center, y = y_center, fill = expr, 
                       height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "BO", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        # margin: top, right, bottom, and left
        #axis.ticks.y = element_blank(),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid = element_blank(),
        legend.position = "none")

# Part 2: "ESCA-adjacent tissue w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA-adjacent tissue w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA-adjacent tissue w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap2 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("expr",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.1, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA-adj. w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank(),
        legend.position = "none")

# Part 3: "ESCA tissues w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap3 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.75, vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank(),
        legend.position = "none")

# Part 4: "ESCA tissues w/o Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/o Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/o Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap4 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA w/o Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.75, vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank())

# Dendrogram plot
plt_dendr <- ggplot(segment_data) + 
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) + 
  scale_x_reverse(expand = c(0, 0.5)) + 
  scale_y_continuous(breaks = gene_pos_table$y_center, 
                     labels = gene_pos_table$gene, 
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "", y = "", colour = "", size = "") +
  theme_classic() + 
  theme(panel.grid = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"))

prntRelAbund <- avgRelAbundCutoff*100

p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+plt_hmap4+ 
  plot_layout(
    nrow=1, widths = c(0.5, 0.2, 1, 1, 1),
    guides="collect"
  ) +
  plot_annotation(
    title="NCI-16s Data showing average relative abundance of genera by individual",
    subtitle=paste0("Subset to OTU average relative abundance > ",prntRelAbund,"%")
  )
p

if(save.plots == T){
  ggsave(paste0("output/nci-01-",save.Date,".pdf"), plot=p, units="in", width=11, height=7)
  ggsave(paste0("output/nci-01-",save.Date,".png"), plot=p, units="in", width=11, height=7)
}

Relative Abudance Cutoff: 0.001

analysis.dat <- dat.16s # insert dataset to be used in analysis
avgRelAbundCutoff <- 0.001 # minimum average relative abundance for OTUs


otu.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(OTU) %>%
  dplyr::summarise(AverageRelativeAbundance=mean(Abundance))%>%
  dplyr::filter(AverageRelativeAbundance>=avgRelAbundCutoff) %>%
  dplyr::arrange(desc(AverageRelativeAbundance))

kable(otu.dat[,c(2,1)], format="html", digits=3) %>%
  kable_styling(full_width = T)%>%
  scroll_box(width="100%", height="400px")

AverageRelativeAbundance	OTU
0.272	Streptococcus_dentisani:Streptococcus_infantis:Streptococcus_mitis:Streptococcus_oligofermentans:Streptococcus_oralis:Streptococcus_pneumoniae:Streptococcus_pseudopneumoniae:Streptococcus_sanguinis
0.065	Pseudomonas_rhodesiae
0.054	Prevotella_melaninogenica
0.049	Lactobacillus_gasseri:Lactobacillus_johnsonii
0.041	Veillonella_dispar
0.038	Stenotrophomonas_maltophilia
0.035	Fusobacterium_nucleatum
0.031	Acinetobacter_guillouiae
0.025	Granulicatella_adiacens:Granulicatella_paraadiacens
0.023	Rothia_mucilaginosa
0.022	Staphylococcus_epidermidis:Staphylococcus_hominis
0.020	Gemella_haemolysans
0.018	Haemophilus_parainfluenzae
0.016	Selenomonas_sputigena
0.014	Lachnoanaerobaculum_orale
0.014	otu19913:Actinobacillus_minor:Actinobacillus_porcinus:Actinobacillus_rossii:Haemophilus_paraphrohaemolyticus
0.011	Actinomyces_odontolyticus
0.010	Neisseria_flavescens
0.010	otu16698:Tannerella_forsythia
0.009	Sphingomonas_pituitosa
0.008	Clostridium_perfringens:Clostridium_thermophilus
0.008	Bacteroides_fragilis
0.008	Atopostipes_suicloacalis
0.008	Acidovorax_temperans
0.008	otu4906:Bacillus_oleronius
0.007	Leptotrichia_wadei
0.007	Corynebacterium_kroppenstedtii
0.006	TM7_phylum
0.005	Serratia_marcescens:Serratia_nematodiphila
0.005	Campylobacter_rectus:Campylobacter_showae
0.005	Geobacillus_stearothermophilus
0.005	Sarcina_maxima
0.004	Actinobacillus_rossii
0.004	Lysinibacillus_fusiformis:Lysinibacillus_mangiferahumi:Lysinibacillus_sphaericus
0.004	Cloacibacterium_normanense
0.004	Afipia_broomeae:Bradyrhizobium_elkanii:Bradyrhizobium_genosp:Bradyrhizobium_japonicum:Bradyrhizobium_jicamae:Bradyrhizobium_lablabi:Bradyrhizobium_pachyrhizi:Bradyrhizobium_retamae
0.003	Brucella_abortus:Brucella_canis:Brucella_ceti:Brucella_inopinata:Brucella_melitensis:Brucella_microti:Brucella_ovis:Brucella_pinnipedialis:Brucella_suis:Ochrobactrum_anthropi:Ochrobactrum_cytisi:Ochrobactrum_grignonense:Ochrobactrum_haematophilum:Ochrobactrum_lupini:Ochrobactrum_pecoris:Ochrobactrum_thiophenivorans:Ochrobactrum_tritici
0.003	Bifidobacterium_longum
0.003	Aggregatibacter_segnis
0.003	Brevundimonas_diminuta:Brevundimonas_naejangsanensis:Brevundimonas_vancanneytii:Nitrobacteria_hamadaniensis:Nitrobacteria_iranicum
0.003	Proteus_mirabilis:Proteus_penneri
0.003	Capnocytophaga_sputigena
0.003	Stomatobaculum_longum
0.003	Kocuria_kristinae
0.003	otu17927:Alloprevotella_rava
0.003	Parvimonas_micra
0.002	Megasphaera_micronuciformis
0.002	Oribacterium_sinus
0.002	Helicobacter_pylori
0.002	Faecalibacterium_prausnitzii
0.002	Enhydrobacter_aerosaccus
0.002	Castellaniella_denitrificans
0.002	otu21781:Centipeda_periodontii
0.002	Escherichia_coli:Escherichia_fergusonii
0.002	Paracoccus_aestuarii:Paracoccus_beibuensis:Paracoccus_marinus
0.002	Aquabacterium_commune
0.002	Citrobacter_freundii
0.002	Micrococcus_alkanovora:Micrococcus_antarcticus:Micrococcus_endophyticus:Micrococcus_indicus:Micrococcus_luteus:Micrococcus_yunnanensis
0.002	Lachnoanaerobaculum_umeaense
0.002	otu34876:Clostridium_indolis:Fusicatenibacter_saccharivorans
0.002	Lactococcus_lactis
0.002	Sutterella_wadsworthensis
0.001	Porphyromonas_endodontalis
0.001	Dialister_pneumosintes
0.001	Chryseobacterium_proteolyticum
0.001	Zimmermannella_faecalis
0.001	Caldimonas_hydrothermale:Caldimonas_manganoxidans:Caldimonas_taiwanensis
0.001	Massilia_lurida
0.001	Shinella_kummerowiae:Shinella_zoogloeoides
0.001	Rubellimicrobium_mesophilum
0.001	Blautia_wexlerae
0.001	Abiotrophia_defectiva
0.001	Mycobacterium_diernhoferi:Mycobacterium_fluoranthenivorans:Mycobacterium_frederiksbergense:Mycobacterium_hackensackense:Mycobacterium_sacrum
0.001	Enterobacter_aerogenes
0.001	Psychromonas_arctica
0.001	otu33966:Pseudolabrys_taiwanensis
0.001	Microvirga_guangxiensis
0.001	Tepidimonas_fonticaldi

plot.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(OTU) %>%
  dplyr::mutate(aveAbund=mean(Abundance)) %>%
  dplyr::ungroup() %>%
  dplyr::filter(aveAbund>=avgRelAbundCutoff) %>%
  dplyr::mutate(ID = as.factor(accession.number),
                Genus = substr(Genus, 4, 1000),
                Phylum = substr(Phylum, 4, 1000)) %>%
  dplyr::select(sample_type, Phylum, Genus, ID, Abundance, aveAbund)

# widen plot.dat for dendro
dat.wide <- plot.dat %>%
  dplyr::mutate(
    ID = paste0(ID, "_",sample_type)
  ) %>%
  dplyr::select(ID, Genus, Abundance) %>%
  dplyr::group_by(ID, Genus) %>%
  dplyr::summarise(
    Abundance = mean(Abundance)
  ) %>%
  tidyr::pivot_wider(
    id_cols = Genus,
    names_from = ID,
    values_from = Abundance,
    values_fill = 0
  )
rn <- dat.wide$Genus
mat <- as.matrix(dat.wide[,-1])
rownames(mat) <- rn

sample_names <- colnames(mat)

# Obtain the dendrogram
dend <- as.dendrogram(hclust(dist(mat)))
dend_data <- dendro_data(dend)

# Setup the data, so that the layout is inverted (this is more 
# "clear" than simply using coord_flip())
segment_data <- with(
  segment(dend_data), 
  data.frame(x = y, y = x, xend = yend, yend = xend))
# Use the dendrogram label data to position the gene labels
gene_pos_table <- with(
  dend_data$labels, 
  data.frame(y_center = x, gene = as.character(label), height = 1))

# Table to position the samples
sample_pos_table <- data.frame(sample = sample_names) %>%
  dplyr::mutate(x_center = (1:n()), 
         width = 1)

# Neglecting the gap parameters
heatmap_data <- mat %>% 
  reshape2::melt(value.name = "expr", varnames = c("gene", "sample")) %>%
  left_join(gene_pos_table) %>%
  left_join(sample_pos_table)

# extract and rejoin sample IDs and sample_type names for plotting
# first for the heatmap data.frame
A <- str_split(heatmap_data$sample, "_")
heatmap_data$ID <- heatmap_data$sample_type <- "0"
for(i in 1:nrow(heatmap_data)){
  heatmap_data$ID[i] <- A[[i]][1]
  heatmap_data$sample_type[i] <- A[[i]][2]
}
# second for the sample position dataframe (dendo)
A <- str_split(sample_pos_table$sample, "_")
sample_pos_table$ID <- sample_pos_table$sample_type <- "0"
for(i in 1:nrow(sample_pos_table)){
  sample_pos_table$ID[i] <- A[[i]][1]
  sample_pos_table$sample_type[i] <- A[[i]][2]
}

# Limits for the vertical axes
gene_axis_limits <- with(
  gene_pos_table, 
  c(min(y_center - 0.5 * height), max(y_center + 0.5 * height))
) + 
  0.1 * c(-1, 1) # extra spacing: 0.1

## Build Heatmap Pieces
# by parts
hmd <- filter(heatmap_data, sample_type == "Barretts Only")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "Barretts Only")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap1 <- ggplot(hmd, 
                   aes(x = x_center, y = y_center, fill = expr, 
                       height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "BO", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        # margin: top, right, bottom, and left
        #axis.ticks.y = element_blank(),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid = element_blank(),
        legend.position = "none")

# Part 2: "ESCA-adjacent tissue w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA-adjacent tissue w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA-adjacent tissue w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap2 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("expr",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.1, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA-adj. w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank(),
        legend.position = "none")

# Part 3: "ESCA tissues w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap3 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.75, vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank())


# Dendrogram plot
plt_dendr <- ggplot(segment_data) + 
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) + 
  scale_x_reverse(expand = c(0, 0.5)) + 
  scale_y_continuous(breaks = gene_pos_table$y_center, 
                     labels = gene_pos_table$gene, 
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "", y = "", colour = "", size = "") +
  theme_classic() + 
  theme(panel.grid = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"))

prntRelAbund <- avgRelAbundCutoff*100

p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+ 
  plot_layout(
    nrow=1, widths = c(0.5, 0.2, 1, 1),
    guides="collect"
  ) +
  plot_annotation(
    title="NCI-16s Data showing average relative abundance of genera by individual",
    subtitle=paste0("Subset to OTU average relative abundance > ",prntRelAbund,"%")
  )
p

if(save.plots == T){
  ggsave(paste0("output/nci-001-",save.Date,".pdf"), plot=p, units="in", width=11, height=10)
  ggsave(paste0("output/nci-001-",save.Date,".png"), plot=p, units="in", width=11, height=10)
}

# melt data down for use
dat.16s <- psmelt(phylo.data.nci.umd)

# set up variables
dat.16s$sample_type <- 0
dat.16s$sample_type[dat.16s$tissue=="T" &
                      dat.16s$Histology=="ADC"] <- "ESCA tissues"
dat.16s$sample_type[dat.16s$tissue=="N" &
                      dat.16s$Histology=="ADC"] <- "ESCA-adjacent tissue"
dat.16s$sample_type[dat.16s$tissue=="BO" &
                      dat.16s$Histology=="Barrets only"] <-"Barretts Only"

# make tumor vs normal variable
dat.16s$tumor.cat <- factor(dat.16s$tissue, levels=c("BO", "N", "T"), labels = c("Non-Tumor", "Non-Tumor", "Tumor"))

# dataset id
dat.16s$source <- "16s"

# plotting ids
dat.16s$X <- paste0(dat.16s$source, "-", dat.16s$tumor.cat)

# relabel as (0/1) for analysis
dat.16s$tumor <- as.numeric(factor(dat.16s$tissue, levels=c("BO", "N", "T"), labels = c("Non-Tumor", "Non-Tumor", "Tumor"))) - 1

# presence- absence
dat.16s$pres <- ifelse(dat.16s$Abundance > 0, 1, 0)
dat.16s$pres[is.na(dat.16s$pres)] <- 0

# subset to fuso. nuc. only
# Streptococcus sanguinis 
# Campylobacter concisus
# Prevotella spp.

dat.16s.s <- filter(
  dat.16s,
  OTU %in% c(
    "Fusobacterium_nucleatum",
    unique(dat.16s$OTU[dat.16s$OTU %like% "Streptococcus_"]),
    unique(dat.16s$OTU[dat.16s$OTU %like% "Campylobacter_"]),
    "Prevotella_melaninogenica")
)

# new names
dat.16s.s$OTU1 <- factor(
  dat.16s.s$OTU,
  levels = c(
    "Fusobacterium_nucleatum",
    "Streptococcus_dentisani:Streptococcus_infantis:Streptococcus_mitis:Streptococcus_oligofermentans:Streptococcus_oralis:Streptococcus_pneumoniae:Streptococcus_pseudopneumoniae:Streptococcus_sanguinis",
    "Campylobacter_rectus:Campylobacter_showae",
    "Prevotella_melaninogenica"
  ),
  labels = c(
    "Fusobacterium nucleatum",
    "Streptococcus sanguinis",
    "Campylobacter concisus",
    "Prevotella melaninogenica"
  )
)

# rename bacteria
dat.16s.s$OTU <- factor(
  dat.16s.s$OTU,
  levels = c(
    "Fusobacterium_nucleatum",
    "Streptococcus_dentisani:Streptococcus_infantis:Streptococcus_mitis:Streptococcus_oligofermentans:Streptococcus_oralis:Streptococcus_pneumoniae:Streptococcus_pseudopneumoniae:Streptococcus_sanguinis",
    "Campylobacter_rectus:Campylobacter_showae",
    "Prevotella_melaninogenica"
  ),
  labels = c(
    "Fusobacterium nucleatum",
    "Streptococcus spp.",
    "Campylobacter concisus",
    "Prevotella melaninogenica"
  )
)

analysis.dat <- dat.16s # insert dataset to be used in analysis
avgRelAbundCutoff <- 0.001 # minimum average relative abundance for OTUs


otu.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(OTU) %>%
  dplyr::summarise(AverageRelativeAbundance=mean(Abundance))%>%
  dplyr::filter(AverageRelativeAbundance>=avgRelAbundCutoff) %>%
  dplyr::arrange(desc(AverageRelativeAbundance))

kable(otu.dat[,c(2,1)], format="html", digits=3) %>%
  kable_styling(full_width = T)%>%
  scroll_box(width="100%", height="400px")

AverageRelativeAbundance	OTU
0.289	Streptococcus_dentisani:Streptococcus_infantis:Streptococcus_mitis:Streptococcus_oligofermentans:Streptococcus_oralis:Streptococcus_pneumoniae:Streptococcus_pseudopneumoniae:Streptococcus_sanguinis
0.057	Pseudomonas_rhodesiae
0.057	Prevotella_melaninogenica
0.043	Lactobacillus_gasseri:Lactobacillus_johnsonii
0.038	Veillonella_dispar
0.035	Fusobacterium_nucleatum
0.034	Stenotrophomonas_maltophilia
0.031	Acinetobacter_guillouiae
0.024	Rothia_mucilaginosa
0.023	Haemophilus_parainfluenzae
0.023	Gemella_haemolysans
0.022	Granulicatella_adiacens:Granulicatella_paraadiacens
0.021	Staphylococcus_epidermidis:Staphylococcus_hominis
0.013	Selenomonas_sputigena
0.013	Lachnoanaerobaculum_orale
0.012	otu19913:Actinobacillus_minor:Actinobacillus_porcinus:Actinobacillus_rossii:Haemophilus_paraphrohaemolyticus
0.011	Neisseria_flavescens
0.011	Lysinibacillus_fusiformis:Lysinibacillus_mangiferahumi:Lysinibacillus_sphaericus
0.011	Actinomyces_odontolyticus
0.009	otu4906:Bacillus_oleronius
0.009	otu16698:Tannerella_forsythia
0.008	Bacteroides_fragilis
0.008	Sphingomonas_pituitosa
0.007	Atopostipes_suicloacalis
0.007	Clostridium_perfringens:Clostridium_thermophilus
0.006	Acidovorax_temperans
0.006	Leptotrichia_wadei
0.006	Brucella_abortus:Brucella_canis:Brucella_ceti:Brucella_inopinata:Brucella_melitensis:Brucella_microti:Brucella_ovis:Brucella_pinnipedialis:Brucella_suis:Ochrobactrum_anthropi:Ochrobactrum_cytisi:Ochrobactrum_grignonense:Ochrobactrum_haematophilum:Ochrobactrum_lupini:Ochrobactrum_pecoris:Ochrobactrum_thiophenivorans:Ochrobactrum_tritici
0.006	Corynebacterium_kroppenstedtii
0.005	TM7_phylum
0.005	Geobacillus_stearothermophilus
0.005	Serratia_marcescens:Serratia_nematodiphila
0.005	Helicobacter_pylori
0.005	Campylobacter_rectus:Campylobacter_showae
0.004	Actinobacillus_rossii
0.004	otu17927:Alloprevotella_rava
0.004	otu17904:Deinococcus_aetherius
0.004	Sarcina_maxima
0.003	Cloacibacterium_normanense
0.003	Parvimonas_micra
0.003	Bifidobacterium_longum
0.003	Afipia_broomeae:Bradyrhizobium_elkanii:Bradyrhizobium_genosp:Bradyrhizobium_japonicum:Bradyrhizobium_jicamae:Bradyrhizobium_lablabi:Bradyrhizobium_pachyrhizi:Bradyrhizobium_retamae
0.003	Aggregatibacter_segnis
0.003	Proteus_mirabilis:Proteus_penneri
0.003	Oribacterium_sinus
0.003	Stomatobaculum_longum
0.003	Capnocytophaga_sputigena
0.003	Brevundimonas_diminuta:Brevundimonas_naejangsanensis:Brevundimonas_vancanneytii:Nitrobacteria_hamadaniensis:Nitrobacteria_iranicum
0.002	Collinsella_aerofaciens
0.002	Kocuria_kristinae
0.002	Megasphaera_micronuciformis
0.002	Castellaniella_denitrificans
0.002	Faecalibacterium_prausnitzii
0.002	otu21781:Centipeda_periodontii
0.002	Peptoniphilus_lacrimalis
0.002	Lactococcus_lactis
0.002	Sutterella_wadsworthensis
0.002	Enhydrobacter_aerosaccus
0.002	Escherichia_coli:Escherichia_fergusonii
0.002	Lachnoanaerobaculum_umeaense
0.002	Micrococcus_alkanovora:Micrococcus_antarcticus:Micrococcus_endophyticus:Micrococcus_indicus:Micrococcus_luteus:Micrococcus_yunnanensis
0.001	Dialister_pneumosintes
0.001	Paracoccus_aestuarii:Paracoccus_beibuensis:Paracoccus_marinus
0.001	Aquabacterium_commune
0.001	Citrobacter_freundii
0.001	Rubellimicrobium_mesophilum
0.001	Chryseobacterium_proteolyticum
0.001	Mycobacterium_diernhoferi:Mycobacterium_fluoranthenivorans:Mycobacterium_frederiksbergense:Mycobacterium_hackensackense:Mycobacterium_sacrum
0.001	otu34876:Clostridium_indolis:Fusicatenibacter_saccharivorans
0.001	Porphyromonas_endodontalis
0.001	otu19808:Gemmiger_formicilis
0.001	Solobacterium_moorei
0.001	Massilia_lurida
0.001	Zimmermannella_faecalis
0.001	Caldimonas_hydrothermale:Caldimonas_manganoxidans:Caldimonas_taiwanensis
0.001	Abiotrophia_defectiva
0.001	Shinella_kummerowiae:Shinella_zoogloeoides
0.001	Enterobacter_aerogenes

plot.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(OTU) %>%
  dplyr::mutate(aveAbund=mean(Abundance)) %>%
  dplyr::ungroup() %>%
  dplyr::filter(aveAbund>=avgRelAbundCutoff) %>%
  dplyr::mutate(ID = as.factor(accession.number),
                Genus = substr(Genus, 4, 1000),
                Phylum = substr(Phylum, 4, 1000)) %>%
  dplyr::select(sample_type, Phylum, Genus, ID, Abundance, aveAbund)

# widen plot.dat for dendro
dat.wide <- plot.dat %>%
  dplyr::mutate(
    ID = paste0(ID, "_",sample_type)
  ) %>%
  dplyr::select(ID, Genus, Abundance) %>%
  dplyr::group_by(ID, Genus) %>%
  dplyr::summarise(
    Abundance = mean(Abundance)
  ) %>%
  tidyr::pivot_wider(
    id_cols = Genus,
    names_from = ID,
    values_from = Abundance,
    values_fill = 0
  )
rn <- dat.wide$Genus
mat <- as.matrix(dat.wide[,-1])
rownames(mat) <- rn

sample_names <- colnames(mat)

# Obtain the dendrogram
dend <- as.dendrogram(hclust(dist(mat)))
dend_data <- dendro_data(dend)

# Setup the data, so that the layout is inverted (this is more 
# "clear" than simply using coord_flip())
segment_data <- with(
  segment(dend_data), 
  data.frame(x = y, y = x, xend = yend, yend = xend))
# Use the dendrogram label data to position the gene labels
gene_pos_table <- with(
  dend_data$labels, 
  data.frame(y_center = x, gene = as.character(label), height = 1))

# Table to position the samples
sample_pos_table <- data.frame(sample = sample_names) %>%
  dplyr::mutate(x_center = (1:n()), 
         width = 1)

# Neglecting the gap parameters
heatmap_data <- mat %>% 
  reshape2::melt(value.name = "expr", varnames = c("gene", "sample")) %>%
  left_join(gene_pos_table) %>%
  left_join(sample_pos_table)

# extract and rejoin sample IDs and sample_type names for plotting
# first for the heatmap data.frame
A <- str_split(heatmap_data$sample, "_")
heatmap_data$ID <- heatmap_data$sample_type <- "0"
for(i in 1:nrow(heatmap_data)){
  heatmap_data$ID[i] <- A[[i]][1]
  heatmap_data$sample_type[i] <- A[[i]][2]
}
# second for the sample position dataframe (dendo)
A <- str_split(sample_pos_table$sample, "_")
sample_pos_table$ID <- sample_pos_table$sample_type <- "0"
for(i in 1:nrow(sample_pos_table)){
  sample_pos_table$ID[i] <- A[[i]][1]
  sample_pos_table$sample_type[i] <- A[[i]][2]
}

# Limits for the vertical axes
gene_axis_limits <- with(
  gene_pos_table, 
  c(min(y_center - 0.5 * height), max(y_center + 0.5 * height))
) + 
  0.1 * c(-1, 1) # extra spacing: 0.1

## Build Heatmap Pieces
# by parts
hmd <- filter(heatmap_data, sample_type == "Barretts Only")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "Barretts Only")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap1 <- ggplot(hmd, 
                   aes(x = x_center, y = y_center, fill = expr, 
                       height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "BO", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        # margin: top, right, bottom, and left
        #axis.ticks.y = element_blank(),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid = element_blank(),
        legend.position = "none")

# Part 2: "ESCA-adjacent tissue
hmd <- filter(heatmap_data, sample_type == "ESCA-adjacent tissue")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA-adjacent tissue")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap2 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("expr",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.1, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA-adj.", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank(),
        legend.position = "none")

# Part 3: "ESCA tissues"
hmd <- filter(heatmap_data, sample_type == "ESCA tissues")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap3 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.75, vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank())


# Dendrogram plot
plt_dendr <- ggplot(segment_data) + 
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) + 
  scale_x_reverse(expand = c(0, 0.5)) + 
  scale_y_continuous(breaks = gene_pos_table$y_center, 
                     labels = gene_pos_table$gene, 
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "", y = "", colour = "", size = "") +
  theme_classic() + 
  theme(panel.grid = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"))

prntRelAbund <- avgRelAbundCutoff*100

p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+ 
  plot_layout(
    nrow=1, widths = c(0.5, 0.2, 1, 1),
    guides="collect"
  ) +
  plot_annotation(
    title="NCI-16s Data showing average relative abundance of genera by individual",
    subtitle=paste0("Subset to OTU average relative abundance > ",prntRelAbund,"%")
  )
p

if(save.plots == T){
  ggsave(paste0("output/nci-all-001-",save.Date,".pdf"), plot=p, units="in", width=11, height=10)
  ggsave(paste0("output/nci-all-001-",save.Date,".png"), plot=p, units="in", width=11, height=10)
}

Slide 6 - TCGA RNAseq Data

The heatmaps for slide 6 focus on the TCGA RNAseq data.

get counts

Barretts, blood normal, non-tumor, tumor, tumor w/ barretts

tcga_rna_sample_data <- dat.rna.s %>%
  filter(OTU == "Fusobacterium nucleatum")
  
table(tcga_rna_sample_data$tumor)


  0   1 
 11 162

table(tcga_rna_sample_data$tissue)


C15.1 C15.3 C15.4 C15.5 C15.9 C16.0 
    1     5    35   122     8     2

table(tcga_rna_sample_data$RNAseq_Solid_Tissue_Normal)


  0   1 
154  19

table(tcga_rna_sample_data$Barrett.s.Esophagus.Reported)


           No Not Available           Yes 
          113            31            29

table(tcga_rna_sample_data$tumor, tcga_rna_sample_data$Barrett.s.Esophagus.Reported)

   
     No Not Available Yes
  0   7             1   3
  1 106            30  26

nrow(tcga_rna_sample_data)

[1] 173

Relative Abudance Cutoff: 0.05

analysis.dat <- dat.rna # insert dataset to be used in analysis
avgRelAbundCutoff <- 0.05 # minimum average relative abundance for OTUs


otu.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(otu2) %>%
  dplyr::summarise(AverageRelativeAbundance=mean(Abundance, na.rm=T))%>%
  dplyr::filter(AverageRelativeAbundance>=avgRelAbundCutoff) %>%
  dplyr::arrange(desc(AverageRelativeAbundance))

kable(otu.dat[,c(2,1)], format="html", digits=3) %>%
  kable_styling(full_width = T)%>%
  scroll_box(width="100%", height="100%")

AverageRelativeAbundance	otu2
0.426	Pseudomonas fluorescens group
0.320	Pseudomonas sp. UW4
0.075	Arthrobacter phenanthrenivorans

plot.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(OTU) %>%
  dplyr::mutate(aveAbund=mean(Abundance, na.rm=T)) %>%
  dplyr::ungroup() %>%
  dplyr::filter(aveAbund>=avgRelAbundCutoff) %>%
  dplyr::mutate(ID = as.factor(Patient_ID),
                Abundance = ifelse(is.na(Abundance), 0, Abundance)) %>%
  dplyr::select(sample_type, Phylum, Genus, ID, Abundance, aveAbund)


# widen plot.dat for dendro
dat.wide <- plot.dat %>%
  dplyr::mutate(
    ID = paste0(ID, "_",sample_type)
  ) %>%
  dplyr::select(ID, Genus, Abundance) %>%
  dplyr::group_by(ID, Genus) %>%
  dplyr::summarise(
    Abundance = mean(Abundance)
  ) %>%
  tidyr::pivot_wider(
    id_cols = Genus,
    names_from = ID,
    values_from = Abundance,
    values_fill = 0
  )
rn <- dat.wide$Genus
mat <- as.matrix(dat.wide[,-1])
rownames(mat) <- rn

sample_names <- colnames(mat)

# Obtain the dendrogram
dend <- as.dendrogram(hclust(dist(mat)))
dend_data <- dendro_data(dend)

# Setup the data, so that the layout is inverted (this is more 
# "clear" than simply using coord_flip())
segment_data <- with(
  segment(dend_data), 
  data.frame(x = y, y = x, xend = yend, yend = xend))
# Use the dendrogram label data to position the gene labels
gene_pos_table <- with(
  dend_data$labels, 
  data.frame(y_center = x, gene = as.character(label), height = 1))

# Table to position the samples
sample_pos_table <- data.frame(sample = sample_names) %>%
  dplyr::mutate(x_center = (1:n()), 
         width = 1)

# Neglecting the gap parameters
heatmap_data <- mat %>% 
  reshape2::melt(value.name = "expr", varnames = c("gene", "sample")) %>%
  left_join(gene_pos_table) %>%
  left_join(sample_pos_table)

# extract and rejoin sample IDs and sample_type names for plotting
# first for the heatmap data.frame
A <- str_split(heatmap_data$sample, "_")
heatmap_data$ID <- heatmap_data$sample_type <- "0"
for(i in 1:nrow(heatmap_data)){
  heatmap_data$ID[i] <- A[[i]][1]
  heatmap_data$sample_type[i] <- A[[i]][2]
}
# second for the sample position dataframe (dendo)
A <- str_split(sample_pos_table$sample, "_")
sample_pos_table$ID <- sample_pos_table$sample_type <- "0"
for(i in 1:nrow(sample_pos_table)){
  sample_pos_table$ID[i] <- A[[i]][1]
  sample_pos_table$sample_type[i] <- A[[i]][2]
}

# Limits for the vertical axes
gene_axis_limits <- with(
  gene_pos_table, 
  c(min(y_center - 0.5 * height), max(y_center + 0.5 * height))
) + 
  0.1 * c(-1, 1) # extra spacing: 0.1

## Build Heatmap Pieces
# ESCA w/ Barrets
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap1 <- ggplot(hmd, 
                   aes(x = x_center, y = y_center, fill = expr, 
                       height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        # margin: top, right, bottom, and left
        #axis.ticks.y = element_blank(),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid = element_blank(),
        legend.position = "none")

# Part 2: "ESCA-adjacent tissue w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA-adjacent tissue w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA-adjacent tissue w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap2 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("expr",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.1, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA-adj. w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank(),
        legend.position = "none")

# Part 3: "ESCA tissues w/o Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/o Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/o Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap3 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA w/o Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.75, vjust=0.5, angle = 90), 
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        axis.ticks.y = element_blank(),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank())


# Dendrogram plot
plt_dendr <- ggplot(segment_data) + 
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) + 
  scale_x_reverse(expand = c(0, 0.5)) + 
  scale_y_continuous(breaks = gene_pos_table$y_center, 
                     labels = gene_pos_table$gene, 
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "", y = "", colour = "", size = "") +
  theme_classic() + 
  theme(panel.grid = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"))

prntRelAbund <- avgRelAbundCutoff*100

p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+ 
  plot_layout(
    nrow=1, widths = c(0.5, 0.4, 0.15, 1),
    guides="collect"
  ) +
  plot_annotation(
    title="TCGA RNAseq Data showing average relative abundance of genera by individual",
    subtitle=paste0("Subset to OTU average relative abundance > ",prntRelAbund,"%")
  )
p

if(save.plots == T){
  ggsave(paste0("output/tcga-rnaseq-05-",save.Date,".pdf"), plot=p, units="in", width=11, height=5)
  ggsave(paste0("output/tcga-rnaseq-05-",save.Date,".png"), plot=p, units="in", width=11, height=5)
}

Relative Abudance Cutoff: 0.01

analysis.dat <- dat.rna # insert dataset to be used in analysis
avgRelAbundCutoff <- 0.01 # minimum average relative abundance for OTUs


otu.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(otu2) %>%
  dplyr::summarise(AverageRelativeAbundance=mean(Abundance, na.rm=T))%>%
  dplyr::filter(AverageRelativeAbundance>=avgRelAbundCutoff) %>%
  dplyr::arrange(desc(AverageRelativeAbundance))

kable(otu.dat[,c(2,1)], format="html", digits=3) %>%
  kable_styling(full_width = T)%>%
  scroll_box(width="100%", height="100%")

AverageRelativeAbundance	otu2
0.426	Pseudomonas fluorescens group
0.320	Pseudomonas sp. UW4
0.075	Arthrobacter phenanthrenivorans
0.041	Pseudomonas sp. UK4
0.029	Bradyrhizobium sp. BTAi1
0.027	Pseudomonas putida group
0.011	Bacillus cereus group

plot.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(OTU) %>%
  dplyr::mutate(aveAbund=mean(Abundance, na.rm=T)) %>%
  dplyr::ungroup() %>%
  dplyr::filter(aveAbund>=avgRelAbundCutoff) %>%
  dplyr::mutate(ID = as.factor(Patient_ID),
                Abundance = ifelse(is.na(Abundance), 0, Abundance)) %>%
  dplyr::select(sample_type, Phylum, Genus, ID, Abundance, aveAbund)


# widen plot.dat for dendro
dat.wide <- plot.dat %>%
  dplyr::mutate(
    ID = paste0(ID, "_",sample_type)
  ) %>%
  dplyr::select(ID, Genus, Abundance) %>%
  dplyr::group_by(ID, Genus) %>%
  dplyr::summarise(
    Abundance = mean(Abundance)
  ) %>%
  tidyr::pivot_wider(
    id_cols = Genus,
    names_from = ID,
    values_from = Abundance,
    values_fill = 0
  )
rn <- dat.wide$Genus
mat <- as.matrix(dat.wide[,-1])
rownames(mat) <- rn

sample_names <- colnames(mat)

# Obtain the dendrogram
dend <- as.dendrogram(hclust(dist(mat)))
dend_data <- dendro_data(dend)

# Setup the data, so that the layout is inverted (this is more 
# "clear" than simply using coord_flip())
segment_data <- with(
  segment(dend_data), 
  data.frame(x = y, y = x, xend = yend, yend = xend))
# Use the dendrogram label data to position the gene labels
gene_pos_table <- with(
  dend_data$labels, 
  data.frame(y_center = x, gene = as.character(label), height = 1))

# Table to position the samples
sample_pos_table <- data.frame(sample = sample_names) %>%
  dplyr::mutate(x_center = (1:n()), 
         width = 1)

# Neglecting the gap parameters
heatmap_data <- mat %>% 
  reshape2::melt(value.name = "expr", varnames = c("gene", "sample")) %>%
  left_join(gene_pos_table) %>%
  left_join(sample_pos_table)

# extract and rejoin sample IDs and sample_type names for plotting
# first for the heatmap data.frame
A <- str_split(heatmap_data$sample, "_")
heatmap_data$ID <- heatmap_data$sample_type <- "0"
for(i in 1:nrow(heatmap_data)){
  heatmap_data$ID[i] <- A[[i]][1]
  heatmap_data$sample_type[i] <- A[[i]][2]
}
# second for the sample position dataframe (dendo)
A <- str_split(sample_pos_table$sample, "_")
sample_pos_table$ID <- sample_pos_table$sample_type <- "0"
for(i in 1:nrow(sample_pos_table)){
  sample_pos_table$ID[i] <- A[[i]][1]
  sample_pos_table$sample_type[i] <- A[[i]][2]
}

# Limits for the vertical axes
gene_axis_limits <- with(
  gene_pos_table, 
  c(min(y_center - 0.5 * height), max(y_center + 0.5 * height))
) + 
  0.1 * c(-1, 1) # extra spacing: 0.1

## Build Heatmap Pieces
# ESCA w/ Barrets
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap1 <- ggplot(hmd, 
                   aes(x = x_center, y = y_center, fill = expr, 
                       height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        # margin: top, right, bottom, and left
        #axis.ticks.y = element_blank(),
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid = element_blank(),
        legend.position = "none")

# Part 2: "ESCA-adjacent tissue w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA-adjacent tissue w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA-adjacent tissue w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap2 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("expr",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.1, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA-adj. w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank(),
        legend.position = "none")

# Part 3: "ESCA tissues w/o Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/o Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/o Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap3 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA w/o Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.75, vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank())


# Dendrogram plot
plt_dendr <- ggplot(segment_data) + 
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) + 
  scale_x_reverse(expand = c(0, 0.5)) + 
  scale_y_continuous(breaks = gene_pos_table$y_center, 
                     labels = gene_pos_table$gene, 
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "", y = "", colour = "", size = "") +
  theme_classic() + 
  theme(panel.grid = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"))

prntRelAbund <- avgRelAbundCutoff*100

p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+ 
  plot_layout(
    nrow=1, widths = c(0.5, 0.4, 0.15, 1),
    guides="collect"
  ) +
  plot_annotation(
    title="TCGA RNAseq Data showing average relative abundance of genera by individual",
    subtitle=paste0("Subset to OTU average relative abundance > ",prntRelAbund,"%")
  )
p

if(save.plots == T){
  ggsave(paste0("output/tcga-rnaseq-01-",save.Date,".pdf"), plot=p, units="in", width=11, height=5)
  ggsave(paste0("output/tcga-rnaseq-01-",save.Date,".png"), plot=p, units="in", width=11, height=5)
}

Relative Abudance Cutoff: 0.001

analysis.dat <- dat.rna # insert dataset to be used in analysis
avgRelAbundCutoff <- 0.001 # minimum average relative abundance for OTUs


otu.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(otu2) %>%
  dplyr::summarise(AverageRelativeAbundance=mean(Abundance, na.rm=T))%>%
  dplyr::filter(AverageRelativeAbundance>=avgRelAbundCutoff) %>%
  dplyr::arrange(desc(AverageRelativeAbundance))

kable(otu.dat[,c(2,1)], format="html", digits=3) %>%
  kable_styling(full_width = T)%>%
  scroll_box(width="100%", height="100%")

AverageRelativeAbundance	otu2
0.426	Pseudomonas fluorescens group
0.320	Pseudomonas sp. UW4
0.075	Arthrobacter phenanthrenivorans
0.041	Pseudomonas sp. UK4
0.029	Bradyrhizobium sp. BTAi1
0.027	Pseudomonas putida group
0.011	Bacillus cereus group
0.009	Propionibacterium acnes
0.006	Escherichia coli
0.004	Arthrobacter sp. FB24
0.004	Bacillus megaterium
0.003	Haemophilus influenzae
0.003	Pseudomonas aeruginosa group
0.003	Pseudomonas brassicacearum
0.002	Enterobacter cloacae complex
0.001	Psychrobacter sp. PRwf-1
0.001	Enterococcus faecalis
0.001	Arthrobacter chlorophenolicus

plot.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(OTU) %>%
  dplyr::mutate(aveAbund=mean(Abundance, na.rm=T)) %>%
  dplyr::ungroup() %>%
  dplyr::filter(aveAbund>=avgRelAbundCutoff) %>%
  dplyr::mutate(ID = as.factor(Patient_ID),
                Abundance = ifelse(is.na(Abundance), 0, Abundance)) %>%
  dplyr::select(sample_type, Phylum, Genus, ID, Abundance, aveAbund)


# widen plot.dat for dendro
dat.wide <- plot.dat %>%
  dplyr::mutate(
    ID = paste0(ID, "_",sample_type)
  ) %>%
  dplyr::select(ID, Genus, Abundance) %>%
  dplyr::group_by(ID, Genus) %>%
  dplyr::summarise(
    Abundance = mean(Abundance)
  ) %>%
  tidyr::pivot_wider(
    id_cols = Genus,
    names_from = ID,
    values_from = Abundance,
    values_fill = 0
  )
rn <- dat.wide$Genus
mat <- as.matrix(dat.wide[,-1])
rownames(mat) <- rn

sample_names <- colnames(mat)

# Obtain the dendrogram
dend <- as.dendrogram(hclust(dist(mat)))
dend_data <- dendro_data(dend)

# Setup the data, so that the layout is inverted (this is more 
# "clear" than simply using coord_flip())
segment_data <- with(
  segment(dend_data), 
  data.frame(x = y, y = x, xend = yend, yend = xend))
# Use the dendrogram label data to position the gene labels
gene_pos_table <- with(
  dend_data$labels, 
  data.frame(y_center = x, gene = as.character(label), height = 1))

# Table to position the samples
sample_pos_table <- data.frame(sample = sample_names) %>%
  dplyr::mutate(x_center = (1:n()), 
         width = 1)

# Neglecting the gap parameters
heatmap_data <- mat %>% 
  reshape2::melt(value.name = "expr", varnames = c("gene", "sample")) %>%
  left_join(gene_pos_table) %>%
  left_join(sample_pos_table)

# extract and rejoin sample IDs and sample_type names for plotting
# first for the heatmap data.frame
A <- str_split(heatmap_data$sample, "_")
heatmap_data$ID <- heatmap_data$sample_type <- "0"
for(i in 1:nrow(heatmap_data)){
  heatmap_data$ID[i] <- A[[i]][1]
  heatmap_data$sample_type[i] <- A[[i]][2]
}
# second for the sample position dataframe (dendo)
A <- str_split(sample_pos_table$sample, "_")
sample_pos_table$ID <- sample_pos_table$sample_type <- "0"
for(i in 1:nrow(sample_pos_table)){
  sample_pos_table$ID[i] <- A[[i]][1]
  sample_pos_table$sample_type[i] <- A[[i]][2]
}

# Limits for the vertical axes
gene_axis_limits <- with(
  gene_pos_table, 
  c(min(y_center - 0.5 * height), max(y_center + 0.5 * height))
) + 
  0.1 * c(-1, 1) # extra spacing: 0.1

## Build Heatmap Pieces
# ESCA w/ Barrets
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap1 <- ggplot(hmd, 
                   aes(x = x_center, y = y_center, fill = expr, 
                       height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.10, breaks=c(0, 0.05, 0.10, 0.15, 0.20)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        # margin: top, right, bottom, and left
        #axis.ticks.y = element_blank(),
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid = element_blank(),
        legend.position = "none")

# Part 2: "ESCA-adjacent tissue w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA-adjacent tissue w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA-adjacent tissue w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap2 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("expr",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.10, breaks=c(0, 0.05, 0.10, 0.15, 0.20)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA-adj. w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank(),
        legend.position = "none")

# Part 3: "ESCA tissues w/o Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/o Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/o Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap3 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.10, breaks=c(0, 0.05, 0.10, 0.15, 0.20)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA w/o Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.75, vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank())


# Dendrogram plot
plt_dendr <- ggplot(segment_data) + 
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) + 
  scale_x_reverse(expand = c(0, 0.5)) + 
  scale_y_continuous(breaks = gene_pos_table$y_center, 
                     labels = gene_pos_table$gene, 
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "", y = "", colour = "", size = "") +
  theme_classic() + 
  theme(panel.grid = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"))

prntRelAbund <- avgRelAbundCutoff*100

p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+ 
  plot_layout(
    nrow=1, widths = c(0.5, 0.4, 0.15, 1),
    guides="collect"
  ) +
  plot_annotation(
    title="TCGA RNAseq Data showing average relative abundance of genera by individual",
    subtitle=paste0("Subset to OTU average relative abundance > ",prntRelAbund,"%")
  )
p

if(save.plots == T){
  ggsave(paste0("output/tcga-rnaseq-001-",save.Date,".pdf"), plot=p, units="in", width=11, height=5)
  ggsave(paste0("output/tcga-rnaseq-001.-",save.Date,".png"), plot=p, units="in", width=11, height=5)
}

Slide 7 - TCGA WGS Data

The heatmaps for slide76 focus on the TCGA WGS data.

get counts

Barretts, blood normal, non-tumor, tumor, tumor w/ barretts

tcga_wgs_sample_data <- dat.wgs.s %>%
  filter(OTU == "Fusobacterium nucleatum")
  
table(tcga_wgs_sample_data$tumor)


 0  1 
70 69

table(tcga_wgs_sample_data$tissue)


C15.3 C15.4 C15.5 C15.9 
    2    28    72     9

table(tcga_wgs_sample_data$WGS_Blood_Normal)


 0  1  2 
44 76 19

table(tcga_wgs_sample_data$WGS_Solid_Tissue_Normal)


 0  1  2 
90 42  7

table(tcga_wgs_sample_data$Barrett.s.Esophagus.Reported)


           No Not Available           Yes 
           54            47            10

table(tcga_wgs_sample_data$tumor, tcga_wgs_sample_data$Barrett.s.Esophagus.Reported)

   
    No Not Available Yes
  0 26            24   6
  1 28            23   4

nrow(tcga_wgs_sample_data)

[1] 139

Relative Abudance Cutoff: 0.05

analysis.dat <- dat.wgs # insert dataset to be used in analysis
avgRelAbundCutoff <- 0.05 # minimum average relative abundance for OTUs


otu.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(otu2) %>%
  dplyr::summarise(AverageRelativeAbundance=mean(Abundance, na.rm=T))%>%
  dplyr::filter(AverageRelativeAbundance>=avgRelAbundCutoff) %>%
  dplyr::arrange(desc(AverageRelativeAbundance))

kable(otu.dat[,c(2,1)], format="html", digits=3) %>%
  kable_styling(full_width = T)%>%
  scroll_box(width="100%", height="100%")

AverageRelativeAbundance	otu2
0.465	Coprobacillus sp. D7
0.106	Propionibacterium acnes
0.074	Haemophilus influenzae
0.057	Prevotella melaninogenica

plot.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(OTU) %>%
  dplyr::mutate(aveAbund=mean(Abundance, na.rm=T)) %>%
  dplyr::ungroup() %>%
  dplyr::filter(aveAbund>=avgRelAbundCutoff) %>%
  dplyr::mutate(ID = as.factor(Patient_ID),
                Abundance = ifelse(is.na(Abundance), 0, Abundance)) %>%
  dplyr::select(sample_type, Phylum, Genus, ID, Abundance, aveAbund)


# widen plot.dat for dendro
dat.wide <- plot.dat %>%
  dplyr::mutate(
    ID = paste0(ID, "_",sample_type)
  ) %>%
  dplyr::select(ID, Genus, Abundance) %>%
  dplyr::group_by(ID, Genus) %>%
  dplyr::summarise(
    Abundance = mean(Abundance)
  ) %>%
  tidyr::pivot_wider(
    id_cols = Genus,
    names_from = ID,
    values_from = Abundance,
    values_fill = 0
  )
rn <- dat.wide$Genus
mat <- as.matrix(dat.wide[,-1])
rownames(mat) <- rn

sample_names <- colnames(mat)

# Obtain the dendrogram
dend <- as.dendrogram(hclust(dist(mat)))
dend_data <- dendro_data(dend)

# Setup the data, so that the layout is inverted (this is more 
# "clear" than simply using coord_flip())
segment_data <- with(
  segment(dend_data), 
  data.frame(x = y, y = x, xend = yend, yend = xend))
# Use the dendrogram label data to position the gene labels
gene_pos_table <- with(
  dend_data$labels, 
  data.frame(y_center = x, gene = as.character(label), height = 1))

# Table to position the samples
sample_pos_table <- data.frame(sample = sample_names) %>%
  dplyr::mutate(x_center = (1:n()), 
         width = 1)

# Neglecting the gap parameters
heatmap_data <- mat %>% 
  reshape2::melt(value.name = "expr", varnames = c("gene", "sample")) %>%
  left_join(gene_pos_table) %>%
  left_join(sample_pos_table)

# extract and rejoin sample IDs and sample_type names for plotting
# first for the heatmap data.frame
A <- str_split(heatmap_data$sample, "_")
heatmap_data$ID <- heatmap_data$sample_type <- "0"
for(i in 1:nrow(heatmap_data)){
  heatmap_data$ID[i] <- A[[i]][1]
  heatmap_data$sample_type[i] <- A[[i]][2]
}
# second for the sample position dataframe (dendo)
A <- str_split(sample_pos_table$sample, "_")
sample_pos_table$ID <- sample_pos_table$sample_type <- "0"
for(i in 1:nrow(sample_pos_table)){
  sample_pos_table$ID[i] <- A[[i]][1]
  sample_pos_table$sample_type[i] <- A[[i]][2]
}

# Limits for the vertical axes
gene_axis_limits <- with(
  gene_pos_table, 
  c(min(y_center - 0.5 * height), max(y_center + 0.5 * height))
) + 
  0.1 * c(-1, 1) # extra spacing: 0.1

## Build Heatmap Pieces
# ESCA w/ Barrets
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap1 <- ggplot(hmd, 
                   aes(x = x_center, y = y_center, fill = expr, 
                       height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        # margin: top, right, bottom, and left
        #axis.ticks.y = element_blank(),
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid = element_blank(),
        legend.position = "none")

# Part 2: "ESCA-adjacent tissue w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA-adjacent tissue w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA-adjacent tissue w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap2 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("expr",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.1, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA-adj. w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank(),
        legend.position = "none")

# Part 3: "ESCA tissues w/o Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/o Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/o Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap3 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA w/o Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.75, vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank())


# Dendrogram plot
plt_dendr <- ggplot(segment_data) + 
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) + 
  scale_x_reverse(expand = c(0, 0.5)) + 
  scale_y_continuous(breaks = gene_pos_table$y_center, 
                     labels = gene_pos_table$gene, 
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "", y = "", colour = "", size = "") +
  theme_classic() + 
  theme(panel.grid = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"))

prntRelAbund <- avgRelAbundCutoff*100

p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+ 
  plot_layout(
    nrow=1, widths = c(0.5, 0.4, 0.15, 1),
    guides="collect"
  ) +
  plot_annotation(
    title="TCGA WGS Data showing average relative abundance of genera by individual",
    subtitle=paste0("Subset to OTU average relative abundance > ",prntRelAbund,"%")
  )
p

if(save.plots == T){
  ggsave(paste0("output/tcga-wgs-05-",save.Date,".pdf"), plot=p, units="in", width=11, height=5)
  ggsave(paste0("output/tcga-wgs-05-",save.Date,".png"), plot=p, units="in", width=11, height=5)
}

Relative Abudance Cutoff: 0.01

analysis.dat <- dat.wgs # insert dataset to be used in analysis
avgRelAbundCutoff <- 0.01 # minimum average relative abundance for OTUs


otu.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(otu2) %>%
  dplyr::summarise(AverageRelativeAbundance=mean(Abundance, na.rm=T))%>%
  dplyr::filter(AverageRelativeAbundance>=avgRelAbundCutoff) %>%
  dplyr::arrange(desc(AverageRelativeAbundance))

kable(otu.dat[,c(2,1)], format="html", digits=3) %>%
  kable_styling(full_width = T)%>%
  scroll_box(width="100%", height="100%")

AverageRelativeAbundance	otu2
0.465	Coprobacillus sp. D7
0.106	Propionibacterium acnes
0.074	Haemophilus influenzae
0.057	Prevotella melaninogenica
0.028	Cyanothece sp. CCY0110
0.027	Fusobacterium nucleatum
0.022	Campylobacter concisus
0.019	Bradyrhizobium sp. BTAi1
0.016	Bradyrhizobium diazoefficiens
0.014	Streptococcus pneumoniae
0.012	Bradyrhizobium japonicum

plot.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(OTU) %>%
  dplyr::mutate(aveAbund=mean(Abundance, na.rm=T)) %>%
  dplyr::ungroup() %>%
  dplyr::filter(aveAbund>=avgRelAbundCutoff) %>%
  dplyr::mutate(ID = as.factor(Patient_ID),
                Abundance = ifelse(is.na(Abundance), 0, Abundance)) %>%
  dplyr::select(sample_type, Phylum, Genus, ID, Abundance, aveAbund)


# widen plot.dat for dendro
dat.wide <- plot.dat %>%
  dplyr::mutate(
    ID = paste0(ID, "_",sample_type)
  ) %>%
  dplyr::select(ID, Genus, Abundance) %>%
  dplyr::group_by(ID, Genus) %>%
  dplyr::summarise(
    Abundance = mean(Abundance)
  ) %>%
  tidyr::pivot_wider(
    id_cols = Genus,
    names_from = ID,
    values_from = Abundance,
    values_fill = 0
  )
rn <- dat.wide$Genus
mat <- as.matrix(dat.wide[,-1])
rownames(mat) <- rn

sample_names <- colnames(mat)

# Obtain the dendrogram
dend <- as.dendrogram(hclust(dist(mat)))
dend_data <- dendro_data(dend)

# Setup the data, so that the layout is inverted (this is more 
# "clear" than simply using coord_flip())
segment_data <- with(
  segment(dend_data), 
  data.frame(x = y, y = x, xend = yend, yend = xend))
# Use the dendrogram label data to position the gene labels
gene_pos_table <- with(
  dend_data$labels, 
  data.frame(y_center = x, gene = as.character(label), height = 1))

# Table to position the samples
sample_pos_table <- data.frame(sample = sample_names) %>%
  dplyr::mutate(x_center = (1:n()), 
         width = 1)

# Neglecting the gap parameters
heatmap_data <- mat %>% 
  reshape2::melt(value.name = "expr", varnames = c("gene", "sample")) %>%
  left_join(gene_pos_table) %>%
  left_join(sample_pos_table)

# extract and rejoin sample IDs and sample_type names for plotting
# first for the heatmap data.frame
A <- str_split(heatmap_data$sample, "_")
heatmap_data$ID <- heatmap_data$sample_type <- "0"
for(i in 1:nrow(heatmap_data)){
  heatmap_data$ID[i] <- A[[i]][1]
  heatmap_data$sample_type[i] <- A[[i]][2]
}
# second for the sample position dataframe (dendo)
A <- str_split(sample_pos_table$sample, "_")
sample_pos_table$ID <- sample_pos_table$sample_type <- "0"
for(i in 1:nrow(sample_pos_table)){
  sample_pos_table$ID[i] <- A[[i]][1]
  sample_pos_table$sample_type[i] <- A[[i]][2]
}

# Limits for the vertical axes
gene_axis_limits <- with(
  gene_pos_table, 
  c(min(y_center - 0.5 * height), max(y_center + 0.5 * height))
) + 
  0.1 * c(-1, 1) # extra spacing: 0.1

## Build Heatmap Pieces
# ESCA w/ Barrets
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap1 <- ggplot(hmd, 
                   aes(x = x_center, y = y_center, fill = expr, 
                       height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100],midpoint = 0.40,  breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        # margin: top, right, bottom, and left
        #axis.ticks.y = element_blank(),
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid = element_blank(),
        legend.position = "none")

# Part 2: "ESCA-adjacent tissue w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA-adjacent tissue w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA-adjacent tissue w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap2 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("expr",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.1, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA-adj. w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank(),
        legend.position = "none")

# Part 3: "ESCA tissues w/o Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/o Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/o Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap3 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA w/o Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.75, vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank())


# Dendrogram plot
plt_dendr <- ggplot(segment_data) + 
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) + 
  scale_x_reverse(expand = c(0, 0.5)) + 
  scale_y_continuous(breaks = gene_pos_table$y_center, 
                     labels = gene_pos_table$gene, 
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "", y = "", colour = "", size = "") +
  theme_classic() + 
  theme(panel.grid = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"))

prntRelAbund <- avgRelAbundCutoff*100

p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+ 
  plot_layout(
    nrow=1, widths = c(0.5, 0.4, 0.15, 1),
    guides="collect"
  ) +
  plot_annotation(
    title="TCGA WGS Data showing average relative abundance of genera by individual",
    subtitle=paste0("Subset to OTU average relative abundance > ",prntRelAbund,"%")
  )
p

if(save.plots == T){
  ggsave(paste0("output/tcga-wgs-01-",save.Date,".pdf"), plot=p, units="in", width=11, height=5)
  ggsave(paste0("output/tcga-wgs-01-",save.Date,".png"), plot=p, units="in", width=11, height=5)
}

Relative Abudance Cutoff: 0.001

analysis.dat <- dat.wgs # insert dataset to be used in analysis
avgRelAbundCutoff <- 0.001 # minimum average relative abundance for OTUs


otu.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(otu2) %>%
  dplyr::summarise(AverageRelativeAbundance=mean(Abundance, na.rm=T))%>%
  dplyr::filter(AverageRelativeAbundance>=avgRelAbundCutoff) %>%
  dplyr::arrange(desc(AverageRelativeAbundance))

kable(otu.dat[,c(2,1)], format="html", digits=3) %>%
  kable_styling(full_width = T)%>%
  scroll_box(width="100%", height="100%")

AverageRelativeAbundance	otu2
0.465	Coprobacillus sp. D7
0.106	Propionibacterium acnes
0.074	Haemophilus influenzae
0.057	Prevotella melaninogenica
0.028	Cyanothece sp. CCY0110
0.027	Fusobacterium nucleatum
0.022	Campylobacter concisus
0.019	Bradyrhizobium sp. BTAi1
0.016	Bradyrhizobium diazoefficiens
0.014	Streptococcus pneumoniae
0.012	Bradyrhizobium japonicum
0.010	Bradyrhizobium sp. S23321
0.008	candidate division TM7 single-cell isolate TM7a
0.008	Veillonella parvula
0.008	Streptococcus parasanguinis
0.007	Rhodopseudomonas palustris
0.006	Pseudomonas fluorescens group
0.006	Bradyrhizobium sp. ORS 278
0.006	Streptococcus mitis
0.005	Beggiatoa sp. PS
0.004	Staphylococcus epidermidis
0.004	Corynebacterium tuberculostearicum
0.004	Streptococcus suis
0.003	Haemophilus parainfluenzae
0.003	Streptococcus pseudopneumoniae
0.003	Delftia sp. Cs1-4
0.003	Lactobacillus gasseri
0.003	[Eubacterium] hallii
0.002	Delftia acidovorans
0.002	Streptococcus oralis
0.002	Pseudomonas putida group
0.002	Streptococcus thermophilus
0.002	Pseudomonas aeruginosa group
0.002	Atopobium rimae
0.002	Prevotella intermedia
0.001	candidate division TM7 single-cell isolate TM7c
0.001	Acidovorax delafieldii
0.001	Staphylococcus capitis
0.001	Escherichia coli
0.001	Oligotropha carboxidovorans
0.001	Acinetobacter junii
0.001	Acidovorax ebreus
0.001	Lactococcus garvieae
0.001	Gemmata obscuriglobus
0.001	Lactobacillus crispatus
0.001	Streptococcus gordonii
0.001	Xanthomonas campestris
0.001	Bacteroides sp. 3_1_33FAA

plot.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(OTU) %>%
  dplyr::mutate(aveAbund=mean(Abundance, na.rm=T)) %>%
  dplyr::ungroup() %>%
  dplyr::filter(aveAbund>=avgRelAbundCutoff) %>%
  dplyr::mutate(ID = as.factor(Patient_ID),
                Abundance = ifelse(is.na(Abundance), 0, Abundance)) %>%
  dplyr::select(sample_type, Phylum, Genus, ID, Abundance, aveAbund)


# widen plot.dat for dendro
dat.wide <- plot.dat %>%
  dplyr::mutate(
    ID = paste0(ID, "_",sample_type)
  ) %>%
  dplyr::select(ID, Genus, Abundance) %>%
  dplyr::group_by(ID, Genus) %>%
  dplyr::summarise(
    Abundance = mean(Abundance)
  ) %>%
  tidyr::pivot_wider(
    id_cols = Genus,
    names_from = ID,
    values_from = Abundance,
    values_fill = 0
  )
rn <- dat.wide$Genus
mat <- as.matrix(dat.wide[,-1])
rownames(mat) <- rn

sample_names <- colnames(mat)

# Obtain the dendrogram
dend <- as.dendrogram(hclust(dist(mat)))
dend_data <- dendro_data(dend)

# Setup the data, so that the layout is inverted (this is more 
# "clear" than simply using coord_flip())
segment_data <- with(
  segment(dend_data), 
  data.frame(x = y, y = x, xend = yend, yend = xend))
# Use the dendrogram label data to position the gene labels
gene_pos_table <- with(
  dend_data$labels, 
  data.frame(y_center = x, gene = as.character(label), height = 1))

# Table to position the samples
sample_pos_table <- data.frame(sample = sample_names) %>%
  dplyr::mutate(x_center = (1:n()), 
         width = 1)

# Neglecting the gap parameters
heatmap_data <- mat %>% 
  reshape2::melt(value.name = "expr", varnames = c("gene", "sample")) %>%
  left_join(gene_pos_table) %>%
  left_join(sample_pos_table)

# extract and rejoin sample IDs and sample_type names for plotting
# first for the heatmap data.frame
A <- str_split(heatmap_data$sample, "_")
heatmap_data$ID <- heatmap_data$sample_type <- "0"
for(i in 1:nrow(heatmap_data)){
  heatmap_data$ID[i] <- A[[i]][1]
  heatmap_data$sample_type[i] <- A[[i]][2]
}
# second for the sample position dataframe (dendo)
A <- str_split(sample_pos_table$sample, "_")
sample_pos_table$ID <- sample_pos_table$sample_type <- "0"
for(i in 1:nrow(sample_pos_table)){
  sample_pos_table$ID[i] <- A[[i]][1]
  sample_pos_table$sample_type[i] <- A[[i]][2]
}

# Limits for the vertical axes
gene_axis_limits <- with(
  gene_pos_table, 
  c(min(y_center - 0.5 * height), max(y_center + 0.5 * height))
) + 
  0.1 * c(-1, 1) # extra spacing: 0.1

## Build Heatmap Pieces
# ESCA w/ Barrets
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap1 <- ggplot(hmd, 
                   aes(x = x_center, y = y_center, fill = expr, 
                       height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        # margin: top, right, bottom, and left
        #axis.ticks.y = element_blank(),
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid = element_blank(),
        legend.position = "none")

# Part 2: "ESCA-adjacent tissue w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA-adjacent tissue w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA-adjacent tissue w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap2 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("expr",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.1, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA-adj. w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank(),
        legend.position = "none")

# Part 3: "ESCA tissues w/o Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/o Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/o Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap3 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA w/o Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.75, vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank())


# Dendrogram plot
plt_dendr <- ggplot(segment_data) + 
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) + 
  scale_x_reverse(expand = c(0, 0.5)) + 
  scale_y_continuous(breaks = gene_pos_table$y_center, 
                     labels = gene_pos_table$gene, 
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "", y = "", colour = "", size = "") +
  theme_classic() + 
  theme(panel.grid = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"))

prntRelAbund <- avgRelAbundCutoff*100

p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+ 
  plot_layout(
    nrow=1, widths = c(0.5, 0.4, 0.15, 1),
    guides="collect"
  ) +
  plot_annotation(
    title="TCGA WGS Data showing average relative abundance of genera by individual",
    subtitle=paste0("Subset to OTU average relative abundance > ",prntRelAbund,"%")
  )
p

if(save.plots == T){
  ggsave(paste0("output/tcga-wgs-001-",save.Date,".pdf"), plot=p, units="in", width=11, height=5)
  ggsave(paste0("output/tcga-wgs-001-",save.Date,".png"), plot=p, units="in", width=11, height=5)
}

Slide 8 - NCI 16S Data with Specific OTUs

p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+ plt_hmap4+
  plot_layout(
    nrow=1, widths = c(0.5, 0.2, 1, 1, 1),
    guides="collect"
  ) +
  plot_annotation(
    title="NCI-16s Data showing average relative abundance of genera by individual"
  )
p

if(save.plots == T){
  ggsave(paste0("output/nci-",save.Date,".pdf"), plot=p, units="in", width=11, height=5)
  ggsave(paste0("output/nci-",save.Date,".png"), plot=p, units="in", width=11, height=5)
}

Slide 9 - TCGA RNAseq Data with Specific OTUs

The heatmaps for slide 9 focus on the TCGA RNAseq data. The difference here is we focus on 4 genera.

p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+ 
  plot_layout(
    nrow=1, widths = c(0.5, 0.4, 0.15, 1),
    guides="collect"
  ) +
  plot_annotation(
    title="TCGA RNAseq Data showing average relative abundance of genera by individual"
  )
p

if(save.plots == T){
  ggsave(paste0("output/tcga-rna-",save.Date,".pdf"), plot=p, units="in", width=11, height=5)
  ggsave(paste0("output/tcga-rna-",save.Date,".png"), plot=p, units="in", width=11, height=5)
}

Slide 10 - TCGA WGS Data with Specific OTUs

The heatmaps for slide76 focus on the TCGA WGS data.

analysis.dat <- dat.wgs.s # insert dataset to be used in analysis
avgRelAbundCutoff <- 0.05 # minimum average relative abundance for OTUs


otu.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(otu2) %>%
  dplyr::summarise(AverageRelativeAbundance=mean(Abundance, na.rm=T))

kable(otu.dat[,c(2,1)], format="html", digits=3) %>%
  kable_styling(full_width = T)%>%
  scroll_box(width="100%", height="100%")

AverageRelativeAbundance	otu2
0.022	Campylobacter concisus
0.027	Fusobacterium nucleatum
0.057	Prevotella melaninogenica
0.006	Streptococcus mitis
0.002	Streptococcus oralis
0.008	Streptococcus parasanguinis
0.014	Streptococcus pneumoniae
0.001	Streptococcus salivarius
0.001	Streptococcus sanguinis

plot.dat <- analysis.dat %>% filter(sample_type != "0") %>%
  dplyr::group_by(OTU) %>%
  dplyr::mutate(aveAbund=mean(Abundance, na.rm=T)) %>%
  dplyr::ungroup() %>%
  dplyr::mutate(ID = as.factor(Patient_ID),
                Abundance = ifelse(is.na(Abundance), 0, Abundance)) %>%
  dplyr::select(sample_type, Phylum, Genus, ID, Abundance, aveAbund)


# widen plot.dat for dendro
dat.wide <- plot.dat %>%
  dplyr::mutate(
    ID = paste0(ID, "_",sample_type)
  ) %>%
  dplyr::select(ID, Genus, Abundance) %>%
  dplyr::group_by(ID, Genus) %>%
  dplyr::summarise(
    Abundance = mean(Abundance)
  ) %>%
  tidyr::pivot_wider(
    id_cols = Genus,
    names_from = ID,
    values_from = Abundance,
    values_fill = 0
  )
rn <- dat.wide$Genus
mat <- as.matrix(dat.wide[,-1])
rownames(mat) <- rn

sample_names <- colnames(mat)

# Obtain the dendrogram
dend <- as.dendrogram(hclust(dist(mat)))
dend_data <- dendro_data(dend)

# Setup the data, so that the layout is inverted (this is more 
# "clear" than simply using coord_flip())
segment_data <- with(
  segment(dend_data), 
  data.frame(x = y, y = x, xend = yend, yend = xend))
# Use the dendrogram label data to position the gene labels
gene_pos_table <- with(
  dend_data$labels, 
  data.frame(y_center = x, gene = as.character(label), height = 1))

# Table to position the samples
sample_pos_table <- data.frame(sample = sample_names) %>%
  dplyr::mutate(x_center = (1:n()), 
         width = 1)

# Neglecting the gap parameters
heatmap_data <- mat %>% 
  reshape2::melt(value.name = "expr", varnames = c("gene", "sample")) %>%
  left_join(gene_pos_table) %>%
  left_join(sample_pos_table)

# extract and rejoin sample IDs and sample_type names for plotting
# first for the heatmap data.frame
A <- str_split(heatmap_data$sample, "_")
heatmap_data$ID <- heatmap_data$sample_type <- "0"
for(i in 1:nrow(heatmap_data)){
  heatmap_data$ID[i] <- A[[i]][1]
  heatmap_data$sample_type[i] <- A[[i]][2]
}
# second for the sample position dataframe (dendo)
A <- str_split(sample_pos_table$sample, "_")
sample_pos_table$ID <- sample_pos_table$sample_type <- "0"
for(i in 1:nrow(sample_pos_table)){
  sample_pos_table$ID[i] <- A[[i]][1]
  sample_pos_table$sample_type[i] <- A[[i]][2]
}

# Limits for the vertical axes
gene_axis_limits <- with(
  gene_pos_table, 
  c(min(y_center - 0.5 * height), max(y_center + 0.5 * height))
) + 
  0.1 * c(-1, 1) # extra spacing: 0.1

## Build Heatmap Pieces
# ESCA w/ Barrets
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap1 <- ggplot(hmd, 
                   aes(x = x_center, y = y_center, fill = expr, 
                       height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        # margin: top, right, bottom, and left
        #axis.ticks.y = element_blank(),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid = element_blank(),
        legend.position = "none")

# Part 2: "ESCA-adjacent tissue w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA-adjacent tissue w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA-adjacent tissue w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap2 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("expr",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.1, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA-adj. w/ Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank(),
        legend.position = "none")

# Part 3: "ESCA tissues w/o Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/o Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/o Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))

plt_hmap3 <- ggplot(hmd, 
                    aes(x = x_center, y = y_center, fill = expr, 
                        height = height, width = width)) + 
  geom_tile() +
  #facet_wrap(.~sample_type)+
  scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
  scale_x_continuous(breaks = spd$x_center, 
                     labels = spd$ID, 
                     expand = c(0, 0)) + 
  # For the y axis, alternatively set the labels as: gene_position_table$gene
  scale_y_continuous(breaks = gene_pos_table[, "y_center"], 
                     labels = rep("", nrow(gene_pos_table)),
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "ESCA w/o Barretts", y = NULL) +
  theme_classic() +
  theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.75, vjust=0.5, angle = 90), 
        axis.ticks.y = element_blank(),
        axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
        # margin: top, right, bottom, and left
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"), 
        panel.grid.minor = element_blank())


# Dendrogram plot
plt_dendr <- ggplot(segment_data) + 
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) + 
  scale_x_reverse(expand = c(0, 0.5)) + 
  scale_y_continuous(breaks = gene_pos_table$y_center, 
                     labels = gene_pos_table$gene, 
                     limits = gene_axis_limits, 
                     expand = c(0, 0)) + 
  labs(x = "", y = "", colour = "", size = "") +
  theme_classic() + 
  theme(panel.grid = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"))

prntRelAbund <- avgRelAbundCutoff*100

p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+ 
  plot_layout(
    nrow=1, widths = c(0.5, 0.4, 0.15, 1),
    guides="collect"
  ) +
  plot_annotation(
    title="TCGA WGS Data showing average relative abundance of genera by individual"
  )
p

if(save.plots == T){
  ggsave(paste0("output/slide-10-heatmap-",save.Date,".pdf"), plot=p, units="in", width=11, height=5)
  ggsave(paste0("output/slide-10-heatmap-",save.Date,".png"), plot=p, units="in", width=11, height=5)
}

Stacked Bar Charts

Slide 11 - NCI 16s Data

plot.dat <- dat.16s.s %>% filter(sample_type != "0") %>%
  mutate(ID = as.factor(accession.number),
         Genus = substr(Genus, 4, 1000),
         Phylum = substr(Phylum, 4, 1000))%>%
  dplyr::group_by(sample_type, OTU1)%>%
  dplyr::summarise(
    Abundance = mean(Abundance, na.rm=T)
  )
  
p1 <- ggplot(plot.dat, aes(x=sample_type, y = Abundance, fill=OTU1)) +
  geom_bar(stat="identity")+
  labs(title="NCI 16s Data",
       x = "Tissue Group",
       y="Average Relative Abundance") +
  theme_classic()+
  theme(
    axis.text.x = element_text(angle=30, vjust=0.95, hjust=0.9),
    legend.title = element_blank()
  )
p1

if(save.plots == T){
  ggsave(paste0("output/slide-11-bar-chart-",save.Date,".pdf"), plot=p1, units="in", width=8, height=5)
  ggsave(paste0("output/slide-11-bar-chart-",save.Date,".png"), plot=p1, units="in", width=8, height=5)
}

Slide 12 - TCGA RNAseq Data

plot.dat <- dat.rna.s %>% filter(sample_type != "0", is.na(OTU1) == F)%>%
  dplyr::group_by(sample_type, OTU1)%>%
  dplyr::summarise(
    Abundance = mean(Abundance, na.rm=T)
  )
  
p1 <- ggplot(plot.dat, aes(x=sample_type, y = Abundance, fill=OTU1)) +
  geom_bar(stat="identity")+
  labs(title="TCGA RNAseq Data",
       x = "Tissue Group",
       y="Average Relative Abundance") +
  theme_classic() +
  theme(
    axis.text.x = element_text(angle=30, vjust=0.95, hjust=0.9),
    legend.title = element_blank()
  )
p1

if(save.plots == T){
  ggsave(paste0("output/slide-12-bar-chart-",save.Date,".pdf"), plot=p1, units="in", width=8, height=5)
  ggsave(paste0("output/slide-12-bar-chart-",save.Date,".png"), plot=p1, units="in", width=8, height=5)
}

Slide 13 - TCGA WGS Data

plot.dat <- dat.wgs.s %>% filter(sample_type != "0", is.na(OTU1) == F)%>%
  dplyr::group_by(sample_type, OTU1)%>%
  dplyr::summarise(
    Abundance = mean(Abundance, na.rm=T)
  )
  
p1 <- ggplot(plot.dat, aes(x=sample_type, y = Abundance, fill=OTU1)) +
  geom_bar(stat="identity")+
  labs(title="TCGA WGS Data",
       x = "Tissue Group",
       y="Average Relative Abundance") +
  scale_y_continuous(trans="sqrt", breaks=c(0,0.01, 0.1, 0.2, 0.4, 0.6))+
  theme_classic() +
  theme(
    axis.text.x = element_text(angle=30, vjust=0.95, hjust=0.9),
    legend.title = element_blank()
  )
p1

if(save.plots == T){
  ggsave(paste0("output/slide-13-bar-chart-",save.Date,".pdf"), plot=p1, units="in", width=8, height=5)
  ggsave(paste0("output/slide-13-bar-chart-",save.Date,".png"), plot=p1, units="in", width=8, height=5)
}

sessionInfo()

R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] cowplot_1.1.1     dendextend_1.16.0 ggdendro_0.1.23   reshape2_1.4.4   
 [5] car_3.1-0         carData_3.0-5     gvlma_1.0.0.3     patchwork_1.1.1  
 [9] viridis_0.6.2     viridisLite_0.4.0 gridExtra_2.3     xtable_1.8-4     
[13] kableExtra_1.3.4  MASS_7.3-58.1     data.table_1.14.2 readxl_1.4.0     
[17] forcats_0.5.1     stringr_1.4.0     dplyr_1.0.9       purrr_0.3.4      
[21] readr_2.1.2       tidyr_1.2.0       tibble_3.1.7      ggplot2_3.3.6    
[25] tidyverse_1.3.2   lmerTest_3.1-3    lme4_1.1-30       Matrix_1.5-1     
[29] vegan_2.6-2       lattice_0.20-45   permute_0.9-7     phyloseq_1.40.0  

loaded via a namespace (and not attached):
  [1] googledrive_2.0.0      minqa_1.2.4            colorspace_2.0-3      
  [4] ellipsis_0.3.2         rprojroot_2.0.3        XVector_0.36.0        
  [7] fs_1.5.2               rstudioapi_0.13        farver_2.1.1          
 [10] fansi_1.0.3            lubridate_1.8.0        xml2_1.3.3            
 [13] codetools_0.2-18       splines_4.2.2          cachem_1.0.6          
 [16] knitr_1.39             ade4_1.7-19            jsonlite_1.8.0        
 [19] workflowr_1.7.0        nloptr_2.0.3           broom_1.0.0           
 [22] cluster_2.1.4          dbplyr_2.2.1           BiocManager_1.30.18   
 [25] compiler_4.2.2         httr_1.4.3             backports_1.4.1       
 [28] assertthat_0.2.1       fastmap_1.1.0          gargle_1.2.0          
 [31] cli_3.4.1              later_1.3.0            htmltools_0.5.2       
 [34] tools_4.2.2            igraph_1.3.4           gtable_0.3.0          
 [37] glue_1.6.2             GenomeInfoDbData_1.2.8 Rcpp_1.0.8.3          
 [40] Biobase_2.56.0         cellranger_1.1.0       jquerylib_0.1.4       
 [43] vctrs_0.4.1            Biostrings_2.64.0      rhdf5filters_1.8.0    
 [46] multtest_2.52.0        svglite_2.1.0          ape_5.6-2             
 [49] nlme_3.1-160           iterators_1.0.14       xfun_0.31             
 [52] rvest_1.0.2            lifecycle_1.0.1        googlesheets4_1.0.0   
 [55] zlibbioc_1.42.0        scales_1.2.0           ragg_1.2.4            
 [58] hms_1.1.1              promises_1.2.0.1       parallel_4.2.2        
 [61] biomformat_1.24.0      rhdf5_2.40.0           yaml_2.3.5            
 [64] sass_0.4.2             stringi_1.7.6          highr_0.9             
 [67] S4Vectors_0.34.0       foreach_1.5.2          BiocGenerics_0.42.0   
 [70] boot_1.3-28            GenomeInfoDb_1.32.2    systemfonts_1.0.4     
 [73] rlang_1.0.6            pkgconfig_2.0.3        bitops_1.0-7          
 [76] evaluate_0.15          Rhdf5lib_1.18.2        labeling_0.4.2        
 [79] tidyselect_1.1.2       plyr_1.8.7             magrittr_2.0.3        
 [82] R6_2.5.1               IRanges_2.30.0         generics_0.1.3        
 [85] DBI_1.1.3              pillar_1.8.0           haven_2.5.0           
 [88] whisker_0.4            withr_2.5.0            mgcv_1.8-41           
 [91] abind_1.4-5            survival_3.4-0         RCurl_1.98-1.8        
 [94] modelr_0.1.8           crayon_1.5.1           utf8_1.2.2            
 [97] tzdb_0.3.0             rmarkdown_2.14         grid_4.2.2            
[100] git2r_0.30.1           webshot_0.5.3          reprex_2.0.1          
[103] digest_0.6.29          httpuv_1.6.5           numDeriv_2016.8-1.1   
[106] textshaping_0.3.6      stats4_4.2.2           munsell_0.5.0         
[109] bslib_0.4.0

Updated Heatmaps and Dendrograms

Intro

Heatmaps and Dendrograms

Slide 5 - NCI 16S Data

Relative Abudance Cutoff: 0.05

Relative Abudance Cutoff: 0.01

Relative Abudance Cutoff: 0.001

Slide 6 - TCGA RNAseq Data

Relative Abudance Cutoff: 0.05

Relative Abudance Cutoff: 0.01

Relative Abudance Cutoff: 0.001

Slide 7 - TCGA WGS Data

Relative Abudance Cutoff: 0.05

Relative Abudance Cutoff: 0.01

Relative Abudance Cutoff: 0.001

Slide 8 - NCI 16S Data with Specific OTUs

Slide 9 - TCGA RNAseq Data with Specific OTUs

Slide 10 - TCGA WGS Data with Specific OTUs

Stacked Bar Charts

Slide 11 - NCI 16s Data

Slide 12 - TCGA RNAseq Data

Slide 13 - TCGA WGS Data