Last updated: 2022-12-13
Checks: 6 1
Knit directory:
esoph-micro-cancer-workflow/
This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
The R Markdown file has unstaged changes. To know which version of
the R Markdown file created these results, you’ll want to first commit
it to the Git repo. If you’re still working on the analysis, you can
ignore this warning. When you’re finished, you can run
wflow_publish
to commit the R Markdown file and build the
HTML.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20200916)
was run prior to running
the code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version be3dfad. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish
or
wflow_git_commit
). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/.Rhistory
Ignored: data.zip
Ignored: data/
Ignored: output/Supplement Figure 2 (2).zip
Ignored: output/Updated Heatmaps.zip
Ignored: output/updated-figures-2022-12-12.zip
Ignored: output/updated_figures_2022_12_08.zip
Untracked files:
Untracked: output/nci-2022-12-12.pdf
Untracked: output/nci-2022-12-12.png
Untracked: output/slide-10-heatmap-2022-12-12.pdf
Untracked: output/slide-10-heatmap-2022-12-12.png
Untracked: output/slide-11-bar-chart-2022-12-12.pdf
Untracked: output/slide-11-bar-chart-2022-12-12.png
Untracked: output/slide-12-bar-chart-2022-12-12.pdf
Untracked: output/slide-12-bar-chart-2022-12-12.png
Untracked: output/slide-13-bar-chart-2022-12-12.pdf
Untracked: output/slide-13-bar-chart-2022-12-12.png
Untracked: output/slide-5-heatmap-001-2022-12-10.pdf
Untracked: output/slide-5-heatmap-001-2022-12-10.png
Untracked: output/slide-5-heatmap-001-2022-12-12.pdf
Untracked: output/slide-5-heatmap-001-2022-12-12.png
Untracked: output/slide-5-heatmap-01-2022-12-12.pdf
Untracked: output/slide-5-heatmap-01-2022-12-12.png
Untracked: output/slide-5-heatmap-05-2022-12-12.pdf
Untracked: output/slide-5-heatmap-05-2022-12-12.png
Untracked: output/slide-6-heatmap-001-2022-12-12.pdf
Untracked: output/slide-6-heatmap-001.-2022-12-12.png
Untracked: output/slide-6-heatmap-01-2022-12-12.pdf
Untracked: output/slide-6-heatmap-01-2022-12-12.png
Untracked: output/slide-6-heatmap-05-2022-12-12.pdf
Untracked: output/slide-6-heatmap-05-2022-12-12.png
Untracked: output/slide-7-heatmap-001-2022-12-12.pdf
Untracked: output/slide-7-heatmap-001-2022-12-12.png
Untracked: output/slide-7-heatmap-01-2022-12-12.pdf
Untracked: output/slide-7-heatmap-01-2022-12-12.png
Untracked: output/slide-7-heatmap-05-2022-12-12.pdf
Untracked: output/slide-7-heatmap-05-2022-12-12.png
Untracked: output/slide-8-heatmap-2022-12-12.pdf
Untracked: output/slide-8-heatmap-2022-12-12.png
Untracked: output/slide-9-heatmap-2022-12-12.pdf
Untracked: output/slide-9-heatmap-2022-12-12.png
Untracked: output/tcga-rna-001-2022-12-12.pdf
Untracked: output/tcga-rna-001.-2022-12-12.png
Untracked: output/tcga-rna-2022-12-12.pdf
Untracked: output/tcga-rna-2022-12-12.png
Untracked: output/updated-figures-2022-12-12/
Untracked: output/updated_figures_2022_12_08/
Unstaged changes:
Modified: analysis/slide-figures-heatmaps-dendrograms-updates.Rmd
Modified: output/slide-6-heatmap-001-2022-12-08.pdf
Modified: output/slide-6-heatmap-001.-2022-12-08.png
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown
(analysis/slide-figures-heatmaps-dendrograms-updates.Rmd
)
and HTML
(docs/slide-figures-heatmaps-dendrograms-updates.html
)
files. If you’ve configured a remote Git repository (see
?wflow_git_remote
), click on the hyperlinks in the table
below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | be3dfad | noah-padgett | 2022-12-08 | updated with new color scheme |
html | be3dfad | noah-padgett | 2022-12-08 | updated with new color scheme |
Rmd | b1264da | noah-padgett | 2022-09-07 | revised figures for publication |
Rmd | 9e1d57c | noah-padgett | 2022-08-19 | updated and corrected heatmap color and labels |
html | 9e1d57c | noah-padgett | 2022-08-19 | updated and corrected heatmap color and labels |
Rmd | b9a498d | noah-padgett | 2021-01-26 | fixed issue with subsetting |
html | b9a498d | noah-padgett | 2021-01-26 | fixed issue with subsetting |
This page contains the updated code for generating the joint figures of heatmaps with the dendrogram. The update was needed to fix how the OTUs were subset based on average relative abundance. Prior to ESCAh heatmap will be a table of the OTUs that meet the given criteria.
I figured out that I originally subset based on the OTU relative abundance for ESCAh individual and sample, meaning that I subset according the just the raw Abundance irrespective of any OTU average abundance. This mistake is corrected in this document.
get counts
Barretts, blood normal, non-tumor, tumor, tumor w/ barretts
nci_sample_data <- dat.16s.s %>%
filter(OTU == "Fusobacterium nucleatum")
table(nci_sample_data$tumor)
0 1
93 65
table(nci_sample_data$tissue)
BO N T
5 88 65
table(nci_sample_data$Barretts.)
N Y
71 87
table(nci_sample_data$tumor, nci_sample_data$Barretts.)
N Y
0 42 51
1 29 36
nrow(nci_sample_data)
[1] 158
analysis.dat <- dat.16s # insert dataset to be used in analysis
avgRelAbundCutoff <- 0.05 # minimum average relative abundance for OTUs
otu.dat <- analysis.dat %>% filter(sample_type != "0") %>%
dplyr::group_by(OTU) %>%
dplyr::summarise(AverageRelativeAbundance=mean(Abundance))%>%
dplyr::filter(AverageRelativeAbundance>=avgRelAbundCutoff) %>%
dplyr::arrange(desc(AverageRelativeAbundance))
kable(otu.dat[,c(2,1)], format="html", digits=3) %>%
kable_styling(full_width = T)%>%
scroll_box(width="100%", height="100%")
AverageRelativeAbundance | OTU |
---|---|
0.272 | Streptococcus_dentisani:Streptococcus_infantis:Streptococcus_mitis:Streptococcus_oligofermentans:Streptococcus_oralis:Streptococcus_pneumoniae:Streptococcus_pseudopneumoniae:Streptococcus_sanguinis |
0.065 | Pseudomonas_rhodesiae |
0.054 | Prevotella_melaninogenica |
plot.dat <- analysis.dat %>% filter(sample_type != "0") %>%
dplyr::group_by(OTU) %>%
dplyr::mutate(aveAbund=mean(Abundance)) %>%
dplyr::ungroup() %>%
dplyr::filter(aveAbund>=avgRelAbundCutoff) %>%
dplyr::mutate(ID = as.factor(accession.number),
Genus = substr(Genus, 4, 1000),
Phylum = substr(Phylum, 4, 1000)) %>%
dplyr::select(sample_type, Phylum, Genus, ID, Abundance, aveAbund)
# widen plot.dat for dendro
dat.wide <- plot.dat %>%
dplyr::mutate(
ID = paste0(ID, "_",sample_type)
) %>%
dplyr::select(ID, Genus, Abundance) %>%
dplyr::group_by(ID, Genus) %>%
dplyr::summarise(
Abundance = mean(Abundance)
) %>%
tidyr::pivot_wider(
id_cols = Genus,
names_from = ID,
values_from = Abundance,
values_fill = 0
)
rn <- dat.wide$Genus
mat <- as.matrix(dat.wide[,-1])
rownames(mat) <- rn
sample_names <- colnames(mat)
# Obtain the dendrogram
dend <- as.dendrogram(hclust(dist(mat)))
dend_data <- dendro_data(dend)
# Setup the data, so that the layout is inverted (this is more
# "clear" than simply using coord_flip())
segment_data <- with(
segment(dend_data),
data.frame(x = y, y = x, xend = yend, yend = xend))
# Use the dendrogram label data to position the gene labels
gene_pos_table <- with(
dend_data$labels,
data.frame(y_center = x, gene = as.character(label), height = 1))
# Table to position the samples
sample_pos_table <- data.frame(sample = sample_names) %>%
dplyr::mutate(x_center = (1:n()),
width = 1)
# Neglecting the gap parameters
heatmap_data <- mat %>%
reshape2::melt(value.name = "expr", varnames = c("gene", "sample")) %>%
left_join(gene_pos_table) %>%
left_join(sample_pos_table)
# extract and rejoin sample IDs and sample_type names for plotting
# first for the heatmap data.frame
A <- str_split(heatmap_data$sample, "_")
heatmap_data$ID <- heatmap_data$sample_type <- "0"
for(i in 1:nrow(heatmap_data)){
heatmap_data$ID[i] <- A[[i]][1]
heatmap_data$sample_type[i] <- A[[i]][2]
}
# second for the sample position dataframe (dendo)
A <- str_split(sample_pos_table$sample, "_")
sample_pos_table$ID <- sample_pos_table$sample_type <- "0"
for(i in 1:nrow(sample_pos_table)){
sample_pos_table$ID[i] <- A[[i]][1]
sample_pos_table$sample_type[i] <- A[[i]][2]
}
# Limits for the vertical axes
gene_axis_limits <- with(
gene_pos_table,
c(min(y_center - 0.5 * height), max(y_center + 0.5 * height))
) +
0.1 * c(-1, 1) # extra spacing: 0.1
## Build Heatmap Pieces
# by parts
hmd <- filter(heatmap_data, sample_type == "Barretts Only")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "Barretts Only")
spd$x_center <- as.numeric(as.factor(spd$x_center))
plt_hmap1 <- ggplot(hmd,
aes(x = x_center, y = y_center, fill = expr,
height = height, width = width)) +
geom_tile() +
#facet_wrap(.~sample_type)+
scale_fill_gradient2(
"Abundance",
trans="sqrt",
high=myColor[1],
mid = myColor[50],
low=myColor[100]
, midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
scale_x_continuous(breaks = spd$x_center,
labels = spd$ID,
expand = c(0, 0)) +
# For the y axis, alternatively set the labels as: gene_position_table$gene
scale_y_continuous(breaks = gene_pos_table[, "y_center"],
labels = rep("", nrow(gene_pos_table)),
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "BO", y = NULL) +
theme_classic() +
theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90),
# margin: top, right, bottom, and left
#axis.ticks.y = element_blank(),
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"),
panel.grid = element_blank(),
legend.position = "none")
# Part 2: "ESCA-adjacent tissue w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA-adjacent tissue w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA-adjacent tissue w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))
plt_hmap2 <- ggplot(hmd,
aes(x = x_center, y = y_center, fill = expr,
height = height, width = width)) +
geom_tile() +
#facet_wrap(.~sample_type)+
scale_fill_gradient2("expr",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.1, 0.10, 0.30, 0.50, 0.80)) +
scale_x_continuous(breaks = spd$x_center,
labels = spd$ID,
expand = c(0, 0)) +
# For the y axis, alternatively set the labels as: gene_position_table$gene
scale_y_continuous(breaks = gene_pos_table[, "y_center"],
labels = rep("", nrow(gene_pos_table)),
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "ESCA-adj. w/ Barretts", y = NULL) +
theme_classic() +
theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90),
axis.ticks.y = element_blank(),
# margin: top, right, bottom, and left
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"),
panel.grid.minor = element_blank(),
legend.position = "none")
# Part 3: "ESCA tissues w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))
plt_hmap3 <- ggplot(hmd,
aes(x = x_center, y = y_center, fill = expr,
height = height, width = width)) +
geom_tile() +
#facet_wrap(.~sample_type)+
scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
scale_x_continuous(breaks = spd$x_center,
labels = spd$ID,
expand = c(0, 0)) +
# For the y axis, alternatively set the labels as: gene_position_table$gene
scale_y_continuous(breaks = gene_pos_table[, "y_center"],
labels = rep("", nrow(gene_pos_table)),
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "ESCA w/ Barretts", y = NULL) +
theme_classic() +
theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.75, vjust=0.5, angle = 90),
axis.ticks.y = element_blank(),
# margin: top, right, bottom, and left
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"),
panel.grid.minor = element_blank())
# Dendrogram plot
plt_dendr <- ggplot(segment_data) +
geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) +
scale_x_reverse(expand = c(0, 0.5)) +
scale_y_continuous(breaks = gene_pos_table$y_center,
labels = gene_pos_table$gene,
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "", y = "", colour = "", size = "") +
theme_classic() +
theme(panel.grid = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"))
prntRelAbund <- avgRelAbundCutoff*100
p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+
plot_layout(
nrow=1, widths = c(0.5, 0.2, 1, 1),
guides="collect"
) +
plot_annotation(
title="NCI-16s Data showing average relative abundance of genera by individual",
subtitle=paste0("Subset to OTU average relative abundance > ",prntRelAbund,"%")
)
p
if(save.plots == T){
ggsave(paste0("output/nci-05-",save.Date,".pdf"), plot=p, units="in", width=11, height=5)
ggsave(paste0("output/nci-05-",save.Date,".png"), plot=p, units="in", width=11, height=5)
}
analysis.dat <- dat.16s # insert dataset to be used in analysis
avgRelAbundCutoff <- 0.01 # minimum average relative abundance for OTUs
otu.dat <- analysis.dat %>% filter(sample_type != "0") %>%
dplyr::group_by(OTU) %>%
dplyr::summarise(AverageRelativeAbundance=mean(Abundance))%>%
dplyr::filter(AverageRelativeAbundance>=avgRelAbundCutoff) %>%
dplyr::arrange(desc(AverageRelativeAbundance))
kable(otu.dat[,c(2,1)], format="html", digits=3) %>%
kable_styling(full_width = T)%>%
scroll_box(width="100%", height="400px")
AverageRelativeAbundance | OTU |
---|---|
0.272 | Streptococcus_dentisani:Streptococcus_infantis:Streptococcus_mitis:Streptococcus_oligofermentans:Streptococcus_oralis:Streptococcus_pneumoniae:Streptococcus_pseudopneumoniae:Streptococcus_sanguinis |
0.065 | Pseudomonas_rhodesiae |
0.054 | Prevotella_melaninogenica |
0.049 | Lactobacillus_gasseri:Lactobacillus_johnsonii |
0.041 | Veillonella_dispar |
0.038 | Stenotrophomonas_maltophilia |
0.035 | Fusobacterium_nucleatum |
0.031 | Acinetobacter_guillouiae |
0.025 | Granulicatella_adiacens:Granulicatella_paraadiacens |
0.023 | Rothia_mucilaginosa |
0.022 | Staphylococcus_epidermidis:Staphylococcus_hominis |
0.020 | Gemella_haemolysans |
0.018 | Haemophilus_parainfluenzae |
0.016 | Selenomonas_sputigena |
0.014 | Lachnoanaerobaculum_orale |
0.014 | otu19913:Actinobacillus_minor:Actinobacillus_porcinus:Actinobacillus_rossii:Haemophilus_paraphrohaemolyticus |
0.011 | Actinomyces_odontolyticus |
0.010 | Neisseria_flavescens |
0.010 | otu16698:Tannerella_forsythia |
plot.dat <- analysis.dat %>% filter(sample_type != "0") %>%
dplyr::group_by(OTU) %>%
dplyr::mutate(aveAbund=mean(Abundance)) %>%
dplyr::ungroup() %>%
dplyr::filter(aveAbund>=avgRelAbundCutoff) %>%
dplyr::mutate(ID = as.factor(accession.number),
Genus = substr(Genus, 4, 1000),
Phylum = substr(Phylum, 4, 1000)) %>%
dplyr::select(sample_type, Phylum, Genus, ID, Abundance, aveAbund)
# widen plot.dat for dendro
dat.wide <- plot.dat %>%
dplyr::mutate(
ID = paste0(ID, "_",sample_type)
) %>%
dplyr::select(ID, Genus, Abundance) %>%
dplyr::group_by(ID, Genus) %>%
dplyr::summarise(
Abundance = mean(Abundance)
) %>%
tidyr::pivot_wider(
id_cols = Genus,
names_from = ID,
values_from = Abundance,
values_fill = 0
)
rn <- dat.wide$Genus
mat <- as.matrix(dat.wide[,-1])
rownames(mat) <- rn
sample_names <- colnames(mat)
# Obtain the dendrogram
dend <- as.dendrogram(hclust(dist(mat)))
dend_data <- dendro_data(dend)
# Setup the data, so that the layout is inverted (this is more
# "clear" than simply using coord_flip())
segment_data <- with(
segment(dend_data),
data.frame(x = y, y = x, xend = yend, yend = xend))
# Use the dendrogram label data to position the gene labels
gene_pos_table <- with(
dend_data$labels,
data.frame(y_center = x, gene = as.character(label), height = 1))
# Table to position the samples
sample_pos_table <- data.frame(sample = sample_names) %>%
dplyr::mutate(x_center = (1:n()),
width = 1)
# Neglecting the gap parameters
heatmap_data <- mat %>%
reshape2::melt(value.name = "expr", varnames = c("gene", "sample")) %>%
left_join(gene_pos_table) %>%
left_join(sample_pos_table)
# extract and rejoin sample IDs and sample_type names for plotting
# first for the heatmap data.frame
A <- str_split(heatmap_data$sample, "_")
heatmap_data$ID <- heatmap_data$sample_type <- "0"
for(i in 1:nrow(heatmap_data)){
heatmap_data$ID[i] <- A[[i]][1]
heatmap_data$sample_type[i] <- A[[i]][2]
}
# second for the sample position dataframe (dendo)
A <- str_split(sample_pos_table$sample, "_")
sample_pos_table$ID <- sample_pos_table$sample_type <- "0"
for(i in 1:nrow(sample_pos_table)){
sample_pos_table$ID[i] <- A[[i]][1]
sample_pos_table$sample_type[i] <- A[[i]][2]
}
# Limits for the vertical axes
gene_axis_limits <- with(
gene_pos_table,
c(min(y_center - 0.5 * height), max(y_center + 0.5 * height))
) +
0.1 * c(-1, 1) # extra spacing: 0.1
## Build Heatmap Pieces
# by parts
hmd <- filter(heatmap_data, sample_type == "Barretts Only")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "Barretts Only")
spd$x_center <- as.numeric(as.factor(spd$x_center))
plt_hmap1 <- ggplot(hmd,
aes(x = x_center, y = y_center, fill = expr,
height = height, width = width)) +
geom_tile() +
#facet_wrap(.~sample_type)+
scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
scale_x_continuous(breaks = spd$x_center,
labels = spd$ID,
expand = c(0, 0)) +
# For the y axis, alternatively set the labels as: gene_position_table$gene
scale_y_continuous(breaks = gene_pos_table[, "y_center"],
labels = rep("", nrow(gene_pos_table)),
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "BO", y = NULL) +
theme_classic() +
theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90),
# margin: top, right, bottom, and left
#axis.ticks.y = element_blank(),
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"),
panel.grid = element_blank(),
legend.position = "none")
# Part 2: "ESCA-adjacent tissue w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA-adjacent tissue w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA-adjacent tissue w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))
plt_hmap2 <- ggplot(hmd,
aes(x = x_center, y = y_center, fill = expr,
height = height, width = width)) +
geom_tile() +
#facet_wrap(.~sample_type)+
scale_fill_gradient2("expr",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.1, 0.10, 0.30, 0.50, 0.80)) +
scale_x_continuous(breaks = spd$x_center,
labels = spd$ID,
expand = c(0, 0)) +
# For the y axis, alternatively set the labels as: gene_position_table$gene
scale_y_continuous(breaks = gene_pos_table[, "y_center"],
labels = rep("", nrow(gene_pos_table)),
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "ESCA-adj. w/ Barretts", y = NULL) +
theme_classic() +
theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90),
axis.ticks.y = element_blank(),
# margin: top, right, bottom, and left
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"),
panel.grid.minor = element_blank(),
legend.position = "none")
# Part 3: "ESCA tissues w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))
plt_hmap3 <- ggplot(hmd,
aes(x = x_center, y = y_center, fill = expr,
height = height, width = width)) +
geom_tile() +
#facet_wrap(.~sample_type)+
scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
scale_x_continuous(breaks = spd$x_center,
labels = spd$ID,
expand = c(0, 0)) +
# For the y axis, alternatively set the labels as: gene_position_table$gene
scale_y_continuous(breaks = gene_pos_table[, "y_center"],
labels = rep("", nrow(gene_pos_table)),
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "ESCA w/ Barretts", y = NULL) +
theme_classic() +
theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.75, vjust=0.5, angle = 90),
axis.ticks.y = element_blank(),
# margin: top, right, bottom, and left
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"),
panel.grid.minor = element_blank(),
legend.position = "none")
# Part 4: "ESCA tissues w/o Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/o Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/o Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))
plt_hmap4 <- ggplot(hmd,
aes(x = x_center, y = y_center, fill = expr,
height = height, width = width)) +
geom_tile() +
#facet_wrap(.~sample_type)+
scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
scale_x_continuous(breaks = spd$x_center,
labels = spd$ID,
expand = c(0, 0)) +
# For the y axis, alternatively set the labels as: gene_position_table$gene
scale_y_continuous(breaks = gene_pos_table[, "y_center"],
labels = rep("", nrow(gene_pos_table)),
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "ESCA w/o Barretts", y = NULL) +
theme_classic() +
theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.75, vjust=0.5, angle = 90),
axis.ticks.y = element_blank(),
# margin: top, right, bottom, and left
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"),
panel.grid.minor = element_blank())
# Dendrogram plot
plt_dendr <- ggplot(segment_data) +
geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) +
scale_x_reverse(expand = c(0, 0.5)) +
scale_y_continuous(breaks = gene_pos_table$y_center,
labels = gene_pos_table$gene,
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "", y = "", colour = "", size = "") +
theme_classic() +
theme(panel.grid = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"))
prntRelAbund <- avgRelAbundCutoff*100
p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+plt_hmap4+
plot_layout(
nrow=1, widths = c(0.5, 0.2, 1, 1, 1),
guides="collect"
) +
plot_annotation(
title="NCI-16s Data showing average relative abundance of genera by individual",
subtitle=paste0("Subset to OTU average relative abundance > ",prntRelAbund,"%")
)
p
if(save.plots == T){
ggsave(paste0("output/nci-01-",save.Date,".pdf"), plot=p, units="in", width=11, height=7)
ggsave(paste0("output/nci-01-",save.Date,".png"), plot=p, units="in", width=11, height=7)
}
analysis.dat <- dat.16s # insert dataset to be used in analysis
avgRelAbundCutoff <- 0.001 # minimum average relative abundance for OTUs
otu.dat <- analysis.dat %>% filter(sample_type != "0") %>%
dplyr::group_by(OTU) %>%
dplyr::summarise(AverageRelativeAbundance=mean(Abundance))%>%
dplyr::filter(AverageRelativeAbundance>=avgRelAbundCutoff) %>%
dplyr::arrange(desc(AverageRelativeAbundance))
kable(otu.dat[,c(2,1)], format="html", digits=3) %>%
kable_styling(full_width = T)%>%
scroll_box(width="100%", height="400px")
AverageRelativeAbundance | OTU |
---|---|
0.272 | Streptococcus_dentisani:Streptococcus_infantis:Streptococcus_mitis:Streptococcus_oligofermentans:Streptococcus_oralis:Streptococcus_pneumoniae:Streptococcus_pseudopneumoniae:Streptococcus_sanguinis |
0.065 | Pseudomonas_rhodesiae |
0.054 | Prevotella_melaninogenica |
0.049 | Lactobacillus_gasseri:Lactobacillus_johnsonii |
0.041 | Veillonella_dispar |
0.038 | Stenotrophomonas_maltophilia |
0.035 | Fusobacterium_nucleatum |
0.031 | Acinetobacter_guillouiae |
0.025 | Granulicatella_adiacens:Granulicatella_paraadiacens |
0.023 | Rothia_mucilaginosa |
0.022 | Staphylococcus_epidermidis:Staphylococcus_hominis |
0.020 | Gemella_haemolysans |
0.018 | Haemophilus_parainfluenzae |
0.016 | Selenomonas_sputigena |
0.014 | Lachnoanaerobaculum_orale |
0.014 | otu19913:Actinobacillus_minor:Actinobacillus_porcinus:Actinobacillus_rossii:Haemophilus_paraphrohaemolyticus |
0.011 | Actinomyces_odontolyticus |
0.010 | Neisseria_flavescens |
0.010 | otu16698:Tannerella_forsythia |
0.009 | Sphingomonas_pituitosa |
0.008 | Clostridium_perfringens:Clostridium_thermophilus |
0.008 | Bacteroides_fragilis |
0.008 | Atopostipes_suicloacalis |
0.008 | Acidovorax_temperans |
0.008 | otu4906:Bacillus_oleronius |
0.007 | Leptotrichia_wadei |
0.007 | Corynebacterium_kroppenstedtii |
0.006 | TM7_phylum |
0.005 | Serratia_marcescens:Serratia_nematodiphila |
0.005 | Campylobacter_rectus:Campylobacter_showae |
0.005 | Geobacillus_stearothermophilus |
0.005 | Sarcina_maxima |
0.004 | Actinobacillus_rossii |
0.004 | Lysinibacillus_fusiformis:Lysinibacillus_mangiferahumi:Lysinibacillus_sphaericus |
0.004 | Cloacibacterium_normanense |
0.004 | Afipia_broomeae:Bradyrhizobium_elkanii:Bradyrhizobium_genosp:Bradyrhizobium_japonicum:Bradyrhizobium_jicamae:Bradyrhizobium_lablabi:Bradyrhizobium_pachyrhizi:Bradyrhizobium_retamae |
0.003 | Brucella_abortus:Brucella_canis:Brucella_ceti:Brucella_inopinata:Brucella_melitensis:Brucella_microti:Brucella_ovis:Brucella_pinnipedialis:Brucella_suis:Ochrobactrum_anthropi:Ochrobactrum_cytisi:Ochrobactrum_grignonense:Ochrobactrum_haematophilum:Ochrobactrum_lupini:Ochrobactrum_pecoris:Ochrobactrum_thiophenivorans:Ochrobactrum_tritici |
0.003 | Bifidobacterium_longum |
0.003 | Aggregatibacter_segnis |
0.003 | Brevundimonas_diminuta:Brevundimonas_naejangsanensis:Brevundimonas_vancanneytii:Nitrobacteria_hamadaniensis:Nitrobacteria_iranicum |
0.003 | Proteus_mirabilis:Proteus_penneri |
0.003 | Capnocytophaga_sputigena |
0.003 | Stomatobaculum_longum |
0.003 | Kocuria_kristinae |
0.003 | otu17927:Alloprevotella_rava |
0.003 | Parvimonas_micra |
0.002 | Megasphaera_micronuciformis |
0.002 | Oribacterium_sinus |
0.002 | Helicobacter_pylori |
0.002 | Faecalibacterium_prausnitzii |
0.002 | Enhydrobacter_aerosaccus |
0.002 | Castellaniella_denitrificans |
0.002 | otu21781:Centipeda_periodontii |
0.002 | Escherichia_coli:Escherichia_fergusonii |
0.002 | Paracoccus_aestuarii:Paracoccus_beibuensis:Paracoccus_marinus |
0.002 | Aquabacterium_commune |
0.002 | Citrobacter_freundii |
0.002 | Micrococcus_alkanovora:Micrococcus_antarcticus:Micrococcus_endophyticus:Micrococcus_indicus:Micrococcus_luteus:Micrococcus_yunnanensis |
0.002 | Lachnoanaerobaculum_umeaense |
0.002 | otu34876:Clostridium_indolis:Fusicatenibacter_saccharivorans |
0.002 | Lactococcus_lactis |
0.002 | Sutterella_wadsworthensis |
0.001 | Porphyromonas_endodontalis |
0.001 | Dialister_pneumosintes |
0.001 | Chryseobacterium_proteolyticum |
0.001 | Zimmermannella_faecalis |
0.001 | Caldimonas_hydrothermale:Caldimonas_manganoxidans:Caldimonas_taiwanensis |
0.001 | Massilia_lurida |
0.001 | Shinella_kummerowiae:Shinella_zoogloeoides |
0.001 | Rubellimicrobium_mesophilum |
0.001 | Blautia_wexlerae |
0.001 | Abiotrophia_defectiva |
0.001 | Mycobacterium_diernhoferi:Mycobacterium_fluoranthenivorans:Mycobacterium_frederiksbergense:Mycobacterium_hackensackense:Mycobacterium_sacrum |
0.001 | Enterobacter_aerogenes |
0.001 | Psychromonas_arctica |
0.001 | otu33966:Pseudolabrys_taiwanensis |
0.001 | Microvirga_guangxiensis |
0.001 | Tepidimonas_fonticaldi |
plot.dat <- analysis.dat %>% filter(sample_type != "0") %>%
dplyr::group_by(OTU) %>%
dplyr::mutate(aveAbund=mean(Abundance)) %>%
dplyr::ungroup() %>%
dplyr::filter(aveAbund>=avgRelAbundCutoff) %>%
dplyr::mutate(ID = as.factor(accession.number),
Genus = substr(Genus, 4, 1000),
Phylum = substr(Phylum, 4, 1000)) %>%
dplyr::select(sample_type, Phylum, Genus, ID, Abundance, aveAbund)
# widen plot.dat for dendro
dat.wide <- plot.dat %>%
dplyr::mutate(
ID = paste0(ID, "_",sample_type)
) %>%
dplyr::select(ID, Genus, Abundance) %>%
dplyr::group_by(ID, Genus) %>%
dplyr::summarise(
Abundance = mean(Abundance)
) %>%
tidyr::pivot_wider(
id_cols = Genus,
names_from = ID,
values_from = Abundance,
values_fill = 0
)
rn <- dat.wide$Genus
mat <- as.matrix(dat.wide[,-1])
rownames(mat) <- rn
sample_names <- colnames(mat)
# Obtain the dendrogram
dend <- as.dendrogram(hclust(dist(mat)))
dend_data <- dendro_data(dend)
# Setup the data, so that the layout is inverted (this is more
# "clear" than simply using coord_flip())
segment_data <- with(
segment(dend_data),
data.frame(x = y, y = x, xend = yend, yend = xend))
# Use the dendrogram label data to position the gene labels
gene_pos_table <- with(
dend_data$labels,
data.frame(y_center = x, gene = as.character(label), height = 1))
# Table to position the samples
sample_pos_table <- data.frame(sample = sample_names) %>%
dplyr::mutate(x_center = (1:n()),
width = 1)
# Neglecting the gap parameters
heatmap_data <- mat %>%
reshape2::melt(value.name = "expr", varnames = c("gene", "sample")) %>%
left_join(gene_pos_table) %>%
left_join(sample_pos_table)
# extract and rejoin sample IDs and sample_type names for plotting
# first for the heatmap data.frame
A <- str_split(heatmap_data$sample, "_")
heatmap_data$ID <- heatmap_data$sample_type <- "0"
for(i in 1:nrow(heatmap_data)){
heatmap_data$ID[i] <- A[[i]][1]
heatmap_data$sample_type[i] <- A[[i]][2]
}
# second for the sample position dataframe (dendo)
A <- str_split(sample_pos_table$sample, "_")
sample_pos_table$ID <- sample_pos_table$sample_type <- "0"
for(i in 1:nrow(sample_pos_table)){
sample_pos_table$ID[i] <- A[[i]][1]
sample_pos_table$sample_type[i] <- A[[i]][2]
}
# Limits for the vertical axes
gene_axis_limits <- with(
gene_pos_table,
c(min(y_center - 0.5 * height), max(y_center + 0.5 * height))
) +
0.1 * c(-1, 1) # extra spacing: 0.1
## Build Heatmap Pieces
# by parts
hmd <- filter(heatmap_data, sample_type == "Barretts Only")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "Barretts Only")
spd$x_center <- as.numeric(as.factor(spd$x_center))
plt_hmap1 <- ggplot(hmd,
aes(x = x_center, y = y_center, fill = expr,
height = height, width = width)) +
geom_tile() +
#facet_wrap(.~sample_type)+
scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
scale_x_continuous(breaks = spd$x_center,
labels = spd$ID,
expand = c(0, 0)) +
# For the y axis, alternatively set the labels as: gene_position_table$gene
scale_y_continuous(breaks = gene_pos_table[, "y_center"],
labels = rep("", nrow(gene_pos_table)),
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "BO", y = NULL) +
theme_classic() +
theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90),
# margin: top, right, bottom, and left
#axis.ticks.y = element_blank(),
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"),
panel.grid = element_blank(),
legend.position = "none")
# Part 2: "ESCA-adjacent tissue w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA-adjacent tissue w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA-adjacent tissue w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))
plt_hmap2 <- ggplot(hmd,
aes(x = x_center, y = y_center, fill = expr,
height = height, width = width)) +
geom_tile() +
#facet_wrap(.~sample_type)+
scale_fill_gradient2("expr",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.1, 0.10, 0.30, 0.50, 0.80)) +
scale_x_continuous(breaks = spd$x_center,
labels = spd$ID,
expand = c(0, 0)) +
# For the y axis, alternatively set the labels as: gene_position_table$gene
scale_y_continuous(breaks = gene_pos_table[, "y_center"],
labels = rep("", nrow(gene_pos_table)),
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "ESCA-adj. w/ Barretts", y = NULL) +
theme_classic() +
theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90),
axis.ticks.y = element_blank(),
# margin: top, right, bottom, and left
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"),
panel.grid.minor = element_blank(),
legend.position = "none")
# Part 3: "ESCA tissues w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))
plt_hmap3 <- ggplot(hmd,
aes(x = x_center, y = y_center, fill = expr,
height = height, width = width)) +
geom_tile() +
#facet_wrap(.~sample_type)+
scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
scale_x_continuous(breaks = spd$x_center,
labels = spd$ID,
expand = c(0, 0)) +
# For the y axis, alternatively set the labels as: gene_position_table$gene
scale_y_continuous(breaks = gene_pos_table[, "y_center"],
labels = rep("", nrow(gene_pos_table)),
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "ESCA w/ Barretts", y = NULL) +
theme_classic() +
theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.75, vjust=0.5, angle = 90),
axis.ticks.y = element_blank(),
# margin: top, right, bottom, and left
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"),
panel.grid.minor = element_blank())
# Dendrogram plot
plt_dendr <- ggplot(segment_data) +
geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) +
scale_x_reverse(expand = c(0, 0.5)) +
scale_y_continuous(breaks = gene_pos_table$y_center,
labels = gene_pos_table$gene,
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "", y = "", colour = "", size = "") +
theme_classic() +
theme(panel.grid = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"))
prntRelAbund <- avgRelAbundCutoff*100
p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+
plot_layout(
nrow=1, widths = c(0.5, 0.2, 1, 1),
guides="collect"
) +
plot_annotation(
title="NCI-16s Data showing average relative abundance of genera by individual",
subtitle=paste0("Subset to OTU average relative abundance > ",prntRelAbund,"%")
)
p
if(save.plots == T){
ggsave(paste0("output/nci-001-",save.Date,".pdf"), plot=p, units="in", width=11, height=10)
ggsave(paste0("output/nci-001-",save.Date,".png"), plot=p, units="in", width=11, height=10)
}
# melt data down for use
dat.16s <- psmelt(phylo.data.nci.umd)
# set up variables
dat.16s$sample_type <- 0
dat.16s$sample_type[dat.16s$tissue=="T" &
dat.16s$Histology=="ADC"] <- "ESCA tissues"
dat.16s$sample_type[dat.16s$tissue=="N" &
dat.16s$Histology=="ADC"] <- "ESCA-adjacent tissue"
dat.16s$sample_type[dat.16s$tissue=="BO" &
dat.16s$Histology=="Barrets only"] <-"Barretts Only"
# make tumor vs normal variable
dat.16s$tumor.cat <- factor(dat.16s$tissue, levels=c("BO", "N", "T"), labels = c("Non-Tumor", "Non-Tumor", "Tumor"))
# dataset id
dat.16s$source <- "16s"
# plotting ids
dat.16s$X <- paste0(dat.16s$source, "-", dat.16s$tumor.cat)
# relabel as (0/1) for analysis
dat.16s$tumor <- as.numeric(factor(dat.16s$tissue, levels=c("BO", "N", "T"), labels = c("Non-Tumor", "Non-Tumor", "Tumor"))) - 1
# presence- absence
dat.16s$pres <- ifelse(dat.16s$Abundance > 0, 1, 0)
dat.16s$pres[is.na(dat.16s$pres)] <- 0
# subset to fuso. nuc. only
# Streptococcus sanguinis
# Campylobacter concisus
# Prevotella spp.
dat.16s.s <- filter(
dat.16s,
OTU %in% c(
"Fusobacterium_nucleatum",
unique(dat.16s$OTU[dat.16s$OTU %like% "Streptococcus_"]),
unique(dat.16s$OTU[dat.16s$OTU %like% "Campylobacter_"]),
"Prevotella_melaninogenica")
)
# new names
dat.16s.s$OTU1 <- factor(
dat.16s.s$OTU,
levels = c(
"Fusobacterium_nucleatum",
"Streptococcus_dentisani:Streptococcus_infantis:Streptococcus_mitis:Streptococcus_oligofermentans:Streptococcus_oralis:Streptococcus_pneumoniae:Streptococcus_pseudopneumoniae:Streptococcus_sanguinis",
"Campylobacter_rectus:Campylobacter_showae",
"Prevotella_melaninogenica"
),
labels = c(
"Fusobacterium nucleatum",
"Streptococcus sanguinis",
"Campylobacter concisus",
"Prevotella melaninogenica"
)
)
# rename bacteria
dat.16s.s$OTU <- factor(
dat.16s.s$OTU,
levels = c(
"Fusobacterium_nucleatum",
"Streptococcus_dentisani:Streptococcus_infantis:Streptococcus_mitis:Streptococcus_oligofermentans:Streptococcus_oralis:Streptococcus_pneumoniae:Streptococcus_pseudopneumoniae:Streptococcus_sanguinis",
"Campylobacter_rectus:Campylobacter_showae",
"Prevotella_melaninogenica"
),
labels = c(
"Fusobacterium nucleatum",
"Streptococcus spp.",
"Campylobacter concisus",
"Prevotella melaninogenica"
)
)
analysis.dat <- dat.16s # insert dataset to be used in analysis
avgRelAbundCutoff <- 0.001 # minimum average relative abundance for OTUs
otu.dat <- analysis.dat %>% filter(sample_type != "0") %>%
dplyr::group_by(OTU) %>%
dplyr::summarise(AverageRelativeAbundance=mean(Abundance))%>%
dplyr::filter(AverageRelativeAbundance>=avgRelAbundCutoff) %>%
dplyr::arrange(desc(AverageRelativeAbundance))
kable(otu.dat[,c(2,1)], format="html", digits=3) %>%
kable_styling(full_width = T)%>%
scroll_box(width="100%", height="400px")
AverageRelativeAbundance | OTU |
---|---|
0.289 | Streptococcus_dentisani:Streptococcus_infantis:Streptococcus_mitis:Streptococcus_oligofermentans:Streptococcus_oralis:Streptococcus_pneumoniae:Streptococcus_pseudopneumoniae:Streptococcus_sanguinis |
0.057 | Pseudomonas_rhodesiae |
0.057 | Prevotella_melaninogenica |
0.043 | Lactobacillus_gasseri:Lactobacillus_johnsonii |
0.038 | Veillonella_dispar |
0.035 | Fusobacterium_nucleatum |
0.034 | Stenotrophomonas_maltophilia |
0.031 | Acinetobacter_guillouiae |
0.024 | Rothia_mucilaginosa |
0.023 | Haemophilus_parainfluenzae |
0.023 | Gemella_haemolysans |
0.022 | Granulicatella_adiacens:Granulicatella_paraadiacens |
0.021 | Staphylococcus_epidermidis:Staphylococcus_hominis |
0.013 | Selenomonas_sputigena |
0.013 | Lachnoanaerobaculum_orale |
0.012 | otu19913:Actinobacillus_minor:Actinobacillus_porcinus:Actinobacillus_rossii:Haemophilus_paraphrohaemolyticus |
0.011 | Neisseria_flavescens |
0.011 | Lysinibacillus_fusiformis:Lysinibacillus_mangiferahumi:Lysinibacillus_sphaericus |
0.011 | Actinomyces_odontolyticus |
0.009 | otu4906:Bacillus_oleronius |
0.009 | otu16698:Tannerella_forsythia |
0.008 | Bacteroides_fragilis |
0.008 | Sphingomonas_pituitosa |
0.007 | Atopostipes_suicloacalis |
0.007 | Clostridium_perfringens:Clostridium_thermophilus |
0.006 | Acidovorax_temperans |
0.006 | Leptotrichia_wadei |
0.006 | Brucella_abortus:Brucella_canis:Brucella_ceti:Brucella_inopinata:Brucella_melitensis:Brucella_microti:Brucella_ovis:Brucella_pinnipedialis:Brucella_suis:Ochrobactrum_anthropi:Ochrobactrum_cytisi:Ochrobactrum_grignonense:Ochrobactrum_haematophilum:Ochrobactrum_lupini:Ochrobactrum_pecoris:Ochrobactrum_thiophenivorans:Ochrobactrum_tritici |
0.006 | Corynebacterium_kroppenstedtii |
0.005 | TM7_phylum |
0.005 | Geobacillus_stearothermophilus |
0.005 | Serratia_marcescens:Serratia_nematodiphila |
0.005 | Helicobacter_pylori |
0.005 | Campylobacter_rectus:Campylobacter_showae |
0.004 | Actinobacillus_rossii |
0.004 | otu17927:Alloprevotella_rava |
0.004 | otu17904:Deinococcus_aetherius |
0.004 | Sarcina_maxima |
0.003 | Cloacibacterium_normanense |
0.003 | Parvimonas_micra |
0.003 | Bifidobacterium_longum |
0.003 | Afipia_broomeae:Bradyrhizobium_elkanii:Bradyrhizobium_genosp:Bradyrhizobium_japonicum:Bradyrhizobium_jicamae:Bradyrhizobium_lablabi:Bradyrhizobium_pachyrhizi:Bradyrhizobium_retamae |
0.003 | Aggregatibacter_segnis |
0.003 | Proteus_mirabilis:Proteus_penneri |
0.003 | Oribacterium_sinus |
0.003 | Stomatobaculum_longum |
0.003 | Capnocytophaga_sputigena |
0.003 | Brevundimonas_diminuta:Brevundimonas_naejangsanensis:Brevundimonas_vancanneytii:Nitrobacteria_hamadaniensis:Nitrobacteria_iranicum |
0.002 | Collinsella_aerofaciens |
0.002 | Kocuria_kristinae |
0.002 | Megasphaera_micronuciformis |
0.002 | Castellaniella_denitrificans |
0.002 | Faecalibacterium_prausnitzii |
0.002 | otu21781:Centipeda_periodontii |
0.002 | Peptoniphilus_lacrimalis |
0.002 | Lactococcus_lactis |
0.002 | Sutterella_wadsworthensis |
0.002 | Enhydrobacter_aerosaccus |
0.002 | Escherichia_coli:Escherichia_fergusonii |
0.002 | Lachnoanaerobaculum_umeaense |
0.002 | Micrococcus_alkanovora:Micrococcus_antarcticus:Micrococcus_endophyticus:Micrococcus_indicus:Micrococcus_luteus:Micrococcus_yunnanensis |
0.001 | Dialister_pneumosintes |
0.001 | Paracoccus_aestuarii:Paracoccus_beibuensis:Paracoccus_marinus |
0.001 | Aquabacterium_commune |
0.001 | Citrobacter_freundii |
0.001 | Rubellimicrobium_mesophilum |
0.001 | Chryseobacterium_proteolyticum |
0.001 | Mycobacterium_diernhoferi:Mycobacterium_fluoranthenivorans:Mycobacterium_frederiksbergense:Mycobacterium_hackensackense:Mycobacterium_sacrum |
0.001 | otu34876:Clostridium_indolis:Fusicatenibacter_saccharivorans |
0.001 | Porphyromonas_endodontalis |
0.001 | otu19808:Gemmiger_formicilis |
0.001 | Solobacterium_moorei |
0.001 | Massilia_lurida |
0.001 | Zimmermannella_faecalis |
0.001 | Caldimonas_hydrothermale:Caldimonas_manganoxidans:Caldimonas_taiwanensis |
0.001 | Abiotrophia_defectiva |
0.001 | Shinella_kummerowiae:Shinella_zoogloeoides |
0.001 | Enterobacter_aerogenes |
plot.dat <- analysis.dat %>% filter(sample_type != "0") %>%
dplyr::group_by(OTU) %>%
dplyr::mutate(aveAbund=mean(Abundance)) %>%
dplyr::ungroup() %>%
dplyr::filter(aveAbund>=avgRelAbundCutoff) %>%
dplyr::mutate(ID = as.factor(accession.number),
Genus = substr(Genus, 4, 1000),
Phylum = substr(Phylum, 4, 1000)) %>%
dplyr::select(sample_type, Phylum, Genus, ID, Abundance, aveAbund)
# widen plot.dat for dendro
dat.wide <- plot.dat %>%
dplyr::mutate(
ID = paste0(ID, "_",sample_type)
) %>%
dplyr::select(ID, Genus, Abundance) %>%
dplyr::group_by(ID, Genus) %>%
dplyr::summarise(
Abundance = mean(Abundance)
) %>%
tidyr::pivot_wider(
id_cols = Genus,
names_from = ID,
values_from = Abundance,
values_fill = 0
)
rn <- dat.wide$Genus
mat <- as.matrix(dat.wide[,-1])
rownames(mat) <- rn
sample_names <- colnames(mat)
# Obtain the dendrogram
dend <- as.dendrogram(hclust(dist(mat)))
dend_data <- dendro_data(dend)
# Setup the data, so that the layout is inverted (this is more
# "clear" than simply using coord_flip())
segment_data <- with(
segment(dend_data),
data.frame(x = y, y = x, xend = yend, yend = xend))
# Use the dendrogram label data to position the gene labels
gene_pos_table <- with(
dend_data$labels,
data.frame(y_center = x, gene = as.character(label), height = 1))
# Table to position the samples
sample_pos_table <- data.frame(sample = sample_names) %>%
dplyr::mutate(x_center = (1:n()),
width = 1)
# Neglecting the gap parameters
heatmap_data <- mat %>%
reshape2::melt(value.name = "expr", varnames = c("gene", "sample")) %>%
left_join(gene_pos_table) %>%
left_join(sample_pos_table)
# extract and rejoin sample IDs and sample_type names for plotting
# first for the heatmap data.frame
A <- str_split(heatmap_data$sample, "_")
heatmap_data$ID <- heatmap_data$sample_type <- "0"
for(i in 1:nrow(heatmap_data)){
heatmap_data$ID[i] <- A[[i]][1]
heatmap_data$sample_type[i] <- A[[i]][2]
}
# second for the sample position dataframe (dendo)
A <- str_split(sample_pos_table$sample, "_")
sample_pos_table$ID <- sample_pos_table$sample_type <- "0"
for(i in 1:nrow(sample_pos_table)){
sample_pos_table$ID[i] <- A[[i]][1]
sample_pos_table$sample_type[i] <- A[[i]][2]
}
# Limits for the vertical axes
gene_axis_limits <- with(
gene_pos_table,
c(min(y_center - 0.5 * height), max(y_center + 0.5 * height))
) +
0.1 * c(-1, 1) # extra spacing: 0.1
## Build Heatmap Pieces
# by parts
hmd <- filter(heatmap_data, sample_type == "Barretts Only")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "Barretts Only")
spd$x_center <- as.numeric(as.factor(spd$x_center))
plt_hmap1 <- ggplot(hmd,
aes(x = x_center, y = y_center, fill = expr,
height = height, width = width)) +
geom_tile() +
#facet_wrap(.~sample_type)+
scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
scale_x_continuous(breaks = spd$x_center,
labels = spd$ID,
expand = c(0, 0)) +
# For the y axis, alternatively set the labels as: gene_position_table$gene
scale_y_continuous(breaks = gene_pos_table[, "y_center"],
labels = rep("", nrow(gene_pos_table)),
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "BO", y = NULL) +
theme_classic() +
theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90),
# margin: top, right, bottom, and left
#axis.ticks.y = element_blank(),
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"),
panel.grid = element_blank(),
legend.position = "none")
# Part 2: "ESCA-adjacent tissue
hmd <- filter(heatmap_data, sample_type == "ESCA-adjacent tissue")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA-adjacent tissue")
spd$x_center <- as.numeric(as.factor(spd$x_center))
plt_hmap2 <- ggplot(hmd,
aes(x = x_center, y = y_center, fill = expr,
height = height, width = width)) +
geom_tile() +
#facet_wrap(.~sample_type)+
scale_fill_gradient2("expr",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.1, 0.10, 0.30, 0.50, 0.80)) +
scale_x_continuous(breaks = spd$x_center,
labels = spd$ID,
expand = c(0, 0)) +
# For the y axis, alternatively set the labels as: gene_position_table$gene
scale_y_continuous(breaks = gene_pos_table[, "y_center"],
labels = rep("", nrow(gene_pos_table)),
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "ESCA-adj.", y = NULL) +
theme_classic() +
theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90),
axis.ticks.y = element_blank(),
# margin: top, right, bottom, and left
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"),
panel.grid.minor = element_blank(),
legend.position = "none")
# Part 3: "ESCA tissues"
hmd <- filter(heatmap_data, sample_type == "ESCA tissues")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues")
spd$x_center <- as.numeric(as.factor(spd$x_center))
plt_hmap3 <- ggplot(hmd,
aes(x = x_center, y = y_center, fill = expr,
height = height, width = width)) +
geom_tile() +
#facet_wrap(.~sample_type)+
scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
scale_x_continuous(breaks = spd$x_center,
labels = spd$ID,
expand = c(0, 0)) +
# For the y axis, alternatively set the labels as: gene_position_table$gene
scale_y_continuous(breaks = gene_pos_table[, "y_center"],
labels = rep("", nrow(gene_pos_table)),
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "ESCA", y = NULL) +
theme_classic() +
theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.75, vjust=0.5, angle = 90),
axis.ticks.y = element_blank(),
# margin: top, right, bottom, and left
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"),
panel.grid.minor = element_blank())
# Dendrogram plot
plt_dendr <- ggplot(segment_data) +
geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) +
scale_x_reverse(expand = c(0, 0.5)) +
scale_y_continuous(breaks = gene_pos_table$y_center,
labels = gene_pos_table$gene,
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "", y = "", colour = "", size = "") +
theme_classic() +
theme(panel.grid = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"))
prntRelAbund <- avgRelAbundCutoff*100
p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+
plot_layout(
nrow=1, widths = c(0.5, 0.2, 1, 1),
guides="collect"
) +
plot_annotation(
title="NCI-16s Data showing average relative abundance of genera by individual",
subtitle=paste0("Subset to OTU average relative abundance > ",prntRelAbund,"%")
)
p
if(save.plots == T){
ggsave(paste0("output/nci-all-001-",save.Date,".pdf"), plot=p, units="in", width=11, height=10)
ggsave(paste0("output/nci-all-001-",save.Date,".png"), plot=p, units="in", width=11, height=10)
}
The heatmaps for slide 6 focus on the TCGA RNAseq data.
get counts
Barretts, blood normal, non-tumor, tumor, tumor w/ barretts
tcga_rna_sample_data <- dat.rna.s %>%
filter(OTU == "Fusobacterium nucleatum")
table(tcga_rna_sample_data$tumor)
0 1
11 162
table(tcga_rna_sample_data$tissue)
C15.1 C15.3 C15.4 C15.5 C15.9 C16.0
1 5 35 122 8 2
table(tcga_rna_sample_data$RNAseq_Solid_Tissue_Normal)
0 1
154 19
table(tcga_rna_sample_data$Barrett.s.Esophagus.Reported)
No Not Available Yes
113 31 29
table(tcga_rna_sample_data$tumor, tcga_rna_sample_data$Barrett.s.Esophagus.Reported)
No Not Available Yes
0 7 1 3
1 106 30 26
nrow(tcga_rna_sample_data)
[1] 173
analysis.dat <- dat.rna # insert dataset to be used in analysis
avgRelAbundCutoff <- 0.05 # minimum average relative abundance for OTUs
otu.dat <- analysis.dat %>% filter(sample_type != "0") %>%
dplyr::group_by(otu2) %>%
dplyr::summarise(AverageRelativeAbundance=mean(Abundance, na.rm=T))%>%
dplyr::filter(AverageRelativeAbundance>=avgRelAbundCutoff) %>%
dplyr::arrange(desc(AverageRelativeAbundance))
kable(otu.dat[,c(2,1)], format="html", digits=3) %>%
kable_styling(full_width = T)%>%
scroll_box(width="100%", height="100%")
AverageRelativeAbundance | otu2 |
---|---|
0.426 | Pseudomonas fluorescens group |
0.320 | Pseudomonas sp. UW4 |
0.075 | Arthrobacter phenanthrenivorans |
plot.dat <- analysis.dat %>% filter(sample_type != "0") %>%
dplyr::group_by(OTU) %>%
dplyr::mutate(aveAbund=mean(Abundance, na.rm=T)) %>%
dplyr::ungroup() %>%
dplyr::filter(aveAbund>=avgRelAbundCutoff) %>%
dplyr::mutate(ID = as.factor(Patient_ID),
Abundance = ifelse(is.na(Abundance), 0, Abundance)) %>%
dplyr::select(sample_type, Phylum, Genus, ID, Abundance, aveAbund)
# widen plot.dat for dendro
dat.wide <- plot.dat %>%
dplyr::mutate(
ID = paste0(ID, "_",sample_type)
) %>%
dplyr::select(ID, Genus, Abundance) %>%
dplyr::group_by(ID, Genus) %>%
dplyr::summarise(
Abundance = mean(Abundance)
) %>%
tidyr::pivot_wider(
id_cols = Genus,
names_from = ID,
values_from = Abundance,
values_fill = 0
)
rn <- dat.wide$Genus
mat <- as.matrix(dat.wide[,-1])
rownames(mat) <- rn
sample_names <- colnames(mat)
# Obtain the dendrogram
dend <- as.dendrogram(hclust(dist(mat)))
dend_data <- dendro_data(dend)
# Setup the data, so that the layout is inverted (this is more
# "clear" than simply using coord_flip())
segment_data <- with(
segment(dend_data),
data.frame(x = y, y = x, xend = yend, yend = xend))
# Use the dendrogram label data to position the gene labels
gene_pos_table <- with(
dend_data$labels,
data.frame(y_center = x, gene = as.character(label), height = 1))
# Table to position the samples
sample_pos_table <- data.frame(sample = sample_names) %>%
dplyr::mutate(x_center = (1:n()),
width = 1)
# Neglecting the gap parameters
heatmap_data <- mat %>%
reshape2::melt(value.name = "expr", varnames = c("gene", "sample")) %>%
left_join(gene_pos_table) %>%
left_join(sample_pos_table)
# extract and rejoin sample IDs and sample_type names for plotting
# first for the heatmap data.frame
A <- str_split(heatmap_data$sample, "_")
heatmap_data$ID <- heatmap_data$sample_type <- "0"
for(i in 1:nrow(heatmap_data)){
heatmap_data$ID[i] <- A[[i]][1]
heatmap_data$sample_type[i] <- A[[i]][2]
}
# second for the sample position dataframe (dendo)
A <- str_split(sample_pos_table$sample, "_")
sample_pos_table$ID <- sample_pos_table$sample_type <- "0"
for(i in 1:nrow(sample_pos_table)){
sample_pos_table$ID[i] <- A[[i]][1]
sample_pos_table$sample_type[i] <- A[[i]][2]
}
# Limits for the vertical axes
gene_axis_limits <- with(
gene_pos_table,
c(min(y_center - 0.5 * height), max(y_center + 0.5 * height))
) +
0.1 * c(-1, 1) # extra spacing: 0.1
## Build Heatmap Pieces
# ESCA w/ Barrets
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))
plt_hmap1 <- ggplot(hmd,
aes(x = x_center, y = y_center, fill = expr,
height = height, width = width)) +
geom_tile() +
#facet_wrap(.~sample_type)+
scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
scale_x_continuous(breaks = spd$x_center,
labels = spd$ID,
expand = c(0, 0)) +
# For the y axis, alternatively set the labels as: gene_position_table$gene
scale_y_continuous(breaks = gene_pos_table[, "y_center"],
labels = rep("", nrow(gene_pos_table)),
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "ESCA w/ Barretts", y = NULL) +
theme_classic() +
theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90),
axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
# margin: top, right, bottom, and left
#axis.ticks.y = element_blank(),
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"),
panel.grid = element_blank(),
legend.position = "none")
# Part 2: "ESCA-adjacent tissue w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA-adjacent tissue w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA-adjacent tissue w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))
plt_hmap2 <- ggplot(hmd,
aes(x = x_center, y = y_center, fill = expr,
height = height, width = width)) +
geom_tile() +
#facet_wrap(.~sample_type)+
scale_fill_gradient2("expr",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.1, 0.10, 0.30, 0.50, 0.80)) +
scale_x_continuous(breaks = spd$x_center,
labels = spd$ID,
expand = c(0, 0)) +
# For the y axis, alternatively set the labels as: gene_position_table$gene
scale_y_continuous(breaks = gene_pos_table[, "y_center"],
labels = rep("", nrow(gene_pos_table)),
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "ESCA-adj. w/ Barretts", y = NULL) +
theme_classic() +
theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90),
axis.ticks.y = element_blank(),
axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
# margin: top, right, bottom, and left
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"),
panel.grid.minor = element_blank(),
legend.position = "none")
# Part 3: "ESCA tissues w/o Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/o Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/o Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))
plt_hmap3 <- ggplot(hmd,
aes(x = x_center, y = y_center, fill = expr,
height = height, width = width)) +
geom_tile() +
#facet_wrap(.~sample_type)+
scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
scale_x_continuous(breaks = spd$x_center,
labels = spd$ID,
expand = c(0, 0)) +
# For the y axis, alternatively set the labels as: gene_position_table$gene
scale_y_continuous(breaks = gene_pos_table[, "y_center"],
labels = rep("", nrow(gene_pos_table)),
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "ESCA w/o Barretts", y = NULL) +
theme_classic() +
theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.75, vjust=0.5, angle = 90),
axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
axis.ticks.y = element_blank(),
# margin: top, right, bottom, and left
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"),
panel.grid.minor = element_blank())
# Dendrogram plot
plt_dendr <- ggplot(segment_data) +
geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) +
scale_x_reverse(expand = c(0, 0.5)) +
scale_y_continuous(breaks = gene_pos_table$y_center,
labels = gene_pos_table$gene,
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "", y = "", colour = "", size = "") +
theme_classic() +
theme(panel.grid = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"))
prntRelAbund <- avgRelAbundCutoff*100
p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+
plot_layout(
nrow=1, widths = c(0.5, 0.4, 0.15, 1),
guides="collect"
) +
plot_annotation(
title="TCGA RNAseq Data showing average relative abundance of genera by individual",
subtitle=paste0("Subset to OTU average relative abundance > ",prntRelAbund,"%")
)
p
if(save.plots == T){
ggsave(paste0("output/tcga-rnaseq-05-",save.Date,".pdf"), plot=p, units="in", width=11, height=5)
ggsave(paste0("output/tcga-rnaseq-05-",save.Date,".png"), plot=p, units="in", width=11, height=5)
}
analysis.dat <- dat.rna # insert dataset to be used in analysis
avgRelAbundCutoff <- 0.01 # minimum average relative abundance for OTUs
otu.dat <- analysis.dat %>% filter(sample_type != "0") %>%
dplyr::group_by(otu2) %>%
dplyr::summarise(AverageRelativeAbundance=mean(Abundance, na.rm=T))%>%
dplyr::filter(AverageRelativeAbundance>=avgRelAbundCutoff) %>%
dplyr::arrange(desc(AverageRelativeAbundance))
kable(otu.dat[,c(2,1)], format="html", digits=3) %>%
kable_styling(full_width = T)%>%
scroll_box(width="100%", height="100%")
AverageRelativeAbundance | otu2 |
---|---|
0.426 | Pseudomonas fluorescens group |
0.320 | Pseudomonas sp. UW4 |
0.075 | Arthrobacter phenanthrenivorans |
0.041 | Pseudomonas sp. UK4 |
0.029 | Bradyrhizobium sp. BTAi1 |
0.027 | Pseudomonas putida group |
0.011 | Bacillus cereus group |
plot.dat <- analysis.dat %>% filter(sample_type != "0") %>%
dplyr::group_by(OTU) %>%
dplyr::mutate(aveAbund=mean(Abundance, na.rm=T)) %>%
dplyr::ungroup() %>%
dplyr::filter(aveAbund>=avgRelAbundCutoff) %>%
dplyr::mutate(ID = as.factor(Patient_ID),
Abundance = ifelse(is.na(Abundance), 0, Abundance)) %>%
dplyr::select(sample_type, Phylum, Genus, ID, Abundance, aveAbund)
# widen plot.dat for dendro
dat.wide <- plot.dat %>%
dplyr::mutate(
ID = paste0(ID, "_",sample_type)
) %>%
dplyr::select(ID, Genus, Abundance) %>%
dplyr::group_by(ID, Genus) %>%
dplyr::summarise(
Abundance = mean(Abundance)
) %>%
tidyr::pivot_wider(
id_cols = Genus,
names_from = ID,
values_from = Abundance,
values_fill = 0
)
rn <- dat.wide$Genus
mat <- as.matrix(dat.wide[,-1])
rownames(mat) <- rn
sample_names <- colnames(mat)
# Obtain the dendrogram
dend <- as.dendrogram(hclust(dist(mat)))
dend_data <- dendro_data(dend)
# Setup the data, so that the layout is inverted (this is more
# "clear" than simply using coord_flip())
segment_data <- with(
segment(dend_data),
data.frame(x = y, y = x, xend = yend, yend = xend))
# Use the dendrogram label data to position the gene labels
gene_pos_table <- with(
dend_data$labels,
data.frame(y_center = x, gene = as.character(label), height = 1))
# Table to position the samples
sample_pos_table <- data.frame(sample = sample_names) %>%
dplyr::mutate(x_center = (1:n()),
width = 1)
# Neglecting the gap parameters
heatmap_data <- mat %>%
reshape2::melt(value.name = "expr", varnames = c("gene", "sample")) %>%
left_join(gene_pos_table) %>%
left_join(sample_pos_table)
# extract and rejoin sample IDs and sample_type names for plotting
# first for the heatmap data.frame
A <- str_split(heatmap_data$sample, "_")
heatmap_data$ID <- heatmap_data$sample_type <- "0"
for(i in 1:nrow(heatmap_data)){
heatmap_data$ID[i] <- A[[i]][1]
heatmap_data$sample_type[i] <- A[[i]][2]
}
# second for the sample position dataframe (dendo)
A <- str_split(sample_pos_table$sample, "_")
sample_pos_table$ID <- sample_pos_table$sample_type <- "0"
for(i in 1:nrow(sample_pos_table)){
sample_pos_table$ID[i] <- A[[i]][1]
sample_pos_table$sample_type[i] <- A[[i]][2]
}
# Limits for the vertical axes
gene_axis_limits <- with(
gene_pos_table,
c(min(y_center - 0.5 * height), max(y_center + 0.5 * height))
) +
0.1 * c(-1, 1) # extra spacing: 0.1
## Build Heatmap Pieces
# ESCA w/ Barrets
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))
plt_hmap1 <- ggplot(hmd,
aes(x = x_center, y = y_center, fill = expr,
height = height, width = width)) +
geom_tile() +
#facet_wrap(.~sample_type)+
scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
scale_x_continuous(breaks = spd$x_center,
labels = spd$ID,
expand = c(0, 0)) +
# For the y axis, alternatively set the labels as: gene_position_table$gene
scale_y_continuous(breaks = gene_pos_table[, "y_center"],
labels = rep("", nrow(gene_pos_table)),
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "ESCA w/ Barretts", y = NULL) +
theme_classic() +
theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90),
# margin: top, right, bottom, and left
#axis.ticks.y = element_blank(),
axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"),
panel.grid = element_blank(),
legend.position = "none")
# Part 2: "ESCA-adjacent tissue w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA-adjacent tissue w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA-adjacent tissue w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))
plt_hmap2 <- ggplot(hmd,
aes(x = x_center, y = y_center, fill = expr,
height = height, width = width)) +
geom_tile() +
#facet_wrap(.~sample_type)+
scale_fill_gradient2("expr",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.1, 0.10, 0.30, 0.50, 0.80)) +
scale_x_continuous(breaks = spd$x_center,
labels = spd$ID,
expand = c(0, 0)) +
# For the y axis, alternatively set the labels as: gene_position_table$gene
scale_y_continuous(breaks = gene_pos_table[, "y_center"],
labels = rep("", nrow(gene_pos_table)),
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "ESCA-adj. w/ Barretts", y = NULL) +
theme_classic() +
theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90),
axis.ticks.y = element_blank(),
axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
# margin: top, right, bottom, and left
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"),
panel.grid.minor = element_blank(),
legend.position = "none")
# Part 3: "ESCA tissues w/o Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/o Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/o Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))
plt_hmap3 <- ggplot(hmd,
aes(x = x_center, y = y_center, fill = expr,
height = height, width = width)) +
geom_tile() +
#facet_wrap(.~sample_type)+
scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
scale_x_continuous(breaks = spd$x_center,
labels = spd$ID,
expand = c(0, 0)) +
# For the y axis, alternatively set the labels as: gene_position_table$gene
scale_y_continuous(breaks = gene_pos_table[, "y_center"],
labels = rep("", nrow(gene_pos_table)),
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "ESCA w/o Barretts", y = NULL) +
theme_classic() +
theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.75, vjust=0.5, angle = 90),
axis.ticks.y = element_blank(),
axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
# margin: top, right, bottom, and left
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"),
panel.grid.minor = element_blank())
# Dendrogram plot
plt_dendr <- ggplot(segment_data) +
geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) +
scale_x_reverse(expand = c(0, 0.5)) +
scale_y_continuous(breaks = gene_pos_table$y_center,
labels = gene_pos_table$gene,
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "", y = "", colour = "", size = "") +
theme_classic() +
theme(panel.grid = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"))
prntRelAbund <- avgRelAbundCutoff*100
p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+
plot_layout(
nrow=1, widths = c(0.5, 0.4, 0.15, 1),
guides="collect"
) +
plot_annotation(
title="TCGA RNAseq Data showing average relative abundance of genera by individual",
subtitle=paste0("Subset to OTU average relative abundance > ",prntRelAbund,"%")
)
p
if(save.plots == T){
ggsave(paste0("output/tcga-rnaseq-01-",save.Date,".pdf"), plot=p, units="in", width=11, height=5)
ggsave(paste0("output/tcga-rnaseq-01-",save.Date,".png"), plot=p, units="in", width=11, height=5)
}
analysis.dat <- dat.rna # insert dataset to be used in analysis
avgRelAbundCutoff <- 0.001 # minimum average relative abundance for OTUs
otu.dat <- analysis.dat %>% filter(sample_type != "0") %>%
dplyr::group_by(otu2) %>%
dplyr::summarise(AverageRelativeAbundance=mean(Abundance, na.rm=T))%>%
dplyr::filter(AverageRelativeAbundance>=avgRelAbundCutoff) %>%
dplyr::arrange(desc(AverageRelativeAbundance))
kable(otu.dat[,c(2,1)], format="html", digits=3) %>%
kable_styling(full_width = T)%>%
scroll_box(width="100%", height="100%")
AverageRelativeAbundance | otu2 |
---|---|
0.426 | Pseudomonas fluorescens group |
0.320 | Pseudomonas sp. UW4 |
0.075 | Arthrobacter phenanthrenivorans |
0.041 | Pseudomonas sp. UK4 |
0.029 | Bradyrhizobium sp. BTAi1 |
0.027 | Pseudomonas putida group |
0.011 | Bacillus cereus group |
0.009 | Propionibacterium acnes |
0.006 | Escherichia coli |
0.004 | Arthrobacter sp. FB24 |
0.004 | Bacillus megaterium |
0.003 | Haemophilus influenzae |
0.003 | Pseudomonas aeruginosa group |
0.003 | Pseudomonas brassicacearum |
0.002 | Enterobacter cloacae complex |
0.001 | Psychrobacter sp. PRwf-1 |
0.001 | Enterococcus faecalis |
0.001 | Arthrobacter chlorophenolicus |
plot.dat <- analysis.dat %>% filter(sample_type != "0") %>%
dplyr::group_by(OTU) %>%
dplyr::mutate(aveAbund=mean(Abundance, na.rm=T)) %>%
dplyr::ungroup() %>%
dplyr::filter(aveAbund>=avgRelAbundCutoff) %>%
dplyr::mutate(ID = as.factor(Patient_ID),
Abundance = ifelse(is.na(Abundance), 0, Abundance)) %>%
dplyr::select(sample_type, Phylum, Genus, ID, Abundance, aveAbund)
# widen plot.dat for dendro
dat.wide <- plot.dat %>%
dplyr::mutate(
ID = paste0(ID, "_",sample_type)
) %>%
dplyr::select(ID, Genus, Abundance) %>%
dplyr::group_by(ID, Genus) %>%
dplyr::summarise(
Abundance = mean(Abundance)
) %>%
tidyr::pivot_wider(
id_cols = Genus,
names_from = ID,
values_from = Abundance,
values_fill = 0
)
rn <- dat.wide$Genus
mat <- as.matrix(dat.wide[,-1])
rownames(mat) <- rn
sample_names <- colnames(mat)
# Obtain the dendrogram
dend <- as.dendrogram(hclust(dist(mat)))
dend_data <- dendro_data(dend)
# Setup the data, so that the layout is inverted (this is more
# "clear" than simply using coord_flip())
segment_data <- with(
segment(dend_data),
data.frame(x = y, y = x, xend = yend, yend = xend))
# Use the dendrogram label data to position the gene labels
gene_pos_table <- with(
dend_data$labels,
data.frame(y_center = x, gene = as.character(label), height = 1))
# Table to position the samples
sample_pos_table <- data.frame(sample = sample_names) %>%
dplyr::mutate(x_center = (1:n()),
width = 1)
# Neglecting the gap parameters
heatmap_data <- mat %>%
reshape2::melt(value.name = "expr", varnames = c("gene", "sample")) %>%
left_join(gene_pos_table) %>%
left_join(sample_pos_table)
# extract and rejoin sample IDs and sample_type names for plotting
# first for the heatmap data.frame
A <- str_split(heatmap_data$sample, "_")
heatmap_data$ID <- heatmap_data$sample_type <- "0"
for(i in 1:nrow(heatmap_data)){
heatmap_data$ID[i] <- A[[i]][1]
heatmap_data$sample_type[i] <- A[[i]][2]
}
# second for the sample position dataframe (dendo)
A <- str_split(sample_pos_table$sample, "_")
sample_pos_table$ID <- sample_pos_table$sample_type <- "0"
for(i in 1:nrow(sample_pos_table)){
sample_pos_table$ID[i] <- A[[i]][1]
sample_pos_table$sample_type[i] <- A[[i]][2]
}
# Limits for the vertical axes
gene_axis_limits <- with(
gene_pos_table,
c(min(y_center - 0.5 * height), max(y_center + 0.5 * height))
) +
0.1 * c(-1, 1) # extra spacing: 0.1
## Build Heatmap Pieces
# ESCA w/ Barrets
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))
plt_hmap1 <- ggplot(hmd,
aes(x = x_center, y = y_center, fill = expr,
height = height, width = width)) +
geom_tile() +
#facet_wrap(.~sample_type)+
scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.10, breaks=c(0, 0.05, 0.10, 0.15, 0.20)) +
scale_x_continuous(breaks = spd$x_center,
labels = spd$ID,
expand = c(0, 0)) +
# For the y axis, alternatively set the labels as: gene_position_table$gene
scale_y_continuous(breaks = gene_pos_table[, "y_center"],
labels = rep("", nrow(gene_pos_table)),
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "ESCA w/ Barretts", y = NULL) +
theme_classic() +
theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90),
# margin: top, right, bottom, and left
#axis.ticks.y = element_blank(),
axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"),
panel.grid = element_blank(),
legend.position = "none")
# Part 2: "ESCA-adjacent tissue w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA-adjacent tissue w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA-adjacent tissue w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))
plt_hmap2 <- ggplot(hmd,
aes(x = x_center, y = y_center, fill = expr,
height = height, width = width)) +
geom_tile() +
#facet_wrap(.~sample_type)+
scale_fill_gradient2("expr",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.10, breaks=c(0, 0.05, 0.10, 0.15, 0.20)) +
scale_x_continuous(breaks = spd$x_center,
labels = spd$ID,
expand = c(0, 0)) +
# For the y axis, alternatively set the labels as: gene_position_table$gene
scale_y_continuous(breaks = gene_pos_table[, "y_center"],
labels = rep("", nrow(gene_pos_table)),
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "ESCA-adj. w/ Barretts", y = NULL) +
theme_classic() +
theme(axis.text.x = element_blank(),#element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90),
axis.ticks.y = element_blank(),
axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
# margin: top, right, bottom, and left
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"),
panel.grid.minor = element_blank(),
legend.position = "none")
# Part 3: "ESCA tissues w/o Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/o Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/o Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))
plt_hmap3 <- ggplot(hmd,
aes(x = x_center, y = y_center, fill = expr,
height = height, width = width)) +
geom_tile() +
#facet_wrap(.~sample_type)+
scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.10, breaks=c(0, 0.05, 0.10, 0.15, 0.20)) +
scale_x_continuous(breaks = spd$x_center,
labels = spd$ID,
expand = c(0, 0)) +
# For the y axis, alternatively set the labels as: gene_position_table$gene
scale_y_continuous(breaks = gene_pos_table[, "y_center"],
labels = rep("", nrow(gene_pos_table)),
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "ESCA w/o Barretts", y = NULL) +
theme_classic() +
theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.75, vjust=0.5, angle = 90),
axis.ticks.y = element_blank(),
axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
# margin: top, right, bottom, and left
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"),
panel.grid.minor = element_blank())
# Dendrogram plot
plt_dendr <- ggplot(segment_data) +
geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) +
scale_x_reverse(expand = c(0, 0.5)) +
scale_y_continuous(breaks = gene_pos_table$y_center,
labels = gene_pos_table$gene,
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "", y = "", colour = "", size = "") +
theme_classic() +
theme(panel.grid = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"))
prntRelAbund <- avgRelAbundCutoff*100
p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+
plot_layout(
nrow=1, widths = c(0.5, 0.4, 0.15, 1),
guides="collect"
) +
plot_annotation(
title="TCGA RNAseq Data showing average relative abundance of genera by individual",
subtitle=paste0("Subset to OTU average relative abundance > ",prntRelAbund,"%")
)
p
if(save.plots == T){
ggsave(paste0("output/tcga-rnaseq-001-",save.Date,".pdf"), plot=p, units="in", width=11, height=5)
ggsave(paste0("output/tcga-rnaseq-001.-",save.Date,".png"), plot=p, units="in", width=11, height=5)
}
The heatmaps for slide76 focus on the TCGA WGS data.
get counts
Barretts, blood normal, non-tumor, tumor, tumor w/ barretts
tcga_wgs_sample_data <- dat.wgs.s %>%
filter(OTU == "Fusobacterium nucleatum")
table(tcga_wgs_sample_data$tumor)
0 1
70 69
table(tcga_wgs_sample_data$tissue)
C15.3 C15.4 C15.5 C15.9
2 28 72 9
table(tcga_wgs_sample_data$WGS_Blood_Normal)
0 1 2
44 76 19
table(tcga_wgs_sample_data$WGS_Solid_Tissue_Normal)
0 1 2
90 42 7
table(tcga_wgs_sample_data$Barrett.s.Esophagus.Reported)
No Not Available Yes
54 47 10
table(tcga_wgs_sample_data$tumor, tcga_wgs_sample_data$Barrett.s.Esophagus.Reported)
No Not Available Yes
0 26 24 6
1 28 23 4
nrow(tcga_wgs_sample_data)
[1] 139
analysis.dat <- dat.wgs # insert dataset to be used in analysis
avgRelAbundCutoff <- 0.05 # minimum average relative abundance for OTUs
otu.dat <- analysis.dat %>% filter(sample_type != "0") %>%
dplyr::group_by(otu2) %>%
dplyr::summarise(AverageRelativeAbundance=mean(Abundance, na.rm=T))%>%
dplyr::filter(AverageRelativeAbundance>=avgRelAbundCutoff) %>%
dplyr::arrange(desc(AverageRelativeAbundance))
kable(otu.dat[,c(2,1)], format="html", digits=3) %>%
kable_styling(full_width = T)%>%
scroll_box(width="100%", height="100%")
AverageRelativeAbundance | otu2 |
---|---|
0.465 | Coprobacillus sp. D7 |
0.106 | Propionibacterium acnes |
0.074 | Haemophilus influenzae |
0.057 | Prevotella melaninogenica |
plot.dat <- analysis.dat %>% filter(sample_type != "0") %>%
dplyr::group_by(OTU) %>%
dplyr::mutate(aveAbund=mean(Abundance, na.rm=T)) %>%
dplyr::ungroup() %>%
dplyr::filter(aveAbund>=avgRelAbundCutoff) %>%
dplyr::mutate(ID = as.factor(Patient_ID),
Abundance = ifelse(is.na(Abundance), 0, Abundance)) %>%
dplyr::select(sample_type, Phylum, Genus, ID, Abundance, aveAbund)
# widen plot.dat for dendro
dat.wide <- plot.dat %>%
dplyr::mutate(
ID = paste0(ID, "_",sample_type)
) %>%
dplyr::select(ID, Genus, Abundance) %>%
dplyr::group_by(ID, Genus) %>%
dplyr::summarise(
Abundance = mean(Abundance)
) %>%
tidyr::pivot_wider(
id_cols = Genus,
names_from = ID,
values_from = Abundance,
values_fill = 0
)
rn <- dat.wide$Genus
mat <- as.matrix(dat.wide[,-1])
rownames(mat) <- rn
sample_names <- colnames(mat)
# Obtain the dendrogram
dend <- as.dendrogram(hclust(dist(mat)))
dend_data <- dendro_data(dend)
# Setup the data, so that the layout is inverted (this is more
# "clear" than simply using coord_flip())
segment_data <- with(
segment(dend_data),
data.frame(x = y, y = x, xend = yend, yend = xend))
# Use the dendrogram label data to position the gene labels
gene_pos_table <- with(
dend_data$labels,
data.frame(y_center = x, gene = as.character(label), height = 1))
# Table to position the samples
sample_pos_table <- data.frame(sample = sample_names) %>%
dplyr::mutate(x_center = (1:n()),
width = 1)
# Neglecting the gap parameters
heatmap_data <- mat %>%
reshape2::melt(value.name = "expr", varnames = c("gene", "sample")) %>%
left_join(gene_pos_table) %>%
left_join(sample_pos_table)
# extract and rejoin sample IDs and sample_type names for plotting
# first for the heatmap data.frame
A <- str_split(heatmap_data$sample, "_")
heatmap_data$ID <- heatmap_data$sample_type <- "0"
for(i in 1:nrow(heatmap_data)){
heatmap_data$ID[i] <- A[[i]][1]
heatmap_data$sample_type[i] <- A[[i]][2]
}
# second for the sample position dataframe (dendo)
A <- str_split(sample_pos_table$sample, "_")
sample_pos_table$ID <- sample_pos_table$sample_type <- "0"
for(i in 1:nrow(sample_pos_table)){
sample_pos_table$ID[i] <- A[[i]][1]
sample_pos_table$sample_type[i] <- A[[i]][2]
}
# Limits for the vertical axes
gene_axis_limits <- with(
gene_pos_table,
c(min(y_center - 0.5 * height), max(y_center + 0.5 * height))
) +
0.1 * c(-1, 1) # extra spacing: 0.1
## Build Heatmap Pieces
# ESCA w/ Barrets
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))
plt_hmap1 <- ggplot(hmd,
aes(x = x_center, y = y_center, fill = expr,
height = height, width = width)) +
geom_tile() +
#facet_wrap(.~sample_type)+
scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
scale_x_continuous(breaks = spd$x_center,
labels = spd$ID,
expand = c(0, 0)) +
# For the y axis, alternatively set the labels as: gene_position_table$gene
scale_y_continuous(breaks = gene_pos_table[, "y_center"],
labels = rep("", nrow(gene_pos_table)),
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "ESCA w/ Barretts", y = NULL) +
theme_classic() +
theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90),
# margin: top, right, bottom, and left
#axis.ticks.y = element_blank(),
axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"),
panel.grid = element_blank(),
legend.position = "none")
# Part 2: "ESCA-adjacent tissue w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA-adjacent tissue w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA-adjacent tissue w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))
plt_hmap2 <- ggplot(hmd,
aes(x = x_center, y = y_center, fill = expr,
height = height, width = width)) +
geom_tile() +
#facet_wrap(.~sample_type)+
scale_fill_gradient2("expr",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.1, 0.10, 0.30, 0.50, 0.80)) +
scale_x_continuous(breaks = spd$x_center,
labels = spd$ID,
expand = c(0, 0)) +
# For the y axis, alternatively set the labels as: gene_position_table$gene
scale_y_continuous(breaks = gene_pos_table[, "y_center"],
labels = rep("", nrow(gene_pos_table)),
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "ESCA-adj. w/ Barretts", y = NULL) +
theme_classic() +
theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90),
axis.ticks.y = element_blank(),
axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
# margin: top, right, bottom, and left
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"),
panel.grid.minor = element_blank(),
legend.position = "none")
# Part 3: "ESCA tissues w/o Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/o Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/o Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))
plt_hmap3 <- ggplot(hmd,
aes(x = x_center, y = y_center, fill = expr,
height = height, width = width)) +
geom_tile() +
#facet_wrap(.~sample_type)+
scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
scale_x_continuous(breaks = spd$x_center,
labels = spd$ID,
expand = c(0, 0)) +
# For the y axis, alternatively set the labels as: gene_position_table$gene
scale_y_continuous(breaks = gene_pos_table[, "y_center"],
labels = rep("", nrow(gene_pos_table)),
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "ESCA w/o Barretts", y = NULL) +
theme_classic() +
theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.75, vjust=0.5, angle = 90),
axis.ticks.y = element_blank(),
axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
# margin: top, right, bottom, and left
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"),
panel.grid.minor = element_blank())
# Dendrogram plot
plt_dendr <- ggplot(segment_data) +
geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) +
scale_x_reverse(expand = c(0, 0.5)) +
scale_y_continuous(breaks = gene_pos_table$y_center,
labels = gene_pos_table$gene,
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "", y = "", colour = "", size = "") +
theme_classic() +
theme(panel.grid = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"))
prntRelAbund <- avgRelAbundCutoff*100
p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+
plot_layout(
nrow=1, widths = c(0.5, 0.4, 0.15, 1),
guides="collect"
) +
plot_annotation(
title="TCGA WGS Data showing average relative abundance of genera by individual",
subtitle=paste0("Subset to OTU average relative abundance > ",prntRelAbund,"%")
)
p
if(save.plots == T){
ggsave(paste0("output/tcga-wgs-05-",save.Date,".pdf"), plot=p, units="in", width=11, height=5)
ggsave(paste0("output/tcga-wgs-05-",save.Date,".png"), plot=p, units="in", width=11, height=5)
}
analysis.dat <- dat.wgs # insert dataset to be used in analysis
avgRelAbundCutoff <- 0.01 # minimum average relative abundance for OTUs
otu.dat <- analysis.dat %>% filter(sample_type != "0") %>%
dplyr::group_by(otu2) %>%
dplyr::summarise(AverageRelativeAbundance=mean(Abundance, na.rm=T))%>%
dplyr::filter(AverageRelativeAbundance>=avgRelAbundCutoff) %>%
dplyr::arrange(desc(AverageRelativeAbundance))
kable(otu.dat[,c(2,1)], format="html", digits=3) %>%
kable_styling(full_width = T)%>%
scroll_box(width="100%", height="100%")
AverageRelativeAbundance | otu2 |
---|---|
0.465 | Coprobacillus sp. D7 |
0.106 | Propionibacterium acnes |
0.074 | Haemophilus influenzae |
0.057 | Prevotella melaninogenica |
0.028 | Cyanothece sp. CCY0110 |
0.027 | Fusobacterium nucleatum |
0.022 | Campylobacter concisus |
0.019 | Bradyrhizobium sp. BTAi1 |
0.016 | Bradyrhizobium diazoefficiens |
0.014 | Streptococcus pneumoniae |
0.012 | Bradyrhizobium japonicum |
plot.dat <- analysis.dat %>% filter(sample_type != "0") %>%
dplyr::group_by(OTU) %>%
dplyr::mutate(aveAbund=mean(Abundance, na.rm=T)) %>%
dplyr::ungroup() %>%
dplyr::filter(aveAbund>=avgRelAbundCutoff) %>%
dplyr::mutate(ID = as.factor(Patient_ID),
Abundance = ifelse(is.na(Abundance), 0, Abundance)) %>%
dplyr::select(sample_type, Phylum, Genus, ID, Abundance, aveAbund)
# widen plot.dat for dendro
dat.wide <- plot.dat %>%
dplyr::mutate(
ID = paste0(ID, "_",sample_type)
) %>%
dplyr::select(ID, Genus, Abundance) %>%
dplyr::group_by(ID, Genus) %>%
dplyr::summarise(
Abundance = mean(Abundance)
) %>%
tidyr::pivot_wider(
id_cols = Genus,
names_from = ID,
values_from = Abundance,
values_fill = 0
)
rn <- dat.wide$Genus
mat <- as.matrix(dat.wide[,-1])
rownames(mat) <- rn
sample_names <- colnames(mat)
# Obtain the dendrogram
dend <- as.dendrogram(hclust(dist(mat)))
dend_data <- dendro_data(dend)
# Setup the data, so that the layout is inverted (this is more
# "clear" than simply using coord_flip())
segment_data <- with(
segment(dend_data),
data.frame(x = y, y = x, xend = yend, yend = xend))
# Use the dendrogram label data to position the gene labels
gene_pos_table <- with(
dend_data$labels,
data.frame(y_center = x, gene = as.character(label), height = 1))
# Table to position the samples
sample_pos_table <- data.frame(sample = sample_names) %>%
dplyr::mutate(x_center = (1:n()),
width = 1)
# Neglecting the gap parameters
heatmap_data <- mat %>%
reshape2::melt(value.name = "expr", varnames = c("gene", "sample")) %>%
left_join(gene_pos_table) %>%
left_join(sample_pos_table)
# extract and rejoin sample IDs and sample_type names for plotting
# first for the heatmap data.frame
A <- str_split(heatmap_data$sample, "_")
heatmap_data$ID <- heatmap_data$sample_type <- "0"
for(i in 1:nrow(heatmap_data)){
heatmap_data$ID[i] <- A[[i]][1]
heatmap_data$sample_type[i] <- A[[i]][2]
}
# second for the sample position dataframe (dendo)
A <- str_split(sample_pos_table$sample, "_")
sample_pos_table$ID <- sample_pos_table$sample_type <- "0"
for(i in 1:nrow(sample_pos_table)){
sample_pos_table$ID[i] <- A[[i]][1]
sample_pos_table$sample_type[i] <- A[[i]][2]
}
# Limits for the vertical axes
gene_axis_limits <- with(
gene_pos_table,
c(min(y_center - 0.5 * height), max(y_center + 0.5 * height))
) +
0.1 * c(-1, 1) # extra spacing: 0.1
## Build Heatmap Pieces
# ESCA w/ Barrets
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))
plt_hmap1 <- ggplot(hmd,
aes(x = x_center, y = y_center, fill = expr,
height = height, width = width)) +
geom_tile() +
#facet_wrap(.~sample_type)+
scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100],midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
scale_x_continuous(breaks = spd$x_center,
labels = spd$ID,
expand = c(0, 0)) +
# For the y axis, alternatively set the labels as: gene_position_table$gene
scale_y_continuous(breaks = gene_pos_table[, "y_center"],
labels = rep("", nrow(gene_pos_table)),
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "ESCA w/ Barretts", y = NULL) +
theme_classic() +
theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90),
# margin: top, right, bottom, and left
#axis.ticks.y = element_blank(),
axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"),
panel.grid = element_blank(),
legend.position = "none")
# Part 2: "ESCA-adjacent tissue w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA-adjacent tissue w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA-adjacent tissue w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))
plt_hmap2 <- ggplot(hmd,
aes(x = x_center, y = y_center, fill = expr,
height = height, width = width)) +
geom_tile() +
#facet_wrap(.~sample_type)+
scale_fill_gradient2("expr",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.1, 0.10, 0.30, 0.50, 0.80)) +
scale_x_continuous(breaks = spd$x_center,
labels = spd$ID,
expand = c(0, 0)) +
# For the y axis, alternatively set the labels as: gene_position_table$gene
scale_y_continuous(breaks = gene_pos_table[, "y_center"],
labels = rep("", nrow(gene_pos_table)),
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "ESCA-adj. w/ Barretts", y = NULL) +
theme_classic() +
theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90),
axis.ticks.y = element_blank(),
axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
# margin: top, right, bottom, and left
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"),
panel.grid.minor = element_blank(),
legend.position = "none")
# Part 3: "ESCA tissues w/o Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/o Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/o Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))
plt_hmap3 <- ggplot(hmd,
aes(x = x_center, y = y_center, fill = expr,
height = height, width = width)) +
geom_tile() +
#facet_wrap(.~sample_type)+
scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
scale_x_continuous(breaks = spd$x_center,
labels = spd$ID,
expand = c(0, 0)) +
# For the y axis, alternatively set the labels as: gene_position_table$gene
scale_y_continuous(breaks = gene_pos_table[, "y_center"],
labels = rep("", nrow(gene_pos_table)),
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "ESCA w/o Barretts", y = NULL) +
theme_classic() +
theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.75, vjust=0.5, angle = 90),
axis.ticks.y = element_blank(),
axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
# margin: top, right, bottom, and left
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"),
panel.grid.minor = element_blank())
# Dendrogram plot
plt_dendr <- ggplot(segment_data) +
geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) +
scale_x_reverse(expand = c(0, 0.5)) +
scale_y_continuous(breaks = gene_pos_table$y_center,
labels = gene_pos_table$gene,
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "", y = "", colour = "", size = "") +
theme_classic() +
theme(panel.grid = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"))
prntRelAbund <- avgRelAbundCutoff*100
p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+
plot_layout(
nrow=1, widths = c(0.5, 0.4, 0.15, 1),
guides="collect"
) +
plot_annotation(
title="TCGA WGS Data showing average relative abundance of genera by individual",
subtitle=paste0("Subset to OTU average relative abundance > ",prntRelAbund,"%")
)
p
if(save.plots == T){
ggsave(paste0("output/tcga-wgs-01-",save.Date,".pdf"), plot=p, units="in", width=11, height=5)
ggsave(paste0("output/tcga-wgs-01-",save.Date,".png"), plot=p, units="in", width=11, height=5)
}
analysis.dat <- dat.wgs # insert dataset to be used in analysis
avgRelAbundCutoff <- 0.001 # minimum average relative abundance for OTUs
otu.dat <- analysis.dat %>% filter(sample_type != "0") %>%
dplyr::group_by(otu2) %>%
dplyr::summarise(AverageRelativeAbundance=mean(Abundance, na.rm=T))%>%
dplyr::filter(AverageRelativeAbundance>=avgRelAbundCutoff) %>%
dplyr::arrange(desc(AverageRelativeAbundance))
kable(otu.dat[,c(2,1)], format="html", digits=3) %>%
kable_styling(full_width = T)%>%
scroll_box(width="100%", height="100%")
AverageRelativeAbundance | otu2 |
---|---|
0.465 | Coprobacillus sp. D7 |
0.106 | Propionibacterium acnes |
0.074 | Haemophilus influenzae |
0.057 | Prevotella melaninogenica |
0.028 | Cyanothece sp. CCY0110 |
0.027 | Fusobacterium nucleatum |
0.022 | Campylobacter concisus |
0.019 | Bradyrhizobium sp. BTAi1 |
0.016 | Bradyrhizobium diazoefficiens |
0.014 | Streptococcus pneumoniae |
0.012 | Bradyrhizobium japonicum |
0.010 | Bradyrhizobium sp. S23321 |
0.008 | candidate division TM7 single-cell isolate TM7a |
0.008 | Veillonella parvula |
0.008 | Streptococcus parasanguinis |
0.007 | Rhodopseudomonas palustris |
0.006 | Pseudomonas fluorescens group |
0.006 | Bradyrhizobium sp. ORS 278 |
0.006 | Streptococcus mitis |
0.005 | Beggiatoa sp. PS |
0.004 | Staphylococcus epidermidis |
0.004 | Corynebacterium tuberculostearicum |
0.004 | Streptococcus suis |
0.003 | Haemophilus parainfluenzae |
0.003 | Streptococcus pseudopneumoniae |
0.003 | Delftia sp. Cs1-4 |
0.003 | Lactobacillus gasseri |
0.003 | [Eubacterium] hallii |
0.002 | Delftia acidovorans |
0.002 | Streptococcus oralis |
0.002 | Pseudomonas putida group |
0.002 | Streptococcus thermophilus |
0.002 | Pseudomonas aeruginosa group |
0.002 | Atopobium rimae |
0.002 | Prevotella intermedia |
0.001 | candidate division TM7 single-cell isolate TM7c |
0.001 | Acidovorax delafieldii |
0.001 | Staphylococcus capitis |
0.001 | Escherichia coli |
0.001 | Oligotropha carboxidovorans |
0.001 | Acinetobacter junii |
0.001 | Acidovorax ebreus |
0.001 | Lactococcus garvieae |
0.001 | Gemmata obscuriglobus |
0.001 | Lactobacillus crispatus |
0.001 | Streptococcus gordonii |
0.001 | Xanthomonas campestris |
0.001 | Bacteroides sp. 3_1_33FAA |
plot.dat <- analysis.dat %>% filter(sample_type != "0") %>%
dplyr::group_by(OTU) %>%
dplyr::mutate(aveAbund=mean(Abundance, na.rm=T)) %>%
dplyr::ungroup() %>%
dplyr::filter(aveAbund>=avgRelAbundCutoff) %>%
dplyr::mutate(ID = as.factor(Patient_ID),
Abundance = ifelse(is.na(Abundance), 0, Abundance)) %>%
dplyr::select(sample_type, Phylum, Genus, ID, Abundance, aveAbund)
# widen plot.dat for dendro
dat.wide <- plot.dat %>%
dplyr::mutate(
ID = paste0(ID, "_",sample_type)
) %>%
dplyr::select(ID, Genus, Abundance) %>%
dplyr::group_by(ID, Genus) %>%
dplyr::summarise(
Abundance = mean(Abundance)
) %>%
tidyr::pivot_wider(
id_cols = Genus,
names_from = ID,
values_from = Abundance,
values_fill = 0
)
rn <- dat.wide$Genus
mat <- as.matrix(dat.wide[,-1])
rownames(mat) <- rn
sample_names <- colnames(mat)
# Obtain the dendrogram
dend <- as.dendrogram(hclust(dist(mat)))
dend_data <- dendro_data(dend)
# Setup the data, so that the layout is inverted (this is more
# "clear" than simply using coord_flip())
segment_data <- with(
segment(dend_data),
data.frame(x = y, y = x, xend = yend, yend = xend))
# Use the dendrogram label data to position the gene labels
gene_pos_table <- with(
dend_data$labels,
data.frame(y_center = x, gene = as.character(label), height = 1))
# Table to position the samples
sample_pos_table <- data.frame(sample = sample_names) %>%
dplyr::mutate(x_center = (1:n()),
width = 1)
# Neglecting the gap parameters
heatmap_data <- mat %>%
reshape2::melt(value.name = "expr", varnames = c("gene", "sample")) %>%
left_join(gene_pos_table) %>%
left_join(sample_pos_table)
# extract and rejoin sample IDs and sample_type names for plotting
# first for the heatmap data.frame
A <- str_split(heatmap_data$sample, "_")
heatmap_data$ID <- heatmap_data$sample_type <- "0"
for(i in 1:nrow(heatmap_data)){
heatmap_data$ID[i] <- A[[i]][1]
heatmap_data$sample_type[i] <- A[[i]][2]
}
# second for the sample position dataframe (dendo)
A <- str_split(sample_pos_table$sample, "_")
sample_pos_table$ID <- sample_pos_table$sample_type <- "0"
for(i in 1:nrow(sample_pos_table)){
sample_pos_table$ID[i] <- A[[i]][1]
sample_pos_table$sample_type[i] <- A[[i]][2]
}
# Limits for the vertical axes
gene_axis_limits <- with(
gene_pos_table,
c(min(y_center - 0.5 * height), max(y_center + 0.5 * height))
) +
0.1 * c(-1, 1) # extra spacing: 0.1
## Build Heatmap Pieces
# ESCA w/ Barrets
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))
plt_hmap1 <- ggplot(hmd,
aes(x = x_center, y = y_center, fill = expr,
height = height, width = width)) +
geom_tile() +
#facet_wrap(.~sample_type)+
scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
scale_x_continuous(breaks = spd$x_center,
labels = spd$ID,
expand = c(0, 0)) +
# For the y axis, alternatively set the labels as: gene_position_table$gene
scale_y_continuous(breaks = gene_pos_table[, "y_center"],
labels = rep("", nrow(gene_pos_table)),
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "ESCA w/ Barretts", y = NULL) +
theme_classic() +
theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90),
# margin: top, right, bottom, and left
#axis.ticks.y = element_blank(),
axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"),
panel.grid = element_blank(),
legend.position = "none")
# Part 2: "ESCA-adjacent tissue w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA-adjacent tissue w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA-adjacent tissue w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))
plt_hmap2 <- ggplot(hmd,
aes(x = x_center, y = y_center, fill = expr,
height = height, width = width)) +
geom_tile() +
#facet_wrap(.~sample_type)+
scale_fill_gradient2("expr",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.1, 0.10, 0.30, 0.50, 0.80)) +
scale_x_continuous(breaks = spd$x_center,
labels = spd$ID,
expand = c(0, 0)) +
# For the y axis, alternatively set the labels as: gene_position_table$gene
scale_y_continuous(breaks = gene_pos_table[, "y_center"],
labels = rep("", nrow(gene_pos_table)),
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "ESCA-adj. w/ Barretts", y = NULL) +
theme_classic() +
theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90),
axis.ticks.y = element_blank(),
axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
# margin: top, right, bottom, and left
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"),
panel.grid.minor = element_blank(),
legend.position = "none")
# Part 3: "ESCA tissues w/o Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/o Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/o Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))
plt_hmap3 <- ggplot(hmd,
aes(x = x_center, y = y_center, fill = expr,
height = height, width = width)) +
geom_tile() +
#facet_wrap(.~sample_type)+
scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
scale_x_continuous(breaks = spd$x_center,
labels = spd$ID,
expand = c(0, 0)) +
# For the y axis, alternatively set the labels as: gene_position_table$gene
scale_y_continuous(breaks = gene_pos_table[, "y_center"],
labels = rep("", nrow(gene_pos_table)),
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "ESCA w/o Barretts", y = NULL) +
theme_classic() +
theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.75, vjust=0.5, angle = 90),
axis.ticks.y = element_blank(),
axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
# margin: top, right, bottom, and left
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"),
panel.grid.minor = element_blank())
# Dendrogram plot
plt_dendr <- ggplot(segment_data) +
geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) +
scale_x_reverse(expand = c(0, 0.5)) +
scale_y_continuous(breaks = gene_pos_table$y_center,
labels = gene_pos_table$gene,
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "", y = "", colour = "", size = "") +
theme_classic() +
theme(panel.grid = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"))
prntRelAbund <- avgRelAbundCutoff*100
p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+
plot_layout(
nrow=1, widths = c(0.5, 0.4, 0.15, 1),
guides="collect"
) +
plot_annotation(
title="TCGA WGS Data showing average relative abundance of genera by individual",
subtitle=paste0("Subset to OTU average relative abundance > ",prntRelAbund,"%")
)
p
if(save.plots == T){
ggsave(paste0("output/tcga-wgs-001-",save.Date,".pdf"), plot=p, units="in", width=11, height=5)
ggsave(paste0("output/tcga-wgs-001-",save.Date,".png"), plot=p, units="in", width=11, height=5)
}
p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+ plt_hmap4+
plot_layout(
nrow=1, widths = c(0.5, 0.2, 1, 1, 1),
guides="collect"
) +
plot_annotation(
title="NCI-16s Data showing average relative abundance of genera by individual"
)
p
if(save.plots == T){
ggsave(paste0("output/nci-",save.Date,".pdf"), plot=p, units="in", width=11, height=5)
ggsave(paste0("output/nci-",save.Date,".png"), plot=p, units="in", width=11, height=5)
}
The heatmaps for slide 9 focus on the TCGA RNAseq data. The difference here is we focus on 4 genera.
p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+
plot_layout(
nrow=1, widths = c(0.5, 0.4, 0.15, 1),
guides="collect"
) +
plot_annotation(
title="TCGA RNAseq Data showing average relative abundance of genera by individual"
)
p
if(save.plots == T){
ggsave(paste0("output/tcga-rna-",save.Date,".pdf"), plot=p, units="in", width=11, height=5)
ggsave(paste0("output/tcga-rna-",save.Date,".png"), plot=p, units="in", width=11, height=5)
}
The heatmaps for slide76 focus on the TCGA WGS data.
analysis.dat <- dat.wgs.s # insert dataset to be used in analysis
avgRelAbundCutoff <- 0.05 # minimum average relative abundance for OTUs
otu.dat <- analysis.dat %>% filter(sample_type != "0") %>%
dplyr::group_by(otu2) %>%
dplyr::summarise(AverageRelativeAbundance=mean(Abundance, na.rm=T))
kable(otu.dat[,c(2,1)], format="html", digits=3) %>%
kable_styling(full_width = T)%>%
scroll_box(width="100%", height="100%")
AverageRelativeAbundance | otu2 |
---|---|
0.022 | Campylobacter concisus |
0.027 | Fusobacterium nucleatum |
0.057 | Prevotella melaninogenica |
0.006 | Streptococcus mitis |
0.002 | Streptococcus oralis |
0.008 | Streptococcus parasanguinis |
0.014 | Streptococcus pneumoniae |
0.001 | Streptococcus salivarius |
0.001 | Streptococcus sanguinis |
plot.dat <- analysis.dat %>% filter(sample_type != "0") %>%
dplyr::group_by(OTU) %>%
dplyr::mutate(aveAbund=mean(Abundance, na.rm=T)) %>%
dplyr::ungroup() %>%
dplyr::mutate(ID = as.factor(Patient_ID),
Abundance = ifelse(is.na(Abundance), 0, Abundance)) %>%
dplyr::select(sample_type, Phylum, Genus, ID, Abundance, aveAbund)
# widen plot.dat for dendro
dat.wide <- plot.dat %>%
dplyr::mutate(
ID = paste0(ID, "_",sample_type)
) %>%
dplyr::select(ID, Genus, Abundance) %>%
dplyr::group_by(ID, Genus) %>%
dplyr::summarise(
Abundance = mean(Abundance)
) %>%
tidyr::pivot_wider(
id_cols = Genus,
names_from = ID,
values_from = Abundance,
values_fill = 0
)
rn <- dat.wide$Genus
mat <- as.matrix(dat.wide[,-1])
rownames(mat) <- rn
sample_names <- colnames(mat)
# Obtain the dendrogram
dend <- as.dendrogram(hclust(dist(mat)))
dend_data <- dendro_data(dend)
# Setup the data, so that the layout is inverted (this is more
# "clear" than simply using coord_flip())
segment_data <- with(
segment(dend_data),
data.frame(x = y, y = x, xend = yend, yend = xend))
# Use the dendrogram label data to position the gene labels
gene_pos_table <- with(
dend_data$labels,
data.frame(y_center = x, gene = as.character(label), height = 1))
# Table to position the samples
sample_pos_table <- data.frame(sample = sample_names) %>%
dplyr::mutate(x_center = (1:n()),
width = 1)
# Neglecting the gap parameters
heatmap_data <- mat %>%
reshape2::melt(value.name = "expr", varnames = c("gene", "sample")) %>%
left_join(gene_pos_table) %>%
left_join(sample_pos_table)
# extract and rejoin sample IDs and sample_type names for plotting
# first for the heatmap data.frame
A <- str_split(heatmap_data$sample, "_")
heatmap_data$ID <- heatmap_data$sample_type <- "0"
for(i in 1:nrow(heatmap_data)){
heatmap_data$ID[i] <- A[[i]][1]
heatmap_data$sample_type[i] <- A[[i]][2]
}
# second for the sample position dataframe (dendo)
A <- str_split(sample_pos_table$sample, "_")
sample_pos_table$ID <- sample_pos_table$sample_type <- "0"
for(i in 1:nrow(sample_pos_table)){
sample_pos_table$ID[i] <- A[[i]][1]
sample_pos_table$sample_type[i] <- A[[i]][2]
}
# Limits for the vertical axes
gene_axis_limits <- with(
gene_pos_table,
c(min(y_center - 0.5 * height), max(y_center + 0.5 * height))
) +
0.1 * c(-1, 1) # extra spacing: 0.1
## Build Heatmap Pieces
# ESCA w/ Barrets
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))
plt_hmap1 <- ggplot(hmd,
aes(x = x_center, y = y_center, fill = expr,
height = height, width = width)) +
geom_tile() +
#facet_wrap(.~sample_type)+
scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
scale_x_continuous(breaks = spd$x_center,
labels = spd$ID,
expand = c(0, 0)) +
# For the y axis, alternatively set the labels as: gene_position_table$gene
scale_y_continuous(breaks = gene_pos_table[, "y_center"],
labels = rep("", nrow(gene_pos_table)),
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "ESCA w/ Barretts", y = NULL) +
theme_classic() +
theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90),
axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
# margin: top, right, bottom, and left
#axis.ticks.y = element_blank(),
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"),
panel.grid = element_blank(),
legend.position = "none")
# Part 2: "ESCA-adjacent tissue w/ Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA-adjacent tissue w/ Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA-adjacent tissue w/ Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))
plt_hmap2 <- ggplot(hmd,
aes(x = x_center, y = y_center, fill = expr,
height = height, width = width)) +
geom_tile() +
#facet_wrap(.~sample_type)+
scale_fill_gradient2("expr",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.1, 0.10, 0.30, 0.50, 0.80)) +
scale_x_continuous(breaks = spd$x_center,
labels = spd$ID,
expand = c(0, 0)) +
# For the y axis, alternatively set the labels as: gene_position_table$gene
scale_y_continuous(breaks = gene_pos_table[, "y_center"],
labels = rep("", nrow(gene_pos_table)),
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "ESCA-adj. w/ Barretts", y = NULL) +
theme_classic() +
theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.5,vjust=0.5, angle = 90),
axis.ticks.y = element_blank(),
axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
# margin: top, right, bottom, and left
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"),
panel.grid.minor = element_blank(),
legend.position = "none")
# Part 3: "ESCA tissues w/o Barretts History"
hmd <- filter(heatmap_data, sample_type == "ESCA tissues w/o Barretts History")
hmd$x_center <- as.numeric(as.factor(hmd$x_center))
spd <- filter(sample_pos_table, sample_type == "ESCA tissues w/o Barretts History")
spd$x_center <- as.numeric(as.factor(spd$x_center))
plt_hmap3 <- ggplot(hmd,
aes(x = x_center, y = y_center, fill = expr,
height = height, width = width)) +
geom_tile() +
#facet_wrap(.~sample_type)+
scale_fill_gradient2("Abundance",trans="sqrt", high=myColor[1], mid = myColor[50], low=myColor[100], midpoint = 0.40, breaks=c(0, 0.10, 0.30, 0.50, 0.80)) +
scale_x_continuous(breaks = spd$x_center,
labels = spd$ID,
expand = c(0, 0)) +
# For the y axis, alternatively set the labels as: gene_position_table$gene
scale_y_continuous(breaks = gene_pos_table[, "y_center"],
labels = rep("", nrow(gene_pos_table)),
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "ESCA w/o Barretts", y = NULL) +
theme_classic() +
theme(axis.text.x = element_blank(), #element_text(size = rel(1), hjust = 0.75, vjust=0.5, angle = 90),
axis.ticks.y = element_blank(),
axis.title.x = element_text(angle=90, hjust=0.75, vjust=0.5),
# margin: top, right, bottom, and left
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"),
panel.grid.minor = element_blank())
# Dendrogram plot
plt_dendr <- ggplot(segment_data) +
geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) +
scale_x_reverse(expand = c(0, 0.5)) +
scale_y_continuous(breaks = gene_pos_table$y_center,
labels = gene_pos_table$gene,
limits = gene_axis_limits,
expand = c(0, 0)) +
labs(x = "", y = "", colour = "", size = "") +
theme_classic() +
theme(panel.grid = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
plot.margin = unit(c(1, 0.01, 0.01, -0.7), "cm"))
prntRelAbund <- avgRelAbundCutoff*100
p <- plt_dendr+plt_hmap1+plt_hmap2+plt_hmap3+
plot_layout(
nrow=1, widths = c(0.5, 0.4, 0.15, 1),
guides="collect"
) +
plot_annotation(
title="TCGA WGS Data showing average relative abundance of genera by individual"
)
p
if(save.plots == T){
ggsave(paste0("output/slide-10-heatmap-",save.Date,".pdf"), plot=p, units="in", width=11, height=5)
ggsave(paste0("output/slide-10-heatmap-",save.Date,".png"), plot=p, units="in", width=11, height=5)
}
plot.dat <- dat.16s.s %>% filter(sample_type != "0") %>%
mutate(ID = as.factor(accession.number),
Genus = substr(Genus, 4, 1000),
Phylum = substr(Phylum, 4, 1000))%>%
dplyr::group_by(sample_type, OTU1)%>%
dplyr::summarise(
Abundance = mean(Abundance, na.rm=T)
)
p1 <- ggplot(plot.dat, aes(x=sample_type, y = Abundance, fill=OTU1)) +
geom_bar(stat="identity")+
labs(title="NCI 16s Data",
x = "Tissue Group",
y="Average Relative Abundance") +
theme_classic()+
theme(
axis.text.x = element_text(angle=30, vjust=0.95, hjust=0.9),
legend.title = element_blank()
)
p1
if(save.plots == T){
ggsave(paste0("output/slide-11-bar-chart-",save.Date,".pdf"), plot=p1, units="in", width=8, height=5)
ggsave(paste0("output/slide-11-bar-chart-",save.Date,".png"), plot=p1, units="in", width=8, height=5)
}
plot.dat <- dat.rna.s %>% filter(sample_type != "0", is.na(OTU1) == F)%>%
dplyr::group_by(sample_type, OTU1)%>%
dplyr::summarise(
Abundance = mean(Abundance, na.rm=T)
)
p1 <- ggplot(plot.dat, aes(x=sample_type, y = Abundance, fill=OTU1)) +
geom_bar(stat="identity")+
labs(title="TCGA RNAseq Data",
x = "Tissue Group",
y="Average Relative Abundance") +
theme_classic() +
theme(
axis.text.x = element_text(angle=30, vjust=0.95, hjust=0.9),
legend.title = element_blank()
)
p1
if(save.plots == T){
ggsave(paste0("output/slide-12-bar-chart-",save.Date,".pdf"), plot=p1, units="in", width=8, height=5)
ggsave(paste0("output/slide-12-bar-chart-",save.Date,".png"), plot=p1, units="in", width=8, height=5)
}
plot.dat <- dat.wgs.s %>% filter(sample_type != "0", is.na(OTU1) == F)%>%
dplyr::group_by(sample_type, OTU1)%>%
dplyr::summarise(
Abundance = mean(Abundance, na.rm=T)
)
p1 <- ggplot(plot.dat, aes(x=sample_type, y = Abundance, fill=OTU1)) +
geom_bar(stat="identity")+
labs(title="TCGA WGS Data",
x = "Tissue Group",
y="Average Relative Abundance") +
scale_y_continuous(trans="sqrt", breaks=c(0,0.01, 0.1, 0.2, 0.4, 0.6))+
theme_classic() +
theme(
axis.text.x = element_text(angle=30, vjust=0.95, hjust=0.9),
legend.title = element_blank()
)
p1
if(save.plots == T){
ggsave(paste0("output/slide-13-bar-chart-",save.Date,".pdf"), plot=p1, units="in", width=8, height=5)
ggsave(paste0("output/slide-13-bar-chart-",save.Date,".png"), plot=p1, units="in", width=8, height=5)
}
sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] cowplot_1.1.1 dendextend_1.16.0 ggdendro_0.1.23 reshape2_1.4.4
[5] car_3.1-0 carData_3.0-5 gvlma_1.0.0.3 patchwork_1.1.1
[9] viridis_0.6.2 viridisLite_0.4.0 gridExtra_2.3 xtable_1.8-4
[13] kableExtra_1.3.4 MASS_7.3-58.1 data.table_1.14.2 readxl_1.4.0
[17] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.9 purrr_0.3.4
[21] readr_2.1.2 tidyr_1.2.0 tibble_3.1.7 ggplot2_3.3.6
[25] tidyverse_1.3.2 lmerTest_3.1-3 lme4_1.1-30 Matrix_1.5-1
[29] vegan_2.6-2 lattice_0.20-45 permute_0.9-7 phyloseq_1.40.0
loaded via a namespace (and not attached):
[1] googledrive_2.0.0 minqa_1.2.4 colorspace_2.0-3
[4] ellipsis_0.3.2 rprojroot_2.0.3 XVector_0.36.0
[7] fs_1.5.2 rstudioapi_0.13 farver_2.1.1
[10] fansi_1.0.3 lubridate_1.8.0 xml2_1.3.3
[13] codetools_0.2-18 splines_4.2.2 cachem_1.0.6
[16] knitr_1.39 ade4_1.7-19 jsonlite_1.8.0
[19] workflowr_1.7.0 nloptr_2.0.3 broom_1.0.0
[22] cluster_2.1.4 dbplyr_2.2.1 BiocManager_1.30.18
[25] compiler_4.2.2 httr_1.4.3 backports_1.4.1
[28] assertthat_0.2.1 fastmap_1.1.0 gargle_1.2.0
[31] cli_3.4.1 later_1.3.0 htmltools_0.5.2
[34] tools_4.2.2 igraph_1.3.4 gtable_0.3.0
[37] glue_1.6.2 GenomeInfoDbData_1.2.8 Rcpp_1.0.8.3
[40] Biobase_2.56.0 cellranger_1.1.0 jquerylib_0.1.4
[43] vctrs_0.4.1 Biostrings_2.64.0 rhdf5filters_1.8.0
[46] multtest_2.52.0 svglite_2.1.0 ape_5.6-2
[49] nlme_3.1-160 iterators_1.0.14 xfun_0.31
[52] rvest_1.0.2 lifecycle_1.0.1 googlesheets4_1.0.0
[55] zlibbioc_1.42.0 scales_1.2.0 ragg_1.2.4
[58] hms_1.1.1 promises_1.2.0.1 parallel_4.2.2
[61] biomformat_1.24.0 rhdf5_2.40.0 yaml_2.3.5
[64] sass_0.4.2 stringi_1.7.6 highr_0.9
[67] S4Vectors_0.34.0 foreach_1.5.2 BiocGenerics_0.42.0
[70] boot_1.3-28 GenomeInfoDb_1.32.2 systemfonts_1.0.4
[73] rlang_1.0.6 pkgconfig_2.0.3 bitops_1.0-7
[76] evaluate_0.15 Rhdf5lib_1.18.2 labeling_0.4.2
[79] tidyselect_1.1.2 plyr_1.8.7 magrittr_2.0.3
[82] R6_2.5.1 IRanges_2.30.0 generics_0.1.3
[85] DBI_1.1.3 pillar_1.8.0 haven_2.5.0
[88] whisker_0.4 withr_2.5.0 mgcv_1.8-41
[91] abind_1.4-5 survival_3.4-0 RCurl_1.98-1.8
[94] modelr_0.1.8 crayon_1.5.1 utf8_1.2.2
[97] tzdb_0.3.0 rmarkdown_2.14 grid_4.2.2
[100] git2r_0.30.1 webshot_0.5.3 reprex_2.0.1
[103] digest_0.6.29 httpuv_1.6.5 numDeriv_2016.8-1.1
[106] textshaping_0.3.6 stats4_4.2.2 munsell_0.5.0
[109] bslib_0.4.0