Last updated: 2022-08-19

Checks: 6 passed, 1 warning

Knit directory: esoph-micro-cancer-workflow/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


The R Markdown file has unstaged changes. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20200916) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
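
As a minimal illustration of what the seed provides (the draws below are not part of the analysis; `sample()` simply stands in for any random step such as subsampling), resetting the same seed before a random call reproduces that call exactly:

```r
# Illustrative only: sample() stands in for any random step in an analysis.
set.seed(20200916)
first_draw <- sample(1:100, 5)

# Resetting the same seed replays the same random number stream.
set.seed(20200916)
second_draw <- sample(1:100, 5)

identical(first_draw, second_draw)
```

The seed only fixes the stream from the point where it is set, so inserting or removing a random step earlier in a script changes every draw that follows.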

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version ff2197f. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    data.zip
    Ignored:    data/
    Ignored:    output/Supplement Figure 2.zip

Unstaged changes:
    Modified:   analysis/supplemental_figure2.Rmd
    Modified:   output/supplemental_figure2C_NCI_campy.pdf
    Modified:   output/supplemental_figure2C_NCI_campy.png
    Modified:   output/supplemental_figure2C_NCI_combined.pdf
    Modified:   output/supplemental_figure2C_NCI_combined.png
    Modified:   output/supplemental_figure2C_NCI_fuso.pdf
    Modified:   output/supplemental_figure2C_NCI_fuso.png
    Modified:   output/supplemental_figure2C_NCI_prevo.pdf
    Modified:   output/supplemental_figure2C_NCI_prevo.png
    Modified:   output/supplemental_figure2C_NCI_strepto.pdf
    Modified:   output/supplemental_figure2C_NCI_strepto.png
    Modified:   output/supplemental_figure2C_tcga_rna_campy.pdf
    Modified:   output/supplemental_figure2C_tcga_rna_campy.png
    Modified:   output/supplemental_figure2C_tcga_rna_combined.pdf
    Modified:   output/supplemental_figure2C_tcga_rna_combined.png
    Modified:   output/supplemental_figure2C_tcga_rna_fuso.pdf
    Modified:   output/supplemental_figure2C_tcga_rna_fuso.png
    Modified:   output/supplemental_figure2C_tcga_rna_prevo.pdf
    Modified:   output/supplemental_figure2C_tcga_rna_prevo.png
    Modified:   output/supplemental_figure2C_tcga_rna_strepto.pdf
    Modified:   output/supplemental_figure2C_tcga_rna_strepto.png
    Modified:   output/supplemental_figure2C_tcga_wgs_campy.pdf
    Modified:   output/supplemental_figure2C_tcga_wgs_campy.png
    Modified:   output/supplemental_figure2C_tcga_wgs_combined.pdf
    Modified:   output/supplemental_figure2C_tcga_wgs_combined.png
    Modified:   output/supplemental_figure2C_tcga_wgs_fuso.pdf
    Modified:   output/supplemental_figure2C_tcga_wgs_fuso.png
    Modified:   output/supplemental_figure2C_tcga_wgs_prevo.pdf
    Modified:   output/supplemental_figure2C_tcga_wgs_prevo.png
    Modified:   output/supplemental_figure2C_tcga_wgs_strepto.pdf
    Modified:   output/supplemental_figure2C_tcga_wgs_strepto.png
    Modified:   output/supplemental_figure2D_NCI_campy.pdf
    Modified:   output/supplemental_figure2D_NCI_campy.png
    Modified:   output/supplemental_figure2D_NCI_combined.pdf
    Modified:   output/supplemental_figure2D_NCI_combined.png
    Modified:   output/supplemental_figure2D_NCI_fuso.pdf
    Modified:   output/supplemental_figure2D_NCI_fuso.png
    Modified:   output/supplemental_figure2D_NCI_prevo.pdf
    Modified:   output/supplemental_figure2D_NCI_prevo.png
    Modified:   output/supplemental_figure2D_NCI_strepto.pdf
    Modified:   output/supplemental_figure2D_NCI_strepto.png
    Modified:   output/supplemental_figure2D_tcga_rna_campy.pdf
    Modified:   output/supplemental_figure2D_tcga_rna_campy.png
    Modified:   output/supplemental_figure2D_tcga_rna_combined.pdf
    Modified:   output/supplemental_figure2D_tcga_rna_combined.png
    Modified:   output/supplemental_figure2D_tcga_rna_fuso.pdf
    Modified:   output/supplemental_figure2D_tcga_rna_fuso.png
    Modified:   output/supplemental_figure2D_tcga_rna_prevo.pdf
    Modified:   output/supplemental_figure2D_tcga_rna_prevo.png
    Modified:   output/supplemental_figure2D_tcga_rna_strepto.pdf
    Modified:   output/supplemental_figure2D_tcga_rna_strepto.png
    Modified:   output/supplemental_figure2D_tcga_wgs_campy.pdf
    Modified:   output/supplemental_figure2D_tcga_wgs_campy.png
    Modified:   output/supplemental_figure2D_tcga_wgs_combined.pdf
    Modified:   output/supplemental_figure2D_tcga_wgs_combined.png
    Modified:   output/supplemental_figure2D_tcga_wgs_fuso.pdf
    Modified:   output/supplemental_figure2D_tcga_wgs_fuso.png
    Modified:   output/supplemental_figure2D_tcga_wgs_prevo.pdf
    Modified:   output/supplemental_figure2D_tcga_wgs_prevo.png
    Modified:   output/supplemental_figure2D_tcga_wgs_strepto.pdf
    Modified:   output/supplemental_figure2D_tcga_wgs_strepto.png
    Modified:   output/supplemental_figure2E_NCI_campy.pdf
    Modified:   output/supplemental_figure2E_NCI_campy.png
    Modified:   output/supplemental_figure2E_NCI_combined.pdf
    Modified:   output/supplemental_figure2E_NCI_combined.png
    Modified:   output/supplemental_figure2E_NCI_fuso.pdf
    Modified:   output/supplemental_figure2E_NCI_fuso.png
    Modified:   output/supplemental_figure2E_NCI_prevo.pdf
    Modified:   output/supplemental_figure2E_NCI_prevo.png
    Modified:   output/supplemental_figure2E_NCI_strepto.pdf
    Modified:   output/supplemental_figure2E_NCI_strepto.png
    Modified:   output/supplemental_figure2E_tcga_rna_campy.pdf
    Modified:   output/supplemental_figure2E_tcga_rna_campy.png
    Modified:   output/supplemental_figure2E_tcga_rna_combined.pdf
    Modified:   output/supplemental_figure2E_tcga_rna_combined.png
    Modified:   output/supplemental_figure2E_tcga_rna_fuso.pdf
    Modified:   output/supplemental_figure2E_tcga_rna_fuso.png
    Modified:   output/supplemental_figure2E_tcga_rna_prevo.pdf
    Modified:   output/supplemental_figure2E_tcga_rna_prevo.png
    Modified:   output/supplemental_figure2E_tcga_rna_strepto.pdf
    Modified:   output/supplemental_figure2E_tcga_rna_strepto.png
    Modified:   output/supplemental_figure2E_tcga_wgs_campy.pdf
    Modified:   output/supplemental_figure2E_tcga_wgs_campy.png
    Modified:   output/supplemental_figure2E_tcga_wgs_combined.pdf
    Modified:   output/supplemental_figure2E_tcga_wgs_combined.png
    Modified:   output/supplemental_figure2E_tcga_wgs_fuso.pdf
    Modified:   output/supplemental_figure2E_tcga_wgs_fuso.png
    Modified:   output/supplemental_figure2E_tcga_wgs_prevo.pdf
    Modified:   output/supplemental_figure2E_tcga_wgs_prevo.png
    Modified:   output/supplemental_figure2E_tcga_wgs_strepto.pdf
    Modified:   output/supplemental_figure2E_tcga_wgs_strepto.png
    Modified:   output/supplemental_figure2F_NCI_campy.pdf
    Modified:   output/supplemental_figure2F_NCI_campy.png
    Modified:   output/supplemental_figure2F_NCI_combined.pdf
    Modified:   output/supplemental_figure2F_NCI_combined.png
    Modified:   output/supplemental_figure2F_NCI_fuso.pdf
    Modified:   output/supplemental_figure2F_NCI_fuso.png
    Modified:   output/supplemental_figure2F_NCI_prevo.pdf
    Modified:   output/supplemental_figure2F_NCI_prevo.png
    Modified:   output/supplemental_figure2F_NCI_strepto.pdf
    Modified:   output/supplemental_figure2F_NCI_strepto.png
    Modified:   output/supplemental_figure2F_tcga_rna_campy.pdf
    Modified:   output/supplemental_figure2F_tcga_rna_campy.png
    Modified:   output/supplemental_figure2F_tcga_rna_combined.pdf
    Modified:   output/supplemental_figure2F_tcga_rna_combined.png
    Modified:   output/supplemental_figure2F_tcga_rna_fuso.pdf
    Modified:   output/supplemental_figure2F_tcga_rna_fuso.png
    Modified:   output/supplemental_figure2F_tcga_rna_prevo.pdf
    Modified:   output/supplemental_figure2F_tcga_rna_prevo.png
    Modified:   output/supplemental_figure2F_tcga_rna_strepto.pdf
    Modified:   output/supplemental_figure2F_tcga_rna_strepto.png
    Modified:   output/supplemental_figure2F_tcga_wgs_campy.pdf
    Modified:   output/supplemental_figure2F_tcga_wgs_campy.png
    Modified:   output/supplemental_figure2F_tcga_wgs_combined.pdf
    Modified:   output/supplemental_figure2F_tcga_wgs_combined.png
    Modified:   output/supplemental_figure2F_tcga_wgs_fuso.pdf
    Modified:   output/supplemental_figure2F_tcga_wgs_fuso.png
    Modified:   output/supplemental_figure2F_tcga_wgs_prevo.pdf
    Modified:   output/supplemental_figure2F_tcga_wgs_prevo.png
    Modified:   output/supplemental_figure2F_tcga_wgs_strepto.pdf
    Modified:   output/supplemental_figure2F_tcga_wgs_strepto.png

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/supplemental_figure2.Rmd) and HTML (docs/supplemental_figure2.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File  Version  Author        Date        Message
Rmd   ff2197f  noah-padgett  2022-08-15  updated figure dim for workflow
html  ff2197f  noah-padgett  2022-08-15  updated figure dim for workflow
html  72212e0  noah-padgett  2022-08-15  Update website to include supp fig 2
Rmd   cb1cd82  noah-padgett  2022-08-15  Updated sup figure 2 parts

Histology (+/- Barretts)

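# fourth-root transform used for the y axis; negative inputs are clamped to 0
root <- function(x){
  x <- ifelse(x < 0, 0, x)
  x^0.25
}
# inverse of root(): raise to the fourth power
invroot <- function(x){
  x^4
}
# figure dimensions in inches: width, height
DIM <- c(6, 4)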
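
A quick check of how the two helpers behave may be useful here. The sketch below redefines them so that it runs on its own; the input values are made up for illustration. For non-negative inputs invroot() undoes root(), while negative inputs are clamped to 0 before the root is taken, so they do not round-trip.

```r
# Redefined here so the snippet is self-contained (same logic as above).
root <- function(x) {
  x <- ifelse(x < 0, 0, x)  # clamp negatives to 0
  x^0.25                    # fourth root
}
invroot <- function(x) {
  x^4
}

# Hypothetical relative abundances (%), chosen only for illustration.
abund <- c(0, 0.001, 0.1, 1, 50, 100)

# The round trip recovers the input up to floating-point error.
all.equal(invroot(root(abund)), abund)

# A negative input is mapped to 0 and cannot be recovered.
invroot(root(-2))
```

Passing the pair to scales::trans_new(), as the plots below do, is what lets the axis show very small and very large abundances on the same scale.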

# merge datasets by subsetting to specific variables then merging
analysis.dat <- dat.16s.s %>% 
  dplyr::mutate(ID = as.factor(accession.number),
                Barretts = ifelse(`Barretts.`=="Y",1,0)) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, Barretts)

dat <- dat.rna.s %>% 
  dplyr::mutate(Barretts = ifelse(Barrett.s.Esophagus.Reported=="Yes",1,0)) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, Barretts)
analysis.dat <- full_join(analysis.dat, dat)
Joining, by = c("OTU", "sample_type", "tumor", "Abundance", "ID", "source",
"Barretts")
dat <- dat.wgs.s %>% 
  dplyr::mutate(Barretts = ifelse(Barrett.s.Esophagus.Reported=="Yes",1,0)) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, Barretts)

analysis.dat <- full_join(analysis.dat, dat) %>%
  mutate(
    pres = ifelse(Abundance > 0, 1, 0),
    Abund = Abundance*100,
    Tumor = ifelse(tumor==1, "Tumor", "No Tumor"),
    Barretts = ifelse(Barretts == 1, "Yes", "No")
  )
Joining, by = c("OTU", "sample_type", "tumor", "Abundance", "ID", "source",
"Barretts")
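
One detail of the recoding above is easy to miss: ifelse() returns NA wherever the comparison is NA, so samples with no recorded Barrett's status stay missing rather than being coded "No". This is presumably why the later chunks filter on !is.na(Barretts). A small sketch with made-up values (the vector `reported` is hypothetical):

```r
# Hypothetical status values; NA marks a sample with no recorded status.
reported <- c("Yes", "No", NA, "Yes")

# Step 1: recode to 0/1, as in the first mutate() calls above.
barretts_num <- ifelse(reported == "Yes", 1, 0)

# Step 2: recode to labels, as in the final mutate().
barretts_lab <- ifelse(barretts_num == 1, "Yes", "No")

barretts_num  # 1 0 NA 1
barretts_lab  # "Yes" "No" NA "Yes"
```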
TITLE_P1 <- c("NCI 16s Data", "TCGA RNAseq Data", "TCGA WGS Data")
TITLE_P2 <- c("Between Barretts Status", "Between Gender", "Across Races", "Across Stages")
SUBTITLE <-c("Combined across bacteria", "Fusobacterium nucleatum", "Prevotella melaninogenica", "Campylobacter concisus", 'Streptococcus sanguinis')

test_results <- expand.grid(TITLE_P1, SUBTITLE, TITLE_P2)
colnames(test_results) <- c("Data", "Bacteria", "Outcome")
test_results$est <- NA
test_results$pvalue <- NA
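
Because the results are written into test_results by a running row index i, the row order produced by expand.grid() matters. expand.grid() varies its first argument fastest, so rows 1 to 3 are the three data sources for the first bacteria label, rows 4 to 6 are the three sources for the second label, and so on, which matches the order in which the tests are run below. A sketch that rebuilds the same grid under the name `grid` (a name used only here) to confirm this:

```r
# Same label vectors as above, repeated so the snippet runs on its own.
TITLE_P1 <- c("NCI 16s Data", "TCGA RNAseq Data", "TCGA WGS Data")
TITLE_P2 <- c("Between Barretts Status", "Between Gender",
              "Across Races", "Across Stages")
SUBTITLE <- c("Combined across bacteria", "Fusobacterium nucleatum",
              "Prevotella melaninogenica", "Campylobacter concisus",
              "Streptococcus sanguinis")

grid <- expand.grid(TITLE_P1, SUBTITLE, TITLE_P2)
colnames(grid) <- c("Data", "Bacteria", "Outcome")

nrow(grid)     # 3 data sources x 5 bacteria labels x 4 outcomes = 60
head(grid, 6)  # the first argument (Data) cycles fastest
```

If a test were ever skipped or reordered, the index i would drift out of step with these labels, so indexing rows by their Data, Bacteria and Outcome values would be a more defensive alternative.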

i <- 1
d <- analysis.dat %>%
  filter(!is.na(Barretts), source=="16s")
m1<-wilcox.test(Abund ~ Barretts, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Barretts
W = 49993, p-value = 0.7934
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -8.872113e-05  8.362287e-05
sample estimates:
difference in location 
          9.397832e-06 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value
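
For readers unfamiliar with the object returned by wilcox.test(), the sketch below runs a similar call on simulated data (the data frame `toy` and its values are invented for illustration and are unrelated to the study). With conf.int=TRUE the result carries an estimate of the location shift in $estimate alongside $p.value; these are the two values stored in columns 4 and 5 of test_results.

```r
# Simulated data for illustration only; not the study data.
set.seed(1)
toy <- data.frame(
  Abund    = c(rexp(30, rate = 1), rexp(30, rate = 0.5)),
  Barretts = rep(c("No", "Yes"), each = 30)
)

# Unpaired rank sum test with a normal approximation and a confidence interval
# (paired and na.rm are left at their defaults in this sketch).
m <- wilcox.test(Abund ~ Barretts, data = toy,
                 exact = FALSE, conf.int = TRUE)

m$estimate  # estimated location shift, first group level minus second
m$p.value   # two-sided p-value
m$conf.int  # 95 percent confidence interval for the shift
```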


p <- analysis.dat %>%
  filter(!is.na(Barretts), source=="16s")%>%
  ggplot(aes(x=Barretts, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Barretts", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[1]," Bacteria Relative Abundance ",TITLE_P2[1]),
         subtitle=SUBTITLE[1])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 120 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2C_NCI_combined.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 137 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2C_NCI_combined.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 127 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Barretts), source=="rna")
m1<-wilcox.test(Abund ~ Barretts, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Barretts
W = 14528, p-value = 0.04244
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 3.138475e-05 6.413702e-04
sample estimates:
difference in location 
          4.994654e-05 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(source=="rna", !is.na(Barretts))%>%
  ggplot(aes(x=Barretts, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Barretts", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[2]," Bacteria Relative Abundance ",TITLE_P2[1]),
         subtitle=SUBTITLE[1])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 749 rows containing non-finite values (stat_ydensity).
Warning: Removed 828 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2C_tcga_rna_combined.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 749 rows containing non-finite values (stat_ydensity).
Warning: Removed 822 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2C_tcga_rna_combined.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 749 rows containing non-finite values (stat_ydensity).
Warning: Removed 818 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Barretts), source=="wgs")
m1<-wilcox.test(Abund ~ Barretts, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Barretts
W = 21643, p-value = 0.03722
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 7.669214e-05 6.163255e-06
sample estimates:
difference in location 
          1.499786e-05 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(source=="wgs", !is.na(Barretts))%>%
  ggplot(aes(x=Barretts, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Barretts", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[3]," Bacteria Relative Abundance ",TITLE_P2[1]),
         subtitle=SUBTITLE[1])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 112 rows containing non-finite values (stat_ydensity).
Warning: Removed 321 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2C_tcga_wgs_combined.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 112 rows containing non-finite values (stat_ydensity).
Warning: Removed 304 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2C_tcga_wgs_combined.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 112 rows containing non-finite values (stat_ydensity).
Warning: Removed 337 rows containing missing values (geom_point).

Subset by Bacterium

# merge datasets by subsetting to specific variables then merging
analysis.dat <- dat.16s.s %>% 
  dplyr::mutate(ID = as.factor(accession.number),
                Barretts = ifelse(`Barretts.`=="Y",1,0)) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, Barretts)

dat <- dat.rna.s %>% 
  dplyr::mutate(Barretts = ifelse(Barrett.s.Esophagus.Reported=="Yes",1,0)) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, Barretts)
analysis.dat <- full_join(analysis.dat, dat)
Joining, by = c("OTU", "sample_type", "tumor", "Abundance", "ID", "source",
"Barretts")
dat <- dat.wgs.s %>% 
  dplyr::mutate(Barretts = ifelse(Barrett.s.Esophagus.Reported=="Yes",1,0)) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, Barretts)

analysis.dat <- full_join(analysis.dat, dat) %>%
  mutate(
    pres = ifelse(Abundance > 0, 1, 0),
    Abund = Abundance*100,
    Tumor = ifelse(tumor==1, "Tumor", "No Tumor"),
    Barretts = ifelse(Barretts == 1, "Yes", "No")
  )
Joining, by = c("OTU", "sample_type", "tumor", "Abundance", "ID", "source",
"Barretts")
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Barretts), source=="16s", OTU == "Fusobacterium nucleatum")
m1<-wilcox.test(Abund ~ Barretts, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Barretts
W = 3070.5, p-value = 0.9471
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -8.959180e-06  6.090329e-05
sample estimates:
difference in location 
         -2.198988e-05 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value

p <- analysis.dat %>%
  filter(!is.na(Barretts), source=="16s", OTU == "Fusobacterium nucleatum")%>%
  ggplot(aes(x=Barretts, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Barretts", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[1]," Bacteria Relative Abundance ",TITLE_P2[1]),
         subtitle=SUBTITLE[2])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 44 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2C_NCI_fuso.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 40 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2C_NCI_fuso.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 45 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Barretts), source=="rna", OTU == "Fusobacterium nucleatum")
m1<-wilcox.test(Abund ~ Barretts, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Barretts
W = 305, p-value = 0.3695
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -0.004713574  0.010418841
sample estimates:
difference in location 
           0.001023278 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value

p <- analysis.dat %>%
  filter(!is.na(Barretts), source=="rna", OTU == "Fusobacterium nucleatum")%>%
  ggplot(aes(x=Barretts, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Barretts", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[2]," Bacteria Relative Abundance ",TITLE_P2[1]),
         subtitle=SUBTITLE[2])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 112 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2C_tcga_rna_fuso.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 109 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2C_tcga_rna_fuso.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 109 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Barretts), source=="wgs", OTU == "Fusobacterium nucleatum")
m1<-wilcox.test(Abund ~ Barretts, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Barretts
W = 436, p-value = 0.5109
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -7.669283e-05  1.113313e-01
sample estimates:
difference in location 
          4.479859e-05 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(!is.na(Barretts), source=="wgs", OTU == "Fusobacterium nucleatum")%>%
  ggplot(aes(x=Barretts, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Barretts", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[3]," Bacteria Relative Abundance ",TITLE_P2[1]),
         subtitle=SUBTITLE[2])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 42 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2C_tcga_wgs_fuso.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 30 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2C_tcga_wgs_fuso.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 36 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Barretts), source=="16s", OTU == "Prevotella melaninogenica" | OTU =="Prevotella spp.")
m1<-wilcox.test(Abund ~ Barretts, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Barretts
W = 3094, p-value = 0.9859
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -0.4039867  0.3999948
sample estimates:
difference in location 
          9.583038e-05 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(!is.na(Barretts), source=="16s", OTU == "Prevotella melaninogenica" | OTU =="Prevotella spp.")%>%
  ggplot(aes(x=Barretts, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Barretts", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[1]," Bacteria Relative Abundance ",TITLE_P2[1]),
         subtitle=SUBTITLE[3])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 22 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2C_NCI_prevo.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2C_NCI_prevo.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 25 rows containing missing values (geom_point).
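
The 16s Prevotella filter above mixes comma-separated conditions with an `|`. This works because filter() treats each comma-separated argument as a separate condition and combines them with AND, so the `|` only applies within the third argument. The same selection can be written with %in%, which some readers find easier to scan. The sketch below checks the equivalence in base R on a made-up data frame (`toy` and its values are hypothetical):

```r
# Hypothetical rows for illustration only.
toy <- data.frame(
  OTU    = c("Prevotella melaninogenica", "Prevotella spp.",
             "Fusobacterium nucleatum", "Prevotella spp."),
  source = c("16s", "16s", "16s", "rna"),
  stringsAsFactors = FALSE
)

# Condition as written above: each filter() argument is ANDed with the others.
keep_or <- (toy$source == "16s") &
  (toy$OTU == "Prevotella melaninogenica" | toy$OTU == "Prevotella spp.")

# Equivalent condition using %in%.
keep_in <- (toy$source == "16s") &
  (toy$OTU %in% c("Prevotella melaninogenica", "Prevotella spp."))

identical(keep_or, keep_in)
toy[keep_or, ]
```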
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Barretts), source=="rna", OTU == "Prevotella melaninogenica")
m1<-wilcox.test(Abund ~ Barretts, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Barretts
W = 334, p-value = 0.15
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -0.0006693769  0.0297048288
sample estimates:
difference in location 
           0.003703834 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(!is.na(Barretts), source=="rna", OTU == "Prevotella melaninogenica")%>%
  ggplot(aes(x=Barretts, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Barretts", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[2]," Bacteria Relative Abundance ",TITLE_P2[1]),
         subtitle=SUBTITLE[3])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 109 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2C_tcga_rna_prevo.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 111 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2C_tcga_rna_prevo.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 109 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Barretts), source=="wgs", OTU == "Prevotella melaninogenica")
m1<-wilcox.test(Abund ~ Barretts, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Barretts
W = 511, p-value = 0.1088
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -4.578309e-05  5.374254e-01
sample estimates:
difference in location 
            0.03632853 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(!is.na(Barretts), source=="wgs", OTU == "Prevotella melaninogenica")%>%
  ggplot(aes(x=Barretts, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Barretts", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[3]," Bacteria Relative Abundance ",TITLE_P2[1]),
         subtitle=SUBTITLE[3])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 31 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2C_tcga_wgs_prevo.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 33 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2C_tcga_wgs_prevo.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 33 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Barretts), source=="16s", OTU == "Campylobacter concisus")
m1<-wilcox.test(Abund ~ Barretts, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Barretts
W = 3002, p-value = 0.6722
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -3.984863e-05  3.951250e-06
sample estimates:
difference in location 
         -1.818304e-05 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(!is.na(Barretts), source=="16s", OTU == "Campylobacter concisus")%>%
  ggplot(aes(x=Barretts, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Barretts", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[1]," Bacteria Relative Abundance ",TITLE_P2[1]),
         subtitle=SUBTITLE[4])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 70 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2C_NCI_campy.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 67 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2C_NCI_campy.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 65 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Barretts), source=="rna", OTU == "Campylobacter concisus")
m1<-wilcox.test(Abund ~ Barretts, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Barretts
W = 324, p-value = 0.1411
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -1.492392e-05  4.003460e-04
sample estimates:
difference in location 
          2.641329e-06 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(!is.na(Barretts), source=="rna", OTU == "Campylobacter concisus")%>%
  ggplot(aes(x=Barretts, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Barretts", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[2]," Bacteria Relative Abundance ",TITLE_P2[1]),
         subtitle=SUBTITLE[4])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 134 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2C_tcga_rna_campy.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 132 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2C_tcga_rna_campy.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 125 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Barretts), source=="wgs", OTU == "Campylobacter concisus")
m1<-wilcox.test(Abund ~ Barretts, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Barretts
W = 426, p-value = 0.5648
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -1.673192e-05  3.794818e-03
sample estimates:
difference in location 
          2.825405e-05 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(!is.na(Barretts), source=="wgs", OTU == "Campylobacter concisus")%>%
  ggplot(aes(x=Barretts, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Barretts", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[3]," Bacteria Relative Abundance ",TITLE_P2[1]),
         subtitle=SUBTITLE[4])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 52 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2C_tcga_wgs_campy.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 42 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2C_tcga_wgs_campy.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 49 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Barretts), source=="16s", OTU == "Streptococcus sanguinis")
m1<-wilcox.test(Abund ~ Barretts, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Barretts
W = 3433.5, p-value = 0.2285
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -2.000018 12.347244
sample estimates:
difference in location 
              3.399997 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(!is.na(Barretts), source=="16s", OTU == "Streptococcus sanguinis")%>%
  ggplot(aes(x=Barretts, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Barretts", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[1]," Bacteria Relative Abundance ",TITLE_P2[1]),
         subtitle=SUBTITLE[5])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 3 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2C_NCI_strepto.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 4 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2C_NCI_strepto.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 6 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Barretts), source=="rna", OTU == "Streptococcus sanguinis")
m1<-wilcox.test(Abund ~ Barretts, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Barretts
W = 283, p-value = 0.6267
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -0.003153229  0.005443557
sample estimates:
difference in location 
           0.001077408 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(!is.na(Barretts), source=="rna", OTU == "Streptococcus sanguinis")%>%
  ggplot(aes(x=Barretts, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Barretts", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[2]," Bacteria Relative Abundance ",TITLE_P2[1]),
         subtitle=SUBTITLE[5])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 111 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2C_tcga_rna_strepto.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 110 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2C_tcga_rna_strepto.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 111 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Barretts), source=="wgs", OTU == "Streptococcus sanguinis")
m1<-wilcox.test(Abund ~ Barretts, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Barretts
W = 378.5, p-value = 0.9064
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -8.037787e-04  6.937269e-06
sample estimates:
difference in location 
         -1.328278e-05 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value

p <- analysis.dat %>%
  filter(!is.na(Barretts), source=="wgs", OTU == "Streptococcus sanguinis")%>%
  ggplot(aes(x=Barretts, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Barretts", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[3]," Bacteria Relative Abundance ",TITLE_P2[1]),
         subtitle=SUBTITLE[5])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 45 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2C_tcga_wgs_strepto.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 41 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2C_tcga_wgs_strepto.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 45 rows containing missing values (geom_point).
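
The test-and-store step above is repeated for every source and bacterium. As a rough sketch only, it could be wrapped in a small helper. `run_wilcox`, the `estimate` column name, and the toy data are assumptions for illustration and are not part of the original analysis. The ignored `na.rm` argument is left out.

```r
# Hypothetical helper (not in the original analysis): run the rank-sum test
# for one filtered subset and return the two values stored in test_results.
run_wilcox <- function(d, group_col, value_col = "Abund") {
  # Build the formula, e.g. Abund ~ Barretts, from column names
  f <- stats::reformulate(group_col, response = value_col)
  m <- stats::wilcox.test(f, data = d, paired = FALSE, exact = FALSE,
                          conf.int = TRUE)
  data.frame(estimate = unname(m$estimate), pvalue = m$p.value)
}

# Toy data standing in for one filtered subset of analysis.dat
set.seed(1)
toy <- data.frame(
  Barretts = rep(c("No", "Yes"), each = 20),
  Abund    = c(rexp(20, rate = 1), rexp(20, rate = 0.5))
)
res <- run_wilcox(toy, "Barretts")
res
```

The returned one-row data frame could then be written into the matching row of the results table, so each call replaces the two separate assignments.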

Sex

# Fourth-root transform for the y-axis; negative values are clamped to 0
root<-function(x){
  x <- ifelse(x < 0, 0, x)
  x^(0.25)
}
# Inverse of the fourth-root transform
invroot<-function(x){
  x^(4)
}
DIM <- c(6, 4)
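
As a quick sanity check of the axis transform, the sketch below redefines the two functions so it runs on its own and confirms that `invroot(root(x))` recovers the axis breaks used in the plots. The `breaks` and `round_trip` names are illustrative only.

```r
# Same definitions as in the analysis, repeated so the check is self-contained
root <- function(x) {
  x <- ifelse(x < 0, 0, x)  # clamp negatives to 0 before taking the root
  x^0.25
}
invroot <- function(x) x^4

# Axis breaks used in the plots
breaks <- c(0, 0.001, 0.01, 0.1, 1, 10, 50, 100)

# The inverse should undo the transform for non-negative input
round_trip <- invroot(root(breaks))
all.equal(round_trip, breaks)

# Negative input is clamped, so it maps to 0 rather than NaN
root(-5)
```

Because of the clamp, the pair is only a true inverse on non-negative values, which covers relative abundances expressed as percentages.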

# merge datasets by subsetting to specific variables then merging
analysis.dat <- dat.16s.s %>% 
  dplyr::mutate(ID = as.factor(accession.number),
                Gender = ifelse(gender=="M","Male","Female")) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, Gender)

dat <- dat.rna.s %>% 
  dplyr::mutate(Gender = ifelse(Gender=="male","Male","Female")) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, Gender)
analysis.dat <- full_join(analysis.dat, dat)
Joining, by = c("OTU", "sample_type", "tumor", "Abundance", "ID", "source",
"Gender")
dat <- dat.wgs.s %>% 
  dplyr::mutate(Gender = ifelse(Gender=="male","Male","Female")) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, Gender)

analysis.dat <- full_join(analysis.dat, dat) %>%
  mutate(
    pres = ifelse(Abundance > 0, 1, 0),
    Abund = Abundance*100,
    Tumor = ifelse(tumor==1, "Tumor", "No Tumor")
  )
Joining, by = c("OTU", "sample_type", "tumor", "Abundance", "ID", "source",
"Gender")
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Gender), source=="16s")
m1<-wilcox.test(Abund ~ Gender, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Gender
W = 31491, p-value = 0.6573
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -3.648190e-05  1.075892e-05
sample estimates:
difference in location 
          4.919689e-05 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value

p <- analysis.dat %>%
  filter(!is.na(Gender), source=="16s")%>%
  ggplot(aes(x=Gender, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Gender", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[1]," Bacteria Relative Abundance ",TITLE_P2[2]),
         subtitle=SUBTITLE[1])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 122 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2D_NCI_combined.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 137 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2D_NCI_combined.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 129 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Gender), source=="rna")
m1<-wilcox.test(Abund ~ Gender, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Gender
W = 16184, p-value = 0.1944
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -8.866505e-06  6.907441e-04
sample estimates:
difference in location 
          4.812845e-05 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value

p <- analysis.dat %>%
  filter(source=="rna", !is.na(Gender))%>%
  ggplot(aes(x=Gender, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Gender", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[2]," Bacteria Relative Abundance ",TITLE_P2[2]),
         subtitle=SUBTITLE[1])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 749 rows containing non-finite values (stat_ydensity).
Warning: Removed 817 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2D_tcga_rna_combined.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 749 rows containing non-finite values (stat_ydensity).
Warning: Removed 840 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2D_tcga_rna_combined.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 749 rows containing non-finite values (stat_ydensity).
Warning: Removed 823 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Gender), source=="wgs")
m1<-wilcox.test(Abund ~ Gender, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Gender
W = 39218, p-value = 0.002236
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 1.707932e-06 5.086163e-05
sample estimates:
difference in location 
          2.790151e-05 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(source=="wgs", !is.na(Gender))%>%
  ggplot(aes(x=Gender, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Gender", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[3]," Bacteria Relative Abundance ",TITLE_P2[2]),
         subtitle=SUBTITLE[1])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 112 rows containing non-finite values (stat_ydensity).
Warning: Removed 322 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2D_tcga_wgs_combined.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 112 rows containing non-finite values (stat_ydensity).
Warning: Removed 305 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2D_tcga_wgs_combined.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 112 rows containing non-finite values (stat_ydensity).
Warning: Removed 314 rows containing missing values (geom_point).

Subset by Bacterium

# merge datasets by subsetting to specific variables then merging
analysis.dat <- dat.16s.s %>% 
  dplyr::mutate(ID = as.factor(accession.number),
                Gender = ifelse(gender=="M","Male","Female")) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, Gender)

dat <- dat.rna.s %>% 
  dplyr::mutate(Gender = ifelse(Gender=="male","Male","Female")) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, Gender)
analysis.dat <- full_join(analysis.dat, dat)
Joining, by = c("OTU", "sample_type", "tumor", "Abundance", "ID", "source",
"Gender")
dat <- dat.wgs.s %>% 
  dplyr::mutate(Gender = ifelse(Gender=="male","Male","Female")) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, Gender)

analysis.dat <- full_join(analysis.dat, dat) %>%
  mutate(
    pres = ifelse(Abundance > 0, 1, 0),
    Abund = Abundance*100,
    Tumor = ifelse(tumor==1, "Tumor", "No Tumor")
  )
Joining, by = c("OTU", "sample_type", "tumor", "Abundance", "ID", "source",
"Gender")
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Gender), source=="16s", OTU == "Fusobacterium nucleatum")
m1<-wilcox.test(Abund ~ Gender, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Gender
W = 1943, p-value = 0.9138
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -6.533984e-05  3.029709e-05
sample estimates:
difference in location 
          3.441841e-06 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(!is.na(Gender), source=="16s", OTU == "Fusobacterium nucleatum")%>%
  ggplot(aes(x=Gender, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Gender", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[1]," Bacteria Relative Abundance ",TITLE_P2[2]),
         subtitle=SUBTITLE[2])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 39 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2D_NCI_fuso.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 39 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2D_NCI_fuso.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 39 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Gender), source=="rna", OTU == "Fusobacterium nucleatum")
m1<-wilcox.test(Abund ~ Gender, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Gender
W = 377, p-value = 0.2027
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -0.003382977  0.014580208
sample estimates:
difference in location 
           0.002984843 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(!is.na(Gender), source=="rna", OTU == "Fusobacterium nucleatum")%>%
  ggplot(aes(x=Gender, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Gender", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[2]," Bacteria Relative Abundance ",TITLE_P2[2]),
         subtitle=SUBTITLE[2])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 112 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2D_tcga_rna_fuso.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 111 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2D_tcga_rna_fuso.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 110 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Gender), source=="wgs", OTU == "Fusobacterium nucleatum")
m1<-wilcox.test(Abund ~ Gender, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Gender
W = 927, p-value = 0.01802
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 2.963471e-05 4.854826e-01
sample estimates:
difference in location 
             0.1363069 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(!is.na(Gender), source=="wgs", OTU == "Fusobacterium nucleatum")%>%
  ggplot(aes(x=Gender, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Gender", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[3]," Bacteria Relative Abundance ",TITLE_P2[2]),
         subtitle=SUBTITLE[2])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 42 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2D_tcga_wgs_fuso.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 34 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2D_tcga_wgs_fuso.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 41 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Gender), source=="16s", OTU == "Prevotella melaninogenica" | OTU =="Prevotella spp.")
m1<-wilcox.test(Abund ~ Gender, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Gender
W = 2251.5, p-value = 0.1379
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -2.685771e-05  2.199962e+00
sample estimates:
difference in location 
              0.599959 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(!is.na(Gender), source=="16s", OTU == "Prevotella melaninogenica" | OTU =="Prevotella spp.")%>%
  ggplot(aes(x=Gender, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Gender", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[1]," Bacteria Relative Abundance ",TITLE_P2[2]),
         subtitle=SUBTITLE[3])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 26 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2D_NCI_prevo.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 22 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2D_NCI_prevo.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 19 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Gender), source=="rna", OTU == "Prevotella melaninogenica")
m1<-wilcox.test(Abund ~ Gender, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Gender
W = 297, p-value = 0.9314
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -0.01169006  0.01106448
sample estimates:
difference in location 
         -7.961707e-05 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(!is.na(Gender), source=="rna", OTU == "Prevotella melaninogenica")%>%
  ggplot(aes(x=Gender, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Gender", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[2]," Bacteria Relative Abundance ",TITLE_P2[2]),
         subtitle=SUBTITLE[3])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 111 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2D_tcga_rna_prevo.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 110 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2D_tcga_rna_prevo.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 110 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Gender), source=="wgs", OTU == "Prevotella melaninogenica")
m1<-wilcox.test(Abund ~ Gender, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Gender
W = 848, p-value = 0.1339
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -1.513898e-05  5.145027e+00
sample estimates:
difference in location 
             0.3345322 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(!is.na(Gender), source=="wgs", OTU == "Prevotella melaninogenica")%>%
  ggplot(aes(x=Gender, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Gender", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[3]," Bacteria Relative Abundance ",TITLE_P2[2]),
         subtitle=SUBTITLE[3])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 29 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2D_tcga_wgs_prevo.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 32 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2D_tcga_wgs_prevo.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 36 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Gender), source=="16s", OTU == "Campylobacter concisus")
m1<-wilcox.test(Abund ~ Gender, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Gender
W = 1896, p-value = 0.8834
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -4.967137e-05  2.342579e-05
sample estimates:
difference in location 
         -6.464722e-05 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(!is.na(Gender), source=="16s", OTU == "Campylobacter concisus")%>%
  ggplot(aes(x=Gender, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Gender", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[1]," Bacteria Relative Abundance ",TITLE_P2[2]),
         subtitle=SUBTITLE[4])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 62 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2D_NCI_campy.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 63 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2D_NCI_campy.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 65 rows containing missing values (geom_point).
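The `geom_point` warning counts differ between renders of the same plot (62, 63, and 65 above). A plausible explanation, assuming ggplot2's default behaviour, is that `geom_jitter()` also jitters vertically when `height` is not set, so points at zero abundance are randomly pushed below the lower scale limit of 0 and censored. The base-R simulation below only illustrates that mechanism; the abundances and the jitter half-width `h` are made up, not taken from the analysis.

```r
set.seed(20200916)                  # same seed value the report says was set

y <- c(rep(0, 60), rep(1.5, 40))    # made-up values with many exact zeros
h <- 0.2                            # hypothetical vertical jitter half-width

# Re-draw the jitter three times, as happens for p, the PDF, and the PNG.
dropped <- replicate(3, {
  jittered <- y + runif(length(y), -h, h)
  sum(jittered < 0)                 # points a lower limit of 0 would censor
})
dropped

# With no vertical jitter, no point falls below the limit.
no_jitter <- sum(y < 0)
no_jitter
```

In ggplot2, vertical jitter can be switched off with `geom_jitter(height = 0)`, which should keep zero abundances on the axis and make the warning counts stable across renders. Whether that suits these figures is a design choice.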
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Gender), source=="rna", OTU == "Campylobacter concisus")
m1<-wilcox.test(Abund ~ Gender, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Gender
W = 314, p-value = 0.8239
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -3.552421e-05  4.297188e-05
sample estimates:
difference in location 
          3.558703e-05 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(!is.na(Gender), source=="rna", OTU == "Campylobacter concisus")%>%
  ggplot(aes(x=Gender, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Gender", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[2]," Bacteria Relative Abundance ",TITLE_P2[2]),
         subtitle=SUBTITLE[4])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 123 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2D_tcga_rna_campy.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 134 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2D_tcga_rna_campy.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 129 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Gender), source=="wgs", OTU == "Campylobacter concisus")
m1<-wilcox.test(Abund ~ Gender, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Gender
W = 938.5, p-value = 0.006179
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 7.95184e-05 3.09854e-02
sample estimates:
difference in location 
          0.0009054198 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(!is.na(Gender), source=="wgs", OTU == "Campylobacter concisus")%>%
  ggplot(aes(x=Gender, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Gender", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[3]," Bacteria Relative Abundance ",TITLE_P2[2]),
         subtitle=SUBTITLE[4])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 41 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2D_tcga_wgs_campy.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 48 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2D_tcga_wgs_campy.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 48 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Gender), source=="16s", OTU == "Streptococcus sanguinis")
m1<-wilcox.test(Abund ~ Gender, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Gender
W = 1859.5, p-value = 0.7902
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -8.980748  6.799926
sample estimates:
difference in location 
            -0.7999949 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(!is.na(Gender), source=="16s", OTU == "Streptococcus sanguinis")%>%
  ggplot(aes(x=Gender, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Gender", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[1]," Bacteria Relative Abundance ",TITLE_P2[2]),
         subtitle=SUBTITLE[5])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 5 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2D_NCI_strepto.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 5 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2D_NCI_strepto.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 2 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Gender), source=="rna", OTU == "Streptococcus sanguinis")
m1<-wilcox.test(Abund ~ Gender, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Gender
W = 314, p-value = 0.8497
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -0.003569662  0.003880488
sample estimates:
difference in location 
          0.0002583106 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(!is.na(Gender), source=="rna", OTU == "Streptococcus sanguinis")%>%
  ggplot(aes(x=Gender, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Gender", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[2]," Bacteria Relative Abundance ",TITLE_P2[2]),
         subtitle=SUBTITLE[5])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 108 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2D_tcga_rna_strepto.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 111 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2D_tcga_rna_strepto.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 109 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Gender), source=="wgs", OTU == "Streptococcus sanguinis")
m1<-wilcox.test(Abund ~ Gender, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Gender
W = 683, p-value = 0.9169
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -4.837276e-05  4.855190e-05
sample estimates:
difference in location 
         -2.983699e-05 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(!is.na(Gender), source=="wgs", OTU == "Streptococcus sanguinis")%>%
  ggplot(aes(x=Gender, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Gender", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[3]," Bacteria Relative Abundance ",TITLE_P2[2]),
         subtitle=SUBTITLE[5])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 49 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2D_tcga_wgs_strepto.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 46 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2D_tcga_wgs_strepto.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 45 rows containing missing values (geom_point).
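Each block above repeats the same steps: filter, run a two-group Wilcoxon rank sum test, then store the estimate and p-value. A minimal sketch of how that step could be wrapped in a function follows. The helper name `run_wilcox` and the toy data are hypothetical and not part of the original analysis. `na.rm` is omitted because it is not a documented `wilcox.test` argument, and `paired` is left at its default, which is unpaired.

```r
# Hypothetical helper: returns the two values that the report stores in
# test_results[i, 4] (location shift estimate) and test_results[i, 5] (p-value).
run_wilcox <- function(data, group_var) {
  fit <- wilcox.test(
    reformulate(group_var, response = "Abund"),
    data = data, exact = FALSE, conf.int = TRUE
  )
  data.frame(estimate = unname(fit$estimate), pvalue = fit$p.value)
}

# Toy data for illustration only; these values are made up.
toy <- data.frame(
  Abund  = c(0.1, 0.4, 0.0, 2.5, 1.2, 0.3, 5.0, 0.8, 0.05, 3.1, 0.6, 1.9),
  Gender = rep(c("F", "M"), each = 6)
)
res <- run_wilcox(toy, "Gender")
res
```

The same call would work for the Race comparisons by passing `"Race"` as `group_var`, provided the data contain exactly two non-missing groups.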

Race

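# Fourth-root transform for the y-axis; negative inputs are clamped to 0 first
root<-function(x){
  x <- ifelse(x < 0, 0, x)
  x**(0.25)
}
# Inverse of the fourth-root transform (raises to the 4th power)
invroot<-function(x){
  x**(4)
}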
DIM <- c(6, 4)
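The two functions above are passed to `scales::trans_new()` as a transform and its inverse, so they should undo each other on the plotted range. A quick check, using the axis breaks from the plots, is sketched below. The definitions are repeated (with `^` in place of the equivalent `**`) only so the snippet runs on its own.

```r
# Same definitions as in the report, repeated for a self-contained check.
root <- function(x) {
  x <- ifelse(x < 0, 0, x)   # clamp negative input to 0
  x^0.25                     # fourth root
}
invroot <- function(x) x^4   # fourth power

# Axis breaks used in the plots.
breaks <- c(0, 0.001, 0.01, 0.1, 1, 10, 50, 100)

# Transforming and back-transforming should recover the breaks
# (up to floating-point tolerance).
round_trip <- invroot(root(breaks))
all.equal(round_trip, breaks)

# Negative input is clamped, so this returns 0 rather than NaN.
root(-0.5)
```

Because of the clamping, `invroot(root(x))` does not recover negative `x`. That does not matter here as long as abundances are non-negative.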

# merge datasets by subsetting to specific variables then merging
analysis.dat <- dat.16s.s %>% 
  dplyr::mutate(ID = as.factor(accession.number)) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, Race)

dat <- dat.rna.s %>% 
  dplyr::mutate(Race = race) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, Race)
analysis.dat <- full_join(analysis.dat, dat)
Joining, by = c("OTU", "sample_type", "tumor", "Abundance", "ID", "source",
"Race")
dat <- dat.wgs.s %>% 
  dplyr::mutate(Race = race) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, Race)

analysis.dat <- full_join(analysis.dat, dat) %>%
  mutate(
    pres = ifelse(Abundance > 0, 1, 0),
    Abund = Abundance*100,
    Tumor = ifelse(tumor==1, "Tumor", "No Tumor")
  )
Joining, by = c("OTU", "sample_type", "tumor", "Abundance", "ID", "source",
"Race")
analysis.dat$Race[analysis.dat$Race == "asian"] <- NA
analysis.dat$Race[analysis.dat$Race == "B"] <- "AA"
analysis.dat$Race[analysis.dat$Race == "black or african american"] <- "AA"
analysis.dat$Race[analysis.dat$Race == "H"] <- NA
analysis.dat$Race[analysis.dat$Race == "O"] <- NA
analysis.dat$Race[analysis.dat$Race == "W"] <- "EA"
analysis.dat$Race[analysis.dat$Race == "white"] <- "EA"
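The recoding above collapses the source-specific race labels into two groups (`AA`, `EA`) and sets the remaining listed labels to `NA`. The same mapping can be sketched as a named lookup vector. This is an alternative illustration, not the report's method; `race_map` and the input vector `raw` are hypothetical. One difference: any label missing from the lookup becomes `NA`, whereas the line-by-line assignments leave unlisted labels unchanged.

```r
# Lookup built from the recoding rules in the report.
race_map <- c(
  "B"                         = "AA",
  "black or african american" = "AA",
  "W"                         = "EA",
  "white"                     = "EA"
)

# Made-up input labels for illustration only.
raw <- c("W", "B", "asian", "white", "H", NA, "O", "black or african american")

# Indexing by name returns NA for labels absent from the lookup (and for NA).
recoded <- unname(race_map[raw])
recoded
```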

i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Race), source=="16s")
m1<-wilcox.test(Abund ~ Race, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Race
W = 15296, p-value = 0.8229
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -6.877766e-02  3.687653e-05
sample estimates:
difference in location 
         -2.061302e-05 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(!is.na(Race), source=="16s")%>%
  ggplot(aes(x=Race, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Race", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[1]," Bacteria Relative Abundance ",TITLE_P2[3]),
         subtitle=SUBTITLE[1])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 133 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2E_NCI_combined.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 134 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2E_NCI_combined.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 130 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Race), source=="rna")
m1<-wilcox.test(Abund ~ Race, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Race
W = 1620, p-value = 0.002684
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 0.002083263 0.154741366
sample estimates:
difference in location 
             0.0851424 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value

p <- analysis.dat %>%
  filter(source=="rna", !is.na(Race))%>%
  ggplot(aes(x=Race, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Race", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[2]," Bacteria Relative Abundance ",TITLE_P2[3]),
         subtitle=SUBTITLE[1])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 532 rows containing non-finite values (stat_ydensity).
Warning: Removed 583 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2E_tcga_rna_combined.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 532 rows containing non-finite values (stat_ydensity).
Warning: Removed 577 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2E_tcga_rna_combined.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 532 rows containing non-finite values (stat_ydensity).
Warning: Removed 577 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Race), source=="wgs")
m1<-wilcox.test(Abund ~ Race, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Race
W = 4875, p-value = 0.074
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -6.932770e-06  3.222919e-02
sample estimates:
difference in location 
          3.318886e-05 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value

p <- analysis.dat %>%
  filter(source=="wgs", !is.na(Race))%>%
  ggplot(aes(x=Race, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Race", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[3]," Bacteria Relative Abundance ",TITLE_P2[3]),
         subtitle=SUBTITLE[1])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 77 rows containing non-finite values (stat_ydensity).
Warning: Removed 180 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2E_tcga_wgs_combined.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 77 rows containing non-finite values (stat_ydensity).
Warning: Removed 205 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2E_tcga_wgs_combined.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 77 rows containing non-finite values (stat_ydensity).
Warning: Removed 202 rows containing missing values (geom_point).

Subset by Bacterium

# merge datasets by subsetting to specific variables then merging
analysis.dat <- dat.16s.s %>% 
  dplyr::mutate(ID = as.factor(accession.number)) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, Race)

dat <- dat.rna.s %>% 
  dplyr::mutate(Race = race) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, Race)
analysis.dat <- full_join(analysis.dat, dat)
Joining, by = c("OTU", "sample_type", "tumor", "Abundance", "ID", "source",
"Race")
dat <- dat.wgs.s %>% 
  dplyr::mutate(Race = race) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, Race)

analysis.dat <- full_join(analysis.dat, dat) %>%
  mutate(
    pres = ifelse(Abundance > 0, 1, 0),
    Abund = Abundance*100,
    Tumor = ifelse(tumor==1, "Tumor", "No Tumor")
  )
Joining, by = c("OTU", "sample_type", "tumor", "Abundance", "ID", "source",
"Race")
analysis.dat$Race[analysis.dat$Race == "asian"] <- NA
analysis.dat$Race[analysis.dat$Race == "B"] <- "AA"
analysis.dat$Race[analysis.dat$Race == "black or african american"] <- "AA"
analysis.dat$Race[analysis.dat$Race == "H"] <- NA
analysis.dat$Race[analysis.dat$Race == "O"] <- NA
analysis.dat$Race[analysis.dat$Race == "W"] <- "EA"
analysis.dat$Race[analysis.dat$Race == "white"] <- "EA"


i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Race), source=="16s", OTU == "Fusobacterium nucleatum")
m1<-wilcox.test(Abund ~ Race, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Race
W = 970, p-value = 0.9862
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -0.1999612  0.1979485
sample estimates:
difference in location 
         -4.020351e-05 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value

p <- analysis.dat %>%
  filter(!is.na(Race), source=="16s", OTU == "Fusobacterium nucleatum")%>%
  ggplot(aes(x=Race, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Race", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[1]," Bacteria Relative Abundance ",TITLE_P2[3]),
         subtitle=SUBTITLE[2])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 46 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2E_NCI_fuso.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 36 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2E_NCI_fuso.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 41 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Race), source=="rna", OTU == "Fusobacterium nucleatum")
m1<-wilcox.test(Abund ~ Race, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Race
W = 38, p-value = 0.1391
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -20.323901   4.521115
sample estimates:
difference in location 
              4.515127 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value

p <- analysis.dat %>%
  filter(!is.na(Race), source=="rna", OTU == "Fusobacterium nucleatum")%>%
  ggplot(aes(x=Race, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Race", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[2]," Bacteria Relative Abundance ",TITLE_P2[3]),
         subtitle=SUBTITLE[2])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 76 rows containing non-finite values (stat_ydensity).
Warning: Groups with fewer than two data points have been dropped.
Warning: Removed 78 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2E_tcga_rna_fuso.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 76 rows containing non-finite values (stat_ydensity).
Warning: Groups with fewer than two data points have been dropped.
Warning: Removed 76 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2E_tcga_rna_fuso.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 76 rows containing non-finite values (stat_ydensity).
Warning: Groups with fewer than two data points have been dropped.
Warning: Removed 77 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Race), source=="wgs", OTU == "Fusobacterium nucleatum")
m1<-wilcox.test(Abund ~ Race, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Race
W = 95.5, p-value = 0.6489
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -0.4853847  9.3518385
sample estimates:
difference in location 
            0.04431116 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value

p <- analysis.dat %>%
  filter(!is.na(Race), source=="wgs", OTU == "Fusobacterium nucleatum")%>%
  ggplot(aes(x=Race, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Race", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[3]," Bacteria Relative Abundance ",TITLE_P2[3]),
         subtitle=SUBTITLE[2])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 11 rows containing non-finite values (stat_ydensity).
Warning: Removed 26 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2E_tcga_wgs_fuso.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 11 rows containing non-finite values (stat_ydensity).
Warning: Removed 24 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2E_tcga_wgs_fuso.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 11 rows containing non-finite values (stat_ydensity).
Warning: Removed 26 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Race), source=="16s", OTU == "Prevotella melaninogenica" | OTU =="Prevotella spp.")
m1<-wilcox.test(Abund ~ Race, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Race
W = 911.5, p-value = 0.696
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -1.6108658  0.2000605
sample estimates:
difference in location 
         -1.826919e-05 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value
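The 16s Prevotella filter above combines two OTU labels with `|`. Each comma-separated argument to `filter()` is a separate condition, and the conditions are combined with AND, so the `|` applies only within the OTU condition. The base-R sketch below uses made-up rows to show the same logic written with explicit parentheses and with `%in%`; the object names are hypothetical.

```r
# Made-up rows for illustration only.
toy <- data.frame(
  OTU    = c("Prevotella melaninogenica", "Prevotella spp.",
             "Fusobacterium nucleatum", "Prevotella spp."),
  source = c("16s", "16s", "16s", "rna"),
  Race   = c("EA", NA, "AA", "AA"),
  stringsAsFactors = FALSE
)
prevo <- c("Prevotella melaninogenica", "Prevotella spp.")

# Explicit parentheses around the OR, matching how filter() groups it.
keep_or <- !is.na(toy$Race) & toy$source == "16s" &
  (toy$OTU == "Prevotella melaninogenica" | toy$OTU == "Prevotella spp.")

# Equivalent membership test.
keep_in <- !is.na(toy$Race) & toy$source == "16s" & toy$OTU %in% prevo

identical(keep_or, keep_in)
```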

p <- analysis.dat %>%
  filter(!is.na(Race), source=="16s", OTU == "Prevotella melaninogenica" | OTU =="Prevotella spp.")%>%
  ggplot(aes(x=Race, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Race", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[1]," Bacteria Relative Abundance ",TITLE_P2[3]),
         subtitle=SUBTITLE[3])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 17 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2E_NCI_prevo.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 23 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2E_NCI_prevo.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 24 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Race), source=="rna", OTU == "Prevotella melaninogenica")
m1<-wilcox.test(Abund ~ Race, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Race
W = 39, p-value = 0.1178
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -0.3855819  0.7931673
sample estimates:
difference in location 
             0.7877327 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value

p <- analysis.dat %>%
  filter(!is.na(Race), source=="rna", OTU == "Prevotella melaninogenica")%>%
  ggplot(aes(x=Race, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Race", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[2]," Bacteria Relative Abundance ",TITLE_P2[3]),
         subtitle=SUBTITLE[3])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 76 rows containing non-finite values (stat_ydensity).
Warning: Groups with fewer than two data points have been dropped.
Warning: Removed 78 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2E_tcga_rna_prevo.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 76 rows containing non-finite values (stat_ydensity).
Warning: Groups with fewer than two data points have been dropped.
Warning: Removed 78 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2E_tcga_rna_prevo.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 76 rows containing non-finite values (stat_ydensity).
Warning: Groups with fewer than two data points have been dropped.
Warning: Removed 78 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Race), source=="wgs", OTU == "Prevotella melaninogenica")
m1<-wilcox.test(Abund ~ Race, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Race
W = 82.5, p-value = 1
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -0.741561 19.890186
sample estimates:
difference in location 
           5.70719e-05 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value

p <- analysis.dat %>%
  filter(!is.na(Race), source=="wgs", OTU == "Prevotella melaninogenica")%>%
  ggplot(aes(x=Race, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Race", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[3]," Bacteria Relative Abundance ",TITLE_P2[3]),
         subtitle=SUBTITLE[3])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 11 rows containing non-finite values (stat_ydensity).
Warning: Removed 19 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2E_tcga_wgs_prevo.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 11 rows containing non-finite values (stat_ydensity).
Warning: Removed 21 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2E_tcga_wgs_prevo.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 11 rows containing non-finite values (stat_ydensity).
Warning: Removed 21 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Race), source=="16s", OTU == "Campylobacter concisus")
m1<-wilcox.test(Abund ~ Race, data=d, na.rm=TRUE, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Race
W = 1058, p-value = 0.4462
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -5.697284e-05  6.744310e-06
sample estimates:
difference in location 
          1.455216e-05 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value

p <- analysis.dat %>%
  filter(!is.na(Race), source=="16s", OTU == "Campylobacter concisus")%>%
  ggplot(aes(x=Race, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Race", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[1]," Bacteria Relative Abundance ",TITLE_P2[3]),
         subtitle=SUBTITLE[4])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 61 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2E_NCI_campy.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 57 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2E_NCI_campy.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 69 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Race), source=="rna", OTU == "Campylobacter concisus")
m1<-wilcox.test(Abund ~ Race, data=d, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Race
W = 39, p-value = 0.06438
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -0.02223931  0.03198255
sample estimates:
difference in location 
            0.03198255 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value

p <- analysis.dat %>%
  filter(!is.na(Race), source=="rna", OTU == "Campylobacter concisus")%>%
  ggplot(aes(x=Race, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Race", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[2]," Bacteria Relative Abundance ",TITLE_P2[3]),
         subtitle=SUBTITLE[4])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 76 rows containing non-finite values (stat_ydensity).
Warning: Groups with fewer than two data points have been dropped.
Warning: Removed 86 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2E_tcga_rna_campy.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 76 rows containing non-finite values (stat_ydensity).
Warning: Groups with fewer than two data points have been dropped.
Warning: Removed 92 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2E_tcga_rna_campy.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 76 rows containing non-finite values (stat_ydensity).
Warning: Groups with fewer than two data points have been dropped.
Warning: Removed 89 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Race), source=="wgs", OTU == "Campylobacter concisus")
m1<-wilcox.test(Abund ~ Race, data=d, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Race
W = 80, p-value = 0.9358
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -0.02889812  0.13061982
sample estimates:
difference in location 
         -4.615721e-05 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value

p <- analysis.dat %>%
  filter(!is.na(Race), source=="wgs", OTU == "Campylobacter concisus")%>%
  ggplot(aes(x=Race, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Race", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[3]," Bacteria Relative Abundance ",TITLE_P2[3]),
         subtitle=SUBTITLE[4])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 11 rows containing non-finite values (stat_ydensity).
Warning: Removed 26 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2E_tcga_wgs_campy.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 11 rows containing non-finite values (stat_ydensity).
Warning: Removed 27 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2E_tcga_wgs_campy.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 11 rows containing non-finite values (stat_ydensity).
Warning: Removed 30 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Race), source=="16s", OTU == "Streptococcus sanguinis")
m1<-wilcox.test(Abund ~ Race, data=d, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Race
W = 879, p-value = 0.554
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -17.199936   7.599966
sample estimates:
difference in location 
             -2.552039 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value

p <- analysis.dat %>%
  filter(!is.na(Race), source=="16s", OTU == "Streptococcus sanguinis")%>%
  ggplot(aes(x=Race, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Race", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[1]," Bacteria Relative Abundance ",TITLE_P2[3]),
         subtitle=SUBTITLE[5])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 3 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2E_NCI_strepto.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 2 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2E_NCI_strepto.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 2 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Race), source=="rna", OTU == "Streptococcus sanguinis")
m1<-wilcox.test(Abund ~ Race, data=d, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Race
W = 12, p-value = 0.526
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -0.19667754  0.00213217
sample estimates:
difference in location 
          -0.003235715 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value

p <- analysis.dat %>%
  filter(!is.na(Race), source=="rna", OTU == "Streptococcus sanguinis")%>%
  ggplot(aes(x=Race, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Race", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[2]," Bacteria Relative Abundance ",TITLE_P2[3]),
         subtitle=SUBTITLE[5])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 76 rows containing non-finite values (stat_ydensity).
Warning: Groups with fewer than two data points have been dropped.
Warning: Removed 77 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2E_tcga_rna_strepto.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 76 rows containing non-finite values (stat_ydensity).
Warning: Groups with fewer than two data points have been dropped.
Warning: Removed 77 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2E_tcga_rna_strepto.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 76 rows containing non-finite values (stat_ydensity).
Warning: Groups with fewer than two data points have been dropped.
Warning: Removed 78 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Race), source=="wgs", OTU == "Streptococcus sanguinis")
m1<-wilcox.test(Abund ~ Race, data=d, paired=FALSE, exact=FALSE, conf.int=TRUE)
m1

    Wilcoxon rank sum test with continuity correction

data:  Abund by Race
W = 116.5, p-value = 0.1514
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -0.0000364912  0.0321872928
sample estimates:
difference in location 
           0.008332025 
test_results[i,4] <- m1$estimate
test_results[i,5] <- m1$p.value

p <- analysis.dat %>%
  filter(!is.na(Race), source=="wgs", OTU == "Streptococcus sanguinis")%>%
  ggplot(aes(x=Race, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Race", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[3]," Bacteria Relative Abundance ",TITLE_P2[3]),
         subtitle=SUBTITLE[5])+
    annotate("text", x=1.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=2,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=2, xend=2,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 11 rows containing non-finite values (stat_ydensity).
Warning: Removed 29 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2E_tcga_wgs_strepto.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 11 rows containing non-finite values (stat_ydensity).
Warning: Removed 29 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2E_tcga_wgs_strepto.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 11 rows containing non-finite values (stat_ydensity).
Warning: Removed 34 rows containing missing values (geom_point).

Stage

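# Fourth-root transform for the y-axis; negative values are clamped to 0
root<-function(x){
  x <- ifelse(x < 0, 0, x)
  x^(0.25)
}
# Inverse of the fourth-root transform (maps axis positions back to percentages)
invroot<-function(x){
  x^(4)
}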
DIM <- c(6, 4)
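
The `root` and `invroot` helpers are passed to `scales::trans_new()` as a transform and its inverse. As a minimal sketch (the vector `brks` simply copies the axis breaks used in the plots; the checks themselves are illustrative and not part of the original analysis), the pair can be checked for a clean round trip on non-negative input:

```r
# Fourth-root transform and its inverse, mirroring the definitions above.
root <- function(x) {
  x <- ifelse(x < 0, 0, x)  # clamp negatives so the root stays real
  x^0.25
}
invroot <- function(x) {
  x^4
}

# Axis breaks used in the plots (percent relative abundance).
brks <- c(0, 0.001, 0.01, 0.1, 1, 10, 50, 100)

# Round trip: invroot(root(x)) should recover x for non-negative input.
round_trip <- invroot(root(brks))
round_trip_ok <- isTRUE(all.equal(round_trip, brks))

# Positions on the transformed axis; small abundances are spread apart.
transformed <- root(brks)
print(round(transformed, 3))
print(round_trip_ok)
```

Because negative inputs are clamped to zero, the transform is not invertible below zero. That does not matter here, since relative abundance is never negative.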

# merge datasets by subsetting to specific variables then merging
analysis.dat <- dat.16s.s %>% 
  dplyr::mutate(ID = as.factor(accession.number)) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, tumor.stage)

dat <- dat.rna.s %>% 
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, tumor.stage)
analysis.dat <- full_join(analysis.dat, dat)
Joining, by = c("OTU", "sample_type", "tumor", "Abundance", "ID", "source",
"tumor.stage")
dat <- dat.wgs.s %>% 
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, tumor.stage)

analysis.dat <- full_join(analysis.dat, dat) %>%
  mutate(
    pres = ifelse(Abundance > 0, 1, 0),
    Abund = Abundance*100,
    Tumor = ifelse(tumor==1, "Tumor", "No Tumor"),
    Tumor_Stage = tumor.stage
  )
Joining, by = c("OTU", "sample_type", "tumor", "Abundance", "ID", "source",
"tumor.stage")
analysis.dat$Tumor_Stage[analysis.dat$Tumor_Stage == "1"] <- "I"

i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="16s")
m1<-kruskal.test(Abund ~ Tumor_Stage, data=d)
m1

    Kruskal-Wallis rank sum test

data:  Abund by Tumor_Stage
Kruskal-Wallis chi-squared = 4.1193, df = 4, p-value = 0.3901
test_results[i,4] <- m1$statistic
test_results[i,5] <- m1$p.value
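
For tumor stage, the two-group Wilcoxon test is replaced by a Kruskal-Wallis test, and `test_results` stores the chi-squared statistic in place of a location estimate. The sketch below uses hypothetical toy data (`toy` and its values are made up) to show the components being extracted and how the reported degrees of freedom relate to the number of stage levels. The 16S output above reports df = 4, which corresponds to five stage levels.

```r
# Hypothetical toy data: abundance (%) for three tumor stages.
toy <- data.frame(
  Abund = c(0.1, 0.3, 0.2, 1.5, 2.0, 1.1, 0.4, 0.6, 0.9),
  Tumor_Stage = rep(c("I", "II", "III"), each = 3)
)

kw <- kruskal.test(Abund ~ Tumor_Stage, data = toy)

# Components of the test result.
kw_stat <- unname(kw$statistic)  # Kruskal-Wallis chi-squared
kw_df   <- unname(kw$parameter)  # degrees of freedom = number of groups - 1
kw_p    <- kw$p.value

# The p-value is the upper tail of a chi-squared distribution.
p_by_hand <- pchisq(kw_stat, df = kw_df, lower.tail = FALSE)

print(c(statistic = kw_stat, df = kw_df, p = kw_p))
```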

p <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="16s")%>%
  ggplot(aes(x=Tumor_Stage, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Tumor_Stage", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[1]," Bacteria Relative Abundance ",TITLE_P2[4]),
         subtitle=SUBTITLE[1])+
    annotate("text", x=2.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=5,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=5, xend=5,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 127 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2F_NCI_combined.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 126 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2F_NCI_combined.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 129 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="rna")
m1<-kruskal.test(Abund ~ Tumor_Stage, data=d)
m1

    Kruskal-Wallis rank sum test

data:  Abund by Tumor_Stage
Kruskal-Wallis chi-squared = 5.8932, df = 3, p-value = 0.1169
test_results[i,4] <- m1$statistic
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(source=="rna", !is.na(Tumor_Stage))%>%
  ggplot(aes(x=Tumor_Stage, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Tumor_Stage", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[2]," Bacteria Relative Abundance ",TITLE_P2[4]),
         subtitle=SUBTITLE[1])+
    annotate("text", x=2.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=4,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=4, xend=4,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 658 rows containing non-finite values (stat_ydensity).
Warning: Removed 728 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2F_tcga_rna_combined.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 658 rows containing non-finite values (stat_ydensity).
Warning: Removed 730 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2F_tcga_rna_combined.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 658 rows containing non-finite values (stat_ydensity).
Warning: Removed 734 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="wgs")
m1<-kruskal.test(Abund ~ Tumor_Stage, data=d)
m1

    Kruskal-Wallis rank sum test

data:  Abund by Tumor_Stage
Kruskal-Wallis chi-squared = 10.886, df = 3, p-value = 0.01236
test_results[i,4] <- m1$statistic
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(source=="wgs", !is.na(Tumor_Stage))%>%
  ggplot(aes(x=Tumor_Stage, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Tumor_Stage", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[3]," Bacteria Relative Abundance ",TITLE_P2[4]),
         subtitle=SUBTITLE[1])+
    annotate("text", x=2.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=4,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=4, xend=4,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 98 rows containing non-finite values (stat_ydensity).
Warning: Removed 276 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2F_tcga_wgs_combined.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 98 rows containing non-finite values (stat_ydensity).
Warning: Removed 282 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2F_tcga_wgs_combined.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 98 rows containing non-finite values (stat_ydensity).
Warning: Removed 295 rows containing missing values (geom_point).
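
The three stage comparisons above differ only in the value of `source`. One possible way to reduce the repetition, sketched here under stated assumptions, is to wrap the test in a function and collect one row per source. The helper `run_stage_test`, the data frame `toy`, and the output column names are hypothetical and are not part of the original workflow, which fills `test_results` by incrementing `i`.

```r
# Hypothetical stand-in for analysis.dat with only the columns used here.
set.seed(1)
toy <- data.frame(
  source = rep(c("16s", "rna", "wgs"), each = 12),
  Tumor_Stage = rep(rep(c("I", "II", "III"), each = 4), times = 3),
  Abund = runif(36, min = 0, max = 5)
)

# One Kruskal-Wallis test per source, returned as a one-row data frame.
run_stage_test <- function(src, data) {
  d <- data[data$source == src & !is.na(data$Tumor_Stage), ]
  m <- kruskal.test(Abund ~ Tumor_Stage, data = d)
  data.frame(
    source = src,
    statistic = unname(m$statistic),
    df = unname(m$parameter),
    pvalue = m$p.value
  )
}

# Apply the helper to each source and stack the rows.
stage_results <- do.call(
  rbind,
  lapply(c("16s", "rna", "wgs"), run_stage_test, data = toy)
)
print(stage_results)
```

Collecting rows this way avoids relying on the running counter `i` staying in step with the order of the code blocks.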

Subset by Bacterium

# merge datasets by subsetting to specific variables then merging
analysis.dat <- dat.16s.s %>% 
  dplyr::mutate(ID = as.factor(accession.number)) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, tumor.stage)

dat <- dat.rna.s %>% 
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, tumor.stage)
analysis.dat <- full_join(analysis.dat, dat)
Joining, by = c("OTU", "sample_type", "tumor", "Abundance", "ID", "source",
"tumor.stage")
dat <- dat.wgs.s %>% 
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, tumor.stage)

analysis.dat <- full_join(analysis.dat, dat) %>%
  mutate(
    pres = ifelse(Abundance > 0, 1, 0),
    Abund = Abundance*100,
    Tumor = ifelse(tumor==1, "Tumor", "No Tumor"),
    Tumor_Stage = tumor.stage
  )
Joining, by = c("OTU", "sample_type", "tumor", "Abundance", "ID", "source",
"tumor.stage")
analysis.dat$Tumor_Stage[analysis.dat$Tumor_Stage == "1"] <- "I"

i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="16s", OTU == "Fusobacterium nucleatum")
m1<-kruskal.test(Abund ~ Tumor_Stage, data=d)
m1

    Kruskal-Wallis rank sum test

data:  Abund by Tumor_Stage
Kruskal-Wallis chi-squared = 6.5465, df = 4, p-value = 0.1619
test_results[i,4] <- m1$statistic
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="16s", OTU == "Fusobacterium nucleatum")%>%
  ggplot(aes(x=Tumor_Stage, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Tumor_Stage", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[1]," Bacteria Relative Abundance ",TITLE_P2[4]),
         subtitle=SUBTITLE[2])+
    annotate("text", x=2.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=5,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=5, xend=5,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 46 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2F_NCI_fuso.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 46 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2F_NCI_fuso.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 40 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="rna", OTU == "Fusobacterium nucleatum")
m1<-kruskal.test(Abund ~ Tumor_Stage, data=d)
m1

    Kruskal-Wallis rank sum test

data:  Abund by Tumor_Stage
Kruskal-Wallis chi-squared = 6.1378, df = 3, p-value = 0.1051
test_results[i,4] <- m1$statistic
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="rna", OTU == "Fusobacterium nucleatum")%>%
  ggplot(aes(x=Tumor_Stage, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Tumor_Stage", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[2]," Bacteria Relative Abundance ",TITLE_P2[4]),
         subtitle=SUBTITLE[2])+
    annotate("text", x=2.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=4,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=4, xend=4,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 94 rows containing non-finite values (stat_ydensity).
Warning: Removed 97 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2F_tcga_rna_fuso.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 94 rows containing non-finite values (stat_ydensity).
Warning: Removed 97 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2F_tcga_rna_fuso.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 94 rows containing non-finite values (stat_ydensity).
Warning: Removed 96 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="wgs", OTU == "Fusobacterium nucleatum")
m1<-kruskal.test(Abund ~ Tumor_Stage, data=d)
m1

    Kruskal-Wallis rank sum test

data:  Abund by Tumor_Stage
Kruskal-Wallis chi-squared = 5.2948, df = 3, p-value = 0.1514
test_results[i,4] <- m1$statistic
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="wgs", OTU == "Fusobacterium nucleatum")%>%
  ggplot(aes(x=Tumor_Stage, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Tumor_Stage", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[3]," Bacteria Relative Abundance ",TITLE_P2[4]),
         subtitle=SUBTITLE[2])+
    annotate("text", x=2.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=4,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=4, xend=4,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 14 rows containing non-finite values (stat_ydensity).
Warning: Removed 38 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2F_tcga_wgs_fuso.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 14 rows containing non-finite values (stat_ydensity).
Warning: Removed 35 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2F_tcga_wgs_fuso.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 14 rows containing non-finite values (stat_ydensity).
Warning: Removed 36 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="16s", OTU == "Prevotella melaninogenica" | OTU =="Prevotella spp.")
m1<-kruskal.test(Abund ~ Tumor_Stage, data=d)
m1

    Kruskal-Wallis rank sum test

data:  Abund by Tumor_Stage
Kruskal-Wallis chi-squared = 2.7637, df = 4, p-value = 0.5981
test_results[i,4] <- m1$statistic
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="16s", OTU == "Prevotella melaninogenica" | OTU =="Prevotella spp.")%>%
  ggplot(aes(x=Tumor_Stage, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Tumor_Stage", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[1]," Bacteria Relative Abundance ",TITLE_P2[4]),
         subtitle=SUBTITLE[3])+
    annotate("text", x=2.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=5,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=5, xend=5,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 20 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2F_NCI_prevo.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 18 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2F_NCI_prevo.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 14 rows containing missing values (geom_point).
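
The 16S Prevotella filter above matches two OTU labels inside a single `filter()` argument. With `dplyr::filter()`, each comma-separated argument is evaluated on its own and the results are combined with AND, so the OR applies only to the OTU condition. The sketch below, on hypothetical toy rows in base R, illustrates the pitfall that would appear if the same conditions were chained with `&` and `|` without parentheses, and an `%in%` form that avoids it.

```r
# Hypothetical toy rows: one "Prevotella spp." row comes from another source.
toy <- data.frame(
  source = c("16s", "16s", "rna", "16s"),
  OTU = c("Prevotella melaninogenica", "Prevotella spp.",
          "Prevotella spp.", "Fusobacterium nucleatum"),
  stringsAsFactors = FALSE
)
targets <- c("Prevotella melaninogenica", "Prevotella spp.")

# Intended logic: source is 16s AND OTU is one of the two labels.
intended <- toy$source == "16s" & toy$OTU %in% targets

# Pitfall: & binds tighter than |, so without parentheses the
# "Prevotella spp." row from the rna source is also kept.
unparenthesised <- toy$source == "16s" &
  toy$OTU == "Prevotella melaninogenica" | toy$OTU == "Prevotella spp."

print(which(intended))
print(which(unparenthesised))
```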
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="rna", OTU == "Prevotella melaninogenica")
m1<-kruskal.test(Abund ~ Tumor_Stage, data=d)
m1

    Kruskal-Wallis rank sum test

data:  Abund by Tumor_Stage
Kruskal-Wallis chi-squared = 9.3195, df = 3, p-value = 0.02533
test_results[i,4] <- m1$statistic
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="rna", OTU == "Prevotella melaninogenica")%>%
  ggplot(aes(x=Tumor_Stage, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Tumor_Stage", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[2]," Bacteria Relative Abundance ",TITLE_P2[4]),
         subtitle=SUBTITLE[3])+
    annotate("text", x=2.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=4,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=4, xend=4,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 94 rows containing non-finite values (stat_ydensity).
Warning: Removed 97 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2F_tcga_rna_prevo.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 94 rows containing non-finite values (stat_ydensity).
Warning: Removed 96 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2F_tcga_rna_prevo.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 94 rows containing non-finite values (stat_ydensity).
Warning: Removed 97 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="wgs", OTU == "Prevotella melaninogenica")
m1<-kruskal.test(Abund ~ Tumor_Stage, data=d)
m1

    Kruskal-Wallis rank sum test

data:  Abund by Tumor_Stage
Kruskal-Wallis chi-squared = 2.5034, df = 3, p-value = 0.4747
test_results[i,4] <- m1$statistic
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="wgs", OTU == "Prevotella melaninogenica")%>%
  ggplot(aes(x=Tumor_Stage, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Tumor_Stage", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[3]," Bacteria Relative Abundance ",TITLE_P2[4]),
         subtitle=SUBTITLE[3])+
    annotate("text", x=2.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=4,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=4, xend=4,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 14 rows containing non-finite values (stat_ydensity).
Warning: Removed 31 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2F_tcga_wgs_prevo.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 14 rows containing non-finite values (stat_ydensity).
Warning: Removed 29 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2F_tcga_wgs_prevo.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 14 rows containing non-finite values (stat_ydensity).
Warning: Removed 34 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="16s", OTU == "Campylobacter concisus")
m1<-kruskal.test(Abund ~ Tumor_Stage, data=d)
m1

    Kruskal-Wallis rank sum test

data:  Abund by Tumor_Stage
Kruskal-Wallis chi-squared = 4.8348, df = 4, p-value = 0.3047
test_results[i,4] <- m1$statistic
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="16s", OTU == "Campylobacter concisus")%>%
  ggplot(aes(x=Tumor_Stage, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Tumor_Stage", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[1]," Bacteria Relative Abundance ",TITLE_P2[4]),
         subtitle=SUBTITLE[4])+
    annotate("text", x=2.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=5,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=5, xend=5,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 66 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2F_NCI_campy.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 57 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2F_NCI_campy.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 66 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="rna", OTU == "Campylobacter concisus")
m1<-kruskal.test(Abund ~ Tumor_Stage, data=d)
m1

    Kruskal-Wallis rank sum test

data:  Abund by Tumor_Stage
Kruskal-Wallis chi-squared = 2.5152, df = 3, p-value = 0.4725
test_results[i,4] <- m1$statistic
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="rna", OTU == "Campylobacter concisus")%>%
  ggplot(aes(x=Tumor_Stage, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Tumor_Stage", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[2]," Bacteria Relative Abundance ",TITLE_P2[4]),
         subtitle=SUBTITLE[4])+
    annotate("text", x=2.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=4,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=4, xend=4,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 94 rows containing non-finite values (stat_ydensity).
Warning: Removed 112 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2F_tcga_rna_campy.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 94 rows containing non-finite values (stat_ydensity).
Warning: Removed 118 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2F_tcga_rna_campy.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 94 rows containing non-finite values (stat_ydensity).
Warning: Removed 103 rows containing missing values (geom_point).
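
Every plot in this section passes a custom `"root"` transform to `scale_y_continuous()`, built from two helpers, `root` and `invroot`, that are defined outside this part of the report. As a rough sketch of how such a pair fits together, the code below uses hypothetical stand-ins (`root_sketch`, `invroot_sketch`) with an assumed fifth-root exponent. The real helpers may use a different exponent or extra guards.

```r
# Hypothetical stand-ins for the `root` and `invroot` helpers used above;
# the fifth-root exponent is an assumption for illustration only.
root_sketch    <- function(x) x^(1 / 5)
invroot_sketch <- function(x) x^5

# Axis breaks copied from the plotting code.
breaks <- c(0, 0.001, 0.01, 0.1, 1, 10, 50, 100)

# Forward transform spreads out the small values; the inverse recovers them.
transformed <- root_sketch(breaks)
round_trip  <- invroot_sketch(transformed)
print(round(transformed, 3))

# If the scales package is installed, wrap the pair as a transform object,
# mirroring the trans_new("root", root, invroot) call in the plots.
if (requireNamespace("scales", quietly = TRUE)) {
  root_trans <- scales::trans_new("root", root_sketch, invroot_sketch)
  print(root_trans$name)
}
```

The property that matters is that the second function undoes the first over the plotted range, since the scale uses the inverse to label the axis in original units.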
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="wgs", OTU == "Campylobacter concisus")
m1<-kruskal.test(Abund ~ Tumor_Stage, data=d)
m1

    Kruskal-Wallis rank sum test

data:  Abund by Tumor_Stage
Kruskal-Wallis chi-squared = 1.014, df = 3, p-value = 0.7979
test_results[i,4] <- m1$statistic
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="wgs", OTU == "Campylobacter concisus")%>%
  ggplot(aes(x=Tumor_Stage, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Tumor_Stage", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[3]," Bacteria Relative Abundance ",TITLE_P2[4]),
         subtitle=SUBTITLE[4])+
    annotate("text", x=2.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=4,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=4, xend=4,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 14 rows containing non-finite values (stat_ydensity).
Warning: Removed 39 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2F_tcga_wgs_campy.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 14 rows containing non-finite values (stat_ydensity).
Warning: Removed 41 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2F_tcga_wgs_campy.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 14 rows containing non-finite values (stat_ydensity).
Warning: Removed 48 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="16s", OTU == "Streptococcus sanguinis")
m1<-kruskal.test(Abund ~ Tumor_Stage, data=d)
m1

    Kruskal-Wallis rank sum test

data:  Abund by Tumor_Stage
Kruskal-Wallis chi-squared = 5.9573, df = 4, p-value = 0.2024
test_results[i,4] <- m1$statistic
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="16s", OTU == "Streptococcus sanguinis")%>%
  ggplot(aes(x=Tumor_Stage, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Tumor_Stage", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[1]," Bacteria Relative Abundance ",TITLE_P2[4]),
         subtitle=SUBTITLE[5])+
    annotate("text", x=2.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=5,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=5, xend=5,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 6 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2F_NCI_strepto.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 4 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2F_NCI_strepto.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 4 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="rna", OTU == "Streptococcus sanguinis")
m1<-kruskal.test(Abund ~ Tumor_Stage, data=d)
m1

    Kruskal-Wallis rank sum test

data:  Abund by Tumor_Stage
Kruskal-Wallis chi-squared = 5.817, df = 3, p-value = 0.1209
test_results[i,4] <- m1$statistic
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="rna", OTU == "Streptococcus sanguinis")%>%
  ggplot(aes(x=Tumor_Stage, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Tumor_Stage", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[2]," Bacteria Relative Abundance ",TITLE_P2[4]),
         subtitle=SUBTITLE[5])+
    annotate("text", x=2.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=4,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=4, xend=4,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 94 rows containing non-finite values (stat_ydensity).
Warning: Removed 97 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2F_tcga_rna_strepto.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 94 rows containing non-finite values (stat_ydensity).
Warning: Removed 96 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2F_tcga_rna_strepto.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 94 rows containing non-finite values (stat_ydensity).
Warning: Removed 96 rows containing missing values (geom_point).
i <- i+1
d <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="wgs", OTU == "Streptococcus sanguinis")
m1<-kruskal.test(Abund ~ Tumor_Stage, data=d)
m1

    Kruskal-Wallis rank sum test

data:  Abund by Tumor_Stage
Kruskal-Wallis chi-squared = 2.9396, df = 3, p-value = 0.401
test_results[i,4] <- m1$statistic
test_results[i,5] <- m1$p.value
p <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="wgs", OTU == "Streptococcus sanguinis")%>%
  ggplot(aes(x=Tumor_Stage, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Tumor_Stage", y="Relative Abundance (%)",
         title=paste0(TITLE_P1[3]," Bacteria Relative Abundance ",TITLE_P2[4]),
         subtitle=SUBTITLE[5])+
    annotate("text", x=2.5, y=90, label=paste0("p=",round(test_results$pvalue[i],4)))+
    geom_segment(aes(x=1, xend=4,y=105,yend=105))+
    geom_segment(aes(x=1, xend=1,y=109,yend=100))+
    geom_segment(aes(x=4, xend=4,y=109,yend=100))+
    theme_classic()
p
Warning: Removed 14 rows containing non-finite values (stat_ydensity).
Warning: Removed 45 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2F_tcga_wgs_strepto.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 14 rows containing non-finite values (stat_ydensity).
Warning: Removed 42 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2F_tcga_wgs_strepto.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 14 rows containing non-finite values (stat_ydensity).
Warning: Removed 45 rows containing missing values (geom_point).
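
The blocks above repeat the same steps for each data source and taxon: filter `analysis.dat`, run `kruskal.test()`, and copy the statistic and p-value into `test_results`. A minimal sketch of a helper that captures the test step is shown below. The function name `run_kw` and the toy data frame are hypothetical; only the column names (`Tumor_Stage`, `source`, `OTU`, `Abund`) come from the code above.

```r
# Run one Kruskal-Wallis test for a given data source and taxon.
# Returns the chi-squared statistic and p-value as a named vector.
run_kw <- function(dat, src, otu) {
  keep <- !is.na(dat$Tumor_Stage) & dat$source == src & dat$OTU == otu
  fit <- kruskal.test(Abund ~ Tumor_Stage, data = dat[keep, ])
  c(est = unname(fit$statistic), pvalue = fit$p.value)
}

# Toy stand-in for analysis.dat (all values are made up).
set.seed(20200916)
toy <- data.frame(
  Tumor_Stage = factor(rep(c("I", "II", "III", "IV", NA), each = 6)),
  source      = "wgs",
  OTU         = "Campylobacter concisus",
  Abund       = runif(30, 0, 5)
)

res <- run_kw(toy, "wgs", "Campylobacter concisus")
print(res)
```

With a helper like this, each row of `test_results` could be filled with `test_results[i, c("est", "pvalue")] <- run_kw(...)`, which removes the need to track column positions 4 and 5 by hand.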

Statistical Significance

# Flag every test with p < .05 (vectorized over all rows)
test_results$sig <- ifelse(test_results$pvalue < .05, "*", "")

kable(test_results, digits=4, format="html")%>%
  kable_styling(full_width = T) %>%
  scroll_box(width="100%", height="600px")
| Data | Bacteria | Outcome | est | pvalue | sig |
|---|---|---|---|---|---|
| NCI 16s Data | Combined across bacteria | Between Barretts Status | 0.0000 | 0.7934 | |
| TCGA RNAseq Data | Combined across bacteria | Between Barretts Status | 0.0000 | 0.0424 | |
| TCGA WGS Data | Combined across bacteria | Between Barretts Status | 0.0000 | 0.0372 | |
| NCI 16s Data | Fusobacterium nucleatum | Between Barretts Status | 0.0000 | 0.9471 | |
| TCGA RNAseq Data | Fusobacterium nucleatum | Between Barretts Status | 0.0010 | 0.3695 | |
| TCGA WGS Data | Fusobacterium nucleatum | Between Barretts Status | 0.0000 | 0.5109 | |
| NCI 16s Data | Prevotella melaninogenica | Between Barretts Status | 0.0001 | 0.9859 | |
| TCGA RNAseq Data | Prevotella melaninogenica | Between Barretts Status | 0.0037 | 0.1500 | |
| TCGA WGS Data | Prevotella melaninogenica | Between Barretts Status | 0.0363 | 0.1088 | |
| NCI 16s Data | Campylobacter concisus | Between Barretts Status | 0.0000 | 0.6722 | |
| TCGA RNAseq Data | Campylobacter concisus | Between Barretts Status | 0.0000 | 0.1411 | |
| TCGA WGS Data | Campylobacter concisus | Between Barretts Status | 0.0000 | 0.5648 | |
| NCI 16s Data | Streptococcus sanguinis | Between Barretts Status | 3.4000 | 0.2285 | |
| TCGA RNAseq Data | Streptococcus sanguinis | Between Barretts Status | 0.0011 | 0.6267 | |
| TCGA WGS Data | Streptococcus sanguinis | Between Barretts Status | 0.0000 | 0.9064 | |
| NCI 16s Data | Combined across bacteria | Between Gender | 0.0000 | 0.6573 | |
| TCGA RNAseq Data | Combined across bacteria | Between Gender | 0.0000 | 0.1944 | |
| TCGA WGS Data | Combined across bacteria | Between Gender | 0.0000 | 0.0022 | |
| NCI 16s Data | Fusobacterium nucleatum | Between Gender | 0.0000 | 0.9138 | |
| TCGA RNAseq Data | Fusobacterium nucleatum | Between Gender | 0.0030 | 0.2027 | |
| TCGA WGS Data | Fusobacterium nucleatum | Between Gender | 0.1363 | 0.0180 | |
| NCI 16s Data | Prevotella melaninogenica | Between Gender | 0.6000 | 0.1379 | |
| TCGA RNAseq Data | Prevotella melaninogenica | Between Gender | -0.0001 | 0.9314 | |
| TCGA WGS Data | Prevotella melaninogenica | Between Gender | 0.3345 | 0.1339 | |
| NCI 16s Data | Campylobacter concisus | Between Gender | -0.0001 | 0.8834 | |
| TCGA RNAseq Data | Campylobacter concisus | Between Gender | 0.0000 | 0.8239 | |
| TCGA WGS Data | Campylobacter concisus | Between Gender | 0.0009 | 0.0062 | |
| NCI 16s Data | Streptococcus sanguinis | Between Gender | -0.8000 | 0.7902 | |
| TCGA RNAseq Data | Streptococcus sanguinis | Between Gender | 0.0003 | 0.8497 | |
| TCGA WGS Data | Streptococcus sanguinis | Between Gender | 0.0000 | 0.9169 | |
| NCI 16s Data | Combined across bacteria | Across Races | 0.0000 | 0.8229 | |
| TCGA RNAseq Data | Combined across bacteria | Across Races | 0.0851 | 0.0027 | |
| TCGA WGS Data | Combined across bacteria | Across Races | 0.0000 | 0.0740 | |
| NCI 16s Data | Fusobacterium nucleatum | Across Races | 0.0000 | 0.9862 | |
| TCGA RNAseq Data | Fusobacterium nucleatum | Across Races | 4.5151 | 0.1391 | |
| TCGA WGS Data | Fusobacterium nucleatum | Across Races | 0.0443 | 0.6489 | |
| NCI 16s Data | Prevotella melaninogenica | Across Races | 0.0000 | 0.6960 | |
| TCGA RNAseq Data | Prevotella melaninogenica | Across Races | 0.7877 | 0.1178 | |
| TCGA WGS Data | Prevotella melaninogenica | Across Races | 0.0001 | 1.0000 | |
| NCI 16s Data | Campylobacter concisus | Across Races | 0.0000 | 0.4462 | |
| TCGA RNAseq Data | Campylobacter concisus | Across Races | 0.0320 | 0.0644 | |
| TCGA WGS Data | Campylobacter concisus | Across Races | 0.0000 | 0.9358 | |
| NCI 16s Data | Streptococcus sanguinis | Across Races | -2.5520 | 0.5540 | |
| TCGA RNAseq Data | Streptococcus sanguinis | Across Races | -0.0032 | 0.5260 | |
| TCGA WGS Data | Streptococcus sanguinis | Across Races | 0.0083 | 0.1514 | |
| NCI 16s Data | Combined across bacteria | Across Stages | 4.1193 | 0.3901 | |
| TCGA RNAseq Data | Combined across bacteria | Across Stages | 5.8932 | 0.1169 | |
| TCGA WGS Data | Combined across bacteria | Across Stages | 10.8856 | 0.0124 | |
| NCI 16s Data | Fusobacterium nucleatum | Across Stages | 6.5465 | 0.1619 | |
| TCGA RNAseq Data | Fusobacterium nucleatum | Across Stages | 6.1378 | 0.1051 | |
| TCGA WGS Data | Fusobacterium nucleatum | Across Stages | 5.2948 | 0.1514 | |
| NCI 16s Data | Prevotella melaninogenica | Across Stages | 2.7637 | 0.5981 | |
| TCGA RNAseq Data | Prevotella melaninogenica | Across Stages | 9.3195 | 0.0253 | |
| TCGA WGS Data | Prevotella melaninogenica | Across Stages | 2.5034 | 0.4747 | |
| NCI 16s Data | Campylobacter concisus | Across Stages | 4.8348 | 0.3047 | |
| TCGA RNAseq Data | Campylobacter concisus | Across Stages | 2.5152 | 0.4725 | |
| TCGA WGS Data | Campylobacter concisus | Across Stages | 1.0140 | 0.7979 | |
| NCI 16s Data | Streptococcus sanguinis | Across Stages | 5.9573 | 0.2024 | |
| TCGA RNAseq Data | Streptococcus sanguinis | Across Stages | 5.8170 | 0.1209 | |
| TCGA WGS Data | Streptococcus sanguinis | Across Stages | 2.9396 | 0.4010 | |

sessionInfo()
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] cowplot_1.1.1     dendextend_1.16.0 ggdendro_0.1.23   reshape2_1.4.4   
 [5] car_3.1-0         carData_3.0-5     gvlma_1.0.0.3     patchwork_1.1.1  
 [9] viridis_0.6.2     viridisLite_0.4.0 gridExtra_2.3     xtable_1.8-4     
[13] kableExtra_1.3.4  MASS_7.3-56       data.table_1.14.2 readxl_1.4.0     
[17] forcats_0.5.1     stringr_1.4.0     dplyr_1.0.9       purrr_0.3.4      
[21] readr_2.1.2       tidyr_1.2.0       tibble_3.1.7      ggplot2_3.3.6    
[25] tidyverse_1.3.2   lmerTest_3.1-3    lme4_1.1-30       Matrix_1.4-1     
[29] vegan_2.6-2       lattice_0.20-45   permute_0.9-7     phyloseq_1.40.0  
[33] workflowr_1.7.0  

loaded via a namespace (and not attached):
  [1] googledrive_2.0.0      minqa_1.2.4            colorspace_2.0-3      
  [4] ellipsis_0.3.2         rprojroot_2.0.3        XVector_0.36.0        
  [7] fs_1.5.2               rstudioapi_0.13        farver_2.1.1          
 [10] fansi_1.0.3            lubridate_1.8.0        xml2_1.3.3            
 [13] codetools_0.2-18       splines_4.2.0          cachem_1.0.6          
 [16] knitr_1.39             ade4_1.7-19            jsonlite_1.8.0        
 [19] nloptr_2.0.3           broom_1.0.0            cluster_2.1.3         
 [22] dbplyr_2.2.1           BiocManager_1.30.18    compiler_4.2.0        
 [25] httr_1.4.3             backports_1.4.1        assertthat_0.2.1      
 [28] fastmap_1.1.0          gargle_1.2.0           cli_3.3.0             
 [31] later_1.3.0            htmltools_0.5.2        tools_4.2.0           
 [34] igraph_1.3.4           gtable_0.3.0           glue_1.6.2            
 [37] GenomeInfoDbData_1.2.8 Rcpp_1.0.8.3           Biobase_2.56.0        
 [40] cellranger_1.1.0       jquerylib_0.1.4        vctrs_0.4.1           
 [43] Biostrings_2.64.0      rhdf5filters_1.8.0     multtest_2.52.0       
 [46] svglite_2.1.0          ape_5.6-2              nlme_3.1-157          
 [49] iterators_1.0.14       xfun_0.31              ps_1.7.0              
 [52] rvest_1.0.2            lifecycle_1.0.1        googlesheets4_1.0.0   
 [55] getPass_0.2-2          zlibbioc_1.42.0        scales_1.2.0          
 [58] hms_1.1.1              promises_1.2.0.1       parallel_4.2.0        
 [61] biomformat_1.24.0      rhdf5_2.40.0           yaml_2.3.5            
 [64] sass_0.4.2             stringi_1.7.6          highr_0.9             
 [67] S4Vectors_0.34.0       foreach_1.5.2          BiocGenerics_0.42.0   
 [70] boot_1.3-28            GenomeInfoDb_1.32.2    systemfonts_1.0.4     
 [73] rlang_1.0.2            pkgconfig_2.0.3        bitops_1.0-7          
 [76] evaluate_0.15          Rhdf5lib_1.18.2        processx_3.7.0        
 [79] tidyselect_1.1.2       plyr_1.8.7             magrittr_2.0.3        
 [82] R6_2.5.1               IRanges_2.30.0         generics_0.1.3        
 [85] DBI_1.1.3              withr_2.5.0            pillar_1.8.0          
 [88] haven_2.5.0            whisker_0.4            mgcv_1.8-40           
 [91] abind_1.4-5            survival_3.3-1         RCurl_1.98-1.8        
 [94] modelr_0.1.8           crayon_1.5.1           utf8_1.2.2            
 [97] tzdb_0.3.0             rmarkdown_2.14         grid_4.2.0            
[100] callr_3.7.1            git2r_0.30.1           webshot_0.5.3         
[103] reprex_2.0.1           digest_0.6.29          httpuv_1.6.5          
[106] numDeriv_2016.8-1.1    stats4_4.2.0           munsell_0.5.0         
[109] bslib_0.4.0