Stop Hand-Tuning Colors: Plot TBtools GO Enrichment Results in One Step with ggplot2 in R (Full Palette Code Included)

No More Manual Color Tuning: Professional GO Enrichment Visualization with ggplot2

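Every time I see a GO enrichment figure with clashing colors and overlapping labels, I am reminded of my own graduate-school figures being sent back by my advisor for a redo. Visualizing bioinformatics results, especially figures intended for a manuscript, tends to eat up a lot of time on fine adjustments. This post walks through a complete ggplot2 workflow in R that takes you from the TBtools text output to figures that are ready for publication.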

1. Data Preparation and Preprocessing

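Before plotting, the GO enrichment results exported by TBtools need some tidying. Assume you already have a tab-delimited file named "GO.Enrichment.final.txt" that contains the GO term name, class, p-value, gene counts, and related fields.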

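library(dplyr)

# Read the TBtools output file.
# quote = "" and comment.char = "" keep GO names that contain ' or # intact.
raw_data <- read.table("GO.Enrichment.final.txt", sep = "\t", header = TRUE,
                       stringsAsFactors = FALSE, check.names = FALSE,
                       quote = "", comment.char = "")

# Compute the key metrics
processed_data <- raw_data %>%
  mutate(
    GeneRatio = HitsGenesCountsInSelectedSet / AllGenesCountsInSelectedSet,
    # The background ratio uses the background hit count; this assumes the
    # file has a HitsGenesCountsInBackground column.
    BgRatio = HitsGenesCountsInBackground / AllGenesCountsInBackground,
    logP = -log10(`corrected p-value(BH method)`)
  ) %>%
  filter(`corrected p-value(BH method)` < 0.05)  # keep significant terms only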

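Tip: TBtools column names can contain special characters such as spaces and parentheses. Setting check.names = FALSE preserves the original names, which then have to be quoted with backticks in code.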
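To see what check.names actually changes, here is a small sketch on an inline two-column table (the values are made up for illustration). With the default check.names = TRUE, R passes the header through make.names(), so the adjusted p-value column ends up with a rewritten name.

```r
# Toy tab-delimited input mimicking two TBtools columns (illustrative values only)
txt <- "GO_Name\tcorrected p-value(BH method)\nATP binding\t0.001\n"

kept    <- read.table(text = txt, sep = "\t", header = TRUE,
                      check.names = FALSE, quote = "")
mangled <- read.table(text = txt, sep = "\t", header = TRUE,
                      check.names = TRUE, quote = "")

names(kept)     # original header preserved
names(mangled)  # header rewritten by make.names()

# Non-syntactic names need backticks or [[ ]] when accessed
kept$`corrected p-value(BH method)`
kept[["corrected p-value(BH method)"]]
```

Either spelling works as long as it is used consistently; the rest of this post assumes the original TBtools names are kept.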

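GO enrichment results are usually shown by biological process (BP), cellular component (CC), and molecular function (MF). The code below splits the data by class, sorts each subset by significance, and keeps the top terms: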

# Process each GO class separately
bp_data <- processed_data %>%
  filter(grepl("Biological process", Class)) %>%
  arrange(desc(logP)) %>%
  slice_head(n = 15)  # keep the 15 most significant terms per class

cc_data <- processed_data %>%
  filter(grepl("Cellular component", Class)) %>%
  arrange(desc(logP)) %>%
  slice_head(n = 15)

mf_data <- processed_data %>%
  filter(grepl("Molecular function", Class)) %>%
  arrange(desc(logP)) %>%
  slice_head(n = 15)

# Combine the processed subsets
plot_data <- bind_rows(bp_data, cc_data, mf_data) %>%
  mutate(
    GO_Name = factor(GO_Name, levels = unique(GO_Name)),
    Class = factor(Class, levels = c("Biological process",
                                     "Cellular component",
                                     "Molecular function"))
  )
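The three near-identical blocks above can be folded into one grouped operation. The sketch below is one possible way to do that; the function name prepare_go_data and its parameters p_cutoff and n_top are hypothetical (section 5 calls a helper by that name without defining it), and the toy data frame uses made-up values with the column names assumed earlier.

```r
library(dplyr)

# Hypothetical helper: filter significant terms, then keep the top n per class
prepare_go_data <- function(raw, p_cutoff = 0.05, n_top = 15) {
  raw %>%
    mutate(
      GeneRatio = HitsGenesCountsInSelectedSet / AllGenesCountsInSelectedSet,
      logP = -log10(`corrected p-value(BH method)`)
    ) %>%
    filter(`corrected p-value(BH method)` < p_cutoff) %>%
    group_by(Class) %>%
    slice_max(logP, n = n_top, with_ties = FALSE) %>%
    ungroup()
}

# Toy input (made-up values) with the assumed TBtools column names
toy <- data.frame(
  GO_Name = c("term A", "term B", "term C", "term D"),
  Class = c("Biological process", "Biological process",
            "Molecular function", "Molecular function"),
  HitsGenesCountsInSelectedSet = c(10, 5, 8, 2),
  AllGenesCountsInSelectedSet = 100,
  `corrected p-value(BH method)` = c(1e-6, 1e-3, 1e-4, 0.2),
  check.names = FALSE
)

top_terms <- prepare_go_data(toy, n_top = 1)
top_terms[, c("GO_Name", "Class", "logP")]
```

This version groups on the exact Class value rather than matching with grepl(), so it assumes the class labels are already clean. Converting Class to a factor with a fixed level order is left to the caller.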

2. Designing a Professional Color Scheme

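Color choice in scientific visualization affects not only how a figure looks but also how accurately it conveys information. The RColorBrewer package, used alongside ggplot2, provides a range of well-tested palettes.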

2.1 Choosing a Suitable Palette

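RColorBrewer offers three main families of palettes:

Sequential: for ordered data that runs from low to high
Qualitative: for telling distinct categories apart
Diverging: for data that spreads out in both directions from a meaningful midpoint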

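The GO class (BP, CC, MF) is a categorical variable, so a qualitative palette is the right choice for it. The table below compares several qualitative palettes that suit scientific publications: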

Palette	Typical use	Max colors	Print-friendly
Set2	Moderate contrast	8	Yes
Paired	High contrast	12	Yes
Dark2	Dark tones	8	Yes
Accent	Accent colors	8	Partly
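The color counts in the table can be checked directly against the package metadata. This is a small sketch using brewer.pal.info, the data frame RColorBrewer ships with; it also exposes a colorblind flag that may be worth consulting alongside print-friendliness.

```r
library(RColorBrewer)

# Keep only the qualitative palettes
qual <- brewer.pal.info[brewer.pal.info$category == "qual", ]

# Maximum number of colors and colorblind-safety flag for the four palettes above
qual[c("Set2", "Paired", "Dark2", "Accent"), c("maxcolors", "colorblind")]
```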

library(RColorBrewer)

# Preview the available qualitative palettes
display.brewer.all(type = "qual")

# Assign one color to each of the three GO classes
go_colors <- brewer.pal(3, "Dark2")
names(go_colors) <- levels(plot_data$Class)

2.2 Adaptive Palettes

To let the palette adapt automatically to the number of GO classes, we can write a flexible palette function:

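get_go_palette <- function(n) {
  if (n <= 8) {
    # brewer.pal() needs n >= 3, so request at least 3 colors and trim
    brewer.pal(max(n, 3), "Set2")[seq_len(n)]
  } else if (n <= 12) {
    brewer.pal(n, "Paired")
  } else {
    # Beyond 12 classes, interpolate between the Accent colors
    colorRampPalette(brewer.pal(8, "Accent"))(n)
  }
}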
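For scale_fill_manual() to assign colors predictably, the color vector should be named after the class labels. The sketch below shows one way to build such a vector; named_class_palette is a hypothetical helper, limited to the 8 colors of Set2 for simplicity.

```r
library(RColorBrewer)

# Hypothetical helper: one Set2 color per class, named by class label
named_class_palette <- function(classes) {
  classes <- unique(as.character(classes))
  n <- length(classes)
  stopifnot(n >= 1, n <= 8)                          # Set2 has at most 8 colors
  cols <- brewer.pal(max(n, 3), "Set2")[seq_len(n)]  # brewer.pal() needs n >= 3
  setNames(cols, classes)
}

pal <- named_class_palette(c("Biological process", "Molecular function"))
pal
```

Because the vector is named, a dataset that happens to lack one class (for example no significant CC terms) still maps the remaining classes to the same colors across figures, provided the same full class list is passed each time.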

3. Publication-Quality Bar Charts

A bar chart is one of the most direct ways to show GO enrichment results. Below we build a customizable plotting function.

3.1 A Basic Bar Chart

library(ggplot2)

basic_go_barplot <- function(data, title = "") {
  ggplot(data, aes(x = reorder(GO_Name, logP), y = logP, fill = Class)) +
    geom_col(width = 0.7) +
    scale_fill_manual(values = go_colors) +
    labs(title = title, x = "GO Terms",
         y = expression(-log[10]("Adjusted p-value"))) +
    coord_flip() +  # horizontal bars leave room for long labels
    theme_minimal()
}

# Draw the chart
basic_plot <- basic_go_barplot(plot_data, "GO Enrichment Analysis")
print(basic_plot)

3.2 Advanced Customization and Layout

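The first draft of a chart usually needs further adjustment to meet publication standards. Common points to tune:

Overlapping labels: fix by adjusting text angle, size, and spacing
Legend position: pick the spot that best fits the figure layout
Axis ticks: tune tick density and label format
Grid lines: adjust them so they aid reading without distracting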
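Long GO term names are a frequent source of the label overlap mentioned above. One option, sketched here in base R, is to wrap each name onto several lines before it reaches the axis; wrap_go_labels is a hypothetical helper and the width of 25 is only for demonstration.

```r
# Hypothetical helper: wrap each label onto multiple lines at word boundaries
wrap_go_labels <- function(labels, width = 40) {
  vapply(
    as.character(labels),
    function(x) paste(strwrap(x, width = width), collapse = "\n"),
    character(1),
    USE.NAMES = FALSE
  )
}

wrapped <- wrap_go_labels(
  c("ATP binding", "regulation of transcription by RNA polymerase II"),
  width = 25
)
cat(wrapped, sep = "\n---\n")
```

In a ggplot2 chart with GO_Name on the discrete x axis, the function can be passed as scale_x_discrete(labels = wrap_go_labels), since the labels argument accepts a function of the break values.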

enhanced_go_barplot <- function(data, title = "", font_size = 12) {
  ggplot(data, aes(x = reorder(GO_Name, logP), y = logP, fill = Class)) +
    # linewidth replaces size for bar outlines in ggplot2 >= 3.4.0
    geom_col(width = 0.7, color = "black", linewidth = 0.3) +
    scale_fill_manual(values = go_colors) +
    labs(title = title, x = NULL,
         y = expression(-log[10]("Adjusted p-value"))) +
    coord_flip() +
    theme(
      text = element_text(family = "Arial", size = font_size),
      axis.text.y = element_text(color = "black", size = font_size - 1),
      axis.text.x = element_text(color = "black", size = font_size - 1),
      axis.title.x = element_text(size = font_size + 1, margin = margin(t = 10)),
      legend.position = "top",
      legend.title = element_blank(),
      legend.text = element_text(size = font_size),
      panel.grid.major.y = element_blank(),
      panel.grid.minor.y = element_blank(),
      panel.grid.major.x = element_line(color = "gray90"),
      panel.grid.minor.x = element_blank(),
      plot.title = element_text(hjust = 0.5, size = font_size + 2)
    ) +
    scale_y_continuous(expand = expansion(mult = c(0, 0.05)))  # no gap at the bar base
}

# Export as a vector PDF. cairo_pdf() is used because the plain pdf() device
# cannot render "Arial" unless the font is registered; it needs cairo support.
cairo_pdf("GO_Enrichment_Barplot.pdf", width = 10, height = 8)
print(enhanced_go_barplot(plot_data, "GO Enrichment Analysis", 14))
dev.off()

4. Information-Rich Bubble Plots

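A bubble plot can show three dimensions at once: enrichment significance (adjusted p-value) as color, gene ratio on the x axis, and the number of genes involved as point size.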

4.1 A Basic Bubble Plot

basic_go_bubble <- function(data, title = "") {
  ggplot(data, aes(x = GeneRatio, y = reorder(GO_Name, logP))) +
    geom_point(aes(size = HitsGenesCountsInSelectedSet, color = logP), alpha = 0.8) +
    scale_color_gradientn(colors = brewer.pal(9, "YlOrRd"),
                          name = expression(-log[10]("p.adj"))) +
    scale_size_continuous(range = c(3, 10), name = "Gene Count") +
    facet_grid(Class ~ ., scales = "free_y", space = "free_y") +
    labs(title = title, x = "Gene Ratio", y = "GO Terms") +
    theme_minimal()
}

# Draw the chart
basic_bubble <- basic_go_bubble(plot_data, "GO Enrichment Bubble Plot")
print(basic_bubble)

4.2 Advanced Bubble Plot Customization

A publication-quality bubble plot needs attention to more details:

enhanced_go_bubble <- function(data, title = "", font_size = 12) {
  ggplot(data, aes(x = GeneRatio, y = reorder(GO_Name, logP))) +
    geom_point(aes(size = HitsGenesCountsInSelectedSet, color = logP), alpha = 0.8) +
    scale_color_gradientn(
      colors = brewer.pal(9, "YlOrRd"),
      name = expression(-log[10]("p.adj")),
      breaks = scales::pretty_breaks(n = 5)  # pretty_breaks() comes from the scales package
    ) +
    scale_size_continuous(
      range = c(3, 10),
      name = "Gene Count",
      breaks = scales::pretty_breaks(n = 5)
    ) +
    scale_x_continuous(expand = expansion(mult = c(0.05, 0.15))) +
    facet_grid(Class ~ ., scales = "free_y", space = "free_y") +
    labs(title = title, x = "Gene Ratio", y = "GO Terms") +
    theme(
      text = element_text(family = "Arial", size = font_size),
      axis.text = element_text(color = "black"),
      axis.title = element_text(size = font_size + 1),
      legend.title = element_text(size = font_size),
      legend.text = element_text(size = font_size - 1),
      strip.text = element_text(size = font_size + 1, face = "bold"),
      panel.grid.major = element_line(color = "gray90"),
      panel.grid.minor = element_blank(),
      panel.spacing = unit(1, "lines"),
      plot.title = element_text(hjust = 0.5, size = font_size + 2)
    )
}

# Export a high-resolution PNG (2800 x 2000 px at 300 dpi)
png("GO_Enrichment_Bubble.png", width = 2800, height = 2000, res = 300)
print(enhanced_go_bubble(plot_data, "GO Enrichment Analysis", 12))
dev.off()
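As an alternative to opening and closing a device by hand, ggplot2's ggsave() takes the physical size and resolution directly. The sketch below converts the 2800 x 2000 pixel canvas used above into inches at 300 dpi; the toy scatter plot on the built-in mtcars data merely stands in for the bubble plot, and the output goes to a temporary directory.

```r
library(ggplot2)

# Toy plot standing in for the bubble plot (built-in mtcars data)
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

dpi <- 300
width_in  <- 2800 / dpi  # same pixel width as the png() call, expressed in inches
height_in <- 2000 / dpi

out_file <- file.path(tempdir(), "example_bubble.png")
ggsave(out_file, plot = p, width = width_in, height = height_in,
       units = "in", dpi = dpi)
file.exists(out_file)
```

Specifying the size in inches keeps font sizes consistent if the same figure is later exported at a different resolution.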

5. Automation and Batch Processing

When several GO enrichment results need to be processed, a script that automates the steps saves a lot of time.

5.1 A Plotting Function Factory

make_go_plotter <- function(color_palette = "Set2", font_family = "Arial",
                            output_dir = "GO_Plots") {
  # Note: color_palette and font_family are accepted but not yet forwarded;
  # the plotting functions use go_colors and their own font settings.
  if (!dir.exists(output_dir)) {
    dir.create(output_dir, recursive = TRUE)
  }

  function(data, analysis_name, plot_type = "both") {
    # prepare_go_data() is not defined in this post; it is assumed to wrap
    # the filtering and top-term selection from section 1.
    plot_data <- prepare_go_data(data)

    # Output paths
    bar_path <- file.path(output_dir, paste0(analysis_name, "_barplot.pdf"))
    bubble_path <- file.path(output_dir, paste0(analysis_name, "_bubble.png"))  # matches the png() device

    # Draw the charts
    if (plot_type %in% c("bar", "both")) {
      # cairo_pdf() can render system fonts such as Arial; plain pdf() cannot
      cairo_pdf(bar_path, width = 10, height = 8)
      print(enhanced_go_barplot(plot_data, analysis_name))
      dev.off()
    }
    if (plot_type %in% c("bubble", "both")) {
      png(bubble_path, width = 2800, height = 2000, res = 300)
      print(enhanced_go_bubble(plot_data, analysis_name))
      dev.off()
    }
  }
}

# Usage example
my_plotter <- make_go_plotter("Dark2", "Helvetica")
my_plotter(raw_data, "WGCNA_Module_Red")

5.2 A Batch Script for Multiple Analyses

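process_multiple_go_analyses <- function(file_pattern = "^GO_Enrichment_.*\\.txt$",
                                         output_dir = "GO_Plots") {
  # list.files() treats pattern as a regular expression, not a shell glob;
  # a glob such as "GO_Enrichment_*.txt" would miss names like
  # GO_Enrichment_ModuleRed.txt.
  go_files <- list.files(pattern = file_pattern)

  # Set up the plotter
  plotter <- make_go_plotter(output_dir = output_dir)

  # Process each file
  for (file in go_files) {
    # Strip the prefix and the extension to get the analysis name
    analysis_name <- gsub("^GO_Enrichment_|\\.txt$", "", file)
    data <- read.table(file, sep = "\t", header = TRUE, check.names = FALSE,
                       stringsAsFactors = FALSE, quote = "", comment.char = "")
    plotter(data, analysis_name)
  }
}

# Run the batch
process_multiple_go_analyses()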
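File matching in list.files() is based on regular expressions, so a shell-style glob does not behave as it would on the command line. The sketch below compares the two on a few made-up file names; glob2rx() from the utils package translates a glob into the equivalent anchored regular expression.

```r
# Made-up file names for illustration
files <- c("GO_Enrichment_ModuleRed.txt",
           "GO_Enrichment_ModuleBlue.txt",
           "README.txt")

glob_as_regex <- "GO_Enrichment_*.txt"           # glob used directly as a regex
proper_regex  <- glob2rx("GO_Enrichment_*.txt")  # glob translated to a regex

files[grepl(glob_as_regex, files)]  # the glob, read as a regex, misses the result files
files[grepl(proper_regex, files)]   # the translated pattern finds both of them
```

Read as a regular expression, `_*` means "zero or more underscores" and `.` means "any single character", which is why the untranslated glob fails to match file names with a longer suffix.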

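In real projects, I have found that keeping the plot parameters in one place saves time and keeps figures consistent. Below is the settings list I commonly use, stored in a separate R script: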

# File: plot_settings.R
GO_PLOT_SETTINGS <- list(
  font = list(
    family = "Arial",
    size = 12,
    title_size = 14
  ),
  color = list(
    palette = "Set2",
    bubble_gradient = "YlOrRd"
  ),
  size = list(
    bubble_min = 3,
    bubble_max = 10,
    bar_width = 0.7
  ),
  output = list(
    dpi = 300,
    width = 10,
    height = 8,
    units = "in"
  )
)
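A nested settings list like this is easy to adapt per project without editing the shared file. The sketch below uses modifyList() from the utils package, which merges nested lists recursively; the trimmed defaults list and the override values are illustrative assumptions, not part of the original configuration.

```r
# Trimmed copy of the defaults, for illustration
defaults <- list(
  font = list(family = "Arial", size = 12, title_size = 14),
  output = list(dpi = 300, width = 10, height = 8, units = "in")
)

# Per-project overrides: only the values that differ
overrides <- list(font = list(size = 10), output = list(width = 7))

# Nested entries are merged, so untouched values keep their defaults
settings <- modifyList(defaults, overrides)
str(settings)
```

The plotting functions can then read values such as settings$font$size instead of hard-coding them.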