Say Goodbye to Manual Processing: Batch Molecular Docking with AutoDock Vina Using Python Scripts (with a PDB File Preprocessing Script) - 编程阁

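In drug screening research, molecular docking is a key step for evaluating how small molecules interact with a target protein. With hundreds or even thousands of candidate molecules, doing this by hand is slow and error-prone. This article shows how to drive the Open Babel and MGLTools toolchain from Python scripts to build a fully automated pipeline for AutoDock Vina, from PDB file preprocessing through batch docking. The goal is to greatly reduce the time spent on routine research work.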

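1. Environment Setup and Toolchain Integration

1.1 Installing the Core Tools

Automated docking relies on three core components working together: Open Babel, AutoDock Vina and MGLTools. They can be installed as follows: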

```bash
# Install Open Babel (conda recommended; conda-forge hosts the maintained package)
conda install -c conda-forge openbabel

# Download AutoDock Vina 1.1.2
wget http://vina.scripps.edu/download/autodock_vina_1_1_2_linux_x86.tgz
tar xzvf autodock_vina_1_1_2_linux_x86.tgz

# Download the MGLTools command-line distribution
wget https://ccsb.scripps.edu/mgltools/downloads/mgltools_x86_64Linux2_1.5.7.tar.gz
tar xzvf mgltools_x86_64Linux2_1.5.7.tar.gz
cd mgltools_x86_64Linux2_1.5.7
bash install.sh
```

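Tip: add the bin directories of these tools to your PATH so you do not have to type full paths on every call. For example, autodock_vina_1_1_2_linux_x86/bin contains vina, and mgltools_x86_64Linux2_1.5.7/bin contains pythonsh. Note that prepare_receptor4.py is not in bin. It lives under MGLToolsPckgs/AutoDockTools/Utilities24 inside the MGLTools directory.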
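
Before running the pipeline, it helps to confirm that the executables can actually be found. Below is a minimal sketch using the standard library's shutil.which. The tool names (obabel, vina, pythonsh) are the ones called later in this article, and check_tools is a hypothetical helper name.

```python
import shutil
from typing import Dict, Iterable, Optional

# Executables called by the pipeline; pythonsh ships in the MGLTools bin directory
REQUIRED_TOOLS = ("obabel", "vina", "pythonsh")


def check_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> Dict[str, Optional[str]]:
    """Map each tool name to its absolute path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}
```

Call check_tools() at the start of the pipeline script and stop early if any value is None.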

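1.2 Python Environment Dependencies

Create a dedicated conda environment for the project's dependencies. MGLTools runs its scripts with its own bundled Python 2 interpreter (pythonsh), so it does not need to be installed into this environment. However, obabel must still be on PATH after the environment is activated. One way to ensure this is to install Open Babel into this environment as well.

```bash
# Create and activate the environment
conda create -n vina_auto python=3.8
conda activate vina_auto
# Install the required libraries
pip install pandas numpy tqdm
```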

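2. Automating PDB File Preprocessing

2.1 Receptor Preparation

Receptor preparation means removing water molecules, adding hydrogens and converting the structure to PDBQT format. The following Python function wraps MGLTools' preparation script prepare_receptor4.py: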

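```python
import subprocess
from pathlib import Path

# Assumption: MGLTools was unpacked and installed here; adjust to your setup
MGLTOOLS = Path.home() / "mgltools_x86_64Linux2_1.5.7"
PREPARE_RECEPTOR4 = MGLTOOLS / "MGLToolsPckgs" / "AutoDockTools" / "Utilities24" / "prepare_receptor4.py"


def prepare_receptor(pdb_file: Path, output_dir: Path):
    """Convert a receptor PDB to PDBQT; returns the output path, or None on failure."""
    output_dir.mkdir(parents=True, exist_ok=True)
    out_file = output_dir / f"{pdb_file.stem}.pdbqt"
    # -A checkhydrogens adds hydrogens only if none are present;
    # the default cleanup (-U) already removes waters
    cmd = [str(MGLTOOLS / "bin" / "pythonsh"), str(PREPARE_RECEPTOR4),
           "-r", str(pdb_file), "-o", str(out_file), "-A", "checkhydrogens"]
    try:
        subprocess.run(cmd, check=True)
    except subprocess.CalledProcessError as e:
        print(f"Receptor preparation failed: {e}")
        return None
    print(f"Receptor file written: {out_file}")
    return out_file
```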
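
By default, prepare_receptor4.py already removes waters. Sometimes you need finer control before calling it, for example keeping only one chain of a multi-chain structure or dropping crystallization ligands and ions. Below is a minimal plain-Python sketch of that cleanup. It assumes standard fixed-width PDB columns (residue name in columns 18-20, chain ID in column 22). clean_pdb_lines and clean_pdb_file are hypothetical helpers, and the set of water residue names is an assumption that covers common conventions.

```python
from pathlib import Path
from typing import Iterable, List, Optional, Set

# Assumption: residue names commonly used for water in PDB files
WATER_NAMES = {"HOH", "WAT", "DOD"}


def clean_pdb_lines(lines: Iterable[str], keep_chains: Optional[Set[str]] = None,
                    keep_hetatm: bool = False) -> List[str]:
    """Drop waters, HETATM records (unless keep_hetatm) and atoms outside keep_chains."""
    kept = []
    for line in lines:
        record = line[:6].strip()
        if record in ("ANISOU", "CONECT"):
            continue  # not needed for docking, and may reference removed atoms
        if record in ("ATOM", "HETATM"):
            resname = line[17:20].strip()  # columns 18-20
            chain = line[21:22]            # column 22
            if resname in WATER_NAMES:
                continue
            if record == "HETATM" and not keep_hetatm:
                continue
            if keep_chains is not None and chain not in keep_chains:
                continue
        kept.append(line)
    return kept


def clean_pdb_file(src: Path, dst: Path, **kwargs) -> None:
    """Write a cleaned copy of src to dst (kwargs are passed to clean_pdb_lines)."""
    lines = src.read_text().splitlines(keepends=True)
    dst.write_text("".join(clean_pdb_lines(lines, **kwargs)))
```

For example, clean_pdb_file(Path("Mpro.pdb"), Path("Mpro_chainA.pdb"), keep_chains={"A"}) keeps only chain A. The cleaned file can then be passed to prepare_receptor.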

2.2 Batch Processing of Small-Molecule Ligands

For the many ligands in a compound library, use Open Babel for batch conversion:

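```python
import subprocess
from pathlib import Path


def batch_convert_ligands(input_dir: Path, output_dir: Path):
    """Convert every *.pdb ligand in input_dir to PDBQT with Open Babel."""
    output_dir.mkdir(parents=True, exist_ok=True)
    for pdb_file in sorted(input_dir.glob("*.pdb")):
        out_file = output_dir / f"{pdb_file.stem}.pdbqt"
        # -h adds hydrogens; Gasteiger partial charges are the usual choice for AutoDock
        cmd = ["obabel", str(pdb_file), "-O", str(out_file),
               "-h", "--partialcharge", "gasteiger"]
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"Conversion failed for {pdb_file.name}: {result.stderr.strip()}")
```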
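
Open Babel may exit successfully even when a molecule was not converted properly, which can leave an empty or incomplete file. A quick structural check of the outputs catches such files before docking. The sketch below assumes that a ligand PDBQT for flexible docking should contain atom records, a ROOT line and a TORSDOF line, which is what Open Babel's PDBQT writer normally emits. ligand_pdbqt_problems and find_bad_ligands are hypothetical helper names.

```python
from pathlib import Path
from typing import Dict, List


def ligand_pdbqt_problems(text: str) -> List[str]:
    """Return problems found in a ligand PDBQT text; an empty list means it looks usable."""
    lines = text.splitlines()
    problems = []
    if not any(line.startswith(("ATOM", "HETATM")) for line in lines):
        problems.append("no atoms")
    if not any(line.startswith("ROOT") for line in lines):
        problems.append("missing ROOT")
    if not any(line.startswith("TORSDOF") for line in lines):
        problems.append("missing TORSDOF")
    return problems


def find_bad_ligands(ligand_dir: Path) -> Dict[str, List[str]]:
    """Map file name -> problems for every *.pdbqt in ligand_dir that fails the checks."""
    bad = {}
    for pdbqt in sorted(ligand_dir.glob("*.pdbqt")):
        problems = ligand_pdbqt_problems(pdbqt.read_text())
        if problems:
            bad[pdbqt.name] = problems
    return bad
```

After conversion, run find_bad_ligands(Path("processed/ligands")), then inspect or remove the listed files before docking.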

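3. Automatic Docking Parameter Configuration

3.1 Computing the Docking Box

How the docking region is defined directly affects how accurate the results are. The following code computes the box center and size automatically from the receptor structure. Because the box encloses the entire protein, this is effectively blind docking: the search space is large, so each run is slower and needs a higher exhaustiveness to sample it adequately.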

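```python
import numpy as np
from pathlib import Path


def calculate_docking_box(pdbqt_file: Path, padding: float = 15.0) -> dict:
    """Box enclosing all receptor atoms (blind docking); padding (Å) is added to each size."""
    coords = []
    with open(pdbqt_file) as f:
        for line in f:
            if line.startswith(("ATOM", "HETATM")):
                # Fixed PDB/PDBQT columns: x = 31-38, y = 39-46, z = 47-54
                coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    if not coords:
        raise ValueError(f"No atoms found in {pdbqt_file}")
    coords = np.array(coords)
    lo, hi = coords.min(axis=0), coords.max(axis=0)
    center = (lo + hi) / 2  # bounding-box midpoint, so the box covers every atom
    size = hi - lo + padding
    box = {f"center_{a}": round(float(c), 3) for a, c in zip("xyz", center)}
    box.update({f"size_{a}": round(float(s), 3) for a, s in zip("xyz", size)})
    return box
```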
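
When the binding site is known, a tighter box is usually preferable to blind docking. For example, if the Mpro structure contains a co-crystallized inhibitor, the box can be centered on that reference ligand. The sketch below computes such a box from any PDB/PDBQT coordinates. The 5 Å margin per side is an illustrative assumption, and read_coords, box_around and box_volume are hypothetical helpers. Vina 1.1.2 prints a warning when the search space exceeds 27,000 Å³, and box_volume lets you check this in advance.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def read_coords(lines: Iterable[str]) -> List[Coord]:
    """Parse x, y, z from ATOM/HETATM records (fixed columns 31-38, 39-46, 47-54)."""
    coords = []
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


def box_around(coords: Sequence[Coord], margin: float = 5.0) -> Dict[str, float]:
    """Center on the bounding-box midpoint; each size = extent + 2 * margin."""
    if not coords:
        raise ValueError("no coordinates")
    box = {}
    for i, axis in enumerate("xyz"):
        values = [c[i] for c in coords]
        lo, hi = min(values), max(values)
        box[f"center_{axis}"] = round((lo + hi) / 2, 3)
        box[f"size_{axis}"] = round(hi - lo + 2 * margin, 3)
    return box


def box_volume(box: Dict[str, float]) -> float:
    """Search-space volume in cubic angstroms."""
    return box["size_x"] * box["size_y"] * box["size_z"]
```

For example, box_around(read_coords(Path("ligand_ref.pdb").read_text().splitlines())) returns a dict with the same keys as calculate_docking_box, so it can be passed directly to generate_config. The file name is hypothetical.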

3.2 Generating the Configuration File

Write the computed parameters to a Vina configuration file:

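```python
from pathlib import Path


def generate_config(box_params: dict, config_path: Path,
                    exhaustiveness: int = 32, cpu: int = 1):
    """Write box parameters and search settings to a Vina config file (key = value lines)."""
    with open(config_path, "w") as f:
        for key, value in box_params.items():
            f.write(f"{key} = {value}\n")
        f.write(f"exhaustiveness = {exhaustiveness}\n")
        # cpu = 1 stops parallel workers from each using every core; raise it for single runs
        f.write(f"cpu = {cpu}\n")
```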
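
Before launching hundreds of jobs, it is worth reading the config file back and confirming that all six box parameters are present. The sketch parses the key = value lines written above. It assumes that '#' starts a comment, and parse_vina_config and missing_box_keys are hypothetical helpers.

```python
from typing import Dict, List

# The six parameters that define the Vina search box
BOX_KEYS = ("center_x", "center_y", "center_z", "size_x", "size_y", "size_z")


def parse_vina_config(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines; assumes '#' starts a comment."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"Malformed config line: {raw!r}")
        params[key.strip()] = value.strip()
    return params


def missing_box_keys(params: Dict[str, str]) -> List[str]:
    """Return the box parameters absent from a parsed config."""
    return [key for key in BOX_KEYS if key not in params]
```

For a complete config, missing_box_keys(parse_vina_config(Path("config.txt").read_text())) should return an empty list.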

4. Batch Docking and Parallel Acceleration

4.1 Single Docking Function

Wrap the Vina docking command in a Python function:

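```python
import subprocess
from pathlib import Path


def run_vina_docking(receptor: Path, ligand: Path, config: Path, output: Path):
    """Run a single docking; raises subprocess.CalledProcessError if Vina fails."""
    cmd = ["vina", "--receptor", str(receptor), "--ligand", str(ligand),
           "--config", str(config), "--out", str(output)]
    # List form (no shell=True) keeps paths with spaces intact; output is captured
    # so Vina's progress text does not interleave with the batch progress bar
    subprocess.run(cmd, check=True, capture_output=True, text=True)
```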
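
One crashed or stuck job should not block a whole batch. The sketch below separates building the command from running it. The runner returns a status instead of raising and supports a timeout. build_vina_command and run_command are hypothetical helpers. The optional --log flag exists in Vina 1.1.2 but may not be available in other versions, and the timeout value is up to you.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional, Tuple


def build_vina_command(receptor: Path, ligand: Path, config: Path, output: Path,
                       log: Optional[Path] = None) -> List[str]:
    """Build the Vina argument list; each path stays a single argument."""
    cmd = ["vina", "--receptor", str(receptor), "--ligand", str(ligand),
           "--config", str(config), "--out", str(output)]
    if log is not None:
        cmd += ["--log", str(log)]  # Vina 1.1.2 option
    return cmd


def run_command(cmd: List[str], timeout: Optional[float] = None) -> Tuple[str, str]:
    """Run cmd; return ("ok" | "failed" | "timeout", stderr text) instead of raising."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "timeout", ""
    except FileNotFoundError as exc:  # executable not on PATH
        return "failed", str(exc)
    return ("ok" if proc.returncode == 0 else "failed"), proc.stderr


if __name__ == "__main__":
    # Demo: the current Python interpreter stands in for vina
    print(run_command([sys.executable, "-c", "pass"]))
```

In a batch, run_command(build_vina_command(...), timeout=3600) would mark a job that runs longer than an hour as "timeout" and let the rest of the batch continue.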

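4.2 Multiprocess Batch Processing

Use Python's multiprocessing module to speed up large-scale docking. Unless cpu is set, each Vina process tries to use every CPU it detects. Keep the number of worker processes multiplied by Vina's cpu setting at or below the number of available cores.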

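```python
import subprocess
from multiprocessing import Pool
from pathlib import Path

from tqdm import tqdm


def _dock_one(task):
    """Module-level worker (nested functions cannot be pickled); returns (ligand, error or None)."""
    receptor, ligand, config, output = task
    try:
        run_vina_docking(receptor, ligand, config, output)
        return ligand.name, None
    except subprocess.CalledProcessError as e:
        return ligand.name, (e.stderr or str(e)).strip()


def batch_docking(receptor: Path, ligand_dir: Path, config: Path,
                  output_dir: Path, processes: int = 4):
    """Dock every *.pdbqt ligand in parallel; returns a list of (ligand, error) failures."""
    output_dir.mkdir(parents=True, exist_ok=True)
    tasks = [(receptor, lig, config, output_dir / f"result_{lig.stem}.pdbqt")
             for lig in sorted(ligand_dir.glob("*.pdbqt"))]
    failures = []
    with Pool(processes) as pool:
        for name, error in tqdm(pool.imap_unordered(_dock_one, tasks), total=len(tasks)):
            if error is not None:
                failures.append((name, error))
    return failures
```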
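
Two practical helpers for long runs, sketched under stated assumptions. First, the number of worker processes should be chosen together with the per-job cpu setting, so that the workers do not oversubscribe the machine. Second, after an interrupted batch, it helps to list which ligands still have no non-empty result file. plan_workers and pending_ligands are hypothetical helpers, and the result_<name>.pdbqt naming follows batch_docking above.

```python
import os
from pathlib import Path
from typing import List, Optional


def plan_workers(total_cores: Optional[int] = None, cpu_per_job: int = 1) -> int:
    """Number of parallel Vina processes so that processes * cpu_per_job <= cores."""
    cores = total_cores or os.cpu_count() or 1
    return max(1, cores // cpu_per_job)


def pending_ligands(ligand_dir: Path, output_dir: Path) -> List[Path]:
    """Ligands whose result_<stem>.pdbqt is missing or empty in output_dir."""
    pending = []
    for ligand in sorted(ligand_dir.glob("*.pdbqt")):
        result = output_dir / f"result_{ligand.stem}.pdbqt"
        if not result.exists() or result.stat().st_size == 0:
            pending.append(ligand)
    return pending
```

For example, processes=plan_workers(cpu_per_job=1) matches the cpu = 1 setting in the config file. len(pending_ligands(...)) shows how much of an interrupted batch remains.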

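5. Result Analysis and Visualization

5.1 Parsing Docking Results

Extract the binding affinity scores (kcal/mol) from a docking result file: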

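```python
from pathlib import Path


def parse_docking_results(result_file: Path):
    """Return the affinities (kcal/mol) of all poses in a Vina output PDBQT file."""
    scores = []
    with open(result_file) as f:
        for line in f:
            # e.g. "REMARK VINA RESULT:    -7.5      0.000      0.000"
            if line.startswith("REMARK VINA RESULT"):
                scores.append(float(line.split()[3]))
    return scores
```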
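
parse_docking_results keeps only the affinity. Each REMARK VINA RESULT line also carries two RMSD values (lower and upper bound) relative to the best pose, which help show whether the top poses agree with each other. The minimal sketch below keeps all three fields. VinaPose and parse_vina_poses are hypothetical names.

```python
from typing import List, NamedTuple


class VinaPose(NamedTuple):
    affinity: float  # kcal/mol; more negative means stronger predicted binding
    rmsd_lb: float   # RMSD lower bound relative to the best pose
    rmsd_ub: float   # RMSD upper bound relative to the best pose


def parse_vina_poses(text: str) -> List[VinaPose]:
    """Parse every REMARK VINA RESULT line of a Vina output PDBQT text."""
    poses = []
    for line in text.splitlines():
        if line.startswith("REMARK VINA RESULT:"):
            fields = line.split()
            poses.append(VinaPose(float(fields[3]), float(fields[4]), float(fields[5])))
    return poses
```

Usage: parse_vina_poses(Path("results/result_xxx.pdbqt").read_text()), where the file name is a placeholder.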

5.2 Summarizing and Ranking Results

Process all docking results in one pass and produce a report ranked by best score:

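```python
from pathlib import Path

import pandas as pd


def generate_report(output_dir: Path) -> pd.DataFrame:
    """Summarize all result_*.pdbqt files, best (most negative) score first."""
    results = []
    for result_file in output_dir.glob("result_*.pdbqt"):
        ligand_name = result_file.stem[len("result_"):]  # strip only the prefix
        scores = parse_docking_results(result_file)
        if scores:
            results.append({
                "Ligand": ligand_name,
                "Best_Score": min(scores),  # most negative = strongest predicted binding
                "Average_Score": sum(scores) / len(scores),
            })
    if not results:  # avoid a KeyError when sorting an empty frame
        return pd.DataFrame(columns=["Ligand", "Best_Score", "Average_Score"])
    return pd.DataFrame(results).sort_values("Best_Score").reset_index(drop=True)
```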
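
Docking scores tend to grow with molecular size, so larger molecules can rank higher without binding more efficiently. A common normalization is ligand efficiency: the negative binding free energy divided by the number of heavy atoms. Using the Vina score as the free-energy estimate is an approximation. The sketch counts heavy atoms in the first (best) pose of a result PDBQT. It assumes that the AutoDock atom type is the last field of each atom line and that H, HD and HS are the hydrogen types. heavy_atom_count and ligand_efficiency are hypothetical helpers.

```python
from typing import FrozenSet

# Assumption: AutoDock atom types that denote hydrogens
HYDROGEN_TYPES: FrozenSet[str] = frozenset({"H", "HD", "HS"})


def heavy_atom_count(pdbqt_text: str) -> int:
    """Count non-hydrogen atoms in the first MODEL of a PDBQT text."""
    count = 0
    for line in pdbqt_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # stop after the first (best-scoring) pose
        if line.startswith(("ATOM", "HETATM")):
            atom_type = line.split()[-1]  # AutoDock atom type is the last field
            if atom_type not in HYDROGEN_TYPES:
                count += 1
    return count


def ligand_efficiency(best_score: float, n_heavy: int) -> float:
    """Ligand efficiency in kcal/mol per heavy atom: -score / heavy-atom count."""
    if n_heavy <= 0:
        raise ValueError("n_heavy must be positive")
    return -best_score / n_heavy
```

Each row of the report could then get ligand_efficiency(row.Best_Score, heavy_atom_count(Path(f"results/result_{row.Ligand}.pdbqt").read_text())) as an extra column.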

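6. Case Study: Screening Against a COVID-19 Target

Using the SARS-CoV-2 main protease (Mpro) as an example, the complete workflow is:

Prepare the receptor file: Mpro.pdb
Collect the candidate compound library: compounds/*.pdb
Run preprocessing and docking, assuming all of the functions above are defined in the same script: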

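```python
from pathlib import Path

# The __main__ guard is required by multiprocessing on spawn-based platforms (Windows, macOS)
if __name__ == "__main__":
    # Prepare the receptor
    prepare_receptor(Path("Mpro.pdb"), Path("processed"))
    # Convert the ligands in batch
    batch_convert_ligands(Path("compounds"), Path("processed/ligands"))
    # Compute the docking box and write the config file
    box_params = calculate_docking_box(Path("processed/Mpro.pdbqt"))
    generate_config(box_params, Path("config.txt"))
    # Run the batch docking
    batch_docking(
        receptor=Path("processed/Mpro.pdbqt"),
        ligand_dir=Path("processed/ligands"),
        config=Path("config.txt"),
        output_dir=Path("results"),
        processes=8,
    )
    # Generate the report
    report = generate_report(Path("results"))
    report.to_csv("docking_results.csv", index=False)
```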

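In real projects, this set of scripts has reduced manual work that traditionally took days to a few hours. Actual run time depends mainly on the size of the compound library, the box size, the exhaustiveness setting and the number of available cores. Adjusting the number of parallel processes lets the same pipeline make full use of whatever compute resources are available, for screens of any size.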