LightDock | 蛋白质-多肽对接 | peptide-protein docking

 

SOX9因为IDR domain太多,完整的结构基本无法预测,所以从多肽SLiM着手,分段预测,分而治之。

 

用的是LightDock这个对接工具,非常好用,速度也很快,50个多肽和1个蛋白的对接,时间可以控制在半天之内。

https://lightdock.org/tutorials/0.9.3/basics

 

最终的汇总PPT,Downloads/SOX9-B1 docking/SOX9-B1 interaction.pptx

 

关于BAF的PyMol操作

select histones, chain A+G+D+B+C+E+F+H
select histones, chain G+H
color grey, histones

select B1, chain M
color red, B1

select dna, resn DA+DT+DG+DC
color pink, dna

select R, chain R

  

关于选中B1的contact residues

color red, B1
# 在PyMol里,如何select某些链上没有被其他结构occupied的残基
select not_contacted, (chain M and polymer.protein) and not (byres (chain M and polymer.protein within 5 of (not chain M)))
color white, not_contacted

select contacted, (chain M and polymer.protein) and (byres (chain M and polymer.protein within 4 of (not chain M)))
color white, contacted
save B1.cryoEM.fasta, B1, format=fasta
save B1.contacted.fasta, contacted, format=fasta

# for repaired B1
color red, repaired_B1
select contacted, (repaired_B1 and polymer.protein) and (byres (repaired_B1 and polymer.protein within 3 of (not repaired_B1 and not B1)))
color white, contacted
save repaired_B1.contacted.fasta, contacted, format=fasta

  

做MSA找到其fasta序列

# online multiple sequence alignment
# goto: https://www.genome.jp/tools-bin/clustalw
# SLOW/ACCURATE
# Gap Open Penalty: 1

 

其他选择contact residues的尝试

color red, B1
select surface_residues, B1 and b > 8
select exposed_protein, surface_residues and polymer.protein
color white, exposed_protein

# 如何选择那些能够与配体binding的残基
select binding_residues, byres (chain M and polymer.protein within 5 of organic)
color blue, binding_residues

# 如何选择那些能够与潜在蛋白互作的残基
# 如何选择那些没有暴露出来,不可能与蛋白互作的残基
cmd.get_area("B1", load_b=1)
select buried_residues, B1 and polymer.protein and b < 10
color white, buried_residues

  

show surface
color red, alpha_B1
color green, B1

select d1, alpha_B1 and (resi 34-47 or resi 173-232 or resi 276-312 or resi 340-352 or resi 366-385)
color white, d1

save BAF_B1_contact.pse

  

把lightdock的swarm与B1 align【最终还是手动做的】

# -----------
# lightdock
# swarm
cd /Users/zhixinli/Partners HealthCare Dropbox/Zhixin Li/Downloads/SOX9-B1 docking
load swarm_centers_B1_1131.pdb, swarms
load B1_repaired_model_03.pdb, B1

show surface, B1
color red, B1

# swarm 不需要 align,直接可视化即可
show spheres, swarm
set sphere_scale, 0.5, swarm
color yellow, swarm

center B1
zoom all

# 获取 receptor 的中心
get_position

  


 

docking的最终准备,B1蛋白处理

## docking前的蛋白预处理

# PDB上的蛋白质,在docking之前,该做哪些预处理,我发现蛋白序列并不完整。
# 1. 检查和清理结构,如何删除水分子,去除配体、小分子和离子?
remove solvent
remove resn HOH
remove hetatm
remove resn SO4
remove resn CL
remove resn CA
remove resn MG
remove resn ZN
remove resn NA
remove resn K
save clean_B1.pdb

# 2. 处理不完整的氨基酸序列
# https://swissmodel.expasy.org/interactive

cd /Users/zhixinli/Partners HealthCare Dropbox/Zhixin Li/Downloads/SOX9-B1 docking
load clean_B1.pdb, clean_B1
load B1_repaired_model_03.pdb, repaired_B1
color red, repaired_B1
super clean_B1, repaired_B1
align clean_B1, repaired_B1
save raw_repaired_B1.align.pse

# 3. 加氢原子
# chimera your_protein.pdb
# Tools → Structure Editing → AddH
# File → Save PDB → 命名为 protein_with_H.pdb

# ----------------

cd ~/projects/SOX9_BAF/
module load localcolabfold/1.5.2 gcc/9.2.0

colabfold_batch SOX9_SLiM.filtered.fasta SOX9_SLiM_fast_5m --num-recycle 12 --num-models 5 --msa-mode single_sequence --use-gpu-relax --amber

zip -r SOX9.SLiM.pdb.5m.zip pdb_5m

# rank_001: 排名第一,即最优模型
# alphafold2_ptm_model_5: 使用的模型是 model_5(基于 pTM 评估)
# seed_000: 随机种子编号,影响结构采样但通常只在多次运行比较时用到

  


 

 

对接

# ----------------

conda create -n docking_py310 python=3.10
conda activate docking_py310
pip install lightdock

which lightdock3_setup.py
/home/zz950/softwares/miniconda3/envs/docking_py310/bin/lightdock3_setup.py

# GPT: lightdock3_setup.py 核心参数解析
# GPT: 我在PyMol里选中了部分残基,如何导出成restraints.txt文件?

# 限定可以docking的残基
# sublime处理重复
# repaired_B1.contacted.BAF.pse
iterate (contacted), print(f'M.{resn}.{resi}')

# 基本参数
# 设定对接点数量;
# 去掉氢和水分子;
# 指定残基限制;
# 初步探索结构空间
# lightdock3_setup.py B1_repaired_model_03.pdb $p -s 100 -g 200 --noh --now -r restraints.txt

# /home/zz950/softwares/miniconda3/envs/docking_py310/bin/lightdock3_setup.py B1_repaired_model_03.pdb pdb/ADTSGVPS_unrelaxed_rank_001_alphafold2_ptm_model_1_seed_000.pdb -s 100 -g 200 --noxt --noh --now -r restraints.txt

/home/zz950/softwares/miniconda3/envs/docking_py310/bin/lightdock3_setup.py B1_repaired_model_03.pdb pdb_5m/DVISNIETFDVNEFD_relaxed_rank_001_alphafold2_ptm_model_3_seed_000.pdb --noxt --noh --now -r restraints.txt

/home/zz950/softwares/miniconda3/envs/docking_py310/bin/lightdock3_setup.py B1_repaired_model_03.pdb $p --noxt --noh --now -r restraints.txt

# /home/zz950/softwares/miniconda3/envs/docking_py310/bin/lightdock3.py setup.json 100 -c 1 -l 0

# 正式 docking:50-200 步,根据配体大小和系统复杂度决定
# sipper or ddna
/home/zz950/softwares/miniconda3/envs/docking_py310/bin/lightdock3.py setup.json 20 -s sipper

  

后续处理Pipeline

for d in swarm_*; do
  for f in "$d"/gso_*.out; do
    awk '!/^#/ {print FILENAME "\t" $(NF) "\t" NR-1 "\t" $0}' "$f" >> all_scores.tmp
  done
done

# sort -k4 -g all_scores.tmp | head -1 > best_result.txt
# awk '{print $(NF), $0}' all_scores.tmp | sort -k1 -g | head -10 > best_result.txt
sort -k2 -g all_scores.tmp | head -10 > ${p}_best_result_top10.txt

/home/zz950/softwares/miniconda3/envs/docking_py310/bin/lgd_generate_conformations.py B1_repaired_model_03.pdb pdb_5m/DVISNIETFDVNEFD_relaxed_rank_001_alphafold2_ptm_model_3_seed_000.pdb swarm_1055/gso_20.out 42
/home/zz950/softwares/miniconda3/envs/docking_py310/bin/lgd_generate_conformations.py B1_repaired_model_03.pdb pdb_5m/DVISNIETFDVNEFD_relaxed_rank_001_alphafold2_ptm_model_3_seed_000.pdb swarm_1056/gso_10.out 77
/home/zz950/softwares/miniconda3/envs/docking_py310/bin/lgd_generate_conformations.py B1_repaired_model_03.pdb pdb_5m/DVISNIETFDVNEFD_relaxed_rank_001_alphafold2_ptm_model_3_seed_000.pdb swarm_494/gso_10.out 97

# clean for rerun
rm -r swarm_* setup.json init/ lightdock* pdb_5m/lightdock_* all_scores.tmp

  

这次学习高度依赖GPT,基本舍弃了Google

# 我如何根据这些结果来获取当前的docking可信度打分

# 这么多swarm结果,已经这么多gso输出,我如何选出最优的那个对比

# gso每一行是一个docking吗
# 我如何获取某一行最佳的那个打分的docking结构

# 为什我生成了1000个swarm文件夹
# 我想docking 多肽和蛋白,应该用哪个打分函数?
# 如何修改代码,输出每个swarm的最佳得分呢
# 为什么gso_0.out里的最优得分都是0?
# 提取或可视化这个结构的置信度(plddt)分布
# 可以直接在PyMol里可视化pLDDT吗
# 可以在PyMol中可视化LightDock生成的Swarms吗?
# 我有swarm的pdb文件,该怎么align,与蛋白一起可视化
# 能解释一下,在每一个swarm处的glowworms,打分函数是根据什么来打分的吗,输入是什么?
# 多肽对接蛋白,多肽一部分嵌入了蛋白内,这合理吗? 不合理!!!目前就sipper是合理的
# 但只有sipper 的对接,多肽没有嵌入蛋白,多肽附着在蛋白表面,为什么打分为0,有改进办法吗
# 默认的DEFAULT_CONTACT_RESTRAINTS_CUTOFF = 3.9,这适合我的情况吗
# alphafold生成的relaxed和unrelaxed,哪个模型好
# lightdock的打分有对不同的蛋白大小做normalization吗?不同大小的多肽对接得分可比吗?
# 帮我写一个按原子数和接触残基数归一化的脚本

  

选取最佳打分函数

# test scoring funciton
SCORES=(cpydock dfire2 mj3h pisa sd sipper tobi vdw)

LOGFILE="timing_stat.log"
echo -e "ScoringFunction\tSetupTime(s)\tSimulationTime(s)" > $LOGFILE

for SCORE in "${SCORES[@]}"
do
	# clean previous files
	rm lightdock* pdb_5m/lightdock_*

	DIR="dock_${SCORE}"
    mkdir -p "$DIR"
    cd "$DIR"

    /home/zz950/softwares/miniconda3/envs/docking_py310/bin/lightdock3_setup.py ../B1_repaired_model_03.pdb ../pdb_5m/DVISNIETFDVNEFD_relaxed_rank_001_alphafold2_ptm_model_3_seed_000.pdb --noxt --noh --now -r ../restraints.txt

	echo "[$SCORE] Running simulation..."
    start_sim=$(date +%s)
    /home/zz950/softwares/miniconda3/envs/docking_py310/bin/lightdock3.py setup.json 20 -s $SCORE
    end_sim=$(date +%s)

    sim_time=$((end_sim - start_sim))

    echo -e "$SCORE\t$setup_time\t$sim_time" | tee -a ../$LOGFILE

    for d in swarm_*; do
	  for f in "$d"/gso_*.out; do
	    awk '!/^#/ {print FILENAME "\t" $(NF) "\t" NR-1 "\t" $0}' "$f" >> all_scores.tmp
	  done
	done

	sort -k2 -g all_scores.tmp | head -1 > best_result_top10.txt

	read COL1 COL3 < <(awk 'NR==1 {print $1, $3}' best_result_top10.txt)

	/home/zz950/softwares/miniconda3/envs/docking_py310/bin/lgd_generate_conformations.py ../B1_repaired_model_03.pdb ../pdb_5m/DVISNIETFDVNEFD_relaxed_rank_001_alphafold2_ptm_model_3_seed_000.pdb $COL1 $COL3

	cp swarm*/lightdock_$((COL3 - 1)).pdb ../${SCORE}_best.pdb

    cd ..
done

 

-rw-r--r-- 1 zz950 zz950 176 Apr 30 15:41 dock_cpydock/best_result_top10.txt
-rw-r--r-- 1 zz950 zz950 176 Apr 30 16:53 dock_dfire2/best_result_top10.txt
-rw-r--r-- 1 zz950 zz950 169 Apr 30 17:24 dock_mj3h/best_result_top10.txt
-rw-r--r-- 1 zz950 zz950 167 Apr 30 17:47 dock_pisa/best_result_top10.txt
-rw-r--r-- 1 zz950 zz950 194 Apr 30 17:56 dock_sd/best_result_top10.txt
-rw-r--r-- 1 zz950 zz950 163 Apr 30 17:59 dock_sipper/best_result_top10.txt
-rw-r--r-- 1 zz950 zz950 168 Apr 30 18:38 dock_tobi/best_result_top10.txt
-rw-r--r-- 1 zz950 zz950 174 Apr 30 18:47 dock_vdw/best_result_top10.txt


swarm_1055/gso_10.out	-1057.80198153	42	(-31.3301535, -18.6203510, 16.5903817,  0.4594921, -0.3323501, -0.6703852, -0.4785333)    0    0 -1577.07855309  0 4.200 -1057.80198153
swarm_998/gso_10.out	-1041.64240828	144	(-21.2007974, -33.8906462, 21.0036514,  0.3935567,  0.5070643,  0.7040320, -0.3038713)    0    0 -1523.76322112  4 3.800 -1041.64240828
swarm_379/gso_10.out	-27.36000000	30	(-26.8635348, -12.6567589, 18.2163714, -0.1305292, -0.8561228, -0.0141236, -0.4998165)    0    0 -24.08566743  4 3.640 -27.36000000
swarm_157/gso_20.out	-0.83572176	111	(-2.6080655, 35.5076648,  5.7503667,  0.0168021,  0.1792593,  0.8345611, -0.5206645)    0    0  -0.62337494  5 4.840  -0.83572176
swarm_488/gso_10.out	-2599291755.02440310	129	(-38.8236518, -15.4687036, 15.6944477, -0.2331482, -0.5301978,  0.8057096,  0.1239529)    0    0 -2892909889.30625200  2 3.240 -2599291755.02440310
swarm_0/gso_0.out	0.00000000	100	(45.9092041,  8.6613108, -0.6781290,  0.3481973, -0.4696026,  0.8106289,  0.0333577)    0    0   5.00000000  0 0.200   0.00000000
swarm_604/gso_10.out	-30.26000000	79	(16.8878788, -16.1452200, -7.1088962,  0.8045971,  0.4575122, -0.1986185,  0.3222682)    0    0 -27.56280527 10 2.520 -30.26000000
swarm_1056/gso_10.out	-428.29799035	112	(-33.7805647, -18.5711154, 18.4883215,  0.1383592,  0.1644035, -0.8605261,  0.4618690)    0    0 -556.67418304  4 3.720 -428.29799035

  

不同打分函数耗时

ScoringFunction SetupTime(s)    SimulationTime(s)
dfire2          4031
mj3h            1767
pisa            1226
sd              472
sipper          64
tobi            2238
vdw             462
cpydock         482
ddna            1162

  

 

一些bug

sipper打分为0,需要修改DEFAULT_CONTACT_RESTRAINTS_CUTOFF参数

vi /home/zz950/softwares/miniconda3/envs/docking_py310/lib/python3.10/site-packages/lightdock/constants.py

 

代码bug

File "/home/zz950/softwares/miniconda3/envs/docking_py310/lib/python3.10/site-packages/lightdock/scoring/sipper/driver.py", line 164, in __call__
    DEFAULT_CONTACT_RESTRAINTS_CUTOFF
NameError: name 'DEFAULT_CONTACT_RESTRAINTS_CUTOFF' is not defined

lightdock/scoring/ddna/data/fort.21_xscore_noH_Met文件缺失

from lightdock.constants import DEFAULT_CONTACT_RESTRAINTS_CUTOFF

  

  

 

 

参考代码:

- /home/zz950/projects/BAF_SOX9/docking/lightdock

- /home/zhl595/projects/SOX9_BAF

 

posted @ 2025-05-02 03:00  Life·Intelligence  阅读(235)  评论(0)    收藏  举报
TOP