Protein-ligand binding prediction with machine learning models: current status
Drug discovery is a long journey. A new drug design project is complex enough that its goal can only be reached through well-organized cooperation among many people.
Within this process, predicting the binding affinity between a target (usually a protein) and a small compound is useful before moving on to cell-model or animal-model experiments. The aim is to find small molecules that bind tightly to a specific protein. Better binding affinity prediction helps us shortlist a set of promising molecules (lead-like compounds). Traditionally, binding affinity has been estimated with absolute binding free energy calculations, MM/GBSA, and scoring functions (in virtual screening and docking). More and more machine-learning-based methods are now being developed for this prediction (Table 1).
Table 1. Current ML-based binding affinity prediction models
| SN | Model | Year | Training set (~11k) | Test set (290) | RMSE | R | Predicted quantity |
|----|-------|------|---------------------|----------------|------|---|--------------------|
| 1 | RF-score | 2010 | PDBBind v2016 | V2016 coreset | 1.39 | 0.8 | pKd |
| 2 | Kdeep | 2018 | PDBBind v2016 | V2016 coreset | 1.27 | 0.82 | pKd |
| 3 | TopBP | 2018 | PDBBind v2016 | V2016 coreset | 1.65 | 0.86 | Energy |
| 4 | Pafnucy | 2018 | PDBBind v2016 | V2016 coreset | 1.42 | 0.78 | pKd |
| 5 | OnionNet | 2019 | PDBBind v2016 | V2016 coreset | 1.28 | 0.82 | pKd |
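The RMSE and R columns in Table 1 compare predicted values against experimental ones on the test set. As a rough sketch, assuming R denotes the Pearson correlation coefficient (the table does not say so explicitly), the two metrics could be computed as below. The function names and the sample values are made up for illustration and are not taken from any of the listed models.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between experimental and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient (assumed meaning of the R column)."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_t * var_p)


# Hypothetical pKd values, for illustration only (not real benchmark data).
experimental = [4.0, 5.0, 6.0, 7.0]
predicted = [4.5, 4.5, 6.5, 6.5]

print(f"RMSE = {rmse(experimental, predicted):.3f}")
print(f"R    = {pearson_r(experimental, predicted):.3f}")
```

A lower RMSE and a higher R both indicate better agreement, but they measure different things: a model can rank compounds well (high R) and still be offset in absolute value (high RMSE).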
Why do binding affinity prediction?
What are the traditional methods?
What could we do with machine-learning methods?
What are the datasets?
Two main datasets can be used. One is BindingDB, a large public collection of measured binding affinity data. The other is the PDBBind database, which collects experimentally determined protein-ligand complex structures together with their binding affinity data (Ki, Kd, or IC50).
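Because these databases report affinities as Ki, Kd, or IC50 while most models in Table 1 predict pKd, the labels usually have to be put on a common scale first. The sketch below assumes the raw values are given in nanomolar and uses the standard definitions pK = -log10(K in mol/L) and ΔG = RT ln(Kd); the function names and the default temperature are illustrative choices, not part of either database.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def to_pk(value_nm):
    """Convert an affinity in nanomolar (Ki, Kd, or IC50) to its pK value."""
    if value_nm <= 0:
        raise ValueError("affinity must be positive")
    return -math.log10(value_nm * 1e-9)  # nM -> mol/L, then -log10


def pk_to_delta_g(pk, temperature_k=298.15):
    """Binding free energy in kcal/mol from pKd via dG = RT ln(Kd).

    Strictly valid for Kd (and Ki); IC50 depends on assay conditions,
    so treat the result as approximate for IC50-derived values.
    """
    return -GAS_CONSTANT_KCAL * temperature_k * math.log(10) * pk


# Illustrative values only: 1 nM and 1 uM binders.
for affinity_nm in (1.0, 1000.0):
    pk = to_pk(affinity_nm)
    print(f"{affinity_nm:8.1f} nM -> pK = {pk:.2f}, "
          f"dG = {pk_to_delta_g(pk):.2f} kcal/mol")
```

At room temperature one pK unit corresponds to roughly 1.36 kcal/mol, which is useful to keep in mind when comparing the pKd rows of Table 1 with the row that reports an energy.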