Protein-ligand binding prediction with machine learning models: current status

- May 28, 2019


Drug discovery is a long journey. Given the complexity of a new drug design project, a new drug can only be developed through highly organized cooperation among many different people.

Throughout this process, predicting the binding affinity between a target (generally a protein) and a small compound is useful before moving on to cell-based or animal experiments. The goal is to discover small molecules that bind tightly to a specific protein, and better binding affinity prediction helps us shortlist a set of promising molecules (lead-like compounds). Traditionally, binding affinity has been estimated with absolute binding free energy calculations, MM/GBSA, and scoring functions (as used in virtual screening and docking). A growing number of machine learning based methods have now been developed for this prediction (Table 1).
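Physics-based methods and scoring functions usually report a binding free energy, while many ML models predict pKd. The two scales are linked by the standard relation ΔG° = RT ln(Kd) = −RT ln(10)·pKd, assuming a 1 M standard state. The minimal Python sketch below does that conversion; the function names `pkd_to_dg` and `dg_to_pkd` are hypothetical helpers, and the default temperature of 298.15 K is an assumption.

```python
import math

# Gas constant in kcal/(mol*K)
R_KCAL = 1.987204e-3


def pkd_to_dg(pkd: float, temperature: float = 298.15) -> float:
    """Standard binding free energy in kcal/mol from pKd (1 M standard state assumed).

    dG = RT ln(Kd) = -RT ln(10) * pKd
    """
    return -R_KCAL * temperature * math.log(10) * pkd


def dg_to_pkd(dg: float, temperature: float = 298.15) -> float:
    """Inverse of pkd_to_dg: pKd from a binding free energy in kcal/mol."""
    return -dg / (R_KCAL * temperature * math.log(10))


# A 1 nM binder (pKd = 9) corresponds to about -12.3 kcal/mol at 25 C.
print(round(pkd_to_dg(9.0), 2))
```

At 298.15 K one pKd unit corresponds to roughly 1.36 kcal/mol. This factor matters when comparing errors reported on an energy scale with errors reported in pKd units.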

Table 1. Current ML-based binding affinity prediction models

SN	Model	Year	Training set (~11k complexes)	Test set (290 complexes)	RMSE (units of predicted quantity)	Pearson R	Predicted quantity
1	RF-Score	2010	PDBbind v2016	PDBbind v2016 core set	1.39	0.80	pKd
2	Kdeep	2018	PDBbind v2016	PDBbind v2016 core set	1.27	0.82	pKd
3	TopBP	2018	PDBbind v2016	PDBbind v2016 core set	1.65	0.86	Binding energy
4	Pafnucy	2018	PDBbind v2016	PDBbind v2016 core set	1.42	0.78	pKd
5	OnionNet	2019	PDBbind v2016	PDBbind v2016 core set	1.28	0.82	pKd
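The two metrics in Table 1 are the root-mean-square error (RMSE) and the Pearson correlation coefficient (R) between experimental and predicted values on the test set. Here is a minimal, dependency-free Python sketch of both; the input values in the usage lines are hypothetical and not taken from any of the models above.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between experimental and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired values")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical experimental vs. predicted pKd values
experimental = [6.0, 7.5, 4.2, 9.1]
predicted = [6.3, 7.0, 5.0, 8.5]
print(rmse(experimental, predicted), pearson_r(experimental, predicted))
```

RMSE carries the units of the predicted quantity, while R is unitless. An RMSE on an energy scale (as for TopBP) is therefore not directly comparable to an RMSE in pKd units without a conversion.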

Why predict binding affinity?

What are the traditional methods?

What can we do with machine learning methods?

What datasets are available?

Two main datasets can be used. One is BindingDB, a large public collection of measured binding affinities.
The other is the PDBbind database, which collects experimentally determined protein-ligand complex structures together with their measured binding affinities (Ki, Kd, or IC50).
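Before training, affinity labels such as Ki, Kd, or IC50 are usually converted to a logarithmic scale, pK = −log10(value in mol/L). Below is a minimal Python sketch of that step. It assumes labels written in a form like `Kd=49uM` or `IC50<10nM`; that format, the `parse_affinity` helper, and the supported unit list are all assumptions for illustration.

```python
import math
import re
from typing import Tuple

# Multipliers converting concentration units to mol/L (M)
UNIT_TO_MOLAR = {"mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12, "fM": 1e-15}

# Assumed label format: measurement type, relation, number, unit (e.g. "Ki=0.43nM")
LABEL_PATTERN = re.compile(
    r"^(Ki|Kd|IC50)\s*(<=|>=|~|<|>|=)\s*([0-9]*\.?[0-9]+)\s*(mM|uM|nM|pM|fM)$"
)


def parse_affinity(label: str) -> Tuple[str, str, float]:
    """Return (measurement type, relation, pK) for a label such as 'Kd=49uM'."""
    match = LABEL_PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized affinity label: {label!r}")
    kind, relation, value, unit = match.groups()
    molar = float(value) * UNIT_TO_MOLAR[unit]
    return kind, relation, -math.log10(molar)


for label in ["Kd=49uM", "Ki=0.43nM", "IC50<10nM"]:
    print(label, parse_affinity(label))
```

The relation symbol is kept because values such as `IC50<10nM` are only bounds. Note also that IC50 depends on assay conditions and is not strictly equivalent to Ki or Kd, so whether to mix the three label types is a modelling choice.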

