Posts

Protein-ligand binding prediction with machine learning models: current status

Drug discovery is a long journey. Given the complexity of a new drug design project, the goal of developing a new drug can only be achieved through highly organized cooperation between many people. In this process, predicting the binding affinity between a target (generally a protein) and a small compound is useful before moving on to cell-based or animal experiments. We hope to discover small molecules that bind tightly to a specific protein, and better binding affinity prediction could help us shortlist a set of promising molecules (lead-like compounds). Traditionally, binding affinity has been predicted with absolute binding free energy calculations, MM-GBSA, and scoring functions (in virtual screening and docking). More and more machine learning based methods have been developed to perform this prediction (Table 1). Table 1. Current ML-based binding affinity prediction models ...

A step-by-step tutorial to perform PCA with Gromacs MD trajectory

It is common practice to perform principal component analysis (PCA) to explore the transitions and dynamics in macromolecular simulations. There may be multiple states in the free energy landscape, so it is important to extract representative structures from the minima of the free energy surface. To generate such a surface, we could define some collective variables (CVs). However, because of the high dimensionality of a simulation trajectory, it is not always straightforward to select the few most important CVs. So the question is: how can we find one or two CVs that describe the slow motions of the system? PCA-based dimensionality reduction and projection can partially solve this problem by transforming the original dataset into components that capture the largest variance of the system. Although Gromacs already has tools that perform PCA using g_covar and g_anaeig...
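Roughly what g_covar and g_anaeig compute (a covariance matrix, its eigenvectors, and projections onto them), followed by turning two projections into a free energy surface, can be sketched in a few lines of NumPy. This is a minimal sketch under my own assumptions, not the Gromacs implementation: the coordinates are assumed to be already fitted to a reference structure (e.g. with trjconv -fit rot+trans) and loaded as an array of shape (n_frames, n_atoms, 3), no mass weighting is applied, and the function names, bin count and kT value (about 300 K, in kJ/mol) are hypothetical choices.

```python
import numpy as np


def pca_project(coords, n_components=2):
    """Cartesian PCA of an MD trajectory.

    coords: array of shape (n_frames, n_atoms, 3), assumed already fitted to a
    reference structure so that overall rotation/translation is removed.
    Returns (projections of shape (n_frames, n_components), eigenvalues in
    descending order).
    """
    X = coords.reshape(coords.shape[0], -1)   # flatten to (n_frames, 3N)
    X = X - X.mean(axis=0)                    # deviations from the average structure
    cov = np.cov(X, rowvar=False)             # (3N, 3N) covariance matrix
    evals, evecs = np.linalg.eigh(cov)        # symmetric matrix -> eigh, ascending order
    order = np.argsort(evals)[::-1]           # largest variance first
    evals, evecs = evals[order], evecs[:, order]
    proj = X @ evecs[:, :n_components]        # project every frame onto the top PCs
    return proj, evals


def free_energy_surface(pc1, pc2, bins=30, kT=2.494):
    """F = -kT ln P on a 2D histogram of two PCs (kT in kJ/mol, about 300 K).

    Empty bins get +inf; the lowest free energy is shifted to 0.
    """
    P, xedges, yedges = np.histogram2d(pc1, pc2, bins=bins, density=True)
    with np.errstate(divide="ignore"):        # log(0) in empty bins -> -inf
        F = -kT * np.log(P)
    F -= F[np.isfinite(F)].min()
    return F, xedges, yedges


# Usage with synthetic data: small noise plus one large collective motion.
rng = np.random.default_rng(0)
coords = rng.normal(scale=0.1, size=(500, 10, 3))
coords[:, 0, 0] += rng.normal(size=500)
proj, evals = pca_project(coords)
F, xedges, yedges = free_energy_surface(proj[:, 0], proj[:, 1])
```

The minima of F are the regions from which representative structures would be extracted; the frames belonging to a given bin can be found by looking up their coordinates in proj.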

The inconsistency in MD simulation regarding 1-4 interactions

The problem of conflicting factors
In MD simulations, Amber force fields use scaling factors of 0.5 and 0.8333 for the 1-4 van der Waals and Coulombic interactions, respectively. In GLYCAM-based force fields, however, both scaling factors are set to 1.0. Suppose we need to simulate a protein with an Amber force field and an LPS lipid with a GLYCAM-based force field in Gromacs: this is where the problem arises. In Gromacs, these two factors (fudgeLJ and fudgeQQ) are set in the "[ defaults ]" section, which applies to the whole simulation system, so the generated 1-4 interactions cannot be correct for both the protein and the LPS molecules at the same time. How to solve the problem?
1. Change the protein force field to OPLS and use an OPLS force field for LPS as well, or choose another force field.
2. Manually define the 1-4 interactions. Sadly, this only works for vdW interactions: we could explicitly add parameters for the 1-4 LJ interactions in the "[ pairs ]" section. ...
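To make option 2 more concrete, here is a minimal Python sketch that writes a "[ pairs ]" block with explicit 1-4 LJ parameters, assuming combination rule 2 (sigma/epsilon with Lorentz-Berthelot mixing), as used by the Amber ports. As far as I understand the GROMACS topology rules, parameters written explicitly on a pair line are used as given, while fudgeLJ only scales pairs generated with gen-pairs = yes; fudgeQQ still scales the 1-4 electrostatics, which is why this trick only covers vdW. The function names, atom-type names and sigma/epsilon values below are hypothetical placeholders; the real ones have to come from the LPS topology.

```python
import math

# Hypothetical atom types (sigma in nm, epsilon in kJ/mol), for illustration only;
# use the values from the [ atomtypes ] section of the real LPS topology.
ATOMTYPES = {
    "CX": (0.34, 0.40),
    "OX": (0.30, 0.90),
}


def pair_params(type_i, type_j, atomtypes, scale_lj=1.0):
    """1-4 LJ parameters from Lorentz-Berthelot mixing (comb-rule 2).

    scale_lj=1.0 keeps the full GLYCAM-style 1-4 LJ; 0.5 would mimic Amber.
    """
    sigma_i, eps_i = atomtypes[type_i]
    sigma_j, eps_j = atomtypes[type_j]
    sigma = 0.5 * (sigma_i + sigma_j)              # arithmetic mean of sigmas
    epsilon = scale_lj * math.sqrt(eps_i * eps_j)  # geometric mean of epsilons
    return sigma, epsilon


def pairs_section(pairs, atom_types, atomtypes, scale_lj=1.0):
    """Build a [ pairs ] block with explicit parameters (function type 1).

    pairs: list of (ai, aj) 1-4 atom index pairs from the molecule's [ atoms ]
    atom_types: dict mapping atom index -> atom type name
    """
    lines = ["[ pairs ]", ";   ai    aj funct      sigma    epsilon"]
    for ai, aj in pairs:
        sigma, eps = pair_params(atom_types[ai], atom_types[aj], atomtypes, scale_lj)
        lines.append(f"{ai:6d}{aj:6d}     1 {sigma:10.6f} {eps:10.6f}")
    return "\n".join(lines)


# Usage: one hypothetical 1-4 pair between atom 1 (type CX) and atom 4 (type OX)
print(pairs_section([(1, 4)], {1: "CX", 4: "OX"}, ATOMTYPES))
```

The protein keeps its generated pairs scaled by fudgeLJ = 0.5 from "[ defaults ]", while the LPS molecule gets unscaled LJ pairs through its own "[ pairs ]" lines.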

Fixing bugs in the ff14SB port for Gromacs

Bugs? The newly released amber14sb made modifications for protein, DNA and RNA. This force field can be ported into Gromacs, as I introduced previously. However, there are some small bugs in the improper dihedral angles of HID, HIE and HIP, so errors are reported when generating the tpr file. Running
grompp -f npt.mdp -p topol.top -c protein.gro -o product.tpr -n index
gives the following error:
ERROR 1 [file topolff14sb_Protein_chain_B.itp, line 208041]:
  No default Improper Dih. types
The reason is that the parameters for some improper dihedral angles are missing, so we need to add them. How to fix? Add the following two lines to ffbonded.itp after line 730:
NA  CW  CC  CT       4      180.00     4.60240     2     ;;; HID force field added
NA  CV  CC  CT       4      180.00     4.60240     2     ;;; HID force fie...
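Editing ffbonded.itp by hand works fine; if you need to patch several copies of the port, a small script may help. The Python sketch below inserts the two improper lines quoted above (without their trailing comments) after a given 1-based line number and skips lines that are already present, so running it twice does no harm. The function names and the example path are hypothetical, and line 730 is only right for the particular ffbonded.itp the post refers to, so check that the new lines land among the improper dihedral types before relying on it.

```python
from pathlib import Path

# Improper dihedral types quoted in the post (trailing comments left out)
NEW_IMPROPERS = [
    "NA  CW  CC  CT       4      180.00     4.60240     2",
    "NA  CV  CC  CT       4      180.00     4.60240     2",
]


def _normalize(line):
    """Collapse whitespace so the comparison ignores column alignment."""
    return " ".join(line.split())


def insert_after_line(path, line_no, new_lines):
    """Insert new_lines after 1-based line line_no of path.

    Lines already present (ignoring whitespace) are skipped, so the patch can be
    re-run safely. Returns the number of lines actually inserted.
    """
    p = Path(path)
    lines = p.read_text().splitlines()
    if line_no > len(lines):
        raise ValueError(f"{path} has only {len(lines)} lines")
    present = {_normalize(l) for l in lines}
    to_add = [l for l in new_lines if _normalize(l) not in present]
    if to_add:
        lines[line_no:line_no] = to_add  # slice assignment inserts after line_no
        p.write_text("\n".join(lines) + "\n")
    return len(to_add)


# Usage (hypothetical path to the ffbonded.itp inside your ff14SB port):
# insert_after_line("amber14sb.ff/ffbonded.itp", 730, NEW_IMPROPERS)
```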

Installing new Amber force field ports in Gromacs

Amber force fields are popular for protein, DNA, RNA and lipid simulations. The recently updated amberff12, amberff14 and amberff15 were reported to perform better than amber99sb and amber99sb-ildn. Gromacs is a commonly used simulation package that supports GPU acceleration (the 4.5 series relied on OpenMM for this; since version 4.6, Gromacs has native CUDA support). Gromacs is fast and easy to use, and many people use Amber force fields in Gromacs. To use the newly released amberff12 and amber14sb in Gromacs, however, we need to do some work.
Step 1. Install Gromacs 4.6.x, Gromacs 5.0.x or Gromacs 5.1.x
Installation instructions can be found here:
For 4.6.x: http://www.gromacs.org/Documentation/Installation_Instructions_4.6
For 5.0.x and 5.1.x: http://www.gromacs.org/Documentation/Installation_Instructions_5.0
Suppose you are user john and you installed Gromacs 4.6.7 in /home/john/applications/. After installation, source the GMXRC file:
source  /home/john/applications/gromacs.4.6.7/GMXRC
and add the topology files to the environment with:
export GMXLIB=/home/joh...
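After exporting GMXLIB, a quick sanity check is to list the force field directories under that path. The Python sketch below is a hypothetical helper of my own, not a Gromacs tool: it assumes GMXLIB points to a single directory and treats every subdirectory whose name ends in .ff and that contains a forcefield.itp file as a force field.

```python
import os
from pathlib import Path


def list_forcefields(gmxlib=None):
    """Return sorted names of force field directories found under GMXLIB.

    Assumes GMXLIB is a single directory; a force field directory is taken to be
    a subdirectory named '*.ff' that contains a forcefield.itp file.
    """
    if gmxlib is None:
        gmxlib = os.environ.get("GMXLIB")
    if not gmxlib:
        return []  # GMXLIB not set
    root = Path(gmxlib)
    if not root.is_dir():
        return []
    return sorted(
        d.name
        for d in root.iterdir()
        if d.is_dir() and d.name.endswith(".ff") and (d / "forcefield.itp").is_file()
    )


if __name__ == "__main__":
    print(list_forcefields())
```

If the new ports show up here, they should also appear in the force field menu of pdb2gmx.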

Some tips on installing AmberTools 16 together with Jupyter notebook

AmberTools 16 together with IPython notebook
Amber (starting from version 16) supports IPython notebook (also known as Jupyter notebook), which is useful for trajectory visualization and analysis and is a very powerful tool for recording your commands. Besides Amber mdcrd trajectories, you may also use it to analyze your Gromacs trajectories. The strength of IPython notebook is that it is flexible, extendable and very user-friendly. If you'd like to try IPython notebook for MD trajectory analysis, you could first read a few pages about Jupyter notebook and AMBER pytraj:
http://jupyter.org/
http://ambermd.org/tutorials/analysis/tutorial_notebooks/nglview_notebook/
http://ambermd.org/tutorials/analysis/
If you'd like to use it, here are step-by-step instructions for pytraj with IPython. You need to install conda or Anaconda first; see this link: https://docs.continuum.io/anaconda/install
1. Log in to ...

Missing values for "intcor" in bestranking.lst output when running GOLD in parallel with PVM

There are some bugs when running GOLD 5.4 in parallel with PVM (Parallel Virtual Machine): no value is reported in the "intcor" column of the final docking result file "bestranking.lst".
Run-1: In parallel with PVM using 8 CPUs (parallel_gold_li*)
============================== ======================
#     Fitness  S(hb_ext)  S(vdw_ext)  S(hb_int)  S(int)  intcor  time    File name                Ligand name
      46.25    9.00       27.86       0.00       -1.06   ----    15.870  './ligands_ m1_2.mol2'   'ligand_001'
============================== ======================
If running...
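To find out how many ligands are affected in a large run, the result file can be scanned for the "----" placeholder. The Python sketch below assumes that each data row of bestranking.lst is whitespace-separated with quoted file and ligand names, in the column order of the Run-1 header, and that header and separator lines start with "#" or "="; the function names are hypothetical, and the sample row is the one from Run-1.

```python
import shlex

# Column order taken from the bestranking.lst header of Run-1
COLUMNS = ["Fitness", "S(hb_ext)", "S(vdw_ext)", "S(hb_int)", "S(int)",
           "intcor", "time", "File name", "Ligand name"]


def parse_bestranking(text):
    """Parse data rows of a bestranking.lst-style table into dicts of strings."""
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        # skip blank lines, the '#' header and the '=' separator lines
        if not stripped or stripped.startswith(("#", "=")):
            continue
        fields = shlex.split(stripped)  # honours quotes, so names with spaces stay whole
        if len(fields) != len(COLUMNS):
            continue  # not a row in the expected layout
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


def missing_intcor(rows):
    """Ligand names whose intcor field is only dashes (the '----' placeholder)."""
    return [row["Ligand name"] for row in rows if row["intcor"].strip("-") == ""]


sample = """\
============================== ======================
#     Fitness  S(hb_ext)  S(vdw_ext)  S(hb_int)  S(int)  intcor  time    File name                Ligand name
      46.25    9.00       27.86       0.00       -1.06   ----    15.870  './ligands_ m1_2.mol2'   'ligand_001'
============================== ======================
"""

print(missing_intcor(parse_bestranking(sample)))  # ['ligand_001']
```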