A step-by-step tutorial to perform PCA with Gromacs MD trajectory
A step-by-step tutorial to perform PCA with Gromacs MD trajectory
It is a common practice to perform principal component analysis to explore the transitions and dynamics of macromolecules simulations.
So the problem is how could we find one or two CVs which could describe the slow motions of the system? To this end, PCA-based dimension reduction and projections could partially solve the problem by transforming the original dataset and grasp the largest variance of the system. Although there has been a tool in Gromacs, which perform the PCA using g_covar and g_anaeig. The two utilities, however, are not flexible enough to do customized PCA with unconventional dataset.
The tutorial is organized as the following:
1. What do we need before we calculate PCA using xtc trajectory?
2. How to perform the analysis?
3. How to visualize the result?
4. What's more?
4. What's more?
Lets begin!
1. What we need?
Working environment:
So I just presume that you are comfortable with Linux, since the whole tutorial and codes are deployed in Linux.
Software and libraries:
Gromacs
Anaconda or Miniconda
python 3.6 (or 2.7), pandas, scikit-learn, numpy
Files:
In your working directory, you need a Gromacs trajectory file (eg. mytraj.xtc), a reference pdb file. The reference pdb file works as the topology file, as we need in all other MD packages.
The atom numbers in the xtc file should be extactly the same as that in the pdb file.
Read Gromacs xtc with mdtraj
>>> import mdtraj as mt
>>> traj = mt.load_xtc(traj.xtc, reference.pdb)
Read Gromacs xtc with mdtraj
>>> import mdtraj as mt
>>> traj = mt.load_xtc(traj.xtc, reference.pdb)
2. Howto perform PCA analysis?
If you'd like to use the all-atom xyz coordination as the input for the PCA:
>>> xyz = traj.xyz.reshape((traj.xyz.shape[0], -1))
Or you only need the C-alpha xyz
>>> calpha_index = traj.topology.select("name CA")
>>> traj = traj.atom_slice(calpha_index)
>>> xyz = traj.xyz.reshape((traj.xyz.shape[0], -1))
Do PCA calculation with scikit-learn
>>> from sklearn import decomposition
>>> pca = decomposition.PCA()
>>> pca.fit(xyz)
>>> xyz_transformed = pca.transform(xyz)
Now you have need projections stored in xyz_transformed.
The eigenvector and eigenvalue are stored in pca instance.
>>> import matplotlib.pyplot as plt
>>> plt.scatter(xyz_transformed[:, 0], xyz_transformed[:, 1],)
>>> plt.xlabel("PC1")
>>> plt.ylabel("PC2")
>>> plt.show()
Actually, as long as we could prepare our trajectory data into a certain format, we could
do the PCA calculation. We could use the C-C contact map, or all the dihedral angles (but have to transform them into sin and cos values first), or hydrogen bonding contact map, and feed the MxN matrix into scikit-learn.
https://www.github.com/zhenglz/dockingML
Download that package, install it, and then put the package bin path in your bashrc.
$ export PATH=/home/john/dockingML/bin:$PATH
$ # show help information
$ gmx_pca -h
$ # do diheral pca calculation, but have to prepare an index file contain the dihedral atom indices
$ gmx_pca.py -mode dihedral -f mytrajecotry.xtc -s reference.pdb -n dihedral.ndx -proj 10
Any issues or bugs, please leave me a comment. Thanks!
>>> xyz = traj.xyz.reshape((traj.xyz.shape[0], -1))
Or you only need the C-alpha xyz
>>> calpha_index = traj.topology.select("name CA")
>>> traj = traj.atom_slice(calpha_index)
>>> xyz = traj.xyz.reshape((traj.xyz.shape[0], -1))
Do PCA calculation with scikit-learn
>>> from sklearn import decomposition
>>> pca = decomposition.PCA()
>>> pca.fit(xyz)
>>> xyz_transformed = pca.transform(xyz)
Now you have need projections stored in xyz_transformed.
The eigenvector and eigenvalue are stored in pca instance.
3. Visualise the first two projections
We use matplotlib to visualize the first 2 PCs.>>> import matplotlib.pyplot as plt
>>> plt.scatter(xyz_transformed[:, 0], xyz_transformed[:, 1],)
>>> plt.xlabel("PC1")
>>> plt.ylabel("PC2")
>>> plt.show()
Actually, as long as we could prepare our trajectory data into a certain format, we could
do the PCA calculation. We could use the C-C contact map, or all the dihedral angles (but have to transform them into sin and cos values first), or hydrogen bonding contact map, and feed the MxN matrix into scikit-learn.
4. What's more?
Acutally, there is a package for easy gromacs trajectory PCA analysis:https://www.github.com/zhenglz/dockingML
Download that package, install it, and then put the package bin path in your bashrc.
$ export PATH=/home/john/dockingML/bin:$PATH
$ # show help information
$ gmx_pca -h
$ # do diheral pca calculation, but have to prepare an index file contain the dihedral atom indices
$ gmx_pca.py -mode dihedral -f mytrajecotry.xtc -s reference.pdb -n dihedral.ndx -proj 10
Any issues or bugs, please leave me a comment. Thanks!
Where is the complete tutorial??
ReplyDelete