Understanding the principles of protein-protein interactions: Designing novel means for virtual proteomics
Date
2020
Publisher
Central University of Punjab
Abstract
Proteins are the basic functional units of the cell. They are nano-machines
programmed to associate with other biomolecules in order to enact an array of
molecular functions in response to biological events at the cellular and system levels.
Understanding the biomolecular phenomena governing such associations may
provide insights into the principles of protein chemistry that have a wide range of
applications. In the current work, two databases (PPInS and NRDB) were developed.
In these databases, the information on interacting protein chains from experimentally
determined protein-protein complexes (PPCs), for which structural information in terms
of SCOP superfamily was available, is demarcated in the form of protein-protein
interaction interfaces (PPIIs). The PPIIs contained in these databases were made
available on a web server for public use and were analysed with respect to (w.r.t.) the
physicochemical and geometrical characteristics of protein-protein interaction (PPI)
sites. On the premise that a computational prediction tool must be trained and tested
on real instances of the phenomenon for which it is designed, the information obtained
from the analysis of PPIIs from NRDB was incorporated into the development of a
computational tool, Anveshan, for the prediction of putative PPI sites. The training and
test datasets for Anveshan development were obtained from PPInS.
PPInS is a high-performance database of PPIIs that provides atomic-level
information on the molecular interactions amongst the protein chains in PPCs,
together with their evolutionary information from the Structural Classification of
Proteins (SCOPe release 2.06). A total of 32,468 PDB files representing X-ray
crystallized multimeric PPCs with structural resolution better than 2.5 Å were shortlisted
to demarcate the PPIIs. A total of 111,857 PPIIs with approximately 32.24 million atomic
contact pairs (ACPs) were generated and made available on a web server, named PPInS
(http://www.cup.edu.in:99/ppins/home.php), for on-site analysis and downloading.
A non-redundant database (NRDB) of PPInS, containing 2,265 PPIIs with over
1.8 million ACPs corresponding to 1,931 PPCs, was also designed by removing
structural redundancies at the level of SCOP superfamily (SCOP release 1.75), to
provide the foundation for the development of Anveshan.
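
As a rough illustration of how an interface can be described by atomic contact pairs, the sketch below lists every inter-chain atom pair that lies within a distance cutoff. The cutoff value, the atom label format, and the function name are assumptions made for this example; the abstract does not state the criterion used to build PPInS.

```python
from itertools import product
from math import dist

# Hypothetical cutoff in angstroms; the abstract does not give the value used.
CONTACT_CUTOFF = 4.5


def atomic_contact_pairs(chain_a, chain_b, cutoff=CONTACT_CUTOFF):
    """Return (label_a, label_b, distance) for inter-chain atom pairs within cutoff.

    Each chain is a list of (atom_label, (x, y, z)) tuples.
    """
    pairs = []
    for (label_a, xyz_a), (label_b, xyz_b) in product(chain_a, chain_b):
        d = dist(xyz_a, xyz_b)  # Euclidean distance between the two atoms
        if d <= cutoff:
            pairs.append((label_a, label_b, d))
    return pairs


# Toy coordinates, not taken from any PDB entry.
chain_a = [("A:LEU45:CB", (0.0, 0.0, 0.0)), ("A:GLY46:CA", (10.0, 0.0, 0.0))]
chain_b = [("B:PHE12:CZ", (3.0, 0.0, 0.0)), ("B:ASP13:CG", (20.0, 0.0, 0.0))]
pairs = atomic_contact_pairs(chain_a, chain_b)
```

The all-against-all loop is quadratic in the number of atoms; for whole-PDB processing a spatial index (for example a k-d tree) would be the usual choice.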
All the PPIIs and PPIPs in both databases were analysed w.r.t. residue
interface propensity (RIP), hydrophobic content, solvation free energy,
compactness of the interacting residues' neighbourhood, planarity, and depth index. The
PPIIs were also examined in the context of the sequence similarity shared by the protein
chains forming each PPII, which revealed an abundance of homodimers in the PDB.
Therefore, prior to analysing the PPIIs w.r.t. the other parameters, PPIIs from both
databases were categorized into three classes reflecting low sequence similarity (LSS),
moderate sequence similarity (MSS), and high sequence similarity (HSS) between the
protein chains involved. The RIP analysis showed that aliphatic and aromatic residues
are abundant at interaction sites, whereas charged residues (except Arg) occur least
often. Physicochemical and structural analysis of PPIPs initially showed a significant
difference between the parametric scores of the three PPII classes from PPInS and
NRDB. However, on removing statistical outliers amounting to less than 1% of each PPII
class, the parametric scores from all three classes of PPInS and NRDB converged to a
statistically indistinguishable common sub-range and followed similar distribution
trends. This indicates that the principles of molecular recognition among proteins are
not driven by sequence similarity and reinforces the importance of geometrical
and electrostatic complementarity as the main determinants of PPIs.
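
To make the idea of a residue interface propensity concrete, the sketch below uses one common formulation: the fraction of a residue type among interface residues divided by its fraction among all surface residues. This definition and the function name are assumptions for illustration; the abstract does not give the formula used in the thesis.

```python
from collections import Counter


def residue_interface_propensity(interface_residues, surface_residues):
    """Ratio of interface frequency to surface frequency for each residue type.

    Assumed definition: (n_interface[r] / N_interface) / (n_surface[r] / N_surface).
    Values above 1 mean the residue type is enriched at interfaces.
    """
    if not interface_residues or not surface_residues:
        raise ValueError("both residue lists must be non-empty")
    interface_counts = Counter(interface_residues)
    surface_counts = Counter(surface_residues)
    n_interface = len(interface_residues)
    n_surface = len(surface_residues)
    propensity = {}
    for residue, surface_count in surface_counts.items():
        interface_fraction = interface_counts[residue] / n_interface
        surface_fraction = surface_count / n_surface
        propensity[residue] = interface_fraction / surface_fraction
    return propensity


# Toy residue lists, not real data.
interface = ["LEU", "LEU", "PHE", "ASP"]
surface = ["LEU", "LEU", "PHE", "ASP", "ASP", "ASP", "LYS", "LYS"]
rip = residue_interface_propensity(interface, surface)
```

In this toy input LEU and PHE come out enriched (ratio above 1) and ASP and LYS depleted, which mirrors the direction of the trend reported above without reproducing any of its values.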
The parametric scores obtained by analysing 4,530 PPIPs from NRDB w.r.t. their
RIP, hydrophobic content, and associated solvation free energy provided the basis for
the implementation of Anveshan. By applying Anveshan to another dataset of 4,290
PPIPs from 2,145 PPIIs, the optimal range of these parametric scores and of the
protein-probe van der Waals energy of interaction was determined. Subsequently,
using the optimal range of PPIP parametric scores and the threshold for protein-probe
van der Waals interaction energy, Anveshan was tested on a blind dataset of 554
protein chains. When 10 sites were predicted for each protein chain and the
best-predicted patch was taken into account, Anveshan predicted 69.67% of sites
correctly, with at least 50% in both precision and coverage separately. When only one
PPI site was predicted for each protein chain, the sites predicted by Anveshan covered
on average 21.91% of the actual sites. For the sites predicted by SPPIDER, 22.7% of
the actual sites were covered. However, when two sites were predicted for each protein
chain, the coverage of actual sites by Anveshan nearly doubled (to 41.81%), thus
making Anveshan a superior approach.
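
The evaluation criterion above can be sketched as follows, assuming the usual set-based definitions: precision is the share of predicted residues that belong to the actual site, and coverage is the share of actual-site residues that were predicted. These definitions and the function names are assumptions for this example; only the 50% threshold and the best-patch rule come from the abstract.

```python
def precision_and_coverage(predicted, actual):
    """Return (precision, coverage) for one predicted patch against the actual site."""
    predicted, actual = set(predicted), set(actual)
    overlap = predicted & actual
    precision = len(overlap) / len(predicted) if predicted else 0.0
    coverage = len(overlap) / len(actual) if actual else 0.0
    return precision, coverage


def best_patch_is_correct(predicted_patches, actual, threshold=0.5):
    """True if any predicted patch reaches the threshold in both measures."""
    for patch in predicted_patches:
        precision, coverage = precision_and_coverage(patch, actual)
        if precision >= threshold and coverage >= threshold:
            return True
    return False


# Toy residue numbers, not real data.
actual_site = {1, 2, 3, 4}
patches = [{7, 8, 9}, {1, 2, 5}]
result = best_patch_is_correct(patches, actual_site)
```

Here the second patch has two of its three residues in the actual site and covers two of the four actual residues, so it meets the 50% threshold on both measures.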
Keywords
Protein-protein interactions, PPInS, NRDB, residue interface propensity,
hydrophobicity, solvation free energy, depth, planarity, protrusion
Citation
Kumar, Vicky; Kulharia, Mahesh and Munshi, Anjana (2020) Understanding the principles of protein-protein
interactions: Designing novel means for virtual proteomics