Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.01.22.525078v1?rss=1
Authors: Piette, O., Abia, D., Bastolla, U.
Abstract:
Motivation: Evolutionary inferences depend crucially on the quality of multiple sequence alignments (MSA), which is hard to guarantee for distantly related proteins. Since protein structure is more conserved than protein sequence, it seems natural to use structure alignments for distant homologs. However, structure alignments may not be suitable for inferring evolutionary relationships at the sequence level.
Results: Here we investigate the mutual relationships between four protein similarity measures that depend on sequence and structure (fraction of aligned residues, sequence similarity, fraction of superimposed backbones and contact overlap) and the corresponding alignments. Changes in protein sequences and structures are intimately correlated, but our results suggest that no individual measure can provide a complete and unbiased picture of these changes. Therefore, we propose a new hybrid measure of protein sequence and structure similarity based on principal components (PC_sim). Starting from an MSA, we obtain modified pairwise alignments (PA) based on PC_sim, and from them we construct a new MSA based on the maximal cliques of the PA graph. These alignments yield larger protein similarities and agree better with the Balibase "reference" MSA and with consensus MSAs than alignments that target individual similarity measures. Moreover, PC_sim is associated with a divergence measure that correlates most strongly with the divergences obtained from the individual similarities, which suggests that it can infer more accurate evolutionary divergences for reconstructing phylogenetic trees with distance methods.
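The idea of a principal-component hybrid of several similarity scores can be illustrated with a small sketch. This is not the authors' PC_sim (their code is in the Evol_div repository); it assumes that each protein pair is described by the four similarities named in the abstract, takes the leading principal component of their covariance matrix (found by power iteration), and rescales its loadings to sum to 1 so the hybrid stays on the 0..1 scale. The divergence -ln(similarity) is likewise an assumption chosen for illustration. The names `pc_weights`, `pc_sim` and `divergence` are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical column order for each protein pair:
# (aligned fraction, sequence similarity, superimposed fraction, contact overlap)


def pc_weights(rows: Sequence[Sequence[float]], iters: int = 1000) -> List[float]:
    """Loadings of the leading principal component of the covariance matrix
    of `rows`, rescaled to sum to 1 (an assumption of this sketch)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # starting vector for power iteration
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate (constant) data: keep the current vector
        v = [x / norm for x in w]
    total = sum(v)
    if total == 0.0:
        raise ValueError("loadings sum to zero; cannot rescale")
    # Dividing by the sum also removes the arbitrary sign of the eigenvector.
    return [x / total for x in v]


def pc_sim(sims: Sequence[float], weights: Sequence[float]) -> float:
    """Hybrid similarity: weighted sum of the individual similarities."""
    return sum(w * s for w, s in zip(weights, sims))


def divergence(sim: float) -> float:
    """Assumed divergence -ln(similarity), a common choice for distance
    methods; not necessarily the formula used in the paper."""
    if not 0.0 < sim <= 1.0:
        raise ValueError("similarity must be in (0, 1]")
    return -math.log(sim)


if __name__ == "__main__":
    pairs = [  # toy values for illustration only, not data from the paper
        [0.95, 0.60, 0.85, 0.80],
        [0.90, 0.40, 0.75, 0.65],
        [0.80, 0.25, 0.60, 0.50],
        [0.70, 0.15, 0.45, 0.40],
    ]
    w = pc_weights(pairs)
    for s in pairs:
        h = pc_sim(s, w)
        print(f"PC_sim={h:.3f}  divergence={divergence(h):.3f}")
```

When all four measures move together, as in the toy data, the loadings are all positive and the hybrid is a convex combination of the individual scores, so the logarithm is always defined. Measures that vary more across pairs receive larger weights here because the sketch uses the covariance rather than the correlation matrix; standardizing the columns first would be an equally plausible choice.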
Availability: https://github.com/ugobas/Evol_div
Copyright belongs to the original authors. Visit the link for more info.
Podcast created by Paper Player, LLC