COLLAPSE: A representation learning framework for identification and characterization of protein structural sites

by Alexander Derry, Russ Altman

Released as a post by Cold Spring Harbor Laboratory.

2022  

Abstract

The identification and characterization of the structural sites which contribute to protein function are crucial for understanding biological mechanisms, evaluating disease risk, and developing targeted therapies. However, the quantity of known protein structures is rapidly outpacing our ability to functionally annotate them. Existing methods for function prediction either do not operate on local sites, suffer from high false positive or false negative rates, or require large site-specific training datasets, necessitating the development of new computational methods for annotating functional sites at scale. We present COLLAPSE (Compressed Latents Learned from Aligned Protein Structural Environments), a framework for learning deep representations of protein sites. COLLAPSE operates directly on the 3D positions of atoms surrounding a site and uses evolutionary relationships between homologous proteins as a self-supervision signal, enabling learned embeddings to implicitly capture structure-function relationships within each site. Our representations generalize across disparate tasks in a transfer learning context, achieving state-of-the-art performance on standardized benchmarks (protein-protein interactions and mutation stability) and on the prediction of functional sites from the PROSITE database. We use COLLAPSE to search for similar sites across large protein datasets and to annotate proteins based on a database of known functional sites. These methods demonstrate that COLLAPSE is computationally efficient, tunable, and interpretable, providing a general-purpose platform for computational protein analysis.
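The abstract describes two uses of learned site embeddings: searching large protein datasets for similar sites, and annotating proteins against a database of known functional sites. The sketch below shows the general idea of embedding-based nearest-neighbour lookup. It assumes site embeddings have already been computed by some encoder (COLLAPSE or otherwise). The function names, the use of cosine similarity, the 0.9 threshold, and the toy vectors and labels are illustrative assumptions, not the authors' implementation.

```python
import heapq
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical database format: (site label, precomputed embedding vector).
SiteDB = List[Tuple[str, Sequence[float]]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_similar(query: Sequence[float], database: SiteDB, k: int = 5) -> List[Tuple[float, str]]:
    """Return the k most similar database sites as (score, label), highest first.

    Brute-force scan; a large-scale search would use an approximate
    nearest-neighbour index instead (an assumption, not from the source).
    """
    scored = [(cosine_similarity(query, emb), label) for label, emb in database]
    return heapq.nlargest(k, scored)


def annotate_site(query: Sequence[float], database: SiteDB,
                  threshold: float = 0.9) -> Optional[Tuple[str, float]]:
    """Transfer the label of the closest known site if it clears the (hypothetical) threshold."""
    best = search_similar(query, database, k=1)
    if not best or best[0][0] < threshold:
        return None
    score, label = best[0]
    return label, score


if __name__ == "__main__":
    # Toy 3-dimensional embeddings; real learned embeddings are much larger.
    db: SiteDB = [("site_A", [1.0, 0.0, 0.0]),
                  ("site_B", [0.0, 1.0, 0.0]),
                  ("site_C", [0.7, 0.7, 0.0])]
    print(search_similar([1.0, 0.1, 0.0], db, k=2))
    print(annotate_site([1.0, 0.1, 0.0], db))  # closest site is site_A
    print(annotate_site([0.0, 0.0, 1.0], db))  # no site is similar enough: None
```

Raising or lowering the threshold trades false positives against false negatives, which is one way such a lookup could be made "tunable" in the sense the abstract mentions.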

Archived at www.biorxiv.org (repository) and web.archive.org (webarchive).
Type: post
Date: 2022-07-22