Large scale automated phylogenomical analysis of bacterial whole-genome isolates and the Evergreen platform

by Judit Szarvas, Johanne Ahrenfeldt, Jose L. Bellod Cisneros, Martin Christen Frølund Thomsen, Frank Aarestrup, Ole Lund

Released as a post by Cold Spring Harbor Laboratory.

2019  

Abstract

Public health authorities whole-genome sequence thousands of pathogenic isolates each month for microbial diagnostics and surveillance of pathogenic bacteria. Computational methods have not kept up with the deluge of data and the need for real-time results. We have therefore created a bioinformatics pipeline for rapid subtyping and continuous phylogenomic analysis of bacterial samples, suited for large-scale surveillance. To decrease the computational burden, a two-level clustering strategy is employed. The data are first divided into sets by matching each isolate to a closely related reference genome. The reads are then aligned to the reference to obtain a consensus sequence, and the SNP-based genetic distance is calculated between the sequences in each set. Isolates are clustered together with a threshold of 10 SNPs. Finally, phylogenetic trees are inferred from the non-redundant sequences, and the clustered isolates are placed on a clade with the cluster representative sequence. The method was benchmarked and found to be accurate in grouping outbreak strains together while discriminating them from non-outbreak strains. The pipeline was applied in Evergreen Online, which processes publicly available sequencing data from foodborne bacterial pathogens on a daily basis, updating the phylogenetic trees as needed. It has so far placed more than 100,000 isolates into phylogenies and has been able to keep up with the daily release of data. The trees are continuously published on https://cge.cbs.dtu.dk/services/Evergreen.
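The second clustering level described in the abstract (SNP distances between consensus sequences, then grouping at a threshold of 10 SNPs) can be illustrated with a minimal sketch. This is not the Evergreen implementation: the greedy assignment to the first matching representative, the treatment of `N` positions, the inclusive `<=` comparison, and the names `snp_distance` and `cluster_isolates` are all assumptions made for illustration.

```python
from typing import Dict, List

SNP_THRESHOLD = 10  # threshold stated in the abstract


def snp_distance(a: str, b: str) -> int:
    """Count positions where two reference-aligned consensus sequences differ.

    Assumption of this sketch: positions where either sequence holds an
    ambiguous base ('N') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("consensus sequences must have equal length")
    return sum(1 for x, y in zip(a, b) if x != y and x != "N" and y != "N")


def cluster_isolates(
    consensus: Dict[str, str], threshold: int = SNP_THRESHOLD
) -> Dict[str, List[str]]:
    """Greedy clustering (assumed strategy): representative -> members.

    Each isolate joins the first existing representative within `threshold`
    SNPs; otherwise it becomes the representative of a new cluster.
    """
    clusters: Dict[str, List[str]] = {}
    for name, seq in consensus.items():
        for rep in clusters:
            if snp_distance(seq, consensus[rep]) <= threshold:
                clusters[rep].append(name)
                break
        else:
            # No representative was close enough: start a new cluster.
            clusters[name] = [name]
    return clusters


# Toy consensus sequences, all aligned to the same hypothetical reference.
example = {
    "iso_a": "A" * 40,
    "iso_b": "A" * 38 + "CC",       # 2 SNPs from iso_a
    "iso_c": "C" * 20 + "A" * 20,   # 20 SNPs from iso_a
}
example_clusters = cluster_isolates(example)
```

In this toy input, `iso_b` falls within 10 SNPs of `iso_a` and joins its cluster, while `iso_c` starts a cluster of its own. Only the representatives (the non-redundant sequences) would then go into tree inference, which is what keeps the phylogenetic step small as new isolates arrive.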

Type  post
Stage   unknown
Date   2019-02-05