Large scale automated phylogenomical analysis of bacterial whole-genome isolates and the Evergreen platform
by
Judit Szarvas, Johanne Ahrenfeldt, Jose L. Bellod Cisneros, Martin Christen Frølund Thomsen, Frank Aarestrup, Ole Lund
2019
Abstract
Public health authorities whole-genome sequence thousands of pathogenic isolates each month for microbial diagnostics and surveillance of pathogenic bacteria. Computational methods have not kept up with the deluge of data and the need for real-time results. We have therefore created a bioinformatics pipeline for rapid subtyping and continuous phylogenomic analysis of bacterial samples, suited for large-scale surveillance. To decrease the computational burden, a two-level clustering strategy is employed. The data are first divided into sets by matching each isolate to a closely related reference genome. The reads are then aligned to the reference to obtain a consensus sequence, and SNP-based genetic distances are calculated between the sequences in each set. Isolates are clustered together with a threshold of 10 SNPs. Finally, phylogenetic trees are inferred from the non-redundant sequences, and the clustered isolates are placed on a clade with the cluster representative sequence. The method was benchmarked and found to be accurate in grouping outbreak strains together while discriminating them from non-outbreak strains. The pipeline was applied in Evergreen Online, which processes publicly available sequencing data from foodborne bacterial pathogens on a daily basis, updating the phylogenetic trees as needed. It has so far placed more than 100,000 isolates into phylogenies, and has been able to keep up with the daily release of data. The trees are continuously published at https://cge.cbs.dtu.dk/services/Evergreen.
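The second clustering level described in the abstract (grouping consensus sequences within a reference set by SNP distance) can be illustrated with a minimal sketch. This is not the Evergreen implementation: the function names `snp_distance` and `cluster_isolates`, the greedy first-match assignment, the choice to skip ambiguous `N` positions, and the inclusive reading of the 10-SNP threshold are all assumptions made for illustration.

```python
from typing import Dict, List

SNP_THRESHOLD = 10  # threshold stated in the abstract; inclusive comparison is an assumption


def snp_distance(a: str, b: str) -> int:
    """Count differing positions between two equal-length consensus sequences.

    Positions where either sequence has an ambiguous 'N' are skipped
    (an assumption, not taken from the source).
    """
    if len(a) != len(b):
        raise ValueError("consensus sequences must share the reference length")
    return sum(1 for x, y in zip(a, b) if x != y and x != "N" and y != "N")


def cluster_isolates(
    consensus: Dict[str, str], threshold: int = SNP_THRESHOLD
) -> Dict[str, List[str]]:
    """Greedily assign each isolate to the first representative within the threshold.

    Returns a mapping from representative name to the members of its cluster.
    Isolates that match no existing representative start a new cluster.
    """
    clusters: Dict[str, List[str]] = {}
    for name, seq in consensus.items():
        for rep in clusters:
            if snp_distance(consensus[rep], seq) <= threshold:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters


if __name__ == "__main__":
    # Toy consensus sequences aligned to one hypothetical reference of length 50.
    toy = {
        "iso1": "A" * 50,
        "iso2": "C" * 5 + "A" * 45,   # 5 SNPs from iso1
        "iso3": "C" * 20 + "A" * 30,  # 20 SNPs from iso1
    }
    print(cluster_isolates(toy))
```

In this sketch only the cluster representatives (the dictionary keys) would go on to tree inference, which mirrors the abstract's point that trees are built from non-redundant sequences and clustered isolates are attached to their representative afterwards.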
Archived Files and Locations
application/pdf 3.6 MB, file_3ztheejp6rb7lmd3tgg63ypdde: www.biorxiv.org (repository), web.archive.org (webarchive)
application/pdf 3.6 MB, file_qdhvh67scfhbpg3iidixv26jla: www.biorxiv.org (repository), web.archive.org (webarchive)
application/pdf 3.2 MB, file_smgoapqemjbtnjjsgby7i6f2ae: www.biorxiv.org (repository), web.archive.org (webarchive)
application/pdf 3.2 MB, file_wxpsipkehbckpglmb7ymua22ue: web.archive.org (webarchive), www.biorxiv.org (web)
post
Stage unknown
Date 2019-02-05
DOI 10.1101/540138