Genotyping common, large structural variations in 5,202 genomes using pangenomes, the Giraffe mapper, and the vg toolkit

by Jouni Sirén, Jean Monlong, Xian Chang, Adam M Novak, Jordan Eizenga, Charles Markello, Jonas Andreas Sibbesen, Glenn Hickey, Pi-Chuan Chang, Andrew Carroll, David Haussler, Erik Garrison (+1 other)

Released as a post by Cold Spring Harbor Laboratory.

We introduce Giraffe, a pangenome short-read mapper that can efficiently map to a collection of haplotypes threaded through a sequence graph. Giraffe, part of the variation graph toolkit (vg), maps reads to thousands of human genomes at roughly the same speed at which BWA-MEM maps reads to a single reference genome, while maintaining accuracy comparable to that of VG-MAP, vg's original mapper. We have developed efficient genotyping pipelines using Giraffe. We demonstrate genome-wide improvements in genotyping for single nucleotide variations (SNVs), insertions and deletions (indels), and structural variations (SVs). We use Giraffe to genotype and phase 167,000 structural variations ascertained from long-read studies in 5,202 human genomes sequenced with short reads, including the complete 1000 Genomes Project dataset, at an average cost of $1.50 per sample. We determine the frequency of these variations in diverse human populations, characterize their complex allelic variation, and identify thousands of expression quantitative trait loci (eQTLs) driven by these variations.
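The phrase "a collection of haplotypes threaded through a sequence graph" can be illustrated with a toy model. The Python sketch below is a minimal illustration under our own assumptions and does not reproduce vg's or Giraffe's actual data structures or indexes; the class and method names (`SequenceGraph`, `add_haplotype`, `spell`, `haplotypes_through`) are hypothetical. Nodes carry DNA labels, directed edges join nodes that may be adjacent in some genome, and each haplotype is stored as a path of node IDs that must follow existing edges. The example graph encodes one SNV (node 2 vs. node 3) and one optional insertion (node 5).

```python
from dataclasses import dataclass, field


@dataclass
class SequenceGraph:
    """Toy sequence graph: node IDs map to DNA labels; edges are directed node pairs."""
    nodes: dict[int, str]
    edges: set[tuple[int, int]]
    haplotypes: dict[str, list[int]] = field(default_factory=dict)

    def add_haplotype(self, name: str, path: list[int]) -> None:
        # A haplotype is a walk through the graph: every consecutive
        # pair of nodes on its path must be connected by an edge.
        for a, b in zip(path, path[1:]):
            if (a, b) not in self.edges:
                raise ValueError(f"{name}: no edge {a}->{b}")
        self.haplotypes[name] = path

    def spell(self, name: str) -> str:
        # Concatenate node labels along the path to recover the haplotype sequence.
        return "".join(self.nodes[n] for n in self.haplotypes[name])

    def haplotypes_through(self, node: int) -> list[str]:
        # Names of haplotypes that visit a node, e.g. those carrying a given allele.
        return sorted(h for h, p in self.haplotypes.items() if node in p)


# Reference-like backbone with an A/G SNV (nodes 2/3) and an optional GGG insertion (node 5).
g = SequenceGraph(
    nodes={1: "ACGT", 2: "A", 3: "G", 4: "TTCA", 5: "GGG", 6: "CC"},
    edges={(1, 2), (1, 3), (2, 4), (3, 4), (4, 5), (4, 6), (5, 6)},
)
g.add_haplotype("hap1", [1, 2, 4, 6])     # carries A, no insertion
g.add_haplotype("hap2", [1, 3, 4, 5, 6])  # carries G and the insertion

print(g.spell("hap1"))            # ACGTATTCACC
print(g.spell("hap2"))            # ACGTGTTCAGGGCC
print(g.haplotypes_through(5))    # ['hap2']
```

Restricting paths to known haplotypes, rather than allowing every walk the edges permit, is what distinguishes a haplotype-aware pangenome from a bare graph: in this toy, the walk `[1, 2, 4, 5, 6]` is allowed by the edges but is not one of the stored haplotypes.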
Date: 2020-12-06