Bovine breed-specific augmented reference graphs facilitate accurate sequence read mapping and unbiased variant discovery

by Danang Crysnanto, Hubert Pausch

Published in Genome Biology by Springer Science and Business Media LLC.

2020   Volume 21, Issue 1, p184

Abstract

The current bovine genomic reference sequence was assembled from a Hereford cow. The resulting linear assembly lacks diversity because it does not contain allelic variation, a drawback of linear references that causes reference allele bias. High nucleotide diversity and the division of the species into hundreds of breeds make cattle ideally suited to investigate the optimal composition of variation-aware references.

We augment the bovine linear reference sequence (ARS-UCD1.2) with variants filtered for allele frequency in dairy (Brown Swiss, Holstein) and dual-purpose (Fleckvieh, Original Braunvieh) cattle breeds to construct either breed-specific or pan-genome reference graphs using the vg toolkit. We find that reads map more accurately to variation-aware than to linear references if pre-selected variants are used to construct the genome graphs. Graphs that contain random variants do not improve read mapping over the linear reference sequence. Breed-specific augmented and pan-genome graphs yield nearly the same improvement in mapping accuracy over the linear reference. We construct a whole-genome graph that contains the Hereford-based reference sequence and 14 million alleles that have an alternate allele frequency greater than 0.03 in the Brown Swiss cattle breed. Our novel variation-aware reference facilitates accurate read mapping and unbiased sequence variant genotyping for SNPs and indels.

We develop the first variation-aware reference graph for an agricultural animal (https://doi.org/10.5281/zenodo.3759712). Our novel reference structure improves sequence read mapping and variant genotyping over the linear reference. Our work is a first step towards the transition from linear to variation-aware reference structures in species with high genetic diversity and many sub-populations.
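The variant pre-selection step summarized in the abstract (keeping alleles whose alternate allele frequency in one breed exceeds 0.03) can be illustrated with a minimal Python sketch. It assumes an uncompressed, tab-delimited VCF in which every sample column is a diploid animal of the breed of interest and GT is the first FORMAT key (as the VCF specification requires when GT is present). The function names, the `EXAMPLE_VCF` records, and the choice to compute the frequency directly from genotypes are hypothetical; this is not the authors' pipeline, and the selected variants would still have to be passed to graph construction with the vg toolkit, which is not shown.

```python
from typing import Iterable, List, Tuple

# Threshold from the abstract: keep alleles with alternate allele
# frequency (AAF) strictly greater than 0.03 in the breed.
AAF_THRESHOLD = 0.03


def alt_allele_frequency(genotypes: List[str]) -> float:
    """Fraction of called alleles that are non-reference.

    Accepts GT strings such as '0/1' or '0|1'; missing alleles ('.')
    are ignored. Returns 0.0 if no allele is called.
    """
    called = 0
    alt = 0
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue
            called += 1
            if allele != "0":
                alt += 1
    return alt / called if called else 0.0


def select_variants(
    vcf_lines: Iterable[str], threshold: float = AAF_THRESHOLD
) -> List[Tuple[str, int, str, str, float]]:
    """Return (chrom, pos, ref, alt, aaf) for records with AAF > threshold."""
    kept = []
    for line in vcf_lines:
        # Skip meta-information lines, the header line and blank lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        # Sample columns start after FORMAT (column 9); GT is the first key.
        genotypes = [sample.split(":")[0] for sample in fields[9:]]
        aaf = alt_allele_frequency(genotypes)
        if aaf > threshold:
            kept.append((chrom, int(pos), ref, alt, aaf))
    return kept


# Hypothetical toy records: two animals (S1, S2) genotyped at two sites.
EXAMPLE_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "25\t1000\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t0/0",
    "25\t2000\t.\tC\tT\t.\tPASS\t.\tGT\t0/0\t0/0",
]

if __name__ == "__main__":
    # The first site has AAF 1/4 = 0.25 and is kept; the second has AAF 0.
    print(select_variants(EXAMPLE_VCF))  # [('25', 1000, 'A', 'G', 0.25)]
```

Because the comparison is strict, a site whose frequency is exactly 0.03 is discarded, matching "greater than 0.03" in the abstract. In a real cohort the frequency would typically be estimated from many more animals than this two-sample toy example.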

Archived Files and Locations

application/pdf  1.7 MB
genomebiology.biomedcentral.com (publisher)
web.archive.org (webarchive)
Type  article-journal
Stage   published
Date   2020-07-27
Language   en
Container Metadata
Open Access Publication
In DOAJ
In Keepers Registry
ISSN-L:  1474-7596