Accurate construction of long range haplotype in unrelated individuals
by
Nicholas A Johnson, Stephanie J. London, Isabelle Romieu, Wing H. Wong, Hua Tang
Abstract
A haplotype, the sequence of alleles along a single chromosome, has important applications in phenotype-genotype association studies as well as in population genetics analyses. Because haplotypes cannot be experimentally assayed in diploid organisms in a high-throughput fashion, numerous statistical methods have been developed to reconstruct probable haplotypes from genotype data. These methods focus primarily on accurate phasing of a short genomic region with a small number of markers, and their error rate increases rapidly for longer regions. Here we introduce a new phasing algorithm, emphases, which aims to improve long-range phasing accuracy. Using datasets from multiple populations, we found that emphases reduces long-range phasing errors by up to 50% compared to current state-of-the-art methods. In addition to inferring the most likely haplotypes, emphases produces confidence measures, allowing downstream analyses to account for the uncertainty associated with some haplotypes. We anticipate that emphases offers a powerful tool for analyzing large-scale data generated in genome-wide association studies (GWAS).
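To make the phasing problem concrete, the following is a minimal sketch of why unphased genotype data leaves the haplotypes ambiguous. It is not the emphases algorithm. It assumes biallelic markers coded as 0, 1, or 2 copies of the alternate allele, and the function name `compatible_haplotype_pairs` is hypothetical, chosen only for this illustration. A genotype with k heterozygous markers (k at least 1) is consistent with 2^(k-1) distinct unordered haplotype pairs, which is why longer regions are harder to phase.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """Enumerate the unordered haplotype pairs consistent with a genotype.

    Assumption for this sketch: `genotype` is a sequence of 0, 1, or 2,
    the count of the alternate allele at each biallelic marker.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    # Homozygous markers are fixed on both haplotypes (0 -> 0, 2 -> 1);
    # heterozygous markers get a placeholder 0 that is overwritten below.
    base = [g // 2 for g in genotype]

    if not het_sites:
        return [(tuple(base), tuple(base))]

    # Pin the first heterozygous marker to (0, 1) so that each unordered
    # pair is generated once rather than twice.
    first, rest = het_sites[0], het_sites[1:]
    pairs = []
    for bits in product((0, 1), repeat=len(rest)):
        h1, h2 = list(base), list(base)
        h1[first], h2[first] = 0, 1
        for site, allele in zip(rest, bits):
            h1[site], h2[site] = allele, 1 - allele
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


if __name__ == "__main__":
    # Three heterozygous markers give 2 ** (3 - 1) candidate pairs.
    for h1, h2 in compatible_haplotype_pairs([1, 0, 1, 2, 1]):
        print(h1, h2)
```

A statistical phasing method has to choose among these candidates, typically using patterns shared across many individuals. Brute-force enumeration like this grows exponentially with the number of heterozygous markers and is shown only to illustrate the size of the search space.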