PadChest: A large chest x-ray image dataset with multi-label annotated
reports
release_uuhka6akyrhr7orlppbgymxjsy
by
Aurelia Bustos, Antonio Pertusa, Jose-Maria Salinas, Maria de la
Iglesia-Vayá
2019
Abstract
We present a labeled, large-scale, high-resolution chest x-ray dataset for the
automated exploration of medical images along with their associated reports.
This dataset includes more than 160,000 images obtained from 67,000 patients
that were interpreted and reported by radiologists at San Juan Hospital
(Spain) from 2009 to 2017, covering six different position views and
additional information on image acquisition and patient demography. The reports
were labeled with 174 different radiographic findings, 19 differential
diagnoses and 104 anatomic locations organized as a hierarchical taxonomy and
mapped onto standard Unified Medical Language System (UMLS) terminology. Of
these reports, 27% were manually annotated by trained physicians and the
remaining set was labeled using a supervised method based on a recurrent neural
network with attention mechanisms. The labels generated were then validated in
an independent test set achieving a 0.93 Micro-F1 score. To the best of our
knowledge, this is one of the largest public chest x-ray databases suitable for
training supervised models concerning radiographs, and the first to contain
radiographic reports in Spanish. The PadChest dataset can be downloaded from
http://bimcv.cipf.es/bimcv-projects/padchest/.
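The Micro-F1 score quoted in the abstract is a standard way to summarize multi-label quality: true positives, false positives and false negatives are pooled over all reports and labels before precision and recall are combined. The following is a minimal sketch of that computation, not the authors' evaluation code. The function name `micro_f1` and the toy label sets are hypothetical and chosen only for illustration; they are not taken from PadChest.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1 over multi-label predictions.

    Counts are pooled over all reports and labels before the score is
    computed, so frequent labels weigh more than rare ones.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # labels predicted and present in the reference
        fp += len(p - g)   # labels predicted but absent from the reference
        fn += len(g - p)   # reference labels that were missed
    denom = 2 * tp + fp + fn
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there is nothing to score.
    return 2 * tp / denom if denom else 0.0


# Toy data (illustrative only, not taken from PadChest).
gold_labels = [{"cardiomegaly", "pleural effusion"}, {"normal"}, {"nodule"}]
pred_labels = [{"cardiomegaly"}, {"normal"}, {"nodule", "pleural effusion"}]

score = micro_f1(gold_labels, pred_labels)
print(f"micro-F1 = {score:.2f}")  # prints "micro-F1 = 0.75"
```

In this toy case there are 3 true positives, 1 false positive and 1 false negative, giving 6 / 8 = 0.75. Because the counts are pooled, a micro-averaged score is dominated by the most frequent labels, which is worth keeping in mind when a taxonomy contains many rare findings.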
Archived Files and Locations
application/pdf 4.5 MB
file_zgbgbowp45awrphtir45t2qkau
arxiv.org (repository), web.archive.org (webarchive)
arXiv: 1901.07441v2