Assessing Phenotype Definitions for Algorithmic Fairness

by Tony Y. Sun, Shreyas Bhave, Jaan Altosaar, Noémie Elhadad

Released as an article.

2022  

Abstract

Disease identification is a core, routine activity in observational health research. The resulting cohorts shape downstream analyses, such as how a condition is characterized, how patient risk is defined, and which treatments are studied. It is thus critical to ensure that selected cohorts are representative of all patients, independently of their demographics or social determinants of health. Although there are multiple potential sources of bias when constructing phenotype definitions that may affect their fairness, it is not standard practice in phenotyping to consider how different definitions perform across subgroups of patients. In this paper, we propose a set of best practices to assess the fairness of phenotype definitions. We leverage established fairness metrics commonly used for predictive models and relate them to commonly used epidemiological cohort description metrics. We present an empirical study of Crohn's disease and type 2 diabetes, each with multiple phenotype definitions taken from the literature, evaluated across two sets of patient subgroups (gender and race). We show that the different phenotype definitions exhibit widely varying and disparate performance according to the different fairness metrics and subgroups. We hope that the proposed best practices can help in constructing fair and inclusive phenotype definitions.
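To make the link between predictive-model fairness metrics and cohort validation metrics concrete, here is a minimal sketch. It assumes each patient has a subgroup label, a boolean flag saying whether a phenotype definition selected them, and a boolean gold-standard label (for example from chart review). The abstract does not name the exact metrics used, so the choice here is an illustrative assumption: per-subgroup sensitivity (the quantity compared under equal opportunity), positive predictive value (compared under predictive parity), and selection rate (compared under demographic parity). Sensitivity and PPV are also the usual epidemiological figures for validating a phenotype. The function names `subgroup_metrics` and `max_gap` are hypothetical.

```python
import math
from collections import defaultdict


def subgroup_metrics(records):
    """Compute per-subgroup metrics for one phenotype definition.

    records: iterable of (group, selected, gold) tuples, where `selected` is
    True if the phenotype definition flagged the patient and `gold` is True
    if the patient truly has the condition (assumed gold standard).
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for group, selected, gold in records:
        c = counts[group]
        if selected and gold:
            c["tp"] += 1
        elif selected and not gold:
            c["fp"] += 1
        elif not selected and gold:
            c["fn"] += 1
        else:
            c["tn"] += 1

    results = {}
    for group, c in counts.items():
        n = c["tp"] + c["fp"] + c["fn"] + c["tn"]
        flagged = c["tp"] + c["fp"]   # patients placed in the cohort
        diseased = c["tp"] + c["fn"]  # patients who truly have the condition
        results[group] = {
            # Equal opportunity compares this across subgroups.
            "sensitivity": c["tp"] / diseased if diseased else math.nan,
            # Predictive parity compares this across subgroups.
            "ppv": c["tp"] / flagged if flagged else math.nan,
            # Demographic parity compares this across subgroups.
            "selection_rate": flagged / n if n else math.nan,
        }
    return results


def max_gap(results, metric):
    """Largest between-subgroup difference for one metric (NaNs ignored)."""
    values = [r[metric] for r in results.values() if not math.isnan(r[metric])]
    return max(values) - min(values) if values else math.nan
```

For a hypothetical toy cohort in which subgroup A has 2 true positives, 1 false negative, 1 false positive and 1 true negative, while subgroup B has 1 true positive, 2 false negatives and 2 true negatives, sensitivity is 2/3 for A and 1/3 for B, so `max_gap(results, "sensitivity")` is about 0.33 even though B's PPV (1.0) is higher than A's (2/3). A single definition can therefore look fair under one metric and unfair under another, which is why comparing several metrics side by side is useful.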

Archived Files and Locations

application/pdf  2.6 MB
arxiv.org (repository)
web.archive.org (webarchive)
Type: article
Stage: submitted
Date: 2022-03-10
Version: v1
Language: en
arXiv: 2203.05174v1