Comparison of different scoring methods based on latent variable models of the PHQ-9: an individual participant data meta-analysis
release_dwtvsmpcazhinlxhkms2cmmdrq
by
Felix Fischer, Brooke Levis, Carl Falk, Ying Sun, John Ioannidis, Pim Cuijpers, Ian Shrier, Andrea Benedetti, Brett Thombs, the Depression Screening Data (DEPRESSD) PHQ Collaboration
Abstract
<jats:title>Abstract</jats:title>
<jats:sec id="S0033291721000131_sec_a1">
<jats:title>Background</jats:title>
Previous research on the depression scale of the Patient Health Questionnaire (PHQ-9) has found that different latent factor models have maximized empirical measures of goodness-of-fit. The clinical relevance of these differences is unclear. We aimed to investigate whether depression screening accuracy may be improved by employing latent factor model-based scoring rather than sum scores.
</jats:sec>
<jats:sec id="S0033291721000131_sec_a2" sec-type="methods">
<jats:title>Methods</jats:title>
We used an individual participant data meta-analysis (IPDMA) database compiled to assess the screening accuracy of the PHQ-9. We included studies that used the Structured Clinical Interview for DSM (SCID) as a reference standard and split those into calibration and validation datasets. In the calibration dataset, we estimated unidimensional, two-dimensional (separating cognitive/affective and somatic symptoms of depression), and bi-factor models, and the respective cut-offs to maximize combined sensitivity and specificity. In the validation dataset, we assessed the differences in (combined) sensitivity and specificity between the latent variable approaches and the optimal sum score (⩾10), using bootstrapping to estimate 95% confidence intervals for the differences.
</jats:sec>
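The two computational steps in the Methods (choosing the cut-off that maximizes combined sensitivity and specificity, then bootstrapping a 95% confidence interval for the difference between two scoring approaches) can be illustrated roughly as follows. This is a minimal sketch, not the authors' analysis code: the function names, the toy data, and the simple participant-level resampling are all assumptions made for illustration, and the abstract does not say whether resampling respected the study structure.

```python
import random

def sens_spec(scores, labels, cutoff):
    """Sensitivity and specificity when score >= cutoff counts as a positive screen."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in pairs if y == 1 and s < cutoff)
    tn = sum(1 for s, y in pairs if y == 0 and s < cutoff)
    fp = sum(1 for s, y in pairs if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

def best_cutoff(scores, labels):
    """Observed score value that maximizes sensitivity + specificity."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda c: sum(sens_spec(scores, labels, c)))

def bootstrap_diff_ci(scores_a, cut_a, scores_b, cut_b, labels, n_boot=1000, seed=0):
    """Percentile 95% CI for the difference (A minus B) in sensitivity + specificity."""
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:
            continue  # a resample needs both cases and non-cases
        a = sum(sens_spec([scores_a[i] for i in idx], y, cut_a))
        b = sum(sens_spec([scores_b[i] for i in idx], y, cut_b))
        diffs.append(a - b)
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]

# Hypothetical toy data: 1 = major depression on the reference standard.
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
sum_scores = [2, 4, 5, 7, 9, 11, 8, 12, 15, 20]                        # PHQ-9 sum scores
model_scores = [-1.2, -0.8, -0.5, -0.1, 0.3, 0.6, 0.4, 0.9, 1.4, 2.1]  # made-up factor scores

model_cut = best_cutoff(model_scores, labels)
ci_low, ci_high = bootstrap_diff_ci(model_scores, model_cut, sum_scores, 10, labels)
```

In the study design described above, the cut-offs would be chosen in the calibration dataset and the differences evaluated in the validation dataset; the toy example uses a single dataset only to keep the sketch short.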
<jats:sec id="S0033291721000131_sec_a3" sec-type="results">
<jats:title>Results</jats:title>
The calibration dataset included 24 studies (4378 participants, 652 major depression cases); the validation dataset 17 studies (4252 participants, 568 cases). In the validation dataset, optimal cut-offs of the unidimensional, two-dimensional, and bi-factor models had higher sensitivity (by 0.036, 0.050, 0.049 points, respectively) but lower specificity (0.017, 0.026, 0.019, respectively) compared to the sum score cut-off of ⩾10.
</jats:sec>
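To see how these gains and losses offset each other, the reported differences can be combined into a net change in sensitivity plus specificity for each model. The short calculation below is an illustrative sketch: the net figures are derived here from the point estimates in the Results and are not themselves reported in the abstract, and the variable names are hypothetical.

```python
# Differences versus the sum score cut-off of >= 10, taken from the Results.
sens_gain = {"unidimensional": 0.036, "two-dimensional": 0.050, "bi-factor": 0.049}
spec_loss = {"unidimensional": 0.017, "two-dimensional": 0.026, "bi-factor": 0.019}

# Net change in (sensitivity + specificity): gain in sensitivity minus loss in specificity.
net_change = {m: round(sens_gain[m] - spec_loss[m], 3) for m in sens_gain}

for model, delta in net_change.items():
    print(f"{model}: {delta:+.3f}")
# unidimensional: +0.019
# two-dimensional: +0.024
# bi-factor: +0.030
```

On these point estimates the combined measure improves by at most 0.03 for any of the three models.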
<jats:sec id="S0033291721000131_sec_a4" sec-type="conclusions">
<jats:title>Conclusions</jats:title>
In a comprehensive dataset of diagnostic studies, scoring with complex latent variable models does not meaningfully improve the screening accuracy of the PHQ-9 compared with the simple sum score approach.
</jats:sec>
Archived Files and Locations
application/pdf 419.3 kB
file_3lfsy7m75fg2jepnefh2eouh5e
www.cambridge.org (publisher), web.archive.org (webarchive)