When Do Latent Class Models Overstate Accuracy for Binary Classifiers?: With Applications to Jury Accuracy, Survey Response Error, and Diagnostic Error (WP-08-10)
Bruce D. Spencer
Latent class models (LCMs) are increasingly used to assess the accuracy of binary classifiers, such as medical diagnostic tests for the presence of a condition, when there is no “gold standard” available and hence the true classification is unknown. LCMs are also used in nonmedical contexts, e.g., for assessing the accuracy of verdicts in criminal cases and for assessing the accuracy of responses to survey questions about drug use, employment status, etc. LCMs posit a relation between observed classifications and unobserved latent classes. When the latent class is treated as the true class, the LCMs provide measures of components of accuracy, including the type I and type II error rates. In practice, however, the latent class will differ from the true class, and the type I and type II error rates are then misspecified by the LCM. A result of Uebersax (1988) implies that under widely applicable conditions, but when covariates are not relevant, LCMs in effect construct the latent class to yield minimum estimates of type I and type II error rates, and as a result the LCM estimates of those error rates are optimistic. Spencer derives new lower bounds for the difference between the true type I and type II error rates and those from the LCM; the bounds are applicable and estimable even when covariates are present. These bounds are important for interpreting the results of latent class analyses. In addition, a total error model is presented that incorporates an error component from invalidity in the LCM.
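To make concrete what it means for an LCM to "provide" error rates, the following is a minimal sketch of a two-class LCM for several binary classifiers. It is not the model or the bounds from the paper. It assumes the textbook conditional-independence form (classifications are independent given the latent class), takes a type I error to mean a positive classification when the latent class is negative, and uses made-up parameter values. The function names `pattern_probability` and `lcm_error_rates` are hypothetical.

```python
from itertools import product


def pattern_probability(pattern, prevalence, p_pos_given_class):
    """Probability of one observed pattern under a two-class LCM.

    Assumes conditional independence of the classifiers given the latent class.
    pattern: tuple of 0/1 classifications, one per classifier.
    prevalence: P(latent class = 1).
    p_pos_given_class: per classifier, (P(y=1 | class 0), P(y=1 | class 1)).
    """
    total = 0.0
    for latent, weight in ((0, 1.0 - prevalence), (1, prevalence)):
        prob = weight
        for y, p in zip(pattern, p_pos_given_class):
            prob *= p[latent] if y == 1 else 1.0 - p[latent]
        total += prob
    return total


def lcm_error_rates(p_pos_given_class):
    """Error rates implied when the latent class is treated as the true class.

    Returns (type I, type II) per classifier, where type I is
    P(y=1 | class 0) and type II is P(y=0 | class 1).
    """
    return [(p0, 1.0 - p1) for p0, p1 in p_pos_given_class]


# Hypothetical parameter values for three classifiers (illustration only).
params = [(0.10, 0.85), (0.05, 0.90), (0.20, 0.80)]
prevalence = 0.3

# The probabilities of all 2**3 observable patterns should sum to one.
total = sum(
    pattern_probability(pat, prevalence, params)
    for pat in product((0, 1), repeat=len(params))
)
rates = lcm_error_rates(params)
```

These rates are defined relative to the latent class only. If the latent class differs from the true class, as the abstract argues it will in practice, the true type I and type II error rates can differ from the values returned here.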