Abstract
Problems of the analysis of data with incomplete observations are all too familiar in statistics. They are doubly difficult if we are also uncertain about the choice of model. We propose a general formulation for the discussion of such problems and develop approximations to the resulting bias of maximum likelihood estimates on the assumption that model departures are small. Loss of efficiency in parameter estimation due to incompleteness in the data has a dual interpretation: the increase in variance when an assumed model is correct; the bias in estimation when the model is incorrect. Examples include non-ignorable missing data, hidden confounders in observational studies and publication bias in meta-analysis. Doubling variances before calculating confidence intervals or test statistics is suggested as a crude way of addressing the possibility of undetectably small departures from the model. The problem of assessing the risk of lung cancer from passive smoking is used as a motivating example.
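As a rough illustration of the variance-doubling suggestion, the sketch below compares a conventional Wald-type 95% interval with one computed after doubling the variance. The function name `wald_interval`, its `inflate` parameter, and the numerical values are assumptions made for illustration only and are not taken from the paper.

```python
import math

def wald_interval(estimate, variance, z=1.96, inflate=1.0):
    """Wald-type interval; `inflate` multiplies the variance before use."""
    half_width = z * math.sqrt(inflate * variance)
    return estimate - half_width, estimate + half_width

# Hypothetical estimate and variance, for illustration only.
estimate, variance = 0.20, 0.01

naive = wald_interval(estimate, variance)                 # usual interval
doubled = wald_interval(estimate, variance, inflate=2.0)  # variance doubled

# Doubling the variance widens the interval by a factor of sqrt(2).
width_ratio = (doubled[1] - doubled[0]) / (naive[1] - naive[0])
print(naive, doubled, width_ratio)
```

Under this adjustment the interval is wider by a factor of the square root of 2 (about 41%), and a z-type test statistic is correspondingly divided by the same factor.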
Original language | English |
---|---|
Pages (from-to) | 459-495 |
Number of pages | 36 |
Journal | Journal of the Royal Statistical Society. Series B: Statistical Methodology |
Volume | 67 |
Issue number | 4 |
Publication status | Published - 2005 |
Keywords
- Selection bias
- Publication bias
- Misspecified models
- Missingness at random
- Missing data
- Ignorable data
- Hidden confounders
- Coarsening
- MAXIMUM-LIKELIHOOD-ESTIMATION
- NON-IGNORABLE NONRESPONSE
- COMPETING RISKS
- PASSIVE SMOKING
- CLINICAL-TRIALS
- EM ALGORITHM
- LUNG-CANCER