It is common to assume that all data follow a multivariate Normal or Student-t distribution:
\((x_1, x_2, y) \sim \mathrm{MVN}(\mu, \Sigma)\)
For now, we consider that only \(x_1\) contains missing values.
How to use the conditional mean?
\(\mu_1^\star\) is the most likely value for \(x_1\), and can thus be used as an imputation.
\(\mu_1^\star\) is equal to the mean of \(x_1\) plus an adjustment that depends on the observed values of \(x_2\) and \(y\): \(\mu_1^\star = \mu_1 + \Sigma_{12}\,\Sigma_{22}^{-1}\,\bigl((x_2, y)^\top - \mu_2\bigr)\), where the subscripts refer to the usual partition of \(\mu\) and \(\Sigma\) into the missing part (\(x_1\)) and the observed part (\(x_2, y\)).
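As a minimal numerical sketch (all numbers below are made up for illustration, not taken from the lecture), the conditional mean follows from partitioning \(\mu\) and \(\Sigma\):

```python
import numpy as np

# Hypothetical trivariate example: (x1, x2, y) ~ MVN(mu, Sigma).
# All numbers are illustrative, not estimates from real data.
mu = np.array([5.0, 2.0, 1.0])          # means of x1, x2, y
Sigma = np.array([[4.0, 1.2, 0.8],
                  [1.2, 2.0, 0.5],
                  [0.8, 0.5, 1.0]])     # covariance of (x1, x2, y)

# Observed values (x2, y) for a record whose x1 is missing
obs = np.array([3.0, 0.5])

# Partition: index 0 = x1 (missing); indices 1, 2 = observed (x2, y)
S12 = Sigma[0, 1:]                      # Cov(x1, (x2, y))
S22 = Sigma[1:, 1:]                     # Var((x2, y))

# Conditional mean: mu1* = mu1 + S12 S22^{-1} (obs - mu_obs)
mu1_star = mu[0] + S12 @ np.linalg.solve(S22, obs - mu[1:])
```

Note that the adjustment term vanishes when \(x_1\) is uncorrelated with the observed variables (\(\Sigma_{12} = 0\)), in which case the imputation falls back to the marginal mean \(\mu_1\).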
Example
Multivariate distribution of patients presenting with lower respiratory tract infections in primary care:
How to use the conditional variance?
\(\operatorname{Var}(x_1 \mid x_2, y)\) quantifies the uncertainty around \(\mu_1^\star\), and can be used to draw multiple imputations. In particular, we can sample an imputed value from a Normal distribution with mean \(\mu_1^\star\) and variance \(\operatorname{Var}(x_1 \mid x_2, y)\).
\(\operatorname{Var}(x_1 \mid x_2, y)\) is equal to the variance of \(x_1\) minus an adjustment: \(\operatorname{Var}(x_1 \mid x_2, y) = \Sigma_{11} - \Sigma_{12}\,\Sigma_{22}^{-1}\,\Sigma_{21}\). If \(x_1\) is barely correlated with the other variables, the adjustment is negligible and the variance of imputed values for \(x_1\) is (close to) the variance of \(x_1\) in the original population.
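Continuing the same made-up numerical example as for the conditional mean, the conditional variance shrinks the marginal variance of \(x_1\) by the analogous adjustment, and multiple imputations are then draws from the resulting Normal:

```python
import numpy as np

# Same hypothetical (x1, x2, y) ~ MVN(mu, Sigma) setup (made-up numbers)
mu = np.array([5.0, 2.0, 1.0])
Sigma = np.array([[4.0, 1.2, 0.8],
                  [1.2, 2.0, 0.5],
                  [0.8, 0.5, 1.0]])
obs = np.array([3.0, 0.5])              # observed (x2, y)

S11 = Sigma[0, 0]
S12 = Sigma[0, 1:]
S22 = Sigma[1:, 1:]

# Conditional mean, as before
mu1_star = mu[0] + S12 @ np.linalg.solve(S22, obs - mu[1:])

# Conditional variance: Var(x1 | x2, y) = S11 - S12 S22^{-1} S21
var_star = S11 - S12 @ np.linalg.solve(S22, S12)

# Draw five imputed values for x1 from N(mu1*, Var(x1 | x2, y))
rng = np.random.default_rng(2024)
imputations = rng.normal(mu1_star, np.sqrt(var_star), size=5)
```

Here `var_star` (about 2.99) is strictly smaller than the marginal variance \(\Sigma_{11} = 4\): the observed predictors are informative, so the imputations are less dispersed than the raw \(x_1\) distribution.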
So, how to generate an imputed dataset?
An iterative procedure is needed: starting from initial estimates of \(\mu\) and \(\Sigma\), we alternately (1) draw the missing values from their conditional distribution given the current parameter estimates, and (2) re-estimate \(\mu\) and \(\Sigma\) from the completed data.
This approach is known as the Gibbs sampler.
A natural choice for the initial estimates of \(\mu\) and \(\Sigma\) is to estimate them from the complete cases only.
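One such imputation run can be sketched as follows, using a simulated bivariate example with made-up parameters (with an outcome \(y\) added, the conditional formulas generalise via the same matrix partition). Note this sketch re-estimates \(\mu\) and \(\Sigma\) from the completed data; a fully Bayesian Gibbs sampler would instead draw them from their posterior.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate a hypothetical bivariate dataset and make ~30% of x1 missing
n = 500
data = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n)
miss = rng.random(n) < 0.3
x1, x2 = data[:, 0].copy(), data[:, 1]

# Initial estimates of mu and Sigma from the complete cases only
complete = data[~miss]
mu = complete.mean(axis=0)
Sigma = np.cov(complete, rowvar=False)

for _ in range(50):                     # imputation cycles
    # I-step: draw missing x1 from its conditional distribution
    cond_mean = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (x2[miss] - mu[1])
    cond_var = Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1]
    x1[miss] = rng.normal(cond_mean, np.sqrt(cond_var))

    # P-step: re-estimate mu and Sigma from the completed dataset
    completed = np.column_stack([x1, x2])
    mu = completed.mean(axis=0)
    Sigma = np.cov(completed, rowvar=False)
```

After the cycles, the estimates of \(\mu\) and \(\Sigma\) computed on the completed data should be close to the parameters used to simulate it.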
Illustration of the Gibbs sampler
How to ensure that we end up in the posterior distribution?
Allow for sufficient imputation cycles!
Repeat the whole process from different starting points to verify convergence
## Imputation via Joint Modelling
Final considerations
Normality assumptions may not always be realistic
Several extensions have been proposed to accommodate this