Random forest and variable importance rankings for correlated survival data, with applications to tooth loss

M. J. Hallett, J. J. Fan, X. G. Su, R. A. Levine, Martha E. Nunn

Research output: Contribution to journal › Article

7 Scopus citations

Abstract

Oral health is a significant issue for adults because of its relationship to quality of life, as well as to systemic health and well-being. Impaired oral health can lead to significant health problems, such as pain and infection. This article considers a tree-based method for assessing tooth loss. In particular, a variable importance measure based on extremely randomized trees (Geurts et al., 2006) is proposed for correlated survival data and is applied to the VA Dental Longitudinal Study. The new variable importance method aims to remove the bias of traditional random forest variable selection, which, as shown by Strobl et al. (2007), may favour input variables with more categories. Trees are built with the multivariate exponential tree algorithm of Fan et al. (2009), which has superior prediction accuracy and computational efficiency compared with marginal and semiparametric frailty model-based trees (Nunn et al., 2011). Simulation studies assessing various variable importance methods are presented. To limit the final number of meaningful prognostic groups, an amalgamation procedure is used to develop tooth prognostic groups from a forest of trees. The resulting prognosis rules and variable importance rankings may be used in clinical practice to increase tooth retention and establish rational treatment plans. By ranking the relative importance of various clinical and genetic factors for tooth loss, we provide clinicians with critical information for developing and implementing effective treatment plans.
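The category bias attributed to Strobl et al. (2007), and the reason randomized split selection helps, can be illustrated with a small null simulation. This is a minimal sketch under simplifying assumptions, not the authors' method: it uses a continuous Gaussian response and squared-error reduction as the split criterion (standing in for the multivariate exponential survival criterion), treats the categories as ordered cut points, and the helper names `split_gain` and `null_gains` are hypothetical. The predictor is pure noise, so any "importance" it earns is bias. An exhaustive search over the k - 1 cut points of a many-category variable finds a larger chance gain than the single cut of a binary variable, while an extremely-randomized-style cut (one cut drawn uniformly between the variable's minimum and maximum) gives roughly the same expected gain for both.

```python
import random
from statistics import fmean


def split_gain(y, x, cut):
    """Reduction in sum of squared errors when splitting on x < cut."""
    left = [yi for yi, xi in zip(y, x) if xi < cut]
    right = [yi for yi, xi in zip(y, x) if xi >= cut]
    if not left or not right:
        return 0.0  # degenerate split: no reduction
    n, nl, nr = len(y), len(left), len(right)
    total, sl, sr = sum(y), sum(left), sum(right)
    # SSE_parent - (SSE_left + SSE_right) simplifies to this between-group term
    return sl * sl / nl + sr * sr / nr - total * total / n


def null_gains(k, n=200, reps=400, seed=0):
    """Mean best-split gain (exhaustive) and random-cut gain for a noise
    predictor with k ordered categories (hypothetical illustration)."""
    rng = random.Random(seed)
    exhaustive, randomized = [], []
    for _ in range(reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]  # response independent of x
        x = [rng.randrange(k) for _ in range(n)]     # uninformative predictor
        # Exhaustive search: best of the k - 1 ordered cut points
        exhaustive.append(max(split_gain(y, x, c) for c in range(1, k)))
        # Randomized cut in the spirit of extremely randomized trees
        lo, hi = min(x), max(x)
        cut = rng.uniform(lo, hi) if hi > lo else hi
        randomized.append(split_gain(y, x, cut))
    return fmean(exhaustive), fmean(randomized)


if __name__ == "__main__":
    few = null_gains(k=2)
    many = null_gains(k=20)
    print("exhaustive search: k=2 ->", round(few[0], 2), " k=20 ->", round(many[0], 2))
    print("random cut:        k=2 ->", round(few[1], 2), " k=20 ->", round(many[1], 2))
```

Because the response has unit variance, the gain of any single fixed split is approximately chi-square with one degree of freedom (mean near 1) regardless of k; taking the maximum over many candidate cuts is what inflates the exhaustive figure for the 20-category variable.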

Original language: English
Pages (from-to): 523-547
Number of pages: 25
Journal: Statistical Modelling
Volume: 14
Issue number: 6
Publication status: Published - Dec 9 2014


All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
