Random forest and variable importance rankings for correlated survival data, with applications to tooth loss

M. J. Hallett, J. J. Fan, X. G. Su, R. A. Levine, Martha E. Nunn

Research output: Contribution to journal (Article)

4 Citations (Scopus)

Abstract

Oral health is a significant issue for adults because of its relationship to quality of life, as well as systemic health and well-being. Impaired oral health can lead to significant health problems, such as pain and infection. This article considers a tree-based method to assess tooth loss. In particular, a variable importance measure based on extremely randomized trees (Geurts et al., 2006) is proposed for correlated survival data, and is applied to the VA Dental Longitudinal Study. This new variable importance method aims to remove the bias of the traditional random forest variable selection, which may favour input variables with more categories, as shown by Strobl et al. (2007). The multivariate exponential tree algorithm of Fan et al. (2009) is used to build trees, as it has superior prediction accuracy and computational efficiency compared to marginal and semiparametric frailty model-based trees (Nunn et al., 2011). Simulation studies for assessing various variable importance methods are presented. To limit the final number of meaningful prognostic groups, an amalgamation procedure is used to develop tooth prognostic groups from a forest of trees. The resulting prognosis rules and variable importance rankings may be used in clinical practice to increase tooth retention and establish rational treatment plans. By ranking the relative importance of various clinical and genetic factors for tooth loss, we are able to provide clinicians with critical information so that they can develop and implement an effective treatment plan.
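The selection bias mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the authors' method: it assumes an uncensored, independent Gaussian outcome rather than correlated survival times, uses a plain sum-of-squares split criterion instead of the multivariate exponential tree statistic, and all function names (`sse_reduction`, `best_gain`, `selection_rate`) are hypothetical. Both predictors are pure noise, so an unbiased rule should pick the 10-level variable about half the time. An exhaustive search over cutpoints tends to pick it far more often, while drawing a single random cutpoint per variable (in the spirit of extremely randomized trees) should bring the rate back near one half.

```python
import random


def sse_reduction(x, y, cut):
    """Reduction in the sum of squared errors of y when splitting on x <= cut."""
    left = [yi for xi, yi in zip(x, y) if xi <= cut]
    right = [yi for xi, yi in zip(x, y) if xi > cut]
    if not left or not right:
        return 0.0

    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    return sse(y) - sse(left) - sse(right)


def best_gain(x, y, rng, randomized):
    """Gain credited to one variable.

    randomized=False: exhaustive search over every cutpoint (classic tree).
    randomized=True: one cutpoint drawn at random (extra-trees style).
    """
    cuts = sorted(set(x))[:-1]  # cutting at the maximum leaves one side empty
    if not cuts:
        return 0.0
    if randomized:
        cuts = [rng.choice(cuts)]
    return max(sse_reduction(x, y, c) for c in cuts)


def selection_rate(randomized, n_rep=400, n=100, seed=1):
    """Share of replicates in which the 10-level noise variable beats the binary one."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_rep):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]      # outcome unrelated to x
        x_bin = [rng.randrange(2) for _ in range(n)]     # 2 levels, 1 cutpoint
        x_many = [rng.randrange(10) for _ in range(n)]   # 10 levels, 9 cutpoints
        gain_many = best_gain(x_many, y, rng, randomized)
        gain_bin = best_gain(x_bin, y, rng, randomized)
        if gain_many > gain_bin:
            wins += 1
    return wins / n_rep


if __name__ == "__main__":
    print("exhaustive search:", selection_rate(randomized=False))
    print("random cutpoint:  ", selection_rate(randomized=True))
```

The design choice here is to compare only two variables so that the fair baseline is easy to read off as 0.5; a fuller study along the lines of the abstract would also need censoring, within-subject correlation, and aggregation of importance over a forest.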

Original language: English
Pages (from-to): 523-547
Number of pages: 25
Journal: Statistical Modelling
Volume: 14
Issue number: 6
DOI: 10.1177/1471082X14535517
State: Published - Dec 9 2014

Fingerprint

Correlated Data
Random Forest
Survival Data
Ranking
Random variable
Health
Frailty Model
Amalgamation
Quality of Life
Longitudinal Study
Prognosis
Semiparametric Model
Tree Algorithms
Pain
Variable Selection
Computational Efficiency
Infection
Simulation Study
Model-based
Prediction

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Cite this

Random forest and variable importance rankings for correlated survival data, with applications to tooth loss. / Hallett, M. J.; Fan, J. J.; Su, X. G.; Levine, R. A.; Nunn, Martha E.

In: Statistical Modelling, Vol. 14, No. 6, 09.12.2014, p. 523-547.

Hallett, M. J. ; Fan, J. J. ; Su, X. G. ; Levine, R. A. ; Nunn, Martha E. / Random forest and variable importance rankings for correlated survival data, with applications to tooth loss. In: Statistical Modelling. 2014 ; Vol. 14, No. 6. pp. 523-547.
@article{41bdcf2f86af442da556c7e8e1de033a,
title = "Random forest and variable importance rankings for correlated survival data, with applications to tooth loss",
author = "Hallett, {M. J.} and Fan, {J. J.} and Su, {X. G.} and Levine, {R. A.} and Nunn, {Martha E.}",
year = "2014",
month = "12",
day = "9",
doi = "10.1177/1471082X14535517",
language = "English",
volume = "14",
pages = "523--547",
journal = "Statistical Modelling",
issn = "1471-082X",
publisher = "SAGE Publications Ltd",
number = "6",

}

TY - JOUR

T1 - Random forest and variable importance rankings for correlated survival data, with applications to tooth loss

AU - Hallett, M. J.

AU - Fan, J. J.

AU - Su, X. G.

AU - Levine, R. A.

AU - Nunn, Martha E.

PY - 2014/12/9

Y1 - 2014/12/9

UR - http://www.scopus.com/inward/record.url?scp=84914666608&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84914666608&partnerID=8YFLogxK

U2 - 10.1177/1471082X14535517

DO - 10.1177/1471082X14535517

M3 - Article

VL - 14

SP - 523

EP - 547

JO - Statistical Modelling

JF - Statistical Modelling

SN - 1471-082X

IS - 6

ER -