Original Research
Derivation and Validation of a Bayesian Network to Predict Pretest Probability of Venous Thromboembolism

https://doi.org/10.1016/j.annemergmed.2004.08.036

Study objective

A Bayesian network can estimate a numeric pretest probability of venous thromboembolism on the basis of values of clinical variables. We determine the accuracy with which a Bayesian network can identify patients with a low pretest probability of venous thromboembolism, defined as less than or equal to 2%.

Methods

Using commercial software, we derived a population of Bayesian networks from 25 input variables collected on 3,145 emergency department (ED) patients with suspected venous thromboembolism who underwent standardized testing, including pulmonary vascular imaging, and 90-day follow-up (11.0% of patients were positive for venous thromboembolism). The best-fit Bayesian network was selected using a genetic algorithm. The selected Bayesian network was tested in a validation population of 1,423 ED patients prospectively evaluated for venous thromboembolism, including 90-day follow-up (8.0% were positive for venous thromboembolism). The Bayesian network probability estimate was normalized to a score of 0% to 100%.

Results

Of 1,423 patients in the validation cohort, 711 (50%; 95% confidence interval [CI] 47% to 52%) had a score less than or equal to 2% that predicted a low pretest probability. Of these 711 patients, 700 (98.5%; 95% CI 97.2% to 99.2%) had no venous thromboembolism at follow-up.
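The proportions above can be recomputed from the reported counts. The abstract does not state which confidence-interval method was used; the sketch below assumes the Wilson score interval (the function name `wilson_interval` is hypothetical), so its bounds may differ slightly in the last digit from the published interval.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Two-sided Wilson score interval for a binomial proportion (z = 1.96 for 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Counts reported for the validation cohort.
n_total = 1423        # all validation patients
n_low = 711           # patients with a score <= 2
n_low_no_vte = 700    # low-score patients without VTE at 90-day follow-up

share_low = n_low / n_total
npv = n_low_no_vte / n_low
lo, hi = wilson_interval(n_low_no_vte, n_low)
print(f"low-score share: {share_low:.1%}")
print(f"no VTE among low-score: {npv:.1%} (Wilson 95% CI {lo:.1%} to {hi:.1%})")
```

An exact (Clopper-Pearson) interval is a common alternative and would give somewhat different bounds.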

Conclusion

A Bayesian network, derived and independently validated in ED populations, identified half of the validation cohort as having a low pretest probability (≤2%); 98.5% of these patients were correctly classified by the network.

Introduction

Prior probability plays a central role in decision making for the diagnostic evaluation of venous thromboembolism.1, 2 Published studies of pretest probability for venous thromboembolism have focused mainly on scoring systems derived from logistic regression analysis or on a point estimate from the solution to the logistic regression equation.3, 4, 5 Logistic regression has several methodologic shortcomings, including collinearity and overestimation of the importance of uncommon covariates.6 Moreover, scoring systems cannot provide a percentage point estimate of pretest probability, which is necessary to compute the percentage point estimate of posttest probability after the results of diagnostic testing are revealed.7 Although the antilog transformation of a logistic regression equation can be solved for a point estimate of probability, solved logistic equations rarely output a probability in the 0% to 5% range.8 Within this 0% to 5% pretest probability range, most physicians consider testing for venous thromboembolism warranted. A useful decision-support system must accurately report a point estimate of pretest probability in the 0% to 2% range, because in this range physicians can justify not testing for venous thromboembolism.9
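The antilog (inverse logit) transformation mentioned above turns a fitted logistic regression equation into a point estimate of probability. This is a minimal sketch; the intercept, coefficients, and variable names are hypothetical and are not estimates from this study or from any published rule.

```python
import math

def logistic_probability(intercept, coefs, x):
    """Inverse logit: p = 1 / (1 + exp(-(b0 + sum_i b_i * x_i)))."""
    logit = intercept + sum(b * x[name] for name, b in coefs.items())
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical coefficients for illustration only; not from this study.
INTERCEPT = -3.0
COEFS = {"age_over_50": 0.4, "hypoxemia": 1.1, "prior_vte": 1.3}

low_risk = {"age_over_50": 0, "hypoxemia": 0, "prior_vte": 0}
high_risk = {"age_over_50": 1, "hypoxemia": 1, "prior_vte": 1}
for label, patient in (("low", low_risk), ("high", high_risk)):
    p = logistic_probability(INTERCEPT, COEFS, patient)
    print(f"{label}-risk profile: {p:.1%}")
```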

The theoretical utility of Bayesian network analysis has been described for the diagnosis of venous thromboembolism.10 Briefly, Bayes' theorem brings quantitative reasoning to the bedside. For the condition of venous thromboembolism, the clinician approaches the patient and formulates a nebulous, unstructured belief about the possibility of venous thromboembolism. At the same time, multiple alternative hypotheses compete for that belief. The clinician then gathers information from the medical history and physical examination that either strengthens or weakens the belief in the diagnosis of venous thromboembolism. Bayes' theorem quantifies the degree of belief adjustment resulting from these observations. Figure 1 illustrates how 1 observation (the pulse oximetry reading) can modify the prior probability of venous thromboembolism.
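For a single observation, Bayes' theorem is often applied in odds form: posterior odds equal prior odds multiplied by the likelihood ratio of the finding. The sketch assumes a 10% prior and made-up likelihood ratios for a low and a normal pulse oximetry reading; these are illustrative values, not the numbers in Figure 1. A Bayesian network generalizes this step by combining many interdependent findings rather than one.

```python
def update_probability(prior, likelihood_ratio):
    """Bayes' theorem in odds form: posterior odds = prior odds * likelihood ratio."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)

# Illustrative values only; these are not the numbers shown in Figure 1.
PRIOR = 0.10           # belief before the pulse oximetry reading
LR_LOW_SPO2 = 2.0      # assumed likelihood ratio for a low reading
LR_NORMAL_SPO2 = 0.6   # assumed likelihood ratio for a normal reading

print(f"after low SpO2:    {update_probability(PRIOR, LR_LOW_SPO2):.1%}")
print(f"after normal SpO2: {update_probability(PRIOR, LR_NORMAL_SPO2):.1%}")
```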

We hypothesized that when the Bayesian network outputs a normalized score less than or equal to 2, the actual pretest probability (or prevalence) of venous thromboembolism would be 2% or less. Thus, a score of less than or equal to 2 represents the equivalent of a “test negative” result: a pretest probability that could support the decision not to order additional diagnostic testing for venous thromboembolism.8
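The hypothesized decision rule amounts to a cutoff on the normalized score. The sketch below (the function and constant names are hypothetical) applies that cutoff and checks the point estimate of the venous thromboembolism rate among low-score validation patients, using the counts reported in the Results (711 patients, 700 without venous thromboembolism).

```python
THRESHOLD = 2.0  # cutoff on the 0-to-100 normalized score, per the hypothesis

def is_test_negative(score, threshold=THRESHOLD):
    """Return True when a normalized Bayesian network score falls at or below the cutoff."""
    if not 0.0 <= score <= 100.0:
        raise ValueError("normalized score must lie in [0, 100]")
    return score <= threshold

# Counts for validation patients with a score <= 2, as reported in the Results.
LOW_SCORE_PATIENTS = 711
LOW_SCORE_WITHOUT_VTE = 700

low_score_with_vte = LOW_SCORE_PATIENTS - LOW_SCORE_WITHOUT_VTE
observed_rate = low_score_with_vte / LOW_SCORE_PATIENTS
print(f"VTE among score <= 2: {low_score_with_vte}/{LOW_SCORE_PATIENTS} = {observed_rate:.2%}")
print("point estimate at or below 2%:", observed_rate <= 0.02)
```

This compares only the point estimate with 2%; whether the confidence interval around it also stays below 2% is a separate question.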


Materials and methods

In this report, we use a custom data-mining tool that applies a genetic algorithm to search all possible Bayesian networks that can be developed from 25 clinical variables collected from the history and physical examinations of 3,145 emergency department (ED) patients who were evaluated for venous thromboembolism. In accordance with methodology used by other experts, the term “venous thromboembolism” refers to a patient evaluated for clinically suspected pulmonary embolism and who was then treated
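The excerpt does not describe the tool's genetic operators or fitness function, so the following is only a generic sketch of genetic-algorithm structure search under stated assumptions: binary variables, a fixed variable order so that every candidate graph is acyclic, the Bayesian information criterion (BIC) as fitness, binary tournament selection, one-point crossover, and bit-flip mutation. All function names and the simulated data are hypothetical.

```python
import itertools
import math
import random

def bic_score(data, parents):
    """BIC of a discrete Bayesian network on binary data (higher is better)."""
    n_rows = len(data)
    total = 0.0
    for child, pa in parents.items():
        # Count child outcomes (0/1) for each observed parent configuration.
        counts = {}
        for row in data:
            key = tuple(row[p] for p in pa)
            counts.setdefault(key, [0, 0])[row[child]] += 1
        # Maximum-likelihood log-likelihood contribution of this node.
        for c0, c1 in counts.values():
            n = c0 + c1
            for c in (c0, c1):
                if c:
                    total += c * math.log(c / n)
        # Penalty: one free parameter per parent configuration for a binary child.
        total -= 0.5 * math.log(n_rows) * 2 ** len(pa)
    return total

def decode(genome, pairs, n_vars):
    """Each bit switches on edge i -> j for a pair i < j, so the graph is always acyclic."""
    parents = {v: [] for v in range(n_vars)}
    for bit, (i, j) in zip(genome, pairs):
        if bit:
            parents[j].append(i)
    return parents

def genetic_search(data, n_vars, pop_size=30, generations=40, p_mut=0.05, seed=0):
    rng = random.Random(seed)
    pairs = list(itertools.combinations(range(n_vars), 2))
    cache = {}

    def fitness(genome):
        key = tuple(genome)
        if key not in cache:
            cache[key] = bic_score(data, decode(genome, pairs, n_vars))
        return cache[key]

    # Random sparse initial population, plus the empty graph as a baseline.
    pop = [[rng.random() < 0.2 for _ in pairs] for _ in range(pop_size - 1)]
    pop.append([False] * len(pairs))
    for _ in range(generations):
        scored = [(fitness(g), g) for g in pop]

        def pick():
            # Binary tournament selection.
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        children = [max(scored, key=lambda t: t[0])[1]]  # elitism keeps the best graph
        while len(children) < pop_size:
            mom, dad = pick(), pick()
            cut = rng.randrange(len(pairs) + 1)  # one-point crossover
            child = mom[:cut] + dad[cut:]
            child = [(not b) if rng.random() < p_mut else b for b in child]
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    return decode(best, pairs, n_vars), fitness(best)

def simulate(n_rows=500, seed=1):
    """Hypothetical data with a chain X0 -> X1 -> X2 and an independent X3."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        a = int(rng.random() < 0.3)
        b = int(rng.random() < (0.8 if a else 0.1))
        c = int(rng.random() < (0.7 if b else 0.2))
        d = int(rng.random() < 0.5)
        rows.append((a, b, c, d))
    return rows

data = simulate()
parents, score = genetic_search(data, n_vars=4)
print(parents, round(score, 1))
```

Fixing the variable order keeps every genome a valid directed acyclic graph, at the cost of excluding edges that point against that order; a real search over all possible networks would need an acyclicity check or repair step instead.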

Results

The clinical characteristics of the derivation and validation populations are shown in Table 1. This table shows the mean values for continuous variables (age and vital signs) and the frequencies of dichotomous nodes in both populations.

From the derivation population, the variables in Table 1 served as the input nodes used to construct a large number of potential Bayesian networks. Each candidate network was then tested for fitness using the genetic algorithm. Specifically, the genetic algorithm

Limitations

On the basis of conventional grading of decision-rule validity, this report provides only level III evidence.18 The data show that the derived Bayesian network can support the decision to rule out venous thromboembolism only in a patient population with a low disease prevalence. In the derivation population (prevalence 11%), 2.0% of patients with a score less than or equal to 2% had a venous thromboembolism. This outcome rate overlapped the computed test threshold for venous thromboembolism of

Discussion

Pretest probability assessment refers to a complex process of predicting the probability of an unknown feature—a disease—according to known features, such as signs and symptoms. In this report, we derive and test a Bayesian network to predict the pretest probability of venous thromboembolism using data obtained with a standard medical history and physical examination. This report represents the first step toward developing a Bayesian network tool to transform clinical data at the bedside into a

Acknowledgements

We thank Francis Fesmire, MD, for his contribution to this work.

The following institutions participated in the derivation study: Elizabeth G. Israel, MD, Barnes Hospital, St. Louis, MO; Christopher Kabrhel, MD, Carolinas Medical Center, Charlotte, NC; Jeffrey A. Kline, MD, Detroit Receiving Hospital, Detroit, MI; Brian J. O'Neil, MD, Henry Ford Hospital, Detroit, MI; David C. Portelli, MD, Mayo Clinic, Phoenix, AZ; Peter B. Richman, MD, Northwestern Memorial Hospital, Feinberg School of

References (23)

  • P.S. Wells et al. Derivation of a simple clinical model to categorize patients' probability of pulmonary embolism: increasing the models utility with the SimpliRED d-dimer. Thromb Haemost (2000).

Author contributions: JAK and AJN conceived the study. JAK, PBR, CK, and DMC obtained study funding and supervised data collection, including quality control. JAK, PBR, CK, and DMC contributed substantially to data collection, data entry, and data analysis. AJN constructed and tested the Bayesian network. JAK performed other statistical analysis. JAK drafted the manuscript, and all authors contributed substantially to its revision. JAK takes responsibility for the paper as a whole.

Funding and support: The authors report that this study did not receive any outside funding or support.

Reprints not available from the authors.
