## Abstract

The ability to predict patient visits to emergency departments (ED) is crucial for designing strategies aimed at avoiding overcrowding. A good working knowledge of the mathematical models used to predict patient volume and of their results is therefore essential. Articles retrieved by a Medline search were reviewed for studies designed to predict patient attendance at ED or walk-in clinics. Nine studies were identified. Most of the models used to predict patient volume were either linear regression models including calendar variables or time series models. These models explained 31–75% of patient-volume variability. Although the day of the week had the strongest effect, this variable explained only part of the variability. Other causes of this variability are to be defined. However, the performance of the models was good, with errors ranging from 4.2% to 14.4%. Adding meteorological data failed to improve model performance. The mathematical methods developed to predict ED visits have a low rate of error, but the prediction of daily patient visits should be used carefully and therefore does not allow day-to-day adjustments of staff. ED directors or managers should be aware of the model limitations. These models should certainly be used on a larger scale to assess future needs.

## INTRODUCTION

Overcrowding and waiting times in emergency departments (ED), which reflect healthcare system dysfunction,1 2 are generating considerable research interest.3–6 ED are experiencing increasing stress from year to year.7 The many challenges faced by ED, which are largely similar across industrialised countries, include the increasing complexity of health conditions managed in ED, inpatient bed shortages, increasing patient volumes, provision in the ED of aggressive treatments aimed at making admission unnecessary and long outpatient wait times to obtain diagnostic investigations.8 9 The number of patient visits has been rising steadily for years. In 1994 in the UK, the increase was estimated at 3–5% per year and 40% of admissions occurred via the ED,3 whereas a 27% increase in ED patient volume was noted in California, USA, in the 1990s.5 Recent research on ED overcrowding has taken into account the number of patients in the ED,10 the impact of patient surges11 and the techniques used to measure ED overcrowding.12 The development of models on emergency census10 indicates a need for predicting patient visits more accurately, even if overcrowding is widely linked to inpatient bed shortages. Therefore, ED are a key aspect of hospital boarding, and the planning of admissions calls increasingly upon mathematical techniques. Negotiations between administrators and medical teams should refer to a shared knowledge of these techniques and their known results in similar applications.

The objective of this article is to review the mathematical models designed to improve the prediction of fluctuations in patient visits. We will not deal with syndromic surveillance aimed at detecting outbreaks of a specific disease.13

Our review aims to present forecasting techniques to ED physicians in order to help them plan activities and thus anticipate the medical and nursing staff needed to match the demand created by ED visits.

We will first describe the various statistical models used to predict patient visits to ED or walk-in clinics, then we will compare the objectives and results of these studies in an effort to highlight the impact of predictions on the organisation of ED or on their integration within the overall healthcare system. Improved knowledge of the mechanisms whose combined effects cause ED overcrowding can allow ED physicians to participate in decisions about ED operations.

## MATERIALS AND METHODS

### Article identification and selection

We looked for studies of mathematical models designed to predict patient visits to ED or walk-in clinics. First, we searched Medline (via PubMed). None of our search strategies designed to achieve both high sensitivity and high specificity were successful. We elected to give preference to sensitivity. The search strategy “Emergency Service [mh] AND (forecasting [all] or scheduling) AND (simulation OR models, theoretical [mh])” retrieved 201 articles; date of access was 23 September 2007. Based on the abstracts, we selected the articles whose main objective involved predicting patient visits to ED or walk-in clinics. We then excluded articles focusing on topics peripheral to this main objective, such as disease-scoring or admission-prediction tools, syndromic surveillance and computer simulations based on prediction models. These criteria selected six14–19 of the 201 articles. From the reference lists of the selected articles, we identified two additional, older studies.20 21 Finally, another article22 was found by using the Science Citation Index (Web of Science, Thomson Reuters, New York, USA) to look for publications citing any of the eight selected articles. Our study is thus based on nine articles.

## FORECASTING TECHNIQUES

### Approaches used to model patient visits to ED

The number of patient visits is now usually recorded automatically in ED in industrialised countries. As an example, fig 1 shows patient visits to the ED of a teaching hospital (Hôpital Avicenne, Bobigny, France) over a 2-year period. The mean number of patient visits was 83 per day. Plotting the number of visits per day over time yields a noisy signal that is difficult to interpret and shows no obvious seasonal or cyclical pattern.

### Mathematical techniques

In the studies retrieved by our search, two types of techniques were used to model patient visits to ED. One method consists of looking for correlations, which are often linear, between patient visits and a number of independent variables, such as calendar variables or weather patterns. The second method views patient visits as a time series and predicts future values from past values.

#### Conventional linear regression models

Although many variants of linear regression models exist, they all lead to an equation of the type

$$N_{pats} = \alpha + \sum_{i=1}^{k} \beta_i X_i + e$$

where Npats is the number of patients per day estimated by the model, α is a constant, **X** is the vector of the k variables taken into account (day of the week for instance), β is the weighting coefficient of each of these variables and e is a random component added to the model.
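As a minimal sketch of such a model, assume that day of the week is the only covariate, coded as six indicator variables with Monday as the reference day. In that special case the least-squares estimates reduce to group means: α is the mean Monday count and each β is the difference between a weekday's mean and the Monday mean. The counts and the names `fit_weekday_model` and `predict` are hypothetical and serve only as an illustration; they are not taken from the reviewed studies.

```python
from statistics import mean

def fit_weekday_model(counts, weekdays):
    """Least-squares fit of Npats = alpha + sum(beta_i * X_i) + e,
    where the X_i are indicators for Tuesday..Sunday (Monday = reference).

    counts   -- daily visit counts
    weekdays -- matching weekday codes, 0 = Monday ... 6 = Sunday
    """
    by_day = {d: [] for d in range(7)}
    for n, d in zip(counts, weekdays):
        by_day[d].append(n)
    alpha = mean(by_day[0])                              # Monday mean
    beta = {d: mean(by_day[d]) - alpha for d in range(1, 7)}
    return alpha, beta

def predict(alpha, beta, weekday):
    """Predicted visits for a weekday (beta is 0 for the reference day)."""
    return alpha + beta.get(weekday, 0)

# Hypothetical data: two weeks of daily counts, starting on a Monday.
counts = [95, 84, 80, 82, 81, 78, 76, 97, 86, 82, 80, 83, 80, 74]
weekdays = [d % 7 for d in range(14)]

alpha, beta = fit_weekday_model(counts, weekdays)
sunday_forecast = predict(alpha, beta, 6)
```

With more covariates (month, holidays, trend) the coefficients no longer reduce to simple means and a general least-squares routine from a statistics package would be needed.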

To select the variables that should be included in the regression model, one method is backward stepwise selection, in which the initial inclusion of all the variables into the model is followed by the exclusion of one non-significant variable at a time, until only variables significantly associated with patient visits (Npats) are left.

#### Time series analysis

Time series analysis is also employed in forecasting, but the underlying models are more difficult to understand, which justifies a review of how they are constructed.

Time series models usually estimate patient visits as the net result of three components: (1) long-term trends; (2) short-lived and often cyclical changes (related for instance to season, weather, or day of the week) and (3) the effects of unexpected, random events. A specific challenge encountered when analysing patient visits to ED is that both season and day of the week may lead to cyclical variations. The system underlying the time series (which is unknown) is considered as a sum of shocks affecting the time series, the observed variations corresponding to the responses to these shocks.23

The autoregressive integrated moving average (ARIMA) model24 is among the most widely used time series methods. The model is characterised by three processes, with three corresponding parameters, p, d and q. The autoregressive process treats each value in the time series as a linear function of p past values. The integration process (I) relates to the long-term trend of the time series and assumes that the difference between two consecutive values is invariable: the usual method consists of calculating a new series by subtracting the past value from the current given value and in that case, the parameter d of the process is equal to 1 (when repeating this calculation on this new series—this is only occasionally useful—d = 2). The moving average process assumes that each value depends not only on the error intrinsic to the value, but also on the sum of the errors affecting q past values. Graphical representations of the autocorrelation function and of the partial autocorrelation function of the series provide visual clues about the values of p, d and q.25 As an example, fig 2 shows how two parameter sets model signals represented in fig 1. The parameter values of fig 2(B) provide a more accurate prediction than those of fig 2(A).
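The integration and autoregressive steps can be sketched in a few lines. This is not a full ARIMA implementation (there is no moving average term and no automatic choice of p, d and q), only an illustration under simplifying assumptions: the series values are hypothetical, the autoregressive part is restricted to p = 1 without a constant term, and the function names `difference` and `fit_ar1` are invented for this example.

```python
def difference(series, d=1):
    """Apply d rounds of first differencing (the 'I' step of ARIMA)."""
    for _ in range(d):
        series = [cur - prev for prev, cur in zip(series, series[1:])]
    return series

def fit_ar1(series):
    """Least-squares estimate of phi in x_t = phi * x_{t-1} + error
    (p = 1, no constant term; assumes the series is centred on zero)."""
    num = sum(prev * cur for prev, cur in zip(series, series[1:]))
    den = sum(prev * prev for prev in series[:-1])
    return num / den

# Hypothetical series with a linear trend: one differencing pass (d = 1)
# turns it into a constant series.
trend = [80 + 2 * t for t in range(6)]
flat = difference(trend, d=1)

# Hypothetical noise-free AR(1) series in which each value is half the previous one.
ar = [16.0, 8.0, 4.0, 2.0, 1.0]
phi = fit_ar1(ar)
next_value = phi * ar[-1]    # one-step-ahead forecast
```

In practice these parameters are estimated jointly by a statistics package; the sketch only shows what each process does to the data.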

### Estimating and evaluating predictions

Evaluating the performance of a patient-visit model usually involves the following steps (fig 3). First, ED patient-visit data are collected over a predefined period of time and at a predefined frequency (usually hourly or daily). The sample obtained is divided into two sets (fig 3, step 1), the training set and the validation set. The training set serves to estimate the optimal values of the parameters used in the prediction equation, whereas the validation set serves to evaluate the performance of the model: the equation containing the parameter values estimated from the training set is used to predict patient visits (fig 3, step 2), and these predicted visits are compared with the visits measured in the validation set (fig 3, step 3).
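The three steps can be sketched as follows. The counts are hypothetical, the 75% training fraction is an arbitrary choice, and the split is assumed to be chronological (earliest data for training, latest for validation), which the reviewed studies do not all specify. The "model" is deliberately trivial, the training-set mean, so that the workflow rather than the model is visible.

```python
from statistics import mean

def chronological_split(series, train_fraction=0.75):
    """Step 1: keep time order; the earliest part trains, the latest validates."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]

# Hypothetical daily visit counts.
visits = [83, 79, 88, 91, 76, 84, 80, 86]
train, validation = chronological_split(visits)

# Step 2: estimate the model on the training set and predict the
# validation period (placeholder model: the training mean for every day).
level = mean(train)
predicted = [level] * len(validation)

# Step 3: compare predictions with the visits actually measured.
errors = [p - o for p, o in zip(predicted, validation)]
```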

When reviewing criteria used for assessing model performance, we identified three main criteria: the percentage of explained variability (R^{2}) in regression analyses, the mean absolute percentage error (MAPE) and the root mean square error (RMSE). R^{2} corresponds to the percentage of variability explained by the model. A higher R^{2} value indicates a better model fit. MAPE is the mean of the absolute differences between predicted values and measured values, these differences being expressed as percentages of the measured values. RMSE is the square root of the mean of the squared differences between predicted and observed values. Low values of RMSE and MAPE indicate good model performance; that is, a good match between predicted and measured values. The value of RMSE depends on the mean value of the variable, which hinders comparisons of models for predicting variables that have different mean values. We believe that MAPE is easier to understand for comparison purposes in this situation.
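The three criteria can be written out directly from their definitions. The sketch below uses hypothetical observed and predicted counts; R^{2} is computed as one minus the ratio of residual to total sum of squares, which is an assumption about the exact formula used in the reviewed studies.

```python
from math import sqrt

def mape(observed, predicted):
    """Mean absolute percentage error, in percent of the observed values."""
    return 100 * sum(abs(p - o) / o for o, p in zip(observed, predicted)) / len(observed)

def rmse(observed, predicted):
    """Square root of the mean squared difference (same unit as the data)."""
    return sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def r_squared(observed, predicted):
    """Share of the variability of the observed values explained by the model."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Hypothetical daily counts for a small department...
observed = [80, 100, 120, 100]
predicted = [88, 90, 114, 100]

# ...and for a department ten times busier with the same relative errors.
observed_big = [10 * o for o in observed]
predicted_big = [10 * p for p in predicted]
```

Here `mape` gives the same value for both departments, whereas `rmse` is ten times larger for the busier one, which illustrates why MAPE is easier to compare across sites with different mean volumes.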

Periodic fluctuations can also be evaluated using spectral analysis, most notably fast Fourier transform.26 In this mathematical approach, any signal or function is viewed as the sum of several cyclical functions. An advantage is that this purely mathematical method makes no assumptions about the variable. However, the results are given in the frequency domain, which complicates the interpretation of events and of the relations linking events.
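To illustrate the idea, the sketch below decomposes a hypothetical series into cyclical components and reports the period of the strongest one. It uses a plain discrete Fourier transform written out term by term, not the fast algorithm, which gives the same coefficients but is slower on long series. The series (a weekly cycle plus a weaker 28-day cycle around a mean of 83 visits) and the function name `dominant_period` are assumptions made for the example.

```python
import cmath
import math

def dominant_period(series):
    """Return the period (in samples) of the strongest cyclical component,
    found with a plain discrete Fourier transform."""
    n = len(series)
    centre = sum(series) / n
    centred = [x - centre for x in series]     # remove the zero-frequency term
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):             # k = number of cycles over the series
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(centred))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k

# Hypothetical 8 weeks of daily counts: strong weekly cycle, weaker 28-day cycle.
series = [83 + 10 * math.cos(2 * math.pi * t / 7) + 3 * math.cos(2 * math.pi * t / 28)
          for t in range(56)]
period = dominant_period(series)
```

Real visit series are not sinusoidal: a sharp Monday peak, for instance, spreads its energy over harmonics of the weekly frequency, which is one reason frequency-domain results are harder to interpret.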

## RESULTS

### Data used for patient-visit modelling and methods for reporting results

Studies of patient-visit models used data recorded in ED or walk-in clinics. The daily number of patient visits was generally used, over a period ranging from 1 to 10 years (usually 3 years). Covariables entered into conventional linear models included day of the week, month, season, whether the day was a non-workday or the day after a non-workday (or the day before a non-workday in one study) and vacation periods; obviously, seasons and holidays varied between countries. The long-term patient-visit trend was usually evaluated based on the chronological order of the day in the sample (with the first day having the value 1, the hundredth day the value 100, etc). Meteorological data consisted of temperature and precipitation.
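The calendar covariates described above can be derived from the date alone. The sketch below is one possible coding, under stated assumptions: weekends are treated as non-workdays, public holidays come from a hypothetical user-supplied set, and the function name `calendar_covariates` and the dictionary keys are invented for the example. Both assumptions would need adapting to the country concerned.

```python
from datetime import date, timedelta

def calendar_covariates(day, first_day, holidays=frozenset()):
    """Build the calendar covariates for one date.

    holidays -- hypothetical set of public-holiday dates; Saturdays and
                Sundays are assumed to be non-workdays.
    """
    def is_non_workday(d):
        return d.weekday() >= 5 or d in holidays

    return {
        "day_of_week": day.weekday(),                  # 0 = Monday ... 6 = Sunday
        "month": day.month,
        "non_workday": is_non_workday(day),
        "after_non_workday": is_non_workday(day - timedelta(days=1)),
        "trend": (day - first_day).days + 1,           # first day = 1, hundredth day = 100
    }

# Hypothetical sample starting on Monday 1 January 2007, with 9 April 2007
# supplied as a public holiday.
first = date(2007, 1, 1)
row = calendar_covariates(date(2007, 4, 10), first, holidays={date(2007, 4, 9)})
```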

Table 1 summarises the prediction models used in the nine selected articles. Daily data were used in five articles14 17 18 20 21 and monthly data in two articles.15 22 The data collection period varied widely, from a few months14 19 to nearly 10 years.16 17 22 Studies that used linear regression models14 18 20 21 consistently found that the day of the week was the best predictor of patient visits. Patient visits were significantly greater on the day following the weekly day off (Monday in Western countries). Prediction errors were smaller for walk-in clinics than for ED. Patient visits seemed far steadier in walk-in clinics. R^{2} values were significant, even if in the best cases the models failed to account for approximately 25% of the variability. MAPE was 4.23%, a good value, in a study of a healthcare clinic in which patient visits were similar from week to week,22 with little random variation, compared with approximately 10% for ED visits.17 When available, MAPE and RMSE ranged from 4.2% to 14.4%, indicating good statistical predictability. Furthermore, these studies indicate considerable random variability in patient visits to ED, even compared with those in walk-in clinics. Looking at the mean level of visits, the explained variability is higher in centres with the highest activity. Meteorological data failed to improve the reliability of the models substantially. In addition, meteorological data are of limited usefulness for long-term predictions, as they lack reliability over long periods.

The main objective of forecasting is to adjust staffing to activity. However, only one paper14 gives numerical data regarding the improvement of staffing, with an 18.5% decrease in patients who “left without being seen” as a surrogate marker.

## DISCUSSION

This systematic review describes the models for forecasting the number of ED visits. In order to guarantee that the techniques described were of medical interest and available to practitioners, we limited our review to articles referenced in the Medline database. Most of the studies sought to predict daily patient visits, although a few focused on annual visits over 10 years,16 monthly visits,22 or hourly visits.19 These differences hinder comparisons of techniques and results. A reasonable assumption, however, is that annual visit prediction mainly involves detecting a trend, with limited influence of stochastic variations. Similarly, visits per month are probably reproducible, given the consistent role of season in visit variations.15 18 20 21 27 What is constant is the Monday (Sunday in the Israeli study18) effect in adult ED. Possible explanations include the return of patients from a weekend absence or the return of primary care practitioners to their offices, from which they send patients to the ED.14

Covariables that express cyclical patterns may change over time. Over the long periods needed for time series studies, changes may occur in patterns of patient visits to ED or walk-in clinics. This possibility casts doubt on the validity of using very long time series for building ED patient-visit models. The wide fluctuations in patient visits are multifactorial. Seasonal variations often affect both the number of patient visits, a well-known example being the “winter crisis”,28 and the reasons for patient visits.29 In addition to seasonality, the existence of a weekly cycle has been firmly established.27 30 Other temporal variables may affect patient visits, including holidays, local events (eg, fairs and exhibits) and international events (eg, sporting events). Finally, whether the degree of air pollution or weather conditions affect patient visits remains controversial.31 Syndromic surveillance is not intended to predict patient visits, as disease-specific patient-visit patterns differ from the overall pattern32 and seasonality occurs for many diseases, including conditions that seem unrelated to infectious agents.29

Even though ED forecasting and simulation33 seem of great interest for ED managers, we found few studies designed to predict patient visits to ED and walk-in clinics in the medical literature, and the results of published studies cannot be readily generalised to other ED, as pointed out consistently by the authors. Our review did not highlight significant performance differences between the various model approaches. Therefore, the calendar model14 appears to be the easiest model to understand and is also appropriate for communication to non-statisticians. The model requires only a few calculations; it may be better suited to walk-in clinics than to ED, as the R^{2} of the model is higher for clinics than for ED.

The literature suggests that the objective of adjusting staffing patterns to forecasting can theoretically be accomplished. However, there is only limited room for manipulating the number of staff members available in the ED. Predicting the required number of ED staff members over several years is of obvious interest to healthcare facility managers, whereas month-to-month adjustments are difficult to achieve. Predicting the busiest days, on the other hand, may serve to adjust shift patterns and work schedules based on the expected workload. Caution is required, however, as the considerable contribution of unpredictable variations to patient visit fluctuations might result in staffing shortages and, therefore, in decreased quality of care, particularly as chronic understaffing of ED is common.7 ED staff must not only deal with new arrivals, unexpected peaks in arrivals leading to an increase of patient census for hours,11 but also manage patients who are already in the ED. There is a strong negative correlation between the number of patients held up in the ED (known as “boarders”) and the number of available beds. The ED length of stay thus correlated linearly with hospital bed occupancy.34 35 In most studies, ED length of stay increased when hospital bed occupancy exceeded 90%. In contrast, controversy surrounds the potential impact on ED length of stay of the arrival density index, which takes into account both patient visits and clustering of patient arrivals.10 35 Organisation of an ED must be centred on the input–throughput–output concept,36 with a model adapted to the arrival patterns.10

## CONCLUSION

Studies aimed at predicting patient visits to ED use models that assume a stationary process. Under this assumption, model performance is very acceptable, and these forecasts should be incorporated into new models of emergency census, within the limits of what is feasible in the ED. Given the good results obtained for higher patient volumes, forecasting models should be studied on a larger scale to assess future needs in the short to mid-term.

## REFERENCES

## Footnotes

This review is dedicated to the memory of Dr Philippe Hoang The Dan.

**Competing interests:** None.