Background and objective The hypothesis of the present work derives from clinical experience that suggests that patients who are more ill have less facial expression variability in response to emotional cues.
Methods Prospective study of diagnostic accuracy from a convenience sample of adult patients with dyspnoea and chest pain in an emergency department. Patients viewed three stimulus slides on a laptop computer that were intended to evoke a change in facial affect. The computer simultaneously video recorded patients’ facial expressions. Videos were examined by two independent blinded observers who analysed patients’ facial expressions using the Facial Action Coding System (FACS). Patients were followed for predefined serious cardiopulmonary diagnosis (Disease+) within 14 days (acute coronary syndrome, pulmonary embolism, pneumonia, aortic or oesophageal disasters or new cancer). The main analysis compared total FACS scores, and action units of smile, surprise and frown between Disease+ and Disease−.
Results Of 50 patients, 8 (16%) were Disease+. The two observers had 92% exact agreement on the FACS score from the first stimulus slide. During stimulus slide 1, the median of all FACS values from Disease+ patients was 3.4 (1st–3rd quartiles 1–6), significantly less than the median of 7 (3–14) from Disease− patients (p=0.019, Mann–Whitney U). Expression of surprise had the largest difference between Disease+ and Disease− (area under the receiver operating characteristic curve 0.75, 95% CI 0.52 to 0.87).
Conclusions With a single visual stimulus, patients with serious cardiopulmonary diseases lacked facial expression variability and surprise affect. Our preliminary findings suggest that stimulus-evoked facial expressions from emergency department patients with cardiopulmonary symptoms might be a useful component of gestalt pretest probability assessment.
- clinical assessment
- pulmonary embolism
It has been previously shown that adults who present with symptoms of chest pain and shortness of breath evoke concern for several life threatening conditions including pneumonia, acute coronary syndrome (ACS), pulmonary embolism (PE), heart failure, pneumothorax, mediastinal disease processes and uncommonly, aortic dissection.1 Rapid and accurate diagnosis can improve outcome. Decision aids and prediction rules have been developed which convert objective data gathered at the bedside into numeric values to estimate the pretest probability of specific diseases, such as ACS and PE.2 Pretest probability assessment is of major importance to care processes. For example, a high pretest probability justifies immediate treatment and definitive, often costly and invasive testing; lower estimates can be used to justify the use of less expensive, less invasive diagnostic tests; and the lowest estimates can be used to avoid testing altogether. Pretest probability can be assessed several ways, including scoring systems, computerised methods, Bayesian network methodology or by implicit estimation.3–6 Many physicians prefer to use the implicit or empiric approach, alternatively referred to as the gestalt method of pretest probability assessment for both ACS and PE.7–9 Gestalt reasoning appears to appeal to physicians because it is internally generated, without the need for a reference, and can invoke human thought processes, making it highly adaptable between patients.
In 2011, a meta-analysis of pretest probability found that the unstructured, or clinical gestalt method for PE had similar diagnostic accuracy compared with objective methods.2 In general, the addition of gestalt assessment appeared to improve performance of decision rules.10 ,11 Many of the so-called objective decision rules for both ACS and PE actually contain a subjective variable that requires the user to assess whether the patient has a presentation that is typical for ACS or PE.3 ,12 ,13 These variables often contribute heavily to rule performance.14 These data indicate a strong basis for adding gestalt to pretest probability assessment. However, some evidence has suggested that the accuracy of gestalt reasoning increases with clinical experience, which introduces inter-clinician variability that causes some physicians to be uncomfortable with its use.6 ,15–17 A related criticism is that gestalt assessment requires implicit and hidden human thought processes. Thus, gestalt reasoning may have pragmatic benefits, but at a cost of problems with transparency of how it is estimated. Taken together, these issues argue the importance of elucidating the elements of gestalt reasoning. If these elements can be identified and made explicit, they can be used to teach physicians how to improve their accuracy at constructing a differential diagnosis and prioritising the need for diagnostic testing, including tests that impart ionising radiation.1
Clinical experience and prior literature support two concepts that motivated the analysis of facial expressions in patients with chest pain and shortness of breath. First, clinicians probably perceive emotional tension in the face of patients, and use it in their gestalt reasoning. Second, prior literature suggests a reciprocal relationship between the emotional state expressed by facial muscles and cardiopulmonary health. For instance, prior work has found that observers interpret ‘sick’ facial expressions as conveying negative emotions, including disgust, anger and contempt.18 A substantial body of literature has linked emotional tension in the face with vagal tone and airway conductance in patients with reactive airway disease.19 Furthermore, Rosenberg et al and Dalton et al found that patients with myocardial ischaemia had significantly more facial expressions of anger, non-enjoyment smiles and brow lowering than patients without myocardial ischaemia.20 ,21 In this work, we use the Facial Action Coding System (FACS) that measures facial movement on a numeric scale.22 The primary aims were to determine the maximum change in facial muscle contractions from baseline to that observed during reaction to visual stimuli, and test whether the numeric values of these deflections accurately predict the presence or absence of significant acute cardiopulmonary disease using receiver operating characteristic (ROC) curve analysis.
This was a single centre pilot study conducted in the emergency department from May to September 2011 at Carolinas Medical Center, an academic, urban tertiary care hospital. This study had approval from the institutional review board and all patients signed an informed consent form.
We enrolled a convenience sample of adult emergency department patients who were identified by survey of the emergency department's electronic tracking board from 9:00 to 16:00 during weekdays for the chief complaint of chest pain and shortness of breath. This work comprised a subset of patients enrolled in a larger study, described more extensively elsewhere.1 Exclusion criteria were known diagnosis, inability to understand the informed consent process because of acuity of illness (eg, symptomatic arterial hypotension or severe respiratory distress), intoxication, altered mental status, severe visual impairment, dementia or reasons to preclude follow-up. All patients had to have both complaints in either their present history or review of systems. Table 1 shows the clinical characteristics of the enrolled patient population. The only significant past medical history in this patient sample was prior chronic obstructive pulmonary disease and myocardial infarction. The median duration of chest pain prior to enrolment was 24 h (1st–3rd quartiles 5–72).
Video recordings of participants’ facial expressions were analysed using the FACS. The FACS is a manual coding procedure that assigns numbers to changes in facial muscle activity, referred to as action units (AUs).22 For example, AU1 represents the raising of the inner portion of the eyebrows, a visual change in facial appearance that occurs when the medial portion of the frontalis muscle is contracted. In addition to coding the presence/absence of AUs, FACS also scores the intensity of the appearance change on a five point scale, with A indicating only a trace of movement and E indicating maximum evidence of movement. FACS allows for coding of co-occurring AUs, termed events. Events more effectively describe facial configurations as a whole and capture unique interactions between muscle movements that only occur when two muscles are activated and contracted simultaneously. For the current study, facial configurations for smile, frown and surprise were analysed.
Diagnostic outcome of the patient
All participants in this study underwent CT pulmonary angiography (CTPA); its results plus results from 14-day follow-up were used to determine presence or absence of serious cardiopulmonary disease.1 All patients deemed disease negative (Disease−) were alive and well on telephone follow-up with no serious diagnosis. For the present study, explicit criterion standard definitions of newly diagnosed serious diseases (Disease+) included acute PE, ACS, aortic dissection, ventricular arrhythmias, dangerous mediastinal process (Boerhaave syndrome, pneumomediastinum or large mass), pneumothorax or pneumonia as interpreted by a radiologist on objective imaging and followed by appropriate treatment actions by the clinical care team.1 ,23 ACS was diagnosed using published standards, including need for revascularisation (percutaneous or surgical) or myocardial infarction based on troponin elevated above the 99th percentile for the precision level of the test.24 Exacerbations of existing chronic conditions, including atrial fibrillation, heart failure, or existing lung disease were not considered new threats to life.
After informed consent, participants were given instructions. They were told they were going to be shown a series of slides, and that while they viewed the slides, their facial expressions would be recorded with the computer webcam. All patients watched the stimulus slideshow in a 45° semi-Fowler's position on a gurney. The study associate then positioned a small laptop computer (MacBook Air, Apple, Cupertino, California, USA) on a Mayo stand approximately 24 inches in front of the subject. The computer provided an 11.6-inch diagonal LED-backlit screen with 1366×768, 16:9 resolution. The computer's webcam was used to record participants’ facial expressions. The computer was programmed using Mac OS X to display a five-slide presentation that included a text-only introduction slide, three stimulus slides (figure 1) and a closing slide. Each slide was shown for 10 s. The computer video recorded the patient's face in the centre of the screen while the subject viewed the slides. The stimulus slides were chosen to depict images consistent with specific emotional states; the first two stimulus slides were intended to convey humour, and the third to convey sadness.
FACS coding procedures
Two assessors who read and practised the entire FACS manual and methodology viewed the digitised video recordings for the FACS analysis.22 The assessors were instructed to obtain the FACS scores at the moment of maximal response in the first few seconds after the patient viewed the stimuli. FACS scores were computed for each stimulus slide, resulting in three sets of FACS scores for each participant. AU groups were analysed for clinically relevant component responses of the natural smile (movements AU2 [outer brow raiser], AU12 [lip corner puller], AU14 [dimpler], and AU25 [lips part]) and surprise (AU1 [inner brow raiser], AU2 [outer brow raiser] and AU5 [upper lid raiser]) in response to each stimulus. We also examined the frown (AU24 [brow lowerer]). The magnitude of movement for each AU group ranged from 0 (no deviation from neutral) to a maximum of 5 (maximum deviation from neutral). Thus, the maximum scores were 20, 15 and 5 for smile, surprise and frown, respectively. The assessors were blinded to each other's results and to the patient's outcome.
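The summed scoring described above can be illustrated with a minimal Python sketch. It assumes each AU has already been assigned an integer intensity from 0 to 5 by an observer; the names `AU_GROUPS`, `group_scores` and `max_scores` are hypothetical and are not taken from the study's materials. The AU groupings are copied as listed in the text.

```python
# Illustrative sketch only: AU groupings are copied from the text above,
# and all identifiers are hypothetical (not from the study's own software).
AU_GROUPS = {
    "smile": ("AU2", "AU12", "AU14", "AU25"),
    "surprise": ("AU1", "AU2", "AU5"),
    "frown": ("AU24",),
}
MAX_INTENSITY = 5  # 0 = no deviation from neutral, 5 = maximum deviation


def group_scores(intensities):
    """Sum per-AU intensities (0-5) into smile, surprise and frown scores.

    `intensities` maps an AU label to its coded intensity; AUs that were
    not observed may be omitted and are treated as 0.
    """
    scores = {}
    for name, units in AU_GROUPS.items():
        total = 0
        for au in units:
            value = intensities.get(au, 0)
            if not 0 <= value <= MAX_INTENSITY:
                raise ValueError(f"{au} intensity {value} is outside 0-{MAX_INTENSITY}")
            total += value
        scores[name] = total
    return scores


def max_scores():
    """Largest possible score per expression: 5 points for each AU in the group."""
    return {name: MAX_INTENSITY * len(units) for name, units in AU_GROUPS.items()}
```

Because AU2 appears in both the smile and surprise groups as listed, a single outer brow raise contributes to both summed scores in this sketch.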
Interobserver agreement was tested with raw absolute agreement, weighted Cohen's κ and Spearman’s ρ statistic. Analyses to test our primary aim focused on the discriminative ability of the summed scores for the AU groups for natural smile, surprise and frown. Data were tested for normality with a Shapiro–Wilk test; non-normal data are presented as medians with 1st–3rd quartiles. Medians between Disease+ and Disease− groups were compared with the two-sided p value from the Mann–Whitney U test. Ability of facial expression scores to distinguish between Disease+ and Disease− was estimated with the area under the ROC curve. As a pilot study, no formal sample size calculation was possible, but we estimated 50 patients would include approximately 10 patients with life threatening illness and 40 without life threatening illness, allowing CIs around the ROC curve less than 0.3.
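For readers less familiar with these statistics, the area under the ROC curve equals the Mann–Whitney U statistic divided by the number of Disease−/Disease+ pairs. The Python sketch below shows that relationship under stated assumptions: the function names are hypothetical, the example scores are invented for illustration and are not study data, and no p value or CI is computed.

```python
def mann_whitney_u(x, y):
    """U statistic for x versus y: pairs with x > y count 1, ties count 0.5."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def auc_from_u(disease_neg, disease_pos):
    """Area under the ROC curve for 'a higher score indicates Disease-'.

    Equivalent to the probability that a randomly chosen Disease- score
    exceeds a randomly chosen Disease+ score (ties counted as one half).
    """
    if not disease_neg or not disease_pos:
        raise ValueError("both groups must contain at least one score")
    return mann_whitney_u(disease_neg, disease_pos) / (len(disease_neg) * len(disease_pos))


# Invented example scores, for illustration only (not study data).
example_neg = [7, 3, 14, 9]
example_pos = [1, 3, 6]
example_auc = auc_from_u(example_neg, example_pos)
```

In this sketch a value of 0.5 corresponds to no discrimination, and values approaching 1 mean Disease− scores are almost always higher than Disease+ scores.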
Interobserver reliability for FACS scoring
We enrolled 50 patients with videos observed by two independent coders. Table 2 presents the agreement and interobserver reliability data. The raw agreement and κ values were similar between the three facial expressions for the first stimulus slide. The absolute agreements and κ values tended to decrease for all three facial expressions with slides 2 and 3. The absolute agreement values and their 95% limits indicate significantly lower agreement between observers on slides 2 and 3. Therefore expressions measured coincident with slide 1 were used for ROC analysis.
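The agreement measures reported here can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, and linear disagreement weights are an assumption, because the text does not state which weighting scheme was used for Cohen's κ.

```python
def raw_agreement(rater_a, rater_b):
    """Proportion of videos on which both observers gave the identical score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(1 for a, b in zip(rater_a, rater_b) if a == b) / len(rater_a)


def weighted_kappa(rater_a, rater_b, n_categories):
    """Cohen's kappa with linear weights (an assumed weighting scheme).

    Scores must be integers in 0..n_categories-1. Disagreements are
    penalised in proportion to how far apart the two scores are.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if n_categories < 2:
        raise ValueError("at least two categories are required")
    n = len(rater_a)
    k = n_categories
    # Observed joint proportions for each (score_a, score_b) cell.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 at the extremes
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row[i] * col[j]
    if weighted_expected == 0.0:
        raise ValueError("kappa is undefined when both raters use one identical score")
    return 1.0 - weighted_observed / weighted_expected
```

Raw agreement ignores the size of a disagreement, whereas the weighted κ gives partial credit when the two observers' scores are close, which suits an ordinal scale such as summed FACS intensities.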
Eight patients (16%) had a significant diagnosis within 14 days: three with pneumonia, two with PE, one with a ruptured thoracic aneurysm, one with myocardial infarction, and one with a new mediastinal malignancy. Among 42 patients considered Disease−, there were two with an acute exacerbation of chronic obstructive pulmonary disease, two with acute heart failure and one with atrial fibrillation. The most frequent diagnosis among the remaining patients was descriptive of chest pain or dyspnoea or both in 24 patients, followed by anxiety (4), hypertension (3), drug or alcohol use (2) and miscellaneous (4).
Comparison of FACS scores for Disease+ versus Disease− participants
To address the question of overall facial valence variability, we summed, for each patient, all FACS action unit scores recorded by both observers across all three slides. The area under the ROC curve for the total FACS scores was 0.62 (95% CI 0.52 to 0.83). The median of all FACS values for all three slides from Disease+ patients was 13.5 (1st–3rd quartiles 6–32), compared with a median of 27.5 (9–48) for Disease− patients (p=0.14, Mann–Whitney U). We then considered each slide in sequence by examining the interobserver variability and medians of the FACS scores for the three facial expressions of interest. The first slide had significantly higher interobserver reliability (based on the 95% CI for the κ values) than the second and third slides. For slide 1, the median values for the expressions of surprise and frown were significantly different between Disease+ and Disease− patients. For slide 1, the median of all FACS values from Disease+ patients, 3.4 (1st–3rd quartiles 1–6), was significantly less than the median of 7 (3–14) from Disease− patients (p=0.019), and the area under the ROC curve was significant for surprise. For the second slide, the area under the curve was also significant for surprise. The median value for the expression of surprise differed significantly between groups in the second slide (p=0.012), and none of the three expressions were different in the third slide.
Table 3 compares the FACS scores recorded by the two independent observers for stimulus slide 1 for the three facial expressions of interest (surprise, frown, smile). For the first stimulus slide (cartoon), both the surprise and frown expressions had significantly higher median values (summed for both observers) for Disease− patients compared with Disease+ patients (p=0.033 and p=0.022, respectively). The area under the ROC curve was significantly above random (0.5) for surprise. These data suggest increased probability of facial valence variability in response to visual stimuli among patients who had no significant cardiopulmonary disease, compared with patients who had a significant diagnosis. Moreover, the expression of surprise was the expression that best differentiated Disease− from Disease+ patients.
In this preliminary study, we found that patients with chest pain and dyspnoea who had a potentially serious cardiopulmonary diagnosis had significantly lower facial expression valence in response to a visual stimulus. The expression of surprise had the highest discriminative value for Disease+ versus Disease−. Put another way, patients with serious cardiopulmonary diseases tended to hold their faces neutral when watching visual stimuli. We recorded the facial expressions of 50 patients and measured the FACS score in response to three brief stimulus slides in an experiment that lasted less than 60 s. We evaluated three facial expressions, surprise, frown and smile, that have been associated with cardiopulmonary disease, and that we believed would be widely recognised by clinicians and easily generalisable in subsequent validation studies.19 ,20 We note that our Disease− patients included five with acute exacerbations of chronic disease that some might consider serious. Had we considered those patients as Disease+ or excluded them, the differences in facial expression variability between Disease+ and Disease− would have increased.
This work was motivated by years of research experience that has suggested a hidden layer in physician reasoning when they decide whether or not to prioritise life threatening conditions at the top of their differential diagnosis list.6 ,9 ,25 ,26 In particular, when research coordinators have asked clinicians how they arrived at a high gestalt pretest probability, their answers included the consideration of abnormal vital signs, significant past medical history and the explicit statement ‘he or she looked sick’. We emphasise that although all patients had chest pain as their reason for visiting the emergency department, our findings should not be equated to a study of affect expressed by patients in pain, because many had minimal chest pain at the time of image acquisition, as suggested by the median duration of chest pain of 24 h (1st–3rd quartiles 5–72 h). Because in most cases pain was minimal, detecting expressions of pain or lack thereof would not necessarily be expected to be informative. Consequently, we were more interested in their emotional responsiveness to affective stimuli. We believe that due to the gravity of their illness, Disease+ patients may not have been able to process and respond to an emotional stimulus the way that would be expected of most people under normal conditions. This may have resulted in less expression variability in our Disease+ patients.
The ultimate goal of this work is to provide clinicians with a new physical finding that can be associated with a healthy state to avoid unnecessary CTPA scanning. We would envision this finding to be documented as ‘normal affect variability’ or similar term in the general assessment of the patient. The present data allow at least three important inferences for future research. First, we found that two observers, who self-taught the FACS methodology, interpreted the videos with good overall blinded agreement in their assessments of the FACS scores for the first slide. The first slide, projected for only 10 s, provided the highest overall agreement between observers for the FACS scores. Second, the data suggest that a single stimulus can be used to elicit facial expressions that may be informative of a significant diagnosis. Third, among the three facial expressions examined, the expression representing surprise had the highest overall diagnostic accuracy. Future research can focus on the diagnostic accuracy of facial expression variability in response to a single visual stimulus, assessed with an automated method. Growing evidence that the human face reflects underlying health will promote the concept of video-based patient interviews (versus telephone) as a method of healthcare delivery.
This work is a first step towards unravelling hidden aspects of gestalt assessment of pretest probability.6 If these components could be made transparent and quantifiable, this could translate into important information in terms of education of emergency care providers, and development of more accurate and natural methods of pretest probability assessment for serious diseases. To accomplish this, we drew from the combination of common clinical practice (use of facial expressions of happiness to judge the lack of illness), and the well established biological connection between the autonomic nervous system and emotional content of facial expressions.18 ,20 ,27
Limitations to this work include the fact that its external validity lies only in its description of a potentially reproducible research methodology. This was a convenience sample from one hospital; a larger sample that more closely represents the total population of patients undergoing CTPA may not show the differences found here. We recognise that our list of life threatening illnesses may exclude some conditions that can cause death, such as a severe exacerbation of chronic obstructive pulmonary disease. However, the ultimate goal of this work is to define whether or not facial expressions can serve as a clue to new life threatening conditions, as opposed to existing medical conditions. Of note, the FACS values did have some overlap for Disease+ and Disease− patients, indicating that affect analysis will be inaccurate in some patients. It should also be noted that our examiners were not FACS certified, which may explain why the interobserver agreement decreased with slides 2 and 3. Interobserver reliability for the FACS typically ranges from 0.73 to 0.83 within a certain window of viewing; the longer this window of viewing, the greater the likelihood that the observers will identify different time points for maximal facial expression, which will contribute to different ratings. The long viewing window of 10 s per slide was a limitation and probably contributed to this variability. Additionally, our stimulus slides were not standardised, which could account for smaller affective responses to slides 2 and 3, making the emotional responses to these stimuli more subtle and difficult to identify by uncertified coders. At this time, no inference can be made from these data about the diagnostic utility of facial expression analysis in clinical practice. Our methodology cannot discern potential differences in the FACS neutral face, which human perception may reliably detect.
Lastly, we do not have large enough samples of patients with chronic cardiopulmonary disease to determine whether they have an increased false positive rate (ie, decreased affect variability) at baseline.
In response to a single visual stimulus, patients with serious cardiopulmonary diagnoses were more likely to have neutral facial expressions.
Contributors JAK conceived the study, wrote the protocol, oversaw data collection, performed the data analysis and wrote the manuscript. DN contributed to the study design, analysis and manuscript production. DK contributed to study design, performed programming, provided other technical support and edited the manuscript. MH and VK contributed to study design, performed data collection, provided other technical support and edited the manuscript.
Funding Supported by a grant from the Cannon Research Foundation.
Competing interests None.
Ethics approval Carolinas Medical Center IRB.
Provenance and peer review Not commissioned; externally peer reviewed.