Abstract
This study investigated the reliability and feasibility of six potential workplace-based assessment methods in general practice training: criterion audit, multi-source feedback from clinical and non-clinical colleagues, patient feedback (the CARE Measure), referral letters, significant event analysis, and video analysis of consultations. The performance of GP registrars (trainees) was evaluated with each tool to assess its reliability and feasibility, given the raters and number of assessments needed; participants’ experience of the process was determined by questionnaire. 171 GP registrars and their trainers, drawn from nine deaneries (representing all four countries in the UK), participated. The ability of each tool to differentiate between doctors (reliability) was assessed using generalisability theory. Decision studies were then conducted to determine the number of observations required to achieve an acceptably high reliability for “high-stakes assessment” with each instrument. Finally, descriptive statistics were used to summarise participants’ ratings of their experience of using these tools. Multi-source feedback from colleagues and patient feedback on consultations emerged as the two methods most likely to offer a reliable and feasible opinion of workplace performance. Reliability coefficients of 0.8 were attainable with 41 CARE Measure patient questionnaires and six clinical and/or five non-clinical colleagues per doctor when assessed on two occasions. For the other four methods tested, 10 or more assessors were required per doctor to achieve a reliable assessment, making their use in high-stakes assessment largely unfeasible. Participant feedback did not raise any major concerns regarding the acceptability, feasibility, or educational impact of the tools.
The combination of patient and colleague views of doctors’ performance, coupled with reliable competence measures, may offer a suitable evidence-base on which to monitor progress and completion of doctors’ training in general practice.
Acknowledgements
The completion of the pilot was made possible thanks to the help and enthusiasm of 171 GP registrars and staff from the Wales, Northern Ireland, Mersey, KSS, East Scotland, North and North East Scotland, South East Scotland, and West Midlands Deaneries.
The authors would like to thank Mrs. Angela Inglis (Team Leader and Personal Assistant to Dr. David Bruce, GP Director in the East of Scotland Deanery) and her team (Lee-Ann Troup, Linda Kirkcaldy, Susan Smith, Carol Ironside and Gill Ward) for their help, support, and contribution to the work contained in this paper.
© CARE SW Mercer, Scottish Executive 2004: The CARE Measure was originally developed by Dr. Stewart Mercer and colleagues as part of a Health Services Research Fellowship funded by the Chief Scientist Office of the Scottish Executive (2000–2003). The intellectual property rights of the measure belong to the Scottish Ministers. The measure is available for use free of charge for staff of the NHS and for research purposes, but cannot be used for commercial purposes. Anyone wishing to use the measure should contact and register with Stewart Mercer (email: stewmercer@blueyonder.co.uk).
© MSF Tool—NHS Education for Scotland 2005–2006: This two question Multi-Source Feedback (MSF) was developed by Drs. Douglas Murphy, David Bruce, and Kevin Eva on behalf of NHS Education Scotland (2005–2006). The measure is available for use free of charge for staff of the NHS and for research purposes, but cannot be used for commercial purposes. Anyone wishing to use the measure should contact and register with Douglas Murphy douglas.murphy@hotmail.co.uk or David Bruce david.bruce@nes.scot.nhs.uk.
Ethical approval: The research proposal was formally submitted to the NHS Ethics Committee (Glasgow West), which granted ethical approval for all of the work contained in this paper.
Conflict of interest and source of funding statement
NHS Education Scotland and The Royal College of General Practitioners (RCGP) funded this study. DM was, and DB is, employed by NHS Education Scotland. DM and SWM are supported by a Primary Care Research Career Award from the Chief Scientist Office, Scottish Executive Health Department. The RCGP had no role in study design, data analysis, data interpretation, or writing of the report. The corresponding author had full access to all the data and had final responsibility for the decision to submit for publication. Contributors: D. Murphy and K. Eva designed the studies. Data collection was done by D. Murphy and D. Bruce. Data were analysed by D. Murphy and K. Eva. Data were interpreted by D. Murphy, D. Bruce, S. Mercer and K. Eva. The manuscript was written by D. Murphy, D. Bruce, S. Mercer and K. Eva. All authors were involved in the decision to submit the manuscript for publication.
Murphy, D.J., Bruce, D.A., Mercer, S.W. et al. The reliability of workplace-based assessment in postgraduate medical education and training: a national evaluation in general practice in the United Kingdom. Adv in Health Sci Educ 14, 219–232 (2009). https://doi.org/10.1007/s10459-008-9104-8