
The reliability of workplace-based assessment in postgraduate medical education and training: a national evaluation in general practice in the United Kingdom

Original Paper, published in Advances in Health Sciences Education

Abstract

This study investigated the reliability and feasibility of six potential workplace-based assessment methods in general practice training: criterion audit, multi-source feedback from clinical and non-clinical colleagues, patient feedback (the CARE Measure), referral letters, significant event analysis, and video analysis of consultations. The performance of GP registrars (trainees) was evaluated with each tool in order to assess each tool's reliability and its feasibility, given the raters and number of assessments needed; participants' experience of the process was determined by questionnaire. In total, 171 GP registrars and their trainers, drawn from nine deaneries (representing all four countries of the UK), participated. The ability of each tool to differentiate between doctors (reliability) was assessed using generalisability theory, and decision studies were then conducted to determine the number of observations required to achieve an acceptably high reliability for "high-stakes" assessment with each instrument. Finally, descriptive statistics were used to summarise participants' ratings of their experience with the tools. Multi-source feedback from colleagues and patient feedback on consultations emerged as the two methods most likely to offer a reliable and feasible opinion of workplace performance. Reliability coefficients of 0.8 were attainable with 41 CARE Measure patient questionnaires and six clinical and/or five non-clinical colleagues per doctor when assessed on two occasions. For the other four methods tested, ten or more assessors were required per doctor to achieve a reliable assessment, making their feasibility for high-stakes assessment extremely low. Participant feedback raised no major concerns regarding the acceptability, feasibility, or educational impact of the tools. The combination of patient and colleague views of doctors' performance, coupled with reliable competence measures, may offer a suitable evidence base on which to monitor progress and completion of doctors' training in general practice.
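To make the decision-study arithmetic concrete, the sketch below (in Python) projects a generalisability (G) coefficient from variance components and inverts it to find the number of observations per doctor needed to reach a target such as 0.8. This is a minimal illustration assuming a simple one-facet design with a single pooled error term; the variance components and function names are hypothetical and are not the estimates reported in this study.

```python
import math

def g_coefficient(var_doctor: float, var_error: float, n_obs: int) -> float:
    """Projected G (reliability) coefficient when each doctor's score is the
    mean of n_obs observations: true-score variance over total variance."""
    return var_doctor / (var_doctor + var_error / n_obs)

def n_for_target(var_doctor: float, var_error: float, target: float = 0.8) -> int:
    """Smallest number of observations per doctor reaching the target
    coefficient (a Spearman-Brown-style projection, rounded up)."""
    return math.ceil((target / (1.0 - target)) * (var_error / var_doctor))

if __name__ == "__main__":
    # Hypothetical variance components: modest true differences between
    # doctors, large observation-to-observation noise for a single rating.
    var_d, var_e = 0.04, 0.50
    n = n_for_target(var_d, var_e, target=0.8)
    print(f"{n} observations per doctor -> projected G = "
          f"{g_coefficient(var_d, var_e, n):.3f}")
```

In a full D-study the error term would be decomposed into separate facets (e.g., rater and occasion, since registrars here were assessed on two occasions), but the projection logic is the same: reliability rises as error variance is averaged over more observations, which is why tools needing ten or more assessors per doctor were judged infeasible for high-stakes use.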



Acknowledgements

The completion of the pilot was made possible thanks to the help and enthusiasm of 171 GP registrars and staff from the Wales, Northern Ireland, Mersey, KSS, East Scotland, North and North East Scotland, South East Scotland and West Midlands Deaneries.

The authors would like to thank Mrs. Angela Inglis (Team Leader and Personal Assistant to Dr. David Bruce, GP Director in the East of Scotland Deanery) and her team (Lee-Ann Troup, Linda Kirkcaldy, Susan Smith, Carol Ironside and Gill Ward) for their help, support, and contribution to the work contained in this paper.

© CARE SW Mercer, Scottish Executive 2004: The CARE Measure was originally developed by Dr. Stewart Mercer and colleagues as part of a Health Services Research Fellowship funded by the Chief Scientist Office of the Scottish Executive (2000–2003). The intellectual property rights of the measure belong to the Scottish Ministers. The measure is available for use free of charge for staff of the NHS and for research purposes, but cannot be used for commercial purposes. Anyone wishing to use the measure should contact and register with Stewart Mercer (email: stewmercer@blueyonder.co.uk).

© MSF Tool—NHS Education for Scotland 2005–2006: This two-question Multi-Source Feedback (MSF) tool was developed by Drs. Douglas Murphy, David Bruce, and Kevin Eva on behalf of NHS Education for Scotland (2005–2006). The measure is available for use free of charge for staff of the NHS and for research purposes, but cannot be used for commercial purposes. Anyone wishing to use the measure should contact and register with Douglas Murphy (douglas.murphy@hotmail.co.uk) or David Bruce (david.bruce@nes.scot.nhs.uk).

Ethical approval: A formal application and research proposal were submitted, and ethical approval was granted for all of the work contained in this paper by the NHS Ethics Committee (Glasgow West).

Conflict of interest and source of funding statement

NHS Education for Scotland and The Royal College of General Practitioners (RCGP) funded this study. DM was, and DB is, employed by NHS Education for Scotland. DM and SWM are supported by a Primary Care Research Career Award from the Chief Scientist Office, Scottish Executive Health Department. The RCGP had no role in study design, data analysis, data interpretation, or writing of the report. The corresponding author had full access to all the data and had final responsibility for the decision to submit for publication.

Contributors: D. Murphy and K. Eva designed the studies. Data were collected by D. Murphy and D. Bruce, analysed by D. Murphy and K. Eva, and interpreted by D. Murphy, D. Bruce, S. Mercer and K. Eva. The manuscript was written by D. Murphy, D. Bruce, S. Mercer and K. Eva. All authors were involved in the decision to submit the manuscript for publication.

Author information


Corresponding author

Correspondence to Kevin W. Eva.



About this article

Cite this article

Murphy, D.J., Bruce, D.A., Mercer, S.W. et al. The reliability of workplace-based assessment in postgraduate medical education and training: a national evaluation in general practice in the United Kingdom. Adv in Health Sci Educ 14, 219–232 (2009). https://doi.org/10.1007/s10459-008-9104-8

