University of Tasmania, Australia

Systematic reviews for health: 1. formulate the research question.



Step 1. Formulate the Research Question

A systematic review is based on a pre-defined, specific research question (Cochrane Handbook, 1.1). The first step in a systematic review is to determine its focus: you should clearly frame the question(s) the review seeks to answer (Cochrane Handbook, 2.1). It may take you a while to develop a good review question, but it is an important step in your review. Well-formulated questions will guide many aspects of the review process, including determining eligibility criteria, searching for studies, collecting data from included studies, and presenting findings (Cochrane Handbook, 2.1).

The research question should be clear and focused - not too vague, too specific or too broad.

You may like to consider some of the techniques mentioned below to help you with this process. They can be useful but are not necessary for a good search strategy.

PICO - to search for quantitative review questions

P: Most important characteristics of the patient (e.g. age, disease/condition, gender)
I: Main intervention (e.g. drug treatment, diagnostic/screening test)
C (if appropriate): Main alternative (e.g. placebo, standard therapy, no treatment, gold standard)
O: What you are trying to accomplish, measure, improve or affect (e.g. reduced mortality or morbidity, improved memory)

Richardson, WS, Wilson, MC, Nishikawa, J & Hayward, RS 1995, 'The well-built clinical question: a key to evidence-based decisions', ACP Journal Club, vol. 123, no. 3, p. A12.

We do not have access to this article at UTAS.

A variant of PICO is PICOS, where S stands for Study design: it establishes which study designs are appropriate for answering the question, e.g. a randomised controlled trial (RCT). There are also PICOC (C for context) and PICOT (T for timeframe).

You may find this document on PICO / PIO / PEO useful:

  • Framing a PICO / PIO / PEO question, developed by Teesside University

SPIDER - to search for qualitative and mixed methods research studies

S: Sample
PI: Phenomenon of Interest
D: Design
E: Evaluation
R: Research type

Cooke, A, Smith, D & Booth, A 2012, 'Beyond PICO: the SPIDER tool for qualitative evidence synthesis', Qualitative Health Research, vol. 22, no. 10, pp. 1435-1443.

This article is only accessible for UTAS staff and students.

SPICE - to search for qualitative evidence

S: Setting (where?)
P: Perspective (for whom?)
I: Intervention (what?)
C: Comparison (compared with what?)
E: Evaluation (with what result?)

Booth, A 2006, 'Clear and present questions: formulating questions for evidence based practice', Library Hi Tech, vol. 24, no. 3, pp. 355-368.

ECLIPSE - to search for health policy/management information

E: Expectation (improvement or information or innovation)
C: Client group (at whom the service is aimed)
L: Location (where is the service located?)
I: Impact (outcomes)
P: Professionals (who is involved in providing/improving the service)
Se: Service (for which service are you looking for information)

Wildridge, V & Bell, L 2002, 'How CLIP became ECLIPSE: a mnemonic to assist in searching for health policy/management information', Health Information & Libraries Journal, vol. 19, no. 2, pp. 113-115.

There are many more techniques available. See the guide below from the CQUniversity Library for an extensive list:

  • Question frameworks overview from Framing your research question guide, developed by CQUniversity Library

This is the specific research question used in the example:

"Is animal-assisted therapy more effective than music therapy in managing aggressive behaviour in elderly people with dementia?"

Within this question are the four PICO concepts:

P: elderly patients with dementia
I: animal-assisted therapy
C: music therapy
O: aggressive behaviour

S - Study design

This is a therapy question. The best study design to answer a therapy question is a randomised controlled trial (RCT). You may decide to include in the systematic review only studies that used an RCT; see Step 8.
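
To make the example concrete, here is a minimal illustrative sketch (not part of the UTAS guide) of how the four PICO concepts above could be combined into a Boolean search string. The synonym lists are hypothetical placeholders; real terms would come from the free-text and controlled-vocabulary steps later in this guide.

```python
# Illustrative sketch only: synonyms are joined with OR within each PICO
# concept, and the concept blocks are combined with AND. The terms below
# are invented examples, not a validated search strategy.

pico_concepts = {
    "P": ["dementia", "alzheimer*"],                 # elderly patients with dementia
    "I": ["animal-assisted therapy", "pet therapy"], # intervention
    "C": ["music therapy"],                          # comparison
    "O": ["aggression", "aggressive behaviour"],     # outcome
}

def build_search_string(concepts: dict) -> str:
    """OR the synonyms within each concept, then AND the concept blocks."""
    blocks = []
    for terms in concepts.values():
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        blocks.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(blocks)

print(build_search_string(pico_concepts))
# (dementia OR alzheimer*) AND ("animal-assisted therapy" OR "pet therapy")
# AND ("music therapy") AND (aggression OR "aggressive behaviour")
```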


Need More Help? Book a consultation with a Learning and Research Librarian or contact [email protected].

  • Last Updated: Jun 7, 2024 2:00 PM
  • URL: https://utas.libguides.com/SystematicReviews

UCL Library Services

Formulating a research question


Systematic reviews address clear and answerable research questions, rather than a general topic or problem of interest. Clarifying the review question leads to specifying what type of studies can best address that question and to setting out criteria for including such studies in the review; these are often called inclusion criteria or eligibility criteria. The criteria could relate to the review topic, the research methods of the studies, specific populations, settings, date limits, geographical areas, types of interventions, or something else.

Six examples of types of question are listed below, showing different questions that a review might address based on the topic of influenza vaccination. Structuring questions in this way aids thinking about the different types of research that could address each type of question, and mnemonics can help in thinking about the criteria that research must fulfil to address the question.

Examples of review questions

  • Needs - What do people want? Example: What are the information needs of healthcare workers regarding vaccination for seasonal influenza?
  • Impact or effectiveness - What is the balance of benefit and harm of a given intervention? Example: What is the effectiveness of strategies to increase vaccination coverage among healthcare workers? What is the cost-effectiveness of interventions that increase immunisation coverage?
  • Process or explanation - Why does it work (or not work)? How does it work (or not work)? Example: What factors are associated with uptake of vaccinations by healthcare workers? What factors are associated with inequities in vaccination among healthcare workers?
  • Correlation - What relationships are seen between phenomena? Example: How does influenza vaccination of healthcare workers vary with morbidity and mortality among patients? (Note: correlation does not in itself indicate causation).
  • Views / perspectives - What are people's experiences? Example: What are the views and experiences of healthcare workers regarding vaccination for seasonal influenza?
  • Service implementation - What is happening? Example: What is known about the implementation and context of interventions to promote vaccination for seasonal influenza among healthcare workers?

Examples in practice: Seasonal influenza vaccination of health care workers: evidence synthesis / Lorenc et al., 2017

Example of eligibility criteria

Research question: What are the views and experiences of UK healthcare workers regarding vaccination for seasonal influenza?

Inclusion criteria:

  • Population: healthcare workers, any type, including those without direct contact with patients.
  • Context: seasonal influenza vaccination for healthcare workers.
  • Study design: qualitative data including interviews, focus groups, ethnographic data.
  • Date of publication: all.
  • Country: all UK regions.
Exclusion criteria:

  • Studies focused on influenza vaccination for the general population or on pandemic influenza vaccination.
  • Studies using survey data with only closed questions, or studies that only report quantitative data.
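
Purely as an illustrative sketch (not part of the UCL guide), the criteria above can be viewed as explicit checks that reviewers apply to every candidate study during screening. The record fields and values below are hypothetical:

```python
# Hypothetical study record and eligibility checks; all field names and
# values are invented for illustration.

criteria = {
    "population": lambda s: s["participants"] == "healthcare workers",
    "context":    lambda s: s["topic"] == "seasonal influenza vaccination",
    "design":     lambda s: s["methods"] in {"interviews", "focus groups", "ethnography"},
    "country":    lambda s: s["country"] == "UK",
}

def screen(study: dict) -> list:
    """Return the names of any eligibility criteria the study fails."""
    return [name for name, check in criteria.items() if not check(study)]

study = {
    "participants": "healthcare workers",
    "topic": "seasonal influenza vaccination",
    "methods": "interviews",
    "country": "UK",
}
print(screen(study))  # [] -> no criteria failed, so the study is eligible
```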

Consider the research boundaries

It is important to consider the reasons that the research question is being asked. Any research question has ideological and theoretical assumptions around the meanings and processes it is focused on. A systematic review should either specify definitions and boundaries around these elements at the outset, or be clear about which elements are undefined. 

For example, if we are interested in the topic of homework, there are likely to be pre-conceived ideas about what is meant by 'homework'. If we want to know the impact of homework on educational attainment, we need to set boundaries on the age range of children, or on how educational attainment is measured. There may also be particular settings or contexts: type of school, country, gender, the timeframe of the literature, or the study designs of the research.

Research question: What is the impact of homework on children's educational attainment?

  • Scope: Homework - tasks set by school teachers for students to complete out of school time, in any format or setting.
  • Population: children aged 5-11 years.
  • Outcomes: measures of literacy or numeracy from tests administered by researchers, school or other authorities.
  • Study design: studies with a comparison control group.
  • Context: OECD countries, all settings within mainstream education.
  • Date limit: 2007 onwards.

Exclusion criteria:

  • Any context not in mainstream primary schools.
  • Non-English language studies.

Mnemonics for structuring questions

Some mnemonics can help to formulate research questions, set the boundaries of a question, and inform a search strategy.

Intervention effects

PICO: Population – Intervention – Comparison – Outcome

Variations: add 'T' for time, 'C' for context, or 'S' for study type.

Policy and management issues

ECLIPSE: Expectation – Client group – Location – Impact – Professionals involved – Service

Expectation encourages reflection on what the information is needed for, i.e. improvement, innovation or information. Impact looks at what you would like to achieve, e.g. improved team communication.

  • How CLIP became ECLIPSE: a mnemonic to assist in searching for health policy/management information / Wildridge & Bell, 2002

Analysis tool for management and organisational strategy

PESTLE: Political – Economic – Social – Technological – Legal – Environmental

An analysis tool that can be used by organisations for identifying external factors which may influence their strategic development, marketing strategies, new technologies or organisational change.

  • PESTLE analysis / CIPD, 2010

Service evaluations with qualitative study designs

SPICE: Setting (context) – Perspective – Intervention – Comparison – Evaluation

Perspective relates to users or potential users. Evaluation is how you plan to measure the success of the intervention.

  • Clear and present questions: formulating questions for evidence based practice / Booth, 2006

Read more about some of the frameworks for constructing review questions:

  • Formulating the Evidence Based Practice Question: A Review of the Frameworks / Davis, 2011
  • Last Updated: May 30, 2024 4:38 PM
  • URL: https://library-guides.ucl.ac.uk/systematic-reviews

University of Texas Libraries

Systematic Reviews & Evidence Synthesis Methods


Formulate your Research Question

Formulating a strong research question for a systematic review can be a lengthy process. While you may have an idea about the topic you want to explore, your specific research question is what will drive your review and requires some consideration. 

You will want to conduct preliminary or  exploratory searches  of the literature as you refine your question. In these searches you will want to:

  • Determine if a systematic review has already been conducted on your topic and if so, how yours might be different, or how you might shift or narrow your anticipated focus.
  • Scope the literature to determine if there is enough literature on your topic to conduct a systematic review.
  • Identify key concepts and terminology.
  • Identify seminal or landmark studies.
  • Identify key studies that you can test your search strategy against (more on that later).
  • Begin to identify databases that might be useful to your search question.

Types of Research Questions for Systematic Reviews

A narrow and specific research question is required in order to conduct a systematic review. The goal of a systematic review is to provide an evidence synthesis of ALL research performed on one particular topic. Your research question should be clearly answerable from the studies included in your review. 

Another consideration is whether the question has been answered enough to warrant a systematic review. If there have been very few studies, there won't be enough qualitative and/or quantitative data to synthesize. You then have to adjust your question... widen the population, broaden the topic, reconsider your inclusion and exclusion criteria, etc.

When developing your question, it can be helpful to consider the FINER criteria (Feasible, Interesting, Novel, Ethical, and Relevant). Read more about the FINER criteria on the Elsevier blog.

If you have a broader question or aren't certain that your question has been answered enough in the literature, you may be better served by pursuing a systematic map, also known as a scoping review. Scoping reviews are conducted to give a broad overview of a topic, to review the scope and themes of the prior research, and to identify the gaps and areas for future research.

  • "What is the effectiveness of talk therapy in treating ADHD in children?" - Systematic review
  • "What treatments are available for treating children with ADHD?" - Systematic map / scoping review
  • "Are animal-assisted therapies as effective as traditional cognitive behavioral therapies in treating people with depressive disorders?" - Systematic review
  • CEE Example Questions: the Collaboration for Environmental Evidence Guidelines contains Table 2.2, outlining answers sought and example questions in environmental management.

Learn More . . .

Cochrane Handbook Chapter 2  - Determining the scope of the review and the questions it will address

Frameworks for Developing your Research Question

PICO: Patient/Population, Intervention, Comparison, Outcome

PEO: Population, Exposure, Outcomes

SPIDER: Sample, Phenomenon of Interest, Design, Evaluation, Research Type

For more frameworks and guidance on developing the research question, check out:

1. Advanced Literature Search and Systematic Reviews: Selecting a Framework. City University of London Library.

2. Select the Appropriate Framework for your Question. Tab "1-1" from PIECES: A guide to developing, conducting, & reporting reviews [Excel workbook]. Margaret J. Foster, Texas A&M University. CC-BY-3.0 license.

3. Formulating a Research Question. University College London Library. Systematic Reviews.

4. Question Guidance. UC Merced Library. Systematic Reviews.


Video - Formulating a Research Question (4:43 minutes)

  • Last Updated: Jul 18, 2024 6:32 AM
  • URL: https://guides.lib.utexas.edu/systematicreviews


Systematic and systematic-like review toolkit: Step 1: Formulating the research question


The first stage in a review is formulating the research question. The research question accurately and succinctly sums up the review's line of inquiry. This page outlines approaches to developing a research question that can be used as the basis for a review.

Research question frameworks

It can be useful to use a framework to aid the development of a research question. Frameworks can help you identify the searchable parts of a question and focus your search on relevant results.

A technique often used in research for formulating a clinical research question is the PICO model. Slightly different versions of this concept are used to search for quantitative and qualitative reviews.

The PICO/PECO framework is an adaptable approach to help you focus your research question and guide you in developing search terms. The framework prompts you to consider your question in terms of these four elements:

P: Patient/Population/Problem

I/E: Intervention/Indicator/Exposure/Event

C: Comparison/Control

O: Outcome

For more detail, there are also the PICOT and PICOS additions:

PICOT - adds Time

PICOS - adds Study design

PICO example

Consider this scenario:

Current guidelines indicate that nicotine replacement therapies (NRTs) should not be used as an intervention in young smokers.  Counselling is generally the recommended best practice for young smokers, however youth who are at high risk for smoking often live in regional or remote communities with limited access to counselling services.  You have been funded to review the evidence for the effectiveness of NRTs for smoking cessation in Australian youths to update the guidelines.

The research question stemming from this scenario could be phrased in this way:

In (P) adolescent smokers, how does (I) nicotine replacement therapy compared with (C) counselling affect (O) smoking cessation rates?

PICO element - Definition - Scenario
P (patient/population/problem) - Describe your patient, population, or problem - adolescent smokers
I (intervention/indicator) - Describe your intervention or indicator - nicotine replacement therapy (NRT)
C (comparison/control) - What is your comparison or control? - counselling
O (outcome) - What outcome are you looking for? - smoking cessation / risk of continued nicotine dependency
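
As an illustrative sketch only (not part of the Deakin guide), the PICO elements from this scenario could be recorded in a simple planning grid alongside alternative search terms. The alternative terms are hypothetical examples a reviewer might note down for the search-development step:

```python
# Illustrative planning grid for the NRT scenario; the alternative terms
# are invented examples, not a validated search strategy.

plan = [
    ("P", "adolescent smokers",           ["adolescen*", "teen*", "youth*"]),
    ("I", "nicotine replacement therapy", ["NRT", "nicotine patch*", "nicotine gum"]),
    ("C", "counselling",                  ["counseling", "behavioural support"]),
    ("O", "smoking cessation",            ["quit*", "smoking abstinence"]),
]

for element, concept, alternatives in plan:
    print(f"{element}: {concept:<30} alternatives: {', '.join(alternatives)}")
```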

Alternative frameworks

PICO is one of the most frequently used frameworks, but there are several other frameworks available to use depending on your question type: qualitative questions, aetiology or risk, services and policy, prevalence and prognosis, and economics.

Structuring qualitative questions?

Try PIC or SPIDER:

  • PIC: Population, Phenomena of Interest, Context
  • SPIDER: Sample, Phenomenon of Interest, Design, Evaluation, Research type

Cooke, A., Smith, D., & Booth, A. (2012). Beyond PICO: the SPIDER tool for qualitative evidence synthesis. Qualitative Health Research, 22(10), 1435-1443.

Question about aetiology or risk? 

  • PEO: Population, Exposure, Outcomes

Moola, S., Munn, Z., Sears, K., Sfetcu, R., Currie, M., Lisy, K., Tufanaru, C., Qureshi, R., Mattis, P., & Mu, P. (2015). Conducting systematic reviews of association (etiology). International Journal of Evidence-Based Healthcare, 13(3), 163-169.

Evaluating an intervention, policy or service? 

Try SPICE:

  • Setting, Population or Perspective, Intervention, Comparison, Evaluation

Booth, A. (2006), "Clear and present questions: formulating questions for evidence based practice", Library Hi Tech, Vol. 24 No. 3, pp. 355-368. https://doi.org/10.1108/07378830610692127

Investigating the outcome of a service or policy? 

Try ECLIPSE:

  • Expectation, Client group, Location, Impact, Professionals, Service

Wildridge, V., & Bell, L. (2002). How CLIP became ECLIPSE: a mnemonic to assist in searching for health policy/management information . Health Information & Libraries Journal, 19(2), 113-115.

Working out prevalence or incidence? 

Try CoCoPop:

  • Condition, Context, Population

Munn, Z., Moola, S., Lisy, K., Riitano, D., & Tufanaru, C. (2015). Methodological guidance for systematic reviews of observational epidemiological studies reporting prevalence and cumulative incidence data . International journal of evidence-based healthcare, 13(3), 147-153.

Determining prognosis?

  • Population, Prognostic Factors, Outcome

Conducting an economic evaluation? 

Try PICOC:

  • Population, Intervention, Comparator/s, Outcomes, Context

Petticrew, M., & Roberts, H. (2006). Systematic reviews in the social sciences: a practical guide . Blackwell Pub.


JBI recommends the PCC (Population (or Participants), Concept, and Context) search framework to develop the research question of a scoping review. In some instances, just the concept and context are used in the search.

The University of Notre Dame Australia provides information on some different frameworks available to help structure the research question.

Further Readings

Booth A, Noyes J, Flemming K, et al, Formulating questions to explore complex interventions within qualitative evidence synthesis . BMJ Global Health 2019;4:e001107. This paper explores the importance of focused, relevant questions in qualitative evidence syntheses to address complexity and context in interventions.

Kim, K. W., Lee, J., Choi, S. H., Huh, J., & Park, S. H. (2015). Systematic review and meta-analysis of studies evaluating diagnostic test accuracy: a practical review for clinical researchers-part I. General guidance and tips . Korean journal of radiology, 16(6), 1175-1187. As the use of systematic reviews and meta-analyses is increasing in the field of diagnostic test accuracy (DTA), this first of a two-part article provides a practical guide on how to conduct, report, and critically appraise studies of DTA. 

Methley, A. M., Campbell, S., Chew-Graham, C., McNally, R., & Cheraghi-Sohi, S. (2014). PICO, PICOS and SPIDER: A comparison study of specificity and sensitivity in three search tools for qualitative systematic reviews. BMC Health Services Research, 14(1), 579. In this article the SPIDER search framework, developed for more effective searching of qualitative research, was evaluated against PICO and PICOS.

Munn, Z., Stern, C., Aromataris, E., Lockwood, C., & Jordan, Z. (2018). What kind of systematic review should I conduct? A proposed typology and guidance for systematic reviewers in the medical and health sciences . BMC medical research methodology, 18(1), 5. https://doi.org/10.1186/s12874-017-0468-4 This article aligns review types to question development frameworks.

Search for existing reviews

Before you start searching, find out whether any systematic reviews have been conducted recently on your topic. This is because similar systematic reviews could help with identifying your search terms, and information on your topic. It is also helpful to know if there is already a systematic review on your topic as it may mean you need to change your question.  

The Cochrane Library and the Joanna Briggs Institute publish systematic reviews, and you can search for the term "systematic review" in any of the subject databases. You can also search PROSPERO, an international register of systematic reviews, to see if there are any related reviews underway but not yet published; additional review registers are detailed below.

Watch this video to find out how to search for published systematic reviews

Protocols and Guidelines for reviews

It is recommended that authors consult relevant guidelines and create a protocol for their review.  

Protocols provide a clear plan for how the review will be conducted, including what will and will not be included in the final review. Protocols are widely recommended for any systematic review and are increasingly a requirement for publication of a completed systematic review.

Guidelines provide specific information on how to perform a review in your field of study. A completed review may be evaluated against the relevant guidelines by peer reviewers or readers, so it makes sense to follow the guidelines as best you can.

Click the headings below to learn more about the importance of protocols and guidelines.


Your protocol (or plan for conducting your review) should include the rationale, objectives, hypothesis, and planned methods used in searching, screening and analysing identified studies used in the review. The rationale should clearly state what will be included and excluded from the review. The aim is to minimise any bias by having pre-defined eligibility criteria.

Base the protocol on the relevant guidelines for the review that you are conducting.  PRISMA-P was developed for reporting and development of protocols for systematic reviews. Their Explanation and Elaboration paper includes examples of what to write in your protocol. York's CRD has also created a document on how to submit a protocol to PROSPERO .

There are several registers of protocols, often associated with the organisation publishing the review. Cochrane and Joanna Briggs Institute both have their own protocol registries, and PROSPERO is a wide-reaching registry covering protocols for Cochrane, non-Cochrane and non-JBI reviews on a range of health, social care, education, justice, and international development topics.

Before beginning your protocol, search within protocol registries such as those listed above, or Open Science Framework or Research Registry , or journals such as Systematic Reviews and BMJ Open . This is a useful step to see if a protocol has already been submitted on your review topic and to find examples of protocols in similar areas of research.    

While a protocol will contain details of the intended search strategy, it should be registered before the search strategy is finalised and run, so that you can show that your intention for the review has remained true, and to limit duplication of in-progress reviews.

A protocol should typically address points that define the kinds of studies to be included and the kinds of data required, to ensure the systematic review is focused on the appropriate studies for the topic. Some points to think about are listed below; a sketch of how these decisions might be recorded follows the list.

  • What study types are you looking for? For example, randomised controlled trials, cohort studies, qualitative studies
  • What sample size is acceptable in each study (power of the study)? 
  • What population are you focusing on? Consider age ranges, gender, disease severity, geography of patients.
  • What type of intervention are you focusing on?
  • What outcomes are of importance to the review, including how those outcomes are measured?
  • What context should you be looking for in a study? A lab, acute care, school, community...
  • How will you appraise the studies? What methodology will you use?
  • Does the study differentiate between the target population and other groups in the data? How will you handle it if it does not?
  • Is the data available to access if the article does not specify the details you need? If not, what will you do?
  • What languages are you able to review? Do you have funding to translate articles from languages other than English?  
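
As a minimal sketch (using the NRT scenario from earlier on this page; every value is a hypothetical example rather than a prescribed format), the answers to these questions might be recorded in a structured form like this:

```python
# Hypothetical protocol eligibility section; all values are illustrative
# answers to the questions listed above.

protocol_eligibility = {
    "study_types":  ["randomised controlled trial", "cohort study"],
    "min_sample":   30,  # example threshold for acceptable study power
    "population":   {"age_range": (12, 18), "condition": "current smoker"},
    "intervention": "nicotine replacement therapy",
    "outcomes":     ["smoking cessation at 6 months, self-reported or verified"],
    "contexts":     ["community", "primary care"],
    "appraisal":    "Cochrane risk-of-bias tool",  # example appraisal method
    "languages":    ["English"],  # record any funding for translation here
}

print(sorted(protocol_eligibility))  # the decision points covered so far
```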

Further reading

PLoS Medicine Editors. (2011). Best practice in systematic reviews: the importance of protocols and registration . PLoS medicine, 8(2), e1001009.

Systematic Review guidelines

The Cochrane Handbook for Systematic Reviews of Interventions is a world-renowned resource for information on designing systematic reviews of interventions.

Many other guidelines have been developed from this extensive resource.

General systematic reviews

  • The  PRISMA Statement  includes the well-used Checklist and Flow Diagram.
  • Systematic Reviews: CRD's guidance on undertaking reviews in health care . One of the founding institutions that developed systematic review procedure. CRD's guide gives detailed clearly written explanations for different fields in Health.
  • Finding What Works in Health Care: Standards for Systematic Reviews, Chapter 3: Standards for Finding and Assessing Individual Studies (National Academies Press, 2011). Provides guidance on searching, screening, data collection, and appraisal of individual studies for a systematic review.

Meta-analyses

  • An alternative to PRISMA is the Meta‐analysis Of Observational Studies in Epidemiology (MOOSE) for observational studies. It is a 35‐item checklist. It pays more attention to certain aspects of the search strategy, in particular the inclusion of unpublished and non‐English‐language studies.

Surgical systematic reviews

  • Systematic reviews in surgery-recommendations from the Study Center of the German Society of Surgery . Provides recommendations for systematic reviews in surgery with or without meta-analysis, for each step of the process with specific recommendations important to surgical reviews.

Nursing/Allied Health systematic reviews

Joanna Briggs Institute Manual for Evidence Synthesis - a comprehensive guide to conducting JBI systematic and similar reviews.

Nutrition systematic reviews

  • Academy of Nutrition and Dietetics Evidence Analysis Manual  is designed to guide expert workgroup members and evidence analysts to understand and carry out the process of conducting a systematic review.

Occupational therapy

  • American Occupational Therapy Association: Guidelines for Systematic reviews . The American Journal of Occupational Therapy (AJOT) provides guidance for authors conducting systematic reviews.

Education/Law/ Sociology systematic reviews

  • Campbell Collaboration, Cochrane's sister organisation, provides guidelines for systematic reviews in the social sciences: MECIR
  • Systematic Reviews in Educational Research: Methodology, Perspectives and Application

Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy

COSMIN Guideline for Systematic Reviews of Outcome Measurement Instruments – This was developed for patient-reported outcome measures (PROMs) but has since been adapted for use with other types of outcome measurements in systematic reviews.

Prinsen, C.A.C., Mokkink, L.B., Bouter, L.M. et al. COSMIN guideline for systematic reviews of patient-reported outcome measures . Qual Life Res 27, 1147–1157 (2018). https://doi.org/10.1007/s11136-018-1798-3

HuGENet™ Handbook of systematic reviews – particularly useful for describing population-based data and human genetic variants.

AHRQ: Methods Guide for Effectiveness and Comparative Effectiveness Reviews - from the US Department of Health and Human Services, guidelines on conducting systematic reviews of existing research on the effectiveness, comparative effectiveness, and harms of different health care interventions.

Mariano, D. C., Leite, C., Santos, L. H., Rocha, R. E., & de Melo-Minardi, R. C. (2017). A guide to performing systematic literature reviews in bioinformatics . arXiv preprint arXiv:1707.05813.

Integrative Review guidelines


Integrative reviews may incorporate experimental and non-experimental data, as well as theoretical information.  They differ from systematic reviews in the diversity of the study methodologies included.

Guidelines:

  • Whittemore, R. and Knafl, K. (2005), The integrative review: updated methodology. Journal of Advanced Nursing, 52: 546–553. doi:10.1111/j.1365-2648.2005.03621.x
  • A step-by-step guide to conducting an Integrative Review (2020), edited by C.E. Toronto & Ruth Remington, Springer Books

Rapid Review guidelines


Rapid reviews differ from systematic reviews in the shorter timeframe taken and reduced comprehensiveness of the search.

Cochrane has a methods group to inform the conduct of rapid reviews with a bibliography of relevant publications .

A modified approach to systematic review guidelines can be used for rapid reviews, but guidelines are beginning to appear:

Crawford C, Boyd C, Jain S, Khorsan R and Jonas W (2015), Rapid Evidence Assessment of the Literature (REAL©): streamlining the systematic review process and creating utility for evidence-based health care . BMC Res Notes 8:631 DOI 10.1186/s13104-015-1604-z

Philip Moons, Eva Goossens, David R. Thompson, Rapid reviews: the pros and cons of an accelerated review process , European Journal of Cardiovascular Nursing, Volume 20, Issue 5, June 2021, Pages 515–519, https://doi.org/10.1093/eurjcn/zvab041

Rapid Review Guidebook: Steps for conducting a rapid review National Collaborating Centre for Methods and Tools (McMaster University and Public Health Agency Canada) 2017

Tricco AC, Langlois EV, Straus SE, editors (2017) Rapid reviews to strengthen health policy and systems: a practical guide (World Health Organization). This guide is particularly aimed towards developing rapid reviews to inform health policy. 

Scoping Review guidelines


Scoping reviews can be used to map an area, or to determine the need for a subsequent systematic review. They tend to have a broader focus than many other types of reviews; however, they still require a focused question.

  • Peters MDJ, Godfrey C, McInerney P, Munn Z, Tricco AC, Khalil, H. Chapter 11: Scoping Reviews (2020 version). In: Aromataris E, Munn Z (Editors). Joanna Briggs Institute Reviewer's Manual, JBI, 2020. 
  • Statement / Explanatory paper

Scoping reviews: what they are and how you can do them - Series of Cochrane Training videos presented by Dr. Andrea C. Tricco and Kafayat Oboirien

Martin, G. P., Jenkins, D. A., Bull, L., Sisk, R., Lin, L., Hulme, W., ... & Group, P. H. A. (2020). Toward a framework for the design, implementation, and reporting of methodology scoping reviews . Journal of Clinical Epidemiology, 127, 191-197.

Khalil, H., McInerney, P., Pollock, D., Alexander, L., Munn, Z., Tricco, A. C., ... & Peters, M. D. (2021). Practical guide to undertaking scoping reviews for pharmacy clinicians, researchers and policymakers . Journal of clinical pharmacy and therapeutics.

Colquhoun, H (2016) Current best practices for the conduct of scoping reviews (presentation)

Arksey H & O'Malley L (2005) Scoping studies: towards a methodological framework , International Journal of Social Research Methodology, 8:1, 19-32, DOI: 10.1080/1364557032000119616

Umbrella reviews

  • Pollock M, Fernandes RM, Becker LA, Pieper D, Hartling L. Chapter V: Overviews of Reviews . In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.2 (updated February 2021). Cochrane, 2021. Available from www.training.cochrane.org/handbook .  
  • Aromataris E, Fernandez R, Godfrey C, Holly C, Khalil H, Tungpunkom P. Chapter 10: Umbrella Reviews . In: Aromataris E, Munn Z (Editors). JBI Manual for Evidence Synthesis. JBI, 2020. Available from https://jbi-global-wiki.refined.site/space/MANUAL/4687363 .
  • Aromataris, Edoardo; Fernandez, Ritin; Godfrey, Christina M.; Holly, Cheryl; Khalil, Hanan; Tungpunkom, Patraporn. Summarizing systematic reviews: methodological development, conduct and reporting of an umbrella review approach , International Journal of Evidence-Based Healthcare: September 2015 - Volume 13 - Issue 3 - p 132-140.

Meta-syntheses

Noyes, J., Booth, A., Cargo, M., Flemming, K., Garside, R., Hannes, K., ... & Thomas, J. (2018). Cochrane Qualitative and Implementation Methods Group guidance series—paper 1: introduction . Journal of clinical epidemiology, 97, 35-38.

Harris, J. L., Booth, A., Cargo, M., Hannes, K., Harden, A., Flemming, K., ... & Noyes, J. (2018). Cochrane Qualitative and Implementation Methods Group guidance series—paper 2: methods for question formulation, searching, and protocol development for qualitative evidence synthesis . Journal of clinical epidemiology, 97, 39-48.

Noyes, J., Booth, A., Flemming, K., Garside, R., Harden, A., Lewin, S., ... & Thomas, J. (2018). Cochrane Qualitative and Implementation Methods Group guidance series—paper 3: methods for assessing methodological limitations, data extraction and synthesis, and confidence in synthesized qualitative findings . Journal of clinical epidemiology, 97, 49-58.

Cargo, M., Harris, J., Pantoja, T., Booth, A., Harden, A., Hannes, K., ... & Noyes, J. (2018). Cochrane Qualitative and Implementation Methods Group guidance series—paper 4: methods for assessing evidence on intervention implementation . Journal of clinical epidemiology, 97, 59-69.

Harden, A., Thomas, J., Cargo, M., Harris, J., Pantoja, T., Flemming, K., ... & Noyes, J. (2018). Cochrane Qualitative and Implementation Methods Group guidance series—paper 5: methods for integrating qualitative and implementation evidence within intervention effectiveness reviews . Journal of clinical epidemiology, 97, 70-78.

Flemming, K., Booth, A., Hannes, K., Cargo, M., & Noyes, J. (2018). Cochrane Qualitative and Implementation Methods Group guidance series—Paper 6: Reporting guidelines for qualitative, implementation, and process evaluation evidence syntheses . Journal of Clinical Epidemiology, 97, 79-85.

Walsh, D. and Downe, S. (2005), Meta-synthesis method for qualitative research: a literature review . Journal of Advanced Nursing, 50: 204–211. doi:10.1111/j.1365-2648.2005.03380.x

Living reviews

  • Akl, E.A., Meerpohl, J.J., Elliott, J., Kahale, L.A., Schünemann, H.J., Agoritsas, T., Hilton, J., Perron, C., Akl, E., Hodder, R. and Pestridge, C., 2017. Living systematic reviews: 4. Living guideline recommendations . Journal of clinical epidemiology, 91, pp.47-53.

Qualitative systematic reviews

  • Dixon-Woods, M., Bonas, S., Booth, A., Jones, D. R., Miller, T., Sutton, A. J., . . . Young, B. (2006). How can systematic reviews incorporate qualitative research? A critical perspective . Qualitative Research,6(1), 27–44.
  • Thomas, J., & Harden, A. (2008). Methods for the thematic synthesis of qualitative research in systematic reviews . BMC Medical Research Methodology,8, 45–45.

Mixed methods systematic review

  • Lizarondo L, Stern C, Carrier J, Godfrey C, Rieger K, Salmond S, Apostolo J, Kirkpatrick P, Loveday H. Chapter 8: Mixed methods systematic reviews . In: Aromataris E, Munn Z (Editors). JBI Manual for Evidence Synthesis. JBI, 2020. Available from https://synthesismanual.jbi.global. https://doi.org/10.46658/JBIMES-20-09
  • Pearson, A, White, H, Bath-Hextall, F, Salmond, S, Apostolo, J & Kirkpatrick, P 2015, 'A mixed-methods approach to systematic reviews', International Journal of Evidence-Based Healthcare, vol. 13, no. 3, pp. 121-131. Available from: 10.1097/XEB.0000000000000052
  • Dixon-Woods, M., Agarwal, S., Jones, D., Young, B., & Sutton, A. (2005). Synthesising qualitative and quantitative evidence: A review of possible methods. Journal of Health Services Research & Policy, 10(1), 45–53.

Realist reviews

The RAMESES Projects - Includes information on publication, quality, and reporting standards, as well as training materials for realist reviews, meta-narrative reviews, and realist evaluation.

Rycroft-Malone, J., McCormack, B., Hutchinson, A. M., DeCorby, K., Bucknall, T. K., Kent, B., ... & Wilson, V. (2012). Realist synthesis: illustrating the method for implementation research . Implementation Science, 7(1), 1-10.

Wong, G., Westhorp, G., Manzano, A. et al. RAMESES II reporting standards for realist evaluations. BMC Med 14, 96 (2016). https://doi.org/10.1186/s12916-016-0643-1

Wong, G., Greenhalgh, T., Westhorp, G., Buckingham, J., & Pawson, R. (2013). RAMESES publication standards: realist syntheses. BMC medicine, 11, 21. https://doi.org/10.1186/1741-7015-11-21


Social sciences

  • Chapman, K. (2021). Characteristics of systematic reviews in the social sciences . The Journal of Academic Librarianship, 47(5), 102396.
  • Crisp, B. R. (2015). Systematic reviews: A social work perspective . Australian Social Work, 68(3), 284-295.  

Further Reading

Uttley, L., Montgomery, P. The influence of the team in conducting a systematic review . Syst Rev 6, 149 (2017). https://doi.org/10.1186/s13643-017-0548-x

  • Last Updated: Jul 22, 2024 11:44 AM
  • URL: https://deakin.libguides.com/systematicreview


Systematic reviews: Formulate your question


Defining the question

Defining the research question and developing a protocol are the essential first steps in your systematic review.  The success of your systematic review depends on a clear and focused question, so take the time to get it right.

  • A framework may help you to identify the key concepts in your research question and to organise your search terms in one of the Library's databases.
  • Several frameworks or models exist to help researchers structure a research question, and three of these are outlined on this page: PICO, SPICE and SPIDER.
  • It is advisable to conduct some scoping searches in a database to look for any reviews on your research topic and establish whether your topic is an original one.
  • You will need to identify the relevant database(s) to search; your choice will depend on your topic and the research question you need to answer.
  • By scanning the titles, abstracts and references retrieved in a scoping search, you will reveal the terms used by authors to describe the concepts in your research question, including the synonyms or abbreviations that you may wish to add to a database search.
  • The Library can help you to search for existing reviews: make an appointment with your Subject Librarian to learn more.

The PICO framework

PICO may be the most well-known framework: it has its origins in epidemiology and is now widely used for evidence-based practice and systematic reviews.

PICO normally stands for Population (or Patient or Problem)  - Intervention - Comparator - Outcome.

Population defines the group you are studying.  It may for example be healthy adults, or adults with dementia, or children under 5 years of age with asthma.
Intervention is the type of treatment you aim to study, e.g. a medicine or a physical therapy.
Comparator is another type of treatment you aim to compare the first treatment with, or perhaps a placebo.
Outcome is the result you intend to measure, for example (increased or decreased) life expectancy, or (cessation of) pain.


The SPICE framework

SPICE is used mostly in social science and healthcare research.  It stands for Setting - Population (or Perspective) - Intervention - Comparator - Evaluation.  It is similar to PICO and was devised by Booth (2004).  

Setting: the location or environment relevant to your research (e.g. accident and emergency unit) 
Population (or perspective): the type of group that you are studying (e.g. older people)

Intervention: the intervention/practice/treatment that you are evaluating (e.g. initial examination of patients by allied health staff)

Comparator: an intervention with which you compare the above intervention (e.g. initial examination by medical staff)
Evaluation: the hypothetical result you intend to evaluate (e.g. lower mortality rates)

The SPICE examples above are based on the following research question: Can mortality rates for older people be reduced if a greater proportion are examined initially by allied health staff in A&E? Source: Booth, A (2004) Formulating answerable questions. In Booth, A & Brice, A (Eds) Evidence Based Practice for Information Professionals: A handbook. (pp. 61-70) London: Facet Publishing.

The SPIDER framework

SPIDER was adapted from the PICO framework in order to include searches for qualitative and mixed-methods research. SPIDER was developed by Cooke, Smith and Booth (2012).

Sample: qualitative research may have fewer participants than quantitative research and findings may not be generalised to the entire population.
Phenomenon of Interest: experiences, behaviours or decisions may be of more interest to the qualitative researcher, rather than an intervention.
Design: the research method may be an interview or a survey.
Evaluation: outcomes may include more subjective ones, e.g. attitudes.
Research type: the search can encompass qualitative and mixed-methods research, as well as quantitative research.

Source: Cooke, A., Smith, D. & Booth, A. (2012). Beyond PICO: the SPIDER tool for qualitative evidence synthesis. Qualitative Health Research, 22(10), 1435-1443. http://doi.org/10.1177/1049732312452938.

More advice about formulating a research question

Module 1 in Cochrane Interactive Learning explains the importance of the research question, some types of review question and the PICO framework. The Library subscribes to Cochrane Interactive Learning.

Log in to Module 1:  Cochrane Interactive Learning

  • Last Updated: Jul 12, 2024 12:29 PM
  • URL: https://library.bath.ac.uk/systematic-reviews

Duke University Medical Center Library

Systematic Reviews
2. Develop a Research Question


A well-developed and answerable question is the foundation for any systematic review. Keep the following points in mind:

  • Systematic review questions typically follow a PICO format (patient or population, intervention, comparison, and outcome).
  • Using the PICO framework can help team members clarify and refine the scope of their question. For example, if the population is breast cancer patients, is it all breast cancer patients or just a segment of them?
  • When formulating your research question, you should also consider how it could be answered. If it is not possible to answer your question (the research would be unethical, for example), you'll need to reconsider what you're asking.
  • Typically, systematic review protocols include a list of studies that will be included in the review. These studies, known as exemplars, guide the search development but also serve as proof of concept that your question is answerable. If you are unable to find studies to include, you may need to reconsider your question (a minimal sketch of such a check follows this list).
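
For instance, exemplar studies can be used as a quick test of whether a draft search retrieves what it should. The sketch below is illustrative only (not part of the Duke guide), and the identifiers are invented placeholders:

```python
# Invented placeholder IDs; in practice these would be the PMIDs or DOIs
# of your exemplar studies.

exemplar_ids = {"12345678", "23456789"}

def check_exemplars(retrieved_ids) -> set:
    """Return the exemplar studies a draft search failed to retrieve."""
    return exemplar_ids - set(retrieved_ids)

missed = check_exemplars(["12345678", "99999999"])
print(missed or "all exemplars retrieved")  # {'23456789'} -> revise the search
```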

Other Question Frameworks

PICO is a helpful framework for clinical research questions, but may not be the best for other types of research questions. Did you know there are at least 25 other question frameworks besides variations of PICO? Frameworks like PEO, SPIDER, SPICE, and ECLIPSE can help you formulate a focused research question. The table and example below were created by the Medical University of South Carolina (MUSC) Libraries.

The PEO question framework is useful for qualitative research topics. PEO questions identify three concepts: population, exposure, and outcome.

Research question: What are the daily living experiences of mothers with postnatal depression?

Population - Who is my question focused on? Example: mothers
Exposure - What is the issue I am interested in? Example: postnatal depression
Outcome - What, in relation to the issue, do I want to examine? Example: daily living experiences

The SPIDER question framework is useful for qualitative or mixed methods research topics focused on "samples" rather than populations. SPIDER questions identify five concepts: sample, phenomenon of interest, design, evaluation, and research type.

Research question : What are the experiences of young parents in attendance at antenatal education classes?

Sample - Who is the group of people being studied? Example: young parents
Phenomenon of Interest - What are the reasons for behavior and decisions? Example: attendance at antenatal education classes
Design - How has the research been collected (e.g., interview, survey)? Example: interviews
Evaluation - What is the outcome being impacted? Example: experiences
Research type - What type of research (qualitative or mixed methods)? Example: qualitative studies

The SPICE question framework is useful for qualitative research topics evaluating the outcomes of a service, project, or intervention. SPICE questions identify five concepts: setting, perspective, intervention/exposure/interest, comparison, and evaluation.

Research question : For teenagers in South Carolina, what is the effect of provision of Quit Kits to support smoking cessation on number of successful attempts to give up smoking compared to no support ("cold turkey")?

Setting - The context for the question (where). Example: South Carolina
Perspective - The users, potential users, or stakeholders of the service (for whom). Example: teenagers
Intervention/Exposure - The action taken for the users, potential users, or stakeholders (what). Example: provision of Quit Kits to support smoking cessation
Comparison - The alternative actions or outcomes (compared to what). Example: no support or "cold turkey"
Evaluation - The result or measurement that will determine the success of the intervention (what is the result, how well). Example: number of successful attempts to give up smoking with Quit Kits compared to number of successful attempts with no support

The ECLIPSE framework is useful for qualitative research topics investigating the outcomes of a policy or service. ECLIPSE questions identify six concepts: expectation, client group, location, impact, professionals, and service.

Research question:  How can I increase access to wireless internet for hospital patients?

Expectation - What are you looking to improve or change? What is the information going to be used for? Example: to increase access to wireless internet in the hospital
Client group - Who is the service or policy aimed at? Example: patients and families
Location - Where is the service or policy located? Example: hospitals
Impact - What is the change in service or policy that the researcher is investigating? Example: clients have easy access to free internet
Professionals - Who is involved in providing or improving the service or policy? Example: IT, hospital administration
Service - What kind of service or policy is this? Example: provision of free wireless internet to patients
  • Last Updated: Jun 18, 2024 9:41 AM
  • URL: https://guides.mclibrary.duke.edu/sysreview

Systematic Reviews: Formulate your question and protocol


This video illustrates how to use the PICO framework to formulate an effective research question, and it also shows how to search a database using the search terms identified. The database used in this video is CINAHL but the process is very similar in databases from other companies as well.

Recommended Reading

  • BMJ Best Practice Advice on using the PICO framework.

A longer video on the important pre-planning and protocol development stages of systematic reviews, including tips for success and pitfalls to avoid.

You can start watching this video from around the 9-minute mark.

Formulate Your Question

Having a focused and specific research question is especially important when undertaking a systematic review. If your search question is too broad you will retrieve too many search results and you will be unable to work with them all. If your question is too narrow, you may miss relevant papers. Taking the time to break down your question into separate, focused concepts will also help you search the databases effectively.

Deciding on your inclusion and exclusion criteria early on in the research process can also help you when it comes to focusing your research question and your search strategy.

A literature searching planning template can help to break your search question down into concepts and to record alternative search terms. Frameworks such as PICO and PEO can also help guide your search. A planning template is available to download below, and there is also information on PICO and other frameworks (adapted from: https://libguides.kcl.ac.uk/systematicreview/define).

Looking at published systematic reviews can give you ideas of how to construct a focused research question and an effective search strategy.

Example of an unfocused research question: How can deep vein thrombosis be prevented?

Example of a focused research question: What are the effects of wearing compression stockings versus not wearing them for preventing DVT in people travelling on flights lasting at least four hours?

In this Cochrane systematic review by Clarke et al. (2021), publications on randomised trials of compression stockings versus no stockings in passengers on flights lasting at least four hours were gathered. The appendix of the published review contains the comprehensive search strategy used. This research question focuses on a particular method (wearing compression stockings) in a particular setting (flights of at least four hours) and includes only specific study designs (randomised trials). An additional way of focusing a question could be to look at a particular section of the population.

Clarke, M. J., Broderick, C., Hopewell, S., Juszczak, E. and Eisinga, A., 2021. Compression stockings for preventing deep vein thrombosis in airline passengers. Cochrane Database of Systematic Reviews, Issue 4, Art. No.: CD004002 [Accessed 30th April 2021]. Available from: doi:10.1002/14651858.CD004002.pub4

There are many different frameworks that you can use to structure your research question with clear parameters. The most commonly used framework is PICO (a short sketch showing how PICO concepts combine into a search string follows this list):

  • Population This could be the general population, or a specific group defined by: age, socioeconomic status, location and so on.
  • Intervention This is the therapy/test/strategy to be investigated and can include medication, exercise, environmental factors, and counselling for example. It may help to think of this as 'the thing that will make a difference'.
  • Comparator This is a measure that you will use to compare results against. This can be patients who received no treatment or a placebo, or people who received alternative treatment/exposure, for instance.
  • Outcome What outcome is significant to your population or issue? This may be different from the outcome measures used in the studies.

Adapted from:  https://libguides.reading.ac.uk/systematic-review/protocol

  • Developing an efficient search strategy using PICO A tool created by Health Evidence to help construct a search strategy using PICO
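
As a rough illustration of how PICO concepts become a database search, the sketch below (in Python; the terms are illustrative only and are loosely based on the compression stockings example above) ORs the synonyms within each concept and ANDs the concepts together. Real strategies also use controlled vocabulary, field tags and database-specific syntax.

    # Illustrative PICO concept groups; synonyms within a concept are OR'd,
    # and the concept groups are AND'ed together. Terms are examples only.
    pico_terms = {
        "Population": ["air travel*", "airline passenger*", "long-haul flight*"],
        "Intervention": ["compression stocking*", "graduated compression hosiery"],
        "Comparison": [],  # often not searched: "no stockings" is rarely indexed
        "Outcome": ["deep vein thrombosis", "DVT", "venous thromboembolism"],
    }

    def or_block(synonyms):
        """Join one concept's synonyms with OR, quoting multi-word phrases."""
        quoted = [f'"{t}"' if " " in t else t for t in synonyms]
        return "(" + " OR ".join(quoted) + ")"

    blocks = [or_block(s) for s in pico_terms.values() if s]
    print(" AND ".join(blocks))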

Other Frameworks: alternatives to PICO

As well as PICO, there are other frameworks available, for instance:

  • PICOT : Population, Intervention, Comparison, Outcome, Time.
  • PEO: Population and/or Problem, Exposures, Outcome
  • SPICE: Setting, Population or Perspective, Intervention, Comparison, Evaluation
  • ECLIPSE: Expectations, Client Group, Location, Impact, Professionals Involved, Service
  • SPIDER: Sample, Phenomenon of interest, Design, Evaluation, Research type

This page from City, University of London, contains useful information on several frameworks, including the ones listed above.

Develop Your Protocol

After you have created your research question, the next step is to develop a protocol which outlines the study methodology. You need to include the following:

  • Research question and aims
  • Criteria for inclusion and exclusion
  • Search strategy
  • Selecting studies for inclusion
  • Quality assessment
  • Data extraction & analysis
  • Synthesis of results
  • Dissemination

To find out how much has been published on a particular topic, you can perform scoping searches in relevant databases. This can help you decide on the time limits of your study.

  • Systematic review protocol template This template from the University of Reading can help you plan your protocol.
  • Protocol Guidance This document from the University of York describes what each element of your protocol should cover.

Register Your Protocol

It is good practice to register your protocol and often this is a requirement for future publication of the review.

You can register your protocol here:

  • PROSPERO: international prospective register of systematic reviews
  • Cochrane Collaboration, Getting Involved
  • Campbell Collaboration, Co-ordinating Groups

Adapted from:   https://libguides.bodleian.ox.ac.uk/systematic-reviews/methodology


City, University of London


Advanced literature search and systematic reviews

  • Introduction

Formulate your question

Using frameworks to structure your question, selecting a framework, inclusion and exclusion criteria, the scoping search.

  • Videos and Support
  • Step 2 - Develop a search strategy
  • Step 3 - Selecting databases
  • Step 4 - Develop your protocol
  • Step 5 - Perform your search
  • Step 6 - Searching grey literature
  • Step 7 - Manage your results
  • Step 8 - Analyse and understand your results
  • Step 9 - Write your methodology
  • Videos and support

Formulating a clear, well-defined, relevant and answerable research question is essential to finding the best evidence for your topic. On this page we outline the approaches to developing a research question that can be used as the basis for a review. 

Frameworks have been designed to help you structure research questions and identify the main concepts you want to focus on. Your topic may not fit perfectly into one of the frameworks listed on this page, but just using part of a framework can be sufficient.

The framework you should use depends on the type of question you will be researching.

Framework(s) and typical subject areas:

  • PICO (variants: PIO, PICOT, PICOS): Health
  • PEO, PICO (variants: PIO, PICOT, PICOS): Health; Social Sciences; Business and Policy; Environment
  • PEO, PICo, CLIP, ECLIPSE, SPICE, SPIDER: Social Sciences; Management; Health
  • SPICE, SPIDER: Health; Social Sciences
  • BeHEMoTH: Health

PICO: a framework used for formulating a clinical research question, i.e. questions covering the effectiveness of an intervention, treatment, etc.

PICO element Definition Scenario
P (Patient / Population / Problem) Describe your patient, population or problem Children
I (Intervention / Indicator) What intervention is being considered? Mind body therapies
C (Comparison / Control) What is your comparison or control? Prescription drugs
O (Outcome) What outcome are you looking for?  Controlling headaches

Extensions to PICO

If your topic has additional concepts, there are extensions to the PICO framework that you can use: 

PICOS - S stands for study design.  Use this framework if you are only interested in examining specific designs of study. 

PICOT - T  stands for timeframe.  Use this framework if your outcomes need to be measured in a certain amount of time, e.g. 24 hours after surgery. 

PICOC - C stands for context.  Use this framework if you are focussing on a particular organisation or circumstances or scenario. 

PFO: a framework used for questions relating to prognosis.

PFO element Definition Scenario
P (Population) Who is the question focussed on? Children 
F (Prognostic Factors) Which factor or exposure is being examined? Febrile seizures
O (Outcome) What are the possible outcomes? Seizure disorders

CoCoPop: a framework used for questions relating to the prevalence / incidence of a condition.

CoCoPop element Definition Scenario
Co (Condition) What condition / problem are you examining? Claustrophobia
Co (Context) In which context is your question set? MRI
Pop (Population) Describe your population Adults 

CLIP: used for questions relating to cost effectiveness, economic evaluations and service improvements.

CLIP element Definition Scenario
C (Client) Who is the service aimed at? Elderly
L (Location) Where is the service located? Rural communities
I (Improvement) What do you want to find out? How the services can be improved
P (Professional) Who is involved in providing the service? Health visiting

ECLIPSE: used for questions relating to cost effectiveness, economic evaluations, and service improvements.

ECLIPSE element Definition Scenario
E (Expectation) Purpose of the study - what are you trying to achieve? To find retention rates
C (Client group) Who is the information needed for? Patients? Managers?
L (Location) Where is the client group based? NHS
I (Impact) If your research is looking for service improvement, what is it? How is it measured? Retention of staff
P (Professionals) What professional staff are involved? Nurses
S (Service) For which service are you looking for information? A&E

PICo: used for qualitative questions evaluating experiences and meaningfulness.

PICo element Definition Scenario
P (Patient / Population / Problem) Describe your patient, population, or problem Patients with pressure sores
I (Interest) Describe the event, experience, activity or process Experiences / views / opinions
Co (Context) Describe the setting or characteristics  Care in the home

PEO: for quantitative and qualitative questions evaluating experiences and meaningfulness.

PEO element Definition Scenario
P (Patient / Population / Problem) Describe your patient, population or problem Carers
E (Exposure) What is the issue you are interested in? Dementia
O (Outcomes or themes) What (in relation to the issue) do you want to examine? Quality of life

SPICE: used for qualitative questions evaluating experiences and meaningfulness.

SPICE element Definition Scenario
S (Setting) Where is the study set? United Kingdom? Care homes?
P (Population / Perspective) From which population / perspective is the study done? Carers
I (Intervention) Describe the intervention being studied Reminiscence therapy
C (Comparison) Is the intervention being compared with another? Not available
E (Evaluation) How well did the intervention work? Attitudes

SPIDER: a framework used for qualitative questions evaluating experiences and meaningfulness.

SPIDER element Definition Scenario
S (Sample) Describe the group you are focussing on Young parents
PI (Phenomenon of interest) The behaviour or experience your research is examining Ante-natal education classes
D (Design) How was the research carried out?  Interview, questionnaire, phenomenology
E (Evaluation) Which outcome are you measuring? Experiences
R (Research type) Qualitative? Quantitative? Or mixed methods? Qualitative

When you formulate a research question you also need to consider your inclusion and exclusion criteria. These are pre-defined characteristics that the literature must have in order to be included in the study. Different factors can be used as inclusion or exclusion criteria.

The most common inclusion / exclusion criteria are listed below; a simple screening sketch follows the list.

Geographic location

Limit the review to studies from particular geographical areas.

Date

How far back do you wish to search for information? (For systematic reviews you need to give a reason if you choose to restrict your search by date.)

Publication type

Commonly excluded publication types are reviews and editorials.

Participants

Adults, child studies, certain age groups?

Language

Limit the review to studies published in particular languages.

Peer review

Has the study been reviewed by accredited professionals in the field?

Study design

Randomised controlled trials, cohort studies?

Setting

Primary care, hospitals, general practice, schools?
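
As a minimal sketch of how such criteria translate into a first-pass screen (in Python; the record fields and criterion values are hypothetical, and real screening is done in duplicate by human reviewers):

    # Hypothetical pre-defined criteria; any date or language restriction
    # should be justified in the protocol.
    CRITERIA = {
        "year_from": 2000,
        "languages": {"English"},
        "designs": {"randomised controlled trial", "cohort study"},
        "excluded_types": {"review", "editorial"},
    }

    def passes_screen(record):
        """True if a bibliographic record meets every inclusion criterion."""
        return (
            record["year"] >= CRITERIA["year_from"]
            and record["language"] in CRITERIA["languages"]
            and record["design"] in CRITERIA["designs"]
            and record["publication_type"] not in CRITERIA["excluded_types"]
        )

    records = [
        {"year": 2010, "language": "English",
         "design": "randomised controlled trial", "publication_type": "article"},
        {"year": 1995, "language": "English",
         "design": "cohort study", "publication_type": "article"},
    ]
    print([passes_screen(r) for r in records])  # [True, False]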

Once you have a clear research question, you need to conduct a scoping search to identify:

  • The search terms you should use to retrieve information on your topic.
  • The body of the literature that has already been written on your topic.
  • Whether a systematic review covering your question has already been published, or has been registered and is in progress. If so, you need to modify your research question; if the existing review was completed more than five years ago, you can perform an update of the same question.

Search the following resources to find systematic reviews, either completed or in progress. Check the Supporting videos and online tutorials page on this guide for demonstration of how to do a scoping search. 
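
One quick way to gauge the size of a literature during a scoping search is to fetch a hit count programmatically. The sketch below (Python standard library only) uses the NCBI E-utilities esearch endpoint for PubMed; the query is illustrative and should be replaced with your own concepts, and an internet connection is required.

    import json
    from urllib.parse import urlencode
    from urllib.request import urlopen

    def pubmed_count(query):
        """Return the number of PubMed records matching a query string."""
        params = urlencode({"db": "pubmed", "term": query, "retmode": "json"})
        url = f"https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?{params}"
        with urlopen(url) as response:
            return int(json.load(response)["esearchresult"]["count"])

    print(pubmed_count('"compression stockings" AND "deep vein thrombosis"'))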

  • PROSPERO: database of systematic review protocols and published systematic reviews.

A clinical search engine providing access to research evidence in the form of primary research articles, clinical trials, systematic reviews and evidence summaries. Grey literature is also available.

  • The Cochrane Library The Cochrane Library is a collection of databases that contain different types of high-quality, independent evidence to inform healthcare decision-making.

To find primary research related to your topic you can search databases available via: 

  • EBSCO: platform providing access to databases covering a variety of subjects including business, economics, education, environment, food science, health, politics and sociology.

  • Ovid Online: provides access to a number of health databases covering general health topics as well as allied health, complementary medicine, health management, international health, maternity care, nursing and social policy.

  • Mayo Clinic Libraries
  • Systematic Reviews

Develop & Refine Your Research Question

Systematic reviews: develop & refine your research question.

  • Knowledge Synthesis Comparison
  • Knowledge Synthesis Decision Tree
  • Standards & Reporting Results
  • Materials in the Mayo Clinic Libraries
  • Training Resources
  • Review Teams
  • Develop a Timeline
  • Project Management
  • Communication
  • PRISMA-P Checklist
  • Eligibility Criteria
  • Register your Protocol
  • Other Resources
  • Other Screening Tools
  • Grey Literature Searching
  • Citation Searching
  • Data Extraction Tools
  • Minimize Bias
  • Risk of Bias by Study Design
  • GRADE & GRADE-CERQual
  • Synthesis & Meta-Analysis
  • Publishing your Systematic Review

A clear, well-defined, and answerable research question is essential for any systematic review, meta-analysis, or other form of evidence synthesis. Spend time refining your research question before you begin.

  • PICO Worksheet

PICO Framework

Focused question frameworks.

The PICO mnemonic is frequently used for framing quantitative clinical research questions. 1

Patient or problem being addressed
Intervention or exposure being studied
Comparison intervention or exposure
Clinical Outcome

The PEO acronym is appropriate for studies of diagnostic accuracy. 2

Patient
Exposure (the test that is being evaluated)
Outcome

The SPICE framework is effective “for formulating questions about qualitative or improvement research.” 3

Setting of your project
Population being studied
Intervention (drug, therapy, improvement program)
Comparison
Evaluation (how were outcomes evaluated?)

The SPIDER search strategy was designed for framing questions best answered by qualitative and mixed-methods research. 4

Sample: what groups are of interest?
Phenomenon of Interest: what behaviors, decisions, or experience do you want to study?
Design: are you applying a theoretical framework or specific research method?
Evaluation: how were outcomes evaluated and measured?
Research type: qualitative or mixed-methods?

References & Recommended Reading

1. Anastasiadis E, Rajan P, Winchester CL. Framing a research question: the first and most vital step in planning research. Journal of Clinical Urology. 2015;8(6):409-411.

2. Speckman RA, Friedly JL. Asking structured, answerable clinical questions using the Population, Intervention/Comparator, Outcome (PICO) framework. PM&R. 2019;11(5):548-553.

3. Knowledge Into Action Toolkit. NHS Scotland. http://www.knowledge.scot.nhs.uk/k2atoolkit/source/identify-what-you-need-to-know/spice.aspx. Accessed April 23, 2021.

4. Cooke A, Smith D, Booth A. Beyond PICO: the SPIDER tool for qualitative evidence synthesis. Qualitative Health Research. 2012;22(10):1435-1443.

Guidance to best tools and practices for systematic reviews

Kat Kolaski

1 Departments of Orthopaedic Surgery, Pediatrics, and Neurology, Wake Forest School of Medicine, Winston-Salem, NC USA

Lynne Romeiser Logan

2 Department of Physical Medicine and Rehabilitation, SUNY Upstate Medical University, Syracuse, NY USA

John P. A. Ioannidis

3 Departments of Medicine, of Epidemiology and Population Health, of Biomedical Data Science, and of Statistics, and Meta-Research Innovation Center at Stanford (METRICS), Stanford University School of Medicine, Stanford, CA USA

Abstract

Data continue to accumulate indicating that many systematic reviews are methodologically flawed, biased, redundant, or uninformative. Some improvements have occurred in recent years based on empirical methods research and standardization of appraisal tools; however, many authors do not routinely or consistently apply these updated methods. In addition, guideline developers, peer reviewers, and journal editors often disregard current methodological standards. Although extensively acknowledged and explored in the methodological literature, most clinicians seem unaware of these issues and may automatically accept evidence syntheses (and clinical practice guidelines based on their conclusions) as trustworthy.

A plethora of methods and tools are recommended for the development and evaluation of evidence syntheses. It is important to understand what these are intended to do (and cannot do) and how they can be utilized. Our objective is to distill this sprawling information into a format that is understandable and readily accessible to authors, peer reviewers, and editors. In doing so, we aim to promote appreciation and understanding of the demanding science of evidence synthesis among stakeholders. We focus on well-documented deficiencies in key components of evidence syntheses to elucidate the rationale for current standards. The constructs underlying the tools developed to assess reporting, risk of bias, and methodological quality of evidence syntheses are distinguished from those involved in determining overall certainty of a body of evidence. Another important distinction is made between those tools used by authors to develop their syntheses as opposed to those used to ultimately judge their work.

Exemplar methods and research practices are described, complemented by novel pragmatic strategies to improve evidence syntheses. The latter include preferred terminology and a scheme to characterize types of research evidence. We organize best practice resources in a Concise Guide that can be widely adopted and adapted for routine implementation by authors and journals. Appropriate, informed use of these is encouraged, but we caution against their superficial application and emphasize their endorsement does not substitute for in-depth methodological training. By highlighting best practices with their rationale, we hope this guidance will inspire further evolution of methods and tools that can advance the field.

Supplementary Information

The online version contains supplementary material available at 10.1186/s13643-023-02255-9.

Part 1. The state of evidence synthesis

Evidence syntheses are commonly regarded as the foundation of evidence-based medicine (EBM). They are widely accredited for providing reliable evidence and, as such, they have significantly influenced medical research and clinical practice. Despite their uptake throughout health care and ubiquity in contemporary medical literature, some important aspects of evidence syntheses are generally overlooked or not well recognized. Evidence syntheses are mostly retrospective exercises, they often depend on weak or irreparably flawed data, and they may use tools that have acknowledged or yet unrecognized limitations. They are complicated and time-consuming undertakings prone to bias and errors. Production of a good evidence synthesis requires careful preparation and high levels of organization in order to limit potential pitfalls [ 1 ]. Many authors do not recognize the complexity of such an endeavor and the many methodological challenges they may encounter. Failure to do so is likely to result in research and resource waste.

Given their potential impact on people’s lives, it is crucial for evidence syntheses to correctly report on the current knowledge base. In order to be perceived as trustworthy, reliable demonstration of the accuracy of evidence syntheses is equally imperative [ 2 ]. Concerns about the trustworthiness of evidence syntheses are not recent developments. From the early years when EBM first began to gain traction until recent times, when thousands of systematic reviews are published monthly [ 3 ], the rigor of evidence syntheses has always varied. Many systematic reviews and meta-analyses had obvious deficiencies because original methods and processes had gaps, lacked precision, and/or were not widely known. The situation has improved with empirical research concerning which methods to use and standardization of appraisal tools. However, given the geometric increase in the number of evidence syntheses being published, a relatively larger pool of unreliable evidence syntheses is being published today.

Publication of methodological studies that critically appraise the methods used in evidence syntheses is increasing at a fast pace. This reflects the availability of tools specifically developed for this purpose [ 4 – 6 ]. Yet many clinical specialties report that alarming numbers of evidence syntheses fail on these assessments. The syntheses identified report on a broad range of common conditions including, but not limited to, cancer, [ 7 ] chronic obstructive pulmonary disease, [ 8 ] osteoporosis, [ 9 ] stroke, [ 10 ] cerebral palsy, [ 11 ] chronic low back pain, [ 12 ] refractive error, [ 13 ] major depression, [ 14 ] pain, [ 15 ] and obesity [ 16 , 17 ]. The situation is even more concerning with regard to evidence syntheses included in clinical practice guidelines (CPGs) [ 18 – 20 ]. Astonishingly, in a sample of CPGs published in 2017–18, more than half did not apply even basic systematic methods in the evidence syntheses used to inform their recommendations [ 21 ].

These reports, while not widely acknowledged, suggest there are pervasive problems not limited to evidence syntheses that evaluate specific kinds of interventions or include primary research of a particular study design (eg, randomized versus non-randomized) [ 22 ]. Similar concerns about the reliability of evidence syntheses have been expressed by proponents of EBM in highly circulated medical journals [ 23 – 26 ]. These publications have also raised awareness about redundancy, inadequate input of statistical expertise, and deficient reporting. These issues plague primary research as well; however, there is heightened concern for the impact of these deficiencies given the critical role of evidence syntheses in policy and clinical decision-making.

Methods and guidance to produce a reliable evidence synthesis

Several international consortiums of EBM experts and national health care organizations currently provide detailed guidance (Table ​ (Table1). 1 ). They draw criteria from the reporting and methodological standards of currently recommended appraisal tools, and regularly review and update their methods to reflect new information and changing needs. In addition, they endorse the Grading of Recommendations Assessment, Development and Evaluation (GRADE) system for rating the overall quality of a body of evidence [ 27 ]. These groups typically certify or commission systematic reviews that are published in exclusive databases (eg, Cochrane, JBI) or are used to develop government or agency sponsored guidelines or health technology assessments (eg, National Institute for Health and Care Excellence [NICE], Scottish Intercollegiate Guidelines Network [SIGN], Agency for Healthcare Research and Quality [AHRQ]). They offer developers of evidence syntheses various levels of methodological advice, technical and administrative support, and editorial assistance. Use of specific protocols and checklists is required for development teams within these groups, but their online methodological resources are accessible to any potential author.

Guidance for development of evidence syntheses

 Cochrane (formerly Cochrane Collaboration)
 JBI (formerly Joanna Briggs Institute)
 National Institute for Health and Care Excellence (NICE)—United Kingdom
 Scottish Intercollegiate Guidelines Network (SIGN) —Scotland
 Agency for Healthcare Research and Quality (AHRQ)—United States

Notably, Cochrane is the largest single producer of evidence syntheses in biomedical research; however, these only account for 15% of the total [ 28 ]. The World Health Organization requires Cochrane standards be used to develop evidence syntheses that inform their CPGs [ 29 ]. Authors investigating questions of intervention effectiveness in syntheses developed for Cochrane follow the Methodological Expectations of Cochrane Intervention Reviews [ 30 ] and undergo multi-tiered peer review [ 31 , 32 ]. Several empirical evaluations have shown that Cochrane systematic reviews are of higher methodological quality compared with non-Cochrane reviews [ 4 , 7 , 9 , 11 , 14 , 32 – 35 ]. However, some of these assessments have biases: they may be conducted by Cochrane-affiliated authors, and they sometimes use scales and tools developed and used in the Cochrane environment and by its partners. In addition, evidence syntheses published in the Cochrane database are not subject to space or word restrictions, while non-Cochrane syntheses are often limited. As a result, information that may be relevant to the critical appraisal of non-Cochrane reviews is often removed or is relegated to online-only supplements that may not be readily or fully accessible [ 28 ].

Influences on the state of evidence synthesis

Many authors are familiar with the evidence syntheses produced by the leading EBM organizations but can be intimidated by the time and effort necessary to apply their standards. Instead of following their guidance, authors may employ methods that are discouraged or outdated [ 28 ]. Suboptimal methods described in the literature may then be taken up by others. For example, the Newcastle–Ottawa Scale (NOS) is a commonly used tool for appraising non-randomized studies [ 36 ]. Many authors justify their selection of this tool with reference to a publication that describes the unreliability of the NOS and recommends against its use [ 37 ]. Obviously, the authors who cite this report for that purpose have not read it. Authors and peer reviewers have a responsibility to use reliable and accurate methods and not copycat previous citations or substandard work [ 38 , 39 ]. Similar cautions may potentially extend to automation tools. These have concentrated on evidence searching [ 40 ] and selection given how demanding it is for humans to maintain truly up-to-date evidence [ 2 , 41 ]. Cochrane has deployed machine learning to identify randomized controlled trials (RCTs) and studies related to COVID-19, [ 2 , 42 ] but such tools are not yet commonly used [ 43 ]. The routine integration of automation tools in the development of future evidence syntheses should not displace the interpretive part of the process.

Editorials about unreliable or misleading systematic reviews highlight several of the intertwining factors that may contribute to continued publication of unreliable evidence syntheses: shortcomings and inconsistencies of the peer review process, lack of endorsement of current standards on the part of journal editors, the incentive structure of academia, industry influences, publication bias, and the lure of “predatory” journals [ 44 – 48 ]. At this juncture, clarification of the extent to which each of these factors contribute remains speculative, but their impact is likely to be synergistic.

Over time, the generalized acceptance of the conclusions of systematic reviews as incontrovertible has affected trends in the dissemination and uptake of evidence. Reporting of the results of evidence syntheses and recommendations of CPGs has shifted beyond medical journals to press releases and news headlines and, more recently, to the realm of social media and influencers. The lay public and policy makers may depend on these outlets for interpreting evidence syntheses and CPGs. Unfortunately, communication to the general public often reflects intentional or non-intentional misrepresentation or “spin” of the research findings [ 49 – 52 ]. News and social media outlets also tend to reduce conclusions on a body of evidence and recommendations for treatment to binary choices (eg, “do it” versus “don’t do it”) that may be assigned an actionable symbol (eg, red/green traffic lights, smiley/frowning face emoji).

Strategies for improvement

Many authors and peer reviewers are volunteer health care professionals or trainees who lack formal training in evidence synthesis [ 46 , 53 ]. Informing them about research methodology could increase the likelihood they will apply rigorous methods [ 25 , 33 , 45 ]. We tackle this challenge, from both a theoretical and a practical perspective, by offering guidance applicable to any specialty. It is based on recent methodological research that is extensively referenced to promote self-study. However, the information presented is not intended to be a substitute for committed training in evidence synthesis methodology; instead, we hope to inspire our target audience to seek such training. We also hope to inform a broader audience of clinicians and guideline developers influenced by evidence syntheses. Notably, these communities often include the same members who serve in different capacities.

In the following sections, we highlight methodological concepts and practices that may be unfamiliar, problematic, confusing, or controversial. In Part 2, we consider various types of evidence syntheses and the types of research evidence summarized by them. In Part 3, we examine some widely used (and misused) tools for the critical appraisal of systematic reviews and reporting guidelines for evidence syntheses. In Part 4, we discuss how to meet methodological conduct standards applicable to key components of systematic reviews. In Part 5, we describe the merits and caveats of rating the overall certainty of a body of evidence. Finally, in Part 6, we summarize suggested terminology, methods, and tools for development and evaluation of evidence syntheses that reflect current best practices.

Part 2. Types of syntheses and research evidence

A good foundation for the development of evidence syntheses requires an appreciation of their various methodologies and the ability to correctly identify the types of research potentially available for inclusion in the synthesis.

Types of evidence syntheses

Systematic reviews have historically focused on the benefits and harms of interventions; over time, various types of systematic reviews have emerged to address the diverse information needs of clinicians, patients, and policy makers [ 54 ]. Systematic reviews with traditional components have become defined by the different topics they assess (Table 2.1 ). In addition, other distinctive types of evidence syntheses have evolved, including overviews or umbrella reviews, scoping reviews, rapid reviews, and living reviews. The popularity of these has been increasing in recent years [ 55 – 58 ]. A summary of the development, methods, available guidance, and indications for these unique types of evidence syntheses is available in Additional File 2 A.

Types of traditional systematic reviews

Review type / Topic assessed / Elements of research question (mnemonic)

Intervention. Benefits and harms of interventions used in healthcare. Population, Intervention, Comparator, Outcome (PICO)

Diagnostic test accuracy. How well a diagnostic test performs in diagnosing and detecting a particular disease. Population, Index test(s), and Target condition (PIT)

Qualitative (Cochrane). Questions are designed to improve understanding of intervention complexity, contextual variations, implementation, and stakeholder preferences and experiences. Setting, Perspective, Intervention or Phenomenon of Interest, Comparison, Evaluation (SPICE); Sample, Phenomenon of Interest, Design, Evaluation, Research type (SPIDER); Perspective, Setting, Phenomena of interest/Problem, Environment, Comparison (optional), Time/timing, Findings (PerSPEcTiF)

Qualitative (JBI). Questions inform meaningfulness and appropriateness of care and the impact of illness through documentation of stakeholder experiences, preferences, and priorities. Population, the Phenomena of Interest, and the Context (PICo)

Prognostic. Probable course or future outcome(s) of people with a health problem. Population, Intervention (model), Comparator, Outcomes, Timing, Setting (PICOTS)

Etiology and risk. The relationship (association) between certain factors (e.g., genetic, environmental) and the development of a disease or condition or other health outcome. Population or groups at risk, Exposure(s), associated Outcome(s) (disease, symptom, or health condition of interest), the context/location or the time period and the length of time when relevant (PEO)

Measurement properties. What is the most suitable instrument to measure a construct of interest in a specific study population? Population, Instrument, Construct, Outcomes (PICO)

Prevalence and incidence. The frequency, distribution and determinants of specific factors, health states or conditions in a defined population: eg, how common is a particular disease or condition in a specific group of individuals? Factor, disease, symptom or health Condition of interest, the epidemiological indicator used to measure its frequency (prevalence, incidence), the Population or groups at risk, as well as the Context/location and time period where relevant (CoCoPop)

Both Cochrane [ 30 , 59 ] and JBI [ 60 ] provide methodologies for many types of evidence syntheses; they describe these with different terminology, but there is obvious overlap (Table 2.2 ). The majority of evidence syntheses published by Cochrane (96%) and JBI (62%) are categorized as intervention reviews. This reflects the earlier development and dissemination of their intervention review methodologies; these remain well-established [ 30 , 59 , 61 ] as both organizations continue to focus on topics related to treatment efficacy and harms. In contrast, intervention reviews represent only about half of the total published in the general medical literature, and several non-intervention review types contribute to a significant proportion of the other half.

Evidence syntheses published by Cochrane and JBI

Cochrane a: Intervention 8572 (96.3%); Diagnostic 176 (1.9%); Overview 64 (0.7%); Methodology 41 (0.45%); Qualitative 17 (0.19%); Prognostic 11 (0.12%); Rapid 11 (0.12%); Prototype c 8 (0.08%). Total = 8900

JBI b: Effectiveness 435 (61.5%); Diagnostic Test Accuracy 9 (1.3%); Umbrella 4 (0.6%); Mixed Methods 2 (0.3%); Qualitative 159 (22.5%); Prevalence and Incidence 6 (0.8%); Etiology and Risk 7 (1.0%); Measurement Properties 3 (0.4%); Economic 6 (0.6%); Text and Opinion 1 (0.14%); Scoping 43 (6.0%); Comprehensive d 32 (4.5%). Total = 707

a Data from https://www.cochranelibrary.com/cdsr/reviews . Accessed 17 Sep 2022

b Data obtained via personal email communication on 18 Sep 2022 with Emilie Francis, editorial assistant, JBI Evidence Synthesis

c Includes the following categories: prevalence, scoping, mixed methods, and realist reviews

d This methodology is not supported in the current version of the JBI Manual for Evidence Synthesis
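
As a quick arithmetic check, the percentage columns in Table 2.2 are simply each review type's count divided by the column total; the sketch below (in Python) reproduces the Cochrane column, with small differences from the published figures reflecting rounding.

    # Recompute the Cochrane percentage column of Table 2.2 from the counts.
    cochrane_counts = {
        "Intervention": 8572, "Diagnostic": 176, "Overview": 64,
        "Methodology": 41, "Qualitative": 17, "Prognostic": 11,
        "Rapid": 11, "Prototype": 8,
    }
    total = 8900
    for review_type, n in cochrane_counts.items():
        print(f"{review_type:<12} {n:>5}  {100 * n / total:.2f}%")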

Types of research evidence

There is consensus on the importance of using multiple study designs in evidence syntheses; at the same time, there is a lack of agreement on methods to identify included study designs. Authors of evidence syntheses may use various taxonomies and associated algorithms to guide selection and/or classification of study designs. These tools differentiate categories of research and apply labels to individual study designs (eg, RCT, cross-sectional). A familiar example is the Design Tree endorsed by the Centre for Evidence-Based Medicine [ 70 ]. Such tools may not be helpful to authors of evidence syntheses for multiple reasons.

Suboptimal levels of agreement and accuracy even among trained methodologists reflect challenges with the application of such tools [ 71 , 72 ]. Problematic distinctions or decision points (eg, experimental or observational, controlled or uncontrolled, prospective or retrospective) and design labels (eg, cohort, case control, uncontrolled trial) have been reported [ 71 ]. The variable application of ambiguous study design labels to non-randomized studies is common, making them especially prone to misclassification [ 73 ]. In addition, study labels do not denote the unique design features that make different types of non-randomized studies susceptible to different biases, including those related to how the data are obtained (eg, clinical trials, disease registries, wearable devices). Given this limitation, it is important to be aware that design labels preclude the accurate assignment of non-randomized studies to a “level of evidence” in traditional hierarchies [ 74 ].

These concerns suggest that available tools and nomenclature used to distinguish types of research evidence may not uniformly apply to biomedical research and non-health fields that utilize evidence syntheses (eg, education, economics) [ 75 , 76 ]. Moreover, primary research reports often do not describe study design or do so incompletely or inaccurately; thus, indexing in PubMed and other databases does not address the potential for misclassification [ 77 ]. Yet proper identification of research evidence has implications for several key components of evidence syntheses. For example, search strategies limited by index terms using design labels or study selection based on labels applied by the authors of primary studies may cause inconsistent or unjustified study inclusions and/or exclusions [ 77 ]. In addition, because risk of bias (RoB) tools consider attributes specific to certain types of studies and study design features, results of these assessments may be invalidated if an inappropriate tool is used. Appropriate classification of studies is also relevant for the selection of a suitable method of synthesis and interpretation of those results.

An alternative to these tools and nomenclature involves application of a few fundamental distinctions that encompass a wide range of research designs and contexts. While these distinctions are not novel, we integrate them into a practical scheme (see Fig. 1) designed to guide authors of evidence syntheses in the basic identification of research evidence. The initial distinction is between primary and secondary studies. Primary studies are then further distinguished by: 1) the type of data reported (qualitative or quantitative); and 2) two defining design features (group or single-case and randomized or non-randomized). The different types of studies and study designs represented in the scheme are described in detail in Additional File 2 B. It is important to conceptualize their methods as complementary as opposed to contrasting or hierarchical [ 78 ]; each offers advantages and disadvantages that determine their appropriateness for answering different kinds of research questions in an evidence synthesis.

Fig. 1 Distinguishing types of research evidence
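
The scheme's basic distinctions lend themselves to a simple data structure. The sketch below (in Python; the field names are our own shorthand for the distinctions described above, not the authors' terminology) classifies a study by the primary/secondary split and, for primary studies, by type of data and the two design features.

    from dataclasses import dataclass

    @dataclass
    class Study:
        primary: bool        # primary research vs secondary (evidence synthesis)
        quantitative: bool   # type of data reported: quantitative vs qualitative
        group_design: bool   # group vs single-case design
        randomized: bool     # randomized vs non-randomized

    def describe(study):
        """Summarise a study using the scheme's basic distinctions."""
        if not study.primary:
            return "secondary study (evidence synthesis)"
        return "primary study: " + ", ".join([
            "quantitative" if study.quantitative else "qualitative",
            "group" if study.group_design else "single-case",
            "randomized" if study.randomized else "non-randomized",
        ])

    rct = Study(primary=True, quantitative=True, group_design=True, randomized=True)
    print(describe(rct))  # primary study: quantitative, group, randomized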

Application of these basic distinctions may avoid some of the potential difficulties associated with study design labels and taxonomies. Nevertheless, debatable methodological issues are raised when certain types of research identified in this scheme are included in an evidence synthesis. We briefly highlight those associated with inclusion of non-randomized studies, case reports and series, and a combination of primary and secondary studies.

Non-randomized studies

When investigating an intervention’s effectiveness, it is important for authors to recognize the uncertainty of observed effects reported by studies with high RoB. Results of statistical analyses that include such studies need to be interpreted with caution in order to avoid misleading conclusions [ 74 ]. Review authors may consider excluding randomized studies with high RoB from meta-analyses. Non-randomized studies of intervention (NRSI) are affected by a greater potential range of biases and thus vary more than RCTs in their ability to estimate a causal effect [ 79 ]. If data from NRSI are synthesized in meta-analyses, it is helpful to separately report their summary estimates [ 6 , 74 ].

Nonetheless, certain design features of NRSI (eg, which parts of the study were prospectively designed) may help to distinguish stronger from weaker ones. Cochrane recommends that authors of a review including NRSI focus on relevant study design features when determining eligibility criteria instead of relying on non-informative study design labels [ 79 , 80 ]. This process is facilitated by a study design feature checklist; guidance on using the checklist is included with developers’ description of the tool [ 73 , 74 ]. Authors collect information about these design features during data extraction and then consider it when making final study selection decisions and when performing RoB assessments of the included NRSI.

Case reports and case series

Correctly identified case reports and case series can contribute evidence not well captured by other designs [ 81 ]; in addition, some topics may be limited to a body of evidence that consists primarily of uncontrolled clinical observations. Murad and colleagues offer a framework for how to include case reports and series in an evidence synthesis [ 82 ]. Distinguishing between cohort studies and case series in these syntheses is important, especially for those that rely on evidence from NRSI. Additional data obtained from studies misclassified as case series can potentially increase the confidence in effect estimates. Mathes and Pieper provide authors of evidence syntheses with specific guidance on distinguishing between cohort studies and case series, but emphasize the increased workload involved [ 77 ].

Primary and secondary studies

Synthesis of combined evidence from primary and secondary studies may provide a broad perspective on the entirety of available literature on a topic. This is, in fact, the recommended strategy for scoping reviews that may include a variety of sources of evidence (eg, CPGs, popular media). However, except for scoping reviews, the synthesis of data from primary and secondary studies is discouraged unless there are strong reasons to justify doing so.

Combining primary and secondary sources of evidence is challenging for authors of other types of evidence syntheses for several reasons [ 83 ]. Assessments of RoB for primary and secondary studies are derived from conceptually different tools, thus obfuscating the ability to make an overall RoB assessment of a combination of these study types. In addition, authors who include primary and secondary studies must devise non-standardized methods for synthesis. Note this contrasts with well-established methods available for updating existing evidence syntheses with additional data from new primary studies [ 84 – 86 ]. However, a new review that synthesizes data from primary and secondary studies raises questions of validity and may unintentionally support a biased conclusion because no existing methodological guidance is currently available [ 87 ].

Recommendations

We suggest that journal editors require authors to identify which type of evidence synthesis they are submitting and reference the specific methodology used for its development. This will clarify the research question and methods for peer reviewers and potentially simplify the editorial process. Editors should announce this practice and include it in the instructions to authors. To decrease bias and apply correct methods, authors must also accurately identify the types of research evidence included in their syntheses.

Part 3. Conduct and reporting

The need to develop criteria to assess the rigor of systematic reviews was recognized soon after the EBM movement began to gain international traction [ 88 , 89 ]. Systematic reviews rapidly became popular, but many were very poorly conceived, conducted, and reported. These problems remain highly prevalent [ 23 ] despite development of guidelines and tools to standardize and improve the performance and reporting of evidence syntheses [ 22 , 28 ]. Table 3.1  provides some historical perspective on the evolution of tools developed specifically for the evaluation of systematic reviews, with or without meta-analysis.

Tools specifying standards for systematic reviews with and without meta-analysis

Reporting standards
 Quality of Reporting of Meta-analyses (QUOROM) Statement: Moher 1999
 Meta-analyses Of Observational Studies in Epidemiology (MOOSE): Stroup 2000
 Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA): Moher 2009
 PRISMA 2020 (a): Page 2021
Methodological and risk of bias standards
 Overview Quality Assessment Questionnaire (OQAQ): Oxman and Guyatt 1991
 Systematic Review Critical Appraisal Sheet: Centre for Evidence-based Medicine 2005
 A Measurement Tool to Assess Systematic Reviews (AMSTAR): Shea 2007
 AMSTAR-2 (a): Shea 2017
 Risk of Bias in Systematic Reviews (ROBIS) (a): Whiting 2016

a Currently recommended

b Validated tool for systematic reviews of interventions developed for use by authors of overviews or umbrella reviews

These tools are often interchangeably invoked when referring to the “quality” of an evidence synthesis. However, quality is a vague term that is frequently misused and misunderstood; more precisely, these tools specify different standards for evidence syntheses. Methodological standards address how well a systematic review was designed and performed [ 5 ]. RoB assessments refer to systematic flaws or limitations in the design, conduct, or analysis of research that distort the findings of the review [ 4 ]. Reporting standards help systematic review authors describe the methodology they used and the results of their synthesis in sufficient detail [ 92 ]. It is essential to distinguish between these evaluations: a systematic review may be biased, it may fail to report sufficient information on essential features, or it may exhibit both problems; a thoroughly reported evidence synthesis may still be biased and flawed, while an otherwise unbiased one may suffer from deficient documentation.

We direct attention to the currently recommended tools listed in Table 3.1  but concentrate on AMSTAR-2 (update of AMSTAR [A Measurement Tool to Assess Systematic Reviews]) and ROBIS (Risk of Bias in Systematic Reviews), which evaluate methodological quality and RoB, respectively. For comparison and completeness, we include PRISMA 2020 (update of the 2009 Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement), which offers guidance on reporting standards. The exclusive focus on these three tools is by design; it addresses concerns related to the considerable variability in tools used for the evaluation of systematic reviews [ 28 , 88 , 96 , 97 ]. We highlight the underlying constructs these tools were designed to assess, then describe their components and applications. Their known (or potential) uptake and impact and limitations are also discussed.

Evaluation of conduct

Development.

AMSTAR [ 5 ] was in use for a decade prior to the 2017 publication of AMSTAR-2; both provide a broad evaluation of methodological quality of intervention systematic reviews, including flaws arising through poor conduct of the review [ 6 ]. ROBIS, published in 2016, was developed to specifically assess RoB introduced by the conduct of the review; it is applicable to systematic reviews of interventions and several other types of reviews [ 4 ]. Both tools reflect a shift to a domain-based approach as opposed to generic quality checklists. There are a few items unique to each tool; however, similarities between items have been demonstrated [ 98 , 99 ]. AMSTAR-2 and ROBIS are recommended for use by: 1) authors of overviews or umbrella reviews and CPGs to evaluate systematic reviews considered as evidence; 2) authors of methodological research studies to appraise included systematic reviews; and 3) peer reviewers for appraisal of submitted systematic review manuscripts. For authors, these tools may function as teaching aids and inform conduct of their review during its development.

Description

Systematic reviews that include randomized and/or non-randomized studies as evidence can be appraised with AMSTAR-2 and ROBIS. Other characteristics of AMSTAR-2 and ROBIS are summarized in Table 3.2 . Both tools define categories for an overall rating; however, neither tool is intended to generate a total score by simply calculating the number of responses satisfying criteria for individual items [ 4 , 6 ]. AMSTAR-2 focuses on the rigor of a review’s methods irrespective of the specific subject matter. ROBIS places emphasis on a review’s results section— this suggests it may be optimally applied by appraisers with some knowledge of the review’s topic as they may be better equipped to determine if certain procedures (or lack thereof) would impact the validity of a review’s findings [ 98 , 100 ]. Reliability studies show AMSTAR-2 overall confidence ratings strongly correlate with the overall RoB ratings in ROBIS [ 100 , 101 ].

Comparison of AMSTAR-2 and ROBIS

Characteristic: AMSTAR-2 / ROBIS

Guidance documents: Extensive / Extensive

Review types covered: Intervention / Intervention, diagnostic, etiology, prognostic (a)

Domains: 7 critical, 9 non-critical / 4

Items, total number: 16 / 29

Response options, AMSTAR-2 (b): items 1, 3, 5, 6, 10, 13, 14, 16 rated Yes or No; items 2, 4, 7, 8, 9 rated Yes, Partial Yes, or No; items 11, 12, 15 rated Yes, No, or No meta-analysis conducted

Response options, ROBIS: 24 assessment items rated Yes, Probably Yes, Probably No, No, or No Information; 5 items regarding level of concern rated Low, High, or Unclear

Construct: Confidence based on weaknesses in critical domains / Level of concern for risk of bias

Categories: High, moderate, low, critically low / Low, high, unclear

a ROBIS includes an optional first phase to assess the applicability of the review to the research question of interest. The tool may be applicable to other review types in addition to the four specified, although modification of this initial phase will be needed (Personal Communication via email, Penny Whiting, 28 Jan 2022)

b AMSTAR-2 item #9 and #11 require separate responses for RCTs and NRSI
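
To illustrate why neither tool reduces to a summed score, the sketch below (in Python) encodes our reading of the overall-confidence scheme described in the AMSTAR-2 guidance: the rating is driven by flaws in critical domains rather than by the number of items satisfied. It is a simplification; the published guidance also allows appraisers to adjust which domains are treated as critical for a given review.

    def amstar2_confidence(critical_flaws, noncritical_weaknesses):
        """Simplified AMSTAR-2 overall confidence rating (our reading of Shea 2017)."""
        if critical_flaws > 1:
            return "critically low"
        if critical_flaws == 1:
            return "low"
        if noncritical_weaknesses > 1:
            return "moderate"
        return "high"  # no critical flaws, at most one non-critical weakness

    print(amstar2_confidence(0, 1))  # high
    print(amstar2_confidence(2, 0))  # critically low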

Interrater reliability has been shown to be acceptable for AMSTAR-2 [ 6 , 11 , 102 ] and ROBIS [ 4 , 98 , 103 ] but neither tool has been shown to be superior in this regard [ 100 , 101 , 104 , 105 ]. Overall, variability in reliability for both tools has been reported across items, between pairs of raters, and between centers [ 6 , 100 , 101 , 104 ]. The effects of appraiser experience on the results of AMSTAR-2 and ROBIS require further evaluation [ 101 , 105 ]. Updates to both tools should address items shown to be prone to individual appraisers’ subjective biases and opinions [ 11 , 100 ]; this may involve modifications of the current domains and signaling questions as well as incorporation of methods to make an appraiser’s judgments more explicit. Future revisions of these tools may also consider the addition of standards for aspects of systematic review development currently lacking (eg, rating overall certainty of evidence, [ 99 ] methods for synthesis without meta-analysis [ 105 ]) and removal of items that assess aspects of reporting that are thoroughly evaluated by PRISMA 2020.

Application

A good understanding of what is required to satisfy the standards of AMSTAR-2 and ROBIS involves study of the accompanying guidance documents written by the tools’ developers; these contain detailed descriptions of each item’s standards. In addition, accurate appraisal of a systematic review with either tool requires training. Most experts recommend independent assessment by at least two appraisers with a process for resolving discrepancies as well as procedures to establish interrater reliability, such as pilot testing, a calibration phase or exercise, and development of predefined decision rules [ 35 , 99 – 101 , 103 , 104 , 106 ]. These methods may, to some extent, address the challenges associated with the diversity in methodological training, subject matter expertise, and experience using the tools that are likely to exist among appraisers.

The standards of AMSTAR, AMSTAR-2, and ROBIS have been used in many methodological studies and epidemiological investigations. However, the increased publication of overviews or umbrella reviews and CPGs has likely been a greater influence on the widening acceptance of these tools. Critical appraisal of the secondary studies considered evidence is essential to the trustworthiness of both the recommendations of CPGs and the conclusions of overviews. Currently both Cochrane [ 55 ] and JBI [ 107 ] recommend AMSTAR-2 and ROBIS in their guidance for authors of overviews or umbrella reviews. However, ROBIS and AMSTAR-2 were released in 2016 and 2017, respectively; thus, to date, limited data have been reported about the uptake of these tools or which of the two may be preferred [ 21 , 106 ]. Currently, in relation to CPGs, AMSTAR-2 appears to be overwhelmingly popular compared to ROBIS. A Google Scholar search of this topic (search terms “AMSTAR 2 AND clinical practice guidelines,” “ROBIS AND clinical practice guidelines” 13 May 2022) found 12,700 hits for AMSTAR-2 and 1,280 for ROBIS. The apparent greater appeal of AMSTAR-2 may relate to its longer track record given the original version of the tool was in use for 10 years prior to its update in 2017.

Barriers to the uptake of AMSTAR-2 and ROBIS include the real or perceived time and resources necessary to complete the items they include and appraisers’ confidence in their own ratings [ 104 ]. Reports from comparative studies available to date indicate that appraisers find AMSTAR-2 questions, responses, and guidance to be clearer and simpler compared with ROBIS [ 11 , 101 , 104 , 105 ]. This suggests that for appraisal of intervention systematic reviews, AMSTAR-2 may be a more practical tool than ROBIS, especially for novice appraisers [ 101 , 103 – 105 ]. The unique characteristics of each tool, as well as their potential advantages and disadvantages, should be taken into consideration when deciding which tool should be used for an appraisal of a systematic review. In addition, the choice of one or the other may depend on how the results of an appraisal will be used; for example, a peer reviewer’s appraisal of a single manuscript versus an appraisal of multiple systematic reviews in an overview or umbrella review, CPG, or systematic methodological study.

Authors of overviews and CPGs report results of AMSTAR-2 and ROBIS appraisals for each of the systematic reviews they include as evidence. Ideally, an independent judgment of their appraisals can be made by the end users of overviews and CPGs; however, most stakeholders, including clinicians, are unlikely to have a sophisticated understanding of these tools. Nevertheless, they should at least be aware that AMSTAR-2 and ROBIS ratings reported in overviews and CPGs may be inaccurate because the tools are not applied as intended by their developers. This can result from inadequate training of the overview or CPG authors who perform the appraisals, or to modifications of the appraisal tools imposed by them. The potential variability in overall confidence and RoB ratings highlights why appraisers applying these tools need to support their judgments with explicit documentation; this allows readers to judge for themselves whether they agree with the criteria used by appraisers [ 4 , 108 ]. When these judgments are explicit, the underlying rationale used when applying these tools can be assessed [ 109 ].

Theoretically, we would expect an association of AMSTAR-2 with improved methodological rigor and an association of ROBIS with lower RoB in recent systematic reviews compared to those published before 2017. To our knowledge, this has not yet been demonstrated; however, like reports about the actual uptake of these tools, time will tell. Additional data on user experience are also needed to further elucidate the practical challenges and methodological nuances encountered with the application of these tools. This information could potentially inform the creation of unifying criteria to guide and standardize the appraisal of evidence syntheses [ 109 ].

Evaluation of reporting

Complete reporting is essential for users to establish the trustworthiness and applicability of a systematic review’s findings. Efforts to standardize and improve the reporting of systematic reviews resulted in the 2009 publication of the PRISMA statement [ 92 ] with its accompanying explanation and elaboration document [ 110 ]. This guideline was designed to help authors prepare a complete and transparent report of their systematic review. In addition, adherence to PRISMA is often used to evaluate the thoroughness of reporting of published systematic reviews [ 111 ]. The updated version, PRISMA 2020 [ 93 ], and its guidance document [ 112 ] were published in 2021. Items on the original and updated versions of PRISMA are organized by the six basic review components they address (title, abstract, introduction, methods, results, discussion). The PRISMA 2020 update is a considerably expanded version of the original; it includes standards and examples for the 27 original and 13 additional reporting items that capture methodological advances and may enhance the replicability of reviews [ 113 ].

The original PRISMA statement fostered the development of various PRISMA extensions (Table 3.3 ). These include reporting guidance for scoping reviews and reviews of diagnostic test accuracy and for intervention reviews that report on the following: harms outcomes, equity issues, the effects of acupuncture, the results of network meta-analyses and analyses of individual participant data. Detailed reporting guidance for specific systematic review components (abstracts, protocols, literature searches) is also available.

PRISMA extensions

PRISMA-E (2012): PRISMA for systematic reviews with a focus on health equity [ ]
PRISMA for Abstracts (2015; 2020) a: Reporting systematic reviews in journal and conference abstracts [ ]
PRISMA-P (2015): PRISMA for systematic review protocols [ ]
PRISMA-NMA (2015): PRISMA for network meta-analyses [ ]
PRISMA-IPD (2015): PRISMA for individual participant data [ ]
PRISMA-Harms (2016): PRISMA for reviews including harms outcomes [ ]
PRISMA-DTA (2018): PRISMA for diagnostic test accuracy [ ]
PRISMA-ScR (2018): PRISMA for scoping reviews [ ]
PRISMA-A (2019): PRISMA for acupuncture [ ]
PRISMA-S (2021): PRISMA for reporting literature searches [ ]

PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses

a Note the abstract reporting checklist is now incorporated into PRISMA 2020 [ 93 ]

Uptake and impact

The 2009 PRISMA standards [ 92 ] for reporting have been widely endorsed by authors, journals, and EBM-related organizations. We anticipate the same for PRISMA 2020 [ 93 ] given its co-publication in multiple high-impact journals. However, to date, there is a lack of strong evidence for an association between improved systematic review reporting and endorsement of PRISMA 2009 standards [ 43 , 111 ]. Most journals require that a PRISMA checklist accompany submissions of systematic review manuscripts. However, the accuracy of information presented on these self-reported checklists is not necessarily verified. It remains unclear which strategies (eg, authors’ self-report of checklists, peer reviewer checks) might improve adherence to the PRISMA reporting standards; in addition, the feasibility of any potentially effective strategies must be taken into consideration given the structure and limitations of current research and publication practices [ 124 ].

Pitfalls and limitations of PRISMA, AMSTAR-2, and ROBIS

Misunderstanding of the roles of these tools and their misapplication may be widespread problems. PRISMA 2020 is a reporting guideline that is most beneficial if consulted when developing a review as opposed to merely completing a checklist when submitting to a journal; at that point, the review is finished, with good or bad methodological choices. A PRISMA checklist records how completely each element of review conduct was reported; it does not evaluate the caliber of that conduct or the performance of the review. Thus, review authors and readers should not think that a rigorous systematic review can be produced by simply following the PRISMA 2020 guidelines. Similarly, it is important to recognize that AMSTAR-2 and ROBIS are tools to evaluate the conduct of a review but do not substitute for conceptual methodological guidance. In addition, they are not intended to be simple checklists. In fact, they have the potential for misuse or abuse if applied as such; for example, by calculating a total score to make a judgment about a review’s overall confidence or RoB. Proper selection of a response for the individual items on AMSTAR-2 and ROBIS requires training or at least reference to their accompanying guidance documents.

Not surprisingly, it has been shown that compliance with the PRISMA checklist is not necessarily associated with satisfying the standards of ROBIS [ 125 ]. AMSTAR-2 and ROBIS were not available when PRISMA 2009 was developed; however, they were considered in the development of PRISMA 2020 [ 113 ]. Therefore, future studies may show a positive relationship between fulfillment of PRISMA 2020 standards for reporting and meeting the standards of tools evaluating methodological quality and RoB.

Choice of an appropriate tool for the evaluation of a systematic review first involves identification of the underlying construct to be assessed. For systematic reviews of interventions, recommended tools include AMSTAR-2 and ROBIS for appraisal of conduct and PRISMA 2020 for completeness of reporting. All three tools were developed rigorously and provide easily accessible and detailed user guidance, which is necessary for their proper application and interpretation. When considering a manuscript for publication, training in these tools can sensitize peer reviewers and editors to major issues that may affect the review’s trustworthiness and completeness of reporting. Judgment of the overall certainty of a body of evidence and formulation of recommendations rely, in part, on AMSTAR-2 or ROBIS appraisals of systematic reviews. Therefore, training on the application of these tools is essential for authors of overviews and developers of CPGs. Peer reviewers and editors considering an overview or CPG for publication must hold their authors to a high standard of transparency regarding both the conduct and reporting of these appraisals.

Part 4. Meeting conduct standards

Many authors, peer reviewers, and editors erroneously equate fulfillment of the items on the PRISMA checklist with superior methodological rigor. For direction on methodology, we refer them to available resources that provide comprehensive conceptual guidance [ 59 , 60 ] as well as primers with basic step-by-step instructions [ 1 , 126 , 127 ]. This section is intended to complement study of such resources by facilitating use of AMSTAR-2 and ROBIS, tools specifically developed to evaluate methodological rigor of systematic reviews. These tools are widely accepted by methodologists; however, in the general medical literature, they are not uniformly selected for the critical appraisal of systematic reviews [ 88 , 96 ].

To enable their uptake, Table 4.1  links review components to the corresponding appraisal tool items. The expectations of AMSTAR-2 and ROBIS are concisely stated, and the reasoning behind them is provided.

Systematic review components linked to appraisal with AMSTAR-2 and ROBIS a

| Review component | AMSTAR-2 item | ROBIS item | Expectation | Rationale |
| --- | --- | --- | --- | --- |
| Methods for study selection | #5 | #2.5 | All three components must be done in duplicate, and methods fully described. | Helps to mitigate CoI and bias; also may improve accuracy. |
| Methods for data extraction | #6 | #3.1 | (same as above) | (same as above) |
| Methods for RoB assessment | NA | #3.5 | (same as above) | (same as above) |
| Study description | #8 | #3.2 | Research design features, components of research question (eg, PICO), setting, funding sources. | Allows readers to understand the individual studies in detail. |
| Sources of funding | #10 | NA | Identified for all included studies. | Can reveal CoI or bias. |
| Publication bias | #15* | #4.5 | Explored, diagrammed, and discussed. | Publication and other selective reporting biases are major threats to the validity of systematic reviews. |
| Author CoI | #16 | NA | Disclosed, with management strategies described. | If CoI is identified, management strategies must be described to ensure confidence in the review. |

CoI conflict of interest, MA meta-analysis, NA not addressed, PICO participant, intervention, comparison, outcome, PRISMA-P Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols, RoB risk of bias

a Components shown in bold are chosen for elaboration in Part 4 for one (or both) of two reasons: 1) the component has been identified as potentially problematic for systematic review authors; and/or 2) the component is evaluated by standards of an AMSTAR-2 “critical” domain

b Critical domains of AMSTAR-2 are indicated by *

Issues involved in meeting the standards for seven review components (identified in bold in Table 4.1 ) are addressed in detail. These were chosen for elaboration for one (or both) of two reasons: 1) the component has been identified as potentially problematic for systematic review authors based on consistent reports of their frequent AMSTAR-2 or ROBIS deficiencies [ 9 , 11 , 15 , 88 , 128 , 129 ]; and/or 2) the review component is judged by standards of an AMSTAR-2 “critical” domain. These have the greatest implications for how a systematic review will be appraised: if standards for any one of these critical domains are not met, the review is rated as having “critically low confidence.”

Research question

Specific and unambiguous research questions may have more value for reviews that deal with hypothesis testing. Mnemonics for the various elements of research questions are suggested by JBI and Cochrane (Table 2.1 ). These prompt authors to consider the specialized methods involved in developing different types of systematic reviews; however, while inclusion of the suggested elements makes a question consistent with a particular review type’s methods, it does not necessarily make the research question appropriate. Table 4.2  lists acronyms that may aid in developing the research question. They include overlapping concepts of importance in this time of proliferating reviews of uncertain value [ 130 ]. If these issues are not prospectively contemplated, systematic review authors may establish an overly broad scope, or develop runaway scope that allows them to stray from predefined choices relating to key comparisons and outcomes.

Research question development

| Acronym | Meaning |
| --- | --- |
| FINER a | feasible, interesting, novel, ethical, and relevant |
| SMART b | specific, measurable, attainable, relevant, timely |
| TOPICS+M c | time, outcomes, population, intervention, context, study design, plus (effect) moderators |

a Cummings SR, Browner WS, Hulley SB. Conceiving the research question and developing the study plan. In: Hulley SB, Cummings SR, Browner WS, editors. Designing clinical research: an epidemiological approach; 4th edn. Lippincott Williams & Wilkins; 2007. p. 14–22

b Doran, GT. There’s a S.M.A.R.T. way to write management’s goals and objectives. Manage Rev. 1981;70:35-6.

c Johnson BT, Hennessy EA. Systematic reviews and meta-analyses in the health sciences: best practice methods for research syntheses. Soc Sci Med. 2019;233:237–51

Once a research question is established, searching on registry sites and databases for existing systematic reviews addressing the same or a similar topic is necessary in order to avoid contributing to research waste [ 131 ]. Repeating an existing systematic review must be justified, for example, if previous reviews are out of date or methodologically flawed. A full discussion on replication of intervention systematic reviews, including a consensus checklist, can be found in the work of Tugwell and colleagues [ 84 ].

Protocol development is considered a core component of systematic reviews [ 125 , 126 , 132 ]. Review protocols may allow researchers to plan and anticipate potential issues, assess validity of methods, prevent arbitrary decision-making, and minimize bias that can be introduced by the conduct of the review. Registration of a protocol that allows public access promotes transparency of the systematic review’s methods and processes and reduces the potential for duplication [ 132 ]. Thinking early and carefully about all the steps of a systematic review is pragmatic and logical and may mitigate the influence of the authors’ prior knowledge of the evidence [ 133 ]. In addition, the protocol stage is when the scope of the review can be carefully considered by authors, reviewers, and editors; this may help to avoid production of overly ambitious reviews that include excessive numbers of comparisons and outcomes or are undisciplined in their study selection.

Systematic reviews with published prospective protocols have been reported to better attain AMSTAR standards [ 134 ]. However, completeness of reporting does not seem to differ between reviews with a protocol and those without one [ 135 ]. PRISMA-P [ 116 ] and its accompanying elaboration and explanation document [ 136 ] can be used to guide and assess the reporting of protocols. A final version of the review should fully describe any protocol deviations. Peer reviewers may compare the submitted manuscript with any available pre-registered protocol; this is required if AMSTAR-2 or ROBIS are used for critical appraisal.

There are multiple options for the recording of protocols (Table 4.3 ). Some journals will peer review and publish protocols. In addition, many online sites offer date-stamped and publicly accessible protocol registration. Some of these are exclusively for protocols of evidence syntheses; others are less restrictive and offer researchers the capacity for data storage, sharing, and other workflow features. These sites document protocol details to varying extents and have different requirements [ 137 ]. The most popular site for systematic reviews, the International Prospective Register of Systematic Reviews (PROSPERO), for example, only registers reviews that report on an outcome with direct relevance to human health. The PROSPERO record documents protocols for all types of reviews except literature and scoping reviews. Of note, PROSPERO requires authors register their review protocols prior to any data extraction [ 133 , 138 ]. The electronic records of most of these registry sites allow authors to update their protocols and facilitate transparent tracking of protocol changes, which are not unexpected during the progress of the review [ 139 ].

Options for protocol registration of evidence syntheses

Journals that peer review and publish protocols a
 BMJ Open
 BioMed Central
 JMIR Research Protocols
 World Journal of Meta-analysis

Registries for protocols of evidence syntheses
 Cochrane b
 JBI c
 PROSPERO
 Research Registry
 Registry of Systematic Reviews/Meta-Analyses
 International Platform of Registered Systematic Review and Meta-analysis Protocols (INPLASY)

General-purpose repositories and registries e
 Center for Open Science
 Protocols.io
 Figshare
 Open Science Framework
 Zenodo

a Authors are advised to contact their target journal regarding submission of systematic review protocols

b Registration is restricted to approved review projects

c The JBI registry lists review projects currently underway by JBI-affiliated entities. These records include a review’s title, primary author, research question, and PICO elements. JBI recommends that authors register eligible protocols with PROSPERO

d See Pieper and Rombey [ 137 ] for detailed characteristics of these five registries

e See Pieper and Rombey [ 137 ] for other systematic review data repository options

Study design inclusion

For most systematic reviews, broad inclusion of study designs is recommended [ 126 ]. This may allow comparison of results between contrasting study design types [ 126 ]. Certain study designs may be considered preferable depending on the type of review and nature of the research question. However, prevailing stereotypes about what each study design does best may not be accurate. For example, in systematic reviews of interventions, randomized designs are typically thought to answer highly specific questions, while non-randomized designs are often expected to reveal greater information about harms or real-world evidence [ 126 , 140 , 141 ]. This may be a false distinction; randomized trials may be pragmatic [ 142 ], they may offer important (and more unbiased) information on harms [ 143 ], and data from non-randomized trials may not necessarily be more real-world-oriented [ 144 ].

Moreover, there may not be any available evidence reported by RCTs for certain research questions; in some cases, there may not be any RCTs or NRSI. When the available evidence is limited to case reports and case series, it is not possible to test hypotheses or provide descriptive estimates or associations; however, a systematic review of these studies can still offer important insights [ 81 , 145 ]. When authors anticipate that limited evidence of any kind may be available to inform their research questions, a scoping review can be considered. Alternatively, decisions regarding inclusion of indirect as opposed to direct evidence can be addressed during protocol development [ 146 ]. Including indirect evidence at an early stage of intervention systematic review development allows authors to decide if such studies offer any additional and/or different understanding of treatment effects for their population or comparison of interest. Issues of indirectness of included studies are accounted for later in the process, during determination of the overall certainty of evidence (see Part 5 for details).

Evidence search

Both AMSTAR-2 and ROBIS require systematic and comprehensive searches for evidence. This is essential for any systematic review. Both tools discourage search restrictions based on language and publication source. Given increasing globalism in health care, the practice of including English-only literature should be avoided [ 126 ]. There are many examples in which language bias (different results in studies published in different languages) has been documented [ 147 , 148 ]. This does not mean that all literature, in all languages, is equally trustworthy [ 148 ]; however, the only way to formally probe for the potential of such biases is to consider all languages in the initial search. The gray literature and a search of trials registers may also reveal important details about topics that would otherwise be missed [ 149 – 151 ]. Again, inclusiveness will allow review authors to investigate whether results differ in gray literature and trials registers [ 41 , 151 – 153 ].

Authors should make every attempt to complete their review within one year, as that is the likely viable life of a search; if that is not possible, the search should be updated close to the time of completion [ 154 ]. Some research topics may warrant even less of a delay: in rapidly changing fields (as in the case of the COVID-19 pandemic), even one month may radically change the available evidence.

Excluded studies

AMSTAR-2 requires authors to provide references for any studies excluded at the full text phase of study selection along with reasons for exclusion; this allows readers to feel confident that all relevant literature has been considered for inclusion and that exclusions are defensible.

Risk of bias assessment of included studies

The design of the studies included in a systematic review (eg, RCT, cohort, case series) should not be equated with appraisal of its RoB. To meet AMSTAR-2 and ROBIS standards, systematic review authors must examine RoB issues specific to the design of each primary study they include as evidence. It is unlikely that a single RoB appraisal tool will be suitable for all research designs. In addition to tools for randomized and non-randomized studies, specific tools are available for evaluation of RoB in case reports and case series [ 82 ] and single-case experimental designs [ 155 , 156 ]. Note the RoB tools selected must meet the standards of the appraisal tool used to judge the conduct of the review. For example, AMSTAR-2 identifies four sources of bias specific to RCTs and NRSI that must be addressed by the RoB tool(s) chosen by the review authors. The Cochrane RoB 2 tool [ 157 ] for RCTs and ROBINS-I [ 158 ] for NRSI meet the AMSTAR-2 standards. Appraisers on the review team should not modify any RoB tool without complete transparency and acknowledgment that they have invalidated the interpretation of the tool as intended by its developers [ 159 ]. Conduct of RoB assessments is not addressed by AMSTAR-2; to meet ROBIS standards, two independent reviewers should complete RoB assessments of included primary studies.

Implications of the RoB assessments must be explicitly discussed and considered in the conclusions of the review. Discussion of the overall RoB of included studies may consider the weight of the studies at high RoB, the importance of the sources of bias in the studies being summarized, and if their importance differs in relationship to the outcomes reported. If a meta-analysis is performed, serious concerns for RoB of individual studies should be accounted for in these results as well. If the results of the meta-analysis for a specific outcome change when studies at high RoB are excluded, readers will have a more accurate understanding of this body of evidence. However, while investigating the potential impact of specific biases is a useful exercise, it is important to avoid over-interpretation, especially when there are sparse data.
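To make the last point concrete, a minimal sketch follows; it uses invented study data, a deliberately simple fixed-effect model, and hypothetical RoB judgments to show how a pooled estimate can shift when studies at high RoB are excluded. It is illustrative only and is not part of AMSTAR-2 or ROBIS.

```python
import math

# Hypothetical studies: (effect estimate on the log scale, standard error, RoB judgment)
studies = [
    (-0.35, 0.12, "low"),
    (-0.28, 0.15, "some concerns"),
    (-0.60, 0.20, "high"),
    (-0.05, 0.10, "high"),
]

def pooled(rows):
    """Fixed-effect, inverse-variance pooled estimate with a 95% CI."""
    weights = [1 / se**2 for _, se, _ in rows]
    est = sum(w * y for w, (y, _, _) in zip(weights, rows)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return est, est - 1.96 * se, est + 1.96 * se

# Sensitivity analysis: re-run the synthesis without the high-RoB studies
print("All studies:        %.2f (%.2f to %.2f)" % pooled(studies))
print("Excluding high RoB: %.2f (%.2f to %.2f)" % pooled(
    [s for s in studies if s[2] != "high"]))
```

If the two pooled estimates differ materially, that discrepancy, not the unfiltered result alone, belongs in the discussion of the evidence.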

Synthesis methods for quantitative data

Syntheses of quantitative data reported by primary studies are broadly categorized as one of two types: meta-analysis, and synthesis without meta-analysis (Table 4.4 ). Before deciding on one of these methods, authors should seek methodological advice about whether reported data can be transformed or used in other ways to provide a consistent effect measure across studies [ 160 , 161 ].

Common methods for quantitative synthesis

Standard meta-analysis (aggregate data or individual participant data c)
- Method: weighted average of effect estimates
- Results presented: pairwise comparisons of effect estimates, CI; overall effect estimate, CI, P value; evaluation of heterogeneity
- Visual presentation: forest plot b with summary statistic for average effect estimate

Network meta-analysis
- Data combined: variable; the interventions are compared directly and indirectly
- Results presented: comparisons of relative effects between any pair of interventions; effect estimates for intervention pairings; summary relative effects for pair-wise comparisons with evaluations of inconsistency and heterogeneity; treatment rankings (ie, probability that an intervention is among the best options)
- Visual presentation: network diagram or graph, tabular presentations; forest plot, other methods; rankogram plot

Synthesis without meta-analysis e
- Summarizing effect estimates from separate studies (without combination that would provide an average effect estimate): range and distribution of observed effects such as median, interquartile range, range; box-and-whisker plot, bubble plot, forest plot (without summary effect estimate)
- Combining P values: combined P value, number of studies; albatross plot (study sample size against P values per outcome)
- Vote counting by direction of effect (eg, favors intervention over the comparator): proportion of studies with an effect in the direction of interest, CI, P value; harvest plot, effect direction plot

CI confidence interval (or credible interval, if analysis is done in Bayesian framework)

a See text for descriptions of the types of data combined in each of these approaches

b See Additional File 4  for guidance on the structure and presentation of forest plots

c General approach is similar to aggregate data meta-analysis but there are substantial differences relating to data collection and checking and analysis [ 162 ]. This approach to syntheses is applicable to intervention, diagnostic, and prognostic systematic reviews [ 163 ]

d Examples include meta-regression, hierarchical and multivariate approaches [ 164 ]

e In-depth guidance and illustrations of these methods are provided in Chapter 12 of the Cochrane Handbook [ 160 ]

Meta-analysis

Systematic reviews that employ meta-analysis should not be referred to simply as “meta-analyses.” The term meta-analysis strictly refers to a specific statistical technique used when study effect estimates and their variances are available, yielding a quantitative summary of results. In general, methods for meta-analysis involve use of a weighted average of effect estimates from two or more studies. When applied appropriately, meta-analysis increases the precision of the estimated magnitude of effect and can offer useful insights about heterogeneity among estimates of effects. We refer to standard references for a thorough introduction and formal training [ 165 – 167 ].

There are three common approaches to meta-analysis in current health care–related systematic reviews (Table 4.4 ). Aggregate data meta-analysis is the most familiar to authors of evidence syntheses and their end users. This standard meta-analysis combines data on effect estimates reported by studies that investigate similar research questions involving direct comparisons of an intervention and comparator. Results of these analyses provide a single summary intervention effect estimate. If the included studies in a systematic review measure an outcome differently, their reported results may be transformed to make them comparable [ 161 ]. Forest plots visually present essential information about the individual studies and the overall pooled analysis (see Additional File 4  for details).
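As a concrete illustration of this standard approach, the sketch below computes an inverse-variance weighted average, Cochran's Q, I², and a DerSimonian-Laird random-effects estimate from invented study data. It is a bare-bones teaching sketch; actual reviews should rely on established software and statistical guidance.

```python
import math

effects = [0.42, 0.31, 0.55, 0.10, 0.47]  # invented study effect estimates
ses     = [0.15, 0.12, 0.20, 0.18, 0.10]  # their standard errors

# Fixed-effect weights (inverse variance) and pooled estimate
w_fixed = [1 / se**2 for se in ses]
pooled_fixed = sum(w * y for w, y in zip(w_fixed, effects)) / sum(w_fixed)

# Cochran's Q and I^2 summarize between-study heterogeneity
q = sum(w * (y - pooled_fixed) ** 2 for w, y in zip(w_fixed, effects))
df = len(effects) - 1
i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

# DerSimonian-Laird estimate of between-study variance (tau^2)
c = sum(w_fixed) - sum(w**2 for w in w_fixed) / sum(w_fixed)
tau2 = max(0.0, (q - df) / c)

# Random-effects weights incorporate tau^2, widening the CI when studies disagree
w_re = [1 / (se**2 + tau2) for se in ses]
pooled_re = sum(w * y for w, y in zip(w_re, effects)) / sum(w_re)
se_re = math.sqrt(1 / sum(w_re))

print(f"Pooled effect (random effects): {pooled_re:.2f} "
      f"(95% CI {pooled_re - 1.96 * se_re:.2f} to {pooled_re + 1.96 * se_re:.2f})")
print(f"Heterogeneity: Q = {q:.2f} (df = {df}), I^2 = {i2:.0f}%, tau^2 = {tau2:.3f}")
```

These quantities, one line per study plus the pooled summary, are exactly what a forest plot displays graphically.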

Less familiar and more challenging meta-analytical approaches used in secondary research include individual participant data (IPD) and network meta-analyses (NMA); PRISMA extensions provide reporting guidelines for both [ 117 , 118 ]. In IPD, the raw data on each participant from each eligible study are re-analyzed as opposed to the study-level data analyzed in aggregate data meta-analyses [ 168 ]. This may offer advantages, including the potential for limiting concerns about bias and allowing more robust analyses [ 163 ]. As suggested by the description in Table 4.4 , NMA is a complex statistical approach. It combines aggregate data [ 169 ] or IPD [ 170 ] for effect estimates from direct and indirect comparisons reported in two or more studies of three or more interventions. This makes it a potentially powerful statistical tool; while multiple interventions are typically available to treat a condition, few have been evaluated in head-to-head trials [ 171 ]. Both IPD and NMA facilitate a broader scope, and potentially provide more reliable and/or detailed results; however, compared with standard aggregate data meta-analyses, their methods are more complicated, time-consuming, and resource-intensive, and they have their own biases, so one needs sufficient funding, technical expertise, and preparation to employ them successfully [ 41 , 172 , 173 ].

Several items in AMSTAR-2 and ROBIS address meta-analysis; thus, understanding the strengths, weaknesses, assumptions, and limitations of methods for meta-analyses is important. According to the standards of both tools, plans for a meta-analysis must be addressed in the review protocol, including reasoning, description of the type of quantitative data to be synthesized, and the methods planned for combining the data. This should not consist of stock statements describing conventional meta-analysis techniques; rather, authors are expected to anticipate issues specific to their research questions. Concern for the lack of training in meta-analysis methods among systematic review authors cannot be overstated. For those with training, the use of popular software (eg, RevMan [ 174 ], MetaXL [ 175 ], JBI SUMARI [ 176 ]) may facilitate exploration of these methods; however, such programs cannot substitute for the accurate interpretation of the results of meta-analyses, especially for more complex meta-analytical approaches.

Synthesis without meta-analysis

There are varied reasons a meta-analysis may not be appropriate or desirable [ 160 , 161 ]. Syntheses that informally use statistical methods other than meta-analysis are variably referred to as descriptive, narrative, or qualitative syntheses or summaries; these terms are also applied to syntheses that make no attempt to statistically combine data from individual studies. However, use of such imprecise terminology is discouraged; in order to fully explore the results of any type of synthesis, some narration or description is needed to supplement the data visually presented in tabular or graphic forms [ 63 , 177 ]. In addition, the term “qualitative synthesis” is easily confused with a synthesis of qualitative data in a qualitative or mixed methods review. “Synthesis without meta-analysis” is currently the preferred description of other ways to combine quantitative data from two or more studies. Use of this specific terminology when referring to these types of syntheses also implies the application of formal methods (Table 4.4 ).

Methods for syntheses without meta-analysis involve structured presentations of the data in tables and plots. In comparison to narrative descriptions of each study, these are designed to more effectively and transparently show patterns and convey detailed information about the data; they also allow informal exploration of heterogeneity [ 178 ]. In addition, acceptable quantitative statistical methods (Table 4.4 ) are formally applied; however, it is important to recognize these methods have significant limitations for the interpretation of the effectiveness of an intervention [ 160 ]. Nevertheless, when meta-analysis is not possible, the application of these methods is less prone to bias compared with an unstructured narrative description of included studies [ 178 , 179 ].

Vote counting is commonly used in systematic reviews and involves a tally of studies reporting results that meet some threshold of importance applied by review authors. Until recently, it has not typically been identified as a method for synthesis without meta-analysis. Guidance on an acceptable vote counting method based on direction of effect is currently available [ 160 ] and should be used instead of narrative descriptions of such results (eg, “more than half the studies showed improvement”; “only a few studies reported adverse effects”; “7 out of 10 studies favored the intervention”). Unacceptable methods include vote counting by statistical significance or magnitude of effect or some subjective rule applied by the authors.
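The arithmetic behind the acceptable method is straightforward. The sketch below, using invented directions of effect, reports the statistic recommended above: the proportion of studies with an effect in the direction of interest, with a 95% CI (here a Wilson interval) and a two-sided sign test P value; the size and statistical significance of each study's result play no role.

```python
import math

# True = study effect favors the intervention over the comparator (invented data)
directions = [True, True, True, False, True, True, False, True, True, True]
k, n = sum(directions), len(directions)
p = k / n

# Wilson score 95% CI for the proportion
z = 1.96
center = (p + z**2 / (2 * n)) / (1 + z**2 / n)
half = (z / (1 + z**2 / n)) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))

# Two-sided sign test: how surprising is this split if direction were a coin flip?
tail = sum(math.comb(n, i) for i in range(min(k, n - k) + 1)) / 2**n
p_value = min(1.0, 2 * tail)

print(f"{k}/{n} studies favor the intervention: proportion {p:.2f} "
      f"(95% CI {center - half:.2f} to {center + half:.2f}), sign test P = {p_value:.2f}")
```

For these invented data the output is 8/10 studies, proportion 0.80 (95% CI 0.49 to 0.94), P = 0.11, a reminder that an impressive-sounding tally may still be compatible with chance.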

AMSTAR-2 and ROBIS standards do not explicitly address conduct of syntheses without meta-analysis, although AMSTAR-2 items 13 and 14 might be considered relevant. Guidance for the complete reporting of syntheses without meta-analysis for systematic reviews of interventions is available in the Synthesis without Meta-analysis (SWiM) guideline [ 180 ] and methodological guidance is available in the Cochrane Handbook [ 160 , 181 ].

Familiarity with AMSTAR-2 and ROBIS makes sense for authors of systematic reviews as these appraisal tools will be used to judge their work; however, training is necessary for authors to truly appreciate and apply methodological rigor. Moreover, judgment of the potential contribution of a systematic review to the current knowledge base goes beyond meeting the standards of AMSTAR-2 and ROBIS. These tools do not explicitly address some crucial concepts involved in the development of a systematic review; this further emphasizes the need for author training.

We recommend that systematic review authors incorporate specific practices or exercises when formulating a research question at the protocol stage. These should be designed to raise the review team’s awareness of how to prevent research and resource waste [ 84 , 130 ] and to stimulate careful contemplation of the scope of the review [ 30 ]. Authors’ training should also focus on justifiably choosing a formal method for the synthesis of quantitative and/or qualitative data from primary research; both types of data require specific expertise. For typical reviews that involve syntheses of quantitative data, statistical expertise is necessary, initially for decisions about appropriate methods [ 160 , 161 ], and then to inform any meta-analyses [ 167 ] or other statistical methods applied [ 160 ].

Part 5. Rating overall certainty of evidence

Report of an overall certainty of evidence assessment in a systematic review is an important new reporting standard of the updated PRISMA 2020 guidelines [ 93 ]. Systematic review authors are well acquainted with assessing RoB in individual primary studies, but much less familiar with assessment of overall certainty across an entire body of evidence. Yet a reliable way to evaluate this broader concept is now recognized as a vital part of interpreting the evidence.

Historical systems for rating evidence are based on study design and usually involve hierarchical levels or classes of evidence that use numbers and/or letters to designate the level/class. These systems were endorsed by various EBM-related organizations. Professional societies and regulatory groups then widely adopted them, often with modifications for application to the available primary research base in specific clinical areas. In 2002, a report issued by the AHRQ identified 40 systems to rate quality of a body of evidence [ 182 ]. A critical appraisal of systems used by prominent health care organizations published in 2004 revealed limitations in sensibility, reproducibility, applicability to different questions, and usability to different end users [ 183 ]. Persistent use of hierarchical rating schemes to describe overall quality continues to complicate the interpretation of evidence. This is indicated by recent reports of poor interpretability of systematic review results by readers [ 184 – 186 ] and misleading interpretations of the evidence related to the “spin” systematic review authors may put on their conclusions [ 50 , 187 ].

Recognition of the shortcomings of hierarchical rating systems raised concerns that misleading clinical recommendations could result even if based on a rigorous systematic review. In addition, the number and variability of these systems were considered obstacles to quick and accurate interpretations of the evidence by clinicians, patients, and policymakers [ 183 ]. These issues contributed to the development of the GRADE approach. An international working group, which continues to actively evaluate and refine it, first introduced GRADE in 2004 [ 188 ]. Currently more than 110 organizations from 19 countries around the world have endorsed or are using GRADE [ 189 ].

GRADE approach to rating overall certainty

GRADE offers a consistent and sensible approach for two separate processes: rating the overall certainty of a body of evidence and the strength of recommendations. The former is the expected conclusion of a systematic review, while the latter is pertinent to the development of CPGs. As such, GRADE provides a mechanism to bridge the gap from evidence synthesis to application of the evidence for informed clinical decision-making [ 27 , 190 ]. We briefly examine the GRADE approach but only as it applies to rating overall certainty of evidence in systematic reviews.

In GRADE, use of “certainty” of a body of evidence is preferred over the term “quality.” [ 191 ] Certainty refers to the level of confidence systematic review authors have that, for each outcome, an effect estimate represents the true effect. The GRADE approach to rating confidence in estimates begins with identifying the study type (RCT or NRSI) and then systematically considers criteria to rate the certainty of evidence up or down (Table 5.1 ).

GRADE criteria for rating certainty of evidence

| Criteria for rating certainty down a | Criteria for rating certainty up b |
| --- | --- |
| Risk of bias [ ] | Large magnitude of effect |
| Imprecision [ ] | Dose–response gradient |
| Inconsistency [ ] | All residual confounding would decrease magnitude of effect (in situations with an effect) |
| Indirectness [ ] | |
| Publication bias [ ] | |

a Applies to randomized studies

b Applies to non-randomized studies

This process results in assignment of one of the four GRADE certainty ratings to each outcome; these are clearly conveyed with the use of basic interpretation symbols (Table 5.2 ) [ 192 ]. Notably, when multiple outcomes are reported in a systematic review, each outcome is assigned a unique certainty rating; thus different levels of certainty may exist in the body of evidence being examined.

GRADE certainty ratings and their interpretation symbols a

⊕⊕⊕⊕ High: We are very confident that the true effect lies close to that of the estimate of the effect
⊕⊕⊕ Moderate: We are moderately confident in the effect estimate: the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different
⊕⊕ Low: Our confidence in the effect estimate is limited: the true effect may be substantially different from the estimate of the effect
⊕ Very low: We have very little confidence in the effect estimate: the true effect is likely to be substantially different from the estimate of effect

a From the GRADE Handbook [ 192 ]

GRADE’s developers acknowledge some subjectivity is involved in this process [ 193 ]. In addition, they emphasize that both the criteria for rating evidence up and down (Table 5.1 ) as well as the four overall certainty ratings (Table 5.2 ) reflect a continuum as opposed to discrete categories [ 194 ]. Consequently, deciding whether a study falls above or below the threshold for rating up or down may not be straightforward, and preliminary overall certainty ratings may be intermediate (eg, between low and moderate). Thus, the proper application of GRADE requires systematic review authors to take an overall view of the body of evidence and explicitly describe the rationale for their final ratings.
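The skeleton of this process can be shown in a few lines of code. The sketch below is a deliberate oversimplification for orientation only: it treats each criterion in Table 5.1 as a whole-number step, whereas real GRADE assessments are structured judgments along a continuum made by trained assessors.

```python
LABELS = {4: "High", 3: "Moderate", 2: "Low", 1: "Very low"}

def grade_certainty(study_type, down, up=0):
    """Toy GRADE logic: bodies of RCT evidence start at high certainty,
    NRSI at low; each criterion may rate down (or, chiefly for NRSI,
    up) by one or two levels; the result is clamped to the four ratings."""
    start = 4 if study_type == "RCT" else 2
    return LABELS[max(1, min(4, start - down + up))]

# RCT evidence rated down one level each for risk of bias and imprecision
print(grade_certainty("RCT", down=2))            # -> Low
# NRSI evidence rated up one level for a large magnitude of effect
print(grade_certainty("NRSI", down=0, up=1))     # -> Moderate
```

The code captures only the bookkeeping; the substance of GRADE lies in the explicitly documented judgments behind each step.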

Advantages of GRADE

Outcomes important to the individuals who experience the problem of interest maintain a prominent role throughout the GRADE process [ 191 ]. These outcomes must inform the research questions (eg, PICO [population, intervention, comparator, outcome]) that are specified a priori in a systematic review protocol. Evidence for these outcomes is then investigated and each critical or important outcome is ultimately assigned a certainty of evidence as the end point of the review. Notably, limitations of the included studies have an impact at the outcome level. Ultimately, the certainty ratings for each outcome reported in a systematic review are considered by guideline panels. They use a different process to formulate recommendations that involves assessment of the evidence across outcomes [ 201 ]. It is beyond our scope to describe the GRADE process for formulating recommendations; however, it is critical to understand how these two outcome-centric applications of certainty of evidence in the GRADE framework, one supporting the conclusions of a systematic review and one supporting the recommendations of a CPG, are related and distinguished. An in-depth illustration using examples from recently published evidence syntheses and CPGs is provided in Additional File 5 A (Table AF5A-1).

The GRADE approach is applicable irrespective of whether the certainty of the primary research evidence is high or very low; in some circumstances, indirect evidence of higher certainty may be considered if direct evidence is unavailable or of low certainty [ 27 ]. In fact, most interventions and outcomes in medicine have low or very low certainty of evidence based on GRADE and there seems to be no major improvement over time [ 202 , 203 ]. This is still a very important (even if sobering) realization for calibrating our understanding of medical evidence. A major appeal of the GRADE approach is that it offers a common framework that enables authors of evidence syntheses to make complex judgments about evidence certainty and to convey these with unambiguous terminology. This prevents some common mistakes made by review authors, including overstating results (or under-reporting harms) [ 187 ] and making recommendations for treatment. This is illustrated in Table AF5A-2 (Additional File 5 A), which compares the concluding statements made about overall certainty in a systematic review with and without application of the GRADE approach.

Theoretically, application of GRADE should improve consistency of judgments about certainty of evidence, both between authors and across systematic reviews. In one empirical evaluation conducted by the GRADE Working Group, interrater reliability of two individual raters assessing certainty of the evidence for a specific outcome increased from ~ 0.3 without using GRADE to ~ 0.7 by using GRADE [ 204 ]. However, others report variable agreement among those experienced in GRADE assessments of evidence certainty [ 190 ]. Like any other tool, GRADE requires training in order to be properly applied. The intricacies of the GRADE approach and the necessary subjectivity involved suggest that improving agreement may require strict rules for its application; alternatively, use of general guidance and consensus among review authors may result in less consistency but provide important information for the end user [ 190 ].
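For readers unfamiliar with these reliability figures, the sketch below shows one common way such agreement is quantified: Cohen's kappa, which corrects the raw agreement between two raters for agreement expected by chance. The ratings are invented, and whether the cited studies used this exact statistic is not specified here.

```python
from collections import Counter

CATEGORIES = ["high", "moderate", "low", "very low"]
rater_a = ["high", "moderate", "low", "low", "very low", "moderate", "low", "high"]
rater_b = ["moderate", "moderate", "low", "low", "low", "moderate", "low", "high"]

n = len(rater_a)
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # raw agreement

# Agreement expected by chance, from each rater's marginal frequencies
freq_a, freq_b = Counter(rater_a), Counter(rater_b)
expected = sum(freq_a[c] * freq_b[c] for c in CATEGORIES) / n**2

kappa = (observed - expected) / (1 - expected)
print(f"Observed agreement {observed:.2f}, expected {expected:.2f}, kappa = {kappa:.2f}")
```

Here the raters agree on 6 of 8 outcomes (0.75), but after removing chance agreement kappa is about 0.64, which is why kappa, not percent agreement, is the conventional yardstick.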

GRADE caveats

Simply invoking “the GRADE approach” does not automatically ensure GRADE methods were employed by authors of a systematic review (or developers of a CPG). Table 5.3 lists the criteria the GRADE working group has established for this purpose. These criteria highlight the specific terminology and methods that apply to rating the certainty of evidence for outcomes reported in a systematic review [ 191 ], which is different from rating overall certainty across outcomes considered in the formulation of recommendations [ 205 ]. Modifications of standard GRADE methods and terminology are discouraged as these may detract from GRADE’s objectives to minimize conceptual confusion and maximize clear communication [ 206 ].

Criteria for using GRADE in a systematic review a

1. The certainty in the evidence (also known as quality of evidence or confidence in the estimates) should be defined consistently with the definitions used by the GRADE Working Group.
2. Explicit consideration should be given to each of the GRADE domains for assessing the certainty in the evidence (although different terminology may be used).
3. The overall certainty in the evidence should be assessed for each important outcome using four or three categories (such as high, moderate, low and/or very low) and definitions for each category that are consistent with the definitions used by the GRADE Working Group.
4. Evidence summaries … should be used as the basis for judgments about the certainty in the evidence.

a Adapted from the GRADE working group [ 206 ]; this list does not contain the additional criteria that apply to the development of a clinical practice guideline

Nevertheless, GRADE is prone to misapplications [ 207 , 208 ], which can distort a systematic review’s conclusions about the certainty of evidence. Systematic review authors without proper GRADE training are likely to misinterpret the terms “quality” and “grade” and to misunderstand the constructs assessed by GRADE versus other appraisal tools. For example, review authors may reference the standard GRADE certainty ratings (Table 5.2 ) to describe evidence for their outcome(s) of interest. However, these ratings are invalidated if authors omit or inadequately perform RoB evaluations of each included primary study. Such deficiencies in RoB assessments are unacceptable but not uncommon, as reported in methodological studies of systematic reviews and overviews [ 104 , 186 , 209 , 210 ]. GRADE ratings are also invalidated if review authors do not formally address and report on the other criteria (Table 5.1 ) necessary for a GRADE certainty rating.

Other caveats pertain to application of a GRADE certainty of evidence rating in various types of evidence syntheses. Current adaptations of GRADE are described in Additional File 5 B and included on Table 6.3 , which is introduced in the next section.

Concise Guide to best practices for evidence syntheses, version 1.0 a

Conduct guidance: Cochrane and/or JBI manuals, according to the type of evidence synthesis b

Completeness of reporting c:
- Protocol: PRISMA-P (all types of evidence syntheses)
- Completed synthesis: PRISMA 2020 (intervention and most other reviews); PRISMA-DTA (diagnostic test accuracy); eMERGe and ENTREQ d (qualitative); PRIOR (overviews of reviews); PRISMA-ScR (scoping reviews)
- Synthesis without meta-analysis: SWiM e (used alongside the reporting guideline applicable to the review type)

Risk of bias assessment of included studies:
- RCTs: Cochrane RoB 2; NRSI: ROBINS-I; other primary research designs g
- Diagnostic test accuracy studies: QUADAS-2
- Prognostic factor studies: QUIPS; prediction model studies: PROBAST
- Qualitative research h: CASP Qualitative Checklist or JBI Critical Appraisal Checklist f
- Prevalence and incidence studies: JBI checklist for studies reporting prevalence data f
- Measurement properties: COSMIN RoB Checklist
- Systematic reviews included in umbrella reviews: AMSTAR-2 or ROBIS
- Scoping reviews: not required i

Certainty of a body of evidence:
- Intervention reviews: GRADE
- Diagnostic test accuracy reviews: GRADE adaptation j
- Prognostic reviews: GRADE adaptation k
- Qualitative reviews: CERQual or ConQual l
- Prevalence and incidence reviews: GRADE adaptation m
- Risk factor reviews: GRADE adaptation n
- Umbrella reviews: GRADE (for intervention reviews); GRADE adaptation for risk factors n
- Measurement properties reviews: GRADE adaptation o
- Scoping reviews: not applicable i
AMSTAR A MeaSurement Tool to Assess Systematic Reviews, CASP Critical Appraisal Skills Programme, CERQual Confidence in the Evidence from Reviews of Qualitative research, ConQual Establishing Confidence in the output of Qualitative research synthesis, COSMIN COnsensus-based Standards for the selection of health Measurement Instruments, DTA diagnostic test accuracy, eMERGe meta-ethnography reporting guidance, ENTREQ enhancing transparency in reporting the synthesis of qualitative research, GRADE Grading of Recommendations Assessment, Development and Evaluation, MA meta-analysis, NRSI non-randomized studies of interventions, P protocol, PRIOR Preferred Reporting Items for Overviews of Reviews, PRISMA Preferred Reporting Items for Systematic Reviews and Meta-Analyses, PROBAST Prediction model Risk Of Bias ASsessment Tool, QUADAS quality assessment of studies of diagnostic accuracy included in systematic reviews, QUIPS Quality In Prognosis Studies, RCT randomized controlled trial, RoB risk of bias, ROBINS-I Risk Of Bias In Non-randomised Studies of Interventions, ROBIS Risk of Bias in Systematic Reviews, ScR scoping review, SWiM Synthesis Without Meta-analysis

a Superscript numbers represent citations provided in the main reference list. Additional File 6 lists links to available online resources for the methods and tools included in the Concise Guide

b The MECIR manual [ 30 ] provides Cochrane’s specific standards for both reporting and conduct of intervention systematic reviews and protocols

c Editorial and peer reviewers can evaluate completeness of reporting in submitted manuscripts using these tools. Authors may be required to submit a self-reported checklist for the applicable tools

d The decision flowchart described by Flemming and colleagues [ 223 ] is recommended for guidance on how to choose the best approach to reporting for qualitative reviews

e SWiM was developed for intervention studies reporting quantitative data. However, if there is not a more directly relevant reporting guideline, SWiM may prompt reviewers to consider the important details to report. (Personal Communication via email, Mhairi Campbell, 14 Dec 2022)

f JBI recommends their own tools for the critical appraisal of various quantitative primary study designs included in systematic reviews of intervention effectiveness, prevalence and incidence, and etiology and risk as well as for the critical appraisal of systematic reviews included in umbrella reviews. However, except for the JBI Checklists for studies reporting prevalence data and qualitative research, the development, validity, and reliability of these tools are not well documented

g Studies that are not RCTs or NRSI require tools developed specifically to evaluate their design features. Examples include single case experimental design [ 155 , 156 ] and case reports and series [ 82 ]

h The evaluation of methodological quality of studies included in a synthesis of qualitative research is debatable [ 224 ]. Authors may select a tool appropriate for the type of qualitative synthesis methodology employed. The CASP Qualitative Checklist [ 218 ] is an example of a published, commonly used tool that focuses on assessment of the methodological strengths and limitations of qualitative studies. The JBI Critical Appraisal Checklist for Qualitative Research [ 219 ] is recommended for reviews using a meta-aggregative approach

i Consider including risk of bias assessment of included studies if this information is relevant to the research question; however, scoping reviews do not include an assessment of the overall certainty of a body of evidence

j Guidance available from the GRADE working group [ 225 , 226 ]; also recommend consultation with the Cochrane diagnostic methods group

k Guidance available from the GRADE working group [ 227 ]; also recommend consultation with Cochrane prognostic methods group

l Used for syntheses in reviews with a meta-aggregative approach [ 224 ]

m Chapter 5 in the JBI Manual offers guidance on how to adapt GRADE to prevalence and incidence reviews [ 69 ]

n Janiaud and colleagues suggest criteria for evaluating evidence certainty for meta-analyses of non-randomized studies evaluating risk factors [ 228 ]

o The COSMIN user manual provides details on how to apply GRADE in systematic reviews of measurement properties [ 229 ]

The expected culmination of a systematic review should be a rating of overall certainty of a body of evidence for each outcome reported. The GRADE approach is recommended for making these judgments for outcomes reported in systematic reviews of interventions and can be adapted for other types of reviews. This represents the initial step in the process of making recommendations based on evidence syntheses. Peer reviewers should ensure authors meet the minimal criteria for supporting the GRADE approach when reviewing any evidence synthesis that reports certainty ratings derived using GRADE. Authors and peer reviewers of evidence syntheses unfamiliar with GRADE are encouraged to seek formal training and take advantage of the resources available on the GRADE website [ 211 , 212 ].

Part 6. Concise Guide to best practices

Accumulating data in recent years suggest that many evidence syntheses (with or without meta-analysis) are not reliable. This relates in part to the fact that their authors, who are often clinicians, can be overwhelmed by the plethora of ways to evaluate evidence. They tend to resort to familiar but often inadequate, inappropriate, or obsolete methods and tools and, as a result, produce unreliable reviews. These manuscripts may not be recognized as such by peer reviewers and journal editors who may disregard current standards. When such a systematic review is published or included in a CPG, clinicians and stakeholders tend to believe that it is trustworthy. A vicious cycle in which inadequate methodology is rewarded and potentially misleading conclusions are accepted is thus supported. There is no quick or easy way to break this cycle; however, increasing awareness of best practices among all these stakeholder groups, who often have minimal (if any) training in methodology, may begin to mitigate it. This is the rationale for inclusion of Parts 2 through 5 in this guidance document. These sections present core concepts and important methodological developments that inform current standards and recommendations. We conclude by taking a direct and practical approach.

Inconsistent and imprecise terminology used in the context of development and evaluation of evidence syntheses is problematic for authors, peer reviewers and editors, and may lead to the application of inappropriate methods and tools. In response, we endorse use of the basic terms (Table 6.1 ) defined in the PRISMA 2020 statement [ 93 ]. In addition, we have identified several problematic expressions and nomenclature. In Table 6.2 , we compile suggestions for preferred terms less likely to be misinterpreted.

Terms relevant to the reporting of health care–related evidence syntheses a

Systematic review: A review that uses explicit, systematic methods to collate and synthesize findings of studies that address a clearly formulated question.
Statistical synthesis: The combination of quantitative results of two or more studies. This encompasses meta-analysis of effect estimates and other methods, such as combining P values, calculating the range and distribution of observed effects, and vote counting based on the direction of effect.
Meta-analysis of effect estimates: A statistical technique used to synthesize results when study effect estimates and their variances are available, yielding a quantitative summary of results.
Outcome: An event or measurement collected for participants in a study (such as quality of life, mortality).
Effect estimate: The combination of a point estimate (such as a mean difference, risk ratio or proportion) and a measure of its precision (such as a confidence/credible interval) for a particular outcome.
Report: A document (paper or electronic) supplying information about a particular study. It could be a journal article, preprint, conference abstract, study register entry, clinical study report, dissertation, unpublished manuscript, government report, or any other document providing relevant information.
Record: The title or abstract (or both) of a report indexed in a database or website (such as a title or abstract for an article indexed in Medline). Records that refer to the same report (such as the same journal article) are “duplicates”; however, records that refer to reports that are merely similar (such as a similar abstract submitted to two different conferences) should be considered unique.
Study: An investigation, such as a clinical trial, that includes a defined group of participants and one or more interventions and outcomes. A “study” might have multiple reports. For example, reports could include the protocol, statistical analysis plan, baseline characteristics, results for the primary outcome, results for harms, results for secondary outcomes, and results for additional mediator and moderator analyses.

a Reproduced from Page and colleagues [ 93 ]

Terminology suggestions for health care–related evidence syntheses

| Preferred | Potentially problematic |
| --- | --- |
| Evidence synthesis with meta-analysis; systematic review with meta-analysis | Meta-analysis |
| Overview or umbrella review | Systematic review of systematic reviews; review of reviews; meta-review |
| Randomized | Experimental |
| Non-randomized | Observational |
| Single case experimental design | Single-subject research; N-of-1 design |
| Case report or case series | Descriptive study |
| Methodological quality | Quality |
| Certainty of evidence | Quality of evidence; grade of evidence; level of evidence; strength of evidence |
| Qualitative systematic review | Qualitative synthesis |
| Synthesis of qualitative data a | Qualitative synthesis |
| Synthesis without meta-analysis | Narrative synthesis b; narrative summary; qualitative synthesis; descriptive synthesis; descriptive summary |

a For example, meta-aggregation, meta-ethnography, critical interpretative synthesis, realist synthesis

b This term may best apply to the synthesis in a mixed methods systematic review in which data from different types of evidence (eg, qualitative, quantitative, economic) are summarized [ 64 ]

We also propose a Concise Guide (Table 6.3 ) that summarizes the methods and tools recommended for the development and evaluation of nine types of evidence syntheses. Suggestions for specific tools are based on the rigor of their development as well as the availability of detailed guidance from their developers to ensure their proper application. The formatting of the Concise Guide addresses a well-known source of confusion by clearly distinguishing the underlying methodological constructs that these tools were designed to assess. Important clarifications and explanations follow in the guide’s footnotes; associated websites, if available, are listed in Additional File 6 .

To encourage uptake of best practices, journal editors may consider adopting or adapting the Concise Guide in their instructions to authors and peer reviewers of evidence syntheses. Given the evolving nature of evidence synthesis methodology, the suggested methods and tools are likely to require regular updates. Authors of evidence syntheses should monitor the literature to ensure they are employing current methods and tools. Some types of evidence syntheses (eg, rapid, economic, methodological) are not included in the Concise Guide; for these, authors are advised to obtain recommendations for acceptable methods by consulting with their target journal.

We encourage the appropriate and informed use of the methods and tools discussed throughout this commentary and summarized in the Concise Guide (Table 6.3 ). However, we caution against their application in a perfunctory or superficial fashion. This is a common pitfall among authors of evidence syntheses, especially as the standards of such tools become associated with acceptance of a manuscript by a journal. Consequently, published evidence syntheses may show improved adherence to the requirements of these tools without necessarily making genuine improvements in their performance.

In line with our main objective, the suggested tools in the Concise Guide address the reliability of evidence syntheses; however, we recognize that the utility of systematic reviews is an equally important concern. An unbiased and thoroughly reported evidence synthesis may still not be highly informative if the evidence itself that is summarized is sparse, weak and/or biased [ 24 ]. Many intervention systematic reviews, including those developed by Cochrane [ 203 ] and those applying GRADE [ 202 ], ultimately find no evidence, or find the evidence to be inconclusive (eg, “weak,” “mixed,” or of “low certainty”). This often reflects the primary research base; however, it is important to know what is known (or not known) about a topic when considering an intervention for patients and discussing treatment options with them.

Alternatively, the frequency of “empty” and inconclusive reviews published in the medical literature may relate to limitations of conventional methods that focus on hypothesis testing; these have emphasized the importance of statistical significance in primary research and effect sizes from aggregate meta-analyses [ 183 ]. It is becoming increasingly apparent that this approach may not be appropriate for all topics [ 130 ]. Development of the GRADE approach has facilitated a better understanding of significant factors (beyond effect size) that contribute to the overall certainty of evidence. Other notable responses include the development of integrative synthesis methods for the evaluation of complex interventions [ 230 , 231 ], the incorporation of crowdsourcing and machine learning into systematic review workflows (eg, the Cochrane Evidence Pipeline) [ 2 ], the shift in paradigm to living systematic review and NMA platforms [ 232 , 233 ], and the proposal of a new evidence ecosystem that fosters bidirectional collaborations and interactions among a global network of evidence synthesis stakeholders [ 234 ]. These evolutions in data sources and methods may ultimately make evidence syntheses more streamlined, less duplicative, and, more importantly, more useful for timely policy and clinical decision-making; however, that will only be the case if they are rigorously conducted and reported.

We look forward to others’ ideas and proposals for the advancement of methods for evidence syntheses. For now, we encourage dissemination and uptake of the currently accepted best tools and practices for their development and evaluation; at the same time, we stress that uptake of appraisal tools, checklists, and software programs cannot substitute for proper education in the methodology of evidence syntheses and meta-analysis. Authors, peer reviewers, and editors must strive to make accurate and reliable contributions to the present evidence knowledge base; online alerts, upcoming technology, and accessible education may make this more feasible than ever before. Our intention is to improve the trustworthiness of evidence syntheses across disciplines, topics, and types of evidence syntheses. All of us must continue to study, teach, and act cooperatively for that to happen.

Acknowledgements

Michelle Oakman Hayes for her assistance with the graphics, Mike Clarke for his willingness to answer our seemingly arbitrary questions, and Bernard Dan for his encouragement of this project.

Authors’ contributions

All authors participated in the development of the ideas, writing, and review of this manuscript. The author(s) read and approved the final manuscript.

The work of John Ioannidis has been supported by an unrestricted gift from Sue and Bob O’Donnell to Stanford University.

Declarations

The authors declare no competing interests.

This article has been published simultaneously in BMC Systematic Reviews, Acta Anaesthesiologica Scandinavica, BMC Infectious Diseases, British Journal of Pharmacology, JBI Evidence Synthesis, the Journal of Bone and Joint Surgery Reviews, and the Journal of Pediatric Rehabilitation Medicine.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Duke University Libraries

Systematic Reviews for Non-Health Sciences

  • Getting started
  • Types of reviews
  • 0. Planning the systematic review
  • 1. Formulating the research question

  • Formulating a research question
  • Purpose of a framework
  • Selecting a framework

  • 2. Developing the protocol
  • 3. Searching, screening, and selection of articles
  • 4. Critical appraisal
  • 5. Writing and publishing
  • Guidelines & standards
  • Software and tools
  • Software tutorials
  • Resources by discipline
  • Duke Med Center Library: Systematic reviews
  • Overwhelmed? General literature review guidance


Formulating a question

Formulating a strong research question for a systematic review can be a lengthy process. While you may have an idea about the topic you want to explore, your specific research question is what will drive your review and requires some consideration. 

You will want to conduct preliminary or exploratory searches of the literature as you refine your question. In these searches you will want to:

  • Determine if a systematic review has already been conducted on your topic and if so, how yours might be different, or how you might shift or narrow your anticipated focus
  • Scope the literature to determine if there is enough literature on your topic to conduct a systematic review
  • Identify key concepts and terminology
  • Identify seminal or landmark studies
  • Identify key studies that you can test your search strategy against (more on that later)
  • Begin to identify databases that might be useful to your search question

Systematic review vs. other reviews

Systematic reviews require a narrow and specific research question. The goal of a systematic review is to provide an evidence synthesis of ALL research performed on one particular topic. So, your research question should be clearly answerable from the data you gather from the studies included in your review.

Ask yourself if your question even warrants a systematic review (has it been answered before?). If your question is broader in scope, or you aren't sure whether it has been answered, you might look into performing a systematic map or scoping review instead.

Learn more about systematic reviews versus scoping reviews:

  • CEE. (2022). Section 2:Identifying the need for evidence, determining the evidence synthesis type, and establishing a Review Team. Collaboration for Environmental Evidence.  https://environmentalevidence.org/information-for-authors/2-need-for-evidence-synthesis-type-and-review-team-2/
  • DistillerSR. (2022). The difference between systematic reviews and scoping reviews. DistillerSR.  https://www.distillersr.com/resources/systematic-literature-reviews/the-difference-between-systematic-reviews-and-scoping-reviews
  • Nalen, CZ. (2022). What is a scoping review? AJE.  https://www.aje.com/arc/what-is-a-scoping-review/

A well-formulated research question will:

  • Frame your entire research process
  • Determine the scope of your review
  • Provide a focus for your searches
  • Help you identify key concepts
  • Guide the selection of your papers

There are different frameworks you can use to help structure a question.


The PICO or PECO framework is typically used in clinical and health sciences-related research, but it can also be adapted for other quantitative research.

P — Patient / Problem / Population

I / E — Intervention / Indicator / phenomenon of Interest / Exposure / Event 

C  — Comparison / Context / Control

O — Outcome

Example topic: Health impact of hazardous waste exposure

  • Population: People living near hazardous waste sites
  • Exposure: Exposure to hazardous waste
  • Comparators: All comparators
  • Outcomes: All diseases/health disorders

Fazzo, L., Minichilli, F., Santoro, M., Ceccarini, A., Della Seta, M., Bianchi, F., Comba, P., & Martuzzi, M. (2017). Hazardous waste and health impact: A systematic review of the scientific literature.  Environmental Health ,  16 (1), 107.  https://doi.org/10.1186/s12940-017-0311-8

The SPICE framework is useful for both qualitative and mixed-method research. It is often used in the social sciences.

S — Setting (where?)

P — Perspective (for whom?)

I — Intervention / Exposure (what?)

C — Comparison (compared with what?)

E — Evaluation (with what result?)

Learn more : Booth, A. (2006). Clear and present questions: Formulating questions for evidence based practice.  Library Hi Tech ,  24 (3), 355-368.  https://doi.org/10.1108/07378830610692127

The SPIDER framework is useful for both qualitative and mixed-method research. It is most often used in health sciences research.

S — Sample

PI — Phenomenon of Interest

D — Design

E — Evaluation

R — Research type

Learn more : Cooke, A., Smith, D., & Booth, A. (2012). Beyond PICO: The SPIDER tool for qualitative evidence synthesis.  Qualitative Health Research, 22 (10), 1435-1443.  https://doi.org/10.1177/1049732312452938

The CIMO framework is used to understand complex social and organizational phenomena, most useful for management and business research.

C — Context (the social and organizational setting of the phenomenon)

I  — Intervention (the actions taken to address/influence the phenomenon)

M — Mechanisms (the underlying processes or mechanisms that drive change within the phenomenon)

O — Outcomes (the resulting changes that occur due to intervention/mechanisms)

Learn more : Denyer, D., Tranfield, D., & van Aken, J. E. (2008). Developing design propositions through research synthesis. Organization Studies, 29 (3), 393-413. https://doi.org/10.1177/0170840607088020

An exhaustive list of research question frameworks is available from the University of Maryland Libraries.

You might find that your topic does not always fall into one of the models listed on this page. You can always modify a model to make it work for your topic, and either remove or incorporate additional elements. Be sure to document in your review the established framework that yours is based on and how it has been modified.

Source: https://guides.library.duke.edu/systematicreviews



How to Do a Systematic Review: A Best Practice Guide for Conducting and Reporting Narrative Reviews, Meta-Analyses, and Meta-Syntheses

Affiliations

  • 1 Behavioural Science Centre, Stirling Management School, University of Stirling, Stirling FK9 4LA, United Kingdom; email: [email protected].
  • 2 Department of Psychological and Behavioural Science, London School of Economics and Political Science, London WC2A 2AE, United Kingdom.
  • 3 Department of Statistics, Northwestern University, Evanston, Illinois 60208, USA; email: [email protected].
  • PMID: 30089228
  • DOI: 10.1146/annurev-psych-010418-102803

Systematic reviews are characterized by a methodical and replicable methodology and presentation. They involve a comprehensive search to locate all relevant published and unpublished work on a subject; a systematic integration of search results; and a critique of the extent, nature, and quality of evidence in relation to a particular research question. The best reviews synthesize studies to draw broad theoretical conclusions about what a literature means, linking theory to evidence and evidence to theory. This guide describes how to plan, conduct, organize, and present a systematic review of quantitative (meta-analysis) or qualitative (narrative review, meta-synthesis) information. We outline core standards and principles and describe commonly encountered problems. Although this guide targets psychological scientists, its high level of abstraction makes it potentially relevant to any subject area or discipline. We argue that systematic reviews are a key methodology for clarifying whether and how research findings replicate and for explaining possible inconsistencies, and we call for researchers to conduct systematic reviews to help elucidate whether there is a replication crisis.

Keywords: evidence; guide; meta-analysis; meta-synthesis; narrative; systematic review; theory.


Systematic Reviews

  • Developing a Research Question
  • Developing a Protocol
  • Literature Searching
  • Screening References
  • Data Extraction
  • Quality Assessment
  • Reporting Results
  • Related Guides
  • Getting Help

Developing A Research Question

There are several different methods researchers might use in developing a research question. The best method to use depends on the discipline and nature of the research you hope to review. Consider the following example question templates.

PICO

Using PICO can help you define and narrow your research question so that it is specific.

  • P  - Patient, population, or problem
  • I   - Intervention
  • C - Comparison or Control
  • O - Outcome

Think about whether your question is relevant to practitioners, and whether the answer will help people (doctors, patients, nurses) make better informed health care decisions.


Variations to PICO

The PICO method is used frequently, though there are some variations that exist to add other specifications to the studies collected. Some variations include PICOSS, PICOT, and PICOC.

  • PICOSS: in addition to the fundamental components of PICO, additional criteria are set for study design (S) and setting (S).
  • PICOT: (T), in this instance, represents timeframe. This method could be used to narrow down the length of treatment or intervention in health research.
  • PICOC: in research where there may not be a comparison, Co instead denotes the context of the population and intervention being studied.

Using SPIDER can help you define and narrow your research question so that it is specific. This is typically used in qualitative research (Cooke, Smith, & Booth, 2012).

  • S - Sample
  • PI - Phenomenon of Interest
  • D - Design
  • E - Evaluation
  • R - Research type

Yet another search measure relating to Evidence-Based Practice (EBP) is SPICE. This framework builds on PICO by considering two additional axes: perspective and setting (Booth, 2006).

  • S - Setting
  • P - Perspective
  • I - Intervention
  • C - Comparison
  • E - Evaluation

Inclusion and Exclusion Criteria

Setting inclusion and exclusion criteria is a critical step in the systematic review process.

  • Inclusion criteria determine what characteristics are needed for a study to be included in a systematic review.
  • Exclusion criteria denote what attributes disqualify a study from consideration in a systematic review.
  • Knowing what to exclude or include helps speed up the review process.

These criteria will be used at different parts of the review process, including in search statements and the screening process.
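To make this concrete, here is a minimal illustrative sketch (not part of any guide quoted here) of how explicit inclusion and exclusion criteria can be expressed as a first-pass screening filter. The record fields and criteria are hypothetical; in practice, screening decisions are made by at least two independent reviewers.

```python
# Illustrative only: hypothetical record fields and criteria.
records = [
    {"title": "CBT for depression in older adults", "year": 2010,
     "language": "en", "design": "RCT"},
    {"title": "Case report: depression after stroke", "year": 1998,
     "language": "en", "design": "case report"},
]

INCLUDED_DESIGNS = {"RCT", "quasi-experimental"}  # inclusion criterion
MIN_YEAR = 2002                                   # exclusion boundary: too old
LANGUAGES = {"en"}                                # exclusion boundary: language

def passes_screening(record):
    """Return True when a record meets every inclusion criterion."""
    return (record["design"] in INCLUDED_DESIGNS
            and record["year"] >= MIN_YEAR
            and record["language"] in LANGUAGES)

included = [r for r in records if passes_screening(r)]
print([r["title"] for r in included])  # -> ['CBT for depression in older adults']
```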

Has this review been done?

After developing the research question, it is necessary to confirm that the review has not previously been conducted (or is currently in progress).

Make sure to check for both published reviews and registered protocols (to see if the review is in progress). Do a thorough search of appropriate databases; if additional help is needed,  consult a librarian  for suggestions.
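As an illustration only, this check can be piloted programmatically against PubMed using NCBI's public E-utilities API; the query below is a made-up example, and registered protocols (for example, on PROSPERO) must still be checked separately.

```python
# Illustrative only: queries NCBI's public E-utilities (esearch) endpoint to
# count PubMed-indexed systematic reviews on a made-up topic.
import requests

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
params = {
    "db": "pubmed",
    "term": '(cognitive behavioral therapy) AND (older adults) '
            'AND ("systematic review"[Publication Type])',
    "retmode": "json",
    "retmax": 20,
}

response = requests.get(ESEARCH_URL, params=params, timeout=30)
result = response.json()["esearchresult"]
print(f"Candidate existing reviews: {result['count']}")
print(result["idlist"])  # PubMed IDs to inspect before committing to a new review
```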

Source: https://guides.library.duq.edu/systematicreviews

Systematic reviews

  • Introduction to systematic reviews
  • Steps in a systematic review

  • Formulating a clear and concise question
  • PICO framework
  • Other search frameworks

  • Create a protocol (plan)
  • Sources to search
  • Conduct a thorough search
  • Post search phase
  • Select studies (screening)
  • Appraise the quality of the studies
  • Extract data, synthesise and analyse
  • Interpret results and write
  • Guides and manuals
  • Training and support

General principles

"A good systematic review is based on a well formulated, answerable question. The question guides the review by defining which studies will be included, what the search strategy to identify the relevant primary studies should be, and which data need to be extracted from each study."

A systematic review question needs to be clear, focused, and answerable.

You may find it helpful to use a search framework, such as those listed below, to help you to refine your research question, but it is not mandatory. Similarly, you may not always need to use every aspect of the framework in order to build a workable research question.

Counsell C. Formulating questions and locating primary studies for inclusion in systematic reviews . Ann Intern Med. 1997;127(5):380–387.

The PICO tool was created to help formulate a focussed research question. PICO is a mnemonic for Population, Intervention, Comparison, and Outcome. These elements help define the core components of the question, which will then be used in the literature search.

The elements of PICO

Population:

Who or what is the topic of interest? In the health sciences this may be a disease or a condition; in the social sciences it may be a social group with a particular need.

Intervention:

The intervention is the effect or the change upon the population in question. In the health sciences, this could be a treatment, such as a drug, a procedure, or a preventative activity. Depending on the discipline the intervention could be a social policy, education, ban, or legislation.

Comparison:

The comparison is an alternative against which the intervention is measured; if the intervention were a drug, the comparison might be a similar drug whose effectiveness is compared. Sometimes the comparator is a placebo, or there is no comparison at all.

Outcome:

The outcomes in PICO represent the outcomes of interest for the research question. The outcome measures will vary according to the question but will provide the data against which the effectiveness of the intervention is measured.

  • Examples of using the PICO framework (PDF, 173KB) This document contains worked examples of how to use the PICO search framework as well as other frameworks based on PICO.

Not all systematic review questions are well served by the PICO mnemonic, and a number of other models have been created. These include ECLIPSE (Wildridge & Bell, 2002), SPICE (Booth, 2004), and SPIDER (Cooke, Smith, & Booth, 2012).

ECLIPSE - used to define search terms for health & social care management, services or policies.

  • Expectation
  • Client group
  • Location
  • Impact
  • Professionals
  • Service

Wildridge V, Bell L. How CLIP became ECLIPSE: a mnemonic to assist in searching for health policy/ management information . Health Info Libr J. 2002;19(2):113–115.

SPICE - useful when examining a service, intervention, or policy.

  • Setting
  • Population/Perspective
  • Intervention
  • Comparison
  • Evaluation

Booth A.  Clear and present questions: formulating questions for evidence based practice . Library Hi Tech. 2006;24(3):355-368.

SPIDER - useful when searching for qualitative or mixed methods studies.

  • Sample
  • Phenomenon of Interest
  • Design
  • Evaluation
  • Research type

Cooke A, Smith D, Booth A. Beyond PICO: the SPIDER tool for qualitative evidence synthesis. Qual Health Res. 2012;22(10):1435–1443.

Remember: you do not have to use a search framework but it can help you to focus your research question and identify the key concepts and terms that you can use in your search. Similarly, you may not need to use all of the elements in your chosen framework, only the ones that are useful for your individual research question.

  • Using the SPIDER search framework (PDF, 134 KB) This document shows how you can use the SPIDER framework to guide your search.
  • Using the SPICE search framework (PDF, 134 KB) This document shows how you can use the SPICE framework to guide your search.
  • Using the ECLIPSE search framework (PDF, 145 KB) This document shows how you can use the ECLIPSE framework to guide your search.
Source: https://guides.library.uq.edu.au/research-techniques/systematic-reviews

Library Guides

Systematic Reviews

  • Introduction to Systematic Reviews
  • Systematic review
  • Systematic literature review
  • Scoping review
  • Rapid evidence assessment / review
  • Evidence and gap mapping exercise
  • Meta-analysis
  • Systematic Reviews in Science and Engineering
  • Timescales and processes
  • Question frameworks (e.g PICO)
  • Inclusion and exclusion criteria
  • Using grey literature
  • Search Strategy
  • Subject heading searching (e.g MeSH)
  • Database video & help guides
  • Documenting your search and results
  • Data management
  • How the library can help
  • Systematic reviews A to Z


Using a framework to structure your research question

Your systematic review or systematic literature review will be defined by your research question. A well formulated question will help:

  • Frame your entire research process
  • Determine the scope of your review
  • Provide a focus for your searches
  • Help you identify key concepts
  • Guide the selection of your papers

There are different models you can use to help structure a question, which will also help with searching.

Selecting a framework


PICO

A model commonly used for clinical and healthcare-related questions; often, although not exclusively, used for searching for quantitatively designed studies.

Example question: Does handwashing reduce hospital acquired infections in elderly people?

  • Population: any characteristics that define your patient or population group. (Elderly people)
  • Intervention: what do you want to do with the patient or population? (Handwashing)
  • Comparison (if relevant): what are the alternatives to the main intervention? (No handwashing)
  • Outcome: any specific outcomes or effects of your intervention. (Reduced infection)

Richardson, W.S., Wilson, M.C, Nishikawa, J. and Hayward, R.S.A. (1995) 'The well-built clinical question: a key to evidence-based decisions.' ACP Journal Club , 123(3) pp. A12

PEO is useful for qualitative research questions.

Example question:  How does substance dependence addiction play a role in homelessness?

  • Population: who are the users (patients, family, practitioners or community) being affected? What are the symptoms, condition, health status, age, gender, ethnicity? What is the setting, e.g. acute care, community, mental health? (Homeless persons)
  • Exposure: exposure to a condition or illness, a risk factor (e.g. smoking), screening, rehabilitation, service etc. (Drug and alcohol addiction services)
  • Outcome: experiences, attitudes, feelings, improvement in condition, mobility, responsiveness to treatment, care, quality of life or daily living. (Reduced homelessness)

Moola S, Munn Z, Sears K, Sfetcu R, Currie M, Lisy K, Tufanaru C, Qureshi R, Mattis P & Mu P. (2015) 'Conducting systematic reviews of association (etiology): The Joanna Briggs Institute's approach'. International Journal of Evidence-Based Healthcare, 13(3), pp. 163-9. Available at: https://doi.org/10.1097/XEB.0000000000000064.

PCC is useful for both qualitative and quantitative (mixed methods) topics, and is commonly used in scoping reviews.

Example question:    “What patient-led models of care are used to manage chronic disease in high income countries?"

Population "Important characteristics of participants, including age and other qualifying criteria.  You may not need to include this element unless your question focuses on a specific condition or cohort." N/A.  As our example considers chronic diseases broadly, not a specific condition/population - such as women with chronic obstructive pulmonary disorder.
Concept

"The core concept examined by the scoping review should be clearly articulated to guide the scope and breadth of the inquiry. This may include details that pertain to elements that would be detailed in a standard systematic review, such as the "interventions" and/or "phenomena of interest" and/or "outcomes".

Chronic disease

Patient-led care models

Peters MDJ, Godfrey C, McInerney P, Munn Z, Tricco AC, Khalil, H. Chapter 11: Scoping Reviews (2020 version). In: Aromataris E, Munn Z (Editors). JBI Manual for Evidence Synthesis, JBI, 2020. Available from   https://synthesismanual.jbi.global  .    https://doi.org/10.46658/JBIMES-20-12

A model useful for qualitative and mixed method type research questions.

Example question: What are young parents’ experiences of attending antenatal education? (Cooke et al., 2012)

  • Sample: the group you are focusing on. (Young parents)
  • Phenomenon of Interest: the behaviour or experience your research is examining. (Experience of antenatal classes)
  • Design: how will the research be carried out? (Interviews, questionnaires)
  • Evaluation: what are the outcomes you are measuring? (Experiences and views)
  • Research type: what type of research are you undertaking? (Qualitative)

Cooke, A., Smith, D. and Booth, A. (2012) 'Beyond PICO: the SPIDER tool for qualitative evidence synthesis.' Qualitative Health Research , 22(10) pp. 1435-1443

A model useful for qualitative and mixed method type research questions. 

Example question: How effective is mindfulness used as a cognitive therapy in a counseling service in improving the attitudes of patients diagnosed with cancer?

  • Setting: the setting or the context. (Counseling service)
  • Population or perspective: which population or perspective will the research be conducted for/from? (Patients diagnosed with cancer)
  • Intervention: the intervention being studied. (Mindfulness-based cognitive therapy)
  • Comparison: is there a comparison to be made? (No comparison)
  • Evaluation: how well did the intervention work; what were the results? (Assess patients' attitudes to see if the intervention improved their quality of life)

Example question taken from: Tate, KJ., Newbury-Birch, D., and McGeechan, GJ. (2018) ‘A systematic review of qualitative evidence of  cancer patients’ attitudes to mindfulness.’ European Journal of Cancer Care , 27(2) pp. 1 – 10.

A model useful for qualitative and mixed method type research questions, especially for questions examining particular services or professions.

Example question: Cross service communication in supporting adults with learning difficulties

  • Expectation: purpose of the study; what are you trying to achieve? (How communication can be improved between services to create better care)
  • Client group: which group are you focusing on? (Adults with learning difficulties)
  • Location: where is that group based? (Community)
  • Impact: if your research is looking for service improvement, what is this and how is it being measured? (Better support services for adults with learning difficulties through joined-up, cross-service working)
  • Professionals: what professional staff are involved? (Community nurses, social workers, carers)
  • Service: which service are you focusing on? (Adult support services)

You might find that your topic does not always fall into one of the models listed on this page. You can always modify a model to make it work for your topic, and either remove or incorporate additional elements.

The important thing is to ensure that you have a high quality question that can be separated into its component parts.

Source: https://plymouth.libguides.com/systematicreviews

University of Maryland Libraries

Systematic Review

  • Library Help
  • What is a Systematic Review (SR)?
  • Steps of a Systematic Review
  • Framing a Research Question
  • Developing a Search Strategy
  • Searching the Literature
  • Managing the Process
  • Meta-analysis
  • Publishing your Systematic Review

Developing a Research Question


There are many ways of framing questions depending on the topic, discipline, or type of questions.

Try to generate a few options for your initial research topic and narrow it down to a specific population, geographical location, disease, etc. You may also explore a similar tool to identify additional search terms.

Several frameworks are listed in the table below.

Source: Foster, M. & Jewell, S. (Eds). (2017). Assembling the pieces of a systematic review: A guide for librarians. Medical Library Association, Lanham: Rowman & Littlefield. p. 38, Table 3.


Frameworks for research questions

  • BeHEMoTh (questions about theories) — Be: behavior of interest; H: health context (service/policy/intervention); E: exclusions; MoTh: models or theories. Booth, A., & Carroll, C. (2015). (3), 220–235. https://doi.org/10.1111/hir.12108
  • CHIP (psychology, qualitative) — Context; How; Issues; Population. Shaw, R. (2010). In M. A. Forester (Ed.), (pp. 39–52). London: Sage.
  • CIMO (management, business, administration) — Context; Intervention; Mechanisms; Outcomes. In D. A. Buchanan & A. Bryman (Eds.), (pp. 671–689). Thousand Oaks, CA: Sage Publications Ltd.
  • CLIP (librarianship, management, policy) — Client group; Location of provided service; Improvement/Information/Innovation; Professionals (who provides the service?). Wildridge, V., & Bell, L. (2002). (2), 113–115. https://doi.org/10.1046/j.1471-1842.2002.00378.x
  • COPES (social work, health care, nursing) — Client-Oriented; Practical; Evidence; Search. Gibbs, L. (2003). Pacific Grove, CA: Brooks/Cole-Thomson Learning.
  • ECLIPSE (management, services, policy, social care) — Expectation; Client; Location; Impact; Professionals; Service. Wildridge, V., & Bell, L. (2002). (2), 113–115. https://doi.org/10.1046/j.1471-1842.2002.00378.x
  • PEO (qualitative) — Population; Exposure; Outcome. Khan, K. S., Kunz, R., Kleijnen, J., & Antes, G. (2003). London: Royal Society of Medicine Press.
  • PECODR (medicine) — Patient/population/problem; Exposure; Comparison; Outcome; Duration; Results. Dawes, M., Pluye, P., Shea, L., Grad, R., Greenberg, A., & Nie, J.-Y. (2007). (1), 9–16.
  • PerSPECTiF (qualitative research) — Perspective; Setting; Phenomenon of interest/Problem; Environment; Comparison (optional); Time/Timing; Findings. Booth, A., Noyes, J., Flemming, K., Moore, G., Tunçalp, Ö., & Shakibazadeh, E. (2019). (Suppl 1).
  • PESICO (augmentative and alternative communication) — Person; Environments; Stakeholders; Intervention; Comparison; Outcome. Schlosser, R. W., & O'Neil-Pirozzi, T. (2006). 5–10.
  • PICO (clinical medicine) — Patient; Intervention; Comparison; Outcome. Richardson, W. S., Wilson, M. C., Nishikawa, J., & Hayward, R. S. (1995). (3), A12.
  • PICO+ (occupational therapy) — Patient; Intervention; Comparison; Outcome; plus context, patient values, and preferences. Bennett, S., & Bennett, J. W. (2000). (4), 171–180.
  • PICOC (social sciences) — Patient; Intervention; Comparison; Outcome; Context. Petticrew, M., & Roberts, H. (2006). Malden, MA: Blackwell Publishers.
  • PICOS (medicine) — Patient; Intervention; Comparison; Outcome; Study type. Moher, D., Liberati, A., Tetzlaff, J., Altman, D. G., & PRISMA Group. (2009). (7), e1000097.
  • PICOT (education, health care) — Patient; Intervention; Comparison; Outcome; Time. Richardson, W. S., Wilson, M. C., Nishikawa, J., & Hayward, R. S. (1995). (3), A12.
  • Diagnostic variant of PICO (diagnostic questions) — Patient/participants/population; Index tests; Comparator/reference tests; Outcome. Kim, K. W., Lee, J., Choi, S. H., Huh, J., & Park, S. H. (2015). (6), 1175–1187.
  • PIPOH (screening) — Population; Intervention; Professionals; Outcomes; Health care setting/context. ADAPTE Collaboration. (2009). Version 2.0.
  • ProPheT (social sciences, qualitative, library science) — Problem; Phenomenon of interest; Time. Booth, A., Noyes, J., Flemming, K., Gerhardus, A., Wahlster, P., van der Wilt, G. J., ... & Rehfuess, E. (2016). [Technical Report]. https://doi.org/10.13140/RG.2.1.2318.0562; Booth, A., Sutton, A., & Papaioannou, D. (2016). (2nd ed.). London: Sage.
  • SPICE (library and information sciences) — Setting; Perspective; Interest; Comparison; Evaluation. Booth, A. (2006). (3), 355–368.
  • SPIDER (health, qualitative research) — Sample; Phenomenon of interest; Design; Evaluation; Research type. Cooke, A., Smith, D., & Booth, A. (2012). (10), 1435–1443.
  • Who/What/How — Who; What was done? (intervention, exposure, policy, phenomenon); How does the what affect the who?

Further reading:

Methley, A. M., Campbell, S., Chew-Graham, C., McNally, R., & Cheraghi-Sohi, S. (2014). PICO, PICOS and SPIDER: A comparison study of specificity and sensitivity in three search tools for qualitative systematic reviews.   BMC Health Services Research, 14 (1), 579.

Source: https://lib.guides.umd.edu/SR

Doing a Systematic Review: A Student’s Guide

Saul Mcleod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul Mcleod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.


Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.

What is a Systematic Review?

A systematic review is a comprehensive, structured analysis of existing research on a specific topic. It uses predefined criteria to identify, evaluate, and synthesize relevant studies, aiming to provide an unbiased summary of the current evidence.

The explicit and systematic approach of a systematic review distinguishes it from traditional reviews and commentaries.

Here are some key ways that systematic reviews differ from narrative reviews:

  • Goals: Narrative reviews provide a summary or overview of a topic, while systematic reviews answer a focused review question.
  • Sources of Literature: Narrative reviews often use a non-exhaustive and unstated body of literature, which can lead to publication bias. Systematic reviews consider a list of databases, grey literature, and other sources.
  • Selection Criteria: Narrative reviews usually use subjective or no selection criteria, which can lead to selection bias. Systematic reviews have a clear and explicit selection process.
  • Appraisal of Study Quality: Narrative reviews vary in their evaluation of study quality. Systematic reviews use standard checklists for a rigorous appraisal of study quality.

Systematic reviews are time-intensive and need a research team with multiple skills and contributions. There are some cases where systematic reviews are unable to meet the necessary objectives of the review question.

In these cases, scoping reviews (which are sometimes called scoping exercises/scoping studies) may be more useful to consider.

Scoping reviews are different from systematic reviews because they may not include a mandatory critical appraisal of the included studies or synthesize the findings from individual studies.


Assessing The Need For A Systematic Review

When assessing the need for a systematic review, first check whether any relevant reviews already exist or are in progress, and determine whether a new review is justified.

This process should begin by searching relevant databases.

Resources to consider searching include:

  • NICE : National Institute for Health and Clinical Excellence
  • Campbell Library of Systematic Reviews for reviews in education, crime and justice, and social welfare
  • EPPI : Evidence for Policy and Practice Information Centre, particularly their database of systematic and non-systematic reviews of public health interventions (DoPHER)
  • MEDLINE : Primarily covers the medical domain, making it a primary resource for systematic reviews concerning healthcare interventions
  • PsycINFO : For research in psychology, psychiatry, behavioral sciences, and social sciences
  • Cochrane Library (specifically CDSR) : Focuses on systematic reviews of health care interventions, providing regularly updated and critically appraised reviews

If an existing review addressing the question of interest is found, its quality should be assessed to determine its suitability for guiding policy and practice.

If a high-quality, relevant review is located, but its completion date is some time ago, updating the review might be warranted.

Assessing current relevance is vital, especially in rapidly evolving research fields. Collaboration with the original research team might be beneficial during the update process, as they could provide access to their data.

If the review is deemed to be of adequate quality and remains relevant, undertaking another systematic review may not be necessary.

When a new systematic review or an update is deemed necessary, the subsequent step involves establishing a review team and potentially an advisory group, who will then develop the review protocol.

How To Conduct A Systematic Review

PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) is a reporting guideline designed to improve the transparency and completeness of systematic review reporting.

PRISMA was created to tackle the issue of inadequate reporting often found in systematic reviews:

  • Checklist : PRISMA features a 27-item checklist covering all aspects of a systematic review, from the rationale and objectives to the synthesis of findings and discussion of limitations. Each checklist item is accompanied by detailed reporting recommendations in an Explanation and Elaboration document .
  • Flow Diagram : PRISMA also includes a flow diagram to visually represent the study selection process, offering a clear, standardized way to illustrate how researchers arrived at the final set of included studies.


Step 1: write a research protocol

A protocol in the context of systematic reviews is a detailed plan that outlines the methodology to be employed throughout the review process.

The protocol serves as a roadmap, guiding researchers through each stage of the review in a transparent and replicable manner.

This document should provide specific details about every stage of the research process, including the methodology for identifying, selecting, and analyzing relevant studies.

For example, the protocol should specify search strategies for relevant studies, including whether the search will encompass unpublished works.

The protocol should be created before beginning the research process to ensure transparency and reproducibility.

This pre-determined plan ensures that decisions made during the review are objective and free from bias, as they are based on pre-established criteria.

Protocol modifications are sometimes necessary during systematic reviews. While adhering to the protocol is crucial for minimizing bias, there are instances where modifications are justified. For instance, a deeper understanding of the research question that emerges from examining primary research might necessitate changes to the protocol.

Systematic reviews should be registered at inception (at the protocol stage) for these reasons:

  • To help avoid unplanned duplication
  • To enable the comparison of reported review methods with what was planned in the protocol

This registration prevents duplication (research waste) and makes the process easy when the full systematic review is sent for publication.

PROSPERO is an international database of prospectively registered systematic reviews in health and social care. Non-Cochrane protocols should be registered on PROSPERO.

Research Protocol

Rasika Jayasekara, Nicholas Procter. The effects of cognitive behaviour therapy for major depression in older adults: a systematic review. PROSPERO 2012 CRD42012003151 Available from:  https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42012003151

Review question

How effective is CBT compared with other interventions, placebo or standard treatment in achieving relapse prevention and improving mental status for older adults with major depression?

The search strategy aims to find both published and unpublished studies. The search will be limited to English language papers published from 2002 to 2012.

A three-step search strategy will be developed using MeSH terminology and keywords to ensure that all materials relevant to the review are captured.

An initial limited search of MEDLINE and CINAHL will be undertaken followed by an analysis of the text words contained in the title and abstract, and of the index terms used to describe the article. A second search using all identified keywords and index terms will then be undertaken.

Thirdly, the reference list of all identified reports and articles will be searched for additional studies.

The databases to be searched include:

  • Cochrane Central Register of Controlled Trials
  • Controlled Trials
  • Current Contents

The search for unpublished studies will include:

  • Digital Dissertations (Proquest)
  • Conference Proceedings

Experts in the field will be contacted for ongoing and unpublished trials. Experts will be identified through journal publications.

Types of study to be included

All randomised controlled trials (RCTs) assessing the effectiveness of CBT as a treatment for older adults with major depression when compared to standard care, specific medication, other therapies and no intervention will be considered.

In the absence of RCTs, other research designs such as quasi-experimental studies, case-controlled studies and cohort studies will be examined. However, descriptive studies and expert opinion will be excluded.

Condition or domain being studied

Major depression is diagnosed according to DSM IV or ICD 10 criteria.

Where trials fail to employ diagnostic criteria, the severity of depression will be described by the use of standardised rating scales, including the Hamilton Depression Rating Scale, Montgomery and Asberg Rating Scale and the Geriatric Depression Rating Scale.

The trials including participants with an explicit diagnosis of dementia or Parkinson’s disease and other mental illnesses will be excluded.

The review will include trials conducted in primary, secondary, community, nursing homes and in-patient settings.

Participants/population

The review will include trials in which patients are described as elderly, geriatric, or older adults, or in which all patients will be aged 55 or over (many North American trials of older adult populations use a cut-off of 55 years).

The review will include trials with subjects of either sex. Where possible, participants will be categorised as community or long term care residents.

Intervention(s), exposure(s)

The review will focus on interventions designed to assess the effects of CBT for older adults with major depression.

The label cognitive behavioural therapy has been applied to a variety of interventions and, accordingly, it is difficult to provide a single, unambiguous definition.

In order to be classified as CBT the intervention must clearly demonstrate the following components:

  • the intervention involves the recipient establishing links between their thoughts, feelings and actions with respect to the target symptom;
  • the intervention involves the correction of the person’s misperceptions, irrational beliefs and reasoning biases related to the target symptom;
  • the recipient monitoring his or her own thoughts, feelings and behaviours with respect to the target symptom; and
  • the promotion of alternative ways of coping with the target symptom.

In addition, all therapies that do not meet these criteria (or that provide insufficient information) but are labelled as ‘CBT’ or ‘Cognitive Therapy’ will be included as ‘less well defined’ CBT.

Comparator(s)/control

Other interventions, placebo, or standard treatment.

Main outcome(s)

Primary outcomes

  • Depression level as assessed by Hamilton Depression Rating Scale, Montgomery or Asberg Rating Scale or the Geriatric Depression Rating Scale.
  • Relapse (as defined in the individual studies)
  • Death (sudden, unexpected death or suicide).
  • Psychological well being (as defined in the individual studies)

Measures of effect

The review will categorise outcomes into those measured in the shorter term (within 12 weeks of the onset of therapy), medium term (within 13 to 26 weeks of the onset of therapy) and longer term (over 26 weeks since the onset of therapy).

Additional outcome(s)

Secondary outcomes

  • Mental state
  • Quality of life
  • Social functioning
  • Hospital readmission
  • Unexpected or unwanted effect (adverse effects), such as anxiety, depression and dependence on the relationship with the therapist

Data extraction (selection and coding)

Data will be extracted from papers included in the review using JBI-MAStARI. In this stage, any relevant studies will be extracted in relation to their population, interventions, study methods and outcomes.

Where data are missing or unclear, authors will be contacted to obtain information.

Risk of bias (quality) assessment

All papers selected for retrieval will be assessed by two independent reviewers for methodological validity prior to inclusion in the review.

Since the review will evaluate experimental studies only, the Joanna Briggs Institute Meta-Analysis of Statistics Assessment and Review Instrument (JBI-MAStARI) will be used to evaluate each study's methodological validity.

If there is a disagreement between the two reviewers, a third reviewer will be consulted to resolve it.

Strategy for data synthesis

Where possible quantitative research study results will be pooled in statistical meta-analysis using Review Manager Software from the Cochrane Collaboration.

Odds ratio (for categorical outcome data) or standardised mean differences (for continuous data) and their 95% confidence intervals will be calculated for each study.

Heterogeneity will be assessed using the standard Chi-square. Where statistical pooling is not possible the findings will be presented in narrative form.
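For illustration, the following minimal sketch (not part of the registered protocol above) shows the kind of calculation described: per-study odds ratios with 95% confidence intervals, a fixed-effect (inverse-variance) pooled estimate, and Cochran's Q as a chi-square-based heterogeneity statistic. The 2x2 counts are invented for the example; a real synthesis would use software such as Review Manager.

```python
# Illustrative only: invented 2x2 counts, not data from the protocol above.
import math

# (events_treatment, total_treatment, events_control, total_control) per study
studies = [(12, 60, 20, 58), (8, 45, 15, 47), (30, 120, 41, 118)]

log_ors, weights = [], []
for a, n1, c, n2 in studies:
    b, d = n1 - a, n2 - c                      # non-events in each arm
    log_or = math.log((a * d) / (b * c))       # log odds ratio
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)      # standard error of the log OR
    ci_low = math.exp(log_or - 1.96 * se)      # 95% confidence interval
    ci_high = math.exp(log_or + 1.96 * se)
    print(f"OR = {math.exp(log_or):.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
    log_ors.append(log_or)
    weights.append(1 / se ** 2)                # inverse-variance weight

pooled = sum(w * x for w, x in zip(weights, log_ors)) / sum(weights)
q = sum(w * (x - pooled) ** 2 for w, x in zip(weights, log_ors))  # Cochran's Q
print(f"Pooled OR = {math.exp(pooled):.2f}; Q = {q:.2f} on {len(studies) - 1} df")
```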

Step 2: formulate a research question 

Developing a focused research question is crucial for a systematic review, as it underpins every stage of the review process.

The question defines the review’s nature and scope, guides the identification of relevant studies, and shapes the data extraction and synthesis processes.

It’s essential that the research question is answerable and clearly stated in the review protocol, ensuring that the review’s boundaries are well-defined.

A narrow question may limit the number of relevant studies and generalizability, while a broad question can make it challenging to reach specific conclusions.

PICO Framework

The PICO framework is a model for creating focused clinical research questions. The acronym PICO stands for:
  • Population/Patient/Problem: This element defines the specific group of people the research question pertains to.
  • Intervention: This is the treatment, test, or exposure being considered for the population.
  • Comparison: This is the alternative intervention or control group against which the intervention is being compared.
  • Outcome: This element specifies the results or effects of the interventions being investigated.

Using the PICO format when designing research helps to minimize bias because the questions and methods of the review are formulated before reviewing any literature.

The PICO elements are also helpful in defining the inclusion criteria used to select sources for the systematic review.

The PICO framework is commonly employed in systematic reviews that primarily analyze data from randomized controlled trials .

Not every element of PICO is required for every research question. For instance, it is not always necessary to have a comparison.

Types of questions that can be answered using PICO:

Therapy

“In patients with a recent acute stroke (less than 6 weeks) with reduced mobility (P), is any specific physiotherapy approach (I) more beneficial than no physiotherapy (C) at improving independence in activities of daily living and gait speed (O)?”

“For women who have experienced domestic violence (P), how effective are advocacy programmes (I) compared to other treatments (C) on improving the quality of life (O)?”

Etiology/Harm

Are women with a history of pelvic inflammatory disease (PID) (P) at higher risk for gynecological cancers (O) than women with no history of PID (C)?

Diagnosis

Among asymptomatic adults at low risk of colon cancer (P), is fecal immunochemical testing (FIT) (I) as sensitive and specific for diagnosing colon cancer (O) as colonoscopy (C)?

Prognosis

Among adults with pneumonia (P), do those with chronic kidney disease (CKD) (I) have a higher mortality rate (O) than those without CKD (C)?
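As a purely illustrative sketch (the class and field names are hypothetical, not part of PICO itself), the framework's elements can be captured as a small data structure, which makes the question's components explicit and reusable when drafting inclusion criteria and search concepts:

```python
# Illustrative only: the class and example values are hypothetical.
from dataclasses import dataclass
from typing import Optional

@dataclass
class PicoQuestion:
    population: str
    intervention: str
    comparison: Optional[str]  # a comparison is not always necessary
    outcome: str

    def as_question(self) -> str:
        """Render the elements as a single focused review question."""
        comp = f" compared with {self.comparison}" if self.comparison else ""
        return (f"In {self.population}, does {self.intervention}{comp} "
                f"improve {self.outcome}?")

question = PicoQuestion(
    population="older adults with major depression",
    intervention="cognitive behavioural therapy",
    comparison="standard treatment",
    outcome="relapse prevention and mental status",
)
print(question.as_question())
```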

Alternative Frameworks

  • PICOCS: This framework, used in public health research, adds a "Context" element to the PICO framework. This is useful for examining how the environment or setting in which an intervention is delivered might influence its effectiveness.
  • PICOC: This framework expands on PICO by incorporating "Costs" as an element of the research question. It is particularly relevant to research questions involving economic evaluations of interventions.
  • ECLIPSE: Expectations, Client group, Location, Impact, Professionals involved, Service, and Evaluation. It is a mnemonic device designed to aid in searching for health policy and management information.
  • PEO: This acronym, standing for Patient, Exposure, and Outcome, is a variation of PICO used when the research question focuses on the relationship between exposure to a risk factor and a specific outcome.
  • PIRD: This acronym stands for Population, Index Test, Reference Test, and Diagnosis of Interest, guiding research questions that focus on evaluating the diagnostic accuracy of a particular test.
  • PFO: This acronym, representing Population, Prognostic Factors, and Outcome, is tailored for research questions that aim to investigate the relationship between specific prognostic factors and a particular health outcome.
  • SDMO: This framework, which stands for Studies, Data, Methods, and Outcomes, assists in structuring research questions focused on methodological aspects of research, examining the impact of different research methods or designs on the quality of research findings.

Step 3: Search Strategy

PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses) provides appropriate guidance for reporting quantitative literature searches.

Present the full search strategies for all databases, registers and websites, including any filters and limits used. PRISMA 2020 Checklist

A search strategy is a comprehensive and reproducible plan for identifying all relevant research studies that address a specific research question.

This systematic approach to searching helps minimize bias and distinguishes systematic reviews from other types of literature reviews.

It’s important to be transparent about the search strategy and document all decisions for auditability. The goal is to identify all potentially relevant studies for consideration.

Here’s a breakdown of a search strategy:

Search String Construction

It is recommended to consult topic experts on the review team and advisory board in order to create as complete a list of search terms as possible for each concept.

To retrieve the most relevant results, a search string is used. This string is made up of:

  • Keywords: Search terms should be relevant to the subject areas of the research question and should be identified for all components of the research question (e.g., Population, Intervention, Comparator, and Outcomes – PICO). Using relevant keywords helps minimize irrelevant search returns. Sources such as dictionaries, textbooks, and published articles can help identify appropriate keywords.
  • Synonyms: These are words or phrases with similar meanings to the keywords, as authors may use different terms to describe the same concepts. Including synonyms helps cover variations in terminology and increases the chances of finding all relevant studies. For example, a drug intervention may be referred to by its generic name or by one of its several proprietary names.
  • Truncation symbols: These broaden the search by capturing variations of a keyword. They function by locating every word that begins with a specific root. For example, if a user was researching interventions for smoking, they might use a truncation symbol to search for “smok*” to retrieve records with the words “smoke,” “smoker,” “smoking,” or “smokes.” This can save time and effort by eliminating the need to input every variation of a word into a database.
  • Boolean and proximity operators: Boolean operators (AND, OR, NOT) combine these terms effectively, ensuring that the search strategy is both sensitive and specific. For instance, “AND” narrows the search to include only results containing both terms, while “OR” expands it to include results containing either term. Proximity operators (e.g., NEAR), where supported, restrict results to records where the terms appear close together. A short sketch of assembling such a search string follows this list.
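To make the construction concrete, here is a minimal Python sketch, assuming invented example terms, that joins synonyms with OR within each concept, wraps multi-word phrases in quotes, and combines concepts with AND. It illustrates the logic only; real strategies also need controlled vocabulary and database-specific syntax.

```python
# Minimal sketch: assemble a Boolean search string from PICO concept term
# lists. Multi-word phrases are quoted; terms within a concept are joined
# with OR; concepts are joined with AND. All terms below are invented.

def quote(term: str) -> str:
    """Quote multi-word phrases so databases treat them as exact phrases."""
    return f'"{term}"' if " " in term else term

def build_search_string(concepts: dict[str, list[str]]) -> str:
    """OR together synonyms within each concept, AND the concepts together."""
    blocks = ["(" + " OR ".join(quote(t) for t in terms) + ")"
              for terms in concepts.values()]
    return " AND ".join(blocks)

concepts = {
    "population": ["adult*", "smoker*"],
    "intervention": ["smok* cessation", "nicotine replacement"],
    "outcome": ["quit rate", "abstinence"],
}
print(build_search_string(concepts))
# (adult* OR smoker*) AND ("smok* cessation" OR "nicotine replacement")
#   AND ("quit rate" OR abstinence)
```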

Information Sources

The primary goal is to find all published and unpublished studies that meet the predefined criteria of the research question. This means looking beyond typical databases: information sources for systematic reviews can include scholarly databases, unpublished literature, conference papers, books, and even expert consultations.

Specify all databases, registers, websites, organisations, reference lists and other sources searched or consulted to identify studies. Specify the date when each source was last searched or consulted. PRISMA 2020 Checklist

An exhaustive, systematic search strategy is best developed with the assistance of an expert librarian.

  • Electronic Databases: Searches typically cover several key databases. One review, for example, searched seven: CINAHL, MEDLINE, APA PsycArticles, Psychology and Behavioral Sciences Collection, APA PsycInfo, SocINDEX with Full Text, and Web of Science Core Collection.
  • Grey Literature: In addition to databases, forensic or ‘expansive’ searches can be conducted. This includes: conference proceedings, unpublished reports, theses, ongoing clinical trial databases, and searches by names of authors of relevant publications. Independent research bodies may also be good sources of material, e.g. Centre for Research in Ethnic Relations, Joseph Rowntree Foundation, Carers UK.
  • Citation Searching: Checking the reference lists of included studies (and the papers that cite them) often leads to highly cited and influential papers in the field, providing valuable context and background information for the review.
  • Handsearching: Manually searching through specific journals or conference proceedings page-by-page is another way to ensure all relevant studies are captured, particularly those not yet indexed in databases.
  • Contacting Experts: Reaching out to researchers or experts in the field can provide access to unpublished data or ongoing research not yet publicly available.

It is important to note that this may not be an exhaustive list of all potential databases.

A systematic computerized search was performed for publications that appeared between 1974 and 2018 in English language journals. Four databases were searched including PsycINFO, Embase, Ovid MEDLINE, and AMED. The databases were searched with combinations of search terms relating to attachment (“attachment” OR “working model” OR “safe haven” OR “secure base” OR “felt security”) AND romantic couples (“dyad” OR “couple” OR “spous” OR “partner” OR “romantic” OR “wife” OR “husband” OR “close relationship” OR “interpersonal” OR “intimate” OR “mari”) AND social support (“support prov” OR “caregiving” OR “support giv” OR “social support” OR “enacted support” OR “support received” OR “receiv* support” OR “prov support” OR “dyadic coping” OR “interpersonal coping” OR “collaborative coping” OR “help‐seeking” OR “emotional support” OR “tangible support” OR “instrumental support” OR “perceived support” OR “responsive” OR “buffer” OR “partner support” OR “Support avail*” OR “available support”). The reference lists of the retrieved studies were checked to find other relevant publications, which were not identified in the computerized database searches.

Inclusion Criteria

Specify the inclusion and exclusion criteria for the review. PRISMA 2020 Checklist

Before beginning the literature search, researchers should establish clear eligibility criteria for study inclusion.

Inclusion criteria are used to select studies for a systematic review and should be based on the study’s research method and PICO elements.

To maintain transparency and minimize bias, eligibility criteria for study inclusion should be established a priori. For intervention reviews, researchers should ideally aim to include only high-quality randomized controlled trials that adhere to the intention-to-treat principle.

The selection of studies should not be arbitrary, and the rationale behind inclusion and exclusion criteria should be clearly articulated in the research protocol.

When specifying the inclusion and exclusion criteria, consider the following aspects:

  • Intervention Characteristics: Researchers might decide that, in order to be included in the review, an intervention must have specific characteristics. They might require the intervention to last for a certain length of time, or they might determine that only interventions with a specific theoretical basis are appropriate for their review.
  • Population Characteristics: A systematic review might focus on the effects of an intervention for a specific population. For instance, researchers might choose to focus on studies that included only nurses or physicians.
  • Outcome Measures: Researchers might choose to include only studies that used outcome measures that met a specific standard.
  • Age of Participants: If a systematic review is examining the effects of a treatment or intervention for children, the authors of the review will likely choose to exclude any studies that did not include children in the target age range.
  • Diagnostic Status of Participants: Researchers conducting a systematic review of treatments for anxiety will likely exclude any studies where the participants were not diagnosed with an anxiety disorder.
  • Study Design: Researchers might determine that only studies that used a particular research design, such as a randomized controlled trial, will be included in the review.
  • Control Group: In a systematic review of an intervention, researchers might choose to include only studies that included certain types of control groups, such as a waiting list control or another type of intervention.
  • Publication status: Decide whether only published studies will be included or if unpublished works, such as dissertations or conference proceedings, will also be considered.
Studies that met the following criteria were included: (a) empirical studies of couples (of any gender) who are in a committed romantic relationship, whether married or not; (b) measurement of the association between adult attachment and support in the context of this relationship; (c) the article was a full report published in English; and (d) the articles were reports of empirical studies published in peer‐reviewed journals, dissertations, review papers, and conference presentations.

Iterative Process

The iterative nature of developing a search strategy for systematic reviews stems from the need to refine and adapt the search process based on the information encountered at each stage.

A single attempt rarely yields the perfect final strategy. Instead, it is an evolving process involving a series of test searches, analysis of results, and discussions among the review team.

Here’s how the iterative process unfolds:

  • Initial Strategy Formulation: Based on the research question, the team develops a preliminary search strategy, including identifying relevant keywords, synonyms, databases, and search limits.
  • Test Searches and Refinement: The initial search strategy is then tested on chosen databases. The results are reviewed for relevance, and the search strategy is refined accordingly. This might involve adding or modifying keywords, adjusting Boolean operators, or reconsidering the databases used.
  • Discussions and Iteration: The search results and proposed refinements are discussed within the review team. The team collaboratively decides on the best modifications to improve the search’s comprehensiveness and relevance.
  • Repeating the Cycle: This cycle of test searches, analysis, discussions, and refinements is repeated until the team is satisfied with the strategy’s ability to capture all relevant studies while minimizing irrelevant results.

The iterative nature of developing a search strategy is crucial for ensuring that the systematic review is comprehensive and unbiased.

By constantly refining the search strategy based on the results and feedback, researchers can be more confident that they have identified all relevant studies.

This iterative process ensures that the applied search strategy is sensitive enough to capture all relevant studies while maintaining a manageable scope.

Throughout this process, meticulous documentation of the search strategy, including any modifications, is crucial for transparency and future replication of the systematic review.

Step 4: Search the Literature

Conduct a systematic search of the literature using clearly defined search terms and databases.

Applying the search strategy involves entering the constructed search strings into the respective databases’ search interfaces. These search strings, crafted using Boolean operators, truncation symbols, wildcards, and database-specific syntax, aim to retrieve all potentially relevant studies addressing the research question.

The researcher, during this stage, interacts with the database’s features to refine the search and manage the retrieved results.

This might involve employing search filters provided by the database to focus on specific study designs, publication types, or other relevant parameters.

Applying the search strategy is not merely a mechanical process of inputting terms; it demands a thorough understanding of database functionalities and a discerning eye to adjust the search based on the nature of retrieved results.
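As an illustration of why syntax must be adapted per interface, the sketch below renders one proximity expression in several common database styles. The syntax table (Ovid “adjN”, EBSCO “Nn”, Web of Science “NEAR/n”) reflects widely documented conventions, but it is an assumption of this sketch; verify against each database’s own help pages.

```python
# Illustrative only: render one proximity expression ("term A within n
# words of term B") in several common interface syntaxes.

PROXIMITY_SYNTAX = {
    "Ovid MEDLINE": "adj{n}",      # e.g. patient adj3 satisfaction
    "CINAHL (EBSCO)": "N{n}",      # e.g. patient N3 satisfaction
    "Web of Science": "NEAR/{n}",  # e.g. patient NEAR/3 satisfaction
}

def proximity(term_a: str, term_b: str, n: int, database: str) -> str:
    """Render the proximity expression for the given database."""
    operator = PROXIMITY_SYNTAX[database].format(n=n)
    return f"{term_a} {operator} {term_b}"

for db in PROXIMITY_SYNTAX:
    print(f"{db}: {proximity('patient', 'satisfaction', 3, db)}")
```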

Step 5: Screening and Selecting Research Articles

Once the search strategy is finalized, it is applied to the selected databases, yielding a set of search results.

These search results are then screened against pre-defined inclusion criteria to determine their eligibility for inclusion in the review.

The goal is to identify studies that are both relevant to the research question and of sufficient quality to contribute to a meaningful synthesis.

Records of studies meeting the inclusion criteria are usually saved into reference management software, such as EndNote or Mendeley, and include title, authors, date and publication journal along with an abstract (if available).

Study Selection

Specify the methods used to decide whether a study met the inclusion criteria of the review, including how many reviewers screened each record and each report retrieved, whether they worked independently, and if applicable, details of automation tools used in the process. PRISMA 2020 Checklist

The selection process in a systematic review involves multiple reviewers to ensure rigor and reliability.

To minimize bias and enhance the reliability of the study selection process, it is recommended that at least two reviewers independently assess the eligibility of each study. This independent assessment helps reduce the impact of individual biases or errors in judgment.

  • Initial screening of titles and abstracts: After applying a strategy to search the literature, the next step involves screening the titles and abstracts of the identified articles against the predefined inclusion and exclusion criteria. During this initial screening, reviewers aim to identify potentially relevant studies while excluding those clearly outside the scope of the review. It is crucial to prioritize over-inclusion at this stage, meaning that reviewers should err on the side of keeping studies even if there is uncertainty about their relevance. This cautious approach helps minimize the risk of inadvertently excluding potentially valuable studies.
  • Retrieving and assessing full texts: For studies for which a definitive decision cannot be made based on the title and abstract alone, reviewers need to obtain the full text of the articles for a comprehensive assessment against the predefined inclusion and exclusion criteria. This stage involves meticulously reviewing the full text of each potentially relevant study to determine its eligibility definitively.
  • Resolution of disagreements: In cases of disagreement between reviewers regarding a study’s eligibility, a predefined strategy involving consensus-building discussions or arbitration by a third reviewer should be in place to reach a final decision. This collaborative approach ensures a fair and impartial selection process, further strengthening the review’s reliability. A small reconciliation sketch follows the worked example below.
First, the search results from separate databases were combined, and any duplicates were removed. The lead author (S. M.) and a postgraduate researcher (F. N.) applied the described inclusion criteria in a standardized manner. First, both the titles and abstracts of the articles were evaluated for relevance. If, on the basis of the title and/or abstract, the study looked likely to meet inclusion criteria, hard copies of the manuscripts were obtained. If there was doubt about the suitability of an article, then the manuscript was included in the next step. The remaining articles were obtained for full‐text review, and the method and results sections were read to examine whether the article fitted the inclusion criteria. If there was doubt about the suitability of the manuscripts during this phase, then this article was discussed with another author (C. H.). Finally, the reference lists of the eligible articles were checked for additional relevant articles not identified during the computerized search. For the selected articles (n = 43), the results regarding the relationship between attachment and support were included in this review (see Figure 1, for PRISMA flowchart).
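One way to operationalize dual independent screening is sketched below: two reviewers’ decisions are merged, agreements are accepted, and conflicts are flagged for consensus discussion or third-reviewer arbitration. This is a minimal illustration; the record IDs and decisions are invented placeholders.

```python
# Sketch of reconciling two reviewers' independent screening decisions.
# Agreements are accepted; conflicts go to discussion or a third reviewer.

def reconcile(decisions_a: dict[str, str], decisions_b: dict[str, str]):
    """Each dict maps record_id -> 'include' or 'exclude'."""
    agreed, conflicts = {}, []
    for record_id, decision_a in decisions_a.items():
        if decision_a == decisions_b[record_id]:
            agreed[record_id] = decision_a
        else:
            conflicts.append(record_id)  # refer to third reviewer
    return agreed, conflicts

reviewer_a = {"rec1": "include", "rec2": "exclude", "rec3": "include"}
reviewer_b = {"rec1": "include", "rec2": "include", "rec3": "include"}
agreed, conflicts = reconcile(reviewer_a, reviewer_b)
print(conflicts)  # ['rec2'] -> resolve by consensus or arbitration
```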

PRISMA Flowchart

The PRISMA flowchart is a visual representation of the study selection process within a systematic review.

The flowchart illustrates the step-by-step process of screening, filtering, and selecting studies based on predefined inclusion and exclusion criteria.

The flowchart visually depicts the following stages:

  • Identification: The initial number of records identified through database searches.
  • Screening: The number of records screened on the basis of titles and abstracts.
  • Eligibility: Full-text copies of the remaining records are retrieved and assessed for eligibility.
  • Inclusion: The number of publications that met all predefined criteria and were included in the review.
  • Exclusion: The flowchart details the reasons for excluding records at each stage.

This systematic and transparent approach, as visualized in the PRISMA flowchart, ensures a robust and unbiased selection process, enhancing the reliability of the systematic review’s findings.

The flowchart serves as a visual record of the decisions made during the study selection process, allowing readers to assess the rigor and comprehensiveness of the review.
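The arithmetic behind a flowchart can be kept as a simple audit trail. The sketch below uses invented numbers purely to show how each stage’s count is derived from the previous one; in a real review each figure comes from your reference manager and screening logs.

```python
# Invented numbers, shown only to make the flowchart arithmetic explicit.
records_identified = 1480          # all database searches combined
duplicates_removed = 320
records_screened = records_identified - duplicates_removed        # 1160
excluded_title_abstract = 1050
full_texts_assessed = records_screened - excluded_title_abstract  # 110
full_text_exclusions = {"wrong population": 52, "no relevant outcome": 33}
studies_included = full_texts_assessed - sum(full_text_exclusions.values())

print(f"Screened: {records_screened}")
print(f"Full texts assessed: {full_texts_assessed}")
print(f"Included: {studies_included}")  # 25
```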


Step 6: Critically Appraising the Quality of Included Studies

Quality assessment provides a measure of the strength of the evidence presented in a review.

High-quality studies with rigorous methodologies contribute to a more robust and reliable evidence base, increasing confidence in the review’s conclusions.

Conversely, including low-quality studies with methodological weaknesses can undermine the review’s findings and potentially lead to inaccurate recommendations.

To judge the quality of studies included in a systematic review, standardized instruments, such as checklists and scales, are commonly used. These tools help to ensure a transparent and reproducible assessment process.

The choice of tool should be justified and aligned with the study design and the level of detail required. Using quality scores alone is discouraged; instead, individual aspects of methodological quality should be considered.

Commonly used tools include:

  • Jadad score
  • Cochrane Risk of Bias tool
  • Cochrane Effective Practice and Organisation of Care (EPOC) Group Risk of Bias Tool
  • Quality Assessment of Diagnostic Accuracy Studies (QUADAS)
  • Newcastle–Ottawa Quality Assessment Scale for case-control and cohort studies
  • EPHPP Assessment Tool
  • Critical Appraisal Skills Programme (CASP) Appraisal Checklist
  • Cochrane Public Health Group (CPHG)
The quality of the study was not an inclusion criterion; however, a study quality check was carried out. Two independent reviewers (S. M. and C. H.) rated studies that met the inclusion criteria to determine the strength of the evidence. The Effective Public Health Practice Project Quality Assessment Tool for Quantitative Studies was adapted to assess the methodological quality of each study (Thomas, Ciliska, Dobbins, & Micucci, 2004). The tool was adjusted to include domains relevant to the method of each study. For example, blinding was removed for nonexperimental studies. Following recommendations by Thomas et al. (2004) each domain was rated as either weak (3 points), moderate (2 points), or strong (1 point). The mean score across questions was used as an indicator of overall quality, and studies were assigned an overall quality rating of strong (1.00–1.50), moderate (1.51–2.50),
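The scoring rule quoted above translates directly into a small function, sketched here. Note that the excerpt is cut off before stating the cut-off for an overall “weak” rating, so the sketch assumes it covers the remaining range (above 2.50).

```python
# Sketch of the quoted EPHPP-style rule: domains rated strong (1),
# moderate (2) or weak (3); the mean maps to an overall rating of
# strong (1.00-1.50) or moderate (1.51-2.50).

def overall_quality(domain_ratings: list[int]) -> str:
    mean = sum(domain_ratings) / len(domain_ratings)
    if mean <= 1.50:
        return "strong"
    if mean <= 2.50:
        return "moderate"
    return "weak"  # assumption: covers the remaining range (> 2.50)

# Example: four domains rated strong, moderate, moderate, strong
print(overall_quality([1, 2, 2, 1]))  # mean 1.5 -> "strong"
```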

Evidence Tables

Aspects of the appraisal of studies included in the review should be recorded as evidence tables (NICE 2009): simple text tables where the design and scope of studies are summarised.

The reader of the review can use the evidence tables to check the details, and assess the credibility and generalisability of findings, of particular studies.

Critical appraisal of the quality of included studies may be combined with data extraction tables.


Step 7: Extracting Data from Studies

To effectively extract data from studies that meet your systematic review’s inclusion criteria, you should follow a structured process that ensures accuracy, consistency, and minimizes bias.

1. Develop a data extraction form:

  • Design a standardized form (paper or electronic) to guide the data extraction process: This form should be tailored to your specific review question and the types of studies included.
  • Pilot test the form: Test the form on a small sample of included studies (e.g., 3-5). Assess for clarity, completeness, and usability. Refine the form based on feedback and initial experiences.
  • Reliability: Ensure all team members understand how to use the form consistently.

2. Extract the data:

  • General Information: This includes basic bibliographic details (journal, title, author, volume, page numbers), study objective as stated by the authors, study design, and funding source.
  • Study Characteristics: Capture details about the study population (demographics, inclusion/exclusion criteria, recruitment procedures), interventions (description, delivery methods), and comparators (description if applicable).
  • Outcome Data: Record the results of the intervention and how they were measured, including specific statistics used. Clearly define all outcomes for which data are being extracted.
  • Risk of Bias Assessment: Document the methods used to assess the quality of the included studies and any potential sources of bias. This might involve using standardized checklists or scales.
  • Additional Information: Depending on your review, you may need to extract data on other variables like adverse effects, economic evaluations, or specific methodological details. A minimal extraction-record sketch follows this list.
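The fields listed above might be captured in a structured record like the following sketch; the field names and example values are illustrative, not a standard schema.

```python
# Illustrative extraction record; field names follow the categories above
# but are not a standard schema. Example values are invented.
from dataclasses import dataclass, field

@dataclass
class ExtractionRecord:
    citation: str                 # general information
    study_objective: str
    study_design: str
    funding_source: str
    population: str               # study characteristics
    intervention: str
    comparator: str
    outcomes: dict[str, str] = field(default_factory=dict)  # outcome -> result
    risk_of_bias_notes: str = ""  # quality/bias assessment

record = ExtractionRecord(
    citation="Smith et al. (2020)",
    study_objective="Effect of alarm therapy on bedwetting",
    study_design="Randomized controlled trial",
    funding_source="Not reported",
    population="Children aged 5-12 with nocturnal enuresis",
    intervention="Enuresis alarm",
    comparator="Drug therapy",
    outcomes={"wet nights per week": "mean difference -1.2"},
)
```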

3. Dual independent review:

  • Ensure that at least two reviewers independently extract data from each study using the standardized form, and cross-check the extracted data for accuracy. This minimizes bias and helps identify any discrepancies.
  • Have a predefined strategy for resolving disagreements: This might involve discussion, consensus, or arbitration by a third reviewer.
  • Record the reasons for excluding any studies during the data extraction phase: This enhances the transparency and reproducibility of your review.
  • If necessary, contact study authors to obtain missing data or clarify unclear information: This is particularly important for data critical to your review’s outcomes.
  • Clearly document your entire data extraction process, including any challenges encountered and decisions made. This enhances the transparency and rigor of your systematic review.

By following these steps, you can effectively extract data from studies that meet your inclusion criteria, forming a solid foundation for the analysis and synthesis phases of your systematic review.

Step 8: Synthesize the Extracted Data

The key element of a systematic review is the synthesis: the process that brings together the findings from the set of included studies in order to draw conclusions based on the body of evidence.

Data synthesis in a systematic review involves collating, combining, and summarizing findings from the included studies.

This process aims to provide a reliable and comprehensive answer to the review question by considering the strength of the evidence, examining the consistency of observed effects, and investigating any inconsistencies.

The data synthesis will be presented in the results section of the systematic review.

  • Develop a clear text narrative that explains the key findings
  • Use a logical heading structure to guide readers through your results synthesis
  • Ensure your text narrative addresses the review’s research questions
  • Use tables to summarise findings (these can be the same tables used for data extraction)

Identifying patterns, trends, and differences across studies

Narrative synthesis uses a textual approach to analyze relationships within and between studies to provide an overall assessment of the evidence’s robustness. All systematic reviews should incorporate elements of narrative synthesis, such as tables and text.


Remember, the goal of a narrative synthesis is to go beyond simply summarizing individual studies. You’re aiming to create a new understanding by integrating and interpreting the available evidence in a systematic and transparent way.

Organize your data:

  • Group studies by themes, interventions, or outcomes
  • Create summary tables to display key information across studies
  • Use visual aids like concept maps to show relationships between studies (a small grouping sketch follows this list)
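As a first organizing step, grouped summaries can be produced mechanically. The sketch below groups invented study records by theme and prints a one-line summary per group; the records and themes are placeholders.

```python
# Invented records, shown only to illustrate grouping studies by theme
# before drafting the narrative synthesis.
from collections import defaultdict

studies = [
    {"id": "Smith 2018", "theme": "support provision", "design": "experimental"},
    {"id": "Jones 2019", "theme": "support seeking", "design": "cross-sectional"},
    {"id": "Lee 2020", "theme": "support provision", "design": "daily diary"},
]

by_theme = defaultdict(list)
for study in studies:
    by_theme[study["theme"]].append(study)

for theme, group in by_theme.items():
    designs = ", ".join(s["design"] for s in group)
    print(f"{theme}: n={len(group)} ({designs})")
# support provision: n=2 (experimental, daily diary)
# support seeking: n=1 (cross-sectional)
```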

Describe the studies:

  • Summarize the characteristics of included studies (e.g., designs, sample sizes, settings)
  • Highlight similarities and differences across studies
  • Discuss the overall quality of the evidence

Develop a preliminary synthesis:

  • Start by describing the results of individual studies
  • Group similar findings together
  • Identify overarching themes or trends

Explore relationships:

  • Look for patterns in the data
  • Identify factors that might explain differences in results across studies
  • Consider how study characteristics relate to outcomes

Address contradictions:

  • Consider differences in study populations, interventions, or contexts
  • Look at methodological differences that might explain discrepancies
  • Consider the implications of inconsistent results
  • Don’t ignore conflicting findings
  • Discuss possible reasons for contradictions

Avoid vote counting:

  • Don’t simply tally positive versus negative results
  • Instead, consider the strength and quality of evidence for each finding

Assess the robustness of the synthesis:

  • Reflect on the strength of evidence for each finding
  • Consider how gaps or limitations in the primary studies affect your conclusions
  • Discuss any potential biases in the synthesis process

Step 9: Discussion Section and Conclusion

Summarize key findings:

  • Summarize key findings in relation to your research questions
  • Highlight main themes or patterns across studies
  • Explain the nuances and complexities in the evidence
  • Discuss the overall strength and consistency of the evidence
  • This provides a clear takeaway message for readers

Consider study quality and context:

  • Assess whether higher quality studies tend to show different results
  • Examine if findings differ based on study setting or participant characteristics
  • This helps readers weigh the relative importance of conflicting findings

Discuss implications:

  • For practice: How might professionals apply these findings?
  • For policy: What policy changes might be supported by the evidence?
  • Consider both positive and negative implications
  • This helps translate your findings into real-world applications

Identify gaps and future research:

  • Point out areas where evidence is lacking or inconsistent
  • Suggest specific research questions or study designs to address these gaps
  • This helps guide future research efforts in the field

State strengths and limitations:

  • Discuss the strengths of your review (e.g., comprehensive search, rigorous methodology)
  • Acknowledge limitations (e.g., language restrictions, potential for publication bias)
  • This balanced approach demonstrates critical thinking and helps readers interpret your findings

Minimizing Bias

To reduce bias in a systematic review, it is crucial to establish a systematic and transparent review process that minimizes bias at every stage. Sources provide insights into strategies and methods to achieve this goal.

  • Protocol development and publication: Developing a comprehensive protocol before starting the review is essential. Publishing the protocol in repositories like PROSPERO or Cochrane Library promotes transparency and helps avoid deviations from the planned approach, thereby minimizing the risk of bias.
  • Transparent reporting: Adhering to reporting guidelines, such as PRISMA, ensures that all essential aspects of the review are adequately documented, increasing the reader’s confidence in the transparency and completeness of systematic review reporting.
  • Dual independent review: Employing two or more reviewers independently at multiple stages of the review process (study selection, data extraction, quality assessment) minimizes bias. Any disagreements between reviewers should be resolved through discussion or by consulting a third reviewer. This approach reduces the impact of individual reviewers’ subjective interpretations or errors.
  • Rigorous quality assessment: Assessing the methodological quality of included studies is crucial for minimizing bias in the review findings. Using standardized critical appraisal tools and checklists helps identify potential biases within individual studies, such as selection bias, performance bias, attrition bias, and detection bias.
  • Searching beyond published literature: Explore sources of “grey literature” such as conference proceedings, unpublished reports, theses, and ongoing clinical trial databases.
  • Contacting experts in the field: Researchers can reach out to authors and investigators to inquire about unpublished or ongoing studies.
  • Considering language bias: Expanding the search to include studies published in languages other than English can help reduce language bias, although this may increase the complexity and cost of the review.

Reading List

  • Galante, J., Galante, I., Bekkers, M. J., & Gallacher, J. (2014). Effect of kindness-based meditation on health and well-being: A systematic review and meta-analysis. Journal of Consulting and Clinical Psychology, 82(6), 1101.
  • Schneider, M., & Preckel, F. (2017). Variables associated with achievement in higher education: A systematic review of meta-analyses. Psychological Bulletin, 143(6), 565.
  • Murray, J., Farrington, D. P., & Sekol, I. (2012). Children’s antisocial behavior, mental health, drug use, and educational performance after parental incarceration: A systematic review and meta-analysis. Psychological Bulletin, 138(2), 175.
  • Roberts, B. W., Luo, J., Briley, D. A., Chow, P. I., Su, R., & Hill, P. L. (2017). A systematic review of personality trait change through intervention. Psychological Bulletin, 143(2), 117.
  • Chu, C., Buchman-Schmitt, J. M., Stanley, I. H., Hom, M. A., Tucker, R. P., Hagan, C. R., … & Joiner Jr, T. E. (2017). The interpersonal theory of suicide: A systematic review and meta-analysis of a decade of cross-national research. Psychological Bulletin, 143(12), 1313.
  • McLeod, S., Berry, K., Hodgson, C., & Wearden, A. (2020). Attachment and social support in romantic dyads: A systematic review. Journal of Clinical Psychology, 76(1), 59-101.


RMIT University

Teaching and Research guides

Systematic reviews.

  • Starting the review
  • About systematic reviews

Develop your research question

Types of questions: PICO framework, SPICE, SPIDER and ECLIPSE.

  • Plan your search
  • Sources to search
  • Search example
  • Screen and analyse
  • Further help

A systematic review is an in-depth attempt to answer a specific, focused question in a methodical way.

Start with a clearly defined, researchable question that accurately and succinctly sums up the review's line of inquiry.

A well formulated review question will help determine your inclusion and exclusion criteria, the creation of your search strategy, the collection of data and the presentation of your findings.

It is important to ensure the question:

  • relates to what you really need to know about your topic
  • is answerable, specific and focused
  • strikes a suitable balance between being too broad and too narrow in scope
  • has been formulated with care so as to avoid missing relevant studies or collecting a potentially biased result set

Is the research question justified?

  • Do healthcare providers, consumers, researchers, and policy makers need this evidence for their healthcare decisions?
  • Is there a gap in the current literature? The question should be worthy of an answer.
  • Has a similar review been done before?

Question types

To help focus the question and determine the most appropriate type of evidence, consider the type of question. Is there a study design (e.g. randomized controlled trial, meta-analysis) that would provide the best answer?

Will your research question focus on:

  • Diagnosis : How to select and interpret diagnostic tests
  • Intervention/Therapy : How to select treatments to offer patients that do more good than harm and that are worth the efforts and costs of using them
  • Prediction/Prognosis : How to estimate the patient’s likely clinical course over time and anticipate likely complications of disease
  • Exploration/Etiology : How to identify causes for disease, including genetics

If appropriate, use a  framework  to help in the development of your research question. A framework will assist in identifying the important concepts in your question.

A good question will combine several concepts. Identifying the relevant concepts is crucial to successful development and execution of your systematic search. Your research question should provide you with a checklist for the main concepts to be included in your search strategy.

Using a framework to aid in the development of a research question can be useful. The more you understand your question the more likely you are to obtain relevant results for your review. There are a number of different frameworks available.

A technique often used in research for formulating a clinical research question is the PICO model. PICO is explored in more detail in this guide. Slightly different versions of this concept are used to search for quantitative and qualitative reviews.

For quantitative reviews-

PICO  = Population, Intervention, Comparison, Outcome

Population, Patient or Problem: Who or what is the question about? What is the problem you are looking at? Is there a specific population you need to focus on? Describe the most important characteristics of the patient, population or problem.

Intervention or Indicator: What treatment or changes are you looking to explore? What do you want to do with this patient? What factor may influence the prognosis of the patient?

Comparison or Control: Is there a comparison treatment to be considered? The comparison may be with another medication, another form of treatment, or no treatment at all. Your clinical question does not always have to have a specific comparison: include one if you are comparing multiple interventions, or use a control (such as no intervention) if you are comparing an intervention to no intervention.

Outcome: What are you trying to accomplish, measure, improve or affect? What are you trying to do for the patient? Relieve or eliminate the symptoms? Reduce the number of adverse events? Improve function or test scores? What results will you consider to determine if, or how well, the intervention is working?

For qualitative reviews-

PICo = Population or Problem, Interest, Context

Population or Problem: What are the characteristics of the population or the patient? What is the problem, condition or disease you are interested in?

Interest: a defined event, activity, experience or process.

Context: the setting or distinct characteristics.

For qualitative evidence-

SPICE = Setting, Perspective, Intervention or Exposure or Interest, Comparison, Evaluation

Setting: the context for the question.

Perspective: the users, potential users, or stakeholders of the service.

Intervention: the action taken for the users, potential users, or stakeholders.

Comparison: the alternative actions or outcomes.

Evaluation: the result or measurement that will determine the success of the intervention.

  • Booth, A. (2006). Clear and present questions: Formulating questions for evidence based practice. Library hi tech, 24(3), 355-368.

SPIDER = Sample, Phenomenon of Interest, Design, Evaluation, Research Type

Sample: sample size may vary between qualitative and quantitative studies.

Phenomenon of Interest: includes behaviours, experiences and interventions.

Design: influences the strength of the study analysis and findings.

Evaluation: outcomes may include more subjective measures, such as views and attitudes.

Research Type: qualitative, quantitative or mixed methods studies.

  • Cooke, A., Smith, D., & Booth, A. (2012). Beyond PICO: The SPIDER tool for qualitative evidence synthesis. Qualitative Health Research, 22(10), 1435-1443.

ECLIPSE = Expectation, Client, Location, Impact, Professionals, Service

Expectation: improvement, information or innovation.

Client: at whom the service is aimed.

Location: where the service is located.

Impact: outcomes.

Professionals: who is involved in providing or improving the service.

Service: the service for which you are looking for information.

  • Wildridge, V., & Bell, L. (2002). How CLIP became ECLIPSE: A mnemonic to assist in searching for health policy/management information. Health Information & Libraries Journal, 19(2), 113-115.


How to formulate the review question using PICO. 5 steps to get you started.


Covidence covers five key steps to formulate your review question using PICO

You’ve decided to go ahead. You have identified a gap in the evidence and you know that conducting a systematic review, with its explicit methods and replicable search, is the best way to fill it – great choice 🙌. 

The review will produce useful information to enable informed decision-making and to improve patient care. Your review team’s first job is to capture exactly what you need to know in a well-formulated review question.

At this stage there is a lot to plan. You might be recruiting people to your review team, thinking about the time-frame for completion and considering what software to use. It’s tempting to get straight on to the search for studies 🏃. 

Take it slowly: it’s vital to get the review question right. A clear and precise question will ensure that you gather the appropriate data to answer it. Time invested up-front to consider every aspect of the question will pay off once the review is underway. The review question will shape all the subsequent stages in the review, particularly setting the criteria for including and excluding studies, the search strategy, and the way you choose to present the results.

Let’s take a look at five key steps in formulating the question for a standard systematic review of interventions. It’s a process that requires careful thought from a range of stakeholders and meticulous planning. But what if, once you have started the review, you find that you need to tweak the question anyway? Don’t worry, we’ll cover that too ✅.

📌 Consider the audience of the review

Who will use this review? What do they want to know? How do they measure effectiveness? Good review teams partner with the people who will use the evidence and make sure that their research plan (or protocol) asks a question that is relevant and important for patients.

📌 Think about what you already know

How much do you need to know about the topic area at this stage? Ideally, enough to come up with a relevant, useful question but not so much that your knowledge influences the way in which you phrase it. Why? Because setting a review question when you are already familiar with the data can introduce bias by allowing you to direct the question in favour of achieving a particular result. In practice, the review team is very likely to have some knowledge of relevant studies and some preconceived ideas about how the treatments work. That’s fine – and it’s useful – but it’s also good practice to recognise the influence this knowledge and these ideas might have on the choice of question. Issues of bias will come up again as we work through the rest of these steps.

If not enough is known about the subject area to ask a useful question, you might undertake a scoping review . This is a separate exercise from a systematic review and is sometimes used by researchers to map the literature and highlight gaps in the evidence before they start work on a systematic review. 

📌 Use a framework

Faced with a heady mixture of concepts, ideas, aims and outcomes, researchers in every field have come up with question frameworks (and some great backronyms) to help them. Question frameworks impose order on a complex thought process by breaking down a question into its component parts. A commonly used framework in clinical medicine is PICO:

👦 Population (or patients) refers to the characteristics of the people that you want to study. For example, the review might look at children with nocturnal enuresis.

💊 Intervention is the treatment you are investigating. For example, the review might look at the effectiveness of enuresis alarms.

💊 Comparison, if you decide to use one, is the treatment you want to compare the intervention with. For example, the review might look at the effectiveness of enuresis alarms versus the effectiveness of drug therapy.

📏 Outcomes are the measures used to assess the effectiveness of the treatment. It’s particularly important to select outcomes that matter to the end users of the review. In this example, a useful outcome might be bedwetting. (Helpfully, some clinical areas use standardised sets of outcomes in their clinical trials to facilitate the comparison of data between studies 👏.)


But back to bedwetting. In our example, a PICO review question would look something like this:

“In children with nocturnal enuresis (population), how effective are alarms (intervention) versus drug treatments (comparison) for the prevention of bedwetting (outcome)?”

PICO is suitable for reviews of interventions. If you plan to review prognostic or qualitative data, or diagnostic test accuracy, PICO is unlikely to be a suitable framework for your question. In Covidence you can save your PICO for easy reference throughout the screening, extraction and quality assessment phases of your review.
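A review team might also keep the worked PICO as structured data for reference during screening and extraction. The sketch below is a generic illustration only, not Covidence’s actual data model; the class and field names are assumptions of this example.

```python
# Generic sketch (not Covidence's data model) of storing the worked PICO
# example as structured data for reuse during screening and extraction.
from dataclasses import dataclass

@dataclass(frozen=True)
class PICO:
    population: str
    intervention: str
    comparison: str
    outcome: str

    def as_question(self) -> str:
        return (f"In {self.population}, how effective are {self.intervention} "
                f"versus {self.comparison} for {self.outcome}?")

enuresis = PICO(
    population="children with nocturnal enuresis",
    intervention="alarms",
    comparison="drug treatments",
    outcome="the prevention of bedwetting",
)
print(enuresis.as_question())
# In children with nocturnal enuresis, how effective are alarms versus
# drug treatments for the prevention of bedwetting?
```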


📌 Set the scope

The scope of a review question requires careful thought. To answer the example PICO question above, the review would compare one treatment (alarms) with another (drug therapy). A broader question might consider all the available treatments for nocturnal enuresis in children. The broad scope of this question would still allow the review team to drill down and separate the data into groups of specific treatments later in the review process. And to minimise bias, the intended grouping of data would be pre-specified and justified in the protocol or research plan.

Broader systematic reviews are great because they summarise all the evidence on a given topic in one place. A potential disadvantage is that they can produce a large volume of data that is difficult to manage. 

If the size of the review has started to escalate beyond your comfort zone, you might consider narrowing the scope. This can make the size of the review more manageable, both for the review team and for the reader. But it’s worth examining the motivations for narrowing the scope more closely. Suppose we wanted to define a smaller population in the example PICO question. Is there a good reason (other than to reduce the review team’s workload) to restrict the population to boys with nocturnal enuresis? Or to children under 10 years old? On the basis of what is already known, could the treatment effect be expected to differ by sex or age of the study participants? 🤔 Be prepared to explain your choices and to demonstrate that they are legitimate. 

Some reviews with a narrow scope retrieve only a small number of studies. If this happens, there is a risk that the data collected from these studies might not be enough to produce a useful synthesis or to guide decision-making. It can be frustrating for review teams who have spent time defining the question, planning the methods, and conducting an extensive search to find that their question is unanswerable. This is another reason why it is useful for the review team to have prior knowledge of the subject area and some familiarity with the existing evidence. The Cochrane Handbook contains some useful contingencies for dealing with sparse data .

Covidence can help review teams to save time whatever the scope and size of the review. In Covidence, data can be grouped to the review team’s exact specification for seamless export into data analysis software. The intuitive workflow makes collaboration simple so if one reviewer spots a problem, they can alert the rest of the team quickly and easily.


📌 Adjust if necessary

Systematic reviews follow explicit, pre-specified methods. So it’s no surprise to learn that the review question needs to be considered carefully and explained in detail before the review gets underway. But what about the unknown unknowns – those issues that the review teams will have to deal with later in the process but that they cannot foresee at the outset, no matter how much time they spend on due diligence? 

Clearly, reviews need the agility to control for issues that the project plan did not anticipate – strict adherence to the pre-specified process when a good reason to deviate has come to light would carry its own risks for the quality of the review. So if an initial scan of, for example, the search results indicates that it would be sensible to modify the question, this can be done. The research plan might make explicit the process for dealing with these types of changes. It might also contain plans for sensitivity analysis , to examine whether these choices have any effect on the findings of the review. As mentioned above with regard to scope, it might be difficult to defend a data-driven change to the question. And as before, the issue is the risk of bias and the danger of producing a spurious result.


(Figure 4. Image from Eshun‐Wilson  I, Siegfried  N, Akena  DH, Stein  DJ, Obuku  EA, Joska  JA. Antidepressants for depression in adults with HIV infection. Cochrane Database of Systematic Reviews 2018, Issue 1. Art. No.: CD008525. DOI: 10.1002/14651858.CD008525.pub3. Accessed 27 May 2021.)

This blog post is part of the Covidence series on how to write a systematic review. 


Laura Mellor. Portsmouth, UK




Cochrane Training

Chapter 2: Determining the scope of the review and the questions it will address

James Thomas, Dylan Kneale, Joanne E McKenzie, Sue E Brennan, Soumyadeep Bhaumik

Key Points:

  • Systematic reviews should address answerable questions and fill important gaps in knowledge.
  • Developing good review questions takes time, expertise and engagement with intended users of the review.
  • Cochrane Reviews can focus on broad questions, or be more narrowly defined. There are advantages and disadvantages of each.
  • Logic models are a way of documenting how interventions, particularly complex interventions, are intended to ‘work’, and can be used to refine review questions and the broader scope of the review.
  • Using priority-setting exercises, involving relevant stakeholders, and ensuring that the review takes account of issues relating to equity can be strategies for ensuring that the scope and focus of reviews address the right questions.

Cite this chapter as: Thomas J, Kneale D, McKenzie JE, Brennan SE, Bhaumik S. Chapter 2: Determining the scope of the review and the questions it will address. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.4 (updated August 2023). Cochrane, 2023. Available from www.training.cochrane.org/handbook .

2.1 Rationale for well-formulated questions

As with any research, the first and most important decision in preparing a systematic review is to determine its focus. This is best done by clearly framing the questions the review seeks to answer. The focus of any Cochrane Review should be on questions that are important to people making decisions about health or health care. These decisions will usually need to take into account both the benefits and harms of interventions (see MECIR Box 2.1.a ). Good review questions often take time to develop, requiring engagement with not only the subject area, but with a wide group of stakeholders (Section 2.4.2 ).

Well-formulated questions will guide many aspects of the review process, including determining eligibility criteria, searching for studies, collecting data from included studies, structuring the syntheses and presenting findings (Cooper 1984, Hedges 1994, Oliver et al 2017) . In Cochrane Reviews, questions are stated broadly as review ‘Objectives’, and operationalized in terms of the studies that will be eligible to answer those questions as ‘Criteria for considering studies for this review’. As well as focusing review conduct, the contents of these sections are used by readers in their initial assessments of whether the review is likely to be directly relevant to the issues they face.

The FINER criteria have been proposed as encapsulating the issues that should be addressed when developing research questions. These state that questions should be Feasible, Interesting, Novel, Ethical, and Relevant (Cummings et al 2007). All of these criteria raise important issues for consideration at the outset of a review and should be borne in mind when questions are formulated.

A feasible review is one that asks a question that the author team is capable of addressing using the evidence available. Issues concerning the breadth of a review are discussed in Section 2.3.1 , but in terms of feasibility it is important not to ask a question that will result in retrieving unmanageable quantities of information; up-front scoping work will help authors to define sensible boundaries for their reviews. Likewise, while it can be useful to identify gaps in the evidence base, review authors and stakeholders should be aware of the possibility of asking a question that may not be answerable using the existing evidence (i.e. that will result in an ‘empty’ review, see also Section 2.5.3 ).

Embarking on a review that authors are interested in is important because reviews are a significant undertaking and review authors need sufficient commitment to see the work through to its conclusion.

A novel review will address a genuine gap in knowledge, so review authors should be aware of any related or overlapping reviews. This reduces duplication of effort, and also ensures that authors understand the wider research context to which their review will contribute. Authors should check for pre-existing syntheses in the published research literature and also for ongoing reviews in the PROSPERO register of systematic reviews before beginning their own review.

Given the opportunity cost involved in undertaking an activity as demanding as a systematic review, authors should ensure that their work is relevant by: (i) involving relevant stakeholders in defining its focus and the questions it will address; and (ii) writing up the review in such a way as to facilitate the translation of its findings to inform decisions. The GRADE framework aims to achieve this, and should be considered throughout the review process, not only when it is being written up (see Chapter 14 and Chapter 15 ).

Consideration of opportunity costs is also relevant in terms of the ethics of conducting a review, though ethical issues should also be considered primarily in terms of the questions that are prioritized for answering and the way that they are framed. Research questions are often not value-neutral, and the way that a given problem is approached can have political implications which can result in, for example, the widening of health inequalities (whether intentional or not). These issues are explored in Section 2.4.3 and Chapter 16 .

MECIR Box 2.1.a Relevant expectations for conduct of intervention reviews

Formulating review questions

Cochrane Reviews are intended to support clinical practice and policy, not just scientific curiosity. The needs of consumers play a central role in Cochrane Reviews and they can play an important role in defining the review question. Qualitative research, i.e. studies that explore the experience of those involved in providing and receiving interventions, and studies evaluating factors that shape the implementation of interventions, might be used in the same way.

Considering potential adverse effects

It is important that adverse effects are addressed in order to avoid one-sided summaries of the evidence. At a minimum, the review will need to highlight the extent to which potential adverse effects have been evaluated in any included studies. Sometimes data on adverse effects are best obtained from non-randomized studies, or qualitative research studies. This does not mean however that all reviews must include non-randomized studies.

2.2 Aims of reviews of interventions

Systematic reviews can address any question that can be answered by a primary research study. This Handbook focuses on a subset of all possible review questions: the impact of intervention(s) implemented within a specified human population. Even within these limits, systematic reviews examining the effects of intervention(s) can vary quite markedly in their aims. Some will focus specifically on evidence of an effect of an intervention compared with a specific alternative, whereas others may examine a range of different interventions. Reviews that examine multiple interventions and aim to identify which might be the most effective can be broader and more challenging than those looking at single interventions. These can also be the most useful for end users, where decision making involves selecting from a number of intervention options. The incorporation of network meta-analysis as a core method in this edition of the Handbook (see Chapter 11 ) reflects the growing importance of these types of reviews.

As well as looking at the balance of benefit and harm that can be attributed to a given intervention, reviews within the ambit of this Handbook might also aim to investigate the relationship between the size of an intervention effect and other characteristics, such as aspects of the population, the intervention itself, how the outcome is measured, or the methodology of the primary research studies included. Such approaches might be used to investigate which components of multi-component interventions are more or less important or essential (and when). While it is not always necessary to know how an intervention achieves its effect for it to be useful, many reviews will aim to articulate an intervention’s mechanisms of action (see Section 2.5.1 ), either by making this an explicit aim of the review itself (see Chapter 17 and Chapter 21 ), or when describing the scope of the review. Understanding how an intervention works (or is intended to work) can be an important aid to decision makers in assessing the applicability of the review to their situation. These investigations can be assisted by the incorporation of results from process evaluations conducted alongside trials (see Chapter 21 ). Further, many decisions in policy and practice are at least partially constrained by the resource available, so review authors often need to consider the economic context of interventions (see Chapter 20 ).

2.3 Defining the scope of a review question

Studies comparing healthcare interventions, notably randomized trials, use the outcomes of participants to compare the effects of different interventions. Statistical syntheses (e.g. meta-analysis) focus on comparisons of interventions, such as a new intervention versus a control intervention (which may represent conditions of usual practice or care), or the comparison of two competing interventions. Throughout the Handbook we use the terminology experimental intervention versus comparator intervention. This implies a need to identify one of the interventions as experimental, and is used only for convenience since all methods apply to both controlled and head-to-head comparisons. The contrast between the outcomes of two groups treated differently is known as the ‘effect’, the ‘treatment effect’ or the ‘intervention effect’; we generally use the last of these throughout the Handbook .

A statement of the review’s objectives should begin with a precise statement of the primary objective, ideally in a single sentence (MECIR Box 2.3.a). Where possible the style should be of the form ‘To assess the effects of [intervention or comparison] for [health problem] in [types of people, disease or problem and setting if specified]’. This might be followed by one or more secondary objectives, for example relating to different participant groups, different comparisons of interventions or different outcome measures. The detailed specification of the review question(s) requires consideration of several key components (Richardson et al 1995, Counsell 1997) which can often be encapsulated by the ‘PICO’ mnemonic, an acronym for Population, Intervention, Comparison(s) and Outcome. Equal emphasis in addressing, and equal precision in defining, each PICO component is not necessary. For example, a review might concentrate on competing interventions for a particular stage of breast cancer, with stage and severity of the disease being defined very precisely; or alternatively focus on a particular drug for any stage of breast cancer, with the treatment formulation being defined very precisely.

Throughout the Handbook we make a distinction between three different stages in the review at which the PICO construct might be used. This division is helpful for understanding the decisions that need to be made:

  • The review PICO (planned at the protocol stage) is the PICO on which eligibility of studies is based (what will be included and what excluded from the review).
  • The PICO for each synthesis (also planned at the protocol stage) defines the question that each specific synthesis aims to answer, determining how the synthesis will be structured, specifying planned comparisons (including intervention and comparator groups, any grouping of outcome and population subgroups).
  • The PICO of the included studies (determined at the review stage) is what was actually investigated in the included studies.

Reaching the point where it is possible to articulate the review’s objectives in the above form – the review PICO – requires time and detailed discussion between potential authors and users of the review. It is important that those involved in developing the review’s scope and questions have a good knowledge of the practical issues that the review will address as well as the research field to be synthesized. Developing the questions is a critical part of the research process. As such, there are methodological issues to bear in mind, including: how to determine which questions are most important to answer; how to engage stakeholders in question formulation; how to account for changes in focus as the review progresses; and considerations about how broad (or narrow) a review should be.

MECIR Box 2.3.a Relevant expectations for conduct of intervention reviews

Predefining objectives

Objectives give the review focus and must be clear before appropriate eligibility criteria can be developed. If the review will address multiple interventions, clarity is required on how these will be addressed (e.g. summarized separately, combined or explicitly compared).

2.3.1 Broad versus narrow reviews

The questions addressed by a review may be broad or narrow in scope. For example, a review might address a broad question regarding whether antiplatelet agents in general are effective in preventing all thrombotic events in humans. Alternatively, a review might address whether a particular antiplatelet agent, such as aspirin, is effective in decreasing the risks of a particular thrombotic event, stroke, in elderly persons with a previous history of stroke. Increasingly, reviews are becoming broader, aiming, for example, to identify which intervention – out of a range of treatment options – is most effective, or to investigate how an intervention varies depending on implementation and participant characteristics.

Overviews of reviews (see  Chapter V ), in which multiple reviews are summarized, can be one way of addressing the need for breadth when synthesizing the evidence base, since they can summarize multiple reviews of different interventions for the same condition, or multiple reviews of the same intervention for different types of participants. It may be considered desirable to plan a series of reviews with a relatively narrow scope, alongside an Overview to summarize their findings. Alternatively, it may be more useful – particularly given the growth in support for network meta-analysis – to combine comparisons of different treatment options within the same review (see Chapter 11 ). When deciding whether or not an overview might be the most appropriate approach, review authors should take account of the breadth of the question being asked and the resources available. Some questions are simply too broad for a review of all relevant primary research to be practicable, and if a field has sufficient high-quality reviews, then the production of another review of primary research that duplicates the others might not be a sensible use of resources.

Some of the advantages and disadvantages of broad and narrow reviews are summarized in Table 2.3.a. While having a broad scope in terms of the range of participants has the potential to increase generalizability, the extent to which findings are ultimately applicable to broader (or different) populations will depend on the participants who have actually been recruited into research studies. Likewise, heterogeneity can be a disadvantage when the expectation is for homogeneity of effects between studies, but an advantage when the review question seeks to understand differential effects (see Chapter 10). A distinction should be drawn between the scope of a review and the precise questions within it, since it is possible to have a broad review that addresses quite narrow questions. In the antiplatelet agents for preventing thrombotic events example, a systematic review with a broad scope might include all available treatments. Rather than combining all the studies into one comparison, though, specific treatments would be compared with one another in separate comparisons, thus breaking a heterogeneous set of treatments into narrower, more homogeneous groups. This relates to the three levels of PICO outlined in Section 2.3: the review PICO defines the broad scope of the review, and the PICO for each synthesis defines the specific treatments that will be compared with one another; Chapter 3 elaborates on the use of PICOs.

In practice, a Cochrane Review may start (or have started) with a broad scope, and be divided up into narrower reviews as evidence accumulates and the original review becomes unwieldy. This may be done for practical and logistical reasons, for example to make updating easier as well as to make it easier for readers to see which parts of the evidence base are changing. Individual review authors must decide if there are instances where splitting a broader focused review into a series of more narrowly focused reviews is appropriate and implement appropriate methods to achieve this. If a major change is to be undertaken, such as splitting a broad review into a series of more narrowly focused reviews, a new protocol must be written for each of the component reviews that documents the eligibility criteria for each one.

Ultimately, the selected breadth of a review depends upon multiple factors including perspectives regarding a question’s relevance and potential impact; supporting theoretical, biologic and epidemiological information; the potential generalizability and validity of answers to the questions; and available resources. As outlined in Section 2.4.2, authors should consider carefully the needs of users of the review and the context(s) in which they expect the review to be used when determining the optimal scope for their review.

Table 2.3.a Some advantages and disadvantages of broad versus narrow reviews

Breadth of population: e.g. corticosteroid injection for shoulder tendonitis (narrow) or corticosteroid injection for any tendonitis (broad)

Advantages of a broad scope:
  • Comprehensive summary of the evidence.
  • Opportunity to explore consistency of findings (and therefore generalizability) across different types of participants.

Advantages of a narrow scope:
  • Manageability for review team.
  • Ease of reading.

Disadvantages of a broad scope:
  • Searching, data collection, analysis and writing may require more resources.
  • Interpretation may be difficult for readers if the review is large and lacks a clear rationale (such as examining consistency of findings) for including diverse types of participants.

Disadvantages of a narrow scope:
  • Evidence may be sparse.
  • Unable to explore whether an intervention operates differently in other settings or populations (e.g. inability to explore differential effects that could lead to inequity).
  • Increased burden for decision makers if multiple reviews must be accessed (e.g. if evidence is sparse for the population of interest).
  • Scope could be chosen by review authors to produce a desired result.

Breadth of intervention definition: e.g. supervised running for depression (narrow) or any exercise for depression (broad)

Advantages of a broad scope:
  • Comprehensive summary of the evidence.
  • Opportunity to explore consistency of findings across different implementations of the intervention.

Advantages of a narrow scope:
  • Manageability for review team.
  • Ease of reading.

Disadvantages of a broad scope:
  • Searching, data collection, analysis and writing may require more resources.
  • Interpretation may be difficult for readers if the review is large and lacks a clear rationale (such as examining consistency of findings) for including different modes of an intervention.

Disadvantages of a narrow scope:
  • Evidence may be sparse.
  • Unable to explore whether different modes of an intervention modify the intervention effects.
  • Increased burden for decision makers if multiple reviews must be accessed (e.g. if evidence is sparse for a specific mode).
  • Scope could be chosen by review authors to produce a desired result.

Range of interventions compared: e.g. oxybutynin compared with desmopressin for preventing bed-wetting (narrow) or interventions for preventing bed-wetting (broad)

Advantages of a broad scope:
  • Comprehensive summary of the evidence.
  • Opportunity to compare the effectiveness of a range of different intervention options.

Advantages of a narrow scope:
  • Manageability for review team.
  • Relative simplicity of objectives and ease of reading.

Disadvantages of a broad scope:
  • Searching, data collection, analysis and writing may require more resources.
  • May be unwieldy, and more appropriate to present as an Overview of reviews (see Chapter V).

Disadvantages of a narrow scope:
  • Increased burden for decision makers if not included in an Overview, since multiple reviews may need to be accessed.

2.3.2 ‘Lumping’ versus ‘splitting’

It is important not to confuse the issue of the breadth of the review (determined by the review PICO) with concerns about between-study heterogeneity and the legitimacy of combining results from diverse studies in the same analysis (determined by the PICO for each synthesis).

Broad reviews have been criticized as ‘mixing apples and oranges’, and one of the inventors of meta-analysis, Gene Glass, has responded “Of course it mixes apples and oranges… comparing apples and oranges is the only endeavour worthy of true scientists; comparing apples to apples is trivial” (Glass 2015). In fact, the two concepts (‘broad reviews’ and ‘mixing apples and oranges’) are different issues. Glass argues that broad reviews, with diverse studies, provide the opportunity to ask interesting questions about the reasons for differential intervention effects.

The ‘apples and oranges’ critique refers to the inappropriate mixing of studies within a single comparison, where the purpose is to estimate an average effect. In situations where good biologic or sociological evidence suggests that various formulations of an intervention behave very differently or that various definitions of the condition of interest are associated with markedly different effects of the intervention, the uncritical aggregation of results from quite different interventions or populations/settings may well be questionable.

Unfortunately, determining the situations where studies are similar enough to combine with one another is not always straightforward, and it can depend, to some extent, on the question being asked. While the decision is sometimes characterized as ‘lumping’ (where studies are combined in the same analysis) or ‘splitting’ (where they are not) (Squires et al 2013), it is better to consider these issues on a continuum, with reviews that have greater variation in the types of included interventions, settings and populations, and study designs being towards the ‘lumped’ end, and those that include little variation in these elements being towards the ‘split’ end (Petticrew and Roberts 2006).

While specification of the review PICO sets the boundary for the inclusion and exclusion of studies, decisions also need to be made when planning the PICO for the comparisons to be made in the analysis as to whether they aim to address broader (‘lumped’) or narrower (‘split’) questions (Caldwell and Welton 2016). The degree of ‘lumping’ in the comparisons will be primarily driven by the review’s objectives, but will sometimes be dictated by the availability of studies (and data) for a particular comparison (see Chapter 9 for discussion of the latter). The former is illustrated by a Cochrane Review that examined the effects of newer-generation antidepressants for depressive disorders in children and adolescents (Hetrick et al 2012).

Newer-generation antidepressants include multiple different compounds (e.g. paroxetine, fluoxetine). The objectives of this review were to (i) estimate the overall effect of newer-generation antidepressants on depression, (ii) estimate the effect of each compound, and (iii) examine whether the compound type and age of the participants (children versus adolescents) is associated with the intervention effect. Objective (i) addresses a broad, ‘in principle’ (Caldwell and Welton 2016), question of whether newer-generation antidepressants improve depression, where the different compounds are ‘lumped’ into a single comparison. Objective (ii) seeks to address narrower, ‘split’, questions that investigate the effect of each compound on depression separately. Answers to both questions can be identified by setting up separate comparisons for each compound, or by subgrouping the ‘lumped’ comparison by compound ( Chapter 10, Section 10.11.2 ). Objective (iii) seeks to explore factors that explain heterogeneity among the intervention effects, or equivalently, whether the intervention effect varies by the factor. This can be examined using subgroup analysis or meta-regression ( Chapter 10, Section 10.11 ) but, in the case of intervention types, is best achieved using network meta-analysis (see Chapter 11 ).

There are various advantages and disadvantages to bear in mind when defining the PICO for the comparison and considering whether ‘lumping’ or ‘splitting’ is appropriate. Lumping allows for the investigation of factors that may explain heterogeneity. Results from these investigations may provide important leads as to whether an intervention operates differently in, for example, different populations (such as in children and adolescents in the example above). Ultimately, this type of knowledge is useful for clinical decision making. However, lumping is likely to introduce heterogeneity, which will not always be explained by a priori specified factors, and this may lead to a combined effect that is clinically difficult to interpret and implement. For example, when multiple intervention types are ‘lumped’ in one comparison (as in objective (i) above), and there is unexplained heterogeneity, the combined intervention effect would not enable a clinical decision as to which intervention should be selected. Splitting comparisons carries its own risk of there being too few studies to yield a useful synthesis. Inevitably, some degree of aggregation across the PICO elements is required for a meta-analysis to be undertaken (Caldwell and Welton 2016).
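To make the ‘lumped’ versus ‘split’ distinction concrete, here is a minimal Python sketch using simple inverse-variance fixed-effect pooling. All study names, effect estimates (log odds ratios) and standard errors are invented for illustration and are not data from the Hetrick review.

```python
import math

# Hypothetical example data: one effect estimate (log odds ratio) and its
# standard error per study, tagged with the compound evaluated.
studies = [
    {"id": "Trial 1", "compound": "fluoxetine", "effect": -0.42, "se": 0.18},
    {"id": "Trial 2", "compound": "fluoxetine", "effect": -0.31, "se": 0.22},
    {"id": "Trial 3", "compound": "paroxetine", "effect": -0.05, "se": 0.20},
    {"id": "Trial 4", "compound": "paroxetine", "effect": -0.12, "se": 0.25},
]

def fixed_effect_pool(subset):
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    weights = [1 / s["se"] ** 2 for s in subset]
    pooled = sum(w * s["effect"] for w, s in zip(weights, subset)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))

# 'Lumped' analysis: all compounds in a single comparison (objective (i)).
pooled, se = fixed_effect_pool(studies)
print(f"All compounds: {pooled:.2f} (SE {se:.2f})")

# 'Split' analysis: one comparison per compound (objective (ii)).
for compound in sorted({s["compound"] for s in studies}):
    subset = [s for s in studies if s["compound"] == compound]
    pooled, se = fixed_effect_pool(subset)
    print(f"{compound}: {pooled:.2f} (SE {se:.2f})")
```

In practice such analyses would be run in dedicated software (e.g. RevMan or the R package metafor), but the structuring decision between one lumped comparison and several split comparisons is the same.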

2.4 Ensuring the review addresses the right questions

Since systematic reviews are intended for use in healthcare decision making, review teams should ensure not only the application of robust methodology, but also that the review question is meaningful for healthcare decision making. Two approaches are discussed below:

  • Using results from existing research priority-setting exercises to define the review question.
  • In the absence of, or in addition to, existing research priority-setting exercises, engaging with stakeholders to define review questions and establish their relevance to policy and practice.

2.4.1 Using priority-setting exercises to define review questions

A research priority-setting exercise is a “collective activity for deciding which uncertainties are most worth trying to resolve through research; uncertainties considered may be problems to be understood or solutions to be developed or tested; across broad or narrow areas” (Sandy Oliver, referenced in Nasser 2018). Using research priority-setting exercises to define the scope of a review helps to prevent the waste of scarce resources for research by making the review more relevant to stakeholders (Chalmers et al 2014).

Research priority setting is always conducted in a specific context, setting and population with specific principles, values and preferences (which should be articulated). Different stakeholders’ interpretation of the scope and purpose of a ‘research question’ might vary, resulting in priorities that might be difficult to interpret. Researchers or review teams might find it necessary to translate the research priorities into an answerable PICO research question format, and may find it useful to recheck the question with the stakeholder groups to determine whether they have accurately reflected their intentions.

While Cochrane Review teams are in most cases reviewing the effects of an intervention with a global scope, they may find that the priorities identified by important stakeholders (such as the World Health Organization or other organizations or individuals in a representative health system) are informative in planning the review. Review authors may find that differences between different stakeholder groups’ views on priorities and the reasons for these differences can help them to define the scope of the review. This is particularly important for making decisions about excluding specific populations or settings, or being inclusive and potentially conducting subgroup analyses.

Whenever feasible, systematic reviews should be based on priorities identified by key stakeholders such as decision makers, patients/public, and practitioners. Cochrane has developed a list of priorities for reviews in consultation with key stakeholders, which is available on the Cochrane website. Issues relating to equity (see Chapter 16 and Section 2.4.3 ) need to be taken into account when conducting and interpreting the results from priority-setting exercises. Examples of materials to support these processes are available (Viergever et al 2010, Nasser et al 2013, Tong et al 2017).

The results of research priority-setting exercises can be searched for in electronic databases and via websites of relevant organizations. Examples are: James Lind Alliance , World Health Organization, organizations of health professionals including research disciplines, and ministries of health in different countries (Viergever 2010). Examples of search strategies for identifying research priority-setting exercises are available (Bryant et al 2014, Tong et al 2015).

Other sources of questions are often found in ‘implications for future research’ sections of articles in journals and clinical practice guidelines. Some guideline developers have prioritized questions identified through the guideline development process (Sharma et al 2018), although these priorities will be influenced by the needs of health systems in which different guideline development teams are working.

2.4.2 Engaging stakeholders to help define the review questions

In the absence of a relevant research priority-setting exercise, or when a systematic review is being conducted for a very specific purpose (for example, commissioned to inform the development of a guideline), researchers should work with relevant stakeholders to define the review question. This practice is especially important when developing review questions for studying the effectiveness of health systems and policies, because of the variability between countries and regions; the significance of these differences may only become apparent through discussion with the stakeholders.

The stakeholders for a review could include consumers or patients, carers, health professionals of different kinds, policy decision makers and others ( Chapter 1, Section 1.3.1 ). Identifying the stakeholders who are critical to a particular question will depend on the question, who the answer is likely to affect, and who will be expected to implement the intervention if it is found to be effective (or to discontinue it if not).

Stakeholder engagement should, optimally, be an ongoing process throughout the life of the systematic review, from defining the question to dissemination of results (Keown et al 2008). Engaging stakeholders increases relevance, promotes mutual learning, improves uptake and decreases research waste (see Chapter 1, Section 1.3.1 and Section 1.3.2 ). However, because such engagement can be challenging and resource intensive, a one-off engagement process to define the review question might only be possible. Review questions that are conceptualized and refined by multiple stakeholders can capture much of the complexity that should be addressed in a systematic review.

2.4.3 Considering issues relating to equity when defining review questions

Deciding what should be investigated, who the participants should be, and how the analysis will be carried out can be considered political activities, with the potential for increasing or decreasing inequalities in health. For example, because researchers have chosen to investigate this issue, we now know that well-intended interventions can actually widen inequalities in health outcomes (Lorenc et al 2013). Decision makers can now take account of this knowledge when planning service provision. Authors should therefore consider the potential impact of the intervention(s) they are investigating on disadvantaged groups, and whether socio-economic inequalities in health might be affected depending on whether or how the interventions are implemented.

Health equity is the absence of avoidable and unfair differences in health (Whitehead 1992). Health inequity may be experienced across characteristics defined by PROGRESS-Plus (Place of residence, Race/ethnicity/culture/language, Occupation, Gender/sex, Religion, Education, Socio-economic status, Social capital, and other characteristics (‘Plus’) such as sexual orientation, age, and disability) (O’Neill et al 2014). Issues relating to health equity should be considered when review questions are developed ( MECIR Box 2.4.a ). Chapter 16 presents detailed guidance on this issue for review authors.

MECIR Box 2.4.a Relevant expectations for conduct of intervention reviews

Considering equity and specific populations

Where possible, reviews should include explicit descriptions of the effect of the interventions not only upon the whole population but also on the disadvantaged, and/or the ability of the interventions to reduce socio-economic inequalities in health and to promote their use in the community.

2.5 Methods and tools for structuring the review

It is important for authors to develop the scope of their review with care: without a clear understanding of where the review will contribute to existing knowledge – and how it will be used – it may be at risk of conceptual incoherence. It may mis-specify critical elements of how the intervention(s) interact with the context(s) within which they operate to produce specific outcomes, and become either irrelevant or possibly misleading. For example, in a systematic review about smoking cessation interventions in pregnancy, it was essential for authors to take account of the way that health service provision has changed over time. The type and intensity of ‘usual care’ in more recent evaluations was equivalent to the interventions being evaluated in older studies, and the analysis needed to take this into account. This review also found that the same intervention can have different effects in different settings depending on whether its materials are culturally appropriate in each context (Chamberlain et al 2017).

In order to protect the review against conceptual incoherence and irrelevance, review authors need to spend time at the outset developing definitions for key concepts and ensuring that they are clear about the prior assumptions on which the review depends. These prior assumptions include, for example, why particular populations should be considered inside or outside the review’s scope; how the intervention is thought to achieve its effect; and why specific outcomes are selected for evaluation. Being clear about these prior assumptions also requires review authors to consider the evidential basis for these assumptions and decide for themselves which they can place more or less reliance on. When considered as a whole, this initial conceptual and definitional work constitutes the review’s conceptual framework. Each element of the review’s PICO raises its own definitional challenges, which are discussed in detail in Chapter 3.

In this section we consider tools that may help to define the scope of the review and the relationships between its key concepts; in particular, articulating how the intervention gives rise to the outcomes selected. In some situations, long sequences of events are expected to occur between an intervention being implemented and an outcome being observed. For example, a systematic review examining the effects of asthma education interventions in schools on children’s health and well-being needed to consider: the interplay between core intervention components and their introduction into differing school environments; different child-level effect modifiers; how the intervention then had an impact on the knowledge of the child (and their family); the child’s self-efficacy and adherence to their treatment regime; the severity of their asthma; the number of days of restricted activity; how this affected their attendance at school; and finally, the distal outcomes of education attainment and indicators of child health and well-being (Kneale et al 2015).

Several specific tools can help authors to consider issues raised when defining review questions and planning their review; these are also helpful when developing eligibility criteria and classifying included studies. These include the following.

  • Taxonomies: hierarchical structures that can be used to categorize (or group) related interventions, outcomes or populations.
  • Generic frameworks for examining and structuring the description of intervention characteristics (e.g. TIDieR for the description of interventions (Hoffmann et al 2014), iCAT_SR for describing multiple aspects of complexity in systematic reviews (Lewin et al 2017)).
  • Core outcome sets for identifying and defining agreed outcomes that should be measured for specific health conditions (described in more detail in Chapter 3 ).

Unlike these tools, which focus on particular aspects of a review, logic models provide a framework for planning and guiding synthesis at the review level (see Section 2.5.1 ).

2.5.1 Logic models

Logic models (sometimes referred to as conceptual frameworks or theories of change) are graphical representations of theories about how interventions work. They depict intervention components, mechanisms (pathways of action), outputs, and outcomes as sequential (although not necessarily linear) chains of events. Among systematic review authors, they were originally proposed as a useful tool when working with evaluations of complex social and population health programmes and interventions, to conceptualize the pathways through which interventions are intended to change outcomes (Anderson et al 2011).
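As a toy illustration only, a logic model’s chain of events can be sketched as a directed graph whose paths make the assumed mechanisms explicit. The structure below loosely follows the school asthma-education example earlier in Section 2.5; the node names are simplified and the exact links are invented for this sketch.

```python
# A toy logic model for a hypothetical school asthma-education programme,
# represented as a directed graph (node names and links are illustrative).
logic_model = {
    "asthma education sessions": ["child knowledge", "family knowledge"],
    "child knowledge":           ["self-efficacy"],
    "family knowledge":          ["self-efficacy"],
    "self-efficacy":             ["treatment adherence"],
    "treatment adherence":       ["symptom control"],
    "symptom control":           ["school attendance"],
    "school attendance":         ["educational attainment"],
}

def pathways(graph, node, path=()):
    """Enumerate every causal pathway from a node to the distal outcomes."""
    path = path + (node,)
    targets = graph.get(node, [])
    if not targets:            # a node with no outgoing links is an end point
        yield " -> ".join(path)
    for nxt in targets:
        yield from pathways(graph, nxt, path)

for p in pathways(logic_model, "asthma education sessions"):
    print(p)
```

Writing the model down in this explicit form forces each assumed mechanism onto a named link, which is precisely the assumption-exposing role that logic models play in a review.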

In reviews where intervention complexity is a key consideration (see Chapter 17 ), logic models can be particularly helpful. For example, in a review of psychosocial group interventions for those with HIV, a logic model was used to show how the intervention might work (van der Heijden et al 2017). The review authors depicted proximal outcomes, such as self-esteem, but chose only to include psychological health outcomes in their review. In contrast, Bailey and colleagues included proximal outcomes in their review of computer-based interventions for sexual health promotion using a logic model to show how outcomes were grouped (Bailey et al 2010). Finally, in a review of slum upgrading, a logic model showed the broad range of interventions and their interlinkages with health and socio-economic outcomes (Turley et al 2013), and enabled the review authors to select a specific intervention category (physical upgrading) on which to focus the review. Further resources provide additional examples of logic models and can help review authors develop and use them (Anderson et al 2011, Baxter et al 2014, Kneale et al 2015, Pfadenhauer et al 2017, Rohwer et al 2017).

Logic models can vary in their emphasis, with a distinction sometimes made between system-based and process-oriented logic models (Rehfuess et al 2018). System-based logic models have particular value in examining the complexity of the system (e.g. the geographical, epidemiological, political, socio-cultural and socio-economic features of a system), and the interactions between contextual features, participants and the intervention (see Chapter 17 ). Process-oriented logic models aim to capture the complexity of causal pathways by which the intervention leads to outcomes, and any factors that may modify intervention effects. However, this is not a crisp distinction; the two types are interrelated, with some logic models depicting elements of both systems and process models simultaneously.

The way that logic models can be represented diagrammatically (see Chapter 17 for an example) provides a valuable visual summary for readers and can be a communication tool for decision makers and practitioners. They can aid initially in the development of a shared understanding between different stakeholders of the scope of the review and its PICO, helping to support decisions taken throughout the review process, from developing the research question and setting the review parameters, to structuring and interpreting the results. They can be used in planning the PICO elements of a review as well as for determining how the synthesis will be structured (i.e. planned comparisons, including intervention and comparator groups, and any grouping of outcome and population subgroups). These models may help review authors specify the link between the intervention, proximal and distal outcomes, and mediating factors. In other words, they depict the intervention theory underpinning the synthesis plan.

Anderson and colleagues (Anderson et al 2011) summarize the main value of logic models in systematic reviews as:

  • refining review questions;
  • deciding on ‘lumping’ or ‘splitting’ a review topic;
  • identifying intervention components;
  • defining and conducting the review;
  • identifying relevant study eligibility criteria;
  • guiding the literature search strategy;
  • explaining the rationale behind surrogate outcomes used in the review;
  • justifying the need for subgroup analyses (e.g. age, sex/gender, socio-economic status);
  • making the review relevant to policy and practice;
  • structuring the reporting of results;
  • illustrating how harms and feasibility are connected with interventions; and
  • interpreting results based on intervention theory and systems thinking (see Chapter 17 ).

Logic models can be useful in systematic reviews when considering whether failure to find a beneficial effect of an intervention is due to a theory failure, an implementation failure, or both (see Chapter 17 and Cargo et al 2018). Making a distinction between implementation and intervention theory can help to determine whether and how the intervention interacts with (and potentially changes) its context (see Chapter 3 and Chapter 17 for further discussion of context). This helps to elucidate situations in which variations in how the intervention is implemented have the potential to affect the integrity of the intervention and intended outcomes.

Given their potential value in conceptualizing and structuring a review, logic models are increasingly published in review protocols. Logic models may be specified a priori and remain unchanged throughout the review; it might be expected, however, that the findings of reviews produce evidence and new understandings that could be used to update the logic model in some way (Kneale et al 2015). Some reviews take a more staged approach, pre-specifying points in the review process where the model may be revised on the basis of (new) evidence (Rehfuess et al 2018) and a staged logic model can provide an efficient way to report revisions to the synthesis plan. For example, in a review of portion, package and tableware size for changing selection or consumption of food and other products, the authors presented a logic model that clearly showed changes to their original synthesis plan (Hollands et al 2015).

It is preferable to seek out existing logic models for the intervention and revise or adapt these models in line with the review focus, although this may not always be possible. More commonly, new models are developed starting with the identification of outcomes and theorizing the necessary pre-conditions to reach those outcomes. This process of theorizing and identifying the steps and necessary pre-conditions continues, working backwards from the intended outcomes, until the intervention itself is represented. As many mechanisms of action are invisible and can only be ‘known’ through theory, this process is invaluable in exposing assumptions as to how interventions are thought to work; assumptions that might then be tested in the review. Logic models can be developed with stakeholders (see Section 2.5.2 ) and it is considered good practice to obtain stakeholder input in their development.

Logic models are representations of how interventions are intended to ‘work’, but they can also provide a useful basis for thinking through the unintended consequences of interventions and identifying potential adverse effects that may need to be captured in the review (Bonell et al 2015). While logic models provide a guiding theory of how interventions are intended to work, critiques exist around their use, including their potential to oversimplify complex intervention processes (Rohwer et al 2017). Here, contributions from different stakeholders to the development of a logic model may help to articulate where complex processes occur, to theorize unintended intervention impacts, and to represent explicitly any ambiguity in parts of the causal chain where new theory or explanation is most valuable.

2.5.2 Changing review questions

While questions should be posed in the protocol before initiating the full review, these questions should not prevent exploration of unexpected issues. Reviews are analyses of existing data that are constrained by previously chosen study populations, settings, intervention formulations, outcome measures and study designs. It is generally not possible to formulate an answerable question for a review without knowing some of the studies relevant to the question, and it may become clear that the questions a review addresses need to be modified in light of evidence accumulated in the process of conducting the review.

Although a certain fluidity and refinement of questions is to be expected in reviews as a fuller understanding of the evidence is gained, it is important to guard against bias in modifying questions. Data-driven questions can generate false conclusions based on spurious results. Any changes to the protocol that result from revising the question for the review should be documented at the beginning of the Methods section. Sensitivity analyses may be used to assess the impact of changes on the review findings (see Chapter 10, Section 10.14 ). When refining questions it is useful to ask the following questions.

  • What is the motivation for the refinement?
  • Could the refinement have been influenced by results from any of the included studies?
  • Does the refined question require a modification to the search strategy and/or reassessment of any decisions regarding study eligibility?
  • Are data collection methods appropriate to the refined question?
  • Does the refined question still meet the FINER criteria discussed in Section 2.1 ?

2.5.3 Building in contingencies to deal with sparse data

The ability to address the review questions will depend on the maturity and validity of the evidence base. When few studies are identified, there will be limited opportunity to address the question through an informative synthesis. In anticipation of this scenario, review authors may build contingencies into their protocol analysis plan that specify the grouping of any (or multiple) PICO elements at a broader level, thus potentially enabling synthesis of a larger number of studies. Broader groupings will generally address a less specific question, for example (a minimal illustrative sketch of such a contingency follows this list):

  • ‘the effect of any antioxidant supplement on …’ instead of ‘the effect of vitamin C on …’;
  • ‘the effect of sexual health promotion on biological outcomes ’ instead of ‘the effect of sexual health promotion on sexually transmitted infections ’; or
  • ‘the effect of cognitive behavioural therapy in children and adolescents on …’ instead of ‘the effect of cognitive behavioural therapy in children on …’.
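A protocol contingency of this kind can be thought of as a pre-specified mapping from specific PICO elements to their broader groups. The following Python sketch is purely illustrative: the mapping, the study counts and the two-study threshold are invented for this example, not Handbook guidance.

```python
# Hypothetical mapping from specific interventions (as extracted from
# studies) to broader groups pre-specified as a contingency in the protocol.
BROADER_GROUP = {
    "vitamin C": "any antioxidant supplement",
    "vitamin E": "any antioxidant supplement",
    "selenium":  "any antioxidant supplement",
}

def comparison_group(intervention, counts, min_studies=2):
    """Fall back to the broader pre-specified group when too few studies
    address the specific intervention (the threshold is illustrative)."""
    if counts.get(intervention, 0) >= min_studies:
        return intervention
    return BROADER_GROUP.get(intervention, intervention)

# e.g. only one vitamin C trial is found, so it is analysed at the broader
# level, while vitamin E has enough studies for its own comparison:
study_counts = {"vitamin C": 1, "vitamin E": 3}
print(comparison_group("vitamin C", study_counts))  # any antioxidant supplement
print(comparison_group("vitamin E", study_counts))  # vitamin E
```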

Although they address less specific questions, such broader groupings may be useful for identifying important leads in areas that lack effective interventions and for guiding future research. Changes in the grouping may affect the assessment of the certainty of the evidence (see Chapter 14 ).

2.5.4 Economic data

Decision makers need to consider the economic aspects of an intervention, such as whether its adoption will lead to a more efficient use of resources. Economic data such as resource use, costs or cost-effectiveness (or a combination of these) may therefore be included as outcomes in a review. It is useful to break down measures of resource use and costs to the level of specific items or categories. It is helpful to consider an international perspective in the discussion of costs. Economics issues are discussed in detail in Chapter 20 .

2.6 Chapter information

Authors: James Thomas, Dylan Kneale, Joanne E McKenzie, Sue E Brennan, Soumyadeep Bhaumik

Acknowledgements: This chapter builds on earlier versions of the Handbook . Mona Nasser, Dan Fox and Sally Crowe contributed to Section 2.4 ; Hilary J Thomson contributed to Section 2.5.1 .

Funding: JT and DK are supported by the National Institute for Health Research (NIHR) Collaboration for Leadership in Applied Health Research and Care North Thames at Barts Health NHS Trust. JEM is supported by an Australian National Health and Medical Research Council (NHMRC) Career Development Fellowship (1143429). SEB’s position is supported by the NHMRC Cochrane Collaboration Funding Program. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR, the Department of Health or the NHMRC.

2.7 References

Anderson L, Petticrew M, Rehfuess E, Armstrong R, Ueffing E, Baker P, Francis D, Tugwell P. Using logic models to capture complexity in systematic reviews. Research Synthesis Methods 2011; 2 : 33–42.

Bailey JV, Murray E, Rait G, Mercer CH, Morris RW, Peacock R, Cassell J, Nazareth I. Interactive computer-based interventions for sexual health promotion. Cochrane Database of Systematic Reviews 2010; 9 : CD006483.

Baxter SK, Blank L, Woods HB, Payne N, Rimmer M, Goyder E. Using logic model methods in systematic review synthesis: describing complex pathways in referral management interventions. BMC Medical Research Methodology 2014; 14 : 62.

Bonell C, Jamal F, Melendez-Torres GJ, Cummins S. ‘Dark logic’: theorising the harmful consequences of public health interventions. Journal of Epidemiology and Community Health 2015; 69 : 95–98.

Bryant J, Sanson-Fisher R, Walsh J, Stewart J. Health research priority setting in selected high income countries: a narrative review of methods used and recommendations for future practice. Cost Effectiveness and Resource Allocation 2014; 12 : 23.

Caldwell DM, Welton NJ. Approaches for synthesising complex mental health interventions in meta-analysis. Evidence-Based Mental Health 2016; 19 : 16–21.

Cargo M, Harris J, Pantoja T, Booth A, Harden A, Hannes K, Thomas J, Flemming K, Garside R, Noyes J. Cochrane Qualitative and Implementation Methods Group guidance series-paper 4: methods for assessing evidence on intervention implementation. Journal of Clinical Epidemiology 2018; 97 : 59–69.

Chalmers I, Bracken MB, Djulbegovic B, Garattini S, Grant J, Gülmezoglu AM, Howells DW, Ioannidis JPA, Oliver S. How to increase value and reduce waste when research priorities are set. Lancet 2014; 383 : 156–165.

Chamberlain C, O’Mara-Eves A, Porter J, Coleman T, Perlen S, Thomas J, McKenzie J. Psychosocial interventions for supporting women to stop smoking in pregnancy. Cochrane Database of Systematic Reviews 2017; 2 : CD001055.

Cooper H. The problem formulation stage. In: Cooper H, editor. Integrating Research: A Guide for Literature Reviews . Newbury Park (CA) USA: Sage Publications; 1984.

Counsell C. Formulating questions and locating primary studies for inclusion in systematic reviews. Annals of Internal Medicine 1997; 127 : 380–387.

Cummings SR, Browner WS, Hulley SB. Conceiving the research question and developing the study plan. In: Hulley SB, Cummings SR, Browner WS, editors. Designing Clinical Research: An Epidemiological Approach . 4th ed. Philadelphia (PA): Lippincott Williams & Wilkins; 2007. p. 14–22.

Glass GV. Meta-analysis at middle age: a personal history. Research Synthesis Methods 2015; 6 : 221–231.

Hedges LV. Statistical considerations. In: Cooper H, Hedges LV, editors. The Handbook of Research Synthesis . New York (NY): USA: Russell Sage Foundation; 1994.

Hetrick SE, McKenzie JE, Cox GR, Simmons MB, Merry SN. Newer generation antidepressants for depressive disorders in children and adolescents. Cochrane Database of Systematic Reviews 2012; 11 : CD004851.

Hoffmann T, Glasziou P, Boutron I. Better reporting of interventions: template for intervention description and replication (TIDieR) checklist and guide. BMJ 2014; 348: g1687.

Hollands GJ, Shemilt I, Marteau TM, Jebb SA, Lewis HB, Wei Y, Higgins JPT, Ogilvie D. Portion, package or tableware size for changing selection and consumption of food, alcohol and tobacco. Cochrane Database of Systematic Reviews 2015; 9 : CD011045.

Keown K, Van Eerd D, Irvin E. Stakeholder engagement opportunities in systematic reviews: Knowledge transfer for policy and practice. Journal of Continuing Education in the Health Professions 2008; 28 : 67–72.

Kneale D, Thomas J, Harris K. Developing and optimising the use of logic models in systematic reviews: exploring practice and good practice in the use of programme theory in reviews. PloS One 2015; 10 : e0142187.

Lewin S, Hendry M, Chandler J, Oxman AD, Michie S, Shepperd S, Reeves BC, Tugwell P, Hannes K, Rehfuess EA, Welch V, McKenzie JE, Burford B, Petkovic J, Anderson LM, Harris J, Noyes J. Assessing the complexity of interventions within systematic reviews: development, content and use of a new tool (iCAT_SR). BMC Medical Research Methodology 2017; 17 : 76.

Lorenc T, Petticrew M, Welch V, Tugwell P. What types of interventions generate inequalities? Evidence from systematic reviews. Journal of Epidemiology and Community Health 2013; 67 : 190–193.

Nasser M, Ueffing E, Welch V, Tugwell P. An equity lens can ensure an equity-oriented approach to agenda setting and priority setting of Cochrane Reviews. Journal of Clinical Epidemiology 2013; 66 : 511–521.

Nasser M. Setting priorities for conducting and updating systematic reviews [PhD Thesis]: University of Plymouth; 2018.

O’Neill J, Tabish H, Welch V, Petticrew M, Pottie K, Clarke M, Evans T, Pardo Pardo J, Waters E, White H, Tugwell P. Applying an equity lens to interventions: using PROGRESS ensures consideration of socially stratifying factors to illuminate inequities in health. Journal of Clinical Epidemiology 2014; 67 : 56–64.

Oliver S, Dickson K, Bangpan M, Newman M. Getting started with a review. In: Gough D, Oliver S, Thomas J, editors. An Introduction to Systematic Reviews . London (UK): Sage Publications Ltd.; 2017.

Petticrew M, Roberts H. Systematic Reviews in the Social Sciences: A Practical Guide . Oxford (UK): Blackwell; 2006.

Pfadenhauer L, Gerhardus A, Mozygemba K, Lysdahl KB, Booth A, Hofmann B, Wahlster P, Polus S, Burns J, Brereton L, Rehfuess E. Making sense of complexity in context and implementation: the Context and Implementation of Complex Interventions (CICI) framework. Implementation Science 2017; 12 : 21.

Rehfuess EA, Booth A, Brereton L, Burns J, Gerhardus A, Mozygemba K, Oortwijn W, Pfadenhauer LM, Tummers M, van der Wilt GJ, Rohwer A. Towards a taxonomy of logic models in systematic reviews and health technology assessments: a priori, staged, and iterative approaches. Research Synthesis Methods 2018; 9 : 13–24.

Richardson WS, Wilson MC, Nishikawa J, Hayward RS. The well-built clinical question: a key to evidence-based decisions. ACP Journal Club 1995; 123 : A12–13.

Rohwer A, Pfadenhauer L, Burns J, Brereton L, Gerhardus A, Booth A, Oortwijn W, Rehfuess E. Series: Clinical epidemiology in South Africa. Paper 3: Logic models help make sense of complexity in systematic reviews and health technology assessments. Journal of Clinical Epidemiology 2017; 83 : 37–47.

Sharma T, Choudhury M, Rejón-Parrilla JC, Jonsson P, Garner S. Using HTA and guideline development as a tool for research priority setting the NICE way: reducing research waste by identifying the right research to fund. BMJ Open 2018; 8 : e019777.

Squires J, Valentine J, Grimshaw J. Systematic reviews of complex interventions: framing the review question. Journal of Clinical Epidemiology 2013; 66 : 1215–1222.

Tong A, Chando S, Crowe S, Manns B, Winkelmayer WC, Hemmelgarn B, Craig JC. Research priority setting in kidney disease: a systematic review. American Journal of Kidney Diseases 2015; 65 : 674–683.

Tong A, Sautenet B, Chapman JR, Harper C, MacDonald P, Shackel N, Crowe S, Hanson C, Hill S, Synnot A, Craig JC. Research priority setting in organ transplantation: a systematic review. Transplant International 2017; 30 : 327–343.

Turley R, Saith R, Bhan N, Rehfuess E, Carter B. Slum upgrading strategies involving physical environment and infrastructure interventions and their effects on health and socio-economic outcomes. Cochrane Database of Systematic Reviews 2013; 1 : CD010067.

van der Heijden I, Abrahams N, Sinclair D. Psychosocial group interventions to improve psychological well-being in adults living with HIV. Cochrane Database of Systematic Reviews 2017; 3 : CD010806.

Viergever RF. Health Research Prioritization at WHO: An Overview of Methodology and High Level Analysis of WHO Led Health Research Priority Setting Exercises . Geneva (Switzerland): World Health Organization; 2010.

Viergever RF, Olifson S, Ghaffar A, Terry RF. A checklist for health research priority setting: nine common themes of good practice. Health Research Policy and Systems 2010; 8 : 36.

Whitehead M. The concepts and principles of equity and health. International Journal of Health Services 1992; 22 : 429–445.



Systematic review

  • How we can help
  • What is a systematic review?
  • Should I conduct a systematic review or a scoping review?
  • How do I develop a research question for systematic review?
  • Checking for existing systematic reviews
  • Do I need to register a protocol?
  • What sources should I search?
  • How do I develop a search strategy?
  • How can I evaluate the quality of my search strategy?
  • How can I record my search strategy?
  • How can I manage the research?
  • How can I find the full text of journal articles?
  • Recommended reading

How do I develop a research question for systematic review?

Once you have identified a topic to investigate you need to define a clear and answerable question for the systematic review. Some initial scoping searches will be needed to check the question is:

  • Manageable - not too broad or too narrow
  • Answerable - relevant literature is available
  • Original - not already answered by a recent systematic review

You may need to explore a few different versions of the question depending on what you find during these initial searches. The question must be clearly defined and it may be useful to use a research question framework such as PICO (population, intervention, comparison, outcome) or SPICE (setting, perspective, intervention, comparison, evaluation) to help structure both the question and the search terms. For more information and examples of research question frameworks visit our guide to research question frameworks. 

  • Research question frameworks


Systematic Reviews & Meta-Analysis

Identifying Your Research Question

  • Developing Your Protocol
  • Conducting Your Search
  • Screening & Selection
  • Data Extraction & Appraisal
  • Meta-Analyses
  • Writing the Systematic Review
  • Suggested Readings

The first step in performing a systematic review is to develop your research question. The guidance provided on how to develop a research question for literature reviews still applies here. The difference with a systematic review question is that it must be clearly defined, and you should consider what problem you are trying to address by conducting the review. The most important point is that you focus the question and design it so that it is answerable by the research you will be systematically examining.

Once you have developed your research question, it should not be changed once the review process has begun, as the review protocol needs to be formed around the question. 

Literature review question: can be broad; may highlight only particular pieces of literature, or support a particular viewpoint.

Systematic review question: must be well-defined and focused so that it is possible to answer.

To help develop and focus your research question you may use one of the question frameworks below.

Methods for Refining a Research Topic

PICO questions can be useful in the health or social sciences. PICO stands for:

  • Patient, Population, or Problem : What are the characteristics of the patient(s) or population, i.e. their ages, genders, or other demographics? What is the situation, disease, etc., that you are interested in?
  • Intervention or Exposure : What do you want to do with the patient, person, or population (i.e. observe, diagnose, treat)?
  • Comparison : What is the alternative to the intervention (i.e. a different drug, a different assignment in a classroom)?
  • Outcome : What are the relevant outcomes (i.e. complications, morbidity, grades)?

Additionally, the following are variations to the PICO framework:

  • PICO(T) : The 'T' stands for Timing, where you define the duration of treatment and the follow-up schedule that matter to patients. Consider both long- and short-term outcomes.
  • PICO(S) : The 'S' stands for Study type (e.g. randomized controlled trial); sometimes 'S' is used to stand for Setting or Sample size. (A sketch showing how a PICO question can be turned into a draft search string follows below.)
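To show how a structured PICO question can feed directly into a draft search strategy, here is a minimal Python sketch. The question and all search terms below are invented for illustration; a real strategy would also need controlled vocabulary (e.g. MeSH terms) and database-specific field codes and syntax.

```python
# Hypothetical PICO concepts and synonyms for an example question:
# "In adults with migraine (P), does aspirin (I) versus placebo (C)
#  reduce headache duration (O)?"
pico = {
    "population":   ["migraine", "migraine disorders"],
    "intervention": ["aspirin", "acetylsalicylic acid"],
    "comparison":   ["placebo"],
    "outcome":      ["headache duration", "pain duration"],
}

def build_search(concepts):
    """OR together the synonyms within each concept, then AND the concept
    blocks together (generic Boolean syntax, not tied to one database)."""
    blocks = []
    for terms in concepts.values():
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        blocks.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(blocks)

print(build_search(pico))
# (migraine OR "migraine disorders") AND (aspirin OR "acetylsalicylic acid")
#   AND (placebo) AND ("headache duration" OR "pain duration")
```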

PPAARE is a useful question framework for patient care:

Problem - Description of the problem related to the disease or condition

Patient - Description of the patient related to their demographics and risk factors

Action - Description of the action related to the patient’s diagnosis, diagnostic test, etiology, prognosis, treatment or therapy, harm, prevention, or patient education

Alternative - Description of the alternative to the action, when there is one (not required)

Results - Identify the patient’s result of the action to produce, improve, or reduce the outcome for the patient

Evidence - Identify the level of evidence available after searching

SPIDER is a useful question framework for qualitative evidence synthesis:

Sample - The group of participants, population, or patients being investigated. Qualitative research is not easy to generalize, which is why 'sample' is preferred over 'patient'.

Phenomenon of Interest - The reasons for behavior and decisions, rather than an intervention.

Design - The research method and study design used, such as interview or survey.

Evaluation - The end result of the research or outcome measures.

Research type - Qualitative, quantitative, and/or mixed methods.

SPICE is a particularly useful method in the social sciences. It stands for:

  • Setting (e.g. United States)
  • Perspective (e.g. adolescents)
  • Intervention (e.g. text message reminders)
  • Comparisons (e.g. telephone message reminders)
  • Evaluation (e.g. number of homework assignments turned in after text message reminder compared to the number of assignments turned in after a telephone reminder)

CIMO is a useful method in the social sciences or in an organisational context. It stands for:

  • Context - Which individuals, relationships, institutional settings, or wider systems are being studied?
  • Intervention - The effects of what event, action, or activity are being studied?
  • Mechanism - What are the mechanisms that explain the relationship between interventions and outcomes? Under what circumstances are these mechanisms activated or not activated?
  • Outcomes - What are the effects of the intervention? How will the outcomes be measured? What are the intended and unintended effects?

Has Your Systematic Review Already Been Done?

Once you have a reasonably well defined research question, it is important to check if your question has already been asked, or if there are other systematic reviews that are similar to that which you're preparing to do.

In the context of conducting a review, even if you do find an existing review on your topic, it may be sufficiently out of date, or you may find other defensible reasons to undertake a new or updated one. In addition, locating existing systematic reviews may provide a starting point for selecting a review topic, help you refocus your question, or redirect your research toward other gaps in the literature. (A quick programmatic way to check PubMed is sketched after the resource list below.)

You may locate existing systematic reviews or protocols on the following resources:

  • Cochrane Library - a database collection containing high-quality, independent evidence, including systematic reviews and controlled trials, to inform healthcare decision-making.
  • MEDLINE (EBSCO) - produced by the U.S. National Library of Medicine, the premier database of biomedicine and the health sciences, covering the life sciences including biology, environmental science, marine biology, plant and animal science, biophysics and chemistry. Coverage: 1950-present.
  • PsycINFO - contains over 5 million citations and summaries of peer-reviewed journal articles, book chapters, and dissertations from the behavioral and social sciences, in 29 languages from 50 countries. Coverage: 1872-present.

Open access. Published: 22 July 2024

Artificial intelligence in total and unicompartmental knee arthroplasty

  • Umile Giuseppe Longo (ORCID: orcid.org/0000-0003-4063-9821) 1,2
  • Sergio De Salvatore 3,4
  • Federica Valente 2
  • Mariajose Villa Corta 2
  • Bruno Violante 5
  • Kristian Samuelsson 2

BMC Musculoskeletal Disorders, volume 25, article number 571 (2024)


Artificial intelligence (AI) and machine learning (ML) tools are emerging in total (TKA) and unicompartmental (UKA) knee arthroplasty with the potential to improve patient-centered decision-making and outcome prediction in orthopedics, as ML algorithms can generate patient-specific risk models. This review aims to evaluate the potential of AI/ML models in the prediction of TKA outcomes and the identification of populations at risk.

An extensive search of the following databases was conducted: MEDLINE, Scopus, Cinahl, Google Scholar, and EMBASE, using the PICO approach to formulate the research question. The PRISMA guideline was used for reporting the extracted evidence. A modified eight-item MINORS checklist was employed for the quality assessment. The databases were screened from inception to June 2022.

Forty-four of the 542 initially selected articles were eligible for data analysis; 5 further articles were identified and added from the PUBMED database, for a total of 49 included articles. A total of 2,595,780 patients were identified, with an overall average patient age of 70.2 ± 7.9 years. The five most common AI/ML models identified in the selected articles were: random forest (RF), in 38.77% of studies; gradient boosting machine (GBM), in 36.73%; artificial neural network (ANN), in 34.7%; logistic regression (LR), in 32.65%; and support vector machine (SVM), in 26.53%.

This systematic review evaluated the possible uses of AI/ML models in TKA, highlighting their potential to lead to more accurate predictions, less time-consuming data processing, and improved decision-making, all while minimizing user input bias to provide risk-based patient-specific care.

Introduction

Artificial intelligence (AI) and Machine learning (ML) tools in knee arthroplasty (KA) have the potential to improve patient-centered decision-making and outcome prediction in orthopedics. The application of ML in KA has been useful for predicting implant size, reconstructing data, and assisting with component positioning and alignment. ML implementation enhances surgical precision and can help predict parameters such as length of hospitalization, healthcare costs, and discharge disposition [ 1 , 2 , 3 ]. 

Additionally, more recent studies have shown ML algorithms to be useful for selecting the right drugs to treat prosthetic joint infection (PJI), enabling a more patient-specific approach to medicine; this was made possible by the development of a Random Forest (RF) model able to take account of several risk variables, such as patients' characteristics and comorbidities, and to use them for the selection [4]. In data science theory, the quantity and quality of input parameters are crucial; therefore, the previously mentioned variables, if not selected for relevance to the topic of each study, may hinder the full potential of ML algorithms for KA, however beneficial they are in theory. This is because, when analyzing all underlying relations between variables, a model with a large number of inputs may highlight irrelevant patterns, leading to a greater risk of overfitting: the algorithm performs significantly better on the training data than on newly presented data [4, 5].
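To make the overfitting gap concrete, here is a minimal sketch in Python (synthetic data, not the reviewed cohorts): with many noisy inputs and few samples, a random forest scores near-perfectly on its training data while doing far worse on held-out data.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 200))                        # many inputs, few samples
y = (X[:, 0] + rng.normal(size=300) > 0).astype(int)   # only one informative feature

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

auc_train = roc_auc_score(y_tr, model.predict_proba(X_tr)[:, 1])
auc_test = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"training AUC {auc_train:.2f} vs held-out AUC {auc_test:.2f}")  # a large gap signals overfitting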

Moreover, patient satisfaction following primary KA is one of many outcome measures currently used to assess the efficacy of this procedure. Patients’ satisfaction is dependent on many factors such as age, gender, and the presence of comorbidities. Therefore, it is essential to understand the relationship between the variables underlying satisfaction to provide the best care and optimized postoperative care for KA patients. ML algorithms, capable of generating patient-specific risk models, appear to be very effective means to achieve this goal [ 6 ].

Overall, the application of ML and AI in orthopaedics is beneficial not only in the previously mentioned situations, but also for identifying patients at high risk of severe walking limitations after total knee arthroplasty [7] and for selecting high-risk patients who will require a blood transfusion after KA [8].

This review focuses on which predictions are achievable using AI and ML models in knee arthroplasty, identifying prerequisites for the effective use of this new approach. Its second aim is to highlight the latest findings of these technologies in predicting outcomes after KA.

Materials and methods

Study selection

The research question was defined using a PICO approach: Population (P); Intervention (I); Comparison (C); Outcome (O). The objective of this systematic review was to investigate which outcomes can be assessed using AI or ML models (I) in patients with knee osteoarthritis who underwent total (TKA) or unicompartmental (UKA) knee replacement (P). The following outcomes were considered: complications, costs, functional outcomes, revision rate, and postoperative satisfaction (O).

Inclusion criteria

Only articles that evaluated AI/ML-based applications in clinical decision-making in knee arthroplasty were considered. Only original clinical studies written in English, Spanish, or Italian were screened.

Exclusion criteria

The following were excluded: studies that did not evaluate AI/ML applications in KA; studies with nonhuman subjects; medical imaging analysis studies without explicit reference or application to KA; and inaccessible articles, conference abstracts, reviews, and editorials. No limits were placed on the level of evidence or publication date of the studies.

Following the Preferred Reporting Items for Systematic Reviews and Meta-analysis (PRISMA) guidelines, a thorough literature search was conducted using the following string: ((((total) OR (unicompartmental or unicondylar)) AND (knee replacement)) AND (((artificial intelligence) OR (machine learning)) OR (algorithm))) AND ((((((((complications) OR ((blood) AND ((transfusion) OR (loss)))) OR (functional outcomes)) OR (revision)) OR (satisfaction)) OR (surgical technique)) OR ((length of stay) OR (hospitalization))) OR ((costs) OR (economic analysis))). Keywords were used both in combination and in isolation. The following databases were used: MEDLINE (Medical Literature Analysis and Retrieval System Online), Scopus, Cinahl, Google Scholar, PUBMED, and EMBASE (Excerpta Medica Database). The reference lists of selected systematic reviews [2, 5] were searched for further studies. The authors (F.V. and M.V.C.) searched from June 2022 to January 2024. The databases were screened from inception to January 2024.
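The structure of such a string is easier to audit when it is assembled from concept groups, OR-ing synonyms within a concept and AND-ing the concepts together. A simplified Python sketch (the grouping below is an illustration of the string above, and the output must still be adapted to each database's syntax):

def any_of(terms):
    # OR together the alternatives for one concept
    return "(" + " OR ".join(f"({t})" for t in terms) + ")"

def all_of(groups):
    # AND together the concept groups
    return "(" + " AND ".join(groups) + ")"

procedure = all_of([any_of(["total", "unicompartmental or unicondylar"]), "(knee replacement)"])
method = any_of(["artificial intelligence", "machine learning", "algorithm"])
outcome = any_of(["complications", "functional outcomes", "revision", "satisfaction",
                  "surgical technique", "length of stay", "hospitalization",
                  "costs", "economic analysis"])

print(all_of([procedure, method, outcome]))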

Data collection process

Two independent reviewers (F.V. and M.V.C.) collected the data, and differences were resolved by mutual approval. A third reviewer (S.D.S.) was consulted in case of any disagreement. Title and abstract screening was the first step, followed by full-text evaluation of the selected articles. The inclusion and exclusion of the reviewed studies are displayed in the PRISMA flowchart (Fig. 1).
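As a toy illustration of this workflow (the record IDs and decisions below are hypothetical), disagreements between the two reviewers can be flagged automatically for the third reviewer:

decisions = {
    "rec-001": ("include", "include"),
    "rec-002": ("exclude", "include"),
    "rec-003": ("exclude", "exclude"),
}

for record, (reviewer_1, reviewer_2) in decisions.items():
    if reviewer_1 == reviewer_2:
        print(f"{record}: {reviewer_1} (agreed)")
    else:
        print(f"{record}: disagreement -> refer to third reviewer")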

Fig. 1 PRISMA flowchart

A database was developed by collecting and categorizing the general study characteristics from the selected articles, which comprised: primary author, year of publication, study design, level of evidence, study duration, AI/ML methods, data source, input variables, output variables, sample size, average patient age, percentage of female patients, Area Under the Receiving Operating Characteristic Curve (AUC-ROC), accuracy, sensitivity, specificity.

Risk of bias assessment

For the quality assessment, a modified eight-item Methodological Index for Non-Randomized Studies (MINORS) checklist was employed to evaluate the selected articles. The eight-item checklist included: disclosure, study aim, input feature, output feature, validation method, dataset distribution, performance metric, and AI model. Each item was scored using the following binary scale: 0 (not reported or unclear) and 1 (reported and adequate). The following criteria were used as a guide when assessing the quality of each publication: 

  • Disclosure: scored 1 if possible conflicts of interest, funding, or ethical considerations were clearly reported; scored 0 if not reported or unclear.
  • Study aim: scored 1 if the research question and/or objective was clearly reported; scored 0 if unclear or not reported.
  • Input feature: scored 1 if the input variables were clearly reported; scored 0 if unclear or not reported.
  • Output feature: scored 1 if clearly reported; scored 0 if unclear or not reported.
  • Validation method (evaluation of the AI/ML model's performance by specific methods): scored 1 if external validation, cross-validation, and/or bootstrapping was used and clearly reported; scored 0 if not used or not reported.
  • Dataset distribution: scored 1 if the training, testing, and validation phases of the AI/ML methods were clearly reported; scored 0 if unclear or not reported.
  • Performance metric: scored 1 if the study clearly reported accuracy, sensitivity, specificity, and/or AUC-ROC for assessing AI/ML model performance; scored 0 if unclear or not reported.
  • AI model: scored 1 if the specific AI/ML algorithm used was clearly stated; scored 0 if not.
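Because every item is binary, the checklist reduces to a sum of eight indicator values. A small sketch (the study record below is hypothetical):

MINORS_ITEMS = ("disclosure", "study_aim", "input_feature", "output_feature",
                "validation_method", "dataset_distribution", "performance_metric",
                "ai_model")

def minors_score(study):
    # 1 per item that is reported and adequate, 0 otherwise
    return sum(1 for item in MINORS_ITEMS if study.get(item))

example = {"disclosure": True, "study_aim": True, "input_feature": True,
           "output_feature": True, "validation_method": False,
           "dataset_distribution": True, "performance_metric": True, "ai_model": True}
print(f"modified MINORS score: {minors_score(example)}/8")  # -> 7/8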

Compared to the original MINORS checklist, this modified version, proposed by [ 9 ], provides a more accurate grading tool for studies focused on applying AI/ML methods in medical research and diagnostic studies within the medical field. Two independent reviewers (F.V. and M.V.C.) evaluated each publication individually.

Results

The initial search identified 654 studies. After duplicate removal, 479 studies were screened, from which 402 articles were excluded after title/abstract examination, leaving 77 records for full-text evaluation. After the full-text assessment, 49 studies were included in the data analysis (Fig. 1). Of the 28 articles excluded at full text, 9 did not evaluate AI/ML application in knee arthroplasty, 8 were medical imaging analysis studies without explicit reference or application to knee arthroplasty, 2 used non-human subjects, and 9 were inaccessible articles or systematic reviews.
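The flow counts above can be re-derived with a few lines of arithmetic, a useful sanity check when filling in a PRISMA diagram:

identified = 654
screened = 479
duplicates_removed = identified - screened        # 175
full_text = screened - 402                        # 77 records assessed in full text
full_text_excluded = 9 + 8 + 2 + 9                # exclusion reasons listed above
included = full_text - full_text_excluded
print(duplicates_removed, full_text, included)    # -> 175 77 49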

Study characteristics

A total of 2,595,780 patients were identified from 48 of the 49 included studies, with one study [10] not providing the sample size. Thirty-seven of the 49 studies stated the percentage of female patients, adding up to 1,435,218 female patients, who account for 55.29% of the total. The overall average patient age was 70.2 ± 7.9 years, with 33 of the 49 articles providing an average age of the study population. The study with the highest number of patients was Hyer et al., 2020 [11], with 1,049,160 patients (40.41% of all patients included in the studies). All the study characteristics are reported in Table 1.

The five most common AI/ML models used were: RF, used in 19 articles; Gradient Boosting Machine (GBM), used in 18 articles (including less generalized versions such as Extreme Gradient Boosting (XGBoost) and Stochastic Gradient Boosting (SGB)); Artificial Neural Network (ANN) used in 17 articles; Logistic regression (LR), used in 16 articles (together with less generalized versions such as Elastic-net penalized logistic regression (EPLR)); and Support Vector Machine (SVM) used in 13 articles.
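These shares follow directly from the counts over the 49 included articles (any small differences from the abstract's figures are rounding):

counts = {"RF": 19, "GBM": 18, "ANN": 17, "LR": 16, "SVM": 13}
for model, n in counts.items():
    print(f"{model}: {n}/49 = {n / 49:.2%}")
# RF 38.78%, GBM 36.73%, ANN 34.69%, LR 32.65%, SVM 26.53%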

Regarding the variables reported, the most common input variables were: age [38, 41, 45, 47, 49, 50, 52] (44 articles), sex (33 articles), comorbidities (29 articles), BMI (27 articles), race/ethnicity (26 articles), and ASA classification (10 articles). The most common output variables provided by the studies were: post-surgical complications (11 articles), probability of TKA (7 articles), and length of stay (4 articles).

This review included studies with levels of evidence II-IV. Level II studies consist of randomized controlled trials (RCTs) and are considered one of the strongest study designs, second only to reviews and meta-analyses, which are considered level I; level III studies are non-randomized controlled trials; the last category included in the review is level IV: case-control studies assessing associations between exposure and outcome.

The selected articles included the following levels of evidence: 37 level III retrospective cohort studies [6, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 40, 48, 51]; three level III diagnostic studies [20, 21, 22, 54]; three level II prospective cohort studies [4, 39, 53]; one level II comparative study [46]; three level IV cohort pilot studies [42, 44, 55]; and one level III multi-center retrospective study [47]. One study [43] did not present the level of evidence. All the characteristics are reported in Tables 1, 2 and 3.

AI and ML methods

The following section reports the AI and ML methods identified in the reviewed articles. Each subsection includes the number of articles that used each AI or ML method, the corresponding AUC value, and the evaluated output variable. Table 4 classifies each article by the output variable studied and presents the highest AUC score for the respective article.

Random forest

RF is a decision-tree-based algorithm introduced in the 2000s and capable of handling a variety of data types; its adoption in many medical fields is sustained by its high performance with large datasets and its ability to integrate both clinical and imaging data, achieving more accurate predictions than older models such as LR. The method operates by constructing and averaging a multitude of decision trees (a simpler ML method), each tree analyzing a randomly selected subset of the original data; the model can thus analyze large and complex datasets, is more resistant to overfitting, and adds diversity to the analysis. It was the most common AI method, applied in 38.77% of the reviewed articles. It was mainly used to evaluate outcomes, one of them being a technical outcome: TKA component size prediction (femoral and tibial) [35]. Eight publications implemented RF for the evaluation of clinical outcomes, including: achievement of Minimal Clinically Important Differences (MCIDs), prediction of Patient Reported Outcomes (PROs), prolonged postoperative opioid prescription, improvement of the Knee injury and Osteoarthritis Outcome Score (KOOS) at one year, dissatisfaction, and assessment of sensitization in patients with chronic pain after TKA [4, 6, 20, 26, 32, 33, 45, 53]. Only one article evaluated post-TKA walking limitation with RF, under the functional outcome category [7, 56].

RF was also utilized to analyze the surgical technique in two articles [15, 49], which considered the following outputs respectively: characterization of anatomical tissues, and surgical corrections, the latter presenting the highest AUC (0.89) for this ML method. Postoperative length of stay (LOS) was predicted using RF by only one article [57], which presented an AUC of 0.71.

Another application of RF concerned possible complications such as major complications after primary TKA, blood transfusion, surgical site infection, and disposition of patients at discharge [15, 25, 38]. Lastly, two reviewed articles implemented RF for predicting TKA risk in knee OA, evaluating both risk and time [16, 27].
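The averaging described at the start of this section is literal: in a common implementation such as scikit-learn, the forest's predicted probability is the mean of its trees' predicted probabilities. A brief sketch on illustrative data (not the reviewed studies'):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=400, n_features=15, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Average the individual trees' probabilities for the first sample...
mean_of_trees = np.mean([tree.predict_proba(X[:1])[0, 1] for tree in forest.estimators_])
# ...and compare with the forest's own estimate: the two values match.
print(mean_of_trees, forest.predict_proba(X[:1])[0, 1])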

Gradient boosting machine

The GBM model gained popularity in the 2000s due to its high predictive accuracy even in settings with mixed data types and missing values. GBM works by building decision trees sequentially, rather than in parallel as RF does, with each tree correcting the prediction errors made by the previous ones. As a result, the model can analyse complex relationships in data and produce accurate predictions, even though it lacks the randomized selection and diversity of the RF model. It can be used for both classification and regression thanks to its ability to produce new decision trees that correct the errors of previous predictions, gaining more accuracy than popular models such as SVM.
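This sequential error-correction can be written out in a few lines. A minimal sketch of squared-error gradient boosting on synthetic data (not the study's models): each shallow tree is fit to the residuals of the ensemble built so far.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=400)

prediction = np.zeros_like(y)
learning_rate = 0.1
for _ in range(100):
    residual = y - prediction                        # errors of the current ensemble
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    prediction += learning_rate * tree.predict(X)    # correct them a little

print(f"final training MSE: {np.mean((y - prediction) ** 2):.3f}")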

It was used by 18 studies, one employing it to predict TKA component size [35]. The highest AUC value (0.89) was reported by an article that evaluated the development of acute kidney injury (AKI) after TKA [34]. Other studies that evaluated complications with GBM comprised the following outputs: major complications after primary TKA, blood transfusion after TKA, surgical site infection, and disposition of patients at discharge [8, 17, 38]. One study used GBM for the prediction of LOS after TKA [37], and a different study employed this method to evaluate a functional outcome: post-TKA walking limitations [7].

In addition, GBM was used by 7 articles to evaluate clinical outcomes: prediction of patient satisfaction, achievement of MCIDs in KOOS 1 year after TKA, prediction of PROs, extended prescription of postoperative opioids, and MCID attainment 2 years after TKA [6, 18, 19, 22, 26, 32, 33, 53]. Only one study evaluated the use of SGB to predict the risk of TKA in comparison with other ML models, achieving, together with RF, the highest performance among the algorithms observed (AUC 0.83) [16].

Artificial Neural Network (ANN) / Multilayer Perceptron

Although it originated in the 1940s, the ANN model gained prominence in the 2010s with the application of deep learning to modeling complex relationships, making it suitable for a wide range of applications. ANN is a computational algorithm consisting of interconnected nodes organized in sequential layers, each analyzing the data and passing its prediction to the following layer, mimicking the functioning of the human neural network. This model was applied in 17 studies, one of them for the prediction of LOS, inpatient charges, and discharge disposition before primary TKA [43]. Five articles analyzed clinical outcomes; the highest AUC for this method (0.86) was reported for the prediction of PROs [26]; other outputs under this category were: prolonged postoperative opioid prescription, dissatisfaction after TKA, and prediction of same-day discharge in patients undergoing TKA [6, 32, 33, 50]. One article applied ANN for TKA component size prediction (femoral and tibial) [44], and another applied it for procedural cost prediction for TKA [31, 58].
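A sketch of the layered structure just described (the architecture and data are illustrative assumptions, not taken from the reviewed studies): two hidden layers of interconnected nodes transform the inputs before a binary prediction.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

ann = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)
ann.fit(X_tr, y_tr)                                  # data flows layer to layer
print(f"held-out accuracy: {ann.score(X_te, y_te):.2f}")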

Regarding complications, ANN was applied to evaluate the disposition of patients at discharge, post-surgical complications such as surgical site infection, and blood transfusion [38]. Additionally, two articles used this ML method to characterize tissues and surgical corrections based on patient-specific intra-operative assessment [15, 49]. Another application of ANN, in four other articles, related to future clinical intervention outputs: the effect of opioid use on the risk of knee revision and manipulation in the first year after primary TKA [59], identification of influential factors before surgery, and prediction of the risk of TKA surgery [23, 60].

Logistic regression

LR is a simple, interpretable model for binary classification developed in the early twentieth century; being one of the oldest predictive models, its role is well established in the medical setting for estimating the probability of different events. However, with the advent of newer algorithms able to form wider and more complex associations between inputs and outputs, this model is increasingly relegated to a comparator role. The algorithm was used by 16 of the 49 articles. Four articles evaluated complications, comprising the following outputs: disposition of patients at discharge, predictors of Allogenic Blood Transfusion (ALBT), and post-surgical complications [17, 25, 38]. Future clinical intervention was studied by three articles, specifically regarding the risk of, and time to, TKA in a patient presenting knee OA [27]. One article used this machine learning method for TKA component size prediction [35], and a different publication used it to evaluate post-TKA walking limitations, a type of functional outcome [7].
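Part of LR's appeal in medicine is interpretability: each fitted coefficient exponentiates to an odds ratio. A minimal sketch (synthetic data; the predictor names are hypothetical labels):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)
lr = LogisticRegression().fit(X, y)

for name, coef in zip(["age", "bmi", "comorbidity", "asa"], lr.coef_[0]):
    print(f"{name}: odds ratio per unit increase = {np.exp(coef):.2f}")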

Regarding clinical outcome, LR was applied by 7 articles to study: achievement of MCIDs in KOOS 1 year after TKA, extended opioid prescription post-surgery, dissatisfaction after TKA, assessment of sensitization in patients with chronic pain after TKA, prediction of same-day discharge in patients undergoing TKA, and prediction of PROs [ 6 , 22 , 26 , 32 , 33 , 45 , 50 ]. The article that presented the highest AUC (0.88) evaluated the probability of TKA within 5 years [ 47 ].

Support vector machine

SVM is an effective model that can be used for both classification and regression; developed in the 1960s, it remains one of the most popular algorithms for classifying disease progression based on imaging data. However, due to its lower accuracy on noisy datasets, newly developed algorithms such as K-Nearest Neighbors (kNN) are gaining prominence in this role. SVM is particularly effective when the number of features exceeds the number of samples, being able to handle both linear and non-linear relationships in the data. It was used by 13 articles, one of them evaluating the prediction of LOS and complications after TKA [29, 51]. It was mainly used to assess clinical outcomes such as: prolonged postoperative opioid prescription [32]; improvement of KOOS one year after TKA [33]; dissatisfaction after TKA [6]; and attainment of MCIDs 2 years after TKA [20, 53]. SVM was also employed to analyze subtask segmentation of the TUG test for perioperative TKA [24]; risk of and time to TKA in patients with knee OA [16, 27]; and surgical corrections based on patient-specific intra-operative evaluation [49]. Additionally, one article used the algorithm for the characterization of tissues [15, 60], while another applied SVM to component sizing for TKA [35].
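The claim that SVM copes well when features outnumber samples can be checked on synthetic data; a sketch (illustrative only), where a linear SVM stays well above chance despite 500 features for 80 samples:

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=80, n_features=500, n_informative=10,
                           random_state=0)           # far more features than samples
svm = make_pipeline(StandardScaler(), SVC(kernel="linear"))
print(f"5-fold CV accuracy: {cross_val_score(svm, X, y, cv=5).mean():.2f}")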

Other AI models

Two AI models were employed to evaluate major complications after primary TKA [17]: AutoPrognosis (AP) and AdaBoost. The Decision tree method was utilized in two studies for the analysis of the following outputs: gait comparison between UKA and TKA patients [30], and subtask segmentation of the TUG test for perioperative TKA, the latter also being assessed by AdaBoost, kNN, and the Naïve Bayes Classifier (NB) [24].

Regarding the analysis of post-TKA walking limitation, the model SuperLearner was used [ 7 ]. Both the Cox-PH model and DeepSurv model were used to predict the risk and time of TKA in patients with knee osteoarthritis [ 27 ]; an Ensemble Deep Learning (DL) model based on the use of MRI and radiograph was also compared with traditional ML algorithms to predict the risk of TKA, obtaining promising results [ 40 ]. The prediction of PROs was assessed by the models: NB, kNN, and Multi-Step Adaptive Elastic-Net (MSAENET) [ 26 ].

The models Quadratic Discriminant Analysis (QDA) and LASSO regression were employed to evaluate MCID attainment after TKA over different periods. One study made the assessment 1 year after TKA [22]; two other articles made the evaluation 2 years after TKA [20, 53]. LASSO regression was also used to analyze mortality and complications after TKA, such as respiratory, cardiovascular, nervous system, and renal complications [21]. Regarding the prediction of clinical outcomes, the new Skeletal Oncology Research Group Machine Learning Algorithm (SORG-MLA) was validated for the identification of patients at risk of prolonged postoperative opioid use after TKA, obtaining an AUC of 0.75 [48].

Moreover, the models Linear Discriminant Analysis (LDA), Recursive Partitioning (RP), and NB were employed for the assessment of sensitization in patients with chronic pain after TKA [1]. For the prediction of procedural cost after TKA, DenseNet was used, presenting an AUC score of 0.813 [31].

The Natural Language Processing Method (NLPM) was utilized to assess surgical technique, using the following outputs: category of surgery, implant model, presence of patellar resurfacing, constraint type, and laterality of surgery [46]. NLPM was also used to estimate ITS data [4] and to analyze the effect that opioid use can have on the risk of knee revision and manipulation in the first year after primary TKA [12].

Lastly, the Stochastic Hill Climbing Complexity score was used for the prediction of surgical 90-day morbidity, mortality, and complications [11]. NB was employed to analyze inpatient cost and LOS after TKA [32, 45].

Quality assessment by modified MINORS

All 49 reviewed articles were evaluated with the modified MINORS checklist to assess quality and risk of bias. All 49 clearly reported the study aim; however, 11 studies failed to report the performance metric. Two publications did not report the output feature, while 46 studies clearly stated the input feature and 45 indicated disclosure. Regarding the AI model item, 45 of the reviewed articles fulfilled this criterion. These findings showed a relatively high degree of quality and a low likelihood of bias: only two of the reviewed articles received a score of 5/8, five scored 6/8, and the majority, 42 of 49 publications, scored 7/8 or higher (Table 5).

Discussion

This systematic review evaluated the possible uses of AI/ML models in TKA, highlighting their potential to improve decision-making, component sizing, inpatient cost prediction, perioperative planning, and the surgical workflow. Implementing these prediction models in TKA can ultimately lead to more accurate predictions, less time-consuming data processing, and higher precision in identifying patterns, all while minimizing user input bias to provide risk-based, patient-specific care.

A key finding was the benefit of RF in aiding surgical decision-making when applied to intraoperatively collected surface models and patient-specific intraoperative assessments. RF outperformed both ANN and SVM not only when categorizing various types of anatomical tissue [15], but also when identifying populations at risk of TKA [16] and when assessing balance and alignment during TKA surgery, guiding the surgeon toward the optimal bone recut or soft-tissue adjustment [49, 61]. This review highlights how applying RF in all the steps leading to TKA, and in perioperative and postoperative care, can lead to optimal clinical and surgical outcomes while reducing complications thanks to patient-specific planning. Moreover, by streamlining the surgical workflow and helping to select surgical corrections, this AI model can overcome the risk of data overload and the challenge of data interpretation, while being fast, cost-efficient, and accurate.

The SGB model presented promising results in the Kunze et al. (2021) study, outranking RF, SVM, and EPLR for the prediction of implant component sizing in TKA. This model demonstrated the best overall performance in minimizing prediction error and maximizing accuracy for both femoral and tibial implant component size prediction. A potential benefit is the ability to predict final prosthetic component sizes without reliance on digital or manual templating, making it faster than traditional methods. It also showed good performance across different TKA component manufacturers, streamlining component selection processes, improving inventory control, and reducing shipping costs [35, 62].

Regarding prediction models for allogenic blood transfusion, the highest AUC score was reported by the RF- and SVM-based models [25]. The ANN-based model, with an AUC only 0.038 lower, still performed significantly better than classic prediction models [38]. Overall, these results show how implementing various ML-based models can improve the prediction of peri-operative complications, ensuring that the population identified as at risk of blood transfusion receives proper care, while also optimizing the operative process and reducing the risk of prolonged LOS caused by complications such as blood transfusion during TKA.

A further finding is the already established importance of LR models in healthcare settings, which can support the development of patient-specific care and peri-operative planning. The most successful result of LR (AUC 0.88) was achieved by its implementation, together with DenseNet, in identifying a population at higher risk of TKA within 5 years, particularly at less advanced stages of OA [47]; although, in the more recent study published by Crawford et al. in 2023, EPLR performed worse than models such as SGB and RF (AUC 0.83) in identifying the population at risk of TKA [16]. Additionally, combining LR with other models, such as an ML-based remote patient monitoring system, can reduce the need for TKA revision while acquiring continuous data on mobility and rehabilitation compliance in patients undergoing TKA. This patient monitoring system proved to be reliable, low-maintenance, and well received by patients recovering from TKA [42]. Implementing LR models would bring greater objectivity, cost-effectiveness, and the ability to acquire continuous data, together with higher accuracy in identifying at-risk populations, overall increasing the success rate of TKA.

Financial aspects must be considered when proposing a treatment plan to patients, as complications can arise during surgery and recovery, drastically changing the expected cost. Although shown to be an important element in planning peri-operative care for TKA, the cost-prediction outcome was analysed in only one article. Having demonstrated high accuracy in clinical medicine, the DenseNet model [31, 63] can provide a cost-efficient organization of resources, benefiting medical staff by reducing their workload and improving resource allocation. Simultaneously, this method can identify populations at risk of complications, helping to reduce the higher post-TKA procedural costs and making it possible to implement patient-specific payment plans that benefit both patients and healthcare providers.

Reviewing the performance of the GBM model across different articles, we can observe that this algorithm is simple and efficient, and it has been validated to improve both short- and long-term prognoses of TKA patients. Ko et al. successfully used this AI model to predict the development of postoperative AKI after TKA, which can not only increase LOS but also be life-threatening [34], while TreeNet GBM proved to be the most successful method for identifying predictors of patient satisfaction [18]. Additionally, GBM showed excellent results in predicting the disposition of patients at discharge [38]; the model's implementation could therefore improve overall patient satisfaction and recovery rate post-TKA, while also ensuring that patient-specific peri-operative care is applied to prevent and manage possible complications.

Turning to more novel models, until recently less implemented in healthcare settings, the AI/ML models DL-TL-MT, SVM, DeepSurv, and Cox-PH proved to be of great use in identifying populations of patients at risk and developing patient-specific care. The DL-TL-MT model successfully predicted the risk of OA progression based on knee radiographs in patients who previously underwent TKA [36]. Presenting the same AUC (0.87), the SVM, DeepSurv, and Cox-PH methods were successfully employed to predict the risk of, and time to, TKA of an OA knee [27]. These methods prove valuable in predicting the progression of OA, even at an early stage. Such ML-based models have great potential as diagnostic tools for physicians when determining the prognosis of patients at all stages of OA, allowing early intervention through TKA where needed and thereby reducing the risk of complications and of TKA revision.

The SVM predictor model also showed very promising results when applied in different settings, especially for the segmentation of the TUG test and the extraction of information from each subtask perioperative to TKA, addressing problems of subjectivity and other biases [24, 64]. The benefits of this AI model would be more precise segmentation and therefore data extraction, which enables further understanding and classification of patient improvements, leading to patient-specific treatments and rehabilitation models.

Looking at the results of the articles in this review, the emergence of ML models in the medical setting becomes evident: most data corroborate the idea that novel AI models offer better results and predictive power than traditional models when identifying predictors of TKA and analyzing multiple outcomes simultaneously. In the prediction of complications after primary TKA, Devana et al. demonstrated the superiority of AP over traditional models in discriminative ability and in its capability to suggest nonlinear relationships between variables in TKA outcomes. Consequently, AP can be a versatile tool for identifying crucial patient characteristics when predicting outcomes across a variety of datasets, thereby improving patient outcomes [17]. Additionally, Harris et al. demonstrated that AI can produce preoperative prediction models for one-year improvement in pain and functioning after TKA, and that the GBM model, which captures important interactions well, and the QDA model, which performs better with nonlinear associations, can be applied to produce an easy-to-use predictive model achieving similar or better accuracy with far fewer inputs than traditional predictive models [22].

Lastly, the NLPM model presents great potential as a newly emerging algorithm, in particular when applied in clinical settings for the interpretation of text; it has been applied in different studies for the classification of patient satisfaction [14], knee revisions after TKA due to preoperative opioid use [12], and the processing of clinical free text from electronic health records [46]. The strength of this ML-based model lies in its ability to automate the extraction of information embedded in perioperative notes and patient-centered surveys, decreasing the need for costly manual chart review and improving data quality while being less time-consuming. The use of this model would improve the exploitation of patient feedback and perioperative notes for patient-specific, risk-based care, resulting in higher patient satisfaction and a reduction in healthcare costs from possible lawsuits [65], together with reduced costs of manual chart reviews [46].

Like both the Hinterwimmer et al. 2021 and the Lee et al. 2022 reviews, this systematic review confirms the great potential of AI/ML methods and their application in orthopedics for cost prediction, diagnostic applications, and identification of risk factors, while also addressing doubts about the inaccuracy and insufficient evaluation of these models. In comparison, this review analyzed 49 articles, including the publications already examined in previous reviews. This more extensive research concluded that not only is it possible to implement these models in the prediction of TKA perioperative care, OA disease progression, and distinct outcomes using specific data, but the prediction of more complex outcomes is now also feasible through more novel AI/ML algorithms [13, 17, 21, 22, 27, 30]. However, as mentioned in several studies, further research may enhance the reliability of AI/ML models and allow their use in patient preoperative and perioperative care [8, 11, 19, 21, 43, 50].

Limitations

The main limitation of this review derives from possible bias in the information on the performance of the different AI models: as highlighted by the MINORS table, this is the parameter most at risk, because several articles omitted either the AUC score or the accuracy score for the predictive models examined. Moreover, many of the included studies are retrospective studies that obtained patient data for testing the AI/ML prediction models from national databases and electronic health records; these sources are limited by a lack of detailed clinical information, potential misclassification of data, and, in many cases, small patient cohorts with limited characteristics from which to derive inputs and compare outputs, which may make the results not generalizable to all patient populations [11, 19, 21]. Validation of the analyzed predictive models on larger patient populations is needed. Lastly, due to the heterogeneity of the data, it was not possible to perform a meta-analysis.

Conclusions

Regarding the implementation of AI/ML models in TKA, the articles in this review mostly consider these predictive models helpful and suggest that their application in medical settings, for perioperative TKA clinical decision-making and for predicting the progression of OA toward TKA, may improve patient satisfaction, risk management, and cost efficiency. Among the most frequently reported qualities for which AI/ML models outperform traditional prediction models are higher accuracy, cost efficiency, simple application, lack of subjectivity, and an overall reduction in time consumption thanks to task automation. It is therefore possible to conclude that, although the results of the reviewed articles should be further validated on larger patient cohorts, their findings highlight the great potential of including AI/ML predictive models in a further branch of medicine.

Availability of data and materials

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Abbreviations

Adaptive Boosting

Artificial Intelligence

Acute Kidney Injury

Allogenic Blood Transfusion

Artificial Neural Network

Artificial Neural Network – Transfer Learning – multitask

AutoPrognosis

All patient refined

American society of anesthesiologists’ physical status

American Society of Anesthesiologists

Area Under the Receiver Operating Characteristic Curve

Body mass index

Blood pressure

Blood transfusion

Cartilage and bone classification

Charlson comorbidity score

Centers for Medicare & Medicaids Services

Cox proportional hazards

Cohort pilot study

Comparative study

Deep Convolutional Neural Network

Discharge disposition

Densely Connected Convolutional Network

Deep Learning

Disposition of patient

Decision tree

Elastic-net Penalized Logistic Regression

European Society of Sports Traumatology, Knee Surgery and Arthroscopy

Gradient Boosting Machine

Hemoglobin

Home exercise program

Hip-knee-angle

Inpatient charges/costs

Knee Arthroplasty

K-Nearest Neighbors

Knee Injury and Osteoarthritis Outcome Score

Knee injury and Osteoarthritis Outcome Score for Joint Replacement

Knee society score function

Knee society score

Least Absolute Shrinkage and Selection Operator

Linear Discriminant Analysis

Length of stay

Logistic Regression

Muscle, cartilage, bone classification

Minimally clinically important differences

Multi center retrospective study

Mechanical circulatory support

Methodological Index for Non-Randomized Studies

Machine Learning

Multilayer perceptron

Multi-Step Adaptive Elastic NETwork

Muscle, tendon, bone, cartilage classification

Multi-Task Logistic Regression

Naïve Bayes

National Health Service

National (Nationwide) Inpatient Sample database

Natural Language Processing Method

Neural Network

Non-Linear-Group Method of Data Handling

Non-steroidal anti-inflammatory drugs

National Surgical Quality Improvement Program

Osteoarthritis

Osteoarthritis initiative

Orthopedic Minimal Data Set database

Office of Statewide Health Planning and Development

Physical activity scale for the Elderly

Posterior cruciate ligament

Physical component summary

Preoperative patient-reported health state

Population Intervention Comparison Outcome

Prosthetic Joint Infection

Preferred Reporting Items for Systematic reviews and Meta-Analysis

Patient reported outcome measures

Prediction of patient-reported outcomes

Pilot study

Quadratic Discriminant Analysis

Quality of Life

Retrospective cohort study

Randomized control trial

Random Forest

Recursive Partitioning

Remote Patient Monitoring

Standard analytical files

Short form – physical component summary

Stochastic Gradient Boosting

Stochastic Hill Climbing

Skeletal Oncology Research Group Machine Learning Algorithm

State-wide Planning and Research Cooperative System

Support Vector Machines

Total Knee Arthroplasty

Timed Up-and-Go

University of California Los Angeles

Unicompartmental Knee Arthroplasty

Veteran’s affairs

Visual analog scale

World Health Organization

Western Ontario and McMaster Universities Osteoarthritis Index

EXtreme Gradient Boosting

Cabitza F, Locoro A, Banfi G. Machine learning in orthopedics: a literature review. Front Bioeng Biotechnol. 2018;6:75.

Lee LS, Chan PK, Wen C, Fung WC, Cheung A, Chan VWK, Cheung MH, Fu H, Yan CH, Chiu KY. Artificial intelligence in diagnosis of knee osteoarthritis and prediction of arthroplasty outcomes: a review. Arthroplasty. 2022;4(1):16.

Martín Noguerol T, Paulano-Godino F, Martín-Valdivia MT, Menias CO, Luna A. Strengths, weaknesses, opportunities, and threats analysis of artificial intelligence and machine learning applications in radiology. J Am Coll Radiol. 2019;16(9 Pt B):1239–47.

Shohat N, Goswami K, Tan TL, Yayac M, Soriano A, Sousa R, Wouthuyzen-Bakker M, Parvizi J. (NINJA) ESGoIAIEatNINoJA: 2020 Frank Stinchfield award: identifying who will fail following irrigation and debridement for prosthetic joint infection. Bone Joint J. 2020;102-B(7_Supple_B):11–9.

Hinterwimmer F, Lazic I, Suren C, Hirschmann MT, Pohlig F, Rueckert D, Burgkart R, von Eisenhart-Rothe R. Machine learning in knee arthroplasty: specific data are key-a systematic review. Knee Surg Sports Traumatol Arthrosc. 2022;30(2):376–88.

Kunze KN, Polce EM, Sadauskas AJ, Levine BR. Development of machine learning algorithms to predict patient dissatisfaction after primary total knee arthroplasty. J Arthroplasty. 2020;35(11):3117–22.

Pua YH, Kang H, Thumboo J, Clark RA, Chew ES, Poon CL, Chong HC, Yeo SJ. Machine learning methods are comparable to logistic regression techniques in predicting severe walking limitation following total knee arthroplasty. Knee Surg Sports Traumatol Arthrosc. 2020;28(10):3207–16.

Jo C, Ko S, Shin WC, Han HS, Lee MC, Ko T, Ro DH. Transfusion after total knee arthroplasty can be predicted using the machine learning algorithm. Knee Surg Sports Traumatol Arthrosc. 2020;28(6):1757–64.

Ogink PT, Groot OQ, Karhade AV, Bongers MER, Oner FC, Verlaan JJ, Schwab JH. Wide range of applications for machine-learning prediction models in orthopedic surgical outcome: a systematic review. Acta Orthop. 2021;92(5):526–31.

Bonakdari H, Pelletier JP, Martel-Pelletier J. A reliable time-series method for predicting arthritic disease outcomes: new step from regression toward a nonlinear artificial intelligence method. Comput Methods Programs Biomed. 2020;189:105315.

Hyer JM, White S, Cloyd J, Dillhoff M, Tsung A, Pawlik TM, Ejaz A. Can we improve prediction of adverse surgical outcomes? Development of a surgical complexity score using a novel machine learning technique. J Am Coll Surg. 2020;230(1):43–52.e41.

Ben-Ari A, Chansky H, Rozet I. Preoperative opioid use is associated with early revision after total knee arthroplasty: a study of male patients treated in the veterans affairs system. J Bone Joint Surg Am. 2017;99(1):1–9.

Bloomfield RA, Williams HA, Broberg JS, Lanting BA, McIsaac KA, Teeter MG. Machine learning groups patients by early functional improvement likelihood based on wearable sensor instrumented preoperative timed-up-and-go tests. J Arthroplasty. 2019;34(10):2267–71.

Bovonratwet P, Shen TS, Islam W, Ast MP, Haas SB, Su EP. Natural language processing of patient-experience comments after primary total knee arthroplasty. J Arthroplasty. 2021;36(3):927–34.

Chan B, Rudan JF, Mousavi P, Kunz M. Intraoperative integration of structured light scanning for automatic tissue classification: a feasibility study. Int J Comput Assist Radiol Surg. 2020;15(4):641–9.

Crawford AM, Karhade AV, Agaronnik ND, Lightsey HM, Xiong GX, Schwab JH, Schoenfeld AJ, Simpson AK. Development of a machine learning algorithm to identify surgical candidates for hip and knee arthroplasty without in-person evaluation. Arch Orthop Trauma Surg. 2023;143(9):5985–92.

Devana SK, Shah AA, Lee C, Roney AR, van der Schaar M, SooHoo NF. A novel, potentially universal machine learning algorithm to predict complications in total knee arthroplasty. Arthroplast Today. 2021;10:135–43.

Farooq H, Deckard ER, Ziemba-Davis M, Madsen A, Meneghini RM. Predictors of patient satisfaction following primary total knee arthroplasty: results from a traditional statistical model and a machine learning algorithm. J Arthroplasty. 2020;35(11):3123–30.

Farooq H, Deckard ER, Arnold NR, Meneghini RM. Machine learning algorithms identify optimal sagittal component position in total knee arthroplasty. J Arthroplasty. 2021;36(7S):S242–9.

Fontana MA, Lyman S, Sarker GK, Padgett DE, MacLean CH. Can machine learning algorithms predict which patients will achieve minimally clinically important differences from total joint arthroplasty? Clin Orthop Relat Res. 2019;477(6):1267–79.

Harris AHS, Kuo AC, Weng Y, Trickey AW, Bowe T, Giori NJ. Can machine learning methods produce accurate and easy-to-use prediction models of 30-day complications and mortality after knee or hip arthroplasty? Clin Orthop Relat Res. 2019;477(2):452–60.

Harris AHS, Kuo AC, Bowe TR, Manfredi L, Lalani NF, Giori NJ. Can machine learning methods produce accurate and easy-to-use preoperative prediction models of one-year improvements in pain and functioning after knee arthroplasty? J Arthroplasty. 2021;36(1):112–117.e116.

Heisinger S, Hitzl W, Hobusch GM, Windhager R, Cotofana S. Predicting total knee replacement from symptomology and radiographic structural change using artificial neural networks-data from the osteoarthritis initiative (OAI). J Clin Med. 2020;9(5):1298.

Hsieh CY, Huang HY, Liu KC, Chen KH, Hsu SJ, Chan CT. Subtask segmentation of timed up and go test for mobility assessment of perioperative total knee arthroplasty. Sensors (Basel). 2020;20(21):6302.

Huang Z, Huang C, Xie J, Ma J, Cao G, Huang Q, Shen B, Byers Kraus V, Pei F. Analysis of a large data set to identify predictors of blood transfusion in primary total hip and knee arthroplasty. Transfusion. 2018;58(8):1855–62.

Huber M, Kurz C, Leidl R. Predicting patient-reported outcomes following hip and knee replacement surgery using supervised machine learning. BMC Med Inform Decis Mak. 2019;19(1):3.

Jamshidi A, Pelletier JP, Labbe A, Abram F, Martel-Pelletier J, Droit A. Machine learning-based individualized survival prediction model for total knee replacement in osteoarthritis: data from the osteoarthritis initiative. Arthritis Care Res (Hoboken). 2021;73(10):1518–27.

Jayakumar P, Moore MG, Furlough KA, Uhler LM, Andrawis JP, Koenig KM, Aksan N, Rathouz PJ, Bozic KJ. Comparison of an artificial intelligence-enabled patient decision aid vs educational material on decision quality, shared decision-making, patient experience, and functional outcomes in adults with knee osteoarthritis: a randomized clinical trial. JAMA Netw Open. 2021;4(2):e2037107.

Johannesdottir KB, Kehlet H, Petersen PB, Aasvang EK, Sørensen HBD, Jørgensen CC, Group CfF-tHaKRC. Machine learning classifiers do not improve prediction of hospitalization > 2 days after fast-track hip and knee arthroplasty compared with a classical statistical risk model. Acta Orthop. 2022;93:117–23.

Jones GG, Kotti M, Wiik AV, Collins R, Brevadt MJ, Strachan RK, Cobb JP. Gait comparison of unicompartmental and total knee arthroplasties with healthy controls. Bone Joint J. 2016;98-B(10 Supple B):16–21.

Karnuta JM, Navarro SM, Haeberle HS, Helm JM, Kamath AF, Schaffer JL, Krebs VE, Ramkumar PN. Predicting inpatient payments prior to lower extremity arthroplasty using deep learning: which model architecture is best? J Arthroplasty. 2019;34(10):2235-2241.e2231.

Katakam A, Karhade AV, Schwab JH, Chen AF, Bedair HS. Development and validation of machine learning algorithms for postoperative opioid prescriptions after TKA. J Orthop. 2020;22:95–9.

Katakam A, Karhade AV, Collins A, Shin D, Bragdon C, Chen AF, Melnic CM, Schwab JH, Bedair HS. Development of machine learning algorithms to predict achievement of minimal clinically important difference for the KOOS-PS following total knee arthroplasty. J Orthop Res. 2022;40(4):808–15.

Ko S, Jo C, Chang CB, Lee YS, Moon YW, Youm JW, Han HS, Lee MC, Lee H, Ro DH. A web-based machine-learning algorithm predicting postoperative acute kidney injury after total knee arthroplasty. Knee Surg Sports Traumatol Arthrosc. 2022;30(2):545–54.

Kunze KN, Polce EM, Patel A, Courtney PM, Levine BR. Validation and performance of a machine-learning derived prediction guide for total knee arthroplasty component sizing. Arch Orthop Trauma Surg. 2021;141(12):2235–44.

Leung K, Zhang B, Tan J, Shen Y, Geras KJ, Babb JS, Cho K, Chang G, Deniz CM. Prediction of total knee replacement and diagnosis of osteoarthritis by using deep learning on knee radiographs: data from the osteoarthritis initiative. Radiology. 2020;296(3):584–93.

Li H, Jiao J, Zhang S, Tang H, Qu X, Yue B. Construction and comparison of predictive models for length of stay after total knee arthroplasty: regression model and machine learning analysis based on 1,826 cases in a single Singapore center. J Knee Surg. 2022;35(1):7–14.

Mohammed H, Huang Y, Memtsoudis S, Parks M, Ma Y. Utilization of machine learning methods for predicting surgical outcomes after total knee arthroplasty. PLoS One. 2022;17(3):e0263897.

Navarro SM, Wang EY, Haeberle HS, Mont MA, Krebs VE, Patterson BM, Ramkumar PN. Machine learning and primary total knee arthroplasty: patient forecasting for a patient-specific payment model. J Arthroplasty. 2018;33(12):3617–23.

Rajamohan HR, Wang T, Leung K, Chang G, Cho K, Kijowski R, Deniz CM. Prediction of total knee replacement using deep learning analysis of knee MRI. Sci Rep. 2023;13(1):6922.

Ramazanian T, Yan S, Rouzrokh P, Wyles CC, O Byrne TJ, Taunton MJ, Maradit Kremers H. Distribution and correlates of hip-knee-ankle angle in early osteoarthritis and preoperative total knee arthroplasty patients. J Arthroplasty. 2022;37(6S):S170–5.

Ramkumar PN, Haeberle HS, Ramanathan D, Cantrell WA, Navarro SM, Mont MA, Bloomfield M, Patterson BM. Remote patient monitoring using mobile health for total knee arthroplasty: validation of a wearable and machine learning-based surveillance platform. J Arthroplasty. 2019;34(10):2253–9.

Ramkumar PN, Karnuta JM, Navarro SM, Haeberle HS, Scuderi GR, Mont MA, Krebs VE, Patterson BM. Deep learning preoperatively predicts value metrics for primary total knee arthroplasty: development and validation of an artificial neural network model. J Arthroplasty. 2019;34(10):2220–2227.e2221.

Rexwinkle JT, Werner NC, Stoker AM, Salim M, Pfeiffer FM. Investigating the relationship between proteomic, compositional, and histologic biomarkers and cartilage biomechanics using artificial neural networks. J Biomech. 2018;80:136–43.

Sachau J, Otto JC, Kirchhofer V, Larsen JB, Kennes LN, Hüllemann P, Arendt-Nielsen L, Baron R. Development of a bedside tool-kit for assessing sensitization in patients with chronic osteoarthritis knee pain or chronic knee pain after total knee replacement. Pain. 2022;163(2):308–18.

Sagheb E, Ramazanian T, Tafti AP, Fu S, Kremers WK, Berry DJ, Lewallen DG, Sohn S, Maradit Kremers H. Use of natural language processing algorithms to identify common data elements in operative notes for knee arthroplasty. J Arthroplasty. 2021;36(3):922–6.

Tolpadi AA, Lee JJ, Pedoia V, Majumdar S. Deep learning predicts total knee replacement from magnetic resonance images. Sci Rep. 2020;10(1):6371.

Tsai CC, Huang CC, Lin CW, Ogink PT, Su CC, Chen SF, Yen MH, Verlaan JJ, Schwab JH, Wang CT, et al. The Skeletal Oncology Research Group Machine Learning Algorithm (SORG-MLA) for predicting prolonged postoperative opioid prescription after total knee arthroplasty: an international validation study using 3,495 patients from a Taiwanese cohort. BMC Musculoskelet Disord. 2023;24(1):553.

Verstraete MA, Moore RE, Roche M, Conditt MA. The application of machine learning to balance a total knee arthroplasty. Bone Jt Open. 2020;1(6):236–44.

Wei C, Quan T, Wang KY, Gu A, Fassihi SC, Kahlenberg CA, Malahias MA, Liu J, Thakkar S, Gonzalez Della Valle A, et al. Artificial neural network prediction of same-day discharge following primary total knee arthroplasty based on preoperative and intraoperative variables. Bone Joint J. 2021;103(8):1358–66.

Yeo I, Klemt C, Robinson MG, Esposito JG, Uzosike AC, Kwon YM. The use of artificial neural networks for the prediction of surgical site infection following TKA. J Knee Surg. 2023;36(6):637–43.

Yi PH, Wei J, Kim TK, Sair HI, Hui FK, Hager GD, Fritz J, Oni JK. Automated detection & classification of knee arthroplasty using deep learning. Knee. 2020;27(2):535–42.

Zhang S, Lau BPH, Ng YH, Wang X, Chua W. Machine learning algorithms do not outperform preoperative thresholds in predicting clinically meaningful improvements after total knee arthroplasty. Knee Surg Sports Traumatol Arthrosc. 2022;30(8):2624–30.

Longo UG, De Salvatore S, Intermesoli G, Pirato F, Piergentili I, Becker R, Denaro V. Metaphyseal cones and sleeves are similar in improving short- and mid-term outcomes in total knee arthroplasty revisions. Knee Surg Sports Traumatol Arthrosc. 2023;31(3):861–82.

Hinterwimmer F, Lazic I, Langer S, Suren C, Charitou F, Hirschmann MT, Matziolis G, Seidl F, Pohlig F, Rueckert D, et al. Prediction of complications and surgery duration in primary TKA with high accuracy using machine learning with arthroplasty-specific data. Knee Surg Sports Traumatol Arthrosc. 2023;31(4):1323–33.

Longo UG, Maffulli N, Denaro V. Minimally invasive total knee arthroplasty. N Engl J Med. 2009;361(6):633–4, author reply 634.

Lambrechts A, Wirix-Speetjens R, Maes F, Van Huffel S. Artificial intelligence based patient-specific preoperative planning algorithm for total knee arthroplasty. Front Robot AI. 2022;9:840282.

Berton A, Longo UG, Candela V, Fioravanti S, Giannone L, Arcangeli V, Alciati V, Berton C, Facchinetti G, Marchetti A, et al. Virtual Reality, Augmented Reality, Gamification, and Telerehabilitation: Psychological Impact on Orthopedic Patients' Rehabilitation. J Clin Med. 2020;9(8).

Goplen CM, Kang SH, Randell JR, et al. Effect of preoperative long-term opioid therapy on patient outcomes after total knee arthroplasty: an analysis of multicentre population-based administrative data. Can J Surg. 2021;64(2):E135–43. https://doi.org/10.1503/cjs.007319.

Longo UG, Ciuffreda M, Mannering N, D’Andrea V, Cimmino M, Denaro V. Patellar resurfacing in total knee arthroplasty: systematic review and meta-analysis. J Arthroplasty. 2018;33(2):620–32.

Bravi M, Longo UG, Laurito A, Greco A, Marino M, Maselli M, Sterzi S, Santacaterina F. Supervised versus unsupervised rehabilitation following total knee arthroplasty: a systematic review and meta-analysis. Knee. 2023;40:71–89.

Longo UG, Silva S, Perdisa F, Salvatore G, Filardo G, Berton A, Piergentili I, Denaro V. Gender related results in total knee arthroplasty: a 15-year evaluation of the Italian population. Arch Orthop Trauma Surg. 2023;143(3):1185–92.

Longo UG, Loppini M, Trovato U, Rizzello G, Maffulli N, Denaro V. No difference between unicompartmental versus total knee arthroplasty for the management of medial osteoarthritis of the knee in the same patient: a systematic review and pooling data analysis. Br Med Bull. 2015;114(1):65–73.

Longo UG, Ciuffreda M, Mannering N, D’Andrea V, Locher J, Salvatore G, Denaro V. Outcomes of posterior-stabilized compared with cruciate-retaining total knee arthroplasty. J Knee Surg. 2018;31(4):321–40.

Stelfox HT, Gandhi TK, Orav EJ, Gustafson ML. The relation of patient satisfaction with complaints against physicians and malpractice lawsuits. Am J Med. 2005;118(10):1126–33.

Acknowledgements

Not applicable.

Funding

This research received no external funding.

Author information

Authors and Affiliations

Fondazione Policlinico Universitario Campus Bio-Medico, Via Alvaro del Portillo 200, 00128 Rome, Italy

Umile Giuseppe Longo

Department of Medicine and Surgery, Research Unit of Orthopaedic and Trauma Surgery, Università Campus Bio-Medico Di Roma, Via Alvaro del Portillo 21, 00128 Rome, Italy

Umile Giuseppe Longo, Federica Valente, Mariajose Villa Corta & Kristian Samuelsson

IRCCS Ospedale Pediatrico Bambino Gesù, Rome, Italy

Sergio De Salvatore

Orthopedic Unit, Department of Surgery, Bambino Gesù Children’s Hospital, Rome, Italy

Orthopaedic Department, Clinical Institute Sant’Ambrogio, IRCCS - Galeazzi, Milan, Italy

Bruno Violante

Contributions

Conceptualization, U.G.L., S.D.S.; methodology, U.G.L., F.V.; software, B.V., K.S.; validation, U.G.L., B.V.; formal analysis, M.V.C., K.S., F.V.; resources, U.G.L., S.D.S.; data curation, F.V.; writing - original draft preparation, M.V.C., B.V., K.S., F.V.; writing - review and editing, M.V.C., U.G.L.; visualization, K.S., F.M.; supervision, U.G.L. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Umile Giuseppe Longo.

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Longo, U.G., De Salvatore, S., Valente, F. et al. Artificial intelligence in total and unicompartmental knee arthroplasty. BMC Musculoskelet Disord 25, 571 (2024). https://doi.org/10.1186/s12891-024-07516-9

Received: 16 October 2023

Accepted: 13 May 2024

Published: 22 July 2024

DOI: https://doi.org/10.1186/s12891-024-07516-9

Keywords

  • Artificial intelligence
  • Orthopaedics
  • Joint replacement
  • Knee replacement

IMAGES

  1. When Formulating a Research Question a Researcher Should

  2. Formulate a specific question

  3. A Step by Step Guide for Conducting a Systematic Review

  4. How to Write a Research Question in 2024: Types, Steps, and Examples

  5. Systematic Literature Review Methodology

  6. How to write a systematic literature review with examples from a TDH

VIDEO

  1. How to come up with semi structured interview questions for qualitative research

  2. Formulate Research Problems and Research Objectives: Undergraduate Research Methodology Course

  3. A Comprehensive Guide to Systematic Literature Review (SLR)

  4. How to formulate a review question I Nature Presentation I Episode 2

  5. Research Process-Research Idea, Objective, Questions and Hypothesis

  6. Systematic Literature Review

COMMENTS

  1. 1. Formulate the Research Question

    Step 1. Formulate the Research Question. A systematic review is based on a pre-defined specific research question (Cochrane Handbook, 1.1). The first step in a systematic review is to determine its focus - you should clearly frame the question(s) the review seeks to answer (Cochrane Handbook, 2.1). It may take you a while to develop a good review question - it is an important step in your review.

  2. Formulating a research question

    A systematic review should either specify definitions and boundaries around these elements at the outset, or be clear about which elements are undefined. ... Some mnemonics that sometimes help to formulate research questions, set the boundaries of the question and inform a search strategy. Intervention effects. PICO Population ...

  3. Formulate Question

    A narrow and specific research question is required in order to conduct a systematic review. The goal of a systematic review is to provide an evidence synthesis of ALL research performed on one particular topic. Your research question should be clearly answerable from the studies included in your review. Another consideration is whether the ...

  4. Step 1: Formulating the research question

    The first stage in a review is formulating the research question. The research question accurately and succinctly sums up the review's line of inquiry. This page outlines approaches to developing a research question that can be used as the basis for a review. ... A modified approach to systematic review guidelines can be used for rapid reviews ...

  5. Systematic reviews: Formulate your question

    Defining the question. Defining the research question and developing a protocol are the essential first steps in your systematic review. The success of your systematic review depends on a clear and focused question, so take the time to get it right. A framework may help you to identify the key concepts in your research question and to organise ...

  6. LibGuides: Systematic Reviews: 2. Develop a Research Question

    A well-developed and answerable question is the foundation for any systematic review. Using the PICO framework can help team members clarify and refine the scope of their question. For example, if the population is breast cancer patients, is it all breast cancer patients ...

  7. Systematic Reviews: Formulating Your Research Question

    evidence-based practice process. One way to streamline and improve the research process for nurses and researchers of all backgrounds is to utilize the PICO search strategy. PICO is a format for developing a good clinical research question prior to starting one's research. It is a mnemonic used to describe the four elements

  8. Systematic Reviews: Formulate your question and protocol

    This video illustrates how to use the PICO framework to formulate an effective research question, and it also shows how to search a database using the search terms identified. ... Having a focused and specific research question is especially important when undertaking a systematic review. If your search question is too broad you will retrieve ...

  9. Step 1

    Check whether a systematic review covering the question you are considering has already been published, or has been registered and is in the process of being completed. If that is the case, you need to modify your research question. If the systematic review was completed over five years ago, you can perform an update of the same question.

  10. Systematic Reviews: Develop & Refine Your Research Question

    Develop & Refine Your Research Question. A clear, well-defined, and answerable research question is essential for any systematic review, meta-analysis, or other form of evidence synthesis. The question must be answerable. Spend time refining your research question. PICO Worksheet.

  11. Guidance to best tools and practices for systematic reviews

    We recommend that systematic review authors incorporate specific practices or exercises when formulating a research question at the protocol stage. These should be designed to raise the review team's awareness of how to prevent research and resource waste [84, 130] and to stimulate careful contemplation of the scope of the review. Authors ...

  12. 1. Formulating the research question

    Systematic review vs. other reviews. Systematic reviews require a narrow and specific research question. The goal of a systematic review is to provide an evidence synthesis of ALL research performed on one particular topic. So, your research question should be clearly answerable from the data you gather from the studies included in your review.

  13. How to Do a Systematic Review: A Best Practice Guide for Conducting and

    Systematic reviews are characterized by a methodical and replicable methodology and presentation. They involve a comprehensive search to locate all relevant published and unpublished work on a subject; a systematic integration of search results; and a critique of the extent, nature, and quality of evidence in relation to a particular research question.

  14. Developing a Research Question

    After developing the research question, it is necessary to confirm that the review has not previously been conducted (or is currently in progress). Make sure to check for both published reviews and registered protocols (to see if the review is in progress). Do a thorough search of appropriate databases; if additional help is needed, consult a ...

  15. Library Guides: Systematic reviews: Formulate the question

    General principles. "A good systematic review is based on a well formulated, answerable question. The question guides the review by defining which studies will be included, what the search strategy to identify the relevant primary studies should be, and which data need to be extracted from each study." A systematic review question needs to be.

  16. Question frameworks (e.g PICO)

    Your systematic review or systematic literature review will be defined by your research question. A well formulated question will help: Frame your entire research process. Determine the scope of your review. Provide a focus for your searches. Help you identify key concepts. Guide the selection of your papers.

  17. Framing a Research Question

    The process for developing a research question. There are many ways of framing questions depending on the topic, discipline, or type of questions. Try Elicit to generate a few options for your initial research topic and narrow it down to a specific population, geographical location, disease, etc.

  18. How to do a systematic review

    A systematic review aims to bring evidence together to answer a pre-defined research question. This involves the identification of all primary research relevant to the defined review question, the critical appraisal of this research, and the synthesis of the findings [13]. Systematic reviews may combine data from different ...

  19. Doing a Systematic Review: A Student's Guide

    Step 2: formulate a research question. Developing a focused research question is crucial for a systematic review, as it underpins every stage of the review process. The question defines the review's nature and scope, guides the identification of relevant studies, and shapes the data extraction and synthesis processes.

  20. Research question

    Develop your research question. A systematic review is an in-depth attempt to answer a specific, focused question in a methodical way. Start with a clearly defined, researchable question, that should accurately and succinctly sum up the review's line of inquiry. A well formulated review question will help determine your inclusion and exclusion ...

  21. How to formulate the review question using PICO. 5 steps to get you

    Covidence covers five key steps to formulate your review question using PICO. You've decided to go ahead. You have identified a gap in the evidence and you know that conducting a systematic review, with its explicit methods and replicable search, is the best way to fill it - great choice. The review will produce useful information to ...

  22. Chapter 2: Determining the scope of the review and the questions it

    2.2 Aims of reviews of interventions. Systematic reviews can address any question that can be answered by a primary research study. This Handbook focuses on a subset of all possible review questions: the impact of intervention(s) implemented within a specified human population. Even within these limits, systematic reviews examining the effects of intervention(s) can vary quite markedly in ...

  23. How do I develop a research question for systematic review

    The question must be clearly defined and it may be useful to use a research question framework such as PICO (population, intervention, comparison, outcome) or SPICE (setting, perspective, intervention, comparison, evaluation) to help structure both the question and the search terms. (A short sketch showing how such a framework can be turned into a Boolean search string appears after this list.)

  24. Identifying Your Research Question

    The first step in performing a Systematic Review is to develop your research question. The guidance provided on how to develop your research question for literature reviews will still apply here. The difference with a systematic review research question is that you must have a clearly defined question and consider what problem you are trying to ...

  25. What is a Systematic Literature Review?

    A systematic literature review is a structured, organized and transparent process for identifying, selecting, and critically appraising relevant research studies to answer a specific research question. Systematic reviews apply predefined criteria for selecting studies, assessing their quality, and synthesizing their findings.

  26. What is best practice for conducting narrative synthesis for systematic

    Best practices for conducting a narrative synthesis in a systematic literature review involve several key steps: defining a clear review question and objectives, developing a detailed protocol ...

  27. How to Conduct a Literature Review

    To formulate a research question, consider the main issues or problems within your area of interest. Reflect on what you aim to discover or understand through your research. A specific and well-defined research question will guide your literature search and analysis, making the review process more manageable and effective.

  28. Artificial intelligence in total and unicompartmental knee arthroplasty

    This review aims to evaluate the potential of the application of AI/ML models in the prediction of TKA outcomes and the identification of populations at risk. An extensive search in the following databases: MEDLINE, Scopus, CINAHL, Google Scholar, and EMBASE was conducted using the PIOS approach to formulate the research question.

  29. How to Write the Introduction of a Literature Review?

    4. State the research question. Clearly state your research question. It provides a clear direction for your review and helps in organizing the literature reviewed. It also makes it easier for researchers to follow your argument. A well-defined question or thesis statement is the backbone for your literature review and guides the selection and ...

  30. Risk and threat assessment instruments for violent extremism: A

    One important task for research in this area concerns the definitions of extremism and terrorism, which posed a greater challenge than one might expect. For example, Schmid (2014) summarized multiple research approaches to define violent extremism. Based on his work, violent extremism refers to an ideation far from the ordinary, in which pluralism, the common good of all people, legal rules ...
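
None of the sources above provide code, but the advice they repeat - break the question into framework concepts, OR together the synonyms within each concept, then AND the concepts together - is mechanical enough to sketch. The Python fragment below is a minimal illustration only; the build_search_string function and the concept lists are hypothetical examples, not a validated strategy from any of the cited guides.

    # Minimal sketch: assemble a Boolean search string from PICO concept groups.
    # All terms below are hypothetical examples, not a tested strategy.

    def build_search_string(concepts):
        """OR together synonyms within each concept, then AND across concepts."""
        groups = []
        for terms in concepts.values():
            # Quote multi-word phrases so databases treat them as exact phrases.
            quoted = ['"{}"'.format(t) if " " in t else t for t in terms]
            groups.append("(" + " OR ".join(quoted) + ")")
        return " AND ".join(groups)

    pico = {
        "Population": ["total knee arthroplasty", "TKA", "knee replacement"],
        "Intervention": ["machine learning", "artificial intelligence"],
        "Outcome": ["outcome prediction", "complication*"],
    }

    print(build_search_string(pico))
    # Prints, on one line:
    # ("total knee arthroplasty" OR TKA OR "knee replacement") AND
    # ("machine learning" OR "artificial intelligence") AND
    # ("outcome prediction" OR complication*)

Any real strategy would still need its terms checked against each database's controlled vocabulary and its syntax adapted to that database, as the search-development steps in this guide describe.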