
Guide to Experimental Design | Overview, 5 steps & Examples

Published on December 3, 2019 by Rebecca Bevans. Revised on June 21, 2023.

Experiments are used to study causal relationships. You manipulate one or more independent variables and measure their effect on one or more dependent variables.

Experimental design creates a set of procedures to systematically test a hypothesis. A good experimental design requires a strong understanding of the system you are studying.

There are five key steps in designing an experiment:

  • Consider your variables and how they are related
  • Write a specific, testable hypothesis
  • Design experimental treatments to manipulate your independent variable
  • Assign subjects to groups, either between-subjects or within-subjects
  • Plan how you will measure your dependent variable

For valid conclusions, you also need to select a representative sample and control any extraneous variables that might influence your results; this minimizes several types of research bias, particularly sampling bias, survivorship bias, and attrition bias as participants drop out over time. If random assignment of participants to control and treatment groups is impossible, unethical, or highly difficult, consider an observational study instead.

Table of contents

  • Step 1: Define your variables
  • Step 2: Write your hypothesis
  • Step 3: Design your experimental treatments
  • Step 4: Assign your subjects to treatment groups
  • Step 5: Measure your dependent variable
  • Frequently asked questions about experiments

You should begin with a specific research question. We will work with two example research questions, one from health sciences and one from ecology:

To translate your research question into an experimental hypothesis, you need to define the main variables and make predictions about how they are related.

Start by simply listing the independent and dependent variables.

Research question | Independent variable | Dependent variable
Phone use and sleep | Minutes of phone use before sleep | Hours of sleep per night
Temperature and soil respiration | Air temperature just above the soil surface | CO2 respired from soil

Then you need to think about possible extraneous and confounding variables and consider how you might control them in your experiment.

Research question | Extraneous variable | How to control
Phone use and sleep | Natural variation in sleep patterns among individuals. | Measure the average difference between sleep with phone use and sleep without phone use, rather than the average amount of sleep per treatment group.
Temperature and soil respiration | Soil moisture also affects respiration, and moisture can decrease with increasing temperature. | Monitor soil moisture and add water to make sure that soil moisture is consistent across all treatment plots.

Finally, you can put these variables together into a diagram. Use arrows to show the possible relationships between variables and include signs to show the expected direction of the relationships.

Diagram of the relationship between variables in a sleep experiment

Here we predict that increasing temperature will increase soil respiration and decrease soil moisture, while decreasing soil moisture will lead to decreased soil respiration.


Now that you have a strong conceptual understanding of the system you are studying, you should be able to write a specific, testable hypothesis that addresses your research question.

Research question | Null hypothesis (H₀) | Alternate hypothesis (Hₐ)
Phone use and sleep | Phone use before sleep does not correlate with the amount of sleep a person gets. | Increasing phone use before sleep leads to a decrease in sleep.
Temperature and soil respiration | Air temperature does not correlate with soil respiration. | Increased air temperature leads to increased soil respiration.

The next steps will describe how to design a controlled experiment. In a controlled experiment, you must be able to:

  • Systematically and precisely manipulate the independent variable(s).
  • Precisely measure the dependent variable(s).
  • Control any potential confounding variables.

If your study system doesn’t match these criteria, there are other types of research you can use to answer your research question.

How you manipulate the independent variable can affect the experiment’s external validity – that is, the extent to which the results can be generalized and applied to the broader world.

First, you may need to decide how widely to vary your independent variable. In the soil respiration example, for instance, you could increase air temperature:

  • just slightly above the natural range for your study region.
  • over a wider range of temperatures to mimic future warming.
  • over an extreme range that is beyond any possible natural variation.

Second, you may need to choose how finely to vary your independent variable. Sometimes this choice is made for you by your experimental system, but often you will need to decide, and this will affect how much you can infer from your results. In the sleep example, for instance, you could treat phone use as:

  • a categorical variable: either as binary (yes/no) or as levels of a factor (no phone use, low phone use, high phone use).
  • a continuous variable (minutes of phone use measured every night).
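
The coding choice also shapes the eventual analysis. Below is a minimal, hypothetical Python sketch contrasting the two treatments of phone use; the data, column names, and use of the statsmodels library are illustrative assumptions, not part of the original article.

```python
# A hypothetical sketch: the same dependent variable modeled against a
# categorical vs. a continuous coding of phone use. Data are invented.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "phone_use_level": ["none", "low", "high", "none", "low", "high"] * 10,
    "phone_minutes":   [0, 15, 60, 0, 20, 75] * 10,
    "sleep_hours":     [8.1, 7.6, 6.9, 8.0, 7.4, 6.7] * 10,
})

# Categorical treatment: compare mean sleep across discrete levels.
categorical_model = smf.ols("sleep_hours ~ C(phone_use_level)", data=df).fit()

# Continuous treatment: estimate the change in sleep per minute of phone use.
continuous_model = smf.ols("sleep_hours ~ phone_minutes", data=df).fit()

print(categorical_model.params)
print(continuous_model.params)
```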

How you apply your experimental treatments to your test subjects is crucial for obtaining valid and reliable results.

First, you need to consider the study size: how many individuals will be included in the experiment? In general, the more subjects you include, the greater your experiment’s statistical power, which determines how much confidence you can have in your results.
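
As a rough illustration of how study size relates to statistical power, the sketch below estimates the number of subjects needed per group for a simple two-group comparison; the effect size, alpha, and power values are placeholder assumptions, not recommendations from this article.

```python
# A minimal a priori power analysis sketch for a two-group comparison,
# assuming a medium standardized effect size (values are illustrative).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.5,   # assumed Cohen's d
    alpha=0.05,        # significance level
    power=0.8,         # desired statistical power
    alternative="two-sided",
)
print(f"Approximate subjects needed per group: {n_per_group:.0f}")
```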

Then you need to randomly assign your subjects to treatment groups. Each group receives a different level of the treatment (e.g. no phone use, low phone use, high phone use).

You should also include a control group, which receives no treatment. The control group tells you what would have happened to your test subjects without any experimental intervention.

When assigning your subjects to groups, there are two main choices you need to make:

  • A completely randomized design vs. a randomized block design.
  • A between-subjects design vs. a within-subjects design.

Randomization

An experiment can be completely randomized or randomized within blocks (aka strata):

  • In a completely randomized design , every subject is assigned to a treatment group at random.
  • In a randomized block design (aka stratified random design), subjects are first grouped according to a characteristic they share, and then randomly assigned to treatments within those groups.

Research question | Completely randomized design | Randomized block design
Phone use and sleep | Subjects are all randomly assigned a level of phone use using a random number generator. | Subjects are first grouped by age, and then phone use treatments are randomly assigned within these groups.
Temperature and soil respiration | Warming treatments are assigned to soil plots at random by using a number generator to generate map coordinates within the study area. | Soils are first grouped by average rainfall, and then treatment plots are randomly assigned within these groups.
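
The two randomization schemes can be sketched in a few lines of Python; the subject labels, age brackets, and treatment names below are invented for illustration.

```python
# A minimal sketch of completely randomized vs. randomized block assignment.
import random

treatments = ["no phone use", "low phone use", "high phone use"]

def completely_randomized(subjects, treatments, seed=0):
    """Assign every subject to a treatment independently at random."""
    rng = random.Random(seed)
    return {s: rng.choice(treatments) for s in subjects}

def randomized_block(subjects_by_block, treatments, seed=0):
    """Shuffle subjects within each block (e.g. age bracket), then assign
    treatments cyclically so each block is roughly balanced."""
    rng = random.Random(seed)
    assignment = {}
    for block, members in subjects_by_block.items():
        shuffled = rng.sample(members, len(members))
        for i, subject in enumerate(shuffled):
            assignment[subject] = treatments[i % len(treatments)]
    return assignment

subjects = [f"S{i}" for i in range(1, 13)]
print(completely_randomized(subjects, treatments))
print(randomized_block({"18-29": subjects[:6], "30-49": subjects[6:]}, treatments))
```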

Sometimes randomization isn’t practical or ethical, so researchers create partially-random or even non-random designs. An experimental design where treatments aren’t randomly assigned is called a quasi-experimental design.

Between-subjects vs. within-subjects

In a between-subjects design (also known as an independent measures design or classic ANOVA design), individuals receive only one of the possible levels of an experimental treatment.

In medical or social research, you might also use matched pairs within your between-subjects design to make sure that each treatment group contains the same variety of test subjects in the same proportions.

In a within-subjects design (also known as a repeated measures design), every individual receives each of the experimental treatments consecutively, and their responses to each treatment are measured.

Within-subjects or repeated measures can also refer to an experimental design where an effect emerges over time, and individual responses are measured over time in order to measure this effect as it emerges.

Counterbalancing (randomizing or reversing the order of treatments among subjects) is often used in within-subjects designs to ensure that the order of treatment application doesn’t influence the results of the experiment.

Research question | Between-subjects (independent measures) design | Within-subjects (repeated measures) design
Phone use and sleep | Subjects are randomly assigned a level of phone use (none, low, or high) and follow that level of phone use throughout the experiment. | Subjects are assigned consecutively to zero, low, and high levels of phone use throughout the experiment, and the order in which they follow these treatments is randomized.
Temperature and soil respiration | Warming treatments are assigned to soil plots at random and the soils are kept at this temperature throughout the experiment. | Every plot receives each warming treatment (1, 3, 5, 8, and 10°C above ambient temperatures) consecutively over the course of the experiment, and the order in which they receive these treatments is randomized.
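
One possible counterbalancing scheme for the phone-use example can be sketched as follows; the subject labels and the strategy of cycling through every treatment order are illustrative assumptions, not a prescription from the article.

```python
# A sketch of counterbalancing a within-subjects design: each subject gets
# every treatment, but the order varies across subjects.
from itertools import cycle, permutations

treatments = ["no phone use", "low phone use", "high phone use"]
subjects = [f"S{i}" for i in range(1, 7)]

# Cycle through all 3! = 6 possible treatment orders so each order
# is used equally often across subjects.
order_pool = cycle(permutations(treatments))
schedule = {subject: list(next(order_pool)) for subject in subjects}

for subject, order in schedule.items():
    print(subject, "->", ", ".join(order))
```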


Finally, you need to decide how you’ll collect data on your dependent variable outcomes. You should aim for reliable and valid measurements that minimize research bias or error.

Some variables, like temperature, can be objectively measured with scientific instruments. Others may need to be operationalized to turn them into measurable observations. To measure hours of sleep per night in the phone-use example, for instance, you could:

  • Ask participants to record what time they go to sleep and get up each day.
  • Ask participants to wear a sleep tracker.

How precisely you measure your dependent variable also affects the kinds of statistical analysis you can use on your data.

Experiments are always context-dependent, and a good experimental design will take into account all of the unique considerations of your study system to produce information that is both valid and relevant to your research question.


Experimental design means planning a set of procedures to investigate a relationship between variables . To design a controlled experiment, you need:

  • A testable hypothesis
  • At least one independent variable that can be precisely manipulated
  • At least one dependent variable that can be precisely measured

When designing the experiment, you decide:

  • How you will manipulate the variable(s)
  • How you will control for any potential confounding variables
  • How many subjects or samples will be included in the study
  • How subjects will be assigned to treatment levels

Experimental design is essential to the internal and external validity of your experiment.

The key difference between observational studies and experimental designs is that a well-done observational study does not influence the responses of participants, while experiments do have some sort of treatment condition applied to at least some participants by random assignment .

A confounding variable , also called a confounder or confounding factor, is a third variable in a study examining a potential cause-and-effect relationship.

A confounding variable is related to both the supposed cause and the supposed effect of the study. It can be difficult to separate the true effect of the independent variable from the effect of the confounding variable.

In your research design , it’s important to identify potential confounding variables and plan how you will reduce their impact.

In a between-subjects design , every participant experiences only one condition, and researchers assess group differences between participants in various conditions.

In a within-subjects design , each participant experiences all conditions, and researchers test the same participants repeatedly for differences between conditions.

The word “between” means that you’re comparing different conditions between groups, while the word “within” means you’re comparing different conditions within the same group.

An experimental group, also known as a treatment group, receives the treatment whose effect researchers wish to study, whereas a control group does not. They should be identical in all other ways.

Cite this Scribbr article


Bevans, R. (2023, June 21). Guide to Experimental Design | Overview, 5 steps & Examples. Scribbr. Retrieved September 16, 2024, from https://www.scribbr.com/methodology/experimental-design/


  • Open access
  • Published: 16 August 2023

Training doctoral students in critical thinking and experimental design using problem-based learning

Michael D. Schaller, Marieta Gencheva, Michael R. Gunther & Scott A. Weed

BMC Medical Education volume 23, Article number: 579 (2023)


Abstract

Background

Traditionally, doctoral student education in the biomedical sciences relies on didactic coursework to build a foundation of scientific knowledge and an apprenticeship model of training in the laboratory of an established investigator. Recent recommendations for revision of graduate training include the utilization of graduate student competencies to assess progress and the introduction of novel curricula focused on development of skills, rather than accumulation of facts. Evidence demonstrates that active learning approaches are effective. Several facets of active learning are components of problem-based learning (PBL), which is a teaching modality where student learning is self-directed toward solving problems in a relevant context. These concepts were combined and incorporated in creating a new introductory graduate course designed to develop scientific skills (student competencies) in matriculating doctoral students using a PBL format.

Methods

Evaluation of course effectiveness was measured using the principles of the Kirkpatrick Four Level Model of Evaluation. At the end of each course offering, students completed evaluation surveys on the course and instructors to assess their perceptions of training effectiveness. Pre- and post-tests assessing students’ proficiency in experimental design were used to measure student learning.

Results

The analysis of the outcomes of the course suggests the training is effective in improving experimental design. The course was well received by the students as measured by student evaluations (Kirkpatrick Model Level 1). Improved scores on post-tests indicate that the students learned from the experience (Kirkpatrick Model Level 2). A template is provided for the implementation of similar courses at other institutions.

Conclusions

This problem-based learning course appears effective in training newly matriculated graduate students in the required skills for designing experiments to test specific hypotheses, enhancing student preparation prior to initiation of their dissertation research.


Introduction

For over a decade there have been calls to reform biomedical graduate education. There are two main problems that led to these recommendations and therefore two different prescriptions to solve these problems. The first major issue is the pursuit of non-traditional (non-academic) careers by doctorates and concerns of adequate training [ 1 , 2 ]. The underlying factors affecting career outcomes are the number of PhDs produced relative to the number of available academic positions [ 1 , 3 , 4 , 5 ], and the changing career interests of doctoral students [ 6 , 7 , 8 , 9 ]. One aspect in the proposed reformation to address this problem is incorporation of broader professional skills training and creating awareness of a greater diversity of careers into the graduate curriculum [ 1 , 4 , 5 ]. The second issue relates to the curricula content and whether content knowledge or critical scientific skills should be the core of the curriculum [ 10 , 11 ]. The proposed reformation to address this issue is creation of curricula focusing upon scientific skills, e.g. reasoning, experimental design and communication, while simultaneously reducing components of the curricula that build a foundational knowledge base [ 12 , 13 ]. Components of these two approaches are not mutually exclusive, where incorporation of select specialized expertise in each area has the potential to concurrently address both issues. Here we describe the development, implementation and evaluation of a new problem-based learning (PBL)-based graduate course that provides an initial experience in introducing the scientific career-relevant core competencies of critical thinking and experimental design to incoming biomedical doctoral students. The purpose of this course is to address these issues by creating a vehicle to develop professional skills (communication) and critical scientific skills (critical thinking and experimental design) for first year graduate students.

One approach that prioritizes the aggregate scientific skill set required for adept biomedical doctorates is the development of core competencies for doctoral students [ 5 , 14 , 15 ], akin to set milestones that must be met by medical residents and fellows [ 16 ]. Key features of these competencies include general and field-specific scientific knowledge, critical thinking, experimental design, evaluation of outcomes, scientific rigor, ability to work in teams, responsible conduct of research, and effective communication [ 5 , 14 , 15 ]. Such competencies provide clear benchmarks to evaluate the progress of doctoral students’ development into an independent scientific professional and preparedness for the next career stage. Historically, graduate programs relied on traditional content-based courses and supervised apprenticeship in the mentor’s laboratory to develop such competencies. An alternative to this approach is to modify the graduate student curriculum to provide a foundation for these competencies early in the curriculum in a more structured way. This would provide a base upon which additional coursework and supervised dissertation research could build to develop competencies in doctoral students.

Analyses of how doctoral students learn scientific skills suggest a threshold model, where different skillsets are mastered (a threshold reached) before subsequent skillsets can be mastered [ 17 , 18 ]. Skills like using the primary literature, experimental design and placing studies in context are earlier thresholds than identifying alternatives, limitations and data analysis [ 18 ]. Timmerman et al. recommend revision of graduate curricula to sequentially build toward these thresholds using evidence-based approaches [ 18 ]. Several recent curricular modifications are aligned with these recommendations. One program, as cited above, offers courses to develop critical scientific skills early in the curriculum with content knowledge provided in later courses [ 12 , 13 ]. A second program has built training in experimental design into the coursework in the first semester of the curriculum. Improvements in students’ experimental design skills and an increase in self-efficacy in experimental design occurred over the course of the semester [ 19 ]. Other programs have introduced exercises into courses and workshops to develop experimental design skills using active learning. One program developed interactive sessions on experimental design, where students give chalk talks about an experimental plan to address a problem related to course content and respond to challenges from their peers [ 20 ]. Another program has developed a workshop drawing upon principles from design thinking to build problem solving skills and creativity, and primarily uses active learning and experiential learning approaches [ 21 ]. While these programs are well received by students, the outcomes of training have not been reported. Similar undergraduate curricula that utilize literature review with an emphasis on scientific thought and methods report increased performance in critical thinking, scientific reasoning and experimental design [ 22 , 23 ].

It is notable that the changes these examples incorporate into the curriculum are accompanied with a shift from didactic teaching to active learning. Many studies have demonstrated that active learning is more effective than a conventional didactic curriculum in STEM education [ 24 ]. Problem-based learning (PBL) is one active learning platform that the relatively new graduate program at the Van Andel Institute Graduate School utilizes for delivery of the formal curriculum [ 25 ]. First developed for medical students [ 26 ], the PBL learning approach has been adopted in other educational settings, including K-12 and undergraduate education [ 27 , 28 ]. A basic tenet of PBL is that student learning is self-directed [ 26 ]. Students are tasked to solve an assigned problem and are required to find the information necessary for the solution (self-directed). In practice, learning occurs in small groups where a faculty facilitator helps guide the students in identifying gaps in knowledge that require additional study [ 29 ]. As such, an ideal PBL course is “well organized” but “poorly structured”. The lack of a traditional restrictive structure allows students to pursue and evaluate different solutions to the problem.

The premise for PBL is that actively engaging in problem solving enhances learning in several ways [ 29 , 30 ]. First, activation of prior knowledge, as occurs in group discussions, aids in learning by providing a framework to incorporate new knowledge. Second, deep processing of material while learning, e.g. by answering questions or using the knowledge, enhances the ability to later recall key concepts. Third, learning in context, e.g. learning the scientific basis for clinical problems in the context of clinical cases, enables and improves recall. These are all effective strategies to enhance learning [ 31 ]. PBL opponents argue that acquisition of knowledge is more effective in a traditional didactic curriculum. Further, development of critical thinking skills requires the requisite foundational knowledge to develop realistic solutions to problems [ 32 ].

A comprehensive review of PBL outcomes from K-12 through medical school indicated that PBL students perform better in the application of knowledge and reasoning, but not in other areas like basic knowledge [ 33 ]. Other recent meta-analyses support the conclusion that PBL, project-based learning and other small group teaching modalities are effective in education from primary school to university, including undergraduate courses in engineering and technology, and pharmacology courses for professional students in health sciences [ 34 , 35 , 36 , 37 , 38 , 39 ]. While the majority of the studies reported in these meta-analyses demonstrate that PBL results in better academic performance, there are contrasting studies that demonstrate that PBL is ineffective. This prompts additional investigation to determine the salient factors that distinguish the two outcomes to establish best practices for better results using the PBL platform. Although few studies report the outcomes of PBL based approaches in graduate education, this platform may be beneficial in training biomedical science doctoral students for developing and enhancing critical thinking and practical problem-solving skills.

At our institution, biomedical doctoral students enter an umbrella program and take a core curriculum in the first semester prior to matriculating into one of seven biomedical sciences doctoral programs across a wide range of scientific disciplines in the second semester. Such program diversity created difficulty in achieving consensus on the necessary scientific foundational knowledge for a core curriculum. Common ground was achieved during a recent curriculum revision through the development of required core competencies for all students, regardless of field of study. These competencies and milestones for biomedical science students at other institutions [ 5 , 14 , 15 ], along with nontraditional approaches to graduate education [ 12 , 25 ], were used as guidelines for curriculum modification.

Course design

A course was created to develop competencies required by all biomedical sciences doctoral students regardless of their program of interest [ 14 ]. As an introductory graduate level course, this met the needs of all our seven diverse biomedical sciences doctoral programs where our first-year doctoral students matriculate. A PBL platform was chosen for the course to engage the students in an active learning environment [ 25 ]. The process of problem solving in small teams provided the students with experience in establishing working relationships and how to operate in teams. The students gained experience in researching material from the literature to establish scientific background, find current and appropriate experimental approaches and examples of how results are analyzed. This small group approach allowed each team to develop different hypotheses, experimental plans and analyses based upon the overall interests of the group. The course was designed following discussions with faculty experienced in medical and pharmacy school PBL, and considering course design principles from the literature [ 27 , 40 ]. The broad learning goals are similar to the overall objectives in another doctoral program using PBL as the primary course format [ 25 ], and are aligned with recommended core competencies for PhD scientists [ 14 ]. These goals are to:

  1. Develop broad, general scientific knowledge (core competency 1 [ 14 ]).

  2. Develop familiarity with technical approaches specific to each problem.

  3. Practice critical thinking and experimental design incorporating rigor and reproducibility, including formulation of hypotheses, detailed experimental design, interpretation of data, and statistical analysis (core competencies 3 and 4 [ 14 ]).

  4. Practice written and verbal communication skills (core competency 8 [ 14 ]).

  5. Develop collaboration and team skills (core competency 6 [ 14 ]).

  6. Practice using the literature.

Students were organized into groups of four or five based on their scientific background. Student expertise in each group was deliberately mixed to provide different viewpoints during discussion. A single faculty facilitator was assigned to each student group, which met formally in 13 separate sessions (Appendix II). In preparation for each session, the students independently researched topics using the literature (related to goal 6) and met informally without facilitator oversight to coordinate their findings and organize the discussion for each class session. During the formal one-hour session, one student served as the group leader to manage the discussion. The faculty facilitator guided the discussion to ensure coverage of necessary topics and helped the students identify learning issues, i.e. areas that required additional development, for the students to research and address for the subsequent session. At the end of each session, teams previewed the leading questions for the following class and organized their approach to address these questions prior to the next session. The whole process provided experiences related to goal 5.

As the course was developed during the COVID-19 pandemic, topics related to SARS-CoV2 and COVID-19 were selected as currently relevant problems in society. Session 1 prepared the students for group work by discussing how to work in teams and manage conflict (related to goal 5). In session 2, the students met in their assigned groups to get to know each other, discuss problem-based learning and establish ground rules for the group. Sessions 3 and 4 laid the course background by focusing on the SARS-CoV2 virus and COVID-19-associated pathologies (related to goal 1). The subsequent nine sessions were organized into three separate but interrelated three-session blocks: one on COVID-19 and blood clotting, one on COVID-19 and loss of taste, and one on SARS-CoV2 and therapeutics. The first session in each of these blocks was devoted to covering background information (blood clotting, neurosensation and drug application) (related to goal 1). The second session of each block discussed hypothesis development (mechanisms that SARS-CoV2 infection might utilize to alter blood clotting, the sense of taste, and identification of therapeutic targets to attenuate SARS-CoV2 infection) (related to goal 3). In these second sessions, the students also began to design experiments to test the hypothesis. The final session of each block fleshed out the details of the experimental design (related to goals 2 and 3).

The process was iterative, where the students had three opportunities to discuss hypothesis development, experimental design and analysis during sessions with their facilitators. Written and oral presentation assignments (Appendix V) provided additional opportunities to articulate a hypothesis, describe experimental approaches to test the hypotheses, propose analysis of experimental results and develop communication skills (related to goal 4).

Rigor and reproducibility were incorporated into the course. This was an important component given the emphasis recently placed on rigor and reproducibility by federal agencies. As the students built the experimental design to address the hypothesis, recurring questions were posed to encourage them to consider rigor. Examples include: “Are the methods and experimental approaches rigorous? How could they be made more rigorous?” “Discuss how your controls validate the outcome of the experiment. What additional controls could increase confidence in your result?” The facilitators were instructed to direct discussion to topics related to the rigor of the experimental design. The students were asked about numbers of replicates, number of animals, additional methods that could be applied to support the experiment, and other measurements to address the hypothesis in a complementary fashion. In the second iteration of the course, we introduced an exercise on rigor and reproducibility for the students using the NIH Rigor and Reproducibility Training Modules (see Appendix III). In this exercise, the students read a short introduction to rigor and reproducibility and viewed a number of short video modules to introduce lessons on rigor. The students were also provided the link to the National Institute of General Medical Sciences clearinghouse of training modules on rigor and reproducibility as a reference for future experimental design (see Appendix III).

The first delivery of the course was during the COVID-19 pandemic and sessions were conducted on the Zoom platform. The thirteen PBL sessions, and two additional sessions dedicated to oral presentations, were spaced over the course of the first semester of the biomedical sciences doctoral curriculum. The second iteration of the course followed the restructuring of the graduate first year curriculum and the thirteen PBL sessions, plus one additional session devoted to oral presentations, were held during the first three and a half weeks of the first-year curriculum. During this period in the semester, this was the only course commitment for the graduate students. Due to this compressed format, only one written assignment and a single oral presentation were assigned. As the small group format worked well via Zoom in the first iteration of the course, the small groups continued to meet using this virtual platform.

IRB Approval. The West Virginia University Institutional Review Board approved the study (WVU IRB Protocol#: 2008081739). Informed consent was provided by the participants in writing and all information was collected anonymously.

Surveys. Training effectiveness was evaluated in two ways, corresponding to the first two levels of the Kirkpatrick Model of Evaluation [ 41 ]. First, students completed a questionnaire upon completion of the course to capture their perceptions of training (Appendix VII). Students were asked their level of agreement/disagreement on a Likert scale with 10 statements about the course and 7 statements about their facilitator. Second, students took a pre- and post-test to measure differences in their ability to design experiments before and after training (Appendix VIII). The pre- and post-tests were identical, asking the students to design an experiment to test a specific hypothesis, include controls, plan analyses, and state possible results and interpretation. Five questions were provided for the pre- and post-test, where each question posed a hypothesis from a different biomedical discipline, e.g. cancer biology or neuroscience. Students were asked to choose one of the five questions to answer.

Peer-to-peer evaluations were collected to provide feedback on professionalism and teamwork. This survey utilized a Goldilocks scale ranging from 1 to 7, with 4 being the desired score. An example peer question asked about accountability, where responses included not accountable, e.g. always late (score = 1), accountable, e.g. punctual, well prepared, follows up (score = 4) and controlling, e.g. finds fault in others (score = 7). Each student provided a peer-to-peer evaluation for each student in their group. (see Appendix VII). In the second course iteration, Goldilocks surveys were collected three times over the three-week course period due to the compressed time frame. This was necessary to provide rapid feedback to the students about their performance during the course in order to provide opportunities to address and rectify any deficits before making final performance assessments.

Evaluating Pre- and Post-Tests. All pre- and post-test answers were evaluated by three graders in a blind fashion, where the graders were unaware if an answer came from a pre- or post-test. Prior to grading, each grader independently drafted an answer key based upon the question(s) on the tests. The graders then met to compare and deliberate these preliminary keys, incorporating changes and edits to produce a single combined key to use for rating answers. While the students were asked to answer one question, some students chose to answer several questions. Superfluous answers were used as a training dataset for the graders. The graders independently scored each answer, then met to review the results and discuss modification of the grading key. The established final grading key, with a perfect score of 16, was utilized by the graders in independently evaluating the complete experimental dataset consisting of all pre- and post-test answers (Appendix IX). Three of the authors served as graders; their scores were averaged for each answer to assess the ability of the student cohorts to design experiments before and after the course.

Statistical analysis. To measure the inter-rater reliability of the graders, the intraclass correlation coefficient (ICC) was calculated. A two-way mixed effects model was utilized to evaluate consistency between multiple raters/measurements. The ICC for grading the training dataset was 0.82, indicating good inter-rater agreement. The ICC for grading the experimental dataset was also 0.82. For comparison of pre-test vs. post-test performance, the scores of the three raters were averaged for each answer. Since answers were anonymous, the analyses compared responses between individuals. Most, but not all, scores exhibited a Gaussian distribution; therefore, a nonparametric statistic, a one-tailed Mann-Whitney U test, was used for comparison. The pre-test and post-test scores for 2020 and 2021 could not be compared due to the different format used for the course in each year.
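
For readers who want to run a similar analysis, the sketch below computes a two-way mixed-effects ICC and a one-tailed Mann-Whitney U test on invented scores using the pingouin and SciPy libraries; it is a hypothetical reconstruction, not the authors' original code.

```python
# Hypothetical re-creation of the two analyses described above; the data,
# column names, and values are invented for illustration.
import pandas as pd
import pingouin as pg
from scipy.stats import mannwhitneyu

# Each answer scored independently by three blinded graders (long format).
grades = pd.DataFrame({
    "answer": ["A1"] * 3 + ["A2"] * 3 + ["A3"] * 3 + ["A4"] * 3 + ["A5"] * 3,
    "grader": ["G1", "G2", "G3"] * 5,
    "score":  [10, 11, 9, 14, 13, 14, 7, 8, 8, 12, 12, 11, 6, 7, 5],
})

# Inter-rater reliability: two-way mixed-effects, consistency (ICC3 row).
icc = pg.intraclass_corr(data=grades, targets="answer",
                         raters="grader", ratings="score")
print(icc[icc["Type"] == "ICC3"])

# Compare averaged pre- vs. post-test scores with a one-tailed
# Mann-Whitney U test (post-test expected to be higher).
pre_scores = [6.3, 8.0, 7.7, 9.3, 5.7]       # illustrative averaged scores
post_scores = [9.0, 10.3, 8.7, 11.7, 10.0]
stat, p_value = mannwhitneyu(post_scores, pre_scores, alternative="greater")
print(f"U = {stat:.1f}, one-tailed p = {p_value:.4f}")
```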

Thirty students participated in the course in the first offering, while 27 students were enrolled in the second year. The students took pre- and post-tests to measure their ability to design an experiment before and after training (Appendix VIII). As the course progressed, students were surveyed on their views of the professionalism of other students in their group (Appendix VII). At the end of the course, students were asked to respond to surveys evaluating the course and their facilitator (see Appendix VII).

Student reception of the course (Kirkpatrick Level 1). In the first year, 23 students responded to the course evaluation (77% response rate) and 26 students submitted facilitator evaluations (87% response rate), whereas in the second year there were 25 responses to the course evaluation (93% response rate) and 26 for facilitators (96% response rate). Likert scores for the 2020 and 2021 course evaluations are presented in Fig. 1. The median score for each question was 4 on a scale of 5 in 2020. In 2021, the median scores for the questions about active learning and hypothesis testing were 5 and the median score of the other questions was 4. The students appreciated the efforts of the facilitators in the course, based upon their evaluations of the facilitators. The median score for every facilitator across all survey questions is shown in Fig. 2. The median score for a single question in 2020 and 2021 was 4.5 and the median score for all other questions was 5. The results of the peer-to-peer evaluations are illustrated in Fig. 3. The average score for each student was plotted, with scores further from the desired score of 4 indicating perceived behaviors that were not ideal. The wide range of scores in the 2020 survey was noted. The students completed three peer-to-peer surveys during the 2021 course. The range of scores in the 2021 peer-to-peer evaluation was narrower than the range in the 2020 survey. The range of scores was expected to narrow from the first (initial) to third (final) survey as students learned and implemented improvements in their professional conduct based upon peer feedback. The narrow range of scores in the initial survey left little room for improvement.

Figure 1. Results of Course Evaluations by Students. Student evaluations of the course were collected at the end of each offering. The evaluation surveys are in Appendix VII. Violin plots show the distribution and median score for each question in the 2020 survey (A) and the 2021 survey (B). The survey used a Likert scale (1 – low to 5 – high).

Figure 2. Results of Facilitator Evaluations by Students. Student evaluations of the facilitators were collected at the end of each offering of the course. The evaluation surveys are in Appendix VII. Violin plots show the distribution and median score for each question in the 2020 survey (A) and the 2021 survey (B). The survey used a Likert scale (1 – low to 5 – high).

Figure 3. Results of Student Peer-to-Peer Evaluations. Student peer-to-peer evaluations were collected at the end of the course in year 1 (A), and at the beginning (B), the middle (C) and the end (D) of the course in year 2. Each student evaluated the professionalism of each other student in their group using the evaluation survey shown in Appendix VII. The average score for each student is plotted as a data point. The survey used a Goldilocks scale (range of 1 to 7) where the desired professional behavior is reflected by a score of 4.

Student learning (Kirkpatrick Level 2). Twenty-six students completed the pre-test in each year and consented to participate in this study (87% response in the first year and 96% response in the second year). Eighteen students completed the post-test at the end of the first year (60%) and 26 students completed the test at the end of the second year (96%). Question selection (excluding students that misunderstood the assignment and answered all questions) is shown in Table  1 . The most frequently selected questions were Question 1 (45 times) and Question 2 (23 times). Interestingly, the results in Table  1 also indicate that students did not necessarily choose the same question to answer on the pre-test and post-test.

Average scores on pre-tests and post-tests were compared using a one-tailed Mann-Whitney U test. Since the format of the course was different in the two iterations, comparison of test results between the two years could not be made. The average scores of the pre- and post-test in 2020 were not statistically different (p = 0.0673), although the post-test scores trended higher. In contrast, the difference between the pre- and post-test in 2021 did reach statistical significance (p = 0.0329). The results collectively indicate an overall improvement in student ability in experimental design (Fig. 4).

Figure 4. Pre- and Post-Test Scores. At the beginning and end of each offering, the students completed a test to measure their ability to design an experiment (see Appendix VIII for the details of the exam). Three faculty graded every answer to the pre- and post-test using a common grading rubric (see Appendix IX). The maximum possible score was 16. The average score for each individual answer on the pre-test and post-test is represented as a single data point. The bar indicates the mean score across all answers +/- SD. The average pre- and post-test scores were compared using a one-tailed Mann-Whitney U test. For the 2020 data (A), p = 0.0673, and for the 2021 data (B), p = 0.0329.

This course was created in response to biomedical workforce training reports recommending increased training in general professional skills and scientific skills, e.g. critical thinking and experimental design. The course utilizes a PBL format, which is not extensively utilized in graduate education, to incorporate active learning throughout the experience. It was well received by students and analysis suggests that major goals of the course were met. This provides a template for other administrators and educators seeking to modify curricula in response to calls to modify training programs for doctoral students.

Student evaluations indicated that the course was effective at motivating active learning and that students became more active learners. The evaluation survey questions were directly related to three specific course goals: (1) students reported developing skills in problem solving, hypothesis testing and experimental design; (2) the course helped develop oral presentation skills and written communication skills (in one iteration of the course); and (3) students developed collaboration and team skills. Thus, from the students’ perspective, these three course goals were met. Student perceptions of peer professionalism were measured using peer-to-peer surveys. The wide range of Goldilocks scores in the first student cohort was unexpected. In the second student cohort, changes in professional behavior were measured over time and the score ranges were narrower. The reasons for the difference between cohorts are unclear. One possibility for this discrepancy is that the first iteration of the course extended over one semester and was during the first full semester of the pandemic, impacting professional behavior and perceptions of professionalism. The second cohort completed a professionalism survey three times during the course. The narrow range of scores from this cohort in the initial survey made detection of improved professionalism over the course difficult. Results do indicate that professionalism improved in terms of respect and compassion between the first and last surveys. Finally, the results of the pre-test and post-test analysis demonstrated a trend of improved performance on the post-test relative to the pre-test for students in each year of the course and a statistically significant difference between the pre- and post-test scores in the second year.

Areas for improvement. The course was initially offered as a one-credit course. Student comments on course evaluations and comments in debriefing sessions with facilitators at the end of the course concurred that the workload exceeded that of a one-credit course. As a result, the year two version was offered as a two-credit course to better align course credits with workload.

There were student misperceptions about the goals of the course in the first year. Some students equated experimental design with research methods and expressed disappointment that this was not a methods course. While learning appropriate methods is a goal of the course, the main emphasis is developing hypotheses and designing experiments to test the hypotheses. As such, the choice of methods was driven by the hypotheses and experimental design. This misperception was addressed in the second year by clearly elaborating on the course goals in an introductory class session.

The original course offering contained limited statistical exercises to simulate experimental planning and data analysis, e.g. students were required to conduct a power analysis. Between the first and second years of the course, the entire first semester biomedical sciences curriculum was overhauled with several new course offerings. This new curriculum contained an independent biostatistics workshop that students completed prior to the beginning of this course. Additional statistics exercises were incorporated into the PBL course to provide the students with more experience in the analysis of experimental results. Student evaluations indicated that the introduction of these additional exercises was not effective. Improved coordination between the biostatistics workshop and the PBL course is required to align expectations, better equipping students for the statistical analysis of experimental results encountered later in this course.

An important aspect that was evident from student surveys, facilitator discussions and debrief sessions was that improved coordination between the individual facilitators of the different groups is required to reduce intergroup variability. Due to class size, the students were divided into six groups, with each facilitator assigned to the same group for the duration of the course to maintain continuity. The facilitators met independent of the students throughout the course to discuss upcoming sessions and to share their experiences with their respective groups. This allowed the different facilitators to compare approaches and discuss emerging or perceived concerns/issues. In the second year, one facilitator rotated between different groups during each session to observe how the different student groups functioned. Such a real time faculty peer-evaluation process has the potential to reduce variability between groups, but was challenging to implement within the short three-week time period. Comprehensive training where all facilitators become well versed in PBL strategies and adhere to an established set of guidelines/script for each session is one mechanism that may reduce variability across different facilitator-group pairings.

Limitations. The current study has a number of limitations. The sample size for each class was small, with 30 students enrolled in the first year of the course and 27 students enrolled in the second. The response rates for the pre-tests were high (> 87%) but the response rate for the post-test varied between the first year (60%) and second year (96%) of the course. The higher response rate in the second year might be due to fewer end of semester surveys since this was the only course that the students took in that time period. Additionally, the post-test in the second year was conducted at a scheduled time, rather than on the student’s own time as was the case in year one. Due to restructuring of the graduate curriculum and the pandemic, the two iterations of the course were formatted differently. This precluded pooling the data from the two offerings and makes comparison between the outcomes difficult.

Presentation of the course was similar, but not identical, to all of the students. Six different PBL groups were required to accommodate the number of matriculating students in each year. Despite efforts to provide a consistent experience, there was variability between the different facilitators in running their respective groups. Further, the development of each session in each group was different, since discussion was driven by the students and their collective interests. These variables could be responsible for increasing the spread of scores on the post-tests and decreasing the value of the course for a subset of students.

The pre- and post-tests were conducted anonymously to encourage student participation. This prevented correlating pre- and post-test scores for individual students and comparing learning between different groups. The pre-test and post-test were identical, and provided the students with five options to design experiments (with identical instructions) in response to a different biomedical science problem. An alternative approach could have used isomorphic questions for the pre- and post-tests. It is clear that some students answered the same question on the pre- and post-test, and they may have benefited from answering the same question twice (albeit after taking the course). Some students clearly answered different questions on the pre- and post-test and the outcomes might be skewed if the two questions challenged the student differently.

While the course analysis captured the first two levels of the Kirkpatrick model of evaluation (reaction and learning), it did not attempt to measure the third level (behavior) or fourth level (results) [ 41 ]. Future studies are required to measure the third level. This could be achieved by asking students to elaborate on their experimental design used in recent experiments in their dissertation laboratory following completion of the course, or by evaluating the experimental design students incorporate into their dissertation proposals. The fourth Kirkpatrick level could potentially be assessed by surveying preceptors about their students’ abilities in experimental design in a longitudinal manner at semi- or annual committee meetings and accompanying written progress reports. The advantage of focusing on the first two Kirkpatrick levels of evaluation is that the measured outcomes can be confidently attributed to the course. Third and fourth level evaluations are more complicated, since they necessarily take place at some point after completion of the course. Thus, the third and fourth level outcomes can result from additional factors outside of the course (e.g. other coursework, working in the lab, attendance in student-based research forum, meeting with mentors, etc.). Another limiting factor is the use of a single test to measure student learning. Additional alternative approaches to measure learning might better capture differences between the pre- and post-test scores.

Implementation. This curriculum is readily scalable and can be modified for graduate programs of any size, with the caveat that larger programs will require more facilitators. At Van Andel, the doctoral cohorts are three to five new students per year and all are accommodated in one PBL group [ 25 ]. At our institution, we have scaled up to a moderate sized doctoral program with 25 to 30 matriculating students per year, dividing the students into six PBL groups (4–5 students each). Medical School classes frequently exceed 100 students (our program has 115–120 new students each fall) and typically have between five and eight students per group. Our graduate course has groups at the lower end of this range. This course could be scaled up by increasing the number of students in the group or by increasing the number of groups.

Consistency between groups is important so each group of students has a similar experience and reaps the full benefit of this experience. Regular meetings between the course coordinator and facilitators to discuss the content of upcoming sessions and define rubrics to guide student feedback and evaluation were mechanisms used to standardize across the different groups in this course (Appendix VI). In hindsight, the course would benefit from more rigorous facilitator training prior to participation in the course. While a number of our facilitators were veterans of a medical school PBL course, the skillset required to effectively manage a graduate-level PBL course centered on developing critical thinking and experimental design is different. Such training requires an extensive time commitment by the course coordinators and participating facilitators.

The most difficult task in developing this course involved the course conception and development of the problem-based assignments. Designing a COVID-19 based PBL course in 2020 required de novo development of all course material. This entailed collecting and compiling information about the virus and the disease to provide quick reference for facilitators to guide discussion in their groups, all in the face of constantly shifting scientific and medical knowledge, along with the complete lack of traditional peer-based academic social engagement due to the pandemic. In development of this course, three different COVID-based problems were identified, with appropriate general background material for each problem requiring extensive research and development. Background material on cell and animal models, general strategies for experimental manipulation and methods to measure specific outcomes were collected in each case. Student copies for each session were designed to contain a series of questions as a guide to identifying important background concepts. Facilitator copies for each session were prepared with the goal of efficiently and effectively guiding each class meeting. These guidelines contained ideas for discussion points, areas of elaboration and a truncated key of necessary information to guide the group (Appendix IV). Several PBL repositories exist (e.g. https://itue.udel.edu/pbl/problems/ , https://www.nsta.org/case-studies ) and MedEdPORTAL ( https://www.mededportal.org/ ) publishes medical-specific cases. These provide valuable resources for case-based ideas, but few are specifically geared for research-focused biomedical graduate students. As such, modification of cases germane to first year biomedical graduate students with a research-centered focus is required prior to implementation. Finally, appropriate support materials for surveys and evaluation rubrics requires additional development and refinement of current or existing templates to permit improved evaluation of learning outcomes (Appendix VI).

Development of an effective PBL course takes considerable time and effort to conceive and construct. Successful implementation requires the requisite higher administrative support to identify and devote the necessary and appropriate faculty needed for course creation, the assignment of skilled faculty to serve as facilitators and staff support to coordinate the logistics for the course. It is critical that there is strong faculty commitment amongst the facilitators to devote the time and energy necessary to prepare and to successfully facilitate a group of students. Strong institutional support is linked to facilitator satisfaction and commitment to the PBL-based programs [ 42 ]. Institutional support can be demonstrated in multiple ways. The time commitment for course developers, coordinators and facilitators should be accurately reflected in teaching assignments. Performance in these roles in PBL should factor into decisions about support for professional development, e.g. travel awards, and merit based pay increases. Further, efforts in developing, implementing and executing a successful PBL course should be recognized as important activities during annual faculty evaluations by departmental chairs and promotion and tenure committees.

Key Takeaways. The creation and implementation of this course was intellectually stimulating, and facilitators found their interactions with students gratifying. Judging from student survey responses and test results, the course was at least modestly successful in achieving its goals. Based upon our experience, important issues to consider when deciding to implement such a curriculum include: (1) support of the administration for developing the curriculum, (2) facilitator buy-in to the approach, (3) continuity (not uniformity) between PBL groups, (4) other components of the curriculum and how they might be leveraged to enhance the effectiveness of PBL, and (5) the effort required to develop and deliver the course, which must be recognized by the administration.

Future Directions. Novel curriculum development is an often overlooked but important component of contemporary graduate education in the biomedical sciences. It is critical that modifications incorporated in graduate education are evidence-based. We report the implementation of a novel PBL course for training in the scientific skill sets required for developing and testing hypotheses, and demonstrate its effectiveness. Additional measures to assess the course goals of improving critical thinking, experimental design and self-efficacy in experimental design will be implemented using validated tests [ 22 , 43 , 44 , 45 ]. Further studies are also required to determine the long-term impact of this training on student performance in the laboratory and progression toward the degree. It will be interesting to determine whether similar curriculum changes that emphasize skill development will shorten the time to degree, a frequent recommendation for training the modern biomedical workforce [ 1 , 46 , 47 , 48 ].

Incorporation of courses emphasizing skill development can be done in conjunction with traditional didactic instruction to build the necessary knowledge base for modern biomedical research. Our PBL course was stand-alone, requiring the students to research background material prior to hypothesis development and experimental design. Coordination between the two modalities would obviate the need for background research in the PBL component, reinforce through application the basic knowledge presented didactically, and prepare students for higher-order thinking about the application of concepts learned in the traditional classroom. Maintaining a balance between problem-based and traditional instruction may also be key to improving faculty engagement in such initiatives. Continued investment in the creation and improvement of innovative components of graduate curricula centered on developing the scientific skills of doctoral students can be intellectually stimulating for faculty and provide a better training environment for students. The effort may be rewarded by streamlining training and strengthening the biomedical workforce of the future.

Data Availability

All data generated in this study are included in this published article and its supplementary information files.

Abbreviations

  • PBL: Problem-based learning
  • STEM: Science, technology, engineering, and math
  • K-12: Kindergarten through grade 12
  • ICC: Intraclass correlation coefficient
  • SARS-CoV-2: Severe acute respiratory syndrome coronavirus 2
  • COVID-19: Coronavirus disease 2019

National Institutes of Health. Biomedical research workforce working group report. Bethesda, MD: National Institutes of Health; 2012.


Sinche M, Layton RL, Brandt PD, O’Connell AB, Hall JD, Freeman AM, Harrell JR, Cook JG, Brennwald PJ. An evidence-based evaluation of transferrable skills and job satisfaction for science PhDs. PLoS ONE. 2017;12:e0185023.

Ghaffarzadegan N, Hawley J, Larson R, Xue Y. A note on PhD Population Growth in Biomedical Sciences. Syst Res Behav Sci. 2015;23:402–5.

National Academies of Sciences Engineering and Medicine. The next generation of biomedical and behavioral sciences researchers: breaking through. Washington, DC: National Academies Press (US); 2018.

National Academies of Sciences Engineering and Medicine. Graduate STEM education for the 21st century. Washington, DC: National Academies Press; 2018.

Roach M, Sauermann H. The declining interest in an academic career. PLoS ONE. 2017;12:e0184130.

Sauermann H, Roach M. Science PhD career preferences: levels, changes, and advisor encouragement. PLoS ONE. 2012;7:e36307.

St Clair R, Hutto T, MacBeth C, Newstetter W, McCarty NA, Melkers J. The “new normal”: adapting doctoral trainee career preparation for broad career paths in science. PLoS ONE. 2017;12:e0177035.

Fuhrmann CN, Halme DG, O’Sullivan PS, Lindstaedt B. Improving graduate education to support a branching career pipeline: recommendations based on a survey of doctoral students in the basic biomedical sciences. CBE—Life Sci Educ. 2011;10:239–49.

Casadevall A, Ellis LM, Davies EW, McFall-Ngai M, Fang FC. A framework for improving the quality of research in the biological sciences. mBio. 2016;7:e01256-16.

Casadevall A, Fang FC. Rigorous science: a how-to guide. mBio. 2016;7:e01902-16.

Bosch G, Casadevall A. Graduate biomedical science education needs a new philosophy. mBio. 2017;8:e01539-17.

Bosch G. Train PhD students to be thinkers not just specialists. Nature. 2018;554:277–8.

Verderame MF, Freedman VH, Kozlowski LM, McCormack WT. Competency-based assessment for the training of PhD students and early-career scientists. Elife. 2018;7:e34801.

Graziane J, Graziane N. Neuroscience Milestones: developing standardized core-competencies for Research-Based neuroscience trainees. J Neurosci. 2022;42:7332–8.

Edgar L, Roberts S, Holmboe E. Milestones 2.0: a step forward. J graduate Med Educ. 2018;10:367–9.

Kiley M, Wisker G. Threshold concepts in research education and evidence of threshold crossing. High Educ Res Dev. 2009;28:431–41.

Timmerman BC, Feldon D, Maher M, Strickland D, Gilmore J. Performance-based assessment of graduate student research skills: timing, trajectory, and potential thresholds. Stud High Educ. 2013;38:693–710.

Lachance K, Heustis RJ, Loparo JJ, Venkatesh MJ. Self-efficacy and performance of research skills among first-semester bioscience doctoral students. CBE—Life Sci Educ. 2020;19:ar28.

Heustis RJ, Venkatesh MJ, Gutlerner JL, Loparo JJ. Embedding academic and professional skills training with experimental-design chalk talks. Nat Biotechnol. 2019;37:1523–7.

Ulibarri N, Cravens AE, Cornelius M, Royalty A, Nabergoj AS. Research as design: developing creative confidence in doctoral students through design thinking. Int J Doctoral Stud. 2014;9:249–70.

Gottesman AJ, Hoskins SG. CREATE cornerstone: introduction to scientific thinking, a new course for STEM-interested freshmen, demystifies scientific thinking through analysis of scientific literature. CBE—Life Sci Educ. 2013;12:59–72.

Koenig K, Schen M, Edwards M, Bao L. Addressing STEM retention through a scientific thought and methods course. J Coll Sci Teach. 2012;41.

Freeman S, Eddy SL, McDonough M, Smith MK, Okoroafor N, Jordt H, Wenderoth MP. Active learning increases student performance in science, engineering, and mathematics. Proc Natl Acad Sci. 2014;111:8410–5.

Turner JD, Triezenberg SJ. PBL for Ph.D.: a problem-based learning approach to doctoral education in biomedical research. ASQ High Educ Brief. 2010;3:1–5.

Neufeld VR, Barrows HS. The “McMaster Philosophy”: an approach to medical education. Acad Med. 1974;49:1040–50.

Duch BJ, Groh SE, Allen DE. The power of problem-based learning: a practical “how to” for teaching undergraduate courses in any discipline. Sterling, VA: Stylus Publishing, LLC; 2001.

Wirkala C, Kuhn D. Problem-based learning in K–12 education: is it effective and how does it achieve its effects? Am Educ Res J. 2011;48:1157–86.

Norman G, Schmidt HG. The psychological basis of problem-based learning: a review of the evidence. Acad Med. 1992;67:557–65.

Handelsman J, Ebert-May D, Beichner R, Bruns P, Chang A, DeHaan R, Gentile J, Lauffer S, Stewart J, Tilghman SM. Scientific teaching. Science. 2004;304:521–2.

Brown PC, Roediger HL III, McDaniel MA. Make it stick: the science of successful learning. Cambridge, Massachusetts: The Belknap Press of Harvard University Press; 2014.

Willingham DT. Critical thinking: why is it so hard to teach? Arts Educ Policy Rev. 2008;109:21–32.

Hung W, Jonassen DH, Liu R. Problem-based learning. In: Handbook of research on educational communications and technology. Abingdon, UK: Routledge; 2008. p. 485–506.

Uluçınar U. The Effect of Problem-Based learning in Science Education on Academic Achievement: a Meta-Analytical Study. Sci Educ Int. 2023;34:72–85.

Chen C-H, Yang Y-C. Revisiting the effects of project-based learning on students’ academic achievement: a meta-analysis investigating moderators. Educational Res Rev. 2019;26:71–81.

Liu Y, Pásztor A. Effects of problem-based learning instructional intervention on critical thinking in higher education: a meta-analysis. Think Skills Creativity. 2022;45:101069.

Kalaian SA, Kasim RM, Nims JK. Effectiveness of small-group learning pedagogies in Engineering and Technology Education: a Meta-analysis. J Technol Educ. 2018;29:20–35.

Liu L, Du X, Zhang Z, Zhou J. Effect of problem-based learning in pharmacology education: a meta-analysis. Stud Educational Evaluation. 2019;60:43–58.

Dochy F, Segers M, Van den Bossche P, Gijbels D. Effects of problem-based learning: a meta-analysis. Learn instruction. 2003;13:533–68.

Azer SA. Challenges facing PBL tutors: 12 tips for successful group facilitation. Med Teach. 2005;27:676–81.

Kirkpatrick DL. Seven keys to unlock the four levels of evaluation. Perform Improv. 2006;45:5–8.

Trullàs JC, Blay C, Sarri E, Pujol R. Effectiveness of problem-based learning methodology in undergraduate medical education: a scoping review. BMC Med Educ. 2022;22:104.

Deane T, Nomme K, Jeffery E, Pollock C, Birol G. Development of the biological experimental design concept inventory (BEDCI). CBE—Life Sci Educ. 2014;13:540–51.

Sirum K, Humburg J. The experimental design ability test (EDAT). Bioscene: J Coll Biology Teach. 2011;37:8–16.

Hoskins SG, Lopatto D, Stevens LM. The CREATE approach to primary literature shifts undergraduates’ self-assessed ability to read and analyze journal articles, attitudes about science, and epistemological beliefs. CBE—Life Sci Educ. 2011;10:368–78.

Pickett CL, Corb BW, Matthews CR, Sundquist WI, Berg JM. Toward a sustainable biomedical research enterprise: finding consensus and implementing recommendations. Proc Natl Acad Sci U S A. 2015;112:10832–6.

National Research Council. Research universities and the future of America: ten breakthrough actions vital to our nation’s prosperity and security. Washington, DC: National Academies Press; 2012.

American Academy of Arts and Sciences. Restoring the Foundation: the vital role of Research in preserving the American Dream: report brief. American Academy of Arts & Sciences; 2014.


Acknowledgements

Thanks to Mary Wimmer and Drew Shiemke for many discussions over the years about PBL in the medical curriculum and examples of case studies. We thank Steve Treisenberg for initial suggestions and discussions regarding PBL effectiveness in the Van Andel Institute. Thanks to Paul and Julie Lockman for discussions about PBL in School of Pharmacy curricula and examples of case studies. Special thanks to the facilitators of the groups, Stan Hileman, Hunter Zhang, Paul Chantler, Yehenew Agazie, Saravan Kolandaivelu, Hangang Yu, Tim Eubank, William Walker, and Amanda Gatesman-Ammer. Without their considerable efforts the course could never have been successfully implemented. Thanks to the Department of Biochemistry and Molecular Medicine for supporting the development of this project. MS is the director of the Cell & Molecular Biology and Biomedical Engineering Training Program (T32 GM133369).

Funding

There was no funding available for this work.

Author information

Authors and Affiliations

Department of Biochemistry and Molecular Medicine, West Virginia University School of Medicine, Robert C. Byrd Health Sciences Center 64 Medical Center Drive, P.O. Box 9142, Morgantown, WV, 26506, USA

Michael D. Schaller, Marieta Gencheva, Michael R. Gunther & Scott A. Weed


Contributions

SW and MS developed the concept for the course. MS was responsible for creation and development of all of the content, for the implementation of the course, the design of the study and creating the first draft of the manuscript. MG, MRG and SW graded the pre- and post-test answers in a blind fashion. MS, MG, MRG and SW analyzed the data and edited the manuscript.

Corresponding author

Correspondence to Michael D. Schaller .

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethics approval and consent to participate

The West Virginia University Institutional Review Board approved the study (WVU IRB Protocol#: 2008081739). Informed consent was provided in writing and all information was collected anonymously. All methods were carried out in accordance with relevant guidelines and regulations.

Consent for publication

Not applicable.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below are the links to the electronic supplementary material.

  • Supplementary Material 1
  • Supplementary Material 2
  • Supplementary Material 3
  • Supplementary Material 4
  • Supplementary Material 5

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Schaller, M.D., Gencheva, M., Gunther, M.R. et al. Training doctoral students in critical thinking and experimental design using problem-based learning. BMC Med Educ 23 , 579 (2023). https://doi.org/10.1186/s12909-023-04569-7


Received : 04 March 2023

Accepted : 05 August 2023

Published : 16 August 2023

DOI : https://doi.org/10.1186/s12909-023-04569-7


Keywords

  • Critical thinking
  • Experimental design
  • Doctoral student



Critical Thinking in Science

Author: Daniell DiFrancesca
Level: Middle School
Content Area: General Science

Part 1: Introduction to Experimental Design

  • Part 2: The Story of Pi
  • Part 3: Experimenting with pH
  • Part 4: Water Quality
  • Part 5: Change Over Time
  • Part 6: Cells
  • Part 7: Microbiology and Infectious Disease
  • About the Author


Introduction:

Students will learn and implement experimental design vocabulary while practicing their critical thinking skills in an inquiry-based experiment. This lesson is written using the 5E Learning Model.

Learning Outcomes:

  • Students will define and apply the experimental design vocabulary.
  • Students will use the experimental design graphic organizer to plan an investigation.
  • Students will design and complete their own scientific experiment.

Curriculum Alignment:


1.01 Identify and create questions and hypotheses that can be answered through scientific investigations.

1.02 Develop appropriate experimental procedures for:

  • Given questions.
  • Student generated questions.

1.04 Analyze variables in scientific investigations:

  • Identify dependent and independent.
  • Use of a control.
  • Manipulate.
  • Describe relationships between.
  • Define operationally.

1.05 Analyze evidence to:

  • Explain observations.
  • Make inferences and predictions.
  • Develop the relationship between evidence and explanation.

1.06 Use mathematics to gather, organize, and present quantitative data resulting from scientific investigations:

  • Measurement.
  • Analysis of data.
  • Prediction models.

1.08 Use oral and written language to:

  • Communicate findings.
  • Defend conclusions of scientific investigations.
  • Describe strengths and weaknesses of claims, arguments, and/or data

Classroom Time Required:

Approximately 6 class periods (~50 minutes each) are needed; however, some activities can be assigned as homework to decrease the time spent in class.

Materials Needed:

  • Overhead transparency of Experimental Design Graphic Organizer
  • Student copies of: Experimental Design Graphic Organizer, Vocabulary Graphic Organizer, Explore worksheet, Explain worksheet
  • Supplies for experiment: small (Dixie) drinking cups (2 per student), Pepsi and Coke (~1.5 to 2 ounces per student), and possibly ice to keep the soda cold
  • Dictionaries or definition sheets for the vocabulary words

Technology Resources:

  • Overhead Projector

Pre-Activities/ Activities:

  • What are the “rules” for designing an experiment?
  • Teacher and class will discuss the following questions:
  • Is there a specific way to design an experiment? (Try to get them to mention the scientific method and discuss any “holes” in this.)
  • Are there rules scientists follow when designing an experiment?
  • Are all experiments designed the same?
  • What kinds of experiments have you done on your own? (Good things to discuss are cooking, testing sports techniques, trying to fix things, etc. Try to relate experimentation to their everyday life.)
  • Review an experiment and answer questions.
  • Students will read a description of an experiment and answer questions about the design of the experiment without using the vocabulary. (See Worksheet 1)
  • Vocabulary introduction and application
  • Students will define the experimental design vocabulary using the graphic organizer (See Worksheet 2).
  • Independent variable, Dependent variable, Control, Constant, Hypothesis, Qualitative observation, Quantitative observation, Inference (Definitions available)
  • Students will review the worksheet from the explore section and match the vocabulary to the pieces of the experiment. Review answers with the class.
  • Students will read a second experiment description and identify the pieces of the experiment using their vocabulary definitions (See Worksheet 3).
  • Introduce Experimental Design Graphic Organizer (EDGO) and complete class designed experiment.
  • The teacher should review the EXAMPLE PEPSI VS COKE EDGO for any ideas or questions
  • Use overhead projector to review the blank EDGO and complete as a class (See Worksheet 4)
  • Tell the class that you are going to do the Pepsi Coke Challenge. The question they need to answer is: Can girls taste the difference between Pepsi and Coke better than the boys?
  • As a class, plan the Pepsi versus Coke experiment. This is a good time to discuss double-blind studies and why it is important to make this a double-blind study. Students can look at the results within their own class as well as the whole team.
  • This is also a good chance to test multiple variables. You do not need to let students know this, but if the data chart also records things like age, frequency of drinking soda (daily, weekly, monthly, rarely), ability to roll the tongue, or anything else they think might be interesting, the results can be analyzed for each variable (a short illustrative tally script is shown at the end of this Activities section).
  • I removed labels from the bottles and labeled them A and B. I used a different labeling system for under the cups so the students did not see a pattern (numbered cups were Pepsi, lettered cups were Coke).
  • I recorded the data and organized the tasting while students completed other work in their seats. Two students at a time tasted the soda and I recorded data. You could also have a volunteer who is not participating help with this.
  • Check for students who do not want to drink soda as well as any dietary needs such as diet soda.
  • Do not verify guesses until all of the classes have completed the experiment.
  • How can you accurately remember the pieces of an experiment?
  • Write a poem about four of the vocabulary words.
  • Write a song about four of the vocabulary words.
  • Create a memorization tool for four of the vocabulary words.
  • Make a poster about four of the vocabulary words.

Teachers should evaluate these choices to ensure they show an understanding of the vocabulary.
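
For teachers who want a quick way to analyze the recorded variables, the short script below (not part of the original lesson) shows one way to tally the class results by a single variable, such as gender; the records are invented, and a spreadsheet or by-hand tally would work just as well.

```python
# Illustrative tally (not from the original lesson): percentage of correct
# identifications by one recorded variable, e.g. gender. Records are invented.
records = [
    {"gender": "girl", "correct": True},
    {"gender": "girl", "correct": False},
    {"gender": "girl", "correct": True},
    {"gender": "boy",  "correct": True},
    {"gender": "boy",  "correct": False},
    {"gender": "boy",  "correct": False},
]

def percent_correct(group):
    # Keep only the records for this group and compute the share identified correctly.
    rows = [r for r in records if r["gender"] == group]
    return 100 * sum(r["correct"] for r in rows) / len(rows)

for group in ("girl", "boy"):
    print(f"{group}s: {percent_correct(group):.0f}% identified the sodas correctly")
```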

Assessment:

See the Evaluate portion of the Activities section.

Modifications:

  • A different experiment can be designed in the Elaborate section.
  • The EDGO can be adapted for students with motor skill difficulties by enlarging it or by providing a version that can be typed on.
  • All basic modifications can be used for these activities.

Alternative Assessments:

  • Make necessary adjustments for different experiments.

Critical Vocabulary

  • Independent Variable- the part of the experiment that is controlled or changed by the experimenter
  • Dependent Variable- the part of the experiment that is observed or measured to gather data; changes because of the independent variable
  • Control- standard of comparison in the experiment; level of the independent variable that is left in the natural state, unchanged
  • Constant- part of the experiment that is kept the same so that it does not affect the results (the dependent variable)
  • Hypothesis- educated guess or prediction about the experimental results
  • Qualitative observation- word observations such as color or texture
  • Quantitative observation- number observations or measurements
  • Inference- attempt to explain the observations

This is the first lesson in the Critical Thinking in Science Unit. The other lessons continue using the vocabulary and Experimental Design Graphic Organizer while teaching the 8th grade content. Students are designing their own experiments to improve their ability to approach problems and questions scientifically. By developing their ability to reason through problems they are becoming critical thinkers.


Article Contents

  • Development of in-class paper-and-pencil activities focused on experimental design
  • Implementation of the activities
  • Assessment of experimental design ability
  • Identification of students’ accurate and inaccurate conceptions regarding experimental design
  • Comparing introductory students with advanced students to determine whether inaccurate conceptions persist throughout the curriculum
  • Statistical analysis
  • Conclusions
  • References cited


How Students Think about Experimental Design: Novel Conceptions Revealed by in-Class Activities


Sara E. Brownell, Mary Pat Wenderoth, Roddy Theobald, Nnadozie Okoroafor, Mikhail Koval, Scott Freeman, Cristina L. Walcher-Chevillet, Alison J. Crowe, How Students Think about Experimental Design: Novel Conceptions Revealed by in-Class Activities, BioScience , Volume 64, Issue 2, February 2014, Pages 125–137, https://doi.org/10.1093/biosci/bit016


Experimental design is a fundamental skill for scientists, but it is often not explicitly taught in large introductory biology classes. We have designed two pencil-and-paper in-class activities to increase student understanding of experimental design: an analyze activity, in which students are asked to evaluate data, and a design activity, in which students are asked to propose a novel experiment. We found that students who completed the design activity but not the analyze activity performed significantly better on the Expanded Experimental Design Ability Tool (E-EDAT) than did students who attended a didactic lecture about experimental design. By using grounded theory on student responses on the in-class activities, we have identified a novel set of accurate and inaccurate conceptions focused on two aspects of experimental design: sample size and the repetition of experiments. These findings can be used to help guide science majors through mastering the fundamental skill of designing rigorous experiments.

Experimental design is a fundamental skill, essential for achieving success in science (Coil et al. 2010 ) and gaining fluency in scientific literacy and critical thinking in general (Brewer and Smith 2011 ). However, explicit instruction and practice in experimental design is often lacking in introductory biology lecture courses because of perceived time pressures, large class sizes, and the need to emphasize content rather than skills (Dirks and Cunningham 2006 ).

Efforts have been made to develop activities to test and improve students’ skills and knowledge of experimental design (Hoefnagels 2002 , Hiebert 2007 , White et al. 2009 , Pollack 2010 , Sirum and Humburg 2011 , D’Costa and Schlueter 2013 , Kloser et al. 2013 ). In addition, students’ understanding of the significance of controls has been addressed in many studies (Shadmi 1981 , Chen and Klahr 1999 , Lin and Lehman 1999 , Boudreaux et al. 2008 , Shi et al. 2011 ). In biology, the development of a validated assessment to measure student understanding of experimental design (Sirum and Humburg 2011 ) illustrated how challenging it is for introductory-level college students to design well-controlled experiments. However, despite this body of research, there are few studies in which students’ accurate and inaccurate conceptions about experimental design have been identified (Kanari and Millar 2004 , White et al. 2009 , Colon-Berlingeri and Burrows 2011 ). Therefore, there is a need for additional studies to investigate how best to teach this topic to introductory biology students and what aspects of experimental design are particularly difficult for students to grasp.

In this Education article, we describe two pencil-and-paper in-class group activities designed to test alternative hypotheses about how best to teach experimental design in a large introductory biology classroom. We describe the relative effectiveness of these activities in the improvement of students’ experimental design ability and discuss specific accurate and inaccurate conceptions that we identified from student responses to the in-class activities. To determine whether inaccurate conceptions persist as students progress through the undergraduate biology curriculum, we have also assessed students enrolled in upper-level biology courses on their understanding of two key elements of experimental design: sample size and repeating an experiment.

Numerous studies have demonstrated that active learning approaches more effectively increase student learning than do traditional lectures (Hake 1998 , Beichner et al. 2007 , Freeman et al. 2007). However, there have been few studies on the relative effectiveness of different types of active learning approaches for helping students learn specific concepts (Eddy et al. 2013 ). Our first goal was to develop in-class activities to test alternative hypotheses for how best to teach experimental design in a large introductory biology lecture hall. On the basis of previous studies (Boudreaux et al. 2008 , Crowe et al. 2008 ), we reasoned that student understanding of experimental design could be improved by working in a group (a) to develop a hypothesis and design an experiment to test that hypothesis (the design activity; see supplemental appendices A1 and A2) or (b) to analyze and draw appropriate conclusions from mock experimental data (the analyze activity; see supplemental appendices B1 and B2). Both tasks require higher-order thinking and are considered high level on Bloom's taxonomy of cognitive domains (Bloom and Krathwohl 1956 ); however, the first activity requires synthesis-level skills, whereas the second relies on analysis and evaluation skills.

The in-class activities were based on a nontechnical scenario approachable for students taking their first biology course. Both activities prompted the students to consider the basic elements of a well-designed experiment and were structured as a series of guided prompts (Lin and Lehman 1999 ). We designed both activities with the goal of improving students’ ability to explain the importance of the elements included in an experimental design and to recognize the iterative nature of science and the tentative nature of results (Giere 2004 ). The activities were pilot tested twice with students in a large introductory biology course and revised before the final versions were administered. The data reported in this article are from the student responses to the final version of the activities.

Students enrolled in Biology 180, the first course of a three-quarter introductory biology series at a large public research university, completed the activities on the second day of class in autumn 2011. This introductory biology course is a required gateway course for all students interested in majoring in biological sciences and is focused on the topics of ecology and evolution. Typical enrollment is approximately 600–800 students, primarily sophomores. All of the students attend the same lecture period and are asked to sit with students from their lab sections. For the in-class activity, the students self-aggregated into groups of two or three on the basis of where they were sitting in the large traditional lecture hall, and the student groups were randomly assigned one of the two activities to complete in the lecture hall. While they were working on the activity, the students could request help from other groups, the instructor, or teaching assistants (TAs). One instructor and 16 TAs (approximately 40 students per TA) circulated around the room to answer questions while the students were working on the activities. Because these activities were administered on the second day of class, the TAs had only minimal experience facilitating group discussions, so they primarily responded to student-generated questions. Participation points were given for completing the activity, independent of performance. Approval for this study was obtained from the University of Washington Institutional Review Board (application no. 36743).

To measure the impact of the in-class activity on students’ ability to design an experiment, we adapted the Experimental Design Ability Tool (EDAT; Sirum and Humburg 2011 ) to create the Expanded EDAT (E-EDAT). The EDAT is an open-ended response instrument administered as a pre- and posttest in which students are asked to design an investigative strategy to address a company's claim regarding one of its products (Sirum and Humburg 2011 ). The instrument is content independent; it works particularly well for introductory biology students and nonmajors, because it does not require any technical expertise. However, in our initial administration of the EDAT, we found that several of the grading criteria did not discriminate among our students. To enhance the discrimination ability of the test, we created the E-EDAT by adding prompts to the EDAT that direct students to (a) design an experiment that would test a company's claim, (b) provide justification for each element of their research design, and (c) state whether the conclusions drawn from their proposed study could prove the company's claim (supplemental appendix C1).

To score the students’ responses on the E-EDAT, we developed an expanded scoring rubric (supplemental appendix C2) that awarded the students points for recognizing that they needed to include an experimental design element (e.g., a large sample size) and for giving an appropriate explanation of why that element was needed (e.g., to account for natural variability in the population). We were particularly interested in their reasoning, because it had previously been shown that students often understand what components are important for experimental design but do not necessarily know why they should use them (Boudreaux et al. 2008 ). We did not introduce any novel elements to the E-EDAT, but through an iterative process based on student responses on the EDAT, we altered the scoring to reflect both an inclusion of essential elements and appropriate reasoning for including those elements. Whether a response warranted partial credit or full credit for a criterion on the E-EDAT was determined through an iterative process with four different raters, who scored the E-EDAT responses independently and then held norming sessions to come to agreement on what score to give a particular response. Content validity of the final rubric was affirmed by asking three experts in biology to confirm that the scoring rubric was scientifically accurate and relevant to the understanding of experimental design. Unlike the binary system employed in the original EDAT rubric, this expanded scoring system allowed us to identify students with an intermediate understanding of a concept. The students could receive 17 points using the E-EDAT scoring rubric, as opposed to the 10 points possible on the original EDAT (Sirum and Humburg 2011 ).
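
To make the scoring scheme concrete, the sketch below shows one way such an element-plus-justification rubric could be represented and totaled in code. It is only an illustration: the element names and point values are invented, not the actual E-EDAT criteria, which are given in supplemental appendix C2.

```python
# Illustrative sketch of an element-plus-justification rubric; the element names
# and point values are hypothetical, not the actual E-EDAT criteria.
RUBRIC = {
    "large_sample_size":  {"mention": 1, "justification": 1},
    "control_group":      {"mention": 1, "justification": 1},
    "repeat_experiment":  {"mention": 1, "justification": 1},
    "cannot_prove_claim": {"mention": 1, "justification": 1},
}

def score_response(coded_response):
    """Total the points a grader awarded: partial credit is a mention without
    an appropriate justification; full credit requires both."""
    total = 0
    for element, points in RUBRIC.items():
        codes = coded_response.get(element, {})
        if codes.get("mention"):
            total += points["mention"]
        if codes.get("justification"):
            total += points["justification"]
    return total

# Example: sample size mentioned and justified, control group mentioned only -> 3 points.
example = {
    "large_sample_size": {"mention": True, "justification": True},
    "control_group": {"mention": True, "justification": False},
}
print(score_response(example))  # 3
```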

The students completed the E-EDAT online, outside of class; the pretest was completed the evening before the in-class activity, and the posttest was completed the evening after the in-class activity. Only those students who participated in the in-class activity, completed the pre- and posttests, and consented to have their data analyzed were included in the study. From a consenting population of 357 students who completed the design activity and 276 students who completed the analyze activity, we selected a random subset of pre- and posttests to score with the expanded rubric and then included only the responses from the consenting students in the final analysis ( n = 87 in the design activity group; n = 95 in the analyze activity group). All of our future references to the effectiveness of the in-class activities are based on the data from this randomly selected subset of the students’ pre- and posttests. The E-EDATs were scored blindly by two independent graders. To assess the level of agreement between the graders, we calculated the interrater reliability using Cohen's kappa coefficient for a subset of blinded responses (90 responses each on the pre- and posttests) graded independently by each grader. The interrater reliability on individual questions ranged from .54 to .89, with a reliability of .76 across all of the questions. This indicates moderately strong agreement between the two graders and illustrates that the expanded scoring rubric provides sufficient guidance to achieve reliability between independent graders.
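
For readers who want to reproduce this kind of reliability check on their own rubric scores, the following minimal sketch computes Cohen's kappa for two graders using scikit-learn. The score vectors are invented; in practice they would be the two graders' independent scores for one rubric criterion.

```python
# Minimal sketch of an interrater-reliability check; the score vectors are invented.
from sklearn.metrics import cohen_kappa_score

grader_a = [0, 1, 2, 2, 1, 0, 2, 1, 1, 0]  # hypothetical scores on one rubric criterion
grader_b = [0, 1, 2, 1, 1, 0, 2, 1, 2, 0]

kappa = cohen_kappa_score(grader_a, grader_b)
print(f"Cohen's kappa: {kappa:.2f}")
```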

We compared the experimental design ability of students who completed the in-class activities with that of students who learned experimental design through a traditional lecture by measuring learning gains on the E-EDAT. The students in all three comparison groups (i.e., design activity, analyze activity, lecture) were enrolled in Biology 180, but the students who participated in the didactic lecture took the course in a different term and with a different instructor. The students in the traditional lecture course were assigned the same readings and received instruction of equivalent length to the in-class activities on the scientific method and experimental design. The lecture included examples of experimental data that illustrated inherent variation in a population and variable outcomes from repeating an experiment with a different population. The lecture also included explicit statements that a hypothesis can never be definitively proven (one of the elements scored on the assessment). One hundred student pre- and posttest responses were randomly selected from the lecture group and scored; we compared the lecture students’ learning gains with the gains attained by the students who completed the activities.

Many terms have been used to describe student conceptions, including naive conceptions (Strike and Posner 1992 ), alternative conceptions (Mak et al. 1999 , Poehnl and Bogner 2013 ), preconceptions (Clement et al. 1989 , Ryan and Aikenhead 1992 ), misconceptions (Coley and Tanner 2012 , Yates and Marek 2013 ), and inaccurate conceptions (Zuckerman 1994 , Edens and Potter 2003 ). In this article, we will use a model of describing student conceptions as either accurate or inaccurate, defining accurate as being in accordance with what is known to be scientifically true and confirmed by a group of expert scientists. The students’ handwritten responses to questions posed on the in-class activity worksheets were transcribed and coded as accurate, inaccurate, a mixed model that was a combination of accurate and inaccurate, or too vague to determine accuracy. A response was classified as vague if it was incomplete, did not answer the question, or was so general that we could not determine whether the student held an inaccurate or accurate conception. We took a conservative approach in our analysis by removing vague answers from the data set so that we did not incorrectly infer what the students were thinking (Gormally et al. 2012 ).

We chose to focus on two aspects of experimental design that we determined were challenging for the students on the basis of their low E-EDAT scores: sample size and repeating an experiment (supplemental table S1). The majority of the students’ answers on the E-EDAT did not include any mention of sample size or repetition, which may be because of the open-ended nature of the E-EDAT, which does not contain specific prompts for students to address each of these aspects of experimental design. Alternatively, it could be because the students did not think that sample size and repeating an experiment are important elements of experimental design. The commonality between these two elements is that both sample size and repeating an experiment are relevant to one's confidence in a conclusion based on a given set of data and require an understanding of the inherent variation that exists in biological populations; they both help students understand the iterative, tentative nature of scientific results.

In order to assess the quality of the students’ conceptions, we analyzed their responses on the in-class worksheets, because the students were explicitly asked to consider sample size and repeating an experiment as they completed those worksheets (table 1 ). Grounded theory was used to identify specific conceptions—both accurate and inaccurate conceptions—that the students held about sample size and repeating an experiment from the in-class activities. Grounded theory is a process by which researchers do not hold previous ideas or hypotheses about the data; rather, the themes emerge from the data itself (Glaser and Strauss 1967 ). We decided to use this method as a way to examine the students’ responses without prior bias in order to uncover potentially novel conceptions. Two raters then scored the students’ written answers on the in-class activities for the presence of these conceptions. The raters’ agreement averaged 70%, and disagreement in coding was discussed to achieve consensus. To achieve expert validation, we asked a group of five expert biologists ( expert was defined as holding a PhD in a biology-related field) in our research group to review the list of identified accurate and inaccurate conceptions (see the tables), and they agreed with the raters’ designations. In addition, we asked a group of three outside expert biologists to confirm the designations.
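
A simple way to run the agreement check described above is to compute the percentage of responses on which the two raters assigned the same code and flag disagreements for the consensus discussion. The sketch below assumes the codes are stored as parallel lists; the labels and data are invented.

```python
# Sketch of a percent-agreement check for the conception coding; labels and data are invented.
codes_rater1 = ["accurate", "inaccurate", "mixed", "accurate", "inaccurate", "mixed"]
codes_rater2 = ["accurate", "mixed",      "mixed", "accurate", "inaccurate", "accurate"]

matches = sum(a == b for a, b in zip(codes_rater1, codes_rater2))
print(f"Agreement: {100 * matches / len(codes_rater1):.0f}%")

# List disagreements so the two raters can discuss them and reach consensus.
for i, (a, b) in enumerate(zip(codes_rater1, codes_rater2)):
    if a != b:
        print(f"Response {i}: rater 1 = {a}, rater 2 = {b}")
```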

Table 1. Prompts on each of the activities for sample size and repeating an experiment.

| Activity | Sample size | Repeating an experiment |
| --- | --- | --- |
| Design | Why did you choose that number of poppies? | Should you repeat the experiment? Why or why not? |
| Analyze | Why is sample size important? | Why was the experiment repeated? |

We also surveyed undergraduate biology majors enrolled in 400-level (senior-level) courses ( n = 122) to assess their understanding of the importance of sample size and repeating an experiment and to investigate whether they maintained the inaccurate conceptions held by the introductory students. Using an online survey, the advanced students were asked a subset of the questions from the analyze activity, because the question prompts were more direct and elicited fewer vague conceptions than did the question prompts of the design activity. These students received participation points for completing the questions regardless of the accuracy of their responses. The same two independent raters who coded the introductory student responses coded the advanced student responses, using the same set of categories described for the introductory students. Rater agreement was established to be over 70%, and disagreements in coding were discussed to achieve consensus.

As a preliminary analysis, Student's t -tests were used to compare the students’ gain scores on the E-EDAT among the three groups: the students who completed the design activity, the students who completed the analyze activity, and the students who were in the lecture course. The gains were calculated as the posttest score minus the pretest score. However, there are some differences in the characteristics of the students in the three groups (see table 2 ) that may be correlated with the test score gains. To control for these differences, we used a multiple linear regression model in which each student's test score gain was the response variable, and observable student characteristics, including gender, ethnicity, socioeconomic status, grade point average, and verbal SAT score, serve as predictor variables. This regression model produces estimates of two treatment effects: the treatment effect of the design activity relative to the lecture and the treatment effect of the analyze activity relative to the lecture. In each case, the treatment effect is the average difference in test score gains between the two groups, holding observable characteristics of students in the two groups constant.
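
The gain-score regression described above can be sketched as follows. The data file and column names (group, gpa, sat_verbal, urm, low_ses, female) are assumptions about how such a per-student data set might be organized, not the authors' actual files; the lecture group is set as the reference category so the coefficients on the design and analyze indicators estimate the two treatment effects.

```python
# Sketch of the gain-score regression; the file and column names are assumptions,
# not the authors' actual data set.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("eedat_scores.csv")              # hypothetical per-student records
df["gain"] = df["post_score"] - df["pre_score"]   # gain = posttest minus pretest

# Lecture is the reference category, so the design/analyze coefficients are the
# estimated treatment effects, holding the other student characteristics constant.
model = smf.ols(
    "gain ~ C(group, Treatment(reference='lecture')) + gpa + sat_verbal + urm + low_ses + female",
    data=df,
).fit()
print(model.summary())
```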

Table 2. Demographic information for the subset of students who completed the activities or experienced a didactic lecture and whose scores were analyzed.

| Characteristic | Design (n = 87) | Analyze (n = 95) | Lecture (n = 100) |
| --- | --- | --- | --- |
| Grade point average | 3.36* | 3.28 | 3.23 |
| SAT verbal score | 609 | 593 | 585 |
| Low socioeconomic status (percentage of the respondents, %) a | 8.0* | 12.6 | 19.0 |
| Racial or ethnic identity (%) |  |  |  |
| African American | 2.3 | 4.2 | 2.0 |
| American Indian | 1.1 | 0.0 | 3.0 |
| Asian | 33.3 | 34.7 | 39.0 |
| White | 51.7 | 43.2 | 37.0 |
| Hawaiian or Pacific Islander | 1.1 | 1.1 | 2.0 |
| Hispanic | 2.3 | 2.1 | 5.0 |
| International | 5.7 | 9.5 | 8.0 |
| No race information | 2.3 | 5.3 | 4.0 |
| Female (%) | 63.2 | 62.1 | 66.0 |

a Low socioeconomic status was measured by admission into the Equal Opportunity Program.

* p < .05.

In order to determine significant differences in the student conceptions derived from grounded theory, a chi-squared analysis was used to compare the inaccurate, mixed model, and accurate conceptions, and t -tests were used for comparing the specific student conceptions ( α = .05).
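
As an illustration of the two kinds of tests named above, the sketch below runs a chi-squared test on a contingency table of inaccurate, mixed, and accurate response counts and an independent-samples t-test on indicator variables for one specific conception. All counts and indicators are invented.

```python
# Sketch of the two analyses named above, using invented counts and indicators.
from scipy.stats import chi2_contingency, ttest_ind

# Contingency table of response categories (rows: design, analyze;
# columns: inaccurate, mixed, accurate).
table = [[7, 45, 17],
         [38, 29, 29]]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-squared = {chi2:.2f}, p = {p:.3f}")

# 0/1 indicators for whether each response contained one specific conception.
design_flags = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
analyze_flags = [1, 1, 0, 1, 1, 0, 1, 0, 1, 0]
t, p = ttest_ind(design_flags, analyze_flags)
print(f"t = {t:.2f}, p = {p:.3f}")
```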

Finding 1: An active approach leads to greater understanding of experimental design than passive lecturing does.

The introductory biology students showed gains in their ability to design an experiment after the in-class pencil-and-paper activities, independent of the type of activity, and the students who completed the design activity had significantly higher gains than did the students who attended a didactic lecture.

We found that the students who completed the design and the students who completed the analyze activity had similar learning gains when their pre- and posttests were compared ( p = .21). However, only the students who completed the design activity demonstrated significantly higher gains on the E-EDAT than the group of students who learned about experimental design through the didactic lecture ( t- test, p < .05; figure 1 ). The average score on the posttest for all of the groups was 7.6 out of 17, with no individual group scoring higher than 8.1, which indicated no ceiling effect (table S1). The learning gains could be seen in multiple criteria (table S1) and could not be attributed to one particular aspect of the E-EDAT.

Figure 1. Students who completed the design activity scored higher on the Expanded Experimental Design Ability Tool (E-EDAT) than did students who experienced only a lecture on experimental design (design activity, n = 87; analyze activity, n = 95; lecture, n = 100; design–analyze comparison, p = .210; design–lecture comparison, p = .018; analyze–lecture comparison, p = .275). The gain was calculated as the posttest score minus the pretest score. The error bars represent the standard error.

The results of the linear regression model used to control for differences in the observable characteristics of the students in the three groups (table 2 ) indicate that, holding individual student characteristics constant, the students who completed the design experiment gained almost half a point more out of 17 possible points, on average, than did the students in the lecture class, so the differences in the E-EDAT scores are likely due to the differences in activities, not to differences in the student population ( p = .022; table 3 ). The observation that the students in the lecture group did not show gains from the pretest to the posttest indicates that the process of taking the E-EDAT itself did not lead to learning.

Table 3. Multiple linear regression of gain scores against individual student characteristics.

| Variable correlated with gain score | Coefficient | Standard error | t | p |
| --- | --- | --- | --- | --- |
| Intercept | .050 | 0.831 | 0.060 | .952 |
| Grade point average | –.016 | 0.223 | –0.071 | .944 |
| SAT verbal score | .000 | 0.001 | –0.127 | .899 |
| Underrepresented minority | .441 | 0.396 | 1.112 | .267 |
| Low socioeconomic status | –.265 | 0.334 | –0.793 | .429 |
| Female | .284 | 0.182 | 1.563 | .119 |
| Design group | .498 | 0.217 | 2.298 | .022* |
| Analyze group | .248 | 0.211 | 1.175 | .241 |

Finding 2: Introductory students do not have a strong understanding of the importance of sample size and repeating an experiment.

The analysis of the introductory students’ written responses on the in-class activity worksheets revealed that the students harbored three distinct levels of understanding about sample size and repeating an experiment: accurate, inaccurate, and a combination of accurate and inaccurate conceptions (figure 2a , 2b ). If we combine the student responses that were completely inaccurate with those that contained mixed conceptions, we see that the majority of the student responses on the design activity and the analyze activity contained some inaccurate conceptions about both sample size ( design , 75.2%; analyze , 69.8%) and repeating an experiment ( design , 63.4%; analyze , 81.1%). The students’ responses revealed more inaccurate conceptions surrounding the purpose of repeating an experiment than regarding the importance of sample size for both the design (45.5%) and the analyze (52.2%) activities ( t -test, p < .01). Interestingly, the design activity responses contained significantly fewer inaccurate conceptions about sample size than did the analyze activity responses (10.2% and 39.6%, respectively; t -test, p < .01).

Figure 2. Introductory students’ conceptions of (a) sample size and (b) repeating an experiment. For responses regarding the rationale for sample size, the data are shown as percentages of the total number of student group responses (n = 69 for the design group, n = 96 for the analyze group). The student group responses for the design activity were completely inaccurate (10.2%), completely accurate (24.6%,) or a mix of accurate and inaccurate (65.2%) conceptions; the differences between these groups are statistically significant (chi-squared analysis, p < .001). The student group responses for the analyze activity were completely inaccurate (39.6%), completely accurate (30.2%), or a mix of accurate and inaccurate (30.2%). Chi-squared analysis indicates that these differences are not statistically significant. The student responses that were too vague to code have been removed. There was a statistically significant difference between the design and analyze student responses for inaccurate and mixed conceptions (Student's t-test, p < .01). For responses regarding the rationale for repeating an experiment, the data are shown as percentages of the total number of student responses (n = 112 for the design group and n = 90 for the analyze group). The student responses for the design activity were completely inaccurate (45.5%), completely accurate (36.6%), or a mix of accurate and inaccurate (17.9%). The student responses for the analyze activity were completely inaccurate (52.2%), completely accurate (18.9%), or a mix of accurate and inaccurate (28.9%). Chi-squared analysis for both the design and the analyze groups indicates that these are statistically significant differences (p < .001).

Notably, there were significantly more vague answers to the question about sample size in the design activity (47.3%) than in the analyze activity (12.7%) (data not shown). However, when students were prompted to provide reasoning for repeating an experiment, their responses contained similar percentages of vague responses in both activities ( design , 11.8%; analyze , 17.4%). Vague responses may reflect confused thinking, a misinterpretation of the question, or a low level of motivation to answer the question.

Finding 3: Novel accurate conceptions and inaccurate conceptions were identified from introductory student responses for sample size and repeating an experiment on the in-class activities.

In order to further explore what conceptions the students held about sample size and repeating an experiment, we used grounded theory to identify three distinct accurate conceptions (table 4a ) and three distinct inaccurate conceptions (table 4b ) about sample size. We also identified three distinct accurate conceptions (table 5a ) and seven distinct inaccurate conceptions (table 5b ) about repeating an experiment. Several students who completed the design activity (8.9%) stated that it was not necessary to repeat an experiment, particularly if the sample size was large enough. Because the analyze activity did not allow the students this option, we cannot conclude whether this idea is a general inaccurate conception held by introductory students or whether the nature of the design exercise led the students to this conception.

Table 4a. Introductory students’ accurate conceptions about sample size.

| Category of accurate conception | Example student response | Design activity | Analyze activity |
| --- | --- | --- | --- |
| It is better to have a larger sample size than a smaller one | “Large enough sample size to draw conclusion from” | 82.6 | 53.1* |
| Too big of a sample size is not cost effective or manageable | “Large enough, but not terribly difficult to organize/take care of” | 30.4 | 2.1* |
| A large sample size is needed because of inherent variation in a given population | “Sample size should be large in order to average out natural variation in a population” | 11.6 | 10.4 |

Note: The data are shown as percentages of the number of student responses on each in-class activity (n = 69 for the design group, n = 96 for the analyze group). Student responses that were too vague to code have been removed.

* p < .05 (Student's t-test).

Table 4b. Introductory students’ inaccurate conceptions about sample size.

| Category of inaccurate conception | Why the conception is incorrect | Example student response | Design activity | Analyze activity |
| --- | --- | --- | --- | --- |
| Larger sample size ensures randomized or controlled results | A large sample size can still be biased if only certain individuals are chosen (e.g., sampling error) | “Large sample size—randomization” | 11.6 | 16.7 |
| A larger sample size gives more accurate data | A larger sample size may yield a more accurate interpretation of the data but not necessarily more accurate data if the data collected are all outliers (e.g., sampling error) | “Larger sample size, more accurate data” | 26.1 | 15.6 |
| A larger sample size eliminates variables, chance, or outliers | A larger sample size can decrease the impact of variables and outliers but does not decrease their number | “It's a large sample size to decrease unusual data” | 37.7 | 38.5 |

Introductory students’ accurate conceptions about repeating an experiment.

| Category of accurate conceptions | Example student response | Design activity (%) | Analyze activity (%) |
| --- | --- | --- | --- |
| Repeating an experiment increases confidence in the data | "Yes, more trials will show that the experiment is replicable" | 35.7 | 43.3 |
| Repetition reduces the likelihood that an uncontrolled variable affected the results | "Yes, to account for uncontrolled variables (such as animals and insects)" | 7.1 | 3.3 |
| Repetition reduces the impact of chance or randomness on the interpretations | "To verify that nothing happened by accident to change the outcome" | 10.7 | 6.7 |
| Repetition is needed because of inherent variation in a given population | "Some poppy seeds might not be from the same gene pool" | 0.9 | 0 |

Note: The data are shown as percentages of the number of student responses on each in-class activity ( n = 112 for the design group, n = 90 for the analyze group). Student responses that were too vague to code have been removed.

Introductory students’ inaccurate conceptions about repeating an experiment.

| Category of inaccurate conceptions | Why the conception is incorrect | Example student response | Design activity (%) | Analyze activity (%) |
| --- | --- | --- | --- | --- |
| It is not necessary to repeat an experiment | Experiments need to be repeated | "No, the sample size should account for any differences" | 8.9* | n/a |
| Repeat to increase sample size | Repeating an experiment gives a replicate, not a larger sample size | "Repeated to create a larger sample size" | 9.8 | 12.2 |
| Repeat to change a variable | When repeating an experiment, all variables should remain constant | "To see how results will vary with diff[erent] variables" | 5.4 | 6.7 |
| Repeat only to avoid making errors | This is not the only reason one would repeat an experiment | "Repeated to reduce effects of making mistakes" | 8.0 | 10.0 |
| Repeat to eliminate outliers, chance, or variation | Repeating an experiment can decrease the impact of variables and outliers but does not decrease their number | "To eliminate the possibility of an anomaly" | 17.0 | 22.2 |
| Repeat to make data—not the interpretation—more accurate | Repeating an experiment may give a more accurate interpretation of the data but not necessarily more accurate data if there were an uncontrolled variable affecting the accuracy | "To make the results more accurate" | 17.9 | 28.9 |
| Repeat to make certain or prove that the findings are correct (overstating the claim of what a repeated experiment could tell them) | Too absolute; you cannot prove a hypothesis | "To ensure the validity of the results" | 18.8 | 28.9 |

Finding 4: Some inaccurate conceptions are “sticky.”

The advanced students held fewer inaccurate and more accurate conceptions than did the introductory students who completed the analyze activity, but over a third of the advanced students continued to harbor inaccurate conceptions.

We surveyed the advanced biology majors’ understanding of experimental design by asking them the same questions about sample size and repeating an experiment that were included in the analyze activity. We found that the advanced students held significantly more accurate conceptions (advanced, 57.3%; introductory, 30.2%) and fewer inaccurate conceptions about sample size (advanced, 14.5%; introductory, 39.6%) than did the introductory students who completed the analyze activity (ps < .05; figure 3). Similarly, in describing their reasoning for repeating an experiment, the advanced students held significantly more accurate conceptions (advanced, 56.1%; introductory, 18.9%) and fewer inaccurate conceptions (advanced, 9.6%; introductory, 52.2%) than did the introductory students (ps < .05).

Figure 3. Advanced students’ conceptions of (a) sample size and (b) repeating an experiment. For sample size, the data are shown as percentages of the total number of student responses (n = 110). The advanced student responses were completely inaccurate (14.5%), completely accurate (57.3%), or a mix of accurate and inaccurate (28.2%). Compared with the introductory students, who completed the analyze activity, the advanced students had significantly more accurate conceptions and fewer inaccurate conceptions (chi-squared analysis, p < .01). Student responses that were too vague to code have been removed. For repeating an experiment, the data are shown as percentages of the total number of student responses (n = 114). The advanced student responses were completely inaccurate (9.6%), completely accurate (56.1%), or a mix of accurate and inaccurate (34.2%). The advanced students had significantly more accurate conceptions than inaccurate conceptions (chi-squared analysis, p < .01). The introductory student group is the same as the analyze group (see figure 2).

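The chi-squared comparisons in the caption contrast the distribution of completely accurate, mixed, and completely inaccurate responses between the two groups. A minimal sketch of that kind of test, using counts reconstructed approximately from the sample-size percentages reported in the caption and in the preceding paragraph (advanced, n = 110; introductory analyze group, n = 96), might look like the following; the exact contingency table the authors used is not reproduced here, so treat the numbers as illustrative.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: advanced vs. introductory (analyze) students; columns: completely accurate,
# mixed, completely inaccurate. Counts are approximate reconstructions from the
# reported percentages, not the authors' original table.
table = np.array([
    [63, 31, 16],   # advanced, n = 110: 57.3% accurate, 28.2% mixed, 14.5% inaccurate
    [29, 29, 38],   # introductory, n = 96: 30.2% accurate, ~30.2% mixed, 39.6% inaccurate
])

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.1f}, df = {dof}, p = {p:.3g}")
```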

A majority of the advanced students’ responses were completely accurate about both sample size (57.3%) and repeating an experiment (56.1%). However, when we combine the responses that were completely inaccurate with those containing a mix of conceptions, a notable percentage of the advanced students’ responses still contained inaccurate conceptions (42.7% for sample size, 43.8% for repeating an experiment), which indicates that these conceptions about experimental design persist even among students who have nearly completed the undergraduate biology curriculum.

We found that the advanced students held more accurate conceptions (table 6a) and fewer inaccurate conceptions (table 6b) about sample size than did their introductory-level counterparts. Specifically, significantly more advanced student responses (77.3%) than introductory student responses (53.1%) exhibited the accurate conception that a larger sample size is good (p < .05; table 6a). However, there was no significant difference in the advanced and introductory students’ recognition that too big of a sample size is not cost effective (advanced, 3.6%; introductory, 2.1%). Importantly, we did not observe a statistically significant difference between the advanced and introductory students’ recognition that sample size is important because of inherent natural biological diversity (advanced, 20%; introductory, 10.4%), the primary reason that most biologists would give for including a large sample size.

Advanced students’ accurate conceptions about sample size.

| Category of accurate conceptions | Percentage of responses |
| --- | --- |
| A larger sample size is good | 77.3* |
| Too big of a sample size is not cost effective or manageable | 3.6 |
| A large sample size is needed because of inherent variation in a given population | 20.0 |

Note: The data are shown as percentages of the total number of advanced student responses ( n = 110). Student responses that were too vague to code have been removed. The advanced student responses were compared (using Student's t -tests) with the introductory student group responses on the analyze activity (see table 3a ).

Advanced students’ inaccurate conceptions about sample size.

| Category of inaccurate conceptions | Percentage of responses |
| --- | --- |
| Larger sample size gives randomized or controlled results | 2.7* |
| A larger sample size gives more accurate data | 31.8* |
| A larger sample size eliminates variables, chance, or outliers | 4.5* |

Note: The data are shown as percentages of the total number of advanced student responses ( n = 110; see table 3a ). Student responses that were too vague to code have been removed. The advanced student responses were compared (using Student's t -tests) with the introductory student group responses on the analyze activity (see table 3a ).

In general, the advanced students held fewer inaccurate conceptions about sample size than did the introductory students (table 6b ); however, the inaccurate idea that a larger sample size gives more accurate data was more common in the advanced group (advanced, 31.8%; introductory, 15.6%).

The advanced students held more accurate conceptions and fewer inaccurate conceptions than did the introductory students (tables 7a, 7b) regarding the importance of repeating an experiment. A significantly higher percentage of the advanced students correctly stated that it is important to repeat an experiment to reduce the likelihood that an uncontrolled variable has affected the results (advanced, 21.9%; introductory, 3.3%) or to reduce the impact of chance or randomness on the interpretation of the results (advanced, 36.8%; introductory, 6.7%; ps < .05). However, the advanced students and the introductory students were equally likely to recognize that repeating an experiment and getting a reproducible result would increase confidence in the data (advanced, 52.6%; introductory, 43.3%; table 7a). Significantly fewer of the advanced students held the inaccurate conception that you repeat an experiment to eliminate outliers or to account for chance (advanced, 0.9%; introductory, 22.2%), to make the data more accurate (advanced, 14.9%; introductory, 28.9%), or to make certain that the findings are correct (advanced, 13.2%; introductory, 28.9%; ps < .05; table 7b).

Advanced students’ accurate conceptions about repeating an experiment.

| Category of accurate conceptions | Advanced students (%) |
| --- | --- |
| Reproducibility increases confidence in data | 52.6 |
| Reduce likelihood that uncontrolled variable affected results | 21.9* |
| Reduce impact of chance or randomness on interpretations | 36.8* |

Note: The data are shown as percentages of the total number of advanced student responses ( n = 114; see table 4a ). Student responses that were too vague to code have been removed. The advanced student responses were compared (using Student's t -tests) with the introductory student group responses on the analyze activity (table 4a ).

Advanced students’ inaccurate conceptions about repeating an experiment.

| Category of inaccurate conceptions | Advanced students (%) |
| --- | --- |
| Repeat an experiment to increase sample size | 10.5 |
| Repeat to change a variable | 1.8 |
| Repeat to avoid making errors | 9.6 |
| Repeat to eliminate outliers, chance, or variation | 0.9* |
| Repeat to make data more accurate | 14.9* |
| Repeat to make certain that findings are correct | 13.2* |

Note: The data are shown as percentages of the number of advanced student responses ( n = 114; see table 4b ). Student responses that were too vague to code have been removed. The advanced student responses were compared (using Student's t -tests) with the introductory student group responses on the analyze activity (table 4a ).

We originally set out to identify which activity would lead to higher learning gains, but the finding that both in-class activities were beneficial for improving students’ experimental design ability may be even more interesting. It suggests that there may not be one “right” way to teach a skill as complex as experimental design. Because the students completed these activities in groups, we cannot disaggregate their responses by demographic characteristics for this study. However, it would be an interesting area for future research to see whether certain students learn better with an analysis or evaluation task than with a synthesis-level task.

Although we found no significant differences in E-EDAT score gains between the two activities, we did observe a difference of 0.5 point between the design activity and the passive lecture. Although this number is small, it is statistically significant, and we believe that it demonstrates a meaningful improvement. We were not expecting a large difference between the pre- and posttest scores for an intervention of only 30 minutes, and, given the time spent on the task, this gain is similar to what has previously been reported (Sirum and Humburg 2011 ).

There are at least two possible explanations for why only the design activity was significantly more effective than the lecture: There may be closer alignment of the design task with the assessment instrument, or the students’ ability to apply concepts to new situations may increase after completing a synthesis-level activity. The development of additional validated tools to assess student understanding of experimental design would allow us to differentiate these possibilities. The students completed either the design activity or the analyze activity in this study, but a possible area for future exploration would be to determine whether there could be a synergistic effect resulting from students completing both activities. In particular, it may be interesting to investigate whether there is an order effect, such that we see higher gains in students who complete the analyze activity before the design activity or vice versa.

We also uncovered several novel accurate and inaccurate student conceptions about experimental design. The structure of the two in-class activities probably affected which accurate and inaccurate conceptions were identified. The increased number of vague responses on the design activity, for example, may be a result of the more open-ended nature of the questions on that activity. Interestingly, the students in the design activity group were much more likely to consider the advantages of a large sample size, as well as the cost and logistical challenges associated with using a large sample size; the design activity was better than the analyze activity at eliciting these accurate conceptions concerning sample size. Furthermore, the design activity responses contained significantly fewer inaccurate conceptions about sample size than did the analyze activity responses. This suggests that the process of interpreting data may reveal more inaccuracies in students’ ways of thinking than does the act of designing an experiment, which indicates that the analyze activity may be more effective at eliciting inaccurate student ideas. Alternatively, designing an experiment may help move students toward more accurate conceptions. Questions that prompted students to consider the purpose of repeating an experiment elicited significantly more inaccurate conceptions than did those focused on sample size. This may be because of the difficulty of the topic, the wording of the prompt, or because students are not often asked to consider why they should repeat an experiment (e.g., in “cookbook” lab courses).

Much of the research focused on gaining insight into student understanding of biological concepts has generally relied heavily on identifying student misconceptions (Nelson 2008 ), which are defined as scientifically inaccurate ideas. Although the identification of misconceptions has been valuable for the biology education community, recent findings indicate that students’ misunderstandings of concepts cannot simply be described as misconceptions; rather, there is a continuum of student understanding known as a learning progression (Alonzo and Gotwals 2012 ). Learning progressions are research-based models of how core ideas are formed over time, often focused on students’ ways of thinking (Songer et al. 2009 , Duncan and Rivet 2013 ). Learners develop their understanding of complex biological concepts in stages that build on each other. It has been shown that scientifically inaccurate answers may be useful for students at early stages of learning, allowing them to partially understand a topic (Duncan and Rivet 2013 ). Although there are a few studies in which inaccurate but possibly productive student conceptions related to experimental design have been identified (Kanari and Millar 2004 , White et al. 2009 , Colon-Berlingeri and Burrows 2011 ), we currently lack a learning progression for undergraduate biology majors’ understanding of experimental design, and we hope that findings from our study can help move the field toward this goal.

Specifically, our investigation of the differences between introductory and advanced students could be useful for developing a learning progression. Overall, the advanced students held significantly more accurate conceptions and fewer inaccurate conceptions for sample size and repeating an experiment than did the introductory students. There are a few surprising observations about the differences between the advanced and introductory students’ ideas concerning the significance of sample size and repeating an experiment. First, the advanced students were more likely than the introductory students to hold the inaccurate conception that a larger sample size gives more accurate data (table 6b). A possible reason for this is that the advanced students were actually thinking correctly about how a larger sample size could lead to a more accurate interpretation of the results but simply used the phrase “more accurate data” to convey the idea that a larger sample size will lead to increased confidence in one's interpretation of the data. Although this difference in language is subtle, it is important and may not be clear to these students. Intriguingly, this inaccurate idea of “more accurate data” could be viewed as a productive misunderstanding in a student learning progression of experimental design (Duncan and Rivet 2013). As students move from not thinking about how sample size affects the quality of the data to thinking that a larger sample size often leads to a more accurate interpretation of those data, perhaps thinking incorrectly about accurate data is an indication that the students are on the path to building a deeper understanding.

Although the advanced students were more likely to provide accurate justifications for experimental design elements, we think that this may be primarily because of an increased proficiency with statistics and may not reflect an improved understanding of how biological variation influences experimental design. The vast majority of the introductory and advanced students who mentioned anything related to variation discussed “unique individuals” or “mutants,” as opposed to the variation that exists on a natural continuum (data not shown). Although we do not have an explanation for why these biology students did not think about inherent variation in a population, our study suggests that this concept may need to be taught more explicitly in the biology classroom.

Although it does not provide a complete picture, this study is an important first step toward revealing the types of conceptions that students hold about experimental design. The underlying reasons for which students hold these inaccurate conceptions remain to be explored. Is it a result of how we teach experimental design in lab courses—with a very small sample size and typically never repeating experiments? Do students not understand inherent variability in the population because we often present the data as averages, and undergraduate students rarely see raw data? Do students understand the underlying biological principles but have linguistic difficulties describing precise aspects of experimental design (e.g., the difference between “decreasing the effect of outliers” and “decreasing outliers”) that make their responses inaccurate? Future directions for this research include developing research tools that target these specific inaccurate conceptions and using think-alouds and interviews to more deeply probe student understanding.

It is possible that the present results underrepresent the extent to which students hold inaccurate conceptions about experimental design. We relied on the students to come up with an inaccurate conception, as opposed to asking a specific question about the inaccurate conception, so it is possible that some of the students may have had inaccurate conceptions that they did not write down. The students also worked in groups, so they had the opportunity to discuss their responses with each other. They were able to ask the TAs and the instructor questions about the activities during the class session, so, perhaps, some of their inaccurate conceptions were clarified in class. Although our work is an important first exploration into possible conceptions that students may harbor, more work needs to be done to determine how prevalent these conceptions are for undergraduates.

There are a few limitations to this study that necessitate caution in generalizing the results. First, different instructors taught the lecture course and the course in which the activities were administered, so an instructor effect could have contributed to the difference we saw between the design activity and lecture students on the E-EDAT. Next, we collected data from the introductory students through in-class, handwritten worksheets, whereas we collected data from the advanced students through online questions. The questions were identical, but the method of delivery was different, which could have influenced the results. Finally, although we anticipate that we would obtain similar results with a different population of students, especially because we controlled for student ability in our regression model, collecting only one set of data is a limitation, albeit one that is not uncommon in educational research.

In this article, we have presented two in-class activities that instructors can use to teach experimental design, a modified EDAT (E-EDAT), and a rubric to assess students’ ability to design an experiment and justify their reasoning. This study also provides novel insight into how students think about specific elements of experimental design, which could be the basis for building a learning progression of undergraduate thinking about experimental design. Much work still needs to be done before we can begin to model what the learning progression may be, but we believe that this study is an important step in our own learning progression of understanding student thinking about experimental design.

This work was supported in part by National Science Foundation grant no. DUE-0942215, awarded to AJC and MPW, and a Washington Research Foundation-Hall fellowship awarded to CLW-C. We would like to thank the University of Washington (UW) students who participated in the study and the UW faculty members who were supportive of these efforts. Special thanks go to John Parks for his invaluable support administering the activities, the UW Biology Education Research Group for helpful discussions, and Sarah Eddy for statistical advice.

CREATE Cornerstone: Introduction to Scientific Thinking, a New Course for STEM-Interested Freshmen, Demystifies Scientific Thinking through Analysis of Scientific Literature

  • Alan J. Gottesman
  • Sally G. Hoskins


Address correspondence to: Sally G. Hoskins (E-mail Address: [email protected]).

The Consider, Read, Elucidate hypotheses, Analyze and interpret data, Think of the next Experiment (CREATE) strategy for teaching and learning uses intensive analysis of primary literature to improve students’ critical-thinking and content integration abilities, as well as their self-rated science attitudes, understanding, and confidence. CREATE also supports maturation of undergraduates’ epistemological beliefs about science. This approach, originally tested with upper-level students, has been adapted in Introduction to Scientific Thinking, a new course for freshmen. Results from this course's initial semesters indicate that freshmen in a one-semester introductory course that uses a narrowly focused set of readings to promote development of analytical skills made significant gains in critical-thinking and experimental design abilities. Students also reported significant gains in their ability to think scientifically and understand primary literature. Their perceptions and understanding of science improved, and multiple aspects of their epistemological beliefs about science gained sophistication. The course has no laboratory component, is relatively inexpensive to run, and could be adapted to any area of scientific study.

INTRODUCTION

We think a significant number of students lose interest in studying science early in their college careers, because many science curricula do not promote open-ended discussion, critical analysis, and creative study design—activities that characterize science as it is practiced. We thought that one way to attract and retain students who might be considering science studies would be to give them an opportunity to develop their reading and analytical skills and gain a realistic sense of scientific thinking as soon as they started college. A Consider, Read, Elucidate hypotheses, Analyze and interpret data, Think of the next Experiment (CREATE)-based course focused on scientific thinking, using a novel selection of readings whose analysis did not require years of content mastery, would, in principle, give freshmen a chance to engage deeply in activities characteristic of actual science practice. We hypothesized that such an experience could have a positive influence on students’ scientific abilities, their attitudes toward science, and their understanding of the research process early in their academic careers. To test this idea, we developed a new elective, Biology 10050: Introduction to Scientific Thinking.

Biology 10050 was developed as an adaptation of an upper-level course, Biology 35500: Analysis of Scientific Literature with CREATE. That course, offered at City College of New York (CCNY) since 2004, aims to demystify and humanize science through intensive analysis of primary literature. In Biology 35500, "modules"—sets of journal articles published sequentially from single laboratories—are the focus for an intensive elective. Students learn a new set of pedagogical approaches, including concept mapping, cartooning of methodology, figure annotation, use of templates to parse experimental logic, and design of follow-up studies (Hoskins and Stevens, 2009; Hoskins, 2010b). These methods are applied first to an article from the popular press and then in the analysis of a series of primary literature papers that follow a particular scientific question (e.g., "How do axons find their targets in the embryo?," "How is axis polarity maintained during regeneration?"). By examining module articles in a stepwise manner, we develop a "lab meeting" atmosphere in the class, with experimental findings discussed as if they had been generated gradually by the students themselves. Within individual articles, every figure or table is analyzed with the recognition that each specific question asked creates a data subset that contributes to the major finding of the paper.

In CREATE class sessions, multiple aspects of study design are scrutinized closely as we work backward from data in each figure and table to reconstruct details of the particular experiment that generated those data before we analyze the findings. In the process of examining specific experiments and their outcomes, we repeatedly consider questions fundamental to much research (e.g., "What is n?," "How was the sample selected?," "What controls were done and what did each control for?," "How do the methods work?," "What is the basis of 'specificity' in staining, binding, or expression?," "How convincing are the data?"). In addressing such questions, students gain insight into the design and interpretation of research beyond the individual study under analysis. Because methods are examined in terms of fundamental biological and chemical properties (e.g., "What makes antibodies 'specific'?," "Do antibody probes bind the same way that riboprobes do?," "How can you tell whether a particular stem cell undergoes division after injury to an organism?"), students review fundamental content from previous course work in a new context. By considering "evolution of methodology" (e.g., differential screening of cDNA libraries vs. gene chip analysis vs. RNAseq approaches; gene knockout vs. RNA interference), students become aware of the pace of technique development and how the range of tools available may influence the nature of questions asked. In this way, Biology 35500, the original CREATE course, involves both close analysis of papers presented in their original sequence as an individual "module" and consideration of broader nature-of-science issues. For example, discussion centered on the fact that what used to be considered "junk" DNA is now recognized as having a key role in microRNA pathways illustrates the malleability of scientific knowledge.

After completing analysis of each paper, and before moving to the next paper in the series, students create their own follow-up experiments, thereby building experimental design skills, as well as awareness that a given study could, in principle, move forward in a variety of ways. Students’ proposed follow-ups are vetted in a grant panel exercise designed to mimic activities of bona fide panels (see Hoskins et al ., 2007 ). In turn, these sessions lead to discussion focused on broader scientific issues, including interlaboratory competition, peer review, and the factors that might influence principal investigator (PI) decisions about what direction to take next.

Late in the semester, students, as a class, develop a list of 10–12 questions for paper authors. These are emailed as a single survey to each author (PIs, postdocs, graduate students). Many authors reply with thoughtful comments about their own paths to science, their motivations, and their lives beyond the laboratory. Discussion of authors’ varied responses complements the in-class data analysis with insight into the lives and motivations of "the people behind the papers."

Our upper-level course led to gains in students’ content integration and critical-thinking ability, as well as in their self-assessed learning gains ( Hoskins et al ., 2007 ). We also found that undergraduates’ self-assessed science abilities, attitudes, and epistemological beliefs changed during the CREATE semester ( Hoskins et al ., 2011 ). Upper-level students’ postcourse interviews (see Tables 1 and S1 in Hoskins et al ., 2007 ), as well as conversations with alumni of Biology 35500 (“You have to do a version of this for freshmen—it changed how I read everything” and “If I had known sooner that research wasn't boring, I might have joined an undergrad research program”) inspired us to consider adapting upper-level CREATE for freshmen.

A related motivation for developing the CREATE Cornerstone course was that the biology department at CCNY, like its counterparts elsewhere, loses many would-be majors during the early years of the biology curriculum. Some students who start with the intention of declaring a biology major do not follow through. Others who do choose biology later change majors and leave science altogether, with multiple factors likely playing a role. Students may be poorly prepared for college-level science, feel overwhelmed by the amount of new information covered in the introductory-level courses ( Seymour and Hewitt, 1997 ), or be discouraged by textbooks’ depiction of biology as a largely descriptive science ( Duncan et al ., 2011 ). Nationwide, some students get the impression from the laboratory components of introductory biology, chemistry, or physics classes that lab work is routine, predictable, and boring.

We felt that a CREATE Cornerstone course focused on scientific thinking could support and build students’ science interest at an early phase of their academic careers. In part, adapting upper-level CREATE for freshmen might benefit students by teaching them a variety of techniques (the CREATE toolkit; Hoskins and Stevens, 2009 ) that make complex material more accessible and understandable. At the same time, the course seeks to provide students with an inside look at the workings of real-world biology research labs and the diversity and creativity of the scientists who work in them. We hypothesized that students in such a course would become more adept at thinking critically about scientific material and at designing and interpreting experiments—key strategic foci of the CREATE approach. In addition, we hypothesized that students would gain in their abilities to critically analyze scientific writing, deepen their understanding of the nature of science, and develop more mature epistemological beliefs about scientific knowledge. We also suspected that some students who had not considered careers in research, or others who had but quickly rejected the idea, would consider research more positively as their college education progressed.

Introduction to Scientific Thinking is a three-credit, one-semester elective for first-year college students with a declared interest in science, technology, engineering, and math (STEM) disciplines at the CCNY, a minority-serving institution. The course meets twice-weekly for 75 min/session, and on our campus is taken before the introductory-level courses in any of the basic sciences. The goal is to develop the science-related reading and analytical skills of freshmen by using the CREATE strategy to critically evaluate a number of recent and ongoing research studies. Ideally, the experience should also encourage students to persist in STEM disciplines, participate in undergraduate research experiences (UREs) in later years, and consider research as a career choice.

At CCNY, first-year students cannot declare a biology major. The course is thus aimed at presumptive biology majors and in principle could be taken concomitantly with the standard introductory biology (or other science) course. On campuses where students can or must declare a major in the first year, this course would be appropriate for students who evince interest in biology studies. The data reported here address changes in Biology 10050 students’ critical-thinking/experimental design abilities and in their attitudes and beliefs about science. The question of student persistence in STEM and participation in undergraduate research projects will be tracked in upcoming semesters.

METHODS AND ASSESSMENT TOOLS

Participants in this study were first-year students at CCNY who enrolled in the semester-long Biology 10050: Introduction to Scientific Thinking course during Fall 2011 and Spring 2012. In each semester, at the first class session, students were invited to participate anonymously in our study on a voluntary basis that had no bearing on class grade. Precourse data were collected during the first few classes and postcourse data in the final class session of the semester. All participating students were asked to devise a “secret code” number known only to them and to use this code on all surveys. Identifying surveys in this way allowed us to compare individual and group scores pre- and postcourse, while preserving student anonymity ( Hoskins et al ., 2007 ).

Critical Thinking Assessment Test (CAT).

Students in the Fall cohort of Biology 10050 completed the CAT ( Stein et al ., 2012 ). In the CAT, which is a reliable and valid test of critical thinking, students spent 1 h reading a number of informational passages and writing responses to a variety of prompts asking them to evaluate the information and draw conclusions. The same test was taken again at the end of the semester. The CAT tests were graded and analyzed statistically (Student's t test) by a scoring team at Tennessee Tech University, where this survey was created.

Experimental Design Ability Test (EDAT).

Students in both cohorts of Biology 10050 also completed the EDAT, the reliability and validity of which have been established by the EDAT developers ( Sirum and Humburg, 2011 ). In the EDAT, students were presented with a claim and challenged to “provide details of an investigative design” and indicate the evidence that would help them decide whether to accept the claim. Students were given 15 min to respond to a written prompt that described the assertion. Precourse and postcourse versions of the EDAT present different scenarios. Precourse, students read a paragraph presenting the claim that the herb ginseng enhances endurance; postcourse, the selected text alleged that iron supplements boost memory. The EDAT survey was scored separately by two investigators following the scoring rubric created and explained in Sirum and Humburg (2011 ). After the individual scoring, any discrepancies were discussed and reconciled. Tests for statistical significance were performed using the Wilcoxon signed-rank test ( http://vassarstats.net/index.html; Arora and Malhan, 2010 ). Effect sizes ( Cohen, 1992 ; Coe, 2002 ) were also determined.
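
Because each student's precourse and postcourse EDAT responses are matched through the self-chosen code, the scores form paired samples. A minimal sketch of the reported analysis, assuming paired scores and using Cohen's d computed on the paired differences as the effect size (the article cites Cohen, 1992, and Coe, 2002, but the exact effect-size formula is not spelled out here), is shown below with hypothetical scores.

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical paired EDAT scores (one entry per student, matched pre/post by the
# student's secret code); illustrative only, not the study's data.
pre  = np.array([4, 5, 3, 6, 5, 4, 7, 5, 6, 4, 3, 6])
post = np.array([6, 6, 4, 7, 7, 5, 8, 6, 7, 6, 5, 7])

stat, p = wilcoxon(post, pre)                 # paired, two-sided Wilcoxon signed-rank test
diff = post - pre
cohens_d = diff.mean() / diff.std(ddof=1)     # effect size on the paired differences (d_z)

print(f"Wilcoxon W = {stat:.1f}, p = {p:.3f}, d = {cohens_d:.2f}")
```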

Survey of Student Self-Rated Abilities, Attitudes, and Beliefs (SAAB).

To investigate students’ reactions to the CREATE course, we asked them to complete the SAAB. In this Likert-style survey, students reported their degree of agreement on a 5-point scale (range: strongly disagree to strongly agree) with a series of statements concerning their attitudes, self-rated abilities, and beliefs about analyzing scientific literature; the research process; the nature of scientific knowledge; and scientists and their motivations. The surveys were identical precourse and postcourse and used statements whose derivation is described in Hoskins et al. (2011). Students were given 20 min to complete the survey. For statistical analysis, all response scores were aggregated into their appropriate categories (see Supplemental Material for derivation of categories), and changes from precourse to postcourse were analyzed for statistical significance using the Wilcoxon signed-rank test. Because these data and those of the EDAT are ordinal and noncontinuous (a score of "4" is not twice as good as a score of "2," for example), the signed-rank test was deemed an appropriate analytical tool (Arora and Malhan, 2010).
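
Because the SAAB statements are pooled into categories before testing, the precourse-to-postcourse comparison runs on per-student category scores rather than on individual items. The sketch below illustrates that workflow with hypothetical 5-point Likert responses for a single category; the item-to-category assignment and the choice to sum (rather than average) the items are assumptions made for illustration, not details taken from the study.

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical Likert responses (1-5) for one SAAB category: rows are students,
# columns are the statements assigned to that category. Illustrative data only.
pre_items = np.array([
    [3, 2, 4], [2, 3, 3], [4, 3, 3], [3, 3, 2], [2, 2, 3], [3, 4, 3],
    [2, 3, 2], [4, 4, 3], [3, 3, 3], [2, 4, 3], [3, 2, 2], [4, 3, 4],
])
post_items = np.array([
    [4, 4, 4], [3, 4, 4], [4, 4, 5], [4, 3, 3], [3, 3, 4], [3, 4, 4],
    [3, 3, 3], [5, 4, 4], [4, 4, 3], [3, 4, 4], [3, 3, 3], [5, 4, 4],
])

# Aggregate each student's responses within the category, then test the paired shift.
pre_scores = pre_items.sum(axis=1)
post_scores = post_items.sum(axis=1)
stat, p = wilcoxon(post_scores, pre_scores)
print(f"Wilcoxon W = {stat:.1f}, p = {p:.3f}")
```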

The SAAB data for the Biology 10050 class include pooled results from the Fall and Spring sections (18 and 13 participating students, respectively). Data collected using the same survey, administered in the same manner, were also obtained from one contemporaneous section of the upper-level CREATE course (Biology 35500, 21 students; two meetings per week for 100 min/session). Additionally, the SAAB survey was administered to volunteers in a course in Organismic Biology (general physiology, 23 students; one 100-min lecture and one 3.5-h lab session/wk), none of whom had taken a CREATE class. This group was not a matched-control population (students were not freshmen). Rather, data from this cohort of students provided insight into potential changes in attitudes, abilities, and epistemological beliefs that might happen naturally during the course of a semester in a non-CREATE science class. The CREATE classes were taught by the same instructor (S.G.H.); the Organismic Biology class was taught by a colleague not otherwise involved in this study. Both instructors were experienced at teaching their respective courses.

Student Comments on Author Emails.

To gain insight into students’ reactions to author email responses, we assigned students to read and annotate the responses as they were received. Students included the responses in their notebooks/portfolios, with marginal notes indicating which aspects of each response they found most surprising and/or interesting. In the Spring session of Biology 10050, we included a question on a late-semester (in-class, open-book) exam, asking students whether the emails changed their ideas about science research or scientists. We compiled responses and analyzed them for repeated themes.

Student Participation.

The CREATE study was approved by CUNY Institutional Review Board (Exemption category 1 and 2). Of the students in Bio 10050, 69% were female and 59% were members of minority groups currently underrepresented in academic science. Students were invited, in week 1 of class, to anonymously participate in an education study with the goal of “improving undergraduate education in science.” Participation was optional and the instructor noted that student participation or nonparticipation had no bearing on course grade or any other relationships with CCNY. There were no points or extra credit awarded for participation. We think that students who participated were motivated by the chance to take part in a science education study and/or to be part of a scientific experiment.

CURRICULAR DESIGN

Adapting CREATE for Freshmen.

In the original (upper-level) CREATE course, the class studied, sequentially, a series of papers published by a single lab that tracked the development of understanding in a particular field of scientific inquiry (e.g., how embryonic retinal axons find their targets in the brain; how planaria maintain positional information during regeneration). For the freshmen, we changed the types of articles studied, using popular press articles and a wider range of scientific literature, but applied the same overall CREATE teaching/learning strategies. The freshmen initially read and analyzed numerous popular press stories based on journal articles. We also read a variety of newspaper and magazine pieces describing scientific investigations or researchers. These warm-up exercises, used more extensively for the freshmen than in upper-level CREATE, started students toward developing the skills they would need for reading and analyzing primary literature later in the semester. All the readings (in all CREATE courses) are actual texts as originally published. In some cases, we read only parts of papers, but we did not rewrite or simplify any of the material. The freshmen ultimately read a pair of papers published in sequence that addressed a subject—the ability of infants to recognize and judge the social actions of others—related to a number of the shorter readings.

Toward the end of the semester, the freshmen, as a class, composed a list of 10–12 questions about the studies we had read, “research life,” and the researchers themselves. These questions were emailed as a single survey to each paper's authors, with a cover letter explaining our approach and inviting a response. This key strategic component of CREATE courses seeks to shift students’ often-negative preconceptions about what research/researchers/research careers are like. Many of the scientist-authors responded with comprehensive answers related to their personal and professional lives, their contributions to the work that we studied, and their scientific experiences as their careers developed. The generosity of authors in preparing thoughtful responses is especially valuable and memorable, according to our students.

CREATE Cornerstone Objectives and Selected Exercises

Students learned to use CREATE tools, including concept mapping, paraphrasing, cartooning, annotating figures, applying templates to parse experimental logic, designing follow-up experiments, and participating in grant panels ( Hoskins and Stevens, 2009 ). The CREATE techniques aim to sharpen students’ analytical skills and build critical-reading habits that can be used in new situations. These approaches also build students’ metacognition—the ability to track their own understanding ( Tanner, 2012 ). To construct a concept map successfully, for example, students need to understand individual ideas and discern the relationships between them. To sketch a cartoon that shows what took place in the lab to generate the data presented in a particular figure, students must make sure they understand the relevant methodology. We applied concept mapping and cartooning along with other CREATE tools to a novel combination of readings. Articles selected for Biology 10050 were chosen because of their topicality, relatively simple methodology, and aspects of each that provoked controversy, exemplified the role of controls, and/or highlighted important distinctions between data and their interpretation. Goals for the Cornerstone students included learning: to read with skepticism, to critically analyze data and generate alternative interpretations, to recognize the malleability of scientific knowledge, and to develop and evaluate experiments with particular emphasis on controls and their roles. A final goal was for students to develop a more realistic view of research and researchers than the one often promoted in popular culture.

Developing an Appropriately Skeptical Reading Style

How, in principle, would one determine "the most appealing sounds in the world," whether babies "automatically" swipe iPhones expecting a response, or whether "love" is experienced by phone owners (as claimed by Lindstrom)?

What evidence would you find convincing?

What studies would you do if you were interested in such issues?

How did Lindstrom make such determinations?

On what basis do the neuroscientists challenge the stated conclusions?

Do the edits shift the message of the original letter to the editor? If so, how?

Taking all of the readings and analyses together, what do you conclude about iPhone “love”? Why?

As they learned to use and apply CREATE tools, students accustomed to reading and passively accepting the information encountered in their textbooks, on the Internet, or in newspapers began to recognize that just because something is published does not mean it is beyond criticism ( Hoskins, 2010a ).

Data Analysis—Developing Alternative Interpretations

Can we conclude that writing about one's test concerns leads to less choking on exams? How solid is that conclusion?

If we had generated these data ourselves, could we publish now? Why? Why not?

Are any alternative interpretations of the data plausible?

Through discussion, students proposed a third “write about anything” group as an additional control. We next provided the paper's figure 2 and associated narrative. The authors had added a third group that was instructed to write about “an unrelated unemotional event.” Students saw that the investigators had added the same control group they had asked for, extending the study to resolve the “writing-only” issue. This bolstered students’ sense that they were “thinking like scientists.”

Using Sketching to Clarify Design—Developing Alternative Interpretations

One paper's abstract alone served as the focus for a class. The abstract for “Empathy and Pro-Social Behavior in Rats” ( Bartal et al ., 2011 ) outlines five individual experiments. As homework, students cartooned each experiment, all of which tested conditions under which one rat would open a transparent plastic container that restrained a second rat. Students defined the specific hypothesis being addressed in each study, the controls needed in each case (none are included in the abstract), the conclusions stated, and possible alternative interpretations.

After comparing cartoons and resolving discrepancies, the class considered whether the behaviors observed were necessarily signs of “empathy.” Might there be other explanations? Working in small groups, students proposed multiple alternatives that could in principle account for rats’ apparently helpful behavior: inquisitiveness, a pheromone signal, an aversion to squeaky distress calls, and the like. The published paper provoked substantial interest and some controversy, as reported in Nature ( Gewin, 2011 ). We reviewed the published critique, and students found that some of “our” alternative interpretations had also been raised by top scientists in the field, again recognizing that their own thinking was scientific. Students also noted that even peer-reviewed work published in Science , where the original article appeared, can evoke intelligent criticism, and that scientists do not always agree.

Established Knowledge Can Change

A provocative set of readings discuss the discovery that peptic ulcers have a bacterial origin ( Associated Press, 2005 ; Centers for Disease Control and Prevention, 2005 ). It took the PI's ingestion of Helicobacter pylori , the suspected pathogen, hardly a canonical step in “The Scientific Method,” to generate the conclusive data. This nature of science story illustrates how established scientific knowledge—that ulcers had psychological not bacteriological etiology—can be wrong. Reading the description of Dr. Barry Marshall being met with scorn at meetings where he initially presented his unconventional hypothesis, students saw that novel (and possibly revolutionary) ideas may not be instantly welcomed. This recent scientific development highlighted the personal factors and genuine passion that can underlie science, making the point that as scientific study continues, some established ideas of today will inevitably be supplanted. The ulcer readings also illustrated the value of a healthy skepticism even about “obvious” facts, such as that the stomach's acidity would kill all bacteria within.

Introducing Experimental Design and Peer Review

At the conclusion of many of the discussion units, the freshmen proposed follow-up experiments. The challenge: If your research team had just performed the work we reviewed, what would you do next? Each student independently devised two distinct follow-ups as homework. Three or four times during the semester, students formed teams of four to act as grant panels charged with assessing the studies designed by their peers. The first time this was done, we challenged the panels to establish appropriate funding criteria before looking at the proposed studies. Discussions of criteria led to consideration of evolution, evolutionarily conserved mechanisms, and the meaning of model systems, as many groups wanted to fund only work that was “relevant to humans.” We also discussed realities of reputation and how it may affect funding success. Some groups sought to fund “established investigators who have already published in the field,” leading other students to question how anyone gets started in research. Such discussions build students’ understanding of the sociological context of science.

After criteria had been discussed, each student submitted one of his or her experiments, sans name or other identifier, into the grant pool. The instructor then presented each proposed follow-up study to the class without evaluative comments. When the panels subsequently conferred to rank the proposed experiments, students thought critically about the work of their peers, debating and defending their judgments in the sort of open-ended give-and-take that characterizes science as it is practiced. There is no single correct answer to the question: “Which of the ≈25 proposed studies is the best?” Students were thus freed from the pressure to be right, or to divine, somehow, what the instructor's opinion might have been.

Using Multiple Popular Press Articles to Build Toward a Mini-Module of Primary Literature

We developed students’ critical-reading skills through repeated practice with short articles. In the process, we pointed out multiple aspects of scientific thinking, and introduced the subject matter knowledge that would be needed in the later reading of primary research reports exploring infant cognition. Early in the semester, we read and analyzed “Babies Recognize Faces Better Than Adults, Study Says” ( Mayell, 2005 ), a popular press account of “Plasticity of Face Processing in Infancy” ( Pascalis et al ., 2005 ), a study that tested the memories of 6- to 9-mo-old infants. Students discovered gaps in the popular press version (no information on “ n ” or gender distribution of infant subjects, and unclear methodology, for example). We added additional information from the Proceedings of the National Academy of Sciences paper as discussion required it (for details of teaching with this paper, see Hoskins, 2010b ). Exercises of this sort challenge students to read actively and seek key missing information (e.g., “How many female vs. male babies were studied?” or “Exactly how was the visual training done?”) that is essential to their evaluations.

Two additional popular press stories ( Talbot, 2006 ; Angier, 2012 ) and a study on babies’ perception of normal versus scrambled facial features ( Maurer and Barrera, 1981 ) were critically analyzed in other class sessions. Discussions covered broader questions including: How can you tell whether a baby who is too young to talk notices something novel, and why might it matter? Because one of the studies was funded by the National Institutes of Health, we considered how a real-life grant panel might evaluate the work's health relevance. Students raised the possibility of using methods from the infant studies for early detection of neurological abnormalities, such as autism, and discussed the degree to which environmental enrichment activities could be considered “health related.” These readings and discussions set the stage for the analysis of two full-length papers.

“Social Evaluation by Preverbal Infants” ( Hamlin et al ., 2007 ) examines 6- and 10-mo-old babies’ abilities to discriminate between and react to helpful, neutral, and hindering “behaviors” by observed “characters.” The babies witnessed scenarios in which experimenter-manipulated blocks of wood bearing large googly eyes interacted on a hill. One sort of block (e.g., red circle) would move partway up the hill, but slide down before reaching the summit. Another block (e.g., yellow square) might, in a subsequent “episode,” seemingly help it move up. A third block (blue triangle) might hinder upward movement. A series of control experiments explored the need for eyes and other animating traits on the blocks. Other controls investigated whether babies preferred particular colors/shapes, or upward motion to downward, rather than seemingly helpful interactions (which moved the target block up) to hindering ones (which moved it down).

We started by providing the introduction, first figure, and associated text for initial analysis. As before, we did not tell the students what additional experiments were done. Through class discussion, students developed their own questions and alternative interpretations (e.g., “maybe the babies aren't judging behavior; they just like yellow better than blue”). As in the discussions of “Babies recognize faces” and “Writing about testing…,” only after the students raised particular issues did we provide sections of the paper with the relevant additional information and control experiments. After analyzing the full paper, students designed follow-up experiments, vetted them in a grant panel, and then read and analyzed the authors’ actual next paper.

“Three-Month-Olds Show a Negativity Bias in Their Social Evaluations” ( Hamlin et al ., 2010 ) was concerned with younger babies’ reactions to similar social interactions. This second paper used many of the same methods as the first, facilitating students’ ability to read the material. Interestingly, the later work produced a different result, finding that younger babies were averse to hinderers but (unlike their “elders”) did not show any particular preference for helpers. As the authors discussed possible evolutionary implications of their work, we were able to return to a critical theme that had arisen earlier in the semester, in the “model systems” discussion.

Assessment in Biology 10050

The study presented here is based on tools (CAT, EDAT, SAAB) administered anonymously pre- and postcourse. To evaluate students’ understanding of course material as a basis for determining grades, we assess students in all CREATE classes using a combination of in-class activities; writing assignments; open-book, open-notes exams; and class participation. There is no assigned textbook, but students can consult during exams the notebooks/portfolios they compiled throughout the semester (see Hoskins et al ., 2007 , for details). We find that open-book testing changes the classroom atmosphere and relieves students from the pressure to study primarily by memorizing, making it easier for them to focus on critically evaluating scientific writing and explaining their insights. With the exception of analysis of one exam question (see Student Reactions to Emails , below), the classroom assessments were not used as data for this study.

CAT Outcomes

Students in the Fall CREATE Cornerstone course took the CAT ( Table 1 ; Stein et al ., 2012 ), and tests were scored by a trained team at Tennessee Tech University, where this test was created. Biology 10050 students’ overall CAT scores improved significantly postcourse versus precourse, with a large effect size (0.97). While there is overlap between categories, CAT questions address four main areas. Overall, the largest gains made by CREATE Cornerstone students were on CAT questions that tested “evaluating and interpreting information.” Students also made gains on questions involving problem solving, creative thinking, and/or effective communication (the other three subcategories addressed by the CAT). While these findings must be interpreted with caution due to the small sample size, they suggest that students in the pilot CREATE Cornerstone course made substantial gains in their ability to read, understand, and critically analyze information, and that such gains are transferable to the content domain addressed by the CAT test, which was not related to the material covered in the course.

CAT test results

Critical Thinking Ability Test (CAT) | Precourse | Postcourse | n | Significance | Effect size
Mean (SD) | 9.6 (2.5) | 13.0 (4.4) | 15 | p < 0.05 | 0.97

a The CAT (duration 1 h) was administered pre- and postcourse to the Fall 2011 Biology 10050 class and scored at Tennessee Tech University. We present the overall score for the test, precourse vs. postcourse. Fifteen students took both tests. Significance: Student's t test.
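For readers who want to connect the tabulated effect size to the summary statistics, the following worked example applies the effect-size convention given later in the Table 3 notes (mean difference divided by the average SD) to the rounded values above. It is an illustration only, not an additional analysis from the study:

$$ d = \frac{\bar{x}_{\mathrm{post}} - \bar{x}_{\mathrm{pre}}}{\tfrac{1}{2}\left(s_{\mathrm{pre}} + s_{\mathrm{post}}\right)} = \frac{13.0 - 9.6}{\tfrac{1}{2}(2.5 + 4.4)} = \frac{3.4}{3.45} \approx 0.99 $$

This agrees with the reported 0.97 to within rounding of the published means and SDs.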

EDAT Outcomes

Students in both Fall and Spring CREATE Cornerstone classes completed a pre- and postcourse EDAT that was scored using a 10-point rubric ( Sirum and Humburg, 2011 ). Results are summarized in Table 2 . Scores suggest that the first-year students gained significantly in experimental design ability over the semester, citing more components of an “ideal” experimental design postcourse than precourse.

EDAT results: mean and SD

EDAT test | Precourse | Postcourse | n | Significance | Effect size
Mean (SD) | 4.3 (2.1) | 5.9 (1.4) | 28 | p < 0.01 | 0.91

a Pool of two classes of Biology 10050: n = 28 total. Statistical significance tested with Wilcoxon signed-rank test. Scores can range from 0 to 10, per the EDAT rubric (see Sirum and Humburg, 2011 ).
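As an illustration of the paired analysis described in this note, the sketch below runs a Wilcoxon signed-rank test on matched precourse/postcourse rubric scores and computes an effect size as mean difference over average SD, the convention used in these tables. The scores are invented placeholders on the 0–10 EDAT rubric scale, not data from the study.

```python
# Minimal sketch of a pre/post EDAT comparison of the kind described here:
# a Wilcoxon signed-rank test on paired rubric scores, plus an effect size
# computed as mean difference / average SD. All data are hypothetical.
import numpy as np
from scipy.stats import wilcoxon

pre = np.array([3, 5, 4, 2, 6, 4, 5, 3, 4, 6])   # hypothetical precourse scores (0-10 rubric)
post = np.array([5, 7, 6, 4, 7, 5, 6, 5, 6, 8])  # hypothetical postcourse scores

stat, p = wilcoxon(pre, post)  # paired, nonparametric comparison
effect = (post.mean() - pre.mean()) / ((pre.std(ddof=1) + post.std(ddof=1)) / 2)

print(f"Wilcoxon statistic = {stat:.1f}, p = {p:.4f}, effect size = {effect:.2f}")
```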

SAAB Outcomes

Results from the SAAB surveys for each class are displayed in two groupings in Table 3 . The upper group reflects the items related to students’ self-rated skills and understanding; the lower group shows results for items that reflect students’ epistemological beliefs about science (see Hoskins et al ., 2011 , for a discussion of the derivations of all categories).

SAAB survey outcomes in three student cohorts: freshman CREATE students (n = 28), upper-level CREATE students (n = 19), and mid-level non-CREATE students (n = 23)

Freshman-level CREATE class
Category | Precourse mean (SD) | Postcourse mean (SD) | Significance | Effect size | # Ss
Decoding literature | 17.3 (3.2) | 21.9 (3.0) | <0.001 | 1.48 | 6
Interpreting data | 14.1 (2.6) | 15.4 (2.3) | 0.008 | 0.53 | 4
Active reading | 13.0 (2.4) | 16.1 (2.3) | <0.001 | 1.32 | 4
Visualization | 12.5 (2.8) | 15.6 (2.1) | <0.001 | 1.27 | 4
Think like a scientist | 11.9 (2.2) | 15.5 (1.6) | <0.001 | 1.90 | 4
Research in context | 10.9 (1.5) | 13.8 (1.4) | <0.001 | 2.00 | 3
Certainty of knowledge | 22.1 (2.9) | 24.3 (3.8) | 0.002 | 0.66 | 6
Ability is innate | 6.9 (1.4) | 8.0 (1.6) | 0.005 | 0.73 | 2
Science is creative | 3.9 (0.8) | 4.3 (0.7) | 0.005 | 0.53 | 1
Scientists as people | 2.8 (0.9) | 3.6 (1.0) | 0.004 | 0.84 | 1
Scientists’ motives | 3.8 (1.1) | 4.1 (0.8) | ns | 0.35 | 1
Known outcomes | 3.9 (1.0) | 4.2 (0.9) | ns | 0.32 | 1
Collaboration | 4.2 (0.6) | 4.3 (0.8) | ns | 0.14 | 1

Upper-level CREATE class
Category | Precourse mean (SD) | Postcourse mean (SD) | Significance | Effect size | # Ss
Decoding literature | 15.5 (2.8) | 20.3 (2.7) | <0.001 | 1.75 | 6
Interpreting data | 13.7 (2.2) | 16.4 (1.7) | <0.001 | 1.39 | 4
Active reading | 13.9 (2.1) | 16.9 (1.9) | <0.001 | 1.50 | 4
Visualization | 13.3 (2.1) | 16.6 (1.7) | <0.001 | 1.74 | 4
Think like a scientist | 13.3 (2.5) | 16.5 (2.1) | <0.001 | 1.39 | 4
Research in context | 13.5 (1.1) | 14.3 (0.9) | 0.037 | 0.80 | 3
Certainty of knowledge | 23.0 (2.7) | 26.1 (2.9) | 0.021 | 0.82 | 6
Ability is innate | 7.3 (1.9) | 8.4 (1.5) | ns | 0.65 | 2
Science is creative | 4.1 (0.7) | 4.6 (0.9) | ns | 0.63 | 1
Scientists as people | 2.5 (1.0) | 3.9 (0.8) | 0.007 | 0.44 | 1
Scientists’ motives | 3.8 (1.0) | 4.3 (0.6) | 0.014 | 0.63 | 1
Known outcomes | 3.7 (1.0) | 4.3 (1.1) | ns | 0.57 | 1
Collaboration | 4.4 (0.6) | 4.5 (0.6) | ns | 0.17 | 1

Mid-level non-CREATE class
Category | Precourse mean (SD) | Postcourse mean (SD) | Significance | Effect size | # Ss
Decoding literature | 19.6 (4.3) | 20.4 (3.3) | ns | 0.21 | 6
Interpreting data | 15.3 (2.3) | 15.7 (2.6) | ns | 0.16 | 4
Active reading | 14.7 (2.3) | 14.7 (2.4) | ns | 0.01 | 4
Visualization | 14.0 (2.3) | 14.7 (1.8) | ns | 0.34 | 4
Think like a scientist | 13.8 (2.7) | 13.8 (2.5) | ns | 0.00 | 4
Research in context | 13.5 (1.1) | 13.5 (1.4) | ns | 0.00 | 3
Certainty of knowledge | 23.7 (3.3) | 23.6 (3.2) | ns | −0.03 | 6
Ability is innate | 7.3 (1.5) | 7.6 (1.5) | ns | 0.20 | 2
Science is creative | 3.8 (1.2) | 4.2 (0.7) | ns | 0.42 | 1
Scientists as people | 3.1 (0.9) | 3.3 (1.1) | ns | 0.03 | 1
Scientists’ motives | 3.9 (0.9) | 4.0 (1.1) | ns | 0.10 | 1
Known outcomes | 3.9 (1.0) | 4.0 (0.8) | ns | 0.11 | 1
Collaboration | 4.4 (0.5) | 4.2 (0.6) | ns | −0.36 | 1

a Responses were tabulated using a 1–5 scale (1 = “I strongly disagree”; 2 = “I disagree”; 3 = “I am neutral”; 4 = “I agree”; 5 = “I strongly agree”). Some propositions were worded so that an answer reflecting a more mature understanding would get a lower score (“I accept the information about science presented in newspaper articles without challenging it,” for example). These were reverse-scored for analysis.

The Wilcoxon signed-rank test for statistical significance was performed on precourse/postcourse raw data totals for all categories. Categories 1–6: self-rated skills and attitude factors; categories 7–13: epistemological factors.

The survey was developed in a previous study of upper-level CREATE students ( Hoskins et al ., 2011 ). Different categories are probed by different numbers of statements (#Ss).

b p values for statistical significance (Wilcoxon signed-rank test). ns = not significant.

c Mean difference/average SD.

d # Ss = number of statements in category.
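To make the scoring scheme in these notes concrete, the sketch below reverse-scores negatively worded 1–5 Likert items and sums the statements within a category to produce the category total that is compared pre/post. The items, category assignments, and responses are invented for illustration; they are not the actual SAAB statements.

```python
# Illustrative scoring of SAAB-style Likert items, per the table notes:
# negatively worded items are reverse-scored (1 <-> 5) so that higher
# always reflects a more mature view, then statements are summed by category.
def reverse(score: int) -> int:
    """Reverse-score a response on a 1-5 Likert scale."""
    return 6 - score

# hypothetical responses: item -> (raw score, item is negatively worded)
responses = {
    "annotates_while_reading": (4, False),
    "accepts_newspaper_science_uncritically": (2, True),
    "science_requires_creativity": (5, False),
}

# hypothetical mapping of items to categories (# Ss = statements per category)
categories = {
    "Active reading": ["annotates_while_reading", "accepts_newspaper_science_uncritically"],
    "Science is creative": ["science_requires_creativity"],
}

for name, items in categories.items():
    total = sum(reverse(score) if negated else score
                for score, negated in (responses[item] for item in items))
    print(f"{name}: category total = {total} ({len(items)} statements)")
```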

SAAB results show significant gains made by CREATE Cornerstone students in all six skills and attitudes categories and in the majority (four out of seven) of epistemological categories. Students in the upper-level CREATE course (for which a year of introductory biology, a semester of genetics, and a semester of cell biology are prerequisites) shifted significantly on all skills and attitudes categories, and three of the seven epistemological categories. Students in the mid-level physiology course (for which a year of introductory biology and a semester of genetics are prerequisites), in contrast, did not shift significantly in any category.

Effect sizes help to determine whether statistically significant changes are likely to be meaningful. For skills and attitudes shifts, effect sizes for freshmen were large ( Cohen, 1992 ) in five of the six categories and moderate for “interpreting data.” Effect sizes for upper-level CREATE students in these categories were all large. In this regard, it may be relevant that upper-level students read literature that was substantially more complex and looked closely at more figures during the semester than did the first-year students. It is also interesting to note that the mid-level physiology course included a weekly laboratory, in which data were generated and analyzed, and one experimental design activity.
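The “moderate” and “large” labels used here and below appear to follow Cohen's (1992) conventional benchmarks (roughly 0.2 = small, 0.5 = medium/moderate, 0.8 = large). A small helper using those standard cutoffs, assumed rather than taken from the paper, makes the mapping explicit:

```python
# Map an effect size d to Cohen's (1992) conventional descriptive bands.
# The cutoffs (0.2 / 0.5 / 0.8) are the standard benchmarks, assumed here.
def cohen_label(d: float) -> str:
    d = abs(d)
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "moderate"
    if d >= 0.2:
        return "small"
    return "negligible"

# e.g., two freshman SAAB effect sizes from Table 3
for category, d in [("Interpreting data", 0.53), ("Think like a scientist", 1.90)]:
    print(category, "->", cohen_label(d))  # moderate, large
```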

For epistemological beliefs categories, effect sizes in three of the four categories that shifted significantly in the freshman CREATE group (certainty of knowledge, innate ability, creativity of science) were moderate. The effect size of “sense of scientists as people” was large. Upper-level CREATE students also shifted significantly in this category, but with a smaller effect size, possibly reflecting the fact that many upper-level students were working in labs and had a better sense precourse of what research scientists were like. Upper-level CREATE students also showed significant changes in understanding of the uncertainty of scientific knowledge (large effect size), and of “sense of scientists’ motivations” (moderate effect size).

Both the CREATE courses, but not the mid-level physiology course, sent email surveys to authors of papers and discussed author responses late in the semester. Different material was read and analyzed in each CREATE course; thus, different authors were queried and different responses were received by the two groups. We think it likely that this component of the CREATE courses played a large role in changing students’ opinions about what scientists are like and (for upper-level CREATE students) why they do what they do.

Student Reactions to Emails

On the second exam in the Spring semester, we included a question asking students about their reactions to the author emails, focusing on their preconceptions about “scientists/research careers” and whether the author responses changed these views. We coded all the responses ( n = 15), extracting key themes from each, and summarize below the themes mentioned by four or more students.

The most prevalent response to the emails was students’ statements that, precourse, they had assumed today's researchers were “straight-A” students in college (14/15 responses; 93% of students). The same students (14/15) noted that they no longer believed this to be true, citing several authors who described academic struggles that preceded their eventual success. Thirteen out of 15 students (86%) said that the responses had changed their preconceptions about researchers, and 9/15 (60%) noted that respondents stressed the importance of passion (as opposed to good grades) as a key to research success. Seven out of 15 students (47%) expressed enthusiasm on learning that the responding scientists described a great deal of work-related travel, including international travel. Forty percent of students (6/15) described having held one or more of the preconceptions that 1) scientists were loners or nerds, 2) they lacked social lives, and 3) science consumed all their time. A similar percentage noted that precourse they had assumed all scientists had lofty goals of “helping people,” but they had come to realize that many had more personal goals of satisfying their own curiosity. Five out of 15 students (33%) stated that precourse they had assumed most scientists did not enjoy their jobs, that research was not fun, and that lab life was boring, but they no longer held these views. Five out of 15 (33%) said they were surprised to learn scientists had flexible work schedules, and a similar percentage stated that they had learned from the emails that motivation was very important. Finally, 4/15 (27%) noted their surprise that the authors answered at all.

Genesis of the CREATE Strategy

The CREATE strategy originated as a response to the observation that many upper-level undergraduate biology majors—despite the years spent studying a wide range of scientific topics—were not well-prepared to read and understand primary literature; did not readily “think like scientists,” with an appropriately critical eye; did not see science research as an attractive career choice; and had little or no practical experience mustering their content knowledge to attack novel scientific problems. Discussions with students in other courses over the years, and with other faculty on our campus and elsewhere, revealed that many students believed: research is dull, and lab exercises formulaic and boring ( Luckie et al ., 2004 ); there is a single and eternal right answer to every scientific question ( Liu and Tsai, 2008 ); primary literature is written in a nearly unbreakable code; and scientists themselves are stereotypic nerds or “machinery kind of people” ( Hoskins et al ., 2007 ). Our findings in the pilot CREATE Cornerstone course suggest that these viewpoints can be changed over a single semester through intensive analysis of scientific literature.

Themes Highlighted in Readings

The curriculum examples outlined above illustrate how fundamental features of scientific thinking can be studied in a realistic domain-specific context, which appears to be a key element in developing critical-thinking skills ( Willingham, 2007 ). Students repeatedly thought carefully about control groups—what they “control” for, how they are interpreted, and why they are needed. Multiple studies underscored the importance of careful attention to sample size and selection. In the experiments on infants, for example, students raised issues of possible gender-related behavioral differences, whether postnatal age is comparable between full-term and premature infants, and the like. Students practiced developing alternative interpretations of data and noted that not all conclusions are equally strong. Several studies highlighted the potential for introducing unanticipated bias (see discussion of a possible “Clever Hans” effect in “Babies Recognize Faces Better Than Adults, Study Says” in Hoskins, 2010b ). Students saw that original, interesting, and important investigations are currently ongoing (many readings were published in 2011–2012). Students also recognized that even very early in their academic careers they are capable of reading, understanding, and intelligently criticizing scientific literature, and that research science is not routine, predictable, or boring, nor is it something found only in textbooks.

Grant Panels Promote Open-Ended Thinking and Insight into the Nature of Science.

CREATE Cornerstone students made significant gains on the EDAT, which presents a scenario distinct from that of any of the Cornerstone readings. Students’ gains on this test suggest that their general experimental design skills have improved during the semester.

Experimental design skills are honed in class through grant panel activities, repeated several times during the semester, in which panels evaluate the follow-up experiments that students design as homework for the studies we analyzed. Although panels focus specifically on experimental systems under study in class, they likely help students develop a more generalized skill in experimental design and creative thinking. In each panel all students’ experiments are reviewed, and the panels (groups of four students) discuss the merits of each. Early in the semester, some experiments must be culled based on absence of a hypothesis, absence of a cartoon, or general lack of clarity (approximately five of 20 in early panels). In end-of-semester exercises, virtually every experiment meets the basic criteria and can be considered seriously. Statements of hypotheses become clearer, controls stronger, designs and procedures better illustrated, and potential outcomes better anticipated.

Besides likely contributing to the development of students’ experimental design skills, the grant panels provide insights into the nature of science. It becomes evident, as the activity is repeated during the semester, that among the top experiments (typically four or five stand out), the study perceived by a particular panel to be “best” is to some degree a matter of taste. Some students prefer a reductionist approach, others an expansion of the study to encompass additional sensory modalities (e.g., an experiment investigating whether babies learn to recognize faces faster if each face is associated with a different musical tune). Some students focus mainly on experiments aimed at developing treatments for humans (e.g., take genes involved in planarian regeneration and immediately seek their counterparts in mammals). Many of our students are accustomed to “textbook” science where, typically, only the (successful) end points of studies are described, and very little current-day work is featured. The grant panel activity introduces the idea that working scientists likely select their follow-up experiment from a variety of valid possibilities, and that personal styles and preferences could influence such decisions.

Critical-Thinking and Experimental Design Skills—Tools of Science.

A significant number of students show interest in science in high school or before (often significantly before [ Gopnik, 2012 ]), but do not pursue STEM studies at the tertiary level. Either they never consider studying science in college, or they switch out of the field for a variety of reasons in their first or second year ( Seymour and Hewitt, 1997 ; Committee on Science and Technology, 2006 ). At the same time, for students who persist in STEM majors, some of the most creatively challenging and thought-provoking courses—capstone experiences—are reserved for seniors ( Goyette and DeLuca, 2007 ; Usher et al ., 2011 ; Wiegant et al ., 2011 ). We hoped to convey some of the analytical and creative aspects of science at the outset of students’ college careers with a CREATE course designed for freshmen. Providing this training early in students’ academic experience might help students gain skills and develop attitudes that would support their persistence in STEM ( Harrison et al ., 2011 ).

We used the CAT and EDAT assessments to probe the development of students’ abilities as they practiced the literature analysis process. The CAT test focuses on a content domain distinct from that of the CREATE class but challenges students in some parallel ways. Students must determine what data mean, decide which data are relevant, draw conclusions based on their understanding, and explain themselves in writing. Many campuses are using the CAT test for programmatic assessment, comparing scores of freshmen with those of seniors, for example. We are aware of only one published study using CAT in a pre/post, single-course situation. First-year students in a semester-long inquiry-based microbiology module at Purdue University, performing hands-on research in an introductory class, made significant CAT gains during the semester ( Gasper et al ., 2012 ). The finding that CREATE Cornerstone students at CCNY similarly made significant gains on this test in a single semester suggests that transferable critical-thinking skills, such as those measured by the CAT, can also be built through classroom activities that do not involve hands-on inquiry labs.

While the small sample size in this pilot study precludes broad conclusions, it is interesting that our students made the largest gains on CAT questions whose solution required “evaluation and interpretation.” Introduction to Scientific Thinking emphasizes looking closely at data, reconstructing the experiment or study that gave rise to the data, and reasoning carefully about the logic of interpretations and the significance of the findings. Students carry out this process in a variety of content domains, engaging in friendly arguments about whether rats are empathic or just noise-averse, whether writing about fears really prevents choking on tests, and what it is that babies might prefer about a yellow square with googly eyes (the color? the shape? the eyes? the “helpful” behavior?). As noted by Stanger-Hall (2012) , close to 80% of U.S. high school seniors performed below the science proficiency level on a recent national standardized test ( National Center for Education Statistics, 2009 ). Among undergraduates, barely more than half the students sampled at 24 institutions made gains in critical thinking during their first 2 yr of college, as measured by the Collegiate Learning Assessment ( Arum and Roksa, 2011 ). These data suggest that current course work in high school and during early college years (when standard introductory science courses are taken by STEM majors) is not promoting substantial development of higher-order thinking and analytical reasoning skills. We find CREATE Cornerstone students’ outcomes on the CAT assessment encouraging in this regard. At the same time, some researchers suggest results of low-stakes tests like the Collegiate Assessment of Academic Proficiency may be influenced by low performance motivation among test takers, because participation in such exercises has no bearing on class grade ( Wise and DeMars, 2005 ). This issue could potentially influence our students’ performance on anonymous assessments. While we have no independent measure of students’ motivation for participating in our study, we believe it likely that, as participants in a novel course, they find the opportunity to be part of a scientific study to be intriguing and a motive to perform well.

The EDAT assessment called on students to think like scientists: analyze a problem, determine evidence required to solve it, and design a properly controlled experiment that could generate the relevant data. Students made statistically significant gains in their experimental design ability, with their postcourse responses mentioning more of the points that experts see as essential to good experimental design ( Sirum and Humburg, 2011 ). In the Cornerstone classroom, students repeatedly proposed and evaluated experimental designs as they participated in multiple grant panels and worked with different student-colleagues. We suspect that these exercises served as a form of practice during the CREATE semester, helping students build competence in their ability to formulate, express, and defend ideas about particular proposed studies ( Ambrose et al ., 2010 ). At the same time, the challenge of producing an experiment that would be singled out by a grant panel for “funding” may have stimulated some students’ efforts to be particularly creative in their experimental designs.

The CAT and EDAT findings also support our sense that skills deemed important by many science faculty (e.g., problem solving/critical thinking, data interpretation, written and oral communication; Coil et al ., 2010 ), including ourselves, can be taught in a course that emphasizes the process of science, including close reading and critical analysis of primary literature, creative experimental design, and a look behind the scenes into the lives and dispositions of paper authors. While we teach or review relevant content in the context of particular reading assignments, we do not seek to achieve the broad coverage of a typical introductory course. Students need not know the details of the electron transport chain in order to analyze “Babies Recognize Faces Better Than Adults, Study Says,” although they do need to know the fundamental logic of study design, use of controls, and the danger of being unwilling to think beyond your preferred hypothesis. To analyze the rat studies, students must understand the terms “empathy” and “prosocial behavior,” and know how to think about variables, controls, and multiple aspects of animal behavior. In each case, they also need metacognitive awareness—the ability to determine what they do and do not understand, as well as “how we know what we know” ( Tanner, 2012 ), another skill developed through practice during the semester.

Student Attitudes and Beliefs—Influences on Learning and Career Options.

On the SAAB survey, freshmen reported significant gains in their self-rated ability to: “decode” primary literature; interpret data; read actively (annotating, concept mapping, and/or cartooning the material they were reading); visualize scientific procedures; feel like they were thinking like scientists; and see experiments in a broader context ( Table 3 ). Effect sizes ( Cohen, 1992 ; Coe, 2002 ) were large for five of the SAAB measures and moderate for “interpreting data.” With regard to students’ epistemological beliefs, previous researchers ( Perry, 1970 ; Baxter Magolda, 1992 ) have noted that students’ naïve epistemological beliefs about science resist change, even after a 4-yr undergraduate program. In some cases, such beliefs appear to regress after students take introductory biology courses ( Smith and Wenk, 2006 ; Semsar et al ., 2011 ). After one semester, the freshmen in the CREATE Cornerstone course reported significant increases in four of the seven epistemological categories we surveyed: the uncertain nature of scientific knowledge; the question of whether one needs to have a special innate ability to do science; whether science is creative; and their sense of scientists as “real people.” A concurrent upper-level CREATE class also made gains in several epistemological categories, while students in a non-CREATE comparison course that included a weekly laboratory session did not change significantly in any category ( Table 3 ). These findings argue that the shifts we see relate to the CREATE experience, rather than to intellectual maturation that might occur naturally in college biology students over the course of a semester.

While student epistemology is rarely emphasized in college teaching handbooks, students’ attitudes in this area can strongly influence their learning. For example, students who feel that intelligence is a fixed quantity in which they are lacking may decrease their efforts to learn and study ineffectively as a result ( Henderson and Dweck, 1990 ). The high attrition rate of students from the biology major has been attributed in large part to students’ failure to connect intellectually with the subject, and the traditional mode of teaching introductory courses itself can slow students’ development of higher-order thinking skills (e.g., analysis, synthesis, evaluation; Bloom et al ., 1956 ). While the majority of faculty members who teach introductory biology courses want students to learn higher-order skills, exams in such courses tend to focus at lower levels ( Momsen et al ., 2010 ). Multiple-choice testing (often considered a practical requirement for a large lecture course) shapes students’ study habits in unproductive ways and interferes with critical thinking ( Stanger-Hall, 2012 ). Noting that epistemological change is typically slow, Smith and Wenk point out that “…one cannot ignore the potential retarding effect of an entrenched instructional system of lecture, textbook readings, and recitation on the students’ epistemological development” ( Smith and Wenk, 2006 , p. 777). This phenomenon may be reflected in the differences in responses of CREATE and non-CREATE students on the SAAB survey.

I had this preconception [pre-course] that…you had to be like Einstein to penetrate that field. I thought you always had to have straight A's and be highly versatile, but after reading the e-mails from the authors I know that's definitely not the case. From what they said I know that you don't have to be perfect or like Einstein. It's the passion and motivation to learn and make discoveries. You have to have a drive that leads you on. It was inspiring to hear that “science has many paths” by S—[the quote in the author's response was “there are many paths to science”]. To me this means that there's no one path or just one requirement like reading a textbook, but many. I can conduct research with as little a space as a backyard or in one of the biggest labs and each one could lead to success and greatness. (Exam response, freshman in Biology 10050)

Students’ self-reported reactions to author emails suggest that students starting college, at least at CCNY, harbor serious misconceptions about research/researchers that could interfere with their potential development as scientists. Nearly all students in the class noted that before they read the authors’ responses they had assumed that “only straight A students can become scientists.” This supposition changed when responding scientists recounted particular academic travails (e.g., rejection from some graduate schools) that preceded their success. Other student comments regarding their precourse suppositions that research is boring; that researchers are both overworked and unhappy with their jobs; and that such jobs allow no time for hobbies, families, or personal life, suggest that students’ precollege science experience has not presented research careers in an accurate light. Notably, these views defy logic, suggesting that some adopted the stereotype without giving it much thought. Why would people who are so smart (“like Einstein”) and who achieved “straight As” in college choose dull, boring careers? Why would someone engaged in a boring career that he or she did not enjoy nevertheless work so intensely that he or she had time for nothing else? We have speculated elsewhere that popular culture's depictions of scientists may influence students negatively, starting in high school or before ( Hoskins and Stevens, 2009 ). Changing students’ negative views of researchers/research careers is likely a required first step if such students are to be inspired to undertake undergraduate research experiences that can lead to research careers ( Harrison et al ., 2011 ). Given that the no-cost email survey of authors can have a strong positive impact on students’ views, we encourage other STEM faculty, particularly those working with high school students or first-year undergraduates, to consider this activity.

Early Interventions.

Traditionally, STEM-inclined students spend their early college years in conventional core courses. Electives, including capstone courses, are reserved for upper-level students. Recently, however, a number of colleges and universities have begun developing nontraditional courses for entering students. A 5-d presemester “boot camp” for biology students at Louisiana State University aims to teach students about the different expectations at college versus high school, focusing on study strategies and introductory biology material. This brief presemester experience resulted in gains for boot-camp veterans, as compared with a matched control group, in classroom performance and in persistence in the major ( Wischusen and Wischusen, 2007 ). In a new course focused on freshmen's ability to reason scientifically, students studied a variety of topics that a faculty group had deemed valuable for introductory STEM courses. Students made significant gains in understanding of control of variables and proportional thinking, and also showed greater persistence in STEM ( Koenig et al ., 2012 ). Freshmen at Cabrini College participated in Phage Genomics rather than a standard Introductory Biology laboratory course. The novel course involved participation in a two-semester, hands-on research project that significantly increased students’ interest in postgraduate education, their understanding of scientific research, and their persistence in the biology major ( Harrison et al ., 2011 ).

Beyond Textbooks.

Visual representations in journal articles are both more frequent and more complex than those seen in textbooks ( Rybarczyk, 2011 ). When visual representations do appear in textbooks, they rarely illustrate the process of science ( Duncan et al ., 2011 ). Controversy, certainly a part of science, is virtually absent from textbooks ( Seethaler, 2005 ). Some faculty members feel that encounters with primary literature, as well as capstone courses, and the majority of undergraduate research experiences, should be reserved for upper-level students, who have built a broad foundation of content knowledge in textbook-based courses. We agree that understanding the nuts and bolts of a paper is a prerequisite for fully understanding the study it reports. Further, analysis and comprehension skills are better taught in the context of a particular content domain ( Willingham, 2007 ). At the same time, particularly for biology, the explosion of fundamental content makes it impossible for faculty to cover, let alone teach, “basic” material even to the same depth it was covered in the introductory courses of their own undergraduate years ( Hoskins and Stevens, 2009 ). In addition, despite having encountered them in multiple courses, students may fail to retain key concepts (e.g., function of control experiments; see Shi et al ., 2011 ). Our compromise was to base a freshman course on particular examples of scientific literature, choosing topics in a limited range of content areas and focusing in depth on scientific thinking and data analysis. While we designed Introduction to Scientific Thinking for future biology majors, the approach could be easily adapted to other STEM domains. Interestingly, a recent study argues that long-term cognitive advantages can arise from studying individual topics in depth. First-year undergraduates’ grades in science courses were highest for students who had studied a single topic in depth for a month or more at any time during high school. Grades showed no correlation with economic status, region, class size, parents’ academic level, or other factors ( Schwartz et al ., 2009 ).

Taken together, our findings support the hypothesis that a CREATE Cornerstone course designed for first-year students can bring about gains in multiple areas, including critical-thinking and experimental design ability, self-rated attitudes, abilities and epistemological beliefs, and understanding of scientists as people. Our freshman students did not have a base of content knowledge in biology beyond what they retained from high school or had absorbed from popular media. By choosing articles from top journals (e.g., Science , Nature ) but focusing on topics that did not require deep understanding of, for example, gene knockout techniques or electrophoresis, we were able to give students a taste of the sorts of design logic, interpretational challenges and controversies, and creativity that are hallmarks of real-world scientific investigation. At the same time that our students gained understanding of how authentic scientific studies are carried out and interpreted, their email interviews of authors provided a personalized glimpse behind the scenes into the lives, attitudes, and motivations of the researchers themselves. Ideally, such insights will help to dispel misconceptions that can drive students away from science. To the extent that students in the one-semester Introduction to Scientific Thinking course make significant gains in scientific thinking ability, they become better prepared to master the material in any STEM major they choose, as gains in critical-thinking and reading/analytical skills should help them manage the information load in the more content-heavy science courses to come.

CONCLUSIONS

Introduction to Scientific Thinking, the CREATE Cornerstone course, improved critical-thinking and experimental design skills of freshmen at the same time that it positively shifted their attitudes about their reading/analytical abilities, their understanding of scientists as people, and multiple aspects of their epistemological beliefs. There are few reported approaches to changing the critical thinking of first-year science students, and it appears that epistemological beliefs among college students at all undergraduate levels are quite stable. We find that a one-semester course positively affects both. The course has no laboratory component, so it is relatively inexpensive to offer. Because the topic area of the articles that can be analyzed ranges broadly, readings can be selected for their utility in a variety of introductory science courses. Finally, the email survey responses from paper authors have a strong effect on students’ sense of scientists as people, helping them to overcome misconceptions of the sort that can dissuade students from seeking research opportunities and, by extension, research careers. We are encouraged by the results of this pilot study and conclude that important gains—both practical and attitudinal—with potential to help students make progress in STEM, can be achieved in a one-semester course that meets 2.5 h/wk and could, in principle, be added to many curricula.

If, as exhorted by many science-education policy reformers, we are to do a better job at encouraging students to consider research careers seriously ( National Research Council, 2003 ; American Association for the Advancement of Science, 2011 ), we need to move beyond standard first-year courses and reveal scientific research as a creative and exciting career choice undertaken by interesting and diverse individuals, not unlike the first-year students themselves. While it would be gratifying to see more students enter STEM research fields, the enhancement of skills, attitudes, and epistemological beliefs concerning science engendered by CREATE Cornerstone is aligned with societal and civic goals, even for students who go in other directions.

ACKNOWLEDGMENTS

Many thanks to CCNY students for participating in the CREATE courses and/or associated assessments. We thank Dr. Millie Roth and NKem Stanley-Mbamelu, Constance Harper, and Maria Harvey of City College Academy for Professional Preparation for assistance with first-year students, and Drs. Shubha Govind, Anu Janakiraman, Kristy Kenyon, Jonathan Levitt, and Leslie Stevens for assistance, comments on the manuscript, and ongoing discussions of CREATE teaching/learning issues. Many thanks to Dr. Barry Stein, Elizabeth Lisic, and the CAT team at Tennessee Tech University for use and grading of the CAT instrument. We also thank the anonymous reviewers, whose insightful comments strengthened the manuscript. We are very grateful to the NSF for support.

This project addresses a new course developed by S.G.H. based upon work supported by the National Science Foundation (NSF) CCLI/TUES (Course, Curriculum and Laboratory Improvement/Transforming Undergraduate Education in Science) program under NSF grant 0942790. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.

1 See Hoskins et al. , 2007 .

  • Ambrose SA, Bridges M, DiPietro M, Lovett M, Norman M, Mayer R ( 2010 ). How Learning Works: Seven Research-Based Principles for Smart Teaching , San Francisco, CA: Wiley. Google Scholar
  • American Association for the Advancement of Science ( 2011 ). Vision and Change in Undergraduate Biology Education , Washington, DC, http://visionandchange.org/finalreport (accessed 27 June 2012). Google Scholar
  • Angier N ( 2012 ). Insights from the minds of babes . New York Times 1 May D1. Google Scholar
  • Arora PN, Malhan PK ( 2010 ). Biostatistics , Mumbai, India: Global Media (eBook accessed 25 September 2012 through CCNY library). Google Scholar
  • Arum R, Roksa J ( 2011 ). Academically Adrift—Limited Learning on College Campuses , Chicago, IL: University of Chicago Press. Google Scholar
  • Associated Press ( 2005 ). Two Australians win Nobel Prize in medicine In: www.msnbc.msn.com/id/9576387/ns/health-health_care/t/two-australians-win-nobel-prize-medicine/#.UE_1RCLlaOx (accessed 11 September 2012). Google Scholar
  • Bartal IB-A, Decety J, Mason P ( 2011 ). Empathy and pro-social behavior in rats . Science 334 , 1427-1430. Medline ,  Google Scholar
  • Baxter Magolda MB ( 1992 ). Knowing and Reasoning in College: Gender-Related Patterns in Students’ Intellectual Development , San Francisco, CA: Jossey-Bass. Google Scholar
  • Bloom BS, Englehart MD, Furst EJ, Hill WH, Krathwohl D ( 1956 ). Taxonomy of Educational Objectives: Cognitive Domain , New York: McKay. Google Scholar
  • Centers for Disease Control and Prevention ( 2005 ). Helicobacter pylori and Peptic Ulcer Disease In: www.cdc.gov/ulcer/history.htm (accessed 11 September 2012). Google Scholar
  • Coe R ( 2002 ). It's the effect size, stupid: what effect size is and why it is important . Paper presented at the British Educational Research Association Annual Conference, 12–14 September 2002, Exeter, UK. www.psych.yorku.ca/mcnelles/documents/ESguide.pdf (accessed 23 September 2012). Google Scholar
  • Cohen J ( 1992 ). Statistical power analysis . Curr Dir Psychol Sci 92 , 98-101. Google Scholar
  • Coil D, Wenderoth MP, Cunningham M, Dirks C ( 2010 ). Teaching the process of science: faculty perceptions and effective methodology . CBE Life Sci Educ 9 , 524-535. Link ,  Google Scholar
  • Committee on Science and Technology (2006) Undergraduate science, math, and engineering education: what's working? U.S. House of Representatives Committee on Science and Technology Subcommittee on Research and Science Education. Second Session. 15 March. http://commdocs.house.gov/committees/science/hsy26481.000/hsy26481_0f.htm (accessed 1 October 2012). Google Scholar
  • Duncan D, Lubman A, Hoskins SG ( 2011 ). Introductory biology textbooks under-represent scientific process . J Microbiol Biol Educ 12 , 143-151. Medline ,  Google Scholar
  • Gasper BJ, Minchella DJ, Weaver GC, Csonka LN, Gardner SM ( 2012 ). Adapting to osmotic stress and the process of science . Science 335 , 1590-1591. Medline ,  Google Scholar
  • Gewin V ( 2011 ). Rats free each other from cages . Nature 480 , doi:10.1038/nature.2011.9603. Google Scholar
  • Gopnik A ( 2012 ). Scientific thinking in young children: theoretical advances, empirical research, and policy implications . Science 337 , 1623-1627. Medline ,  Google Scholar
  • Goyette SR, DeLuca J ( 2007 ). A semester-long student-directed research project involving enzyme immunoassay: appropriate for immunology, endocrinology, or neuroscience courses . CBE Life Sci Educ 6 , 332-342. Medline ,  Google Scholar
  • Hamlin JK, Wynn K, Bloom P ( 2007 ). Social evaluation by preverbal infants . Nature 450 , 557-559. Medline ,  Google Scholar
  • Hamlin JK, Wynn K, Bloom P ( 2010 ). Three-month-olds show a negativity bias in their social evaluations . Dev Sci 13 , 923-929. Medline ,  Google Scholar
  • Harrison M, Dunbar D, Ratmansky L, Boyd K, Lopatto D ( 2011 ). Classroom-based science research at the introductory level: changes in career choices and attitude . CBE Life Sci Educ 10 , 279-286. Link ,  Google Scholar
  • Henderson V, Dweck CS ( 1990 ). Achievement and motivation in adolescence: a new model and data . In: At the Threshold: The Developing Adolescent , ed. S Feldman and G Elliott, Cambridge, MA: Harvard University Press, pp 308-329. Google Scholar
  • Hoskins SG ( 2010a ). “But if it's in the newspaper, doesn't that mean it's true?”: developing critical reading & analysis skills by evaluating newspaper science with CREATE . Am Biol Teach 72 , 415-420. Google Scholar
  • Hoskins SG ( 2010b ). Teaching science for understanding: focusing on who, what and why . In: Science and the Educated American: A Core Component of Liberal Education , ed. J Meinwald and JG Hildebrand, Cambridge, MA: American Academy of Sciences, 151-179. Google Scholar
  • Hoskins SG, Lopatto D, Stevens LM ( 2011 ). The CREATE approach to primary literature shifts undergraduates’ self-assessed ability to read and analyze journal articles, attitudes about science, and epistemological beliefs . CBE Life Sci Educ 10 , 368-378. Link ,  Google Scholar
  • Hoskins SG, Stevens LM ( 2009 ). Learning our L.I.M.I.T.S.: less is more in teaching science . Adv Physiol Educ 33 , 17-20. Medline ,  Google Scholar
  • Hoskins SG, Stevens LM, Nehm RH ( 2007 ). Selective use of the primary literature transforms the classroom into a virtual laboratory . Genetics 176 , 1381-1389. Medline ,  Google Scholar
  • Koenig K, Schen S, Edwards M, Bai L ( 2012 ). Addressing STEM retention through a scientific thought and methods course . J Coll Sci Teach 41 , 23-29. Google Scholar
  • Lindstrom M ( 2011 ). You love your iPhone. Literally . New York Times 1 October A21. Google Scholar
  • Liu S-Y, Tsai C-C ( 2008 ). Differences in the scientific epistemological views of undergraduate students . Int J Sci Educ 30 , 1055-1073. Google Scholar
  • Luckie DB, Maleszewski JJ, Loznak SD, Krha M ( 2004 ). Infusion of collaborative inquiry throughout a biology curriculum increases student learning: a four-year study of “Teams and Streams.” Adv Physiol Educ 28 , 199-209. Medline ,  Google Scholar
  • Maurer D, Barrera M ( 1981 ). Infants’ perception of natural and distorted arrangements of a schematic face . Child Development 52 , 196-202. Medline ,  Google Scholar
  • Mayell H ( 2005 ). Babies recognize faces better than adults, study says. National Geographic News . http://news.nationalgeographic.com/news/2005/03/0321_050321_babies.html (accessed 12 September 2012). Google Scholar
  • Momsen JL, Long TM, Wyse SA, Ebert-May D ( 2010 ). Just the facts? Introductory undergraduate biology courses focus on low-level cognitive skills . CBE Life Sci Educ 9 , 435-440. Link ,  Google Scholar
  • National Center for Education Statistics ( 2009 ). The Nation's Report Card 2009 , Washington, DC: Institute of Education Sciences, U.S. Department of Education. www.nationsreportcard.gov/science_2009/science_2009_report (accessed 3 November 2012). Google Scholar
  • National Research Council ( 2003 ). BIO2010: Transforming Undergraduate Education for Future Research Biologists , Washington, DC: National Academies Press. Google Scholar
  • Pascalis O, Scott LS, Kelly DJ, Shannon RW, Nicholson E, Coleman M, Nelson PA ( 2005 ). Plasticity of face processing in infancy . Proc Natl Acad Sci USA 102 , 5297-5300. Medline ,  Google Scholar
  • Perry W ( 1970 ). Forms of Intellectual Development in the College Years: A Scheme , San Francisco, CA: Jossey-Bass. Google Scholar
  • Poldrack R ( 2011a ). The iPhone and the brain. New York Times 5 October, A26 . Google Scholar
  • Poldrack R ( 2011b ). NYT letter to the editor: The uncut version . www.russpoldrack.org/search?updated-min=2011-01-01T00:00:00-08:00&updated-max=2012-01-01T00:00:00-08:00&max-results=8 (accessed 11 September 2012). Google Scholar
  • Ramirez G, Beilock SL ( 2011 ). Writing about testing worries boosts exam performance in the classroom . Science 331 , 211-213. Medline ,  Google Scholar
  • Rybarczyk B ( 2011 ). Visual literacy in biology: a comparison of visual representations in textbooks and journal articles . J Coll Sci Teach 41 , 106-114. Google Scholar
  • Semsar K, Knight JK, Birol G, Smith MK ( 2011 ). The Colorado Learning Attitudes about Science Survey (CLASS) for use in biology . CBE Life Sci Educ 10 , 268-278. Medline ,  Google Scholar
  • Schwartz MS, Sadler PM, Sonnet G, Tai RH ( 2009 ). Depth versus breadth: how content coverage in high school science courses relates to later success in college science coursework . Sci Educ 93 , 798-826. Google Scholar
  • Seethaler S ( 2005 ). Helping students make links through science controversy . Am Biol Teach 67 , 265-274. Google Scholar
  • Seymour E, Hewitt N ( 1997 ). Talking about Leaving: Why Undergraduates Leave the Sciences , Boulder, CO: Westview. Google Scholar
  • Shi J, Power JM, Klymkowsky MW ( 2011 ). Revealing student thinking about experimental design and the roles of control experiments . Int J Sch Teach Learn 5 , 1-16. Google Scholar
  • Sirum K, Humburg J ( 2011 ). The Experimental Design Ability Test (EDAT) . Bioscene 37 , 8-16. Google Scholar
  • Smith C, Wenk L ( 2006 ). The relation among three aspects of college freshmen's epistemology of science . J Res Sci Teach 43 , 747-785. Google Scholar
  • Stanger-Hall KF ( 2012 ). Multiple-choice exams: an obstacle for higher-level thinking in introductory science classes . CBE Life Sci Educ 11 , 294-306. Link ,  Google Scholar
  • Stein B, Haynes A, Redding M ( 2012 ). Critical Thinking Assessment Test, Version 5 , Cookeville, TN: Center for Assessment & Improvement of Learning, Tennessee Tech University. Google Scholar
  • Talbot M ( 2006 ). The baby lab . The New Yorker 82 , 90-101. Google Scholar
  • Tanner KD ( 2012 ). Promoting student metacognition . CBE Life Sci Educ 11 , 113-120. Link ,  Google Scholar
  • Usher DC, Driscoll TA, Dhurjati P, Pelesko JA, Rossi LF, Shleininger G, Pusecker K, White HB ( 2011 ). A transformative model for undergraduate quantitative biology education . CBE Life Sci Educ 9 , 181-188. Google Scholar
  • Wiegant F, Scager K, Boonstra J ( 2011 ). An undergraduate course to bridge the gap between textbooks and scientific research . CBE Life Sci Educ 10 , 83-94. Link ,  Google Scholar
  • Willingham DT ( 2007 ). Critical thinking: why is it so hard to teach? . Am Educ 31 , 8-19. Google Scholar
  • Wischusen SM, Wischusen EW ( 2007 ). Biology intensive orientation for students (BIOS), a biology “boot camp.” CBE Life Sci Educ 6 , 172-178. Link ,  Google Scholar
  • Wise SL, DeMars CE ( 2005 ). Low examinee effort in low-stakes assessment: problems and potential solutions . Educ Assess 10 , 1-17. Google Scholar

Submitted: 21 November 2012 Revised: 15 December 2012 Accepted: 15 December 2012

© 2013 A. J. Gottesman and S. G. Hoskins. CBE—Life Sciences Education © 2013 The American Society for Cell Biology. This article is distributed by The American Society for Cell Biology under license from the author(s). It is available to the public under an Attribution–Noncommercial–Share Alike 3.0 Unported Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/3.0).


An introduction to different types of study design

Posted on 6th April 2021 by Hadi Abbas

""

Study designs are the set of methods and procedures used to collect and analyze data in a study.

Broadly speaking, there are 2 types of study designs: descriptive studies and analytical studies.

Descriptive studies

  • Describes specific characteristics in a population of interest
  • The most common forms are case reports and case series
  • In a case report, we discuss our experience with the patient’s symptoms, signs, diagnosis, and treatment
  • In a case series, several patients with similar experiences are grouped.

Analytical Studies

Analytical studies are of 2 types: observational and experimental.

Observational studies are studies that we conduct without any intervention or experiment. In those studies, we purely observe the outcomes.  On the other hand, in experimental studies, we conduct experiments and interventions.

Observational studies

Observational studies include many subtypes. Below, I will discuss the most common designs.

Cross-sectional study:

  • This design is transverse: we take a specific sample at a specific point in time, without any follow-up
  • It allows us to calculate the frequency of a disease (prevalence) or the frequency of a risk factor
  • This design is easy to conduct
  • For example – if we want to know the prevalence of migraine in a population, we can conduct a cross-sectional study whereby we take a sample from the population and calculate the number of patients with migraine headaches (a minimal calculation is sketched below).
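A minimal Python sketch of the prevalence arithmetic described above; the counts are invented purely for illustration and are not drawn from any real survey.

```python
# Minimal sketch: prevalence from a hypothetical cross-sectional sample.
# All numbers are invented for illustration.

sample_size = 1200   # people surveyed at a single point in time
cases = 168          # people in that sample reporting migraine

prevalence = cases / sample_size
print(f"Prevalence of migraine in the sample: {prevalence:.1%}")  # 14.0%
```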

Cohort study:

  • We conduct this study by comparing two samples from the population: one sample with a risk factor while the other lacks this risk factor
  • It shows us the risk of developing the disease in individuals with the risk factor compared to those without the risk factor ( RR = relative risk )
  • Prospective : we follow the individuals in the future to know who will develop the disease
  • Retrospective : we look to the past to know who developed the disease (e.g. using medical records)
  • This design is the strongest among the observational studies
  • For example – to find out the relative risk of developing chronic obstructive pulmonary disease (COPD) among smokers, we take a sample including smokers and non-smokers. Then, we calculate the number of individuals with COPD in both groups (the relative-risk arithmetic is sketched below).
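A minimal Python sketch of the relative-risk (RR) arithmetic for the smoking and COPD example; all counts are hypothetical and chosen only to make the calculation easy to follow.

```python
# Minimal sketch: relative risk (RR) from a hypothetical cohort study.
# Counts are invented for illustration.

exposed_cases, exposed_total = 90, 600       # smokers who developed COPD / smokers followed
unexposed_cases, unexposed_total = 30, 600   # non-smokers who developed COPD / non-smokers followed

risk_exposed = exposed_cases / exposed_total
risk_unexposed = unexposed_cases / unexposed_total
relative_risk = risk_exposed / risk_unexposed

print(f"Risk in smokers:     {risk_exposed:.3f}")
print(f"Risk in non-smokers: {risk_unexposed:.3f}")
print(f"Relative risk (RR):  {relative_risk:.2f}")  # 3.00: smokers are 3x as likely to develop COPD
```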

Case-Control Study:

  • We conduct this study by comparing 2 groups: one group with the disease (cases) and another group without the disease (controls)
  • This design is always retrospective
  •  We aim to find out the odds of having a risk factor or an exposure if an individual has a specific disease (Odds ratio)
  •  Relatively easy to conduct
  • For example – we want to study the odds of being a smoker among hypertensive patients compared to normotensive ones. To do so, we choose a group of patients diagnosed with hypertension and another group that serves as the control (normal blood pressure). Then we study their smoking history to find out whether there is an association (the odds-ratio arithmetic is sketched below).
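A minimal Python sketch of the odds-ratio (OR) arithmetic for the smoking and hypertension example; the counts are hypothetical.

```python
# Minimal sketch: odds ratio (OR) from a hypothetical case-control study.
# Counts are invented for illustration.

cases_exposed, cases_unexposed = 40, 60          # hypertensive patients: smokers / non-smokers
controls_exposed, controls_unexposed = 25, 75    # normotensive controls: smokers / non-smokers

odds_cases = cases_exposed / cases_unexposed            # odds of smoking among cases
odds_controls = controls_exposed / controls_unexposed   # odds of smoking among controls
odds_ratio = odds_cases / odds_controls

print(f"Odds of smoking among cases:    {odds_cases:.2f}")
print(f"Odds of smoking among controls: {odds_controls:.2f}")
print(f"Odds ratio (OR):                {odds_ratio:.2f}")  # 2.00
```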

Experimental Studies

  • Also known as interventional studies
  • Can involve animals and humans
  • Pre-clinical trials involve animals
  • Clinical trials are experimental studies involving humans
  • In clinical trials, we study the effect of an intervention compared to another intervention or placebo. As an example, I have listed the four phases of a drug trial:

I: We aim to assess the safety of the drug (is it safe?)

II: We aim to assess the efficacy of the drug (does it work?)

III: We want to know if this drug is better than the old treatment (is it better?)

IV: We follow up to detect long-term side effects (can it stay on the market?)

  • In randomized controlled trials, one group of participants receives the control, while the other receives the tested drug/intervention. Those studies are the best way to evaluate the efficacy of a treatment.

Finally, the figure below will help you with your understanding of different types of study designs.

A visual diagram describing the following. Two types of epidemiological studies are descriptive and analytical. Types of descriptive studies are case reports, case series, descriptive surveys. Types of analytical studies are observational or experimental. Observational studies can be cross-sectional, case-control or cohort studies. Types of experimental studies can be lab trials or field trials.

References (pdf)

You may also be interested in the following blogs for further reading:

An introduction to randomized controlled trials

Case-control and cohort studies: a brief overview

Cohort studies: prospective and retrospective designs

Prevalence vs Incidence: what is the difference?




Experimental design

Introduction.


Experiments shape the human experience. They are a critical component of all living natural systems, from evolution to community dynamics. Experiments in science are creative, iterative, and a source of critical thinking. We naturally experiment in art, science, and life. Here, we hone these skills through principles and practice. The principles are presented here, and the practice takes the form of a lab manual entitled Designcraft for experiments.

Course outline

If you are electing to engage with this learning opportunity formally, please see the official course outline for specific details. There are two summative assessments for the lecture principles (see the lab manual for the work associated with the lab component if you are taking the course for credit).

  • Test (on content of the book and critical design thinking for science).
  • Grant proposal (for an experiment and idea you care about).

Learning outcomes

  • Understand the core concepts of experimental design for any natural science experiment.
  • Understand key terminology, semantics, and experimental design philosophies.
  • Critically assess experiments.
  • Provide visual heuristics and workflows for experiments.
  • Be able to design & execute an effective experiment.
  • Be able to clearly write a well-structured manuscript suitable for publication in a journal.
  • Be able to write a competitive grant proposal appropriate for a Master’s application.

Steps to design success

  • Read a very accessible book on experimental design.
  • Take a test to demonstrate mastery of content and creative design for science experiments.
  • Select a science topic that you care deeply about and do research on this opportunity.
  • Write a one-page grant proposal appropriate for a graduate-school funding application.

Experiments are a powerful tool to understand, manage, and explore the world around us. This course will provide you with the terminology and concepts you need to be competitive and effective in research and employment. The lectures explore the key terminology and ideas you need to make sense of experiments. You will also practice designing experiments in the labs.

Lectures (or independent but facilitated student learning) include three mental processes.

Read. Think. Create.

In the first module (i.e., a total of 6 weeks allocated, but please work at your own pace), we read a book together. This component of the lectures provides you with the critical elements, ideas, tools, and terminology you need to design better experiments. The extent to which you develop your knowledge and design skills is evaluated using a test, provided in advance, that you complete on your own time. Lecture decks are provided; they capture the principles that emerged, for me, from reading the book.

In the second module (i.e., a total of 4 weeks blocked), you design an experiment for graduate-level research and prepare an NSERC grant proposal (very short, see guidelines). The primary purpose of this component of the lectures is to give you the opportunity to generate a novel, useful research proposal on a topic of your choice. Key readings and discussion are provided to support your development and exploration of a topic, which further hones your skills.

This is the recommended timing for completing the work. Deadlines are firm for submission of the summative assessments, but the pacing to reach each of those points is up to you. In lectures (and labs), we will nonetheless work through and discuss the material in this order.

week | lecture | resource | labs
1 | intro to course | | none
2 | textbook ch 1 & 2 | | pilot field labs
3 | textbook ch 3 & 4 | | pilot field labs
4 | textbook ch 5 & 6 | | pilot field labs
5 | textbook ch 7 & 8 | | collect data for field experiment you chose
6 | textbook ch 9 & 10 | |
7 | see rubric provided here & due date in course outline | | work on lab report
8 | | | work on lab report
9 | | |
10 | shark-tank thinking for grants | | select and complete a data-design lab
11 | finalize grant proposal, ensure effective experimental design | review rubric provided in course materials | select & complete a data-design lab
12 | grant thinking & discussion on best principles for experimental design applications in daily life | |

Lortie, CJ (2022): Experiment sandbox. figshare. Book. https://doi.org/10.6084/m9.figshare.20442801.v3



8.1 Experimental design: What is it and when should it be used?

Learning objectives.

  • Define experiment
  • Identify the core features of true experimental designs
  • Describe the difference between an experimental group and a control group
  • Identify and describe the various types of true experimental designs

Experiments are an excellent data collection strategy for social workers wishing to observe the effects of a clinical intervention or social welfare program. Understanding what experiments are and how they are conducted is useful for all social scientists, whether they actually plan to use this methodology or simply aim to understand findings from experimental studies. An experiment is a method of data collection designed to test hypotheses under controlled conditions. In social scientific research, the term experiment has a precise meaning and should not be used to describe all research methodologies.


Experiments have a long and important history in social science. Behaviorists such as John Watson, B. F. Skinner, Ivan Pavlov, and Albert Bandura used experimental design to demonstrate the various types of conditioning. Using strictly controlled environments, behaviorists were able to isolate a single stimulus as the cause of measurable differences in behavior or physiological responses. The foundations of social learning theory and behavior modification are found in experimental research projects. Moreover, behaviorist experiments brought psychology and social science away from the abstract world of Freudian analysis and towards empirical inquiry, grounded in real-world observations and objectively-defined variables. Experiments are used at all levels of social work inquiry, including agency-based experiments that test therapeutic interventions and policy experiments that test new programs.

Several kinds of experimental designs exist. In general, designs considered to be true experiments contain three basic key features:

  • random assignment of participants into experimental and control groups
  • a “treatment” (or intervention) provided to the experimental group
  • measurement of the effects of the treatment in a post-test administered to both groups

Some true experiments are more complex.  Their designs can also include a pre-test and can have more than two groups, but these are the minimum requirements for a design to be a true experiment.

Experimental and control groups

In a true experiment, the effect of an intervention is tested by comparing two groups: one that is exposed to the intervention (the experimental group , also known as the treatment group) and another that does not receive the intervention (the control group ). Importantly, participants in a true experiment need to be randomly assigned to either the control or experimental groups. Random assignment uses a random number generator or some other random process to assign people into experimental and control groups. Random assignment is important in experimental research because it helps to ensure that the experimental group and control group are comparable and that any differences between the experimental and control groups are due to random chance. We will address more of the logic behind random assignment in the next section.
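To make the mechanics concrete, here is a minimal Python sketch of random assignment. It is not the chapter's procedure: the participant IDs, group sizes, and fixed seed are all hypothetical, and real studies often rely on dedicated randomization software or sealed procedures.

```python
import random

# Minimal sketch: randomly assign participants to experimental and control groups.
# Participant IDs and the seed are hypothetical.

participants = [f"P{i:02d}" for i in range(1, 21)]  # 20 hypothetical participants

rng = random.Random(42)   # fixed seed so the assignment can be documented and reproduced
shuffled = participants.copy()
rng.shuffle(shuffled)

half = len(shuffled) // 2
experimental_group = sorted(shuffled[:half])
control_group = sorted(shuffled[half:])

print("Experimental group:", experimental_group)
print("Control group:     ", control_group)
```

Because assignment is left entirely to a random process, any pre-existing differences between the two groups should, on average, be due to chance alone.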

Treatment or intervention

In an experiment, the independent variable is receiving the intervention being tested—for example, a therapeutic technique, prevention program, or access to some service or support. It is less common in social work research, but social science research may also use a stimulus, rather than an intervention, as the independent variable. For example, an electric shock or a reading about death might be used as a stimulus to provoke a response.

In some cases, it may be immoral to withhold treatment completely from a control group within an experiment. If you recruited two groups of people with severe addiction and only provided treatment to one group, the other group would likely suffer. For these cases, researchers use a control group that receives “treatment as usual.” Experimenters must clearly define what treatment as usual means. For example, a standard treatment in substance abuse recovery is attending Alcoholics Anonymous or Narcotics Anonymous meetings. A substance abuse researcher conducting an experiment may use twelve-step programs in their control group and use their experimental intervention in the experimental group. The results would show whether the experimental intervention worked better than normal treatment, which is useful information.

The dependent variable is usually the intended effect the researcher wants the intervention to have. If the researcher is testing a new therapy for individuals with binge eating disorder, their dependent variable may be the number of binge eating episodes a participant reports. The researcher likely expects her intervention to decrease the number of binge eating episodes reported by participants. Thus, she must, at a minimum, measure the number of episodes that occur after the intervention, which is the post-test .  In a classic experimental design, participants are also given a pretest to measure the dependent variable before the experimental treatment begins.

Types of experimental design

Let’s put these concepts in chronological order so we can better understand how an experiment runs from start to finish. Once you’ve collected your sample, you’ll need to randomly assign your participants to the experimental group and control group. In a common type of experimental design, you will then give both groups your pretest, which measures your dependent variable, to see what your participants are like before you start your intervention. Next, you will provide your intervention, or independent variable, to your experimental group, but not to your control group. Many interventions last a few weeks or months to complete, particularly therapeutic treatments. Finally, you will administer your post-test to both groups to observe any changes in your dependent variable. What we’ve just described is known as the classical experimental design and is the simplest type of true experimental design. All of the designs we review in this section are variations on this approach. Figure 8.1 visually represents these steps.

Figure 8.1. Steps in classic experimental design: sampling → assignment → pretest → intervention → posttest.
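As a rough illustration of how pretest and posttest measurements from the two groups might be compared after such a design is run, the Python sketch below uses invented scores; it is not the chapter's method, and a real analysis would use an appropriate statistical test rather than a simple difference of mean changes.

```python
from statistics import mean

# Minimal sketch: comparing pretest-to-posttest change between groups in a
# classic experimental design. All scores are invented for illustration.

experimental = {"pre": [12, 15, 11, 14, 13], "post": [8, 10, 7, 9, 9]}
control      = {"pre": [13, 12, 14, 11, 15], "post": [12, 12, 13, 11, 14]}

def mean_change(group):
    """Average change from pretest to posttest (posttest minus pretest)."""
    return mean(post - pre for pre, post in zip(group["pre"], group["post"]))

change_experimental = mean_change(experimental)
change_control = mean_change(control)

print(f"Mean change, experimental group: {change_experimental:+.1f}")
print(f"Mean change, control group:      {change_control:+.1f}")
print(f"Difference between groups:       {change_experimental - change_control:+.1f}")
```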

An interesting example of experimental research can be found in Shannon K. McCoy and Brenda Major’s (2003) study of people’s perceptions of prejudice. In one portion of this multifaceted study, all participants were given a pretest to assess their levels of depression. No significant differences in depression were found between the experimental and control groups during the pretest. Participants in the experimental group were then asked to read an article suggesting that prejudice against their own racial group is severe and pervasive, while participants in the control group were asked to read an article suggesting that prejudice against a racial group other than their own is severe and pervasive. Clearly, these were not meant to be interventions or treatments to help depression, but were stimuli designed to elicit changes in people’s depression levels. Upon measuring depression scores during the post-test period, the researchers discovered that those who had received the experimental stimulus (the article citing prejudice against their same racial group) reported greater depression than those in the control group. This is just one of many examples of social scientific experimental research.

In addition to classic experimental design, there are two other ways of designing experiments that are considered to fall within the purview of “true” experiments (Babbie, 2010; Campbell & Stanley, 1963).  The posttest-only control group design is almost the same as classic experimental design, except it does not use a pretest. Researchers who use posttest-only designs want to eliminate testing effects , in which participants’ scores on a measure change because they have already been exposed to it. If you took multiple SAT or ACT practice exams before you took the real one you sent to colleges, you’ve taken advantage of testing effects to get a better score. Considering the previous example on racism and depression, participants who are given a pretest about depression before being exposed to the stimulus would likely assume that the intervention is designed to address depression. That knowledge could cause them to answer differently on the post-test than they otherwise would. In theory, as long as the control and experimental groups have been determined randomly and are therefore comparable, no pretest is needed. However, most researchers prefer to use pretests in case randomization did not result in equivalent groups and to help assess change over time within both the experimental and control groups.

Researchers wishing to account for testing effects but also gather pretest data can use a Solomon four-group design. In the Solomon four-group design , the researcher uses four groups. Two groups are treated as they would be in a classic experiment—pretest, experimental group intervention, and post-test. The other two groups do not receive the pretest, though one receives the intervention. All groups are given the post-test. Table 8.1 illustrates the features of each of the four groups in the Solomon four-group design. By having one set of experimental and control groups that complete the pretest (Groups 1 and 2) and another set that does not complete the pretest (Groups 3 and 4), researchers using the Solomon four-group design can account for testing effects in their analysis.

Table 8.1 Solomon four-group design
Group | Pretest | Intervention | Posttest
Group 1 | X | X | X
Group 2 | X | | X
Group 3 | | X | X
Group 4 | | | X

Solomon four-group designs are challenging to implement in the real world because they are time- and resource-intensive. Researchers must recruit enough participants to create four groups and implement interventions in two of them.

Overall, true experimental designs are sometimes difficult to implement in a real-world practice environment. It may be impossible to withhold treatment from a control group or randomly assign participants in a study. In these cases, pre-experimental and quasi-experimental designs–which we  will discuss in the next section–can be used.  However, the differences in rigor from true experimental designs leave their conclusions more open to critique.

Experimental design in macro-level research

You can imagine that social work researchers may be limited in their ability to use random assignment when examining the effects of governmental policy on individuals. For example, it is unlikely that a researcher could randomly assign some states to implement decriminalization of recreational marijuana and some states not to in order to assess the effects of the policy change. There are, however, important examples of policy experiments that use random assignment, including the Oregon Medicaid experiment. In the Oregon Medicaid experiment, the wait list for Medicaid in Oregon was so long that state officials conducted a lottery to determine who from the wait list would receive Medicaid (Baicker et al., 2013). Researchers used the lottery as a natural experiment that included random assignment: people selected to receive Medicaid were the experimental group, and those remaining on the wait list were the control group. There are some practical complications with macro-level experiments, just as with other experiments. For example, the ethical concern with using people on a wait list as a control group exists in macro-level research just as it does in micro-level research.

Key Takeaways

  • True experimental designs require random assignment.
  • Control groups do not receive an intervention, and experimental groups receive an intervention.
  • The basic components of a true experiment include a pretest, posttest, control group, and experimental group.
  • Testing effects may cause researchers to use variations on the classic experimental design.
Glossary

  • Classic experimental design- uses random assignment, an experimental and control group, as well as pre- and posttesting
  • Control group- the group in an experiment that does not receive the intervention
  • Experiment- a method of data collection designed to test hypotheses under controlled conditions
  • Experimental group- the group in an experiment that receives the intervention
  • Posttest- a measurement taken after the intervention
  • Posttest-only control group design- a type of experimental design that uses random assignment, and an experimental and control group, but does not use a pretest
  • Pretest- a measurement taken prior to the intervention
  • Random assignment-using a random process to assign people into experimental and control groups
  • Solomon four-group design- uses random assignment, two experimental and two control groups, pretests for half of the groups, and posttests for all
  • Testing effects- when a participant’s scores on a measure change because they have already been exposed to it
  • True experiments- a group of experimental designs that contain independent and dependent variables, pretesting and post testing, and experimental and control groups

Image attributions

exam scientific experiment by mohamed_hassan CC-0

Foundations of Social Work Research Copyright © 2020 by Rebecca L. Mauldin is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.



Open Access

Peer-reviewed

Research Article

What influences students’ abilities to critically evaluate scientific investigations?

Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

* E-mail: [email protected]

Affiliation Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, NY, United States of America


Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Visualization, Writing – review & editing

Affiliation Laboratory of Atomic and Solid State Physics, Cornell University, Ithaca, NY, United States of America

Roles Conceptualization, Investigation, Methodology, Writing – review & editing

Roles Conceptualization, Funding acquisition, Methodology, Project administration, Supervision, Writing – review & editing

Roles Conceptualization, Funding acquisition, Methodology, Resources, Supervision, Writing – review & editing

  • Ashley B. Heim, 
  • Cole Walsh, 
  • David Esparza, 
  • Michelle K. Smith, 
  • N. G. Holmes


  • Published: August 30, 2022
  • https://doi.org/10.1371/journal.pone.0273337


Critical thinking is the process by which people make decisions about what to trust and what to do. Many undergraduate courses, such as those in biology and physics, include critical thinking as an important learning goal. Assessing critical thinking, however, is non-trivial, with mixed recommendations for how to assess critical thinking as part of instruction. Here we evaluate the efficacy of assessment questions to probe students’ critical thinking skills in the context of biology and physics. We use two research-based standardized critical thinking instruments known as the Biology Lab Inventory of Critical Thinking in Ecology (Eco-BLIC) and Physics Lab Inventory of Critical Thinking (PLIC). These instruments provide experimental scenarios and pose questions asking students to evaluate what to trust and what to do regarding the quality of experimental designs and data. Using more than 3000 student responses from over 20 institutions, we sought to understand what features of the assessment questions elicit student critical thinking. Specifically, we investigated (a) how students critically evaluate aspects of research studies in biology and physics when they are individually evaluating one study at a time versus comparing and contrasting two and (b) whether individual evaluation questions are needed to encourage students to engage in critical thinking when comparing and contrasting. We found that students are more critical when making comparisons between two studies than when evaluating each study individually. Also, compare-and-contrast questions are sufficient for eliciting critical thinking, with students providing similar answers regardless of if the individual evaluation questions are included. This research offers new insight on the types of assessment questions that elicit critical thinking at the introductory undergraduate level; specifically, we recommend instructors incorporate more compare-and-contrast questions related to experimental design in their courses and assessments.

Citation: Heim AB, Walsh C, Esparza D, Smith MK, Holmes NG (2022) What influences students’ abilities to critically evaluate scientific investigations? PLoS ONE 17(8): e0273337. https://doi.org/10.1371/journal.pone.0273337

Editor: Dragan Pamucar, University of Belgrade Faculty of Organisational Sciences: Univerzitet u Beogradu Fakultet organizacionih nauka, SERBIA

Received: December 3, 2021; Accepted: August 6, 2022; Published: August 30, 2022

Copyright: © 2022 Heim et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All raw data files are available from the Cornell Institute for Social and Economic Research (CISER) data and reproduction archive ( https://archive.ciser.cornell.edu/studies/2881 ).

Funding: This work was supported by the National Science Foundation under grants DUE-1909602 (MS & NH) and DUE-1611482 (NH). NSF: nsf.gov The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Critical thinking and its importance.

Critical thinking, defined here as “the ways in which one uses data and evidence to make decisions about what to trust and what to do” [ 1 ], is a foundational learning goal for almost any undergraduate course and can be integrated in many points in the undergraduate curriculum. Beyond the classroom, critical thinking skills are important so that students are able to effectively evaluate data presented to them in a society where information is so readily accessible [ 2 , 3 ]. Furthermore, critical thinking is consistently ranked as one of the most necessary outcomes of post-secondary education for career advancement by employers [ 4 ]. In the workplace, those with critical thinking skills are more competitive because employers assume they can make evidence-based decisions based on multiple perspectives, keep an open mind, and acknowledge personal limitations [ 5 , 6 ]. Despite the importance of critical thinking skills, there are mixed recommendations on how to elicit and assess critical thinking during and as a result of instruction. In response, here we evaluate the degree to which different critical thinking questions elicit students’ critical thinking skills.

Assessing critical thinking in STEM

Across STEM (i.e., science, technology, engineering, and mathematics) disciplines, several standardized assessments probe critical thinking skills. These assessments focus on aspects of critical thinking and ask students to evaluate experimental methods [ 7 – 11 ], form hypotheses and make predictions [ 12 , 13 ], evaluate data [ 2 , 12 – 14 ], or draw conclusions based on a scenario or figure [ 2 , 12 – 14 ]. Many of these assessments are open-response, so they can be difficult to score, and several are not freely available.

In addition, there is an ongoing debate regarding whether critical thinking is a domain-general or context-specific skill. That is, can someone transfer their critical thinking skills from one domain or context to another (domain-general) or do their critical thinking skills only apply in their domain or context of expertise (context-specific)? Research on the effectiveness of teaching critical thinking has found mixed results, primarily due to a lack of consensus definition of and assessment tools for critical thinking [ 15 , 16 ]. Some argue that critical thinking is domain-general—or what Ennis refers to as the “general approach”—because it is an overlapping skill that people use in various aspects of their lives [ 17 ]. In contrast, others argue that critical thinking must be elicited in a context-specific domain, as prior knowledge is needed to make informed decisions in one’s discipline [ 18 , 19 ]. Current assessments include domain-general components [ 2 , 7 , 8 , 14 , 20 , 21 ], asking students to evaluate, for instance, experiments on the effectiveness of dietary supplements in athletes [ 20 ] and context-specific components, such as to measure students’ abilities to think critically in domains such as neuroscience [ 9 ] and biology [ 10 ].

Others maintain the view that critical thinking is a context-specific skill for the purpose of undergraduate education, but argue that it should be content accessible [ 22 – 24 ], as “thought processes are intertwined with what is being thought about” [ 23 ]. From this viewpoint, the context of the assessment would need to be embedded in a relatively accessible context to assess critical thinking independent of students’ content knowledge. Thus, to effectively elicit critical thinking among students, instructors should use assessments that present students with accessible domain-specific information needed to think deeply about the questions being asked [ 24 , 25 ].

Within the context of STEM, current critical thinking assessments primarily ask students to evaluate a single experimental scenario (e.g., [ 10 , 20 ]), though compare-and-contrast questions about more than one scenario can be a powerful way to elicit critical thinking [ 26 , 27 ]. Generally included in the “Analysis” level of Bloom’s taxonomy [ 28 – 30 ], compare-and-contrast questions encourage students to recognize, distinguish between, and relate features between scenarios and discern relevant patterns or trends, rather than compile lists of important features [ 26 ]. For example, a compare-and-contrast assessment may ask students to compare the hypotheses and research methods used in two different experimental scenarios, instead of having them evaluate the research methods of a single experiment. Alternatively, students may inherently recall and use experimental scenarios based on their prior experiences and knowledge as they evaluate an individual scenario. In addition, evaluating a single experimental scenario individually may act as metacognitive scaffolding [ 31 , 32 ]—a process which “guides students by asking questions about the task or suggesting relevant domain-independent strategies” [ 32 ]—to support students in their compare-and-contrast thinking.

Purpose and research questions

Our primary objective of this study was to better understand what features of assessment questions elicit student critical thinking using two existing instruments in STEM: the Biology Lab Inventory of Critical Thinking in Ecology (Eco-BLIC) and Physics Lab Inventory of Critical Thinking (PLIC). We focused on biology and physics since critical thinking assessments were already available for these disciplines. Specifically, we investigated (a) how students critically evaluate aspects of research studies in biology and physics when they are individually evaluating one study at a time or comparing and contrasting two studies and (b) whether individual evaluation questions are needed to encourage students to engage in critical thinking when comparing and contrasting.

Providing undergraduates with ample opportunities to practice critical thinking skills in the classroom is necessary for evidence-based critical thinking in their future careers and everyday life. While most critical thinking instruments in biology and physics contexts have undergone some form of validation to ensure they are accurately measuring the intended construct, to our knowledge none have explored how different question types influence students’ critical thinking. This research offers new insight on the types of questions that elicit critical thinking, which can further be applied by educators and researchers across disciplines to measure cognitive student outcomes and incorporate more effective critical thinking opportunities in the classroom.

Ethics statement

The procedures for this study were approved by the Institutional Review Board of Cornell University (Eco-BLIC: #1904008779; PLIC: #1608006532). Informed consent was obtained by all participating students via online consent forms at the beginning of the study, and students did not receive compensation for participating in this study unless their instructor offered credit for completing the assessment.

Participants and assessment distribution

We administered the Eco-BLIC to undergraduate students across 26 courses at 11 institutions (six doctoral-granting, three Master’s-granting, and two Baccalaureate-granting) in Fall 2020 and Spring 2021 and received 1612 usable responses. Additionally, we administered the PLIC to undergraduate students across 21 courses at 11 institutions (six doctoral-granting, one Master’s-granting, three four-year colleges, and one 2-year college) in Fall 2020 and Spring 2021 and received 1839 usable responses. We recruited participants via convenience sampling by emailing instructors of primarily introductory ecology-focused courses or introductory physics courses who expressed potential interest in implementing our instrument in their course(s). Both instruments were administered online via Qualtrics and students were allowed to complete the assessments outside of class. The demographic distribution of the response data is presented in Table 1 , all of which were self-reported by students. The values presented in this table represent all responses we received.

Table 1. Self-reported demographic distribution of student respondents. https://doi.org/10.1371/journal.pone.0273337.t001

Instrument description

Question types.

Though the content and concepts featured in the Eco-BLIC and PLIC are distinct, both instruments share a similar structure and set of question types. The Eco-BLIC—which was developed using a structure similar to that of the PLIC [ 1 ]—includes two predator-prey scenarios based on relationships between (a) smallmouth bass and mayflies and (b) great-horned owls and house mice. Within each scenario, students are presented with a field-based study and a laboratory-based study focused on a common research question about feeding behaviors of smallmouth bass or house mice, respectively. The prompts for these two Eco-BLIC scenarios are available in S1 and S2 Appendices. The PLIC focuses on two research groups conducting different experiments to test the relationship between oscillation periods of masses hanging on springs [ 1 ]; the prompts for this scenario can be found in S3 Appendix . The descriptive prompts in both the Eco-BLIC and PLIC also include a figure presenting data collected by each research group, from which students are expected to draw conclusions. The research scenarios (e.g., field-based group and lab-based group on the Eco-BLIC) are written so that each group has both strengths and weaknesses in their experimental designs.

After reading the prompt for the first experimental group (Group 1) in each instrument, students are asked to identify possible claims from Group 1’s data (data evaluation questions). Students next evaluate the strengths and weaknesses of various study features for Group 1 (individual evaluation questions). Examples of these individual evaluation questions are in Table 2 . They then suggest next steps the group should pursue (next steps items). Students are then asked to read about the prompt describing the second experimental group’s study (Group 2) and again answer questions about the possible claims, strengths and weaknesses, and next steps of Group 2’s study (data evaluation questions, individual evaluation questions, and next steps items). Once students have independently evaluated Groups 1 and 2, they answer a series of questions to compare the study approaches of Group 1 versus Group 2 (group comparison items). In this study, we focus our analysis on the individual evaluation questions and group comparison items.

Table 2. Examples of individual evaluation questions and group comparison items. https://doi.org/10.1371/journal.pone.0273337.t002

Instrument versions.

To determine whether the individual evaluation questions impacted the assessment of students’ critical thinking, students were randomly assigned to take one of two versions of the assessment via Qualtrics branch logic: 1) a version that included the individual evaluation and group comparison items or 2) a version with only the group comparison items, with the individual evaluation questions removed. We calculated the median time it took students to answer each of these versions for both the Eco-BLIC and PLIC.

Think-aloud interviews.

We also conducted one-on-one think-aloud interviews with students to elicit feedback on the assessment questions (Eco-BLIC n = 21; PLIC n = 4). Students were recruited via convenience sampling at our home institution and were primarily majoring in biology or physics. All interviews were audio-recorded and screen captured via Zoom and lasted approximately 30–60 minutes. We asked participants to discuss their reasoning for answering each question as they progressed through the instrument. We did not analyze these interviews in detail, but rather used them to extract relevant examples of critical thinking that helped to explain our quantitative findings. Multiple think-aloud interviews were conducted with students using previous versions of the PLIC [ 1 ], though these data are not discussed here.

Data analyses.

Our analyses focused on (1) investigating the alignment between students’ responses to the individual evaluation questions and the group comparison items and (2) comparing student responses between the two instrument versions. If individual evaluation and group comparison items elicit critical thinking in the same way, we would expect to see the same frequency of responses for each question type, as per Fig 1 . For example, if students evaluated one study feature of Group 1 as a strength and the same study feature for Group 2 as a strength, we would expect that students would respond that both groups were highly effective for this study feature on the group comparison item (i.e., data represented by the purple circle in the top right quadrant of Fig 1 ). Alternatively, if students evaluated one study feature of Group 1 as a strength and the same study feature for Group 2 as a weakness, we would expect that students would indicate that Group 1 was more effective than Group 2 on the group comparison item (i.e., data represented by the green circle in the lower right quadrant of Fig 1 ).


The x- and y-axes represent rankings on the individual evaluation questions for Groups 1 and 2 (or field and lab groups), respectively. The colors in the legend at the top of the figure denote responses to the group comparison items. In this idealized example, all pie charts are the same size to indicate that the student answers are equally proportioned across all answer combinations.

https://doi.org/10.1371/journal.pone.0273337.g001

We ran descriptive statistics to summarize student responses to questions and examine distributions and frequencies of the data on the Eco-BLIC and PLIC. We also conducted chi-square goodness-of-fit tests to analyze differences in student responses between versions within the relevant questions from the same instrument. In all of these tests, we used a Bonferroni correction to reduce the chance of false positives and to account for multiple comparisons. We generated figures (primarily multi-pie chart graphs and heat maps) to visualize differences between individual evaluation and group comparison items and between versions of each instrument with and without individual evaluation questions, respectively. All data analyses were conducted, and all figures generated, in the R statistical computing environment (v. 4.1.1) and Microsoft Excel.
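As a rough illustration of this analysis step, the following R sketch runs a chi-square goodness-of-fit test on made-up response counts for a single item and applies a Bonferroni adjustment across items; the counts and the number of items are hypothetical, not the study data.

```r
# Hypothetical counts of group comparison responses to one item
# (Group 1 more effective, Group 2 more effective, Both, Neither)
# from the version with and the version without individual evaluation questions.
with_version    <- c(40, 25, 60, 10)
without_version <- c(45, 20, 55, 15)

# Goodness-of-fit test: does the "with" distribution depart from the
# proportions observed in the "without" version?
expected_props <- without_version / sum(without_version)
test <- chisq.test(with_version, p = expected_props)

# Bonferroni correction across all items tested on the same instrument.
raw_p      <- c(test$p.value, 0.04, 0.20)  # p-values from several items (invented)
adjusted_p <- p.adjust(raw_p, method = "bonferroni")
adjusted_p
```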

We asked students to evaluate different experimental set-ups on the Eco-BLIC and PLIC in two ways. Students first evaluated the strengths and weaknesses of study features for each scenario individually (individual evaluation questions, Table 2 ) and subsequently answered a series of questions to compare and contrast the study approaches of both research groups side-by-side (group comparison items, Table 2 ). Through analyzing the individual evaluation questions, we found that students generally ranked experimental features (i.e., those related to study set-up, data collection and summary methods, and analysis and outcomes) of the independent research groups as strengths ( Fig 2 ), as evidenced by mean scores greater than 2 on a scale from 1 (weakness) to 4 (strength).


Each box represents the interquartile range (IQR). Lines within each box represent the median. Circles represent outliers of mean scores for each question.

https://doi.org/10.1371/journal.pone.0273337.g002

Individual evaluation versus compare-and-contrast evaluation

Our results indicate that when students consider Group 1 or Group 2 individually, they mark most study features as strengths (consistent with the means in Fig 2 ), shown by the large circles in the upper right quadrant across the three experimental scenarios ( Fig 3 ). However, the proportion of colors on each pie chart shows that students select a range of responses when comparing the two groups [e.g., Group 1 being more effective (green), Group 2 being more effective (blue), both groups being effective (purple), and neither group being effective (orange)]. We infer that students were more discerning (i.e., more selective) when they were asked to compare the two groups across the various study features ( Fig 3 ). In short, students think about the groups differently if they are rating either Group 1 or Group 2 in the individual evaluation questions versus directly comparing Group 1 to Group 2.


The x- and y-axes represent students’ rankings on the individual evaluation questions for Groups 1 and 2 on each assessment, respectively, where 1 indicates weakness and 4 indicates strength. The overall size of each pie chart represents the proportion of students who responded with each pair of ratings. The colors in the pie charts denote the proportion of students’ responses who chose each option on the group comparison items. (A) Eco-BLIC bass-mayfly scenario (B) Eco-BLIC owl-mouse scenario (C) PLIC oscillation periods of masses hanging on springs scenario.

https://doi.org/10.1371/journal.pone.0273337.g003

These results are further supported by student responses from the think-aloud interviews. For example, one interview participant responding to the bass-mayfly scenario of the Eco-BLIC explained that accounting for bias/error in both the field and lab groups in this scenario was a strength (i.e., 4). This participant mentioned that Group 1, who performed the experiment in the field, “[had] outliers, so they must have done pretty well,” and that Group 2, who collected organisms in the field but studied them in lab, “did a good job of accounting for bias.” However, when asked to compare between the groups, this student argued that Group 2 was more effective at accounting for bias/error, noting that “they controlled for more variables.”

Another individual who was evaluating “repeated trials for each mass” in the PLIC expressed a similar pattern. In response to ranking this feature of Group 1 as a strength, they explained: “Given their uncertainties and how small they are, [the group] seems like they’ve covered their bases pretty well.” Similarly, they evaluated this feature of Group 2 as a strength as well, simply noting: “Same as the last [group], I think it’s a strength.” However, when asked to compare between Groups 1 and 2, this individual argued that Group 1 was more effective because they conducted more trials.

Individual evaluation questions to support compare and contrast thinking

Given that students were more discerning when they directly compared two groups for both biology and physics experimental scenarios, we next sought to determine if the individual evaluation questions for Group 1 or Group 2 were necessary to elicit or helpful to support student critical thinking about the investigations. To test this, students were randomly assigned to one of two versions of the instrument. Students in one version saw individual evaluation questions about Group 1 and Group 2 and then saw group comparison items for Group 1 versus Group 2. Students in the second version only saw the group comparison items. We found that students assigned to both versions responded similarly to the group comparison questions, indicating that the individual evaluation questions did not promote additional critical thinking. We visually represent these similarities across versions with and without the individual evaluation questions in Fig 4 as heat maps.


The x-axis denotes students’ responses on the group comparison items (i.e., whether they ranked Group 1 as more effective, Group 2 as more effective, both groups as highly effective, or neither group as effective/both groups were minimally effective). The y-axis lists each of the study features that students compared between the field and lab groups. White and lighter shades of red indicate a lower percentage of student responses, while brighter red indicates a higher percentage of student responses. (A) Eco-BLIC bass-mayfly scenario. (B) Eco-BLIC owl-mouse scenario. (C) PLIC oscillation periods of masses hanging on springs scenario.

https://doi.org/10.1371/journal.pone.0273337.g004

We ran chi-square goodness-of-fit tests comparing student responses across the two instrument versions and found no significant differences on the Eco-BLIC bass-mayfly scenario ( Fig 4A ; based on an adjusted p -value of 0.006) or owl-mouse questions ( Fig 4B ; based on an adjusted p-value of 0.004). There were only three significant differences (out of 53 items) in how students responded to questions on the two versions of the PLIC ( Fig 4C ; based on an adjusted p -value of 0.0005). The items that students responded to differently ( p <0.0005) across the two versions were items where the two groups were identical in their design; namely, the equipment used (i.e., stopwatches), the variables measured (i.e., time and mass), and the number of bounces of the spring per trial (i.e., five bounces). We calculated Cramer’s C (Vc; [ 33 ]), a measure commonly applied to chi-square goodness-of-fit models, to understand the magnitude of significant results. We found that the effect sizes for these three items were small (Vc = 0.11, Vc = 0.10, Vc = 0.06, respectively).
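For reference, the sketch below shows one standard way to derive an effect size of this kind from a chi-square test in R, using the usual Cramér's V formula on hypothetical counts; we do not claim it reproduces the exact Vc values reported above.

```r
# Hypothetical 2 x 4 table: instrument version (with / without individual
# evaluation questions) by group comparison response option.
tab <- matrix(c(40, 25, 60, 10,
                45, 20, 55, 15),
              nrow = 2, byrow = TRUE)
test <- chisq.test(tab)

# Cramer's V: effect size for a chi-square statistic on an r x c table.
cramers_v <- function(chisq_stat, n, k) {
  sqrt(chisq_stat / (n * (k - 1)))  # k = min(number of rows, number of columns)
}
cramers_v(unname(test$statistic), sum(tab), min(dim(tab)))
```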

The trend that students answer the Group 1 versus Group 2 comparison questions similarly, regardless of whether they responded to the individual evaluation questions, is further supported by student responses from the think-aloud interviews. For example, one participant who did not see the individual evaluation questions for the owl-mouse scenario of the Eco-BLIC independently explained that sampling mice from other fields was a strength for both the lab and field groups. They explained that for the lab group, “I think that [the mice] coming from multiple nearby fields is good…I was curious if [mouse] behavior was universal.” For the field group, they reasoned, “I also noticed it was just from a single nearby field…I thought that was good for control.” However, this individual ultimately reasoned that the field group was “more effective for sampling methods…it’s better to have them from a single field because you know they were exposed to similar environments.” Thus, even without individual evaluation questions available, students can still make individual evaluations when comparing and contrasting between groups.

We also determined that removing the individual evaluation questions decreased the duration of time students needed to complete the Eco-BLIC and PLIC. On the Eco-BLIC, the median time to completion for the version with individual evaluation and group comparison questions was approximately 30 minutes, while the version with only the group comparisons had a median time to completion of 18 minutes. On the PLIC, the median time to completion for the version with individual evaluation questions and group comparison questions was approximately 17 minutes, while the version with only the group comparisons had a median time to completion of 15 minutes.

To determine how to elicit critical thinking in a streamlined manner using introductory biology and physics material, we investigated (a) how students critically evaluate aspects of experimental investigations in biology and physics when they are individually evaluating one study at a time versus comparing and contrasting two and (b) whether individual evaluation questions are needed to encourage students to engage in critical thinking when comparing and contrasting.

Students are more discerning when making comparisons

We found that students were more discerning when comparing between the two groups in the Eco-BLIC and PLIC rather than when evaluating each group individually. While students tended to independently evaluate study features of each group as strengths ( Fig 2 ), there was greater variation in their responses to which group was more effective when directly comparing between the two groups ( Fig 3 ). Literature evaluating the role of contrasting cases provides plausible explanations for our results. In that work, contrasting between two cases supports students in identifying deep features of the cases, compared with evaluating one case after the other [ 34 – 37 ]. When presented with a single example, students may deem certain study features as unimportant or irrelevant, but comparing study features side-by-side allows students to recognize the distinct features of each case [ 38 ]. We infer, therefore, that students were better able to recognize the strengths and weaknesses of the two groups in each of the assessment scenarios when evaluating the groups side by side, rather than in isolation [ 39 , 40 ]. This result is somewhat surprising, however, as students could have used their knowledge of experimental designs as a contrasting case when evaluating each group. Future work, therefore, should evaluate whether experts use their vast knowledge base of experimental studies as discerning contrasts when evaluating each group individually. This work would help determine whether our results here suggest that students do not have a sufficient experiment-base to use as contrasts or if the students just do not use their experiment-base when evaluating the individual groups. Regardless, our study suggests that critical thinking assessments should ask students to compare and contrast experimental scenarios, rather than just evaluate individual cases.

Individual evaluation questions do not influence answers to compare and contrast questions

We found that individual evaluation questions were unnecessary for eliciting or supporting students’ critical thinking on the two assessments. Students responded to the group comparison items similarly whether or not they had received the individual evaluation questions. The exception to this pattern was that students responded differently to three group comparison items on the PLIC when individual evaluation questions were provided. These three questions constituted a small portion of the PLIC and showed a small effect size. Furthermore, removing the individual evaluation questions decreased the median time for students to complete the Eco-BLIC and PLIC. It is plausible that spending more time thinking about the experimental methods while responding to the individual evaluation questions would then prepare students to be better discerners on the group comparison questions. However, the overall trend is that individual evaluation questions do not have a strong impact on how students evaluate experimental scenarios, nor do they set students up to be better critical thinkers later. This finding aligns with prior research suggesting that students tend to disregard details when they evaluate a single case, rather than comparing and contrasting multiple cases [ 38 ], further supporting our findings about the effectiveness of the group comparison questions.

Practical implications

Individual evaluation questions were not effective for engaging students in critical thinking, nor did they prepare students for subsequent questions that elicit critical thinking. Thus, researchers and instructors could make critical thinking assessments more effective and less time-consuming by encouraging comparisons between cases. Additionally, this study raises the question of whether instructors should incorporate more experimental case studies throughout their courses and assessments so that students have a richer experiment-base to use as contrasts when evaluating individual experimental scenarios. To help students discern information about experimental design, we suggest that instructors consider providing them with multiple experimental studies (i.e., cases) and asking them to compare and contrast between these studies.

Future directions and limitations

When designing critical thinking assessments, questions should ask students to make meaningful comparisons that require them to consider the important features of the scenarios. One challenge of relying on compare-and-contrast questions in the Eco-BLIC and PLIC to elicit students’ critical thinking is ensuring that students are comparing similar yet distinct study features across experimental scenarios, and that these comparisons are meaningful [ 38 ]. For example, though sample size is different between experimental scenarios in our instruments, it is a significant feature that has implications for other aspects of the research like statistical analyses and behaviors of the animals. Therefore, one limitation of our study could be that we exclusively focused on experimental method evaluation questions (i.e., what to trust), and we are unsure if the same principles hold for other dimensions of critical thinking (i.e., what to do). Future research should explore whether questions that are not in a compare-and-contrast format also effectively elicit critical thinking, and if so, to what degree.

As our question schema in the Eco-BLIC and PLIC was designed for introductory biology and physics content, it is unknown how effective this question schema would be for upper-division biology and physics undergraduates, whom we would expect to have more content knowledge and prior experiences for making comparisons in their respective disciplines [ 18 , 41 ]. For example, are compare-and-contrast questions still needed to elicit critical thinking among upper-division students, or would critical thinking in this population be more effectively assessed by incorporating more sophisticated data analyses in the research scenarios? Also, if students with more expert-like thinking have a richer set of experimental scenarios to inherently use as contrasts when comparing, we might expect their responses on the individual evaluation questions and group comparisons to better align. To further examine how accessible and context-specific the Eco-BLIC and PLIC are, novel scenarios could be developed that incorporate topics and concepts more commonly addressed in upper-division courses. Additionally, if instructors offer students more experience comparing and contrasting experimental scenarios in the classroom, would students be more discerning on the individual evaluation questions?

While a single consensus definition of critical thinking does not currently exist [ 15 ], continuing to explore critical thinking in other STEM disciplines beyond biology and physics may offer more insight into the context-specific nature of critical thinking [ 22 , 23 ]. Future studies should investigate critical thinking patterns in other STEM disciplines (e.g., mathematics, engineering, chemistry) through designing assessments that encourage students to evaluate aspects of at least two experimental studies. As undergraduates are often enrolled in multiple courses simultaneously and thus have domain-specific knowledge in STEM, would we observe similar patterns in critical thinking across additional STEM disciplines?

Lastly, we want to emphasize that we cannot infer every aspect of critical thinking from students’ responses on the Eco-BLIC and PLIC. However, we suggest that student responses on the think-aloud interviews provide additional qualitative insight into how and why students were making comparisons in each scenario and their overall critical thinking processes.

Conclusions

Overall, we found that comparing and contrasting two different experiments is an effective and efficient way to elicit context-specific critical thinking in introductory biology and physics undergraduates using the Eco-BLIC and the PLIC. Students are more discerning (i.e., critical) and engage more deeply with the scenarios when making comparisons between two groups. Further, students do not evaluate features of experimental studies differently when individual evaluation questions are provided or removed. These novel findings hold true across both introductory biology and physics, based on student responses on the Eco-BLIC and PLIC, respectively, though there is much more to explore regarding the critical thinking processes of students across other STEM disciplines and in more advanced stages of their education. Undergraduate students in STEM need to be able to think critically for career advancement, and the Eco-BLIC and PLIC are two means of measuring students’ critical thinking in biology and physics experimental contexts via comparing and contrasting. This research offers new insight into the types of questions that elicit critical thinking, which can be applied by educators and researchers across disciplines to teach and measure cognitive student outcomes. Specifically, we recommend instructors incorporate more compare-and-contrast questions related to experimental design in their courses to efficiently elicit undergraduates’ critical thinking.

Supporting information

S1 Appendix. Eco-BLIC bass-mayfly scenario prompt.

https://doi.org/10.1371/journal.pone.0273337.s001

S2 Appendix. Eco-BLIC owl-mouse scenario prompt.

https://doi.org/10.1371/journal.pone.0273337.s002

S3 Appendix. PLIC scenario prompt.

https://doi.org/10.1371/journal.pone.0273337.s003

Acknowledgments

We thank the members of the Cornell Discipline-based Education Research group for their feedback on this article, as well as our advisory board (Jenny Knight, Meghan Duffy, Luanna Prevost, and James Hewlett) and the AAALab for their ideas and suggestions. We also greatly appreciate the instructors who shared the Eco-BLIC and PLIC in their classes and the students who participated in this study.

  • 2. Stein B, Haynes A, Redding M, Ennis T, Cecil M. Assessing critical thinking in STEM and beyond. In: Innovations in e-learning, instruction technology, assessment, and engineering education. Dordrecht, Netherlands: Springer; 2007. pp. 79–82.
  • 19. Carmichael M, Reid A, Karpicke JD. Assessing the impact of educational video on student engagement, critical thinking and learning. Sage Publishing. 2018. Retrieved from: https://au.sagepub.com/en-gb/oce/press/what-impact-does-videohave-on-student-engagement .
  • 26. Krishna Rao MR. Infusing critical thinking skills into content of AI course. In: Proceedings of the 10th annual SIGCSE conference on Innovation and technology in computer science education; 2005 Jun 27. pp. 173–177.
  • 28. Bloom BS. Taxonomy of educational objectives. Vol. 1: Cognitive domain. New York, NY: McKay; 1956.
  • 33. Cramer H. Mathematical methods of statistics. Princeton, NJ: Princeton University Press; 1946.
  • 38. Schwartz DL, Tsang JM, Blair KP. The ABCs of how we learn: 26 scientifically proven approaches, how they work, and when to use them. New York, NY: WW Norton & Company; 2016 Jul 26.
  • 41. Szenes E, Tilakaratna N, Maton K. The knowledge practices of critical thinking. In: The Palgrave handbook of critical thinking in higher education. Palgrave Macmillan, New York; 2015. pp. 573–579.

2 - Evaluating Experimental Research

Critical Issues

Published online by Cambridge University Press:  05 June 2012

[T]he application of the experimental method to the problem of mind is the great outstanding event in the study of the mind, an event to which no other is comparable.

The author of this quote is Edwin G. Boring (1886–1968), one of the great psychologists of the 20th century and author of A History of Experimental Psychology (1929; the quote comes from p. 659). Contemporary psychologists take “the psychology experiment” as a given, but it is actually a relatively recent cultural invention. Although fascination with human behavior is doubtless as old as the emergence of Homo sapiens, the application of experimental methods to the study of the human mind and behavior is only 150 or so years old. Scientific methods, with heavy reliance on experimental technique, arose in Western civilization during the Renaissance, when great insights and modes of thought from the ancient Greek, Roman, and Arab civilizations were rediscovered. The 17th century witnessed the great discoveries of Kepler, Galileo, and Newton in the physical world. Interest in chemistry and biology arose after the early development of physics. Experimental physiology arose as a discipline in the late 1700s and early 1800s. Still, despite great advances in these fields and despite the fact that scientists of the day usually conducted research in many different fields, no one at that time performed experiments studying humans or their mental life. The first physiologists and anatomists mostly contented themselves with the study of corpses.


  • Chapter: Evaluating Experimental Research
  • By Henry L. Roediger III and David P. McCabe, Washington University in St. Louis
  • Edited by Robert J. Sternberg (Yale University, Connecticut), Henry L. Roediger III (Washington University, St Louis), and Diane F. Halpern (Claremont McKenna College, California)
  • Book: Critical Thinking in Psychology
  • Online publication: 05 June 2012
  • Chapter DOI: https://doi.org/10.1017/CBO9780511804632.003



Training doctoral students in critical thinking and experimental design using problem-based learning

Affiliations.

  • 1 Department of Biochemistry and Molecular Medicine, West Virginia University School of Medicine, Robert C. Byrd Health Sciences Center 64 Medical Center Drive, P.O. Box 9142, Morgantown, WV, 26506, USA. [email protected].
  • 2 Department of Biochemistry and Molecular Medicine, West Virginia University School of Medicine, Robert C. Byrd Health Sciences Center 64 Medical Center Drive, P.O. Box 9142, Morgantown, WV, 26506, USA.
  • PMID: 37587476
  • PMCID: PMC10428545
  • DOI: 10.1186/s12909-023-04569-7

Background: Traditionally, doctoral student education in the biomedical sciences relies on didactic coursework to build a foundation of scientific knowledge and an apprenticeship model of training in the laboratory of an established investigator. Recent recommendations for revision of graduate training include the utilization of graduate student competencies to assess progress and the introduction of novel curricula focused on development of skills, rather than accumulation of facts. Evidence demonstrates that active learning approaches are effective. Several facets of active learning are components of problem-based learning (PBL), which is a teaching modality where student learning is self-directed toward solving problems in a relevant context. These concepts were combined and incorporated in creating a new introductory graduate course designed to develop scientific skills (student competencies) in matriculating doctoral students using a PBL format.

Methods: Course effectiveness was evaluated using the principles of the Kirkpatrick Four Level Model of Evaluation. At the end of each course offering, students completed evaluation surveys on the course and instructors to assess their perceptions of training effectiveness. Pre- and post-tests assessing students' proficiency in experimental design were used to measure student learning.

Results: The analysis of the outcomes of the course suggests the training is effective in improving experimental design. The course was well received by the students as measured by student evaluations (Kirkpatrick Model Level 1). Improved scores on post-tests indicate that the students learned from the experience (Kirkpatrick Model Level 2). A template is provided for the implementation of similar courses at other institutions.

Conclusions: This problem-based learning course appears effective in training newly matriculated graduate students in the required skills for designing experiments to test specific hypotheses, enhancing student preparation prior to initiation of their dissertation research.

Keywords: Critical thinking; Doctoral Student; Experimental design; Graduate; Problem-based learning; Training.

© 2023. BioMed Central Ltd., part of Springer Nature.


Conflict of interest statement

The authors declare no competing interests.

  • National Institutes of Health. Biomedical research workforce working group report. Bethesda, MD: National Institutes of Health; 2012.
  • Sinche M, Layton RL, Brandt PD, O’Connell AB, Hall JD, Freeman AM, Harrell JR, Cook JG, Brennwald PJ. An evidence-based evaluation of transferrable skills and job satisfaction for science PhDs. PLoS ONE. 2017;12:e0185023.
  • Ghaffarzadegan N, Hawley J, Larson R, Xue Y. A note on PhD Population Growth in Biomedical Sciences. Syst Res Behav Sci. 2015;23:402–5.
  • National Academies of Sciences, Engineering, and Medicine. The next generation of biomedical and behavioral sciences researchers: breaking through. Washington, DC: National Academies Press (US); 2018.
  • National Academies of Sciences, Engineering, and Medicine. Graduate STEM education for the 21st century. Washington, DC: National Academies Press; 2018.

Grants and funding.

  • T32 GM133369/GM/NIGMS NIH HHS/United States



Training doctoral students in critical thinking and experimental design using problem-based learning

Michael D. Schaller, Marieta Gencheva, Michael R. Gunther, and Scott A. Weed

Department of Biochemistry and Molecular Medicine, West Virginia University School of Medicine, Robert C. Byrd Health Sciences Center, 64 Medical Center Drive, P.O. Box 9142, Morgantown, WV 26506 USA

Associated data

All data generated in this study are included in this published article and its supplementary information files.


Supplementary Information

The online version contains supplementary material available at 10.1186/s12909-023-04569-7.

Introduction

For over a decade there have been calls to reform biomedical graduate education. Two main problems led to these recommendations, and therefore two different prescriptions to solve them. The first major issue is the pursuit of non-traditional (non-academic) careers by doctorates and concerns about the adequacy of their training [ 1 , 2 ]. The underlying factors affecting career outcomes are the number of PhDs produced relative to the number of available academic positions [ 1 , 3 – 5 ] and the changing career interests of doctoral students [ 6 – 9 ]. One aspect of the proposed reformation to address this problem is incorporation of broader professional skills training, and awareness of a greater diversity of careers, into the graduate curriculum [ 1 , 4 , 5 ]. The second issue relates to curricular content and whether content knowledge or critical scientific skills should be the core of the curriculum [ 10 , 11 ]. The proposed reformation to address this issue is creation of curricula focusing on scientific skills, e.g. reasoning, experimental design and communication, while simultaneously reducing components of the curricula that build a foundational knowledge base [ 12 , 13 ]. Components of these two approaches are not mutually exclusive; incorporating select elements of each has the potential to address both issues concurrently. Here we describe the development, implementation and evaluation of a new problem-based learning (PBL) graduate course that provides an initial experience in introducing the scientific career-relevant core competencies of critical thinking and experimental design to incoming biomedical doctoral students. The purpose of this course is to address these issues by creating a vehicle to develop professional skills (communication) and critical scientific skills (critical thinking and experimental design) for first year graduate students.

One approach that prioritizes the aggregate scientific skill set required for adept biomedical doctorates is the development of core competencies for doctoral students [ 5 , 14 , 15 ], akin to set milestones that must be met by medical residents and fellows [ 16 ]. Key features of these competencies include general and field-specific scientific knowledge, critical thinking, experimental design, evaluation of outcomes, scientific rigor, ability to work in teams, responsible conduct of research, and effective communication [ 5 , 14 , 15 ]. Such competencies provide clear benchmarks to evaluate the progress of doctoral students’ development into an independent scientific professional and preparedness for the next career stage. Historically, graduate programs relied on traditional content-based courses and supervised apprenticeship in the mentor’s laboratory to develop such competencies. An alternative to this approach is to modify the graduate student curriculum to provide a foundation for these competencies early in the curriculum in a more structured way. This would provide a base upon which additional coursework and supervised dissertation research could build to develop competencies in doctoral students.

Analyses of how doctoral students learn scientific skills suggest a threshold model, where different skillsets are mastered (a threshold reached) before subsequent skillsets can be mastered [ 17 , 18 ]. Skills like using the primary literature, experimental design and placing studies in context are earlier thresholds than identifying alternatives, limitations and data analysis [ 18 ]. Timmerman et al. recommend revision of graduate curricula to sequentially build toward these thresholds using evidence-based approaches [ 18 ]. Several recent curricular modifications are aligned with these recommendations. One program, as cited above, offers courses to develop critical scientific skills early in the curriculum with content knowledge provided in later courses [ 12 , 13 ]. A second program has built training in experimental design into the coursework in the first semester of the curriculum. Improvements in students’ experimental design skills and an increase in self-efficacy in experimental design occurred over the course of the semester [ 19 ]. Other programs have introduced exercises into courses and workshops to develop experimental design skills using active learning. One program developed interactive sessions on experimental design, where students give chalk talks about an experimental plan to address a problem related to course content and respond to challenges from their peers [ 20 ]. Another program has developed a workshop drawing upon principles from design thinking to build problem solving skills and creativity, and primarily uses active learning and experiential learning approaches [ 21 ]. While these programs are well received by students, the outcomes of training have not been reported. Similar undergraduate curricula that utilize literature review with an emphasis on scientific thought and methods report increased performance in critical thinking, scientific reasoning and experimental design [ 22 , 23 ].

It is notable that the changes these examples incorporate into the curriculum are accompanied with a shift from didactic teaching to active learning. Many studies have demonstrated that active learning is more effective than a conventional didactic curriculum in STEM education [ 24 ]. Problem-based learning (PBL) is one active learning platform that the relatively new graduate program at the Van Andel Institute Graduate School utilizes for delivery of the formal curriculum [ 25 ]. First developed for medical students [ 26 ], the PBL learning approach has been adopted in other educational settings, including K-12 and undergraduate education [ 27 , 28 ]. A basic tenet of PBL is that student learning is self-directed [ 26 ]. Students are tasked to solve an assigned problem and are required to find the information necessary for the solution (self-directed). In practice, learning occurs in small groups where a faculty facilitator helps guide the students in identifying gaps in knowledge that require additional study [ 29 ]. As such, an ideal PBL course is “well organized” but “poorly structured”. The lack of a traditional restrictive structure allows students to pursue and evaluate different solutions to the problem.

The premise for PBL is that actively engaging in problem solving enhances learning in several ways [ 29 , 30 ]. First, activation of prior knowledge, as occurs in group discussions, aids in learning by providing a framework to incorporate new knowledge. Second, deep processing of material while learning, e.g. by answering questions or using the knowledge, enhances the ability to later recall key concepts. Third, learning in context, e.g. learning the scientific basis for clinical problems in the context of clinical cases, enables and improves recall. These are all effective strategies to enhance learning [ 31 ]. PBL opponents argue that acquisition of knowledge is more effective in a traditional didactic curriculum. Further, development of critical thinking skills requires the requisite foundational knowledge to develop realistic solutions to problems [ 32 ].

A comprehensive review of PBL outcomes from K-12 through medical school indicated that PBL students perform better in the application of knowledge and reasoning, but not in other areas like basic knowledge [ 33 ]. Other recent meta-analyses support the conclusion that PBL, project-based learning and other small group teaching modalities are effective in education from primary school to university, including undergraduate courses in engineering and technology, and pharmacology courses for professional students in health sciences [ 34 – 39 ]. While the majority of the studies reported in these meta-analyses demonstrate that PBL results in better academic performance, there are contrasting studies that demonstrate that PBL is ineffective. This prompts additional investigation to determine the salient factors that distinguish the two outcomes to establish best practices for better results using the PBL platform. Although few studies report the outcomes of PBL based approaches in graduate education, this platform may be beneficial in training biomedical science doctoral students for developing and enhancing critical thinking and practical problem-solving skills.

At our institution, biomedical doctoral students enter an umbrella program and take a core curriculum in the first semester prior to matriculating into one of seven biomedical sciences doctoral programs across a wide range of scientific disciplines in the second semester. Such program diversity created difficulty in achieving consensus on the necessary scientific foundational knowledge for a core curriculum. Common ground was achieved during a recent curriculum revision through the development of required core competencies for all students, regardless of field of study. These competencies and milestones for biomedical science students at other institutions [ 5 , 14 , 15 ], along with nontraditional approaches to graduate education [ 12 , 25 ], were used as guidelines for curriculum modification.

Course design

A course was created to develop competencies required by all biomedical sciences doctoral students regardless of their program of interest [ 14 ]. As an introductory graduate level course, this met the needs of all our seven diverse biomedical sciences doctoral programs where our first-year doctoral students matriculate. A PBL platform was chosen for the course to engage the students in an active learning environment [ 25 ]. The process of problem solving in small teams provided the students with experience in establishing working relationships and how to operate in teams. The students gained experience in researching material from the literature to establish scientific background, find current and appropriate experimental approaches and examples of how results are analyzed. This small group approach allowed each team to develop different hypotheses, experimental plans and analyses based upon the overall interests of the group. The course was designed following discussions with faculty experienced in medical and pharmacy school PBL, and considering course design principles from the literature [ 27 , 40 ]. The broad learning goals are similar to the overall objectives in another doctoral program using PBL as the primary course format [ 25 ], and are aligned with recommended core competencies for PhD scientists [ 14 ]. These goals are to:

  • Develop broad, general scientific knowledge (core competency 1 [ 14 ]).
  • Develop familiarity with technical approaches specific to each problem.
  • Develop skills in formulating hypotheses, detailed experimental design, interpretation of data, and statistical analysis (core competencies 3 and 4 [ 14 ]).
  • Practice written and verbal communication skills (core competency 8 [ 14 ]).
  • Develop collaboration and team skills (core competency 6 [ 14 ]).
  • Practice using the literature.

Students were organized into groups of four or five based on their scientific background. Student expertise in each group was deliberately mixed to provide different viewpoints during discussion. A single faculty facilitator was assigned to each student group, which met formally in 13 separate sessions (Appendix II). In preparation for each session, the students independently researched topics using the literature (related to goal 6) and met informally without facilitator oversight to coordinate their findings and organize the discussion for each class session. During the formal one-hour session, one student served as the group leader to manage the discussion. The faculty facilitator guided the discussion to ensure coverage of necessary topics and helped the students identify learning issues, i.e. areas that required additional development, for the students to research and address for the subsequent session. At the end of each session, teams previewed the leading questions for the following class and organized their approach to address these questions prior to the next session. The whole process provided experiences related to goal 5.

As the course was developed during the COVID-19 pandemic, topics related to SARS-CoV2 and COVID-19 were selected as currently relevant problems in society. Session 1 prepared the students to work in teams by discussing how to work in teams and manage conflict (related to goal 5). In session 2, the students met in their assigned groups to get to know each other, discuss problem-based learning and establish ground rules for the group. Sessions 3 and 4 laid the course background by focusing on the SARS-CoV2 virus and COVID-19-associated pathologies (related to goal 1). The subsequent nine sessions were organized into three separate but interrelated three-session blocks: one on COVID-19 and blood clotting, one on COVID-19 and loss of taste, and one on SARS-CoV2 and therapeutics. The first session in each of these blocks was devoted to covering background information (blood clotting, neurosensation and drug application) (related to goal 1). The second session of each block discussed hypothesis development (mechanisms that SARS-CoV2 infection might utilize to alter blood clotting, the sense of taste, and identification of therapeutic targets to attenuate SARS-CoV2 infection) (related to goal 3). In these second sessions, the students also began to design experiments to test their hypotheses. The final session of each block fleshed out the details of the experimental design (related to goals 2 and 3).

The process was iterative, where the students had three opportunities to discuss hypothesis development, experimental design and analysis during sessions with their facilitators. Written and oral presentation assignments (Appendix V) provided additional opportunities to articulate a hypothesis, describe experimental approaches to test the hypotheses, propose analysis of experimental results and develop communication skills (related to goal 4).

Rigor and reproducibility were incorporated into the course. This was an important component given the emphasis recently placed on rigor and reproducibility by federal agencies. As the students built the experimental design to address the hypothesis, recurring questions were posed to encourage them to consider rigor. Examples include: “Are the methods and experimental approaches rigorous? How could they be made more rigorous?” “Discuss how your controls validate the outcome of the experiment. What additional controls could increase confidence in your result?” The facilitators were instructed to direct discussion to topics related to the rigor of the experimental design. The students were asked about numbers of replicates, number of animals, additional methods that could be applied to support the experiment, and other measurements to address the hypothesis in a complementary fashion. In the second iteration of the course, we introduced an exercise on rigor and reproducibility for the students using the NIH Rigor and Reproducibility Training Modules (see Appendix III). In this exercise, the students read a short introduction to rigor and reproducibility and viewed a number of short video modules to introduce lessons on rigor. The students were also provided the link to the National Institute of General Medical Sciences clearinghouse of training modules on rigor and reproducibility as a reference for experimental design in their future work (see Appendix III).

The first delivery of the course was during the COVID-19 pandemic and sessions were conducted on the Zoom platform. The thirteen PBL sessions, and two additional sessions dedicated to oral presentations, were spaced over the course of the first semester of the biomedical sciences doctoral curriculum. The second iteration of the course followed the restructuring of the graduate first year curriculum and the thirteen PBL sessions, plus one additional session devoted to oral presentations, were held during the first three and a half weeks of the first-year curriculum. During this period in the semester, this was the only course commitment for the graduate students. Due to this compressed format, only one written assignment and a single oral presentation were assigned. As the small group format worked well via Zoom in the first iteration of the course, the small groups continued to meet using this virtual platform.

IRB Approval. The West Virginia University Institutional Review Board approved the study (WVU IRB Protocol#: 2008081739). Informed consent was provided by the participants in writing and all information was collected anonymously.

Surveys. Evaluation of training effectiveness was measured in two ways corresponding to the first two levels of the Kirkpatrick Model of Evaluation [ 41 ]. First, students completed a questionnaire upon completion of the course to capture their perceptions of training (Appendix VII). Students were asked their level of agreement/disagreement on a Likert scale with 10 statements about the course and 7 statements about their facilitator. Second, students took a pre- and post-test to measure differences in their ability to design experiments before and after training (Appendix VIII). The pre- and post-tests were identical, asking the students to design an experiment to test a specific hypothesis, include controls, plan analyses, and state possible results and interpretation. Five questions were provided for the pre- and post-test, where each question posed a hypothesis from a different biomedical discipline, e.g. cancer biology or neuroscience. Students were asked to choose one of the five questions to answer.

Peer-to-peer evaluations were collected to provide feedback on professionalism and teamwork. This survey utilized a Goldilocks scale ranging from 1 to 7, with 4 being the desired score. An example peer question asked about accountability, where responses ranged from not accountable (e.g., always late; score = 1), through accountable (e.g., punctual, well prepared, follows up; score = 4), to controlling (e.g., finds fault in others; score = 7). Each student provided a peer-to-peer evaluation for each student in their group (see Appendix VII). In the second course iteration, Goldilocks surveys were collected three times over the three-week course period due to the compressed time frame. This was necessary to provide rapid feedback to the students about their performance during the course in order to provide opportunities to address and rectify any deficits before making final performance assessments.
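As a small illustration of how such scores might be summarized, the R sketch below averages each student's peer ratings and measures the distance from the desired score of 4; the ratings shown are invented for illustration.

```r
# Hypothetical peer-to-peer ratings on the 1-7 Goldilocks scale (4 = desired).
# Rows are raters, columns are the students being rated.
peer_ratings <- matrix(c(4, 5, 3,
                         4, 6, 4,
                         5, 5, 3),
                       nrow = 3, byrow = TRUE,
                       dimnames = list(c("rater1", "rater2", "rater3"),
                                       c("studentA", "studentB", "studentC")))

# Average score per student and its distance from the desired score of 4;
# larger distances flag behaviors perceived as less than ideal.
avg_score <- colMeans(peer_ratings)
deviation <- abs(avg_score - 4)
round(rbind(avg_score, deviation), 2)
```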

Evaluating Pre- and Post-Tests. All pre- and post-test answers were evaluated by three graders in a blind fashion, where the graders were unaware if an answer came from a pre- or post-test. Prior to grading, each grader made up individual answer keys based upon the question(s) on the tests. The graders then met to compare and deliberate these preliminary keys, incorporating changes and edits to produce a single combined key to use for rating answers. While the students were asked to answer one question, some students chose to answer several questions. Superfluous answers were used as a training dataset for the graders. The graders independently scored each answer, then met to review the results and discuss modification of the grading key. The established final grading key, with a perfect score of 16, was utilized by the graders in independently evaluating the complete experimental dataset consisting of all pre- and post-test answers (Appendix IX). To assess the ability of student cohorts to design experiments before and after the course, three of the authors graded all of the pre- and post-test answers. Grading was performed in a blind fashion and the scores of the three raters were averaged for each answer.

Statistical analysis. To measure the interrater reliability of the graders, the intraclass correlation coefficient (ICC) was calculated. A two-way mixed effects model was utilized to evaluate consistency between multiple raters/measurements. The ICC for grading the training dataset was 0.82, indicating good inter-rater agreement. The ICC for grading the experimental dataset was also 0.82. For comparison of pre-test vs. post-test performance, the scores of the three raters were averaged for each answer. Since answers were anonymous, the analyses compared responses between individuals. Not all scores exhibited a Gaussian distribution; therefore, a nonparametric statistic, a one-tailed Mann-Whitney U test, was used for comparison. The pre-test and post-test scores for 2020 and 2021 could not be compared due to the different format used for the course in each year.
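A minimal R sketch of these two steps is shown below, assuming the irr package for the ICC and using invented scores rather than the study data.

```r
library(irr)  # assumed available; provides icc() for intraclass correlations

# Invented grades (0-16 scale): rows are answers, columns are the three raters.
grades <- data.frame(rater1 = c(10, 12, 8, 14, 9),
                     rater2 = c(11, 12, 7, 13, 10),
                     rater3 = c(10, 13, 8, 14, 9))

# Two-way model ICC assessing consistency among raters.
icc(grades, model = "twoway", type = "consistency", unit = "single")

# Average the raters for each answer, then compare pre- vs post-test scores
# with a one-tailed Mann-Whitney U (Wilcoxon rank-sum) test.
pre_scores  <- c(7.3, 8.0, 6.7, 9.1, 7.7)    # averaged pre-test scores (invented)
post_scores <- c(9.0, 10.2, 8.1, 11.3, 9.4)  # averaged post-test scores (invented)
wilcox.test(pre_scores, post_scores, alternative = "less")
```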

Thirty students participated in the course in the first offering, while 27 students were enrolled in the second year. The students took pre- and post-tests to measure their ability to design an experiment before and after training (Appendix VIII). As the course progressed, students were surveyed on their views of the professionalism of other students in their group (Appendix VII). At the end of the course, students were asked to respond to surveys evaluating the course and their facilitator (see Appendix VII).

Student reception of the course (Kirkpatrick Level 1). In the first year, 23 students responded to the course evaluation (77% response rate) and 26 students submitted facilitator evaluations (87% response rate), whereas in the second year there were 25 responses to the course evaluation (93% response rate) and 26 for facilitators (96% response rate). Likert scores for the 2020 and 2021 course evaluations are presented in Fig. 1. In 2020, the median score for each question was 4 on a scale of 5. In 2021, the median scores for the questions about active learning and hypothesis testing were 5 and the median score of the other questions was 4. Based upon their evaluations of the facilitators, the students appreciated the facilitators' efforts in the course. The median score for every facilitator across all survey questions is shown in Fig. 2. The median score for a single question in 2020 and 2021 was 4.5 and the median score for all other questions was 5. The results of the peer-to-peer evaluations are illustrated in Fig. 3. The average score for each student was plotted, with scores further from the desired score of 4 indicating perceived behaviors that were not ideal. The wide range of scores in the 2020 survey was notable. The students completed three peer-to-peer surveys during the 2021 course, and the range of scores in the 2021 peer-to-peer evaluations was narrower than the range in the 2020 survey. The range of scores was expected to narrow from the first (initial) to third (final) survey as students learned and implemented improvements in their professional conduct based upon peer feedback; however, the narrow range of scores in the initial survey left little room for improvement.

Fig. 1. Results of Course Evaluations by Students. Student evaluations of the course were collected at the end of each offering. The evaluation surveys are in Appendix VII. Violin plots showing the distribution and median score for each question in the 2020 survey (A) and the 2021 survey (B) are shown. The survey used a Likert scale (1 – low to 5 – high)

Fig. 2. Results of Facilitator Evaluations by Students. Student evaluations of the facilitators were collected at the end of each offering of the course. The evaluation surveys are in Appendix VII. Violin plots showing the distribution and median score for each question in the 2020 survey (A) and the 2021 survey (B) are shown. The survey used a Likert scale (1 – low to 5 – high)

Fig. 3. Results of Student Peer-to-Peer Evaluations. Student peer-to-peer evaluations were collected at the end of the course in year 1 (A), and at the beginning (B), the middle (C) and the end (D) of the course in year 2. Each student evaluated the professionalism of each other student in their group using the evaluation survey shown in Appendix VII. The average score for each student is plotted as a data point. The survey used a Goldilocks scale (range of 1 to 7) where the desired professional behavior is reflected by a score of 4

Student learning (Kirkpatrick Level 2). Twenty-six students completed the pre-test in each year and consented to participate in this study (87% response in the first year and 96% response in the second year). Eighteen students completed the post-test at the end of the first year (60%) and 26 students completed the test at the end of the second year (96%). Question selection (excluding students who misunderstood the assignment and answered all questions) is shown in Table 1. The most frequently selected questions were Question 1 (45 times) and Question 2 (23 times). Interestingly, the results in Table 1 also indicate that students did not necessarily choose the same question to answer on the pre-test and post-test.

Table 1. Student Choice of Experimental Question to Answer (Only those who made a choice)

| Question | 2020 Pre-test | 2020 Post-test | 2021 Pre-test | 2021 Post-test |
|---|---|---|---|---|
| 1 | 10 | 8 | 13 | 14 |
| 2 | 3 | 5 | 7 | 8 |
| 3 | 2 | 2 | 2 | 3 |
| 4 | 4 | 1 | 4 | 0 |
| 5 | 2 | 0 | 0 | 1 |

Average scores on pre-tests and post-tests were compared using a one-tailed Mann Whitney U test. Since the format of the course was different in the two iterations, comparison of test results between the two years could not be made. The average scores of the pre- and post-test in 2020 were not statistically different (p = 0.0673), although the post-test scores trended higher. In contrast, the difference between the pre- and post-test in 2021 did reach statistical significance (p = 0.0329). The results collectively indicate an overall improvement in student ability in experimental design (Fig.  4 ).

Fig. 4. Pre- and Post-Test Scores. At the beginning and end of each offering, the students completed a test to measure their ability to design an experiment (see Appendix VIII for the details of the exam). Three faculty graded every answer to the pre- and post-tests using a common grading rubric (see Appendix IX). The maximum possible score was 16. The average score for each individual answer on the pre-test and post-test is represented as a single data point. The bar indicates the mean score across all answers +/- SD. The average pre- and post-test scores were compared using a one-tailed Mann-Whitney U test. For the 2020 data (A), p = 0.0673, and for the 2021 data (B), p = 0.0329

This course was created in response to biomedical workforce training reports recommending increased training in general professional skills and scientific skills, e.g. critical thinking and experimental design. The course uses a PBL format, which is not widely employed in graduate education, to incorporate active learning throughout the experience. It was well received by students, and the analysis suggests that the major goals of the course were met. This provides a template for other administrators and educators seeking to modify curricula in response to calls to reform training programs for doctoral students.

Student evaluations indicated the course was effective at motivating active learning and that students became more active learners. The evaluation survey questions were directly related to three specific course goals: (1) students reported developing skills in problem solving, hypothesis testing, and experimental design; (2) the course helped develop oral presentation skills and, in one iteration of the course, written communication skills; and (3) students developed collaboration and team skills. Thus, from the students' perspective, these three course goals were met. Student perceptions of peer professionalism were measured using peer-to-peer surveys. The wide range of Goldilocks scores in the first student cohort was unexpected. In the second student cohort, changes in professional behavior were measured over time and the score ranges were narrower. The reasons for the difference between cohorts are unclear. One possibility is that the first iteration of the course extended over one semester, during the first full semester of the pandemic, impacting professional behavior and perceptions of professionalism. The second cohort completed a professionalism survey three times during the course; the narrow range of scores from this cohort in the initial survey made detection of improved professionalism over the course difficult. Results do indicate that professionalism improved in terms of respect and compassion between the first and last surveys. Finally, the pre- and post-test analysis demonstrated a trend of improved performance on the post-test relative to the pre-test for students in each year of the course, with a statistically significant difference between pre- and post-test scores in the second year.

Areas for improvement. The course was initially offered as a one-credit course. Student comments on course evaluations and comments in debriefing sessions with facilitators at the end of the course concurred that the workload exceeded that of a one-credit course. As a result, the year-two version was offered as a two-credit course to better align course credits with workload.

There were student misperceptions about the goals of the course in the first year. Some students equated experimental design with research methods and expressed disappointment that this was not a methods course. While learning appropriate methods is a goal of the course, the main emphasis is developing hypotheses and designing experiments to test the hypotheses. As such, the choice of methods was driven by the hypotheses and experimental design. This misperception was addressed in the second year by clearly elaborating on the course goals in an introductory class session.

The original course offering contained limited statistical exercises to simulate experimental planning and data analysis, e.g. students were required to conduct a power analysis. Between the first and second years of the course, the entire first semester biomedical sciences curriculum was overhauled with several new course offerings. This new curriculum contained an independent biostatistics workshop that students completed prior to the beginning of this course. Additional statistics exercises were incorporated into the PBL course to provide the students with more experience in the analysis of experimental results. Student evaluations indicated that the introduction of these additional exercises was not effective. Improved coordination between the biostatistics workshop and the PBL course is required to align expectations, better equipping students for the statistical analysis of experimental results encountered later in this course.

An important aspect that was evident from student surveys, facilitator discussions and debrief sessions was that improved coordination between the individual facilitators of the different groups is required to reduce intergroup variability. Due to class size, the students were divided into six groups, with each facilitator assigned to the same group for the duration of the course to maintain continuity. The facilitators met independently of the students throughout the course to discuss upcoming sessions and to share their experiences with their respective groups. This allowed the different facilitators to compare approaches and discuss emerging or perceived concerns/issues. In the second year, one facilitator rotated between different groups during each session to observe how the different student groups functioned. Such a real-time faculty peer-evaluation process has the potential to reduce variability between groups, but it was challenging to implement within the short three-week time period. Comprehensive training in which all facilitators become well versed in PBL strategies and adhere to an established set of guidelines or a script for each session is one mechanism that may reduce variability across different facilitator-group pairings.

Limitations. The current study has a number of limitations. The sample size for each class was small, with 30 students enrolled in the first year of the course and 27 students enrolled in the second. The response rates for the pre-tests were high (> 87%), but the response rate for the post-test varied between the first year (60%) and second year (96%) of the course. The higher response rate in the second year might be due to fewer end-of-semester surveys, since this was the only course that the students took in that time period. Additionally, the post-test in the second year was conducted at a scheduled time, rather than on the students' own time as was the case in year one. Due to restructuring of the graduate curriculum and the pandemic, the two iterations of the course were formatted differently. This precluded pooling the data from the two offerings and makes comparison between the outcomes difficult.

Presentation of the course was similar, but not identical, to all of the students. Six different PBL groups were required to accommodate the number of matriculating students in each year. Despite efforts to provide a consistent experience, there was variability between the different facilitators in running their respective groups. Further, the development of each session in each group was different, since discussion was driven by the students and their collective interests. These variables could be responsible for increasing the spread of scores on the post-tests and decreasing the value of the course for a subset of students.

The pre- and post-tests were conducted anonymously to encourage student participation. This prevented pairing the pre- and post-test scores of individual students and comparing learning between different groups. The pre-test and post-test were identical, providing the students with five options (with identical instructions), each asking them to design experiments in response to a different biomedical science problem. An alternative approach could have used isomorphic questions for the pre- and post-tests. Some students clearly answered the same question on the pre- and post-test and may have benefited from answering the same question twice (albeit after taking the course). Other students clearly answered different questions on the pre- and post-test, and the outcomes might be skewed if the two questions challenged the student differently.

While the course analysis captured the first two levels of the Kirkpatrick model of evaluation (reaction and learning), it did not attempt to measure the third level (behavior) or fourth level (results) [ 41 ]. Future studies are required to measure the third level. This could be achieved by asking students, following completion of the course, to elaborate on the experimental design used in recent experiments in their dissertation laboratory, or by evaluating the experimental design students incorporate into their dissertation proposals. The fourth Kirkpatrick level could potentially be assessed longitudinally by surveying preceptors about their students' abilities in experimental design at semiannual or annual committee meetings and in the accompanying written progress reports. The advantage of focusing on the first two Kirkpatrick levels of evaluation is that the measured outcomes can be confidently attributed to the course. Third- and fourth-level evaluations are more complicated, since they necessarily take place at some point after completion of the course. Thus, the third- and fourth-level outcomes can result from additional factors outside of the course (e.g. other coursework, working in the lab, attendance at student research forums, meetings with mentors, etc.). Another limiting factor is the use of a single test to measure student learning. Additional or alternative approaches to measuring learning might better capture differences between the pre- and post-test scores.

Implementation. This curriculum is readily scalable and can be modified for graduate programs of any size, with the caveat that larger programs will require more facilitators. At Van Andel, the doctoral cohorts are three to five new students per year and all are accommodated in one PBL group [ 25 ]. At our institution, we have scaled up to a moderate sized doctoral program with 25 to 30 matriculating students per year, dividing the students into six PBL groups (4–5 students each). Medical School classes frequently exceed 100 students (our program has 115–120 new students each fall) and typically have between five and eight students per group. Our graduate course has groups at the lower end of this range. This course could be scaled up by increasing the number of students in the group or by increasing the number of groups.

Consistency between groups is important so that each group of students has a similar experience and reaps the full benefit of the course. Regular meetings between the course coordinator and facilitators to discuss the content of upcoming sessions and to define rubrics to guide student feedback and evaluation were the mechanisms used to standardize across the different groups in this course (Appendix VI). In hindsight, the course would benefit from more rigorous facilitator training prior to participation in the course. While a number of our facilitators were veterans of a medical school PBL course, the skillset required to effectively manage a graduate-level PBL course centered on developing critical thinking and experimental design is different. Such training requires an extensive time commitment by the course coordinators and participating facilitators.

The most difficult task in developing this course involved the course conception and development of the problem-based assignments. Designing a COVID-19 based PBL course in 2020 required de novo development of all course material. This entailed collecting and compiling information about the virus and the disease to provide quick reference for facilitators to guide discussion in their groups, all in the face of constantly shifting scientific and medical knowledge, along with the complete lack of traditional peer-based academic social engagement due to the pandemic. In developing this course, three different COVID-based problems were identified, and appropriate general background material for each problem required extensive research and development. Background material on cell and animal models, general strategies for experimental manipulation, and methods to measure specific outcomes was collected in each case. Student copies for each session were designed to contain a series of questions as a guide to identifying important background concepts. Facilitator copies for each session were prepared with the goal of efficiently and effectively guiding each class meeting. These guidelines contained ideas for discussion points, areas of elaboration, and a truncated key of necessary information to guide the group (Appendix IV). Several PBL repositories exist (e.g. https://itue.udel.edu/pbl/problems/ , https://www.nsta.org/case-studies ) and MedEdPORTAL ( https://www.mededportal.org/ ) publishes medical-specific cases. These provide valuable resources for case-based ideas, but few are specifically geared for research-focused biomedical graduate students. As such, modification of cases to make them germane to first-year biomedical graduate students with a research-centered focus is required prior to implementation. Finally, appropriate support materials for surveys and evaluation rubrics require additional development and refinement of existing templates to permit improved evaluation of learning outcomes (Appendix VI).

Development of an effective PBL course takes considerable time and effort to conceive and construct. Successful implementation requires the requisite higher administrative support to identify and devote the necessary and appropriate faculty needed for course creation, the assignment of skilled faculty to serve as facilitators and staff support to coordinate the logistics for the course. It is critical that there is strong faculty commitment amongst the facilitators to devote the time and energy necessary to prepare and to successfully facilitate a group of students. Strong institutional support is linked to facilitator satisfaction and commitment to the PBL-based programs [ 42 ]. Institutional support can be demonstrated in multiple ways. The time commitment for course developers, coordinators and facilitators should be accurately reflected in teaching assignments. Performance in these roles in PBL should factor into decisions about support for professional development, e.g. travel awards, and merit based pay increases. Further, efforts in developing, implementing and executing a successful PBL course should be recognized as important activities during annual faculty evaluations by departmental chairs and promotion and tenure committees.

Key Takeaways. The creation and implementation of this course was intellectually stimulating and facilitators found their interactions with students gratifying. From student survey responses and test results the course was at least modestly successful at achieving its goals. Based upon our experience, important issues to consider when deciding to implement such a curriculum include: (1) support of the administration for developing the curriculum, (2) facilitator buy-in to the approach, (3) continuity (not uniformity) between PBL groups, (4) other components of the curriculum and how they might be leveraged to enhance the effectiveness of PBL and (5) effort required to develop and deliver the course, which must be recognized by the administration.

Future Directions. Novel curriculum development is an often overlooked but important component to contemporary graduate student education in the biomedical sciences. It is critical that modifications incorporated in graduate education are evidence based. We report the implementation of a novel PBL course for training in the scientific skill sets required for developing and testing hypotheses, and demonstrate its effectiveness. Additional measures to assess the course goals in improving critical thinking, experimental design and self-efficacy in experimental design will be implemented using validated tests [ 22 , 43 – 45 ]. Further studies are also required to determine the long-term impact of this training on student performance in the laboratory and progression towards degree. It will be interesting to determine if similar curriculum changes to emphasize development of skills will shorten the time to degree, a frequent recommendation for training the modern biomedical workforce [ 1 , 46 – 48 ].

Incorporation of courses emphasizing development of skills can be done in conjunction with traditional didactic instruction to build the necessary knowledge base for modern biomedical research. Our PBL course was stand-alone, requiring the students to research background material prior to hypothesis development and experimental design. Coordination between the two modalities would obviate the need for background research in the PBL component, reinforce the basic knowledge presented didactically through application, and prepare students for higher order thinking about the application of the concepts learned in the traditional classroom. Maintaining a balance between problem-based and traditional instruction may also be key in improving faculty engagement into such new and future initiatives. Continued investments in the creation and improvement of innovative components of graduate curricula centered around developing scientific skills of doctoral students can be intellectually stimulating for faculty and provide a better training environment for students. The effort may be rewarded by streamlining training and strengthening the biomedical workforce of the future.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Acknowledgements

Thanks to Mary Wimmer and Drew Shiemke for many discussions over the years about PBL in the medical curriculum and examples of case studies. We thank Steve Treisenberg for initial suggestions and discussions regarding PBL effectiveness in the Van Andel Institute. Thanks to Paul and Julie Lockman for discussions about PBL in School of Pharmacy curricula and examples of case studies. Special thanks to the facilitators of the groups, Stan Hileman, Hunter Zhang, Paul Chantler, Yehenew Agazie, Saravan Kolandaivelu, Hangang Yu, Tim Eubank, William Walker, and Amanda Gatesman-Ammer. Without their considerable efforts the course could never have been successfully implemented. Thanks to the Department of Biochemistry and Molecular Medicine for supporting the development of this project. MS is the director of the Cell & Molecular Biology and Biomedical Engineering Training Program (T32 GM133369).

Abbreviations

PBL – Problem-based learning
STEM – Science, technology, engineering, and math
K-12 – Kindergarten through grade 12
ICC – Intraclass correlation coefficient
SARS-CoV-2 – Severe acute respiratory syndrome coronavirus 2
COVID-19 – Coronavirus disease 19

Author contributions

SW and MS developed the concept for the course. MS was responsible for creation and development of all of the content, for the implementation of the course, the design of the study and creating the first draft of the manuscript. MG, MRG and SW graded the pre- and post-test answers in a blind fashion. MS, MG, MRG and SW analyzed the data and edited the manuscript.

There was no funding available for this work.

Data Availability

Declarations.

The authors declare no competing interests.

The West Virginia University Institutional Review Board approved the study (WVU IRB Protocol#: 2008081739). Informed consent was provided in writing and all information was collected anonymously. All methods were carried out in accordance with relevant guidelines and regulations.

Not applicable.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

  • Open access
  • Published: 16 September 2024

Multi-objective optimization design of low-power-driven, large-flux, and fast-response three-stage valve

  • Aixiang Ma 1 ,
  • Heruizhi Xiao 1 ,
  • Yue Hao 1 ,
  • Xihao Yan 1 &
  • Sihai Zhao 1  

Scientific Reports, volume 14, Article number: 21575 (2024)


  • Energy infrastructure
  • Mechanical engineering

Intrinsically safe solenoids drive solenoid valves in coal mining equipment. The low power consumption of these solenoids limits the response time of the solenoid valves. Additionally, the low viscosity of the emulsion fluid and its high susceptibility to dust contamination often lead to leakage and sticking of hydraulic valves. This study proposes a low-power-driven, large-flux, fast-response three-stage valve structure with an internal displacement feedback device to address these issues. The critical parameters of the valve were optimized using a novel multi-objective optimization algorithm. A prototype was manufactured based on the obtained parameters and subjected to simulation and experimental verification. The results demonstrate that the valve has an opening time of 21 ms, a closing time of 12 ms, and a maximum flow rate of approximately 225 L/min. The driving power of this structure is less than 1.2 W. By utilizing this valve for hydraulic cylinder control, a positioning accuracy of ± 0.15 mm was achieved. Comparative test results show that the control error fluctuation of the proposed structure is smaller than that of the 3/4 proportional valve.

Introduction

High-pressure water-based hydraulic valves are widely used in coal mining equipment. However, the water-based transmission medium in the hydraulic system of comprehensive mining faces is highly susceptible to dust contamination, leading to common issues such as valve components jamming and leaking during operation 1 , 2 . Therefore, enhancing the pollution resistance of valves and reducing internal leakage have become urgent issues to address 3 . The failure of hydraulic valves can result in numerous malfunctions of controlled equipment, directly threatening the safety of coal miners, damaging mechanical equipment in the work area, increasing maintenance workload, and severely affecting production efficiency on the work surface 4 .

The working medium of the hydraulic valves in the fully mechanized mining face is a high water-based fluid with 5% oil and 95% water, forming an oil-in-water (O/W) emulsion, which has low viscosity and operates at high pressures, typically around 32 MPa 5 . Under identical clearance and pressure drop, the leakage loss of high-pressure water-based hydraulic valves of the same specifications is tens of times that of oil pressure control valves, requiring a tight matching in their structure 6 . Additionally, internal leakage must be almost zero when the spool is closed, necessitating the avoidance of slide-valve structures 7 . To meet these performance indicators, it is necessary to comprehensively consider the unique physicochemical properties of the water-based medium, improve the structure and design parameters of hydraulic valves, or develop entirely new designs.

In addition to the issues of sticking and leakage, the driving solenoid power of control valves in coal mining equipment should be less than 1.2 W 8 . To meet the system's mass flow rate requirements, existing research often adopts a two-stage or three-stage valve structure 9 , 10 . In addition to the high flow rate, the hydraulic system in the fully mechanized mining face imposes stricter demands on response time than conventional hydraulic systems. However, due to factors such as large flux, spool mass, and large stroke, a conflict exists between large flux and fast response, making the design of such control valves challenging 11 .

High-speed on–off valves (HSV), belonging to a type of digital hydraulic component, are always in either fully open or fully closed states. They can convert discrete control signals into discrete flow rates, offering advantages such as fast response, simple structure, high reliability, and insensitivity to hydraulic fluid contamination 12 . However, due to their inability to provide significant flow rates during rapid responses, multiple actuators are often employed for small-scale equipment or pilot valves 13 . To address this limitation, without altering the structure and type of high-speed valves, multiple high-speed valves are typically connected in parallel to form a Digital Flow Control Unit (DFCU) to expand flow rate gain and resolve the contradiction mentioned above 14 .

Previous research on high-flow and fast-response valves has primarily focused on proportional/servo or HSV 15 , 16 , 17 . In recent years, there has also been significant progress in pilot valves for HSV 18 , 19 . However, it is almost inevitable to have spool structures when controlling flow direction in hydraulic systems using proportional/servo valves 20 . Some HSVs, driven by armature coils with permanent magnet structures in on–off valves, can generate significant flow rates, but they operate at low pressures and have internal piston-type structures 21 , 22 . On the other hand, standard high-flow cartridge valves cannot meet the demand due to their lack of fast response 23 .

Multi-objective optimization algorithms are commonly employed in the optimization design of valve structures 24 , 25 , 26 . The proposed three-stage valve (TSV) exhibits strong coupling relationships among its stages, making its critical design parameters difficult to determine. With multiple design parameters and mutual constraints, obtaining the optimal solution by optimizing a single parameter is highly challenging. The optimization design problem of the TSV is a multi-objective optimization problem, with advanced solutions achieved through optimization algorithms. Among them, metaheuristic methods find extensive engineering applications. Metaheuristic algorithms refer to an algorithmic framework independent of specific problems, where heuristic methods are inspired by natural phenomena, biological behaviors, or even mathematical principles 27 . Metaheuristic methods possess advantages such as randomness, ease of implementation, and consideration of black-box scenarios, making them adept at tackling complex engineering problems 28 .

Among various metaheuristic algorithms, the Electric Eel Foraging Algorithm (EEFO) has recently gained significant attention in the literature. This algorithm mathematically models four key foraging behaviors of electric eel groups: “Interacting, Resting, Hunting, and Migrating.” It provides exploration and exploitation during the optimization process. An energy factor is developed to manage the transition from global to local search and balance exploration and exploitation in the search space. Test results have shown that EEFO exhibits excellent performance in exploitation, balancing exploration and exploitation, and avoiding local optima 29 .

Proposed structure and hydraulic system

Structural adjustments are made to the poppet valve to avoid using a slide-valve structure. In this study, the displacement-flow feedback principle is utilized, where the piston in the lift valve structure is replaced with a pressure control chamber. Four displacement feedback orifices are installed on the valve spool to control the valve opening based on the pressure control chamber's pressure.

The displacement-flow feedback principle, also known as the “Valvistor” valve control principle, was proposed by ANDERSSON from Linköping University in Sweden in the 1980s. It has advantages such as a simple structure and favorable dynamic and static characteristics, and it has been widely applied 30 , 31 . By replacing the slide structure in the poppet valve with a pressure control chamber, a TSV structure, as shown in Fig. 1 , is proposed in this research. The three-stage poppet valve comprises a pilot HSV, a secondary stage, and a main valve. The two pilot valves are a bidirectional poppet valve and an HSV. The modified poppet valve still has a slide-valve structure, which means there may be leakage. However, the sealing of this valve does not rely on the piston structure. The leakage flow from the piston structure towards the C port is reliably blocked by the poppet valve in the previous stage, together with the flow passing through the feedback orifice 32 . Therefore, the novel TSV structure does not have leakage issues. Since there is no requirement for the sealing performance of the piston structure and a displacement feedback orifice is present on the mating surface, the spool will not experience sticking.

Fig. 1. Working principle of the three-stage valve and structure of the spool.

The flow controlled by the secondary stage should be sufficiently large to achieve a fast step response at the main stage. Therefore, the size and flow capacity of the secondary valve are always larger than those of conventional high-flow directional valves and HSVs. As a result, the HSV's electromagnetic driving force is insufficient to drive the secondary valve directly without a pilot valve. Adding a pilot stage allows the secondary stage to be driven by pressure instead of magnetic force. Since the steady-state hydrodynamic force acting on the hydraulic lift valve is much weaker than the pressure, the influence of hydrodynamic forces on the spool motion is neglected 28 . At the same time, the pilot valve's size and the electromagnetic valve's driving force are reduced, resulting in a shorter response time for the pilot stage. Due to these advantages, the valve adopts a three-stage structure. The secondary stage and the main valve are poppet valves with displacement feedback structures that connect ports A and C. Because the area of the top end of the poppet valve is larger than that of the bottom end, the poppet valve can reliably close under static pressure. At the same time, the internal leakage between the spool and the sleeve can be shielded by the pilot stage. The oil supply port P is connected to high-pressure oil, and the oil return port T is connected to the load.

The working principle of the TSV is as follows. When a control signal is input, the pilot HSV opens, causing a pressure decrease at port C2. As a result, the secondary stage opens, leading to a pressure decrease at port C1, and the main valve opens. The closing process mirrors the opening process: after the pilot HSV is closed, the secondary and main valves close sequentially. This valve has a proportional throttling function in the A-B direction and a check valve cutoff function in the other direction.

The hydraulic system discussed in this article is depicted in Fig. 2 . As shown, the position of the inertial load is driven by an asymmetric single-rod hydraulic cylinder, actuated by four TSVs. The flow input of the TSVs is adjusted via the control signal u . When u  > 0, valves (a) and (d) are open, and valves (b) and (c) are closed. High-pressure fluid from the oil pump flows through valve (a) into the rodless chamber of the hydraulic cylinder, causing it to extend under the pressure differential. Conversely, when u  < 0, valves (b) and (c) are open, and valves (a) and (d) are closed. High-pressure fluid from the pump flows through valve (c) into the rod chamber of the hydraulic cylinder, causing it to retract due to the pressure differential. P s  = 7 MPa, P t  = 0 MPa, and each TSV's flow rate should exceed 100 L/min ( Q 01  > 100 L/min, Q 02  > 100 L/min). The response time should be minimized while meeting the required flow rates.
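As a quick illustration of this switching logic, the sketch below maps the sign of the control signal u to the open/closed state of the four valves; the function name and the dictionary return format are ours, added for illustration only.

```python
def valve_states(u: float) -> dict:
    """Map the control signal u to TSV open/closed states (illustrative sketch).

    u > 0: valves (a) and (d) open  -> cylinder extends
    u < 0: valves (b) and (c) open  -> cylinder retracts
    u = 0: all valves closed        -> cylinder holds position
    """
    if u > 0:
        return {"a": True, "b": False, "c": False, "d": True}
    if u < 0:
        return {"a": False, "b": True, "c": True, "d": False}
    return {"a": False, "b": False, "c": False, "d": False}
```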

Fig. 2. Hydraulic system.

System modeling

Before analyzing the system's dynamic characteristics, it is necessary to determine the displacements of the spools in each stage of the TSV at the steady state. This will provide information on the flow rates at each stage and the relationship between the total flow rate and critical design parameters.

Poppet displacement modeling

The flow through the pilot HSV can be represented as:

The flow through the secondary stage can be represented as:

The flow through the feedback throttling grooves of the secondary stage can be represented as:

The flow through the main stage can be represented as:

The flow through the feedback throttling grooves of the main valve can be represented as:

The total flow equation through the TSV can be represented as:

When the valve is in a steady state, the flow through the feedback throttling grooves of the main stage is equal to the sum of the flows through the secondary stage and the pilot HSV:

The flow through the secondary feedback throttle slot is equal to the flow through the pilot HSV:

In the stable open state of the valve, neglecting the influence of hydrodynamic forces, the force balance equations for the main and secondary spools can be derived:

To obtain the displacement of the main stage spool when the pilot HSV is opened, the following assumptions are made: the flow coefficient of each orifice is the same, the flow inside the pressure control chamber has no effect on the pressure distribution, and the pressure acting on the surfaces of the spool is uniformly distributed, and the force exerted by the spring on each spool is neglected. Firstly, when the spool is at maximum lift, the net force acting on the spool should be zero in the axial direction, and the pressures inside each pressure control chamber are as follows:

Substituting Eqs. ( 1 ), and ( 3 ) into ( 8 ), the displacement of the secondary stage spool is:

Substituting Eqs. ( 1 ), ( 2 ), and ( 5 ) into ( 7 ), the displacement of the main valve spool is:

Equations ( 11 ), ( 12 ), ( 13 ), and ( 14 ) can be used to solve for the openings of the main and secondary stages. An iterative method can be employed here, starting with a reasonable initial value for Eqs. ( 9 ) and ( 10 ), then substituting the calculated values of P c 1 and P c 2 into Eqs. ( 13 ) and ( 14 ) to obtain x 1 and x 2 . This process is repeated until the difference between two successive iterations is sufficiently small.
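A minimal fixed-point iteration of the kind described above is sketched below. The callables chamber_pressures and spool_openings stand in for Eqs. (9)–(10) and (13)–(14), which are not reproduced in this extract; their forms, the tolerance, and the iteration limit are illustrative assumptions.

```python
# Fixed-point iteration sketch for the steady-state spool openings x1 and x2.
def solve_openings(x1_init, x2_init, chamber_pressures, spool_openings,
                   tol=1e-6, max_iter=100):
    """chamber_pressures(x1, x2) -> (Pc1, Pc2)   # stands in for Eqs. (9)-(10)
    spool_openings(Pc1, Pc2)   -> (x1, x2)       # stands in for Eqs. (13)-(14)
    """
    x1, x2 = x1_init, x2_init
    for _ in range(max_iter):
        pc1, pc2 = chamber_pressures(x1, x2)      # update control-chamber pressures
        x1_new, x2_new = spool_openings(pc1, pc2) # update spool displacements
        if abs(x1_new - x1) < tol and abs(x2_new - x2) < tol:
            return x1_new, x2_new
        x1, x2 = x1_new, x2_new
    raise RuntimeError("fixed-point iteration did not converge")
```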

If the influence of the spring force is neglected, it can be concluded by observing Eqs. ( 13 ) and ( 14 ) that W t 2 affects the opening of the secondary stage spool: increasing x 02 or W t 2 will decrease the value of x 2 . In turn, x 2 affects the opening of the main stage: increasing x 01 , decreasing x 2 , or increasing W t 1 will decrease the value of x 1 . Therefore, the design parameters that impact the system flow most are W t 1 , W t 2 , x 01 , and x 02 . To improve the flow capacity of the valve, further optimization design should be carried out.

Dynamic modeling

Even without considering the effects of the circuit, the HSV system is still a third-order system. Since no adjustments are made to the structure of the HSV to simplify calculations, it is assumed that the movement of the secondary stage and the main valve starts after the HSV is fully opened. At this point, the flow through the HSV can be simplified as a function of the control chamber pressure P c2 . The analysis of the dynamic characteristics of the TSV involves the damping analysis of the orifices in the hydraulic control circuit and the force analysis of the poppet valves at each stage, as shown in Fig.  3 . It can be observed that the shared control chamber pressure P ci serves as the bridge and link between the stages. There is a strong coupling relationship between each spool, and the interaction between the stages achieves the implementation of the displacement feedback principle and the dynamic and static performance of the hydraulic valve. The dynamic equations for the poppet valve components can be derived based on Newton’s second law:

Fig. 3. Force analysis of the moving parts.

When considering dynamics, the instantaneous flow through each orifice no longer has a simple mathematical relationship, and real-time calculations need to be performed based on the dynamic pressures before and after the orifices. The pressure in the pressure control chambers also needs to be integrated using the flow continuity equation. According to the chamber-node method, each control chamber is treated as a flow-node. The dynamic equations for the pressure in each pressure control chamber can be written as follows:

For the general flow Eq. ( 1 ) of an orifice, if the flow area A has an analytical relationship with the displacement x of a particular moving component, the flow equation can be linearized near the operating point using a first-order Taylor expansion:

Here, Q ω , A ω , P ω in , P ω out , and x ω represent the flow, flow area, inlet pressure, outlet pressure, and displacement of the orifice at the operating point, while P in , P out , and x represent the incremental changes in inlet pressure, outlet pressure, and displacement near the operating point. k p and k x are the incremental gain factors for the flow-pressure and flow-displacement relationships of the orifice at the operating point. The above equation can be rewritten to express the incremental flow:
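The numbered flow equations are not reproduced in this extract. Based on the variable definitions in this paragraph, the first-order expansion and the resulting incremental-flow relation presumably take the standard forms (our reconstruction, not the paper's exact notation):

\[ Q \approx Q_{\omega } + k_{p}\,(P_{in} - P_{out}) + k_{x}\,x, \qquad \delta Q = Q - Q_{\omega } = k_{p}\,(P_{in} - P_{out}) + k_{x}\,x , \]

where \( k_{p} = \partial Q/\partial (P_{in} - P_{out})\big|_{\omega } \) and \( k_{x} = \partial Q/\partial x\big|_{\omega } \) are evaluated at the operating point.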

Here, δQ represents the incremental flow of the orifice near the operating point. By neglecting the influence of the displacement of the moving component on the dynamic chamber volume, the dynamic Eqs. ( 16 ) and ( 17 ) for the pressure control chambers can be linearized to obtain their linearized forms:

The coefficients k p 1 , k p 2 , k p 3 , k p 4 , k x 1 , and k x 2 can be calculated as follows:

The pressures P s and P t are taken as the input variables, while the incremental displacements x 1 and x 2 of the spools in the poppet valves are selected as the respective output variables. The incremental displacements x 1 and x 2 of the second-stage and main spools, their first-order derivatives with respect to time \(\dot{x}_{1}\) and \(\dot{x}_{2}\) , and the pressures P c 1 and P c 2 in the control chambers form the six state variables. By rearranging Eqs. ( 15 )–( 28 ), the state-space representation of the linearized displacement-flow feedback principle within the TSV near the operating point can be obtained.

where the state variables X are structured as follows:

The input variable U and the output variable Y are structured as follows:

The components of the state matrix A , input matrix B , and output matrix C for the linearized state-space model of the expanded TSV are as follows:

Using the system's state-space model, the unit step response can be obtained, which allows for the analysis of the system's time response. The step response curve of the system is shown in Fig. 4 , where the TSV's response time is taken as the settling (adjustment) time. Adjusting the input parameters can achieve different system response times. The system's stability can be checked during the computation based on the distribution of the system's poles and zeros.
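The sketch below shows one way such a step response and its settling time could be computed with scipy, given the A, B, and C matrices described above (their entries are not reproduced here). Using only the first input and first output for a SISO response and a 2% settling band are our assumptions, not choices stated in the paper.

```python
# Step-response settling time for the linearized TSV model (illustrative sketch).
import numpy as np
from scipy import signal

def settling_time(A, B, C, band=0.02, t_end=0.1, n=5000):
    A = np.asarray(A, dtype=float)
    b = np.asarray(B, dtype=float)[:, :1]      # first input column only (SISO)
    c = np.asarray(C, dtype=float)[:1, :]      # first output row only (SISO)
    sys = signal.StateSpace(A, b, c, np.zeros((1, 1)))

    stable = bool(np.all(np.linalg.eigvals(A).real < 0))   # pole-based stability check

    t = np.linspace(0.0, t_end, n)
    t, y = signal.step(sys, T=t)
    y_final = y[-1]
    # Last time the response leaves the +/- band around its final value
    outside = np.nonzero(np.abs(y - y_final) > band * abs(y_final))[0]
    t_settle = t[outside[-1] + 1] if outside.size else 0.0
    return t_settle, stable
```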

As shown in Table 1 , there are six key design parameters determining the TSV system, including W t1 , W t2 , D 1 , D 2 , α 1 , and α 2 , representing the width of the main valve spool throttle slot, the width of the secondary valve spool throttle slot, the diameter of the main spool, the diameter of the secondary spool, the area ratio of the main spool, and the area ratio of the secondary spool, respectively. The initial values and optimization ranges of these parameters are given in Table 1 , where the initial values are based on the flow requirements of the valve and the need to maintain system stability. The optimization ranges comprehensively consider factors such as the volume of the valve, output flow rate, machining difficulty, etc.

The orifice area of the throttle slot in the TSV should exceed that of the second-stage valve. Hence, W t1 is set with a minimum value of 2 mm. Increasing W t1 reduces the opening of the third-stage spool, so a larger upper bound is set to allow the optimization algorithm to compute over a broader range. Although choosing a smaller value for W t2 can achieve a more significant flow amplification coefficient, the throttle slot area should exceed that of the pilot HSV. Considering machining precision and potential liquid contamination during TSV operation, W t2 should not be set too small. Hence, its minimum is set at 1 mm. Furthermore, a conservative upper limit is defined to prevent system instability. The initial values of D 1 , D 2 , α 1 , and α 2 are selected based on the flow amplification formula ( 14 ), aiming for roughly equal openings of the two poppet valves to ensure simultaneous closure of their spools while maintaining system stability.

The preliminary design of the system parameters results in the step response curve of the system shown in Fig.  4 . After the control signal input, the system exhibits an overshoot of 73.6% and reaches a steady-state value of 0.557 after three oscillations. The required settling time is 18.17 ms.

Fig. 4. System step response.

Multi-objective optimization

Optimization problem description

Optimization refers to finding the optimal solution or acceptable approximations among numerous solutions for a given problem under certain conditions. Optimization can significantly improve problem-solving efficiency, reduce computational requirements, and save financial resources. The optimization design problem of the novel TSV is a MOO problem. In general, the sub-objectives in an MOO problem are conflicting, and improving one sub-objective may lead to a decrease in the performance of one or several other sub-objectives. It is not possible to achieve the optimal values for multiple sub-objectives simultaneously; instead, coordination and compromise among them are required to achieve the best possible value for each sub-objective 33 . The essential difference between MOO and single-objective optimization problems is that the solution is not unique but consists of a Pareto optimal set composed of numerous non-dominated solutions. The decision vectors in this solution set are called non-dominated solutions. The objective-function values corresponding to the non-dominated vectors in the Pareto optimal set, represented graphically, form the Pareto frontier.

The main parameters that affect the performance of the TSV are W t1 , W t2 , A c1 , A c2 , ɑ 1 , and ɑ 2 . When optimizing these parameters, they are interrelated and have trade-offs: increasing the area ratio ɑ 1 , ɑ 2 can enhance the valve spool lift and flux, but it also increases the response time. Increasing the width of the feedback grooves W t1 and W t2 can improve the response time of the poppet valves but may reduce the valve lift and result in reduced flow. The fluctuation of each parameter makes it challenging to obtain an optimal solution by optimizing individual parameters.

This study's critical performance indicators for the designed TSV include response time, flow capacity, volume, weight, and manufacturing complexity. Response time denotes how quickly the valve reacts to input signals; a shorter response time enables precise control over fluid or pressure changes, thereby enhancing system controllability. Flow capacity refers to the amount of fluid the valve can process; higher flow output reduces actuator response times and improves system efficiency. Minimizing the valve's size and weight facilitates easier installation and transportation. However, these parameters are not primary optimization targets due to the inherent conflict between volume/weight and flow capacity in three-stage valves. Manufacturing complexity primarily pertains to technical challenges and cost factors during production and assembly, influencing structural dimensions to avoid excessively narrow channels and intricate grooves. Therefore, this study focuses on optimizing two objectives: response time and flow capacity.

This study utilizes an optimizer called EEFO, proposed in reference 29 , to simulate the foraging behavior of electric eels in a socially intelligent manner. EEFO incorporates four foraging behaviors: interaction, rest, hunting, and migration. It simulates interaction behavior for better exploration, and rest, hunting, and migration behaviors for better exploitation. The energy factor used in EEFO improves the balance between exploration and exploitation. The algorithm demonstrates excellent performance in exploitation and exploration, in balancing the two, and in avoiding local optima. Further details of the EEFO algorithm can be found in reference 29 .

The computational flowchart for MOO is shown in Fig. 5 . Before performing the objective optimization, the system is preliminarily designed, and the following parameters are determined: the lengths of the second-stage valve and the main valve, the stiffnesses of the reset springs K 1 and K 2 , the pre-openings x 01 and x 02 , and all parameters related to the HSV. The parameters to be optimized are W t1 , W t2 , D 1 , D 2 , ɑ 1 , and ɑ 2 . Based on the dynamic analysis of the system in the “ Dynamic modeling ” section, the TSV system needs to ensure stability while achieving the largest flow capacity in the shortest response time. The optimization design problem of the TSV can be described as follows: under given constraints, select appropriate design variables x to optimize the objective function f ( x ) to its optimal value. The mathematical model is as follows:

Fig. 5. Design flowchart in the present study.

Here, x  = ( x 1 , x 2 , …, x 6 ) represents the design variables, f ( x ) represents the fitness function, and f ( s ) means the stability constraint condition, which should be equal to logic 1 when the system is stable. Considering the difficulty of machining, X min and X max are the lower and upper bounds of the design variables, respectively. The optimization has two fitness functions, representing the valve response time and flow rate. The response time can be calculated based on the adjusting time of the system’s step response. Among them, U ( s ) is the Laplace transform of the input signal, and H ( s ) is the system's transfer function. The system's flow rate can be calculated using Eq. ( 35 ).

The initial particle swarm is randomly generated and introduced into the main program loop based on the abovementioned constraints and optimization objectives. The particle swarm continues to search for the ideal values. The fitness values for the TSV's output flow rate and response time are calculated. The program determines the maximum and minimum values of the fitness functions and generates a series of Pareto solutions, while comparing the distances between the solutions to ensure that a certain spacing is maintained between each solution and the others. The extreme values and positions of the current generation are updated by comparison with previous generations. Then, the positions and velocities of the particles are updated, and the next generation of the particle swarm is generated. The loop continues until the specified number of generations is reached. It is worth noting that the solver settings in this research are slightly different from the simulation model analysis in reference 29 : the solver in this research simultaneously solves multiple objective functions and generates a Pareto solution set, rather than solving for the optimal value of a single objective function.
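A minimal sketch of the non-dominated archive at the core of such a loop is shown below. It casts both objectives as minimization by negating the flow rate; the EEFO position-update rules themselves are not reproduced (see reference 29), and the function names and data layout are our own illustrative choices.

```python
# Pareto-archive sketch for a two-objective minimization: (response_time, -flow_rate).
import numpy as np

def dominates(f_a, f_b):
    """True if objective vector f_a Pareto-dominates f_b (pure minimization)."""
    f_a, f_b = np.asarray(f_a), np.asarray(f_b)
    return bool(np.all(f_a <= f_b) and np.any(f_a < f_b))

def update_archive(archive, candidates):
    """Merge candidate (x, f) pairs into the archive, keeping non-dominated ones.

    x is a design vector (W_t1, W_t2, D_1, D_2, alpha_1, alpha_2);
    f is the objective pair (response_time, -flow_rate).
    """
    merged = archive + candidates
    keep = []
    for i, (_, fi) in enumerate(merged):
        # Keep a solution only if no other solution in the merged set dominates it
        if not any(dominates(fj, fi) for j, (_, fj) in enumerate(merged) if j != i):
            keep.append(merged[i])
    return keep
```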

Calculation results

The parameter ranges are set as follows (mm): W t1   ∈ [2,10], W t2   ∈  [1,3], D 1   ∈  [20,100], D 2   ∈  [10,30], ɑ 1   ∈  [0.1,1] and ɑ 2   ∈  [0.1,1]. The number of iterations for EEFO is set to 500, and the population size is set to 500.

As shown in Fig.  6 b, as the iterative calculations progress, the overall trend of the number of obtained solutions shows an increase. From the trend of the solution quantity, it can be observed that at the beginning of the calculation, particles can quickly approach the targets and obtain a significant number of solutions within a relatively short number of iterations. As the iterations proceed, the particles gradually converge near the optimal solution until they finally converge to a specific region, and the number of solutions in the set no longer increases. This indicates that the population and the number of iterations achieve convergence in the calculation. At the same time, a decrease in the number of solutions in the set can be observed. This is because the solutions generated in this iteration dominate one or more solutions in the existing solution set, requiring the removal of dominated solutions. Therefore, a decrease in the total number of solutions in the iteration calculation is expected.

Fig. 6. (a) Solutions distribution; (b) changes in the number of solutions.

After the iterative calculations, all non-dominated solutions generated are stored, and completely identical non-dominated solutions are removed. This process results in the Pareto frontier (Fig.  6 a), plotted as a two-dimensional scatter plot. From the plot, it can be seen that the obtained solution set does not have any dominance relationships among them and is uniformly distributed along the Pareto frontier. The flow rate is positively correlated with the response time, meaning that as the flow rate increases, the system’s response time also increases. This is because obtaining a larger flow rate requires an increase in the valve spool lift and diameter, which leads to a longer valve action time and increased inertia.

Directly obtaining usable design parameters from the solutions produced by the optimization algorithm is still not possible. Here, an Analytical Hierarchy Process (AHP) method is introduced. AHP is a simple method for making decisions on complex and ambiguous problems, especially those that are difficult to analyze quantitatively. It was proposed by Professor T. L. Saaty, an American operations researcher, in the early 1970s as a convenient, flexible, and practical multi-criteria decision-making method 34 . When determining the weights of factors that influence a specific criterion, these weights are often difficult to quantify. When there are many factors, decision-makers may provide inconsistent data that does not reflect their perceived importance because of incomplete consideration. The pairwise comparison method can establish pairwise comparison judgment matrices for the factors. The comparison judgment matrix for the TSV is shown in Table 2 . In the analysis, two additional design parameters, D 1 and D 2 , are included and given relatively low weights, since we hope the TSV can have a smaller volume and weight while meeting the optimized performance criteria.

The values in Table 2 indicate the relative importance of the parameter on the abscissa compared to the parameter on the ordinate: a larger value implies greater significance of the abscissa parameter relative to the ordinate parameter, and the values on the diagonal are therefore all 1. In this study, the valve's response time ( T ) is considered the most important performance indicator. Thus, the importance of T is three times that of the flow rate ( Q v ), five times that of the diameter of the main valve spool ( D 1 ), and seven times that of the diameter of the secondary valve spool ( D 2 ). Flow rate ( Q v ) is the second most important performance indicator. The influence of the diameter of the secondary valve spool ( D 2 ) on the valve volume is smaller than that of the main valve spool ( D 1 ); hence, it is given a weight lower than that of the main valve spool.

First, the design parameters of each Pareto solution are normalized; the normalized values are then multiplied by the weights obtained from the AHP and summed to give a composite score. The results are shown in Fig.  7 . Taking the machining accuracy and the AHP results into account, the final values of the design parameters are then determined.

Figure 7. AHP overall weight percent for solutions on the Pareto front.
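The selection step summarized in Fig. 7 can be sketched as follows: each column of the Pareto set is min-max normalized, weighted by the AHP priorities, and summed, and the candidate with the best composite score is carried forward. The numerical values and the "smaller is better" or "larger is better" conventions below are illustrative assumptions, not the paper's data.

```python
import numpy as np

# Columns: response time T [ms], flow Qv [L/min], D1 [mm], D2 [mm].
# Rows: a few made-up Pareto solutions, for illustration only.
pareto = np.array([
    [18.0, 180.0, 22.0, 10.0],
    [21.0, 235.0, 25.0, 12.0],
    [25.0, 250.0, 28.0, 14.0],
])

weights = np.array([0.56, 0.26, 0.12, 0.06])     # e.g. AHP output, T most important
benefit = np.array([False, True, False, False])  # only flow is "larger is better"

# Min-max normalization, flipping the sense for cost-type criteria.
lo, hi = pareto.min(axis=0), pareto.max(axis=0)
norm = (pareto - lo) / (hi - lo)
norm[:, ~benefit] = 1.0 - norm[:, ~benefit]

scores = norm @ weights                 # weighted sum per candidate solution
best = int(np.argmax(scores))
print("composite scores:", np.round(scores, 3))
print("selected design parameters:", pareto[best])
```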

The optimized values of each design parameter are shown in Table 3 .

Simulations and experiments

Response time

To validate the performance of the proposed TSV, a prototype of the valve was manufactured using the parameters obtained from the EEFO-based optimization, and an experimental setup was constructed, as shown in Fig.  8 .

Figure 8. ( a ) Secondary stage and main spool; ( b ) TSV structure.

To test the response time of the HSV, the experimental setup shown in Fig.  9 was used. A 200 ms step signal at 12 V DC was applied as the input to the HSV, and the switching time was determined from the change in coil current. The experimental results are shown in Fig.  9 . When the step control signal was applied, the inductance of the electromagnetic coil impeded the rise of the coil current; while the current was still too small to overcome the spring preload, it increased without any spool movement. This corresponds to the interval from 50 ms to point T A in Fig.  9 , during which the HSV had no flow output. In the second phase, the armature and spool began to move, and the resulting cutting of the magnetic flux lines induced a counter-electromotive force that caused the driving current to dip; this corresponds to the segment from T A to T B in Fig.  9 . Once the armature reached its stop and ceased moving, the driving current rose again to its maximum value, corresponding to the portion from point T B until the current began to fall at switch-off. The closing process of the solenoid valve can likewise be divided into two parts. After the control voltage is reduced to zero, the combined effect of coil inductance, eddy currents, and residual magnetism in the armature keeps the armature attracted for a certain period, corresponding to the segment from 150 ms to T A′ in Fig.  9 . When the current has decayed to the point where the armature can no longer be held, the return spring pushes the armature out until it reaches the maximum air gap, corresponding to the segment from T A′ to T B′. The opening time of the ball valve is approximately 8 ms and the closing time approximately 9 ms.

Figure 9. Experiment result of dynamic performance.
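The opening and closing times quoted above are read from the dip that the back-EMF produces in the coil current while the armature is moving. A rough way to automate that reading is sketched below: the current trace is smoothed and the interval after the step during which its derivative is negative is taken as the armature travel interval. The signal names and the synthetic trace are assumptions for illustration, not the authors' measurement procedure.

```python
import numpy as np

def motion_interval(t, i_coil, t_step, window=5):
    """Estimate (T_A, T_B): the interval after t_step during which the smoothed
    coil current falls, i.e. the back-EMF dip caused by armature motion."""
    kernel = np.ones(window) / window
    i_smooth = np.convolve(i_coil, kernel, mode="same")   # simple moving average
    di = np.gradient(i_smooth, t)

    after = t > t_step
    falling = after & (di < 0)
    if not falling.any():
        return None
    t_a = t[falling][0]                                    # current starts to drop
    rising_again = after & (t > t_a) & (di >= 0)
    t_b = t[rising_again][0] if rising_again.any() else t[-1]
    return t_a, t_b

# Synthetic example: a step at t = 50 ms with a back-EMF-like dip a few ms later.
t = np.linspace(0, 0.2, 2001)
i = np.clip((t - 0.05) * 40, 0, 1.5)
i -= 0.4 * np.exp(-((t - 0.0555) / 0.002) ** 2)           # fake back-EMF dip
t_a, t_b = motion_interval(t, i, t_step=0.05)
print(f"armature moves between {t_a*1000:.1f} ms and {t_b*1000:.1f} ms")
```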

To test the flow capacity of the TSV, the test setup shown in Fig.  10 was designed. The output pressure of the hydraulic pump was set to 31.5 MPa; after passing through the TSV, the fluid was routed to a flow meter, which supplied flow signals to a data acquisition device. Downstream of the flow transmitter, the fluid entered a relief valve that simulated the load and then returned to the tank. The TSV was driven by a step signal from a signal generator, and an oscilloscope recorded the current output of the signal generator. A 1 s step signal at 12 V DC was applied to the TSV system and the flow data were collected. The experimental components are listed in Table 4 .

Figure 10. Simulation model and experimental setup for dynamic performance.

Because the valve spool position could not be measured directly, an AMESim 2021.1 dynamic simulation model was developed, as shown in Fig.  11 ; the total flow output of the TSV is also shown in Fig.  11 . In the simulation, the maximum flow rate was 235 L/min at a pressure drop of 5 MPa. The experiments gave an actual output flow rate of approximately 225 L/min, slightly lower than the simulation. One possible reason for the reduced output flow is that the clearance between the poppet valve and the valve sleeve in the manufactured prototype is larger than the value set in the simulation, which increases the flow into the pressure control chamber and reduces the spool lift. The spool position curves show that the maximum lift of the main valve is approximately 0.55 mm and the opening of the secondary stage is around 0.21 mm; the HSV output is expressed as flow, with a maximum of approximately 3.8 L/min. The trend of the flow follows the motion of the solenoid. When the main valve opens, the start-up delay and opening time are 6 ms and 15 ms, respectively, so the main valve can fully open within approximately 21 ms. This is consistent with the response time obtained from the flow measurements, indicating that the rapid response of the main valve is crucial for the total flow rate.

Figure 11. Experiment and simulation result of dynamic performance. ( a ) Flow rate; ( b ) spool displacement and flow rate of the HSV.

Proportional characteristic

Coal mining machinery requires control valves that continuously regulate pressure, flow, and direction. Proportional output of the HSV can be achieved by adjusting the duty cycle of the PWM signal, and proportional control of the TSV can in turn be realized by exploiting this proportional characteristic of the HSV. A 2 kHz PWM signal is applied to the system, and adjusting its duty cycle controls the flow output of the HSV and thereby the lift of the main spool. Because the HSV has a smaller dead zone than the secondary and main stages, the stages of the TSV can be controlled separately by exploiting the dead-zone characteristics of each stage.

Figure  12 a compares the simulated and experimental flow outputs of the TSV as the duty cycle of the system input signal increases. Both exhibit similar trends: in simulation the flow begins to increase sharply at a duty cycle of approximately 0.62, whereas in the experiment this increase starts at around 0.7. Beyond a duty cycle of 0.75, the simulated and experimental flow outputs align closely, consistent with the results shown in Fig.  11 .

Figure 12. Experiment and simulation result of proportional characteristic. ( a ) Total flow rate; ( b ) spool displacement.

As shown in Fig.  12 b, the simulated spool displacements indicate that when the PWM duty cycle is less than 0.56, only the HSV outputs flow, with a maximum flow rate of approximately 3 L/min. As the duty cycle increases further, the secondary stage starts to produce flow, with a maximum of roughly 16 L/min. When the duty cycle reaches 0.62, all stages output flow, and the system's output flow rate increases almost linearly with the duty cycle.
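The staged behaviour just described, pilot flow only below a duty cycle of about 0.56, secondary-stage flow up to about 0.62, and a roughly linear rise once all stages open, can be captured by a simple piecewise map. The breakpoints and flow levels below are taken from the trends quoted in the text; the linear interpolation between them is an assumption for illustration.

```python
def simulated_flow(duty):
    """Rough piecewise model of simulated TSV output flow [L/min] versus PWM duty
    cycle, using the breakpoints quoted for Fig. 12b. The interpolation between
    breakpoints is an illustrative assumption, not the paper's model."""
    if duty <= 0.56:                       # only the pilot HSV delivers flow
        return 3.0 * duty / 0.56
    if duty <= 0.62:                       # secondary stage starts to open
        return 3.0 + (16.0 - 3.0) * (duty - 0.56) / (0.62 - 0.56)
    # all stages open: roughly linear up to the ~235 L/min simulated maximum
    return 16.0 + (235.0 - 16.0) * (duty - 0.62) / (1.0 - 0.62)

for d in (0.3, 0.56, 0.6, 0.62, 0.8, 1.0):
    print(f"duty {d:.2f} -> {simulated_flow(d):6.1f} L/min")
```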

Comparing Fig.  12 a and b, the difference in output flow is likely caused by factors that hinder the opening of the main valve spool in the experiments, where the flow only begins to rise at a duty cycle of about 0.7. These factors may include friction between the valve spool and sleeve, fluctuations in the supply pressure, and variations in component machining precision.

Cylinder control

The performance of the directional valve ultimately hinges on the positioning precision of the controlled hydraulic cylinder. To validate its control accuracy, an experimental setup was constructed, as depicted in Fig.  13 a. Four sets of TSVs were used to govern the extension and retraction of the hydraulic cylinder, with a displacement sensor installed inside the cylinder to record displacement data. Figure  13 b shows a hydraulic cylinder positioning test bench controlled by a 3/4 proportional valve, used as a benchmark for the TSV.

Figure 13. Experimental setup for cylinder control performance.

Open-loop control experiments were conducted on the position control system to verify the output characteristics of the TSV. As shown in Fig.  14 , the speed and position of the hydraulic cylinder extension were controlled by adjusting the duty cycle of the input PWM signal. The adjustable speed range of the system exhibits three regions: when the duty cycle is less than 0.56, the extension speed is proportional to the duty cycle; between 0.56 and 0.62 the speed rises more steeply, reaching its maximum at 0.62; beyond 0.62 the speed does not increase further. The displacement curve of the hydraulic cylinder shows that the TSV allows a rapid approach to the target and more precise positioning by reducing the duty cycle as the target is approached.

Figure 14. Experiment result of cylinder control.

The experimental results for hydraulic cylinder displacement tracking under TSV control are shown in Fig.  15 . A trapezoidal displacement command was applied, and the hydraulic cylinder extended and retracted following the command. According to the tracking error curve, the positioning accuracy of the hydraulic cylinder controlled by the TSV is ±0.15 mm, whereas the displacement tracking accuracy is ±3 mm; the positioning accuracy is therefore higher than the tracking accuracy. The positioning error curve fluctuates continuously, and the higher speed during displacement tracking increases the fluctuation of the error. Because the TSV can only control flow by switching repeatedly, the error rises and falls repeatedly. The error also fluctuates during positioning, but with a smaller amplitude; this may be due to minor leakage in the hydraulic cylinder or deviations introduced by the displacement sensor during signal transmission, which affect the valve opening.

Figure 15. Experiment result of the displacement tracking control.

To assess the control performance of the TSV, PI control was applied using a sinusoidal position command with control parameters k p  = 8 and k i  = 1. As a performance benchmark, the 3/4 proportional valve control system (depicted in Fig.  13 b) was used, with the TSV system operating on an emulsion and the proportional valve control system on hydraulic oil.

Figure 16. Experiment result of cylinder PI control.

The experimental results are illustrated in Fig.  16 , where (a) shows the actuator displacement of the proportional valve control system, (b) depicts the actuator displacement error of the proportional system, (c) displays the actuator displacement of the TSV control system, and (d) presents the actuator displacement error of the TSV control system. Due to the PI controller utilized, both systems exhibit some lag error in position tracking. The error profiles of the two control systems are similar, with the TSV system showing smoother error curves and a maximum error magnitude smaller than that of the proportional valve control system.
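The PI position loop used in these tests can be sketched as below. The gains kp = 8 and ki = 1 come from the text; the plant model (a valve command integrated into cylinder displacement at an assumed maximum speed), the sample time, and the sinusoidal command amplitude are illustrative assumptions rather than the authors' controller code.

```python
import math

kp, ki = 8.0, 1.0            # PI gains quoted in the text
dt = 0.001                   # assumed 1 ms control period
v_max = 54.0                 # assumed cylinder speed [mm/s] at full valve command

integral = 0.0
y = 0.0                      # cylinder displacement [mm]
last = (0.0, 0.0, 0.0)
for k in range(20000):       # simulate 20 s
    t = k * dt
    r = 25.0 * math.sin(2 * math.pi * 0.1 * t)   # sinusoidal position command [mm]
    e = r - y
    integral += e * dt
    u = kp * e + ki * integral                   # PI control effort (valve command)
    u = max(-1.0, min(1.0, u))                   # saturate the duty-cycle-like command
    y += v_max * u * dt                          # integrate the assumed cylinder speed
    last = (t, r, y)

print("final tracking error: %.3f mm" % (last[1] - last[2]))
```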

Driven power

The entire TSV system is governed by the pilot HSV. Power control experiments were conducted separately on the pilot HSV, and the results are shown in Fig.  17 . Panel (a) shows the current in the HSV coil, with the control signal applied at 1 s. A dual-duty-cycle control method was used to minimize control power consumption: the duty cycle was set to 1 at T  = 1 s and reduced to 0.7 after 0.1 s. Panel (b) shows the flow rate of the pilot HSV, which reaches approximately 2.8 L/min. With a 12 V PWM signal, the TSV can therefore be driven with less than 1.2 W.

Figure 17. System control power test.
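The sub-1.2 W figure follows from the supply voltage, the holding duty cycle, and the mean coil current. The quick check below uses an assumed mean current consistent with a 12 V supply, since the measured current trace itself is only shown in Fig. 17a.

```python
# Back-of-the-envelope check of the holding-phase drive power.
# The 12 V supply and 0.7 holding duty cycle are from the text;
# the mean coil current is an assumed value for illustration only.
V_supply = 12.0        # V, PWM amplitude
duty_hold = 0.7        # holding duty cycle after the 0.1 s boost phase
i_mean = 0.14          # A, assumed mean coil current during the holding phase

p_hold = V_supply * duty_hold * i_mean
print(f"approximate holding power: {p_hold:.2f} W")   # about 1.2 W for these numbers
```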

Conclusions

A novel low-power-driven three-stage poppet valve with an internal feedback structure is proposed in this study based on the displacement feedback principle. The structure utilizes a high-speed on/off valve as the pilot stage, enabling rapid response in high-pressure and water-based environments. It effectively avoids valve leakage and sensitivity to medium contamination, making it a potential replacement for traditional proportional directional valves in coal mining equipment.

Static analysis of the TSV structure was conducted to determine the lift of each stage under steady-state conditions. The state-space equations between the spools and the pressure control chamber system were established, and the step response was analyzed. A MOO design of critical parameters for the secondary and main stages was performed using the electric eel foraging algorithm. The optimization results showed that the solutions obtained by the optimization algorithm could approximate the Pareto front. The design parameters were selected using the analytic hierarchy process.

An experimental setup for measuring the flow characteristics of the TSV was constructed, and a prototype manufactured to the optimized design parameters was tested. The results showed that the output flow of the novel TSV was slightly smaller, and its response time somewhat longer, than the theoretical calculations. Proportional control of the TSV was achieved by adjusting the duty cycle of the PWM control signal. Hydraulic cylinder control experiments achieved a positioning accuracy of ±0.15 mm, and the TSV system showed smoother error curves and a smaller maximum error magnitude than the proportional valve control system.

Data availability

All data generated or analysed during this study are included in this published article.

Abbreviations

Control chamber section area (A_ci = π(R_u)²)

Throttle groove flow area (A_ti = W_ti(x_i + x_0i))

Flow area at the operating point

Flow coefficient

Throttle grooves flow coefficient

Damping rate

Medium bulk modulus

Friction between the spool and valve sleeve

Subscript i: 1 for the main valve, 2 for the secondary stage, 3 for the HSV

Part “ i ” flow pressure gain

Part “ i ” flow displacement gain

Main valve moving parts mass

Return spring stiffness

Supply pressure

Load pressure

Pressure control chamber pressure

Inlet pressure at the operating point

Outlet pressure at operating point

Inlet pressure increment at the operating point

Outlet pressure increment at the operating point

Flow rate increment at the operating point

Spool displacement

Throttle grooves pre-opening

Displacement increment at the operating point

Area ratio of the upper and lower end faces of the spool (α_i = R_L/R_u)

Medium density

Trechera, P. et al. Comprehensive evaluation of potential coal mine dust emissions in an open-pit coal mine in Northwest China. Int. J. Coal Geol. 235 , 103677 (2021).


Dai, J., Tang, J., Huang, S. & Wang, Y. Signal-based intelligent hydraulic fault diagnosis methods: Review and prospects. Chin. J. Mech. Eng. 32 (1), 75 (2019).

Ng, F., Harding, J. A. & Glass, J. Improving hydraulic excavator performance through in line hydraulic oil contamination monitoring. Mech. Syst. Signal Process. 83 , 176–193 (2017).


Shi, J. C., Ren, Y., Tang, H. S. & Xiang, J. W. Hydraulic directional valve fault diagnosis using a weighted adaptive fusion of multi-dimensional features of a multi-sensor. J. Zhejiang Univ.-Sci. A 23 (4), 257–271 (2022).

Kiama, N. & Ponchio, C. Photoelectrocatalytic reactor improvement towards oil-in-water emulsion degradation. J. Environ. Manag. 279 , 111568 (2021).

Li, D., Ma, X., Wang, S., Lu, Y. & Liu, Y. Failure analysis on the loose closure of the slipper ball-socket pair in a water hydraulic axial piston pump. Eng. Fail. Anal. 155 , 107718 (2024).

Fischer, F., Bady, D. & Schmitz, K. Leakage of metallic ball seat valves with anisotropic surfaces. Chem. Eng. Technol. 46 (1), 102–109 (2023).

Jing, L., Xi, Z., & Zhi, L. The experimental study of mine intrinsically safe electromagnet. In 2015 Joint International Mechanical, Electronic and Information Technology Conference (JIMET-15) , 967–970 (Atlantis Press, 2015).

Zhong, Q. et al. Dynamic performance and control accuracy of a novel proportional valve with a switching technology-controlled pilot stage. J. Zhejiang Univ.-Sci. A 23 (4), 272–285 (2022).

Qian, J. Y., Gao, Z. X., Wang, J. K. & Jin, Z. J. Experimental and numerical analysis of spring stiffness on flow and valve spool movement in pilot control globe valve. Int. J. Hydrogen Energy 42 (27), 17192–17201 (2017).

Liao, Y., Lian, Z., Feng, J., Yuan, H. & Zhao, R. Effects of multiple factors on water hammer induced by a large flow directional valve. Strojniski Vestnik/J. Mech. Eng. 64 (5), 329–338 (2018).


Wang, H., Chen, Z., Huang, J., Quan, L. & Zhao, B. Development of high-speed on–off valves and their applications. Chin. J. Mech. Eng. 35 (1), 67 (2022).

Qiu, H. & Su, Q. Simulation research of hydraulic stepper drive technology based on high speed on/off valves and miniature plunger cylinders. Micromachines 12 (4), 438 (2021).


Elsaed, E., Abdelaziz, M. & Mahmoud, N. A. Using a neural network to minimize pressure spikes for binary-coded digital flow control units. Int. J. Fluid Power 20 , 323–352 (2019).

Yuan, X. et al. Dynamic modeling method for an electro-hydraulic proportional valve coupled mechanical–electrical-electromagnetic-fluid subsystems. J. Magn. Magn. Mater. 587 , 171312 (2023).

Chen, Z., Ge, S., Jiang, Y., Cheng, W. & Zhu, Y. Refined modeling and experimental verification of a torque motor for an electro-hydraulic servo valve. Chin. J. Aeronaut. 36 , 302–317 (2023).

Wu, S., Zhao, X., Li, C., Jiao, Z. & Qu, F. Multiobjective optimization of a hollow plunger type solenoid for high speed on/off valve. IEEE Trans. Ind. Electron. 65 (4), 3115–3124 (2017).

Tamburrano, P. et al. Full simulation of a piezoelectric double nozzle flapper pilot valve coupled with a main stage spool valve. Energy Procedia 148 , 487–494 (2018).

Ling, M., He, X., Wu, M. & Cao, L. Dynamic design of a novel high-speed piezoelectric flow control valve based on compliant mechanism. IEEE/ASME Trans. Mechatron. 27 (6), 4942–4950 (2022).

Zhao, R., Liao, Y., Lian, Z., Li, R. & Guo, Y. Research on the performance of a novel electro-hydraulic proportional directional valve with position-feedback groove. Proc. Inst. Mech. Eng. Part E J. Process Mech. Eng. 235 (6), 1930–1944 (2021).

Madsen, E. L., Jørgensen, J. M., Nørgård, C., & Bech, M. M. Design optimization of moving magnet actuated valves for digital displacement machines. In Fluid Power Systems Technology , Vol. 58332, V001T01A026 (American Society of Mechanical Engineers, 2017).

Noergaard, C., Bech, M. M., Roemer, D. B., & Pedersen, H. C. Optimization of moving coil actuators for digital displacement machines. In Proceedings of the 8th Workshop on Digital Fluid Power (DFP16) , 39–54 (Tampere University of Technology, 2016).

Xu, B., Ding, R., Zhang, J. & Su, Q. Modeling and dynamic characteristics analysis on a three-stage fast-response and large-flow directional valve. Energy Convers. Manag. 79 , 187–199 (2014).

Zhao, L., Wu, H., Zhao, L., Long, Y., Li, G. & Tang, S. Optimization of the high-speed on-off valve of an automatic transmission. In IOP Conference Series: Materials Science and Engineering , Vol. 339, No. 1, 012035 (IOP Publishing, 2018).

Qingtong, L., Fanglong, Y., Songlin, N., Ruidong, H. & Hui, J. Multi-objective optimization of high-speed on-off valve based on surrogate model for water hydraulic manipulators. Fusion Eng. Des. 173 , 112949 (2021).

Liu, P., Fan, L., Xu, D., Ma, X., & Song, E. Multi-objective optimization of high-speed solenoid valve based on response surface and genetic algorithm (No. 2015-01-1350). SAE Technical Paper (2015).

António, C. C. Memeplex-based memetic algorithm for the multi-objective optimal design of composite structures. Compos. Struct. 329 , 117789 (2023).

Zhong, C., Li, G. & Meng, Z. Beluga whale optimization: A novel nature-inspired metaheuristic algorithm. Knowl.-Based Syst. 251 , 109215 (2022).

Zhao, W. et al. Electric eel foraging optimization: A new bio-inspired optimizer for engineering applications. Expert Syst. Appl. 238 , 122200 (2024).

Andersson, B. R. On the Valvistor, a proportionally controlled seat valve. (Linkoping Studies in Science and Technology, Dissertations, 1984).

Wang, H., Wang, X., Huang, J. & Quan, L. Flow control for a two-stage proportional valve with hydraulic position feedback. Chin. J. Mech. Eng. 33 (1), 1–13 (2020).

Park, S. H. Development of a proportional poppet-type water hydraulic valve. Proc. Inst. Mech. Eng. C J. Mech. Eng. Sci. 223 (9), 2099–2107 (2009).

Tavakkoli-Moghaddam, R. et al. Multi-objective boxing match algorithm for multi-objective optimization problems. Expert Syst. Appl. 239 , 122394 (2024).

Saaty, T. L. Decision making with the analytic hierarchy process. Int. J. Serv. Sci. 1 (1), 83–98 (2008).


Author information

Authors and Affiliations

School of Mechanical and Electrical Engineering, China University of Mining & Technology, Beijing, 100083, China

Aixiang Ma, Heruizhi Xiao, Yue Hao, Xihao Yan & Sihai Zhao


Contributions

Aixiang Ma and Heruizhi Xiao authored the main manuscript text, Yue Hao prepared Figs. 1–14, Xihao Yan revised the manuscript format, and Sihai Zhao performed content review. All authors have reviewed the manuscript.

Corresponding author

Correspondence to Sihai Zhao .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .


About this article

Cite this article.

Ma, A., Xiao, H., Hao, Y. et al. Multi-objective optimization design of low-power-driven, large-flux, and fast-response three-stage valve. Sci Rep 14 , 21575 (2024). https://doi.org/10.1038/s41598-024-70353-2


Received : 06 May 2024

Accepted : 14 August 2024

Published : 16 September 2024

DOI : https://doi.org/10.1038/s41598-024-70353-2


Keywords

  • Control valve
  • Engineering design
  • Hydraulic systems
  • High speed on/off valve




Experimental Study on the Behavior of Single Piles Under Combined Torsional and Vertical Loads in Contaminated Sand

  • Original Paper
  • Published: 16 September 2024



  • Nada Osama Ramadan 1 ,
  • Ahmed Mohamed Nasr 2 &
  • Waseim Ragab Azzam 2  

Contaminated soil can reduce the stability of structures and infrastructure, endangering their structural integrity. This study therefore examines how oil pollution influences the torsional behavior of model steel piles at varied soil densities, which is critical for assessing the structural integrity and stability of piles in oil-contaminated conditions. Mixtures of heavy motor oil and clean sand were prepared in proportions ranging from 0 to 8% of the dry weight of the soil. The relative density (Dr), pile slenderness ratio (Lp/Dp), oil concentration (O.C%), and contaminated sand layer thickness (LC) were all varied, and piles under combined (vertical and torsional) loading were also examined. The results revealed that a pre-applied torsional force reduces the pile's vertical bearing capacity. Furthermore, at Dr = 30%, the maximum vertical load under combined loading at constant torsional loads of 1/3 Tu, 2/3 Tu, and Tu, for (Lc/Lp) = 1 and (Lp/Dp) = 13.3, was 1.67%, 3.4%, and 5% lower, respectively, than for piles under pure vertical load. This highlights the importance of considering torsional forces in pile design to guarantee accurate load-bearing capacity. Engineers should carefully assess both vertical and torsional loads to optimize the performance and stability of piles in various conditions.



Data Availability

Not applicable.


The authors did not receive support from any organization for the submitted work; no funds, grants, or other support was received for conducting this study or preparing this manuscript.

Author information

Authors and Affiliations

Structural Engineering Department, Faculty of Engineering, Tanta University, Tanta, Egypt

Nada Osama Ramadan

Faculty of Engineering, Tanta University, Tanta, Egypt

Ahmed Mohamed Nasr & Waseim Ragab Azzam


Corresponding author

Correspondence to Nada Osama Ramadan .

Ethics declarations

Competing interests.

The authors have no competing interests to declare that are relevant to the content of this article.

Ethical Approval and Consent to Participate

This research did not involve any studies with animal or human participants, nor did it take place in any private or protected areas, so no specific permissions were required. All authors consent to participate.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Ramadan, N.O., Nasr, A.M. & Azzam, W.R. Experimental Study on the Behavior of Single Piles Under Combined Torsional and Vertical Loads in Contaminated Sand. Geotech Geol Eng (2024). https://doi.org/10.1007/s10706-024-02921-2


Received : 29 May 2024

Accepted : 07 August 2024

Published : 16 September 2024

DOI : https://doi.org/10.1007/s10706-024-02921-2


Keywords

  • Torsion-vertical load
  • Oil-contaminated sand
  • Twist angle


COMMENTS

  1. Guide to Experimental Design

    Table of contents. Step 1: Define your variables. Step 2: Write your hypothesis. Step 3: Design your experimental treatments. Step 4: Assign your subjects to treatment groups. Step 5: Measure your dependent variable. Other interesting articles. Frequently asked questions about experiments.

  2. Training doctoral students in critical thinking and experimental design

    Additional measures to assess the course goals in improving critical thinking, experimental design and self-efficacy in experimental design will be implemented using validated tests [22, 43,44,45]. Further studies are also required to determine the long-term impact of this training on student performance in the laboratory and progression ...

  3. Relationship between creative thinking and experimental design thinking

    Creative thinking shares several attributes with experimental design thinking which is similar with scientific process skills comprised by comparing, classifying, inferring and predicting, hypothesizing, defining, and controlling variables etc. (Dikici et al., 2018). Experimental design thinking and creative thinking may be related.

  4. Mapping the Relationship Between Critical Thinking and Design Thinking

    Critical thinking has been a longstanding goal of education, while design thinking has gradually emerged as a popular method for supporting entrepreneurship, innovation, and problem solving in modern business. While some scholars have posited that design thinking may support critical thinking, empirical research examining the relationship between these two modes of thinking is lacking because ...

  5. Evaluating Experimental Research (Chapter 11)

    Summary. Experiments allow researchers to determine cause and effect relations between variables. As such, they are a critical component in the advance of scientific psychology. In this chapter, we discuss the theory behind the design of good experiments, and provide a sample study for evaluation. We outline three important types of replication ...

  6. Part 1: Introduction to Experimental Design

    1.01 Identify and create questions and hypotheses that can be answered through scientific investigations. 1.02 Develop appropriate experimental procedures for: Given questions. Student generated questions. 1.04 Analyze variables in scientific investigations: Identify dependent and independent. Use of a control.

  7. How Students Think about Experimental Design: Novel Conceptions

    Experimental design is a fundamental skill, essential for achieving success in science (Coil et al. 2010) and gaining fluency in scientific literacy and critical thinking in general (Brewer and Smith 2011).However, explicit instruction and practice in experimental design is often lacking in introductory biology lecture courses because of perceived time pressures, large class sizes, and the ...

  8. Relationship between creative thinking and experimental design thinking

    Experimental design thinking and creative thinking have been increasingly recognized as the crucial basic thinking to promote scientific and technological innovation. ... whereas divergers are likely to be arts specialists (Hudson, 1966, 1968). Critical thinking is considered crucial factor to facilitating problem solving in design fields such ...

  9. CREATE Cornerstone: Introduction to Scientific Thinking, a New Course

    Critical-Thinking and Experimental Design Skills—Tools of Science. A significant number of students show interest in science in high school or before (often significantly before [Gopnik, 2012]), but do not pursue STEM studies at the tertiary level.

  10. An introduction to different types of study design

    Prospective: we follow the individuals in the future to know who will develop the disease. Retrospective: we look to the past to know who developed the disease (e.g. using medical records) This design is the strongest among the observational studies. For example - to find out the relative risk of developing chronic obstructive pulmonary ...

  11. What influences students' abilities to critically evaluate scientific

    Critical thinking is the process by which people make decisions about what to trust and what to do. Many undergraduate courses, such as those in biology and physics, include critical thinking as an important learning goal. ... To help students discern information about experimental design, we suggest that instructors consider providing them ...

  12. PDF Training Doctoral Students in Critical Thinking and Experimental Design

    Keywords: Graduate, Training, Critical Thinking, Experimental Design, Problem-Based Learning Introduction For over a decade there have been calls to reform biomedical graduate education. There are two main problems that led to these recommendations and therefore two different prescriptions to solve these problems.

  13. Experimental design

    Experiments in science are creative, iterative, & source critical thinking. We naturally experiment in art, science, and life. Here, we hone these skills through principles and practice. The principles are here, and the practice is in the form a lab manual entitled Designcraft for experiments.

  14. 8.1 Experimental design: What is it and when should it be used?

    Experimental group- the group in an experiment that receives the intervention; Posttest- a measurement taken after the intervention; Posttest-only control group design- a type of experimental design that uses random assignment, and an experimental and control group, but does not use a pretest; Pretest- a measurement taken prior to the intervention

  15. What influences students' abilities to critically evaluate ...

    Critical thinking is the process by which people make decisions about what to trust and what to do. Many undergraduate courses, such as those in biology and physics, include critical thinking as an important learning goal. ... To help students discern information about experimental design, we suggest that instructors consider providing them ...

  16. Design Thinking: Experimental Evidence of Ideation Strategies That

    Organizations and universities use Design Thinking (DT) to facilitate team innovation. However, few empirical DT studies have quantified it. Across two experiments, each based on semester-long DT projects to generate innovative solutions to sustainability problems, several different DT strategies were compared.

  17. Full article: Cultivating Critical Thinking Skills: a Pedagogical Study

    Second, our quasi-experimental design results provide evidence that the ICM has positively impacted students' critical thinking skills. Results have shown improvement in all critical thinking dimensions researched in this study: presenting evidence, explaining issues, articulating influence of context and assumptions, and systematically ...

  18. Evaluating Experimental Research (Chapter 2)

    Critical Thinking in Psychology - September 2006. The author of this quote is Edwin G. Boring (1886-1968), one of the great psychologists of the 20th century and author of A History of Experimental Psychology (1929; the quote comes from p. 659). Contemporary psychologists take "the psychology experiment" as a given, but it is actually a relatively recent cultural invention.

  19. Designing Learning Environments for Critical Thinking: Examining

    Fostering the development of students' critical thinking (CT) is regarded as an essential outcome of higher education. However, despite the large body of research on this topic, there has been little consensus on how educators best support the development of CT. ... This study employed a quasi-experimental design involving 147 first-year ...

  20. Training doctoral students in critical thinking and experimental design

    Pre- and post-tests assessing students' proficiency in experimental design were used to measure student learning. Results: The analysis of the outcomes of the course suggests the training is effective in improving experimental design. The course was well received by the students as measured by student evaluations (Kirkpatrick Model Level 1).

  21. Enhancing students' critical thinking and creative thinking: An

    A quasi-experimental design was adopted to explore the effects of the proposed learning approach on students' performance in art appreciation, digital painting creation, creative thinking tendency, and critical thinking awareness. A total of 48 students from two classes in a university in central Taiwan were recruited to participate in this ...

  22. How to teach critical thinking: an experimental study with three

    The aim of this study was to examine the effects of critical thinking (CT) teaching involving general, immersion, and mixed approaches on the CT skills and dispositions of high-school students. The study, which had three experimental groups (EG) and one control group, employed a pretest-posttest control-group quasi-experimental design.

  23. Teaching critical thinking about health information and choices in

    Conclusion Teaching critical thinking about health is possible within the current Kenyan lower secondary school curriculum, but the learning resources will need to be designed for inclusion in and ...

  24. Training doctoral students in critical thinking and experimental design

    Additional measures to assess the course goals in improving critical thinking, experimental design and self-efficacy in experimental design will be implemented using validated tests [22, 43-45]. Further studies are also required to determine the long-term impact of this training on student performance in the laboratory and progression towards ...

  25. Supporting the evaluation of authentic assessment in environmental

    2.2. Evaluate and redesign project-based learning (PBL) course. We used the eight "critical questions" proposed by Ashford-Rowe et al. (Citation 2014) to assess the existing syllabus and redesign the authentic assessment.Although the course comprised most of the critical elements, several elements were added (Table 1): First, we provided more structure to the project by including a mid ...

  26. Multi-objective optimization design of low-power-driven, large-flux

    A prototype was manufactured based on the obtained parameters and subjected to simulation and experimental verification. The results demonstrate that the valve has an opening time of 21 ms, a ...

  27. How to teach critical thinking: an experimental study with three

    The aim of this study was to examine the effects of critical thinking (CT) teaching involving general, immersion, and mixed approaches on the CT skills and dispositions of high-school students. The study, which had three experimental groups (EG) and one control group, employed a pretest-posttest control-group quasi-experimental design. CT teaching was initiated with a general approach in EG ...

  28. Experimental Study on the Behavior of Single Piles Under Combined

    Contaminated soil can reduce the stability of structures and infrastructure, endangering their structural integrity. Hence, this study tries to determine how oil pollution influences the torsion behavior of model steel piles at varied soil densities. This study is critical for determining piles' structural integrity and stability in oil-contaminated situations. A mixture of heavy motor oil and ...