4 DOE Case Studies

Peter Goos presents four case studies from the Optimal Design of Experiments book he co-authored with Bradley Jones. He introduces each case by describing the type of experiment and when and why it is useful. Next, he demonstrates how to set up the experiment in the JMP software and how to analyze the resulting data.

Presentation material taken from Optimal Design of Experiments: A Case Study Approach by P. Goos and B. Jones.

CASE STUDIES

Yield Maximization Experiment: A Response Surface Design in an Irregularly Shaped Factor Region

The presenter designs an experiment for investigating a chemical reaction using a full cubic model in two factors. Because many factor-level combinations are known to be infeasible in advance, the two factors cannot be varied completely independently of each other.

Stability Improvement Experiment: A Screening Design in Blocks

The presenter creates a blocking design to determine what factors impact the stability of a vitamin. The process mean shifts randomly from day to day and the investigator wants to explicitly account for these shifts in the statistical model.

Wind Tunnel Experiment: A Split-plot Design

The presenter designs an experiment run in a wind tunnel where certain factors can be varied without shutting down the tunnel, while other factors can only be changed by shutting it down.

Laser Etching Experiment: A Definitive Screening Design in Blocks

The presenter creates a definitive screening design to determine optimal settings for a laser etching process.


Statistics By Jim

Making statistics intuitive

Experimental Design: Definition and Types

By Jim Frost

What is Experimental Design?

An experimental design is a detailed plan for collecting and using data to identify causal relationships. Through careful planning, the design of experiments allows your data collection efforts to have a reasonable chance of detecting effects and testing hypotheses that answer your research questions.

An experiment is a data collection procedure that occurs in controlled conditions to identify and understand causal relationships between variables. Researchers can use many potential designs. The ultimate choice depends on their research question, resources, goals, and constraints. In some fields of study, researchers refer to experimental design as the design of experiments (DOE). Both terms are synonymous.


Ultimately, the design of experiments helps ensure that your procedures and data will evaluate your research question effectively. Without an experimental design, you might waste your efforts in a process that, for many potential reasons, can’t answer your research question. In short, it helps you trust your results.

Learn more about Independent and Dependent Variables.

Design of Experiments: Goals & Settings

Experiments occur in many settings, from psychology, the social sciences, and medicine to physics, engineering, and the industrial and service sectors. Typically, experimental goals are to discover a previously unknown effect, confirm a known effect, or test a hypothesis.

Effects represent causal relationships between variables. For example, in a medical experiment, does the new medicine cause an improvement in health outcomes? If so, the medicine has a causal effect on the outcome.

An experimental design’s focus depends on the subject area and can include the following goals:

  • Understanding the relationships between variables.
  • Identifying the variables that have the largest impact on the outcomes.
  • Finding the input variable settings that produce an optimal result.

For example, psychologists have conducted experiments to understand how conformity affects decision-making. Sociologists have performed experiments to determine whether ethnicity affects the public reaction to staged bike thefts. These experiments map out the causal relationships between variables, and their primary goal is to understand the role of various factors.

Conversely, in a manufacturing environment, the researchers might use an experimental design to find the factors that most effectively improve their product’s strength, identify the optimal manufacturing settings, and do all that while accounting for various constraints. In short, a manufacturer’s goal is often to use experiments to improve their products cost-effectively.

In a medical experiment, the goal might be to quantify the medicine’s effect and find the optimum dosage.

Developing an Experimental Design

Developing an experimental design involves planning that maximizes the potential to collect data that is both trustworthy and able to detect causal relationships. Specifically, these studies aim to see effects when they exist in the population the researchers are studying, preferentially favor causal effects, isolate each factor’s true effect from potential confounders, and produce conclusions that you can generalize to the real world.

To accomplish these goals, experimental designs carefully manage data validity and reliability , and internal and external experimental validity. When your experiment is valid and reliable, you can expect your procedures and data to produce trustworthy results.

An excellent experimental design involves the following:

  • Lots of preplanning.
  • Developing experimental treatments.
  • Determining how to assign subjects to treatment groups.

The remainder of this article focuses on how experimental designs incorporate these essential items to accomplish their research goals.

Learn more about Data Reliability vs. Validity and Internal and External Experimental Validity.

Preplanning, Defining, and Operationalizing for Design of Experiments

A literature review is crucial for the design of experiments.

This phase of the design of experiments helps you identify critical variables, know how to measure them while ensuring reliability and validity, and understand the relationships between them. The review can also help you find ways to reduce sources of variability, which increases your ability to detect treatment effects. Notably, the literature review allows you to learn how similar studies designed their experiments and the challenges they faced.

Operationalizing a study involves taking your research question, using the background information you gathered, and formulating an actionable plan.

This process should produce a specific and testable hypothesis using data that you can reasonably collect given the resources available to the experiment. For example, for a jumping exercise intervention intended to increase bone density:

  • Null hypothesis : The jumping exercise intervention does not affect bone density.
  • Alternative hypothesis : The jumping exercise intervention affects bone density.

To learn more about this early phase, read Five Steps for Conducting Scientific Studies with Statistical Analyses.

Formulating Treatments in Experimental Designs

In an experimental design, treatments are variables that the researchers control. They are the primary independent variables of interest. Researchers administer the treatment to the subjects or items in the experiment and want to know whether it causes changes in the outcome.

As the name implies, a treatment can be medical in nature, such as a new medicine or vaccine. But it’s a general term that applies to other things such as training programs, manufacturing settings, teaching methods, and types of fertilizers. I helped run an experiment where the treatment was a jumping exercise intervention that we hoped would increase bone density. All these treatment examples are things that potentially influence a measurable outcome.

Even when you know your treatment generally, you must carefully consider the amount. How large of a dose? If you’re comparing three different temperatures in a manufacturing process, how far apart are they? For my bone mineral density study, we had to determine how frequently the exercise sessions would occur and how long each lasted.

How you define the treatments in the design of experiments can affect your findings and the generalizability of your results.

Assigning Subjects to Experimental Groups

A crucial decision for all experimental designs is determining how researchers assign subjects to the experimental conditions—the treatment and control groups. The control group is often, but not always, the lack of a treatment. It serves as a basis for comparison by showing outcomes for subjects who don’t receive a treatment. Learn more about Control Groups .

How your experimental design assigns subjects to the groups affects how confident you can be that the findings represent true causal effects rather than mere correlation caused by confounders. Indeed, the assignment method influences how you control for confounding variables. This is the difference between correlation and causation .

Imagine a study finds that vitamin consumption correlates with better health outcomes. As a researcher, you want to be able to say that vitamin consumption causes the improvements. However, with the wrong experimental design, you might only be able to say there is an association. A confounder, and not the vitamins, might actually cause the health benefits.

Let’s explore some of the ways to assign subjects in design of experiments.

Completely Randomized Designs

A completely randomized experimental design randomly assigns all subjects to the treatment and control groups. You simply take each participant and use a random process to determine their group assignment. You can flip coins, roll a die, or use a computer. Randomized experiments must be prospective studies because they need to be able to control group assignment.
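The random process described here can be sketched in a few lines of Python. This is a minimal sketch; the participant IDs and group names are hypothetical:

```python
import random

def randomize(subjects, groups=("control", "treatment"), seed=None):
    """Completely randomized design: shuffle the subjects, then deal
    them round-robin into roughly equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    return {g: shuffled[i::len(groups)] for i, g in enumerate(groups)}

# Hypothetical participant IDs
assignment = randomize([f"P{i:02d}" for i in range(1, 21)], seed=42)
```

Fixing the seed makes the assignment reproducible for an audit trail; omit it for a fresh randomization.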

Random assignment in the design of experiments helps ensure that the groups are roughly equivalent at the beginning of the study. This equivalence at the start increases your confidence that any differences you see at the end were caused by the treatments. The randomization tends to equalize confounders between the experimental groups and, thereby, cancels out their effects, leaving only the treatment effects.

For example, in a vitamin study, the researchers can randomly assign participants to either the control or vitamin group. Because the groups are approximately equal when the experiment starts, if the health outcomes are different at the end of the study, the researchers can be confident that the vitamins caused those improvements.

Statisticians consider randomized experimental designs to be the best for identifying causal relationships.

If you can’t randomly assign subjects but want to draw causal conclusions about an intervention, consider using a quasi-experimental design.

Learn more about Randomized Controlled Trials and Random Assignment in Experiments.

Randomized Block Designs

Nuisance factors are variables that can affect the outcome, but they are not the researcher’s primary interest. Unfortunately, they can hide or distort the treatment results. When experimenters know about specific nuisance factors, they can use a randomized block design to minimize their impact.

This experimental design takes subjects with a shared “nuisance” characteristic and groups them into blocks. The participants in each block are then randomly assigned to the experimental groups. This process allows the experiment to control for known nuisance factors.

Blocking in the design of experiments reduces the impact of nuisance factors on experimental error. The analysis assesses the effects of the treatment within each block, which removes the variability between blocks. The result is that blocked experimental designs can reduce the impact of nuisance variables, increasing the ability to detect treatment effects accurately.

Suppose you’re testing various teaching methods. Because grade level likely affects educational outcomes, you might use grade level as a blocking factor. To use a randomized block design for this scenario, divide the participants by grade level and then randomly assign the members of each grade level to the experimental groups.
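The grade-level scenario can be sketched as follows. This is a minimal sketch; the student IDs and group names are hypothetical:

```python
import random
from collections import defaultdict

def randomized_block(subjects, block_of, groups=("method_A", "method_B"), seed=0):
    """Randomized block design: group subjects by the nuisance factor,
    then randomly assign to treatments *within* each block."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for s in subjects:
        blocks[block_of(s)].append(s)
    assignment = defaultdict(list)
    for members in blocks.values():
        rng.shuffle(members)
        for i, s in enumerate(members):
            # deal round-robin so each block splits evenly across groups
            assignment[groups[i % len(groups)]].append(s)
    return dict(assignment)

# Hypothetical students tagged with a grade level (the blocking factor)
grade = {f"S{i}": "grade_4" if i % 2 else "grade_3" for i in range(12)}
assignment = randomized_block(list(grade), block_of=grade.get)
```

Because randomization happens inside each block, every treatment group ends up with the same number of students from each grade.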

A standard guideline for an experimental design is to “Block what you can, randomize what you cannot.” Use blocking for a few primary nuisance factors. Then use random assignment to distribute the unblocked nuisance factors equally between the experimental conditions.

You can also use covariates to control nuisance factors. Learn about Covariates: Definition and Uses.

Observational Studies

In some experimental designs, randomly assigning subjects to the experimental conditions is impossible or unethical. The researchers simply can’t assign participants to the experimental groups. However, they can observe them in their natural groupings, measure the essential variables, and look for correlations. These observational studies are also known as quasi-experimental designs. Retrospective studies must be observational in nature because they look back at past events.

Imagine you’re studying the effects of depression on an activity. Clearly, you can’t randomly assign participants to the depression and control groups. But you can observe participants with and without depression and see how their task performance differs.

Observational studies let you perform research when you can’t control the treatment. However, quasi-experimental designs increase the problem of confounding variables. For this design of experiments, correlation does not necessarily imply causation. While special procedures can help control confounders in an observational study, you’re ultimately less confident that the results represent causal findings.

Learn more about Observational Studies.

For a good comparison, learn about the differences and tradeoffs between Observational Studies and Randomized Experiments.

Between-Subjects vs. Within-Subjects Experimental Designs

When you think of the design of experiments, you probably picture a treatment and control group. Researchers assign participants to only one of these groups, so each group contains entirely different subjects than the other groups. Analysts compare the groups at the end of the experiment. Statisticians refer to this method as a between-subjects, or independent measures, experimental design.

In a between-subjects design, you can have more than one treatment group, but each subject is exposed to only one condition, the control group or one of the treatment groups.

A potential downside to this approach is that differences between groups at the beginning can affect the results at the end. As you’ve read earlier, random assignment can reduce those differences, but it is imperfect. There will always be some variability between the groups.

In a within-subjects experimental design, also known as repeated measures, subjects experience all treatment conditions and are measured for each. Each subject acts as their own control, which reduces variability and increases the statistical power to detect effects.

In this experimental design, you minimize pre-existing differences between the experimental conditions because they all contain the same subjects. However, the order of treatments can affect the results. Beware of practice and fatigue effects. Learn more about Repeated Measures Designs.

Between-subjects design:

  • Assigned to one experimental condition
  • Requires more subjects
  • Differences between subjects in the groups can affect the results
  • No order-of-treatment effects

Within-subjects design:

  • Participates in all experimental conditions
  • Requires fewer subjects
  • Uses the same subjects in all conditions
  • Order of treatments can affect results

Design of Experiments Examples

For example, a bone density study has three experimental groups—a control group, a stretching exercise group, and a jumping exercise group.

In a between-subjects experimental design, scientists randomly assign each participant to one of the three groups.

In a within-subjects design, all subjects experience the three conditions sequentially while the researchers measure bone density repeatedly. The procedure can switch the order of treatments for the participants to help reduce order effects.
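One common way to systematize that order switching is a Latin-square rotation, where every condition appears in every position equally often across participants. A minimal sketch using the bone density example's condition names:

```python
def counterbalance(conditions):
    """Latin-square rotation: row i starts at condition i, so every
    condition appears exactly once in every position across rows."""
    n = len(conditions)
    return [[conditions[(i + j) % n] for j in range(n)] for i in range(n)]

# Treatment orders for the bone density example; assign successive
# participants to successive rows, cycling through them.
orders = counterbalance(["control", "stretching", "jumping"])
```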

Matched Pairs Experimental Design

A matched pairs experimental design is a between-subjects study that uses pairs of similar subjects. Researchers use this approach to reduce pre-existing differences between experimental groups. It’s yet another design of experiments method for reducing sources of variability.

Researchers identify variables likely to affect the outcome, such as demographics. When they pick a subject with a set of characteristics, they try to locate another participant with similar attributes to create a matched pair. Scientists randomly assign one member of a pair to the treatment group and the other to the control group.
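A simple pairing procedure consistent with this description is to sort subjects on a matching covariate, pair neighbors, and randomize within each pair. A minimal sketch with hypothetical participants matched on age:

```python
import random

def matched_pairs(subjects, covariate, seed=1):
    """Sort on the matching covariate, pair adjacent subjects, then
    randomly assign one member of each pair to each condition."""
    rng = random.Random(seed)
    ranked = sorted(subjects, key=covariate)
    treatment, control = [], []
    for a, b in zip(ranked[::2], ranked[1::2]):
        t, c = rng.sample([a, b], 2)  # coin flip within the pair
        treatment.append(t)
        control.append(c)
    return treatment, control

# Hypothetical participants matched on age
ages = {"P1": 34, "P2": 61, "P3": 35, "P4": 58, "P5": 45, "P6": 47}
treatment, control = matched_pairs(list(ages), covariate=ages.get)
```

Real studies usually match on several characteristics at once, which is where the time cost mentioned below comes from; this sketch uses a single covariate for clarity.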

On the plus side, this process creates two similar groups, and it doesn’t create treatment order effects. While a matched pairs design does not produce the perfectly matched groups of a within-subjects design (which uses the same subjects in all conditions), it aims to reduce variability between groups relative to a between-subjects study.

On the downside, finding matched pairs is very time-consuming. Additionally, if one member of a matched pair drops out, the other subject must leave the study too.

Learn more about Matched Pairs Design: Uses & Examples.

Another consideration is whether you’ll use a cross-sectional design (one point in time) or a longitudinal study to track changes over time.

A case study is a research method that often serves as a precursor to a more rigorous experimental design by identifying research questions, variables, and hypotheses to test. Learn more about What is a Case Study? Definition & Examples.

In conclusion, the design of experiments is extremely sensitive to subject area concerns and the time and resources available to the researchers. Developing a suitable experimental design requires balancing a multitude of considerations. A successful design is necessary to obtain trustworthy answers to your research question and to have a reasonable chance of detecting treatment effects when they exist.



Optimal Design of Experiments: A Case Study Approach

ISBN: 978-0-470-74461-1

August 2011


Peter Goos , Bradley Jones

"It's been said: 'Design for the experiment, don't experiment for the design.' This book ably demonstrates this notion by showing how tailor-made, optimal designs can be effectively employed to meet a client's actual needs. It should be required reading for anyone interested in using the design of experiments in industrial settings." — Christopher J. Nachtsheim , Frank A Donaldson Chair in Operations Management, Carlson School of Management, University of Minnesota 

This book demonstrates the utility of the computer-aided optimal design approach using real industrial examples. These examples address questions such as the following:

  • How can I do screening inexpensively if I have dozens of factors to investigate?
  • What can I do if I have day-to-day variability and I can only perform 3 runs a day?
  • How can I do RSM cost effectively if I have categorical factors?
  • How can I design and analyze experiments when there is a factor that can only be changed a few times over the study?
  • How can I include both ingredients in a mixture and processing factors in the same study?
  • How can I design an experiment if there are many factor combinations that are impossible to run?
  • How can I make sure that a time trend due to warming up of equipment does not affect the conclusions from a study?
  • How can I take into account batch information when designing experiments involving multiple batches?
  • How can I add runs to a botched experiment to resolve ambiguities?

While answering these questions the book also shows how to evaluate and compare designs. This allows researchers to make sensible trade-offs between the cost of experimentation and the amount of information they obtain.

Bradley Jones , Senior Manager, Statistical Research and Development in the JMP division of SAS, where he leads the development of design of experiments (DOE) capabilities in JMP software. Dr. Jones is widely published on DOE in research journals and the trade press. His current interest areas are design of experiments, PLS, computer aided statistical pedagogy, and graphical user interface design.


Implementing Design of Experiments (DOE): A practical example

You may have an understanding of what Design of Experiments (DOE) is in theory. But what happens when DOE collides with the real world? 

Implementing DOE in a busy laboratory is, of course, a nuanced topic—and there’s plenty of ways to approach it. 

DOE implementation with a practical example: 7 elements to consider

Let’s jump straight in with a real-life example. Imagine that we want to optimize the expression of a target protein in bacterial cell culture.

Based on our experience with DOE campaigns, here are the most important elements to consider, and how we’d approach them in this scenario:

1. Using DOE tools vs doing it manually

In theory, you can create, execute, and analyze this DOE example (and any other DOE, for that matter) with little more than a pipette, pen, and paper. 

But it’ll be hard to do more than scratch the surface without the proper tools. For something as complex as protein expression, you’re going to need a hefty toolbox to help you with each stage.

Software for DOE

Let’s begin with DOE software. 

DOE rests on a well-established and robust mathematical foundation. Technically, you can do the math by hand. But it’s hard work, error-prone, and requires specialized mathematical knowledge. Using DOE software helps reduce the risk of a mathematical slip. 

And thankfully, over the last few years, DOE software has become more accessible to scientists—which lowers the barriers to entry for non-statisticians. 

By creating and assessing different designs, analyzing the data, and building models with software, you’ll also find it easier to decide your next action or iteration.
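As a flavor of what such software automates, here is a minimal sketch that enumerates a two-level full factorial for three hypothetical protein expression factors. Dedicated DOE software adds fractional, optimal, and definitive screening designs, plus the model building, on top of this:

```python
from itertools import product

# Hypothetical two-level factors for the protein expression example
factors = {
    "temperature_C": (30, 37),
    "inducer_mM": (0.1, 1.0),
    "promoter": ("weak", "strong"),
}

# Full factorial: every combination of factor levels -> 2**3 = 8 runs
runs = [dict(zip(factors, combo)) for combo in product(*factors.values())]
```

Even this tiny design doubles in size with every added factor, which is exactly why software support matters once you study more than a handful.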

Automation hardware for DOE

Biological DOEs—including our example of optimizing protein production—typically involve liquid handling and analytics. 

Manually handling small quantities of liquid is feasible. 

That being said, DOEs are more complex than most protocols. They typically employ dozens or hundreds of runs—and the variations between runs are minute. For DOEs that surpass a certain scale, you’d end up driving yourself mad trying to manually pipette 10 or more liquids into wells that are millimeters apart. All at variable volumes, in an unpredictable layout. 

At the scale needed to unleash the full potential of DOE, working by hand without making any errors would be nothing short of a miracle, even for the most practiced pipettor.

Automation hardware would instantly relieve you of the burden, and radically speed up time to insight.

Worth noting: If you go down the automation route, you will, of course, need to integrate the output of your DOE software with the software that controls your lab automation. Automation engineers can help make the transition from manual to automated liquid handling and ease DOE implementation, though we know that this can create a new bottleneck. And DOE can be complex enough without worrying about shifting toward fully automated experimentation. 

That’s why at Synthace, we’ve created a more accessible kind of DOE software—the kind that doesn’t require an automation engineer’s specialist scripting or coding knowledge. 

Synthace – a zero-code DOE software for practical automated DOE implementation

But we digress...

2. Framing your question as a hypothesis

A large part of the power of DOE resides in the process . It’s a campaign approach encompassing screening, refinement and iteration, optimization, and assessing robustness. 

So before you begin, you need to sketch out a plan for your campaign. And as with every scientific experiment, you always start by framing your question as a hypothesis. 

Returning to our growth experiment: Producing compound ‘X’ in bacteria depends on a complex interaction between genetics and environment. 

So our hypothesis would be: By varying aspects of genetics and environment, we will discover what’s important, and how they affect one another. This will help us optimize production.

3. Choosing factors based on what you know...

After forming a hypothesis, the next stage is to start thinking about which factors to investigate, and how to change them. 

By factors, we mean variables in your experiment. There can be genetic factors and environmental factors. In our protein optimization with DOE, an example of a genetic factor could be which promoter we use, and an environmental one could be the overnight growth temperature. 

To avoid spending too much effort re-learning things that are already known, we recommend using all of the knowledge you can get. 

For instance, if you know which growth media achieve high yields when you’re trying to optimize protein production with DOE, there’s usually no need to confirm this experimentally. 

However, you can investigate a biologically plausible change to the media (e.g., zinc availability may be limiting) alongside other media, genetic and process factors, and interactions (e.g., between zinc and manganese).

4. ... Without relying on your knowledge completely

Having said that, familiarity should not breed complacency. There’s a fine line to walk here: It’s all too easy to develop experiments that confirm, rather than test, hypotheses. Without a well-developed and robust theoretical framework for your experiment, you won’t get to grips with the complexity of your system.

So, be open-minded. Don’t assume you know everything. DOE helps you investigate your system in an unbiased way, which often reveals new insights and generates novel hypotheses. 

For instance, the formulae for many cell growth media are handed down and used unquestioningly by generations of scientists. After all, why would you risk taking something out if your cells might not grow properly? 

But calculated risks are part of science.

Cell growth is complex and there’s no perfect medium that gives excellent results in every possible case. It’s likely that many ingredients aren’t necessary for specific applications or may even be harmful: High levels of zinc may inhibit the growth of certain bacteria, for example. 

Investigating the composition of such apparently standard parts of the workflow can be useful: Some “unnecessary” components of the media can be very expensive, while others are actively harmful for the specific application.

5. Getting your measurements right

Results for your DOE are only as good as the quality of your measurement data. So for your DOE to work, your measurements have to be in order. 

What can go wrong? There are two related problems: Noise and sensitivity. 

Noise is about how reproducible the signal is. If you measure the same thing 3 times, how much do the results vary? This will define the resolution of your experiment. Noisier assays make it harder to distinguish between real changes and random variations. Noise is often something to watch out for during the earlier stages, where many runs will produce low or no signal. Distinguishing these to inform the next iteration will be critical.
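One quick way to put a number on that reproducibility is the coefficient of variation (CV) of replicate measurements. A minimal sketch with hypothetical triplicate readings:

```python
from statistics import mean, stdev

def cv_percent(replicates):
    """Coefficient of variation: the spread of repeated measurements
    of the same sample, expressed as a percentage of their mean."""
    return 100 * stdev(replicates) / mean(replicates)

# Hypothetical triplicate absorbance readings of the same sample
readings = [0.82, 0.79, 0.85]
noise = cv_percent(readings)  # roughly 3.7% for these readings
```

A treatment effect smaller than the assay's CV will be hard to distinguish from random variation, which is what "resolution of your experiment" means in practice.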

Sensitivity is more about the range of signals that you can detect. This usually comes down to a device’s upper and lower detection limits. If you don’t take these into account, you risk losing a lot of information on signals outside those limits, which is a big problem when it comes to modeling DOE data.

Sensitivity could come up in our working DOE example as a side-effect of the assay protocol. The simplest way to detect protein expression might be using crude lysates with a Bradford assay. But you’d need to ensure that the dynamic range of the plate reader doesn’t restrict sensitivity. Testing multiple dilutions is one common way to mitigate this. Mitigating noise issues from background expression of non-target proteins using a proper negative control strategy is also something you’d want to consider.

6. Avoiding the "big bang" —and breaking up your experiment into stages

DOE lets you investigate lots of factors at once—so naturally, you’ll have plenty of factors to choose from. But you can’t test them all at once. You’ll need to avoid the temptation of creating a “big bang”: in other words, trying to investigate all your factors in depth with one massive experiment. This would be impractical, if not impossible.

When thinking about what influences the optimal expression of a target protein, for example, you’ll have to choose between dozens of factors, like variations in the genetic payload (plasmid type and the coding, promoter, or terminator sequences). The molecular biology techniques used to assemble and transform the payload, the host strain details, and growth conditions, such as temperatures and times, could also influence expression.

Most of the possible combinations will have little if any effect on the expression profile. The problem is you don’t know which! 

Thankfully, the solution is simple: It’s best to do your experiment in stages. Begin your DOE campaign by investigating a broad set of factors in limited detail , as you’ll eliminate dead-ends—and produce a smaller, more interesting and influential set. Later experiments can fill in the missing details.
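The broad-but-shallow first stage is often a fractional factorial design. As a minimal illustration in coded units (not tied to any particular assay), a half-fraction studies three two-level factors in four runs instead of eight by aliasing the third factor with an interaction:

```python
from itertools import product

# Half-fraction of a 2**3 factorial (resolution III): full factorial
# in factors A and B, with C aliased to the A*B interaction,
# all in coded -1/+1 units.
runs = [(a, b, a * b) for a, b in product((-1, 1), repeat=2)]
# 4 runs instead of 8; main effects are estimable, but C is
# confounded with the A*B interaction -- the screening trade-off.
```

That confounding is acceptable early on precisely because the goal is to eliminate dead-end factors cheaply; later, more detailed stages resolve the ambiguities.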

7. Giving your DOE campaign a sanity check

Before you start, look over your DOE campaign. Make sure that you understand exactly what you’re proposing to do in each stage, and whether it makes biological sense.

Will all your runs be biologically plausible?

When you’re looking at the early stages of your DOE campaign, remember that the aim is to investigate high and low levels of continuous factors. For our protein optimization example, we’d want to focus on things like concentrations of media components, to establish ranges to investigate. 

And while each of the highest and lowest levels for your factors may make sense in isolation, the combination may not be possible. For instance, investigating the effect of several carbon sources on bacterial growth could involve a low level or zero for each source individually. Bacteria may, however, thrive on more than one carbon source. But giving bacteria no carbon would obviously prevent growth. Equally, large amounts of different carbon sources could overwhelm the bacteria. So, you may want to set limits for total carbon.
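Constraints like a total-carbon limit can be enforced by filtering candidate runs before the design is finalized. A minimal sketch with hypothetical carbon sources and bounds:

```python
from itertools import product

# Hypothetical carbon-source levels in g/L
levels = {"glucose": (0, 5, 10), "glycerol": (0, 5, 10), "lactose": (0, 5, 10)}

candidates = [dict(zip(levels, c)) for c in product(*levels.values())]

# Keep only biologically plausible runs: some carbon present,
# but not an overwhelming total amount
plausible = [r for r in candidates if 5 <= sum(r.values()) <= 15]
```

The plausibility rule itself (here, total carbon between 5 and 15 g/L) is exactly the kind of biological judgment that no design package can supply for you.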

Biologically implausible runs waste time and resources and can compromise the overall results, especially if they occur multiple times. Trying to understand how a combination of levels would influence the system is critical: It will make a huge difference to the success of your run. No DOE design package or statistician can give you these answers.

Have you also thought about your positive and negative controls?

It's good scientific practice to use positive and negative controls. But these aren't included in the DOE experimental design, and they're important to think about.

The experimental design will contain the points required to estimate the effects and interactions that you are investigating. DOE also assumes that you can easily measure the response for each run. 

You should consider all of these as experimental runs. While a design can sometimes include runs that could function as controls (e.g., the zero-carbon example above), that's not their purpose. This means you need to make sure that you add the required controls and replicate runs separately.

Can you make iterating easier by making some of your runs identical?

We also advocate, particularly when iterating, including a few repeated runs from earlier stages to help you check whether your system is still behaving the same way.

Otherwise, you could end up in a situation where all your runs look suspiciously different from what you'd expect given earlier experiments. And because your runs have little to nothing in common, it can be difficult to identify errors that affect large sets of runs, like a machine not functioning correctly.
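A handful of repeated "sentinel" runs makes this check almost mechanical. A minimal sketch with purely invented response values and an assumed 10% drift threshold:

```python
# Hypothetical sentinel runs: identical settings repeated in two stages,
# used to flag systematic drift (e.g., a mis-calibrated instrument).
stage1_sentinels = [0.82, 0.79, 0.85]   # responses from stage-1 repeats
stage2_sentinels = [0.61, 0.58, 0.63]   # same settings, rerun in stage 2

mean1 = sum(stage1_sentinels) / len(stage1_sentinels)
mean2 = sum(stage2_sentinels) / len(stage2_sentinels)

# Crude screen: flag if the sentinel mean shifts by more than 10%.
drift = abs(mean2 - mean1) / mean1
if drift > 0.10:
    print(f"Warning: sentinel runs drifted by {drift:.0%} between stages")
```

If the sentinels drift, you know the difference lies in the system, not in the new factor settings, and you can investigate before trusting the new stage's results.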

What we learned from this example: for DOE, the scientist holds the key

If these 7 elements are too much for you to take in all in one go, just remember this: Software and automation, as well as experts in statistics and lab automation, are all valuable allies. 

But your greatest ally is your scientific knowledge and instincts: It's up to you to make sure that your experiments ask the right questions in the right way. 

Just remember to temper this with open-mindedness: Be critical of what you think you already know. After all, you have nothing to lose but your cognitive bias.

Interested in learning more about DOE? Make sure to check out our other DOE blogs, download our DOE for biologists ebook, or watch our DOE Masterclass webinar series.

Michael "Sid" Sadowski, PhD

Michael Sadowski, aka Sid, is the Director of Scientific Software at Synthace, where he leads the company’s DOE product development. In his 10 years at the company he has consulted on dozens of DOE campaigns, many of which included aspects of QbD.



Optimal Design of Experiments: A Case Study Approach by Peter Goos and Bradley Jones


2012, International Statistical Review



Single-Case Experimental Designs: A Systematic Review of Published Research and Current Standards

Justin D. Smith

Child and Family Center, University of Oregon

This article systematically reviews the research design and methodological characteristics of single-case experimental design (SCED) research published in peer-reviewed journals between 2000 and 2010. SCEDs provide researchers with a flexible and viable alternative to group designs with large sample sizes. However, methodological challenges have precluded widespread implementation and acceptance of the SCED as a viable complementary methodology to the predominant group design. This article includes a description of the research design, measurement, and analysis domains distinctive to the SCED; a discussion of the results within the framework of contemporary standards and guidelines in the field; and a presentation of updated benchmarks for key characteristics (e.g., baseline sampling, method of analysis), and overall, it provides researchers and reviewers with a resource for conducting and evaluating SCED research. The results of the systematic review of 409 studies suggest that recently published SCED research is largely in accordance with contemporary criteria for experimental quality. Analytic method emerged as an area of discord. Comparison of the findings of this review with historical estimates of the use of statistical analysis indicates an upward trend, but visual analysis remains the most common analytic method and also garners the most support amongst those entities providing SCED standards. Although consensus exists along key dimensions of single-case research design and researchers appear to be practicing within these parameters, there remains a need for further evaluation of assessment and sampling techniques and data analytic methods.

The single-case experiment has a storied history in psychology dating back to the field’s founders: Fechner (1889) , Watson (1925) , and Skinner (1938) . It has been used to inform and develop theory, examine interpersonal processes, study the behavior of organisms, establish the effectiveness of psychological interventions, and address a host of other research questions (for a review, see Morgan & Morgan, 2001 ). In recent years the single-case experimental design (SCED) has been represented in the literature more often than in past decades, as is evidenced by recent reviews ( Hammond & Gast, 2010 ; Shadish & Sullivan, 2011 ), but it still languishes behind the more prominent group design in nearly all subfields of psychology. Group designs are often professed to be superior because they minimize, although do not necessarily eliminate, the major internal validity threats to drawing scientifically valid inferences from the results ( Shadish, Cook, & Campbell, 2002 ). SCEDs provide a rigorous, methodologically sound alternative method of evaluation (e.g., Barlow, Nock, & Hersen, 2008 ; Horner et al., 2005 ; Kazdin, 2010 ; Kratochwill & Levin, 2010 ; Shadish et al., 2002 ) but are often overlooked as a true experimental methodology capable of eliciting legitimate inferences (e.g., Barlow et al., 2008 ; Kazdin, 2010 ). Despite a shift in the zeitgeist from single-case experiments to group designs more than a half century ago, recent and rapid methodological advancements suggest that SCEDs are poised for resurgence.

Single case refers to the participant or cluster of participants (e.g., a classroom, hospital, or neighborhood) under investigation. In contrast to an experimental group design in which one group is compared with another, participants in a single-subject experiment research provide their own control data for the purpose of comparison in a within-subject rather than a between-subjects design. SCEDs typically involve a comparison between two experimental time periods, known as phases. This approach typically includes collecting a representative baseline phase to serve as a comparison with subsequent phases. In studies examining single subjects that are actually groups (i.e., classroom, school), there are additional threats to internal validity of the results, as noted by Kratochwill and Levin (2010) , which include setting or site effects.

The central goal of the SCED is to determine whether a causal or functional relationship exists between a researcher-manipulated independent variable (IV) and a meaningful change in the dependent variable (DV). SCEDs generally involve repeated, systematic assessment of one or more IVs and DVs over time. The DV is measured repeatedly across and within all conditions or phases of the IV. Experimental control in SCEDs includes replication of the effect either within or between participants ( Horner et al., 2005 ). Randomization is another way in which threats to internal validity can be experimentally controlled. Kratochwill and Levin (2010) recently provided multiple suggestions for adding a randomization component to SCEDs to improve the methodological rigor and internal validity of the findings.
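The within-subject logic described here can be illustrated with a toy calculation for a reversal (ABAB) design, where replicating the effect in the second intervention phase supports a functional relationship. The data below are invented purely for illustration:

```python
# Toy ABAB data: the participant's baseline (A) phases serve as their own
# control for the intervention (B) phases. Values are invented.
phases = {
    "A1": [12, 11, 13, 12],   # baseline
    "B1": [8, 7, 7, 6],       # intervention
    "A2": [11, 12, 11, 12],   # withdrawal (return to baseline)
    "B2": [6, 6, 5, 6],       # reintroduced intervention
}

means = {p: sum(v) / len(v) for p, v in phases.items()}

# Experimental control: the effect (lower DV under intervention) must
# replicate across both A-to-B transitions.
effect_replicated = means["B1"] < means["A1"] and means["B2"] < means["A2"]
print(means, effect_replicated)
```

Visual analysis of SCED data follows essentially this comparison of level (plus trend and variability) across phases, rather than a between-groups contrast.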

Examination of the effectiveness of interventions is perhaps the area in which SCEDs are most well represented ( Morgan & Morgan, 2001 ). Researchers in behavioral medicine and in clinical, health, educational, school, sport, rehabilitation, and counseling psychology often use SCEDs because they are particularly well suited to examining the processes and outcomes of psychological and behavioral interventions (e.g., Borckardt et al., 2008 ; Kazdin, 2010 ; Robey, Schultz, Crawford, & Sinner, 1999 ). Skepticism about the clinical utility of the randomized controlled trial (e.g., Jacobsen & Christensen, 1996 ; Wachtel, 2010 ; Westen & Bradley, 2005 ; Westen, Novotny, & Thompson-Brenner, 2004 ) has renewed researchers’ interest in SCEDs as a means to assess intervention outcomes (e.g., Borckardt et al., 2008 ; Dattilio, Edwards, & Fishman, 2010 ; Horner et al., 2005 ; Kratochwill, 2007 ; Kratochwill & Levin, 2010 ). Although SCEDs are relatively well represented in the intervention literature, it is by no means their sole home: Examples appear in nearly every subfield of psychology (e.g., Bolger, Davis, & Rafaeli, 2003 ; Piasecki, Hufford, Solham, & Trull, 2007 ; Reis & Gable, 2000 ; Shiffman, Stone, & Hufford, 2008 ; Soliday, Moore, & Lande, 2002 ). Aside from the current preference for group-based research designs, several methodological challenges have repressed the proliferation of the SCED.

Methodological Complexity

SCEDs undeniably present researchers with a complex array of methodological and research design challenges, such as establishing a representative baseline, managing the nonindependence of sequential observations (i.e., autocorrelation, serial dependence), interpreting single-subject effect sizes, analyzing the short data streams seen in many applications, and appropriately addressing the matter of missing observations. In the field of intervention research for example, Hser et al. (2001) noted that studies using SCEDs are “rare” because of the minimum number of observations that are necessary (e.g., 3–5 data points in each phase) and the complexity of available data analysis approaches. Advances in longitudinal person-based trajectory analysis (e.g., Nagin, 1999 ), structural equation modeling techniques (e.g., Lubke & Muthén, 2005 ), time-series forecasting (e.g., autoregressive integrated moving averages; Box & Jenkins, 1970 ), and statistical programs designed specifically for SCEDs (e.g., Simulation Modeling Analysis; Borckardt, 2006 ) have provided researchers with robust means of analysis, but they might not be feasible methods for the average psychological scientist.
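The serial dependence mentioned above can be quantified as lag-1 autocorrelation, which is one reason conventional analyses misbehave on short single-case data streams. A minimal sketch with a plain-Python estimator and an invented observation series:

```python
# Lag-1 autocorrelation of a short repeated-measures data stream.
def lag1_autocorr(xs):
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    # Covariance of each observation with its successor.
    cov = sum((xs[t] - mean) * (xs[t + 1] - mean) for t in range(n - 1))
    return cov / var

# Hypothetical daily observations from one participant's baseline phase.
baseline = [4, 5, 5, 6, 7, 7, 8, 8]
r1 = lag1_autocorr(baseline)
print(round(r1, 2))  # 0.61: strongly serially dependent
```

A positive r1 like this means adjacent observations are not independent, so standard errors from methods that assume independence will be too small.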

Application of the SCED has also expanded. Today, researchers use variants of the SCED to examine complex psychological processes and the relationship between daily and momentary events in peoples’ lives and their psychological correlates. Research in nearly all subfields of psychology has begun to use daily diary and ecological momentary assessment (EMA) methods in the context of the SCED, opening the door to understanding increasingly complex psychological phenomena (see Bolger et al., 2003 ; Shiffman et al., 2008 ). In contrast to the carefully controlled laboratory experiment that dominated research in the first half of the twentieth century (e.g., Skinner, 1938 ; Watson, 1925 ), contemporary proponents advocate application of the SCED in naturalistic studies to increase the ecological validity of empirical findings (e.g., Bloom, Fisher, & Orme, 2003 ; Borckardt et al., 2008 ; Dattilio et al., 2010 ; Jacobsen & Christensen, 1996 ; Kazdin, 2008 ; Morgan & Morgan, 2001 ; Westen & Bradley, 2005 ; Westen et al., 2004 ). Recent advancements and expanded application of SCEDs indicate a need for updated design and reporting standards.

Many current benchmarks in the literature concerning key parameters of the SCED were established well before current advancements and innovations, such as the suggested minimum number of data points in the baseline phase(s), which remains a disputed area of SCED research (e.g., Center, Skiba, & Casey, 1986 ; Huitema, 1985 ; R. R. Jones, Vaught, & Weinrott, 1977 ; Sharpley, 1987 ). This article comprises (a) an examination of contemporary SCED methodological and reporting standards; (b) a systematic review of select design, measurement, and statistical characteristics of published SCED research during the past decade; and (c) a broad discussion of the critical aspects of this research to inform methodological improvements and study reporting standards. The reader will garner a fundamental understanding of what constitutes appropriate methodological soundness in single-case experimental research according to the established standards in the field, which can be used to guide the design of future studies, improve the presentation of publishable empirical findings, and inform the peer-review process. The discussion begins with the basic characteristics of the SCED, including an introduction to time-series, daily diary, and EMA strategies, and describes how current reporting and design standards apply to each of these areas of single-case research. Interweaved within this presentation are the results of a systematic review of SCED research published between 2000 and 2010 in peer-reviewed outlets and a discussion of the way in which these findings support, or differ from, existing design and reporting standards and published SCED benchmarks.

Review of Current SCED Guidelines and Reporting Standards

In contrast to experimental group comparison studies, which conform to generally well agreed upon methodological design and reporting guidelines, such as the CONSORT ( Moher, Schulz, Altman, & the CONSORT Group, 2001 ) and TREND ( Des Jarlais, Lyles, & Crepaz, 2004 ) statements for randomized and nonrandomized trials, respectively, there is comparatively much less consensus when it comes to the SCED. Until fairly recently, design and reporting guidelines for single-case experiments were almost entirely absent in the literature and were typically determined by the preferences of a research subspecialty or a particular journal’s editorial board. Factions still exist within the larger field of psychology, as can be seen in the collection of standards presented in this article, particularly in regard to data analytic methods of SCEDs, but fortunately there is budding agreement about certain design and measurement characteristics. A number of task forces, professional groups, and independent experts in the field have recently put forth guidelines; each has a relatively distinct purpose, which likely accounts for some of the discrepancies between them. In what is to be a central theme of this article, researchers are ultimately responsible for thoughtfully and synergistically combining research design, measurement, and analysis aspects of a study.

This review presents the more prominent, comprehensive, and recently established SCED standards. Six sources are discussed: (1) Single-Case Design Technical Documentation from the What Works Clearinghouse (WWC; Kratochwill et al., 2010 ); (2) the APA Division 12 Task Force on Psychological Interventions, with contributions from the Division 12 Task Force on Promotion and Dissemination of Psychological Procedures and the APA Task Force for Psychological Intervention Guidelines (DIV12; presented in Chambless & Hollon, 1998 ; Chambless & Ollendick, 2001 ), adopted and expanded by APA Division 53, the Society for Clinical Child and Adolescent Psychology ( Weisz & Hawley, 1998 , 1999 ); (3) the APA Division 16 Task Force on Evidence-Based Interventions in School Psychology (DIV16; Members of the Task Force on Evidence-Based Interventions in School Psychology. Chair: T. R. Kratochwill, 2003); (4) the National Reading Panel (NRP; National Institute of Child Health and Human Development, 2000 ); (5) the Single-Case Experimental Design Scale ( Tate et al., 2008 ); and (6) the reporting guidelines for EMA put forth by Stone & Shiffman (2002) . Although the specific purposes of each source differ somewhat, the overall aim is to provide researchers and reviewers with agreed-upon criteria to be used in the conduct and evaluation of SCED research. The standards provided by WWC, DIV12, DIV16, and the NRP represent the efforts of task forces. The Tate et al. scale was selected for inclusion in this review because it represents perhaps the only psychometrically validated tool for assessing the rigor of SCED methodology. Stone and Shiffman’s (2002) standards were intended specifically for EMA methods, but many of their criteria also apply to time-series, daily diary, and other repeated-measurement and sampling methods, making them pertinent to this article. 
The design, measurement, and analysis standards are presented in the later sections of this article and notable concurrences, discrepancies, strengths, and deficiencies are summarized.

Systematic Review Search Procedures and Selection Criteria

Search strategy

A comprehensive search strategy of SCEDs was performed to identify studies published in peer-reviewed journals meeting a priori search and inclusion criteria. First, a computer-based PsycINFO search of articles published between 2000 and 2010 (search conducted in July 2011) was conducted that used the following primary key terms and phrases that appeared anywhere in the article (asterisks denote that any characters/letters can follow the last character of the search term): alternating treatment design, changing criterion design, experimental case*, multiple baseline design, replicated single-case design, simultaneous treatment design, time-series design. The search was limited to studies published in the English language and those appearing in peer-reviewed journals within the specified publication year range. Additional limiters of the type of article were also used in PsycINFO to increase specificity: The search was limited to include methodologies indexed as either quantitative study OR treatment outcome/randomized clinical trial and NOT field study OR interview OR focus group OR literature review OR systematic review OR mathematical model OR qualitative study.

Study selection

The author used a three-phase study selection, screening, and coding procedure to select the highest number of applicable studies. Phase 1 consisted of the initial systematic review conducted using PsycINFO, which resulted in 571 articles. In Phase 2, titles and abstracts were screened: Articles appearing to use a SCED were retained (451) for Phase 3, in which the author and a trained research assistant read each full-text article and entered the characteristics of interest into a database. At each phase of the screening process, studies that did not use a SCED or that either self-identified as, or were determined to be, quasi-experimental were dropped. Of the 571 original studies, 82 studies were determined to be quasi-experimental. The definition of a quasi-experimental design used in the screening procedure conforms to the descriptions provided by Kazdin (2010) and Shadish et al. (2002) regarding the necessary components of an experimental design. For example, reversal designs require a minimum of four phases (e.g., ABAB), and multiple baseline designs must demonstrate replication of the effect across at least three conditions (e.g., subjects, settings, behaviors). Sixteen studies were unavailable in full text in English, and five could not be obtained in full text and were thus dropped. The remaining articles that were not retained for review (59) were determined not to be SCED studies meeting our inclusion criteria, but had been identified in our PsycINFO search using the specified keyword and methodology terms. For this review, 409 studies were selected. The sources of the 409 reviewed studies are summarized in Table 1 . A complete bibliography of the 571 studies appearing in the initial search, with the included studies marked, is available online as an Appendix or from the author.

Journal Sources of Studies Included in the Systematic Review (N = 409)

Journal Title (No. of studies)
[Table rows garbled in extraction: the journal-name column was lost, leaving only the per-journal study counts, which ranged from 45 at the top of the table down to 4. Journals contributing a single study each are listed in the note below.]

Note: Each of the following journal titles contributed 1 study unless otherwise noted in parentheses: Augmentative and Alternative Communication; Acta Colombiana de Psicología; Acta Comportamentalia; Adapted Physical Activity Quarterly (2); Addiction Research and Theory; Advances in Speech Language Pathology; American Annals of the Deaf; American Journal of Education; American Journal of Occupational Therapy; American Journal of Speech-Language Pathology; The American Journal on Addictions; American Journal on Mental Retardation; Applied Ergonomics; Applied Psychophysiology and Biofeedback; Australian Journal of Guidance & Counseling; Australian Psychologist; Autism; The Behavior Analyst; The Behavior Analyst Today; Behavior Analysis in Practice (2); Behavior and Social Issues (2); Behaviour Change (2); Behavioural and Cognitive Psychotherapy; Behaviour Research and Therapy (3); Brain and Language (2); Brain Injury (2); Canadian Journal of Occupational Therapy (2); Canadian Journal of School Psychology; Career Development for Exceptional Individuals; Chinese Mental Health Journal; Clinical Linguistics and Phonetics; Clinical Psychology & Psychotherapy; Cognitive and Behavioral Practice; Cognitive Computation; Cognitive Therapy and Research; Communication Disorders Quarterly; Developmental Medicine & Child Neurology (2); Developmental Neurorehabilitation (2); Disability and Rehabilitation: An International, Multidisciplinary Journal (3); Disability and Rehabilitation: Assistive Technology; Down Syndrome: Research & Practice; Drug and Alcohol Dependence (2); Early Childhood Education Journal (2); Early Childhood Services: An Interdisciplinary Journal of Effectiveness; Educational Psychology (2); Education and Training in Autism and Developmental Disabilities; Electronic Journal of Research in Educational Psychology; Environment and Behavior (2); European Eating Disorders Review; European Journal of Sport Science; European Review of Applied Psychology; Exceptional 
Children; Exceptionality; Experimental and Clinical Psychopharmacology; Family & Community Health: The Journal of Health Promotion & Maintenance; Headache: The Journal of Head and Face Pain; International Journal of Behavioral Consultation and Therapy (2); International Journal of Disability; Development and Education (2); International Journal of Drug Policy; International Journal of Psychology; International Journal of Speech-Language Pathology; International Psychogeriatrics; Japanese Journal of Behavior Analysis (3); Japanese Journal of Special Education; Journal of Applied Research in Intellectual Disabilities (2); Journal of Applied Sport Psychology (3); Journal of Attention Disorders (2); Journal of Behavior Therapy and Experimental Psychiatry; Journal of Child Psychology and Psychiatry; Journal of Clinical Psychology in Medical Settings; Journal of Clinical Sport Psychology; Journal of Cognitive Psychotherapy; Journal of Consulting and Clinical Psychology (2); Journal of Deaf Studies and Deaf Education; Journal of Educational & Psychological Consultation (2); Journal of Evidence-Based Practices for Schools (2); Journal of the Experimental Analysis of Behavior (2); Journal of General Internal Medicine; Journal of Intellectual and Developmental Disabilities; Journal of Intellectual Disability Research (2); Journal of Medical Speech-Language Pathology; Journal of Neurology, Neurosurgery & Psychiatry; Journal of Paediatrics and Child Health; Journal of Prevention and Intervention in the Community; Journal of Safety Research; Journal of School Psychology (3); The Journal of Socio-Economics; The Journal of Special Education; Journal of Speech, Language, and Hearing Research (2); Journal of Sport Behavior; Journal of Substance Abuse Treatment; Journal of the International Neuropsychological Society; Journal of Traumatic Stress; The Journals of Gerontology: Series B: Psychological Sciences and Social Sciences; Language, Speech, and Hearing Services in Schools; 
Learning Disabilities Research & Practice (2); Learning Disability Quarterly (2); Music Therapy Perspectives; Neurorehabilitation and Neural Repair; Neuropsychological Rehabilitation (2); Pain; Physical Education and Sport Pedagogy (2); Preventive Medicine: An International Journal Devoted to Practice and Theory; Psychological Assessment; Psychological Medicine: A Journal of Research in Psychiatry and the Allied Sciences; The Psychological Record; Reading and Writing; Remedial and Special Education (3); Research and Practice for Persons with Severe Disabilities (2); Restorative Neurology and Neuroscience; School Psychology International; Seminars in Speech and Language; Sleep and Hypnosis; School Psychology Quarterly; Social Work in Health Care; The Sport Psychologist (3); Therapeutic Recreation Journal (2); The Volta Review; Work: Journal of Prevention, Assessment & Rehabilitation.

Coding criteria amplifications

A comprehensive description of the coding criteria for each category in this review is available from the author by request. The primary coding criteria are described here and in later sections of this article.

  • Research design was classified into one of the types discussed later in the section titled Predominant Single-Case Experimental Designs on the basis of the authors’ stated design type. Secondary research designs were then coded when applicable (i.e., mixed designs). Distinctions between primary and secondary research designs were made based on the authors’ description of their study. For example, if an author described the study as a “multiple baseline design with time-series measurement,” the primary research design would be coded as being multiple baseline, and time-series would be coded as the secondary research design.
  • Observer ratings were coded as present when observational coding procedures were described and/or the results of a test of interobserver agreement were reported.
  • Interrater reliability for observer ratings was coded as present in any case in which percent agreement, alpha, kappa, or another appropriate statistic was reported, regardless of the amount of the total data that were examined for agreement.
  • Daily diary, daily self-report, and EMA codes were given when authors explicitly described these procedures in the text by name. Coders did not infer the use of these measurement strategies.
  • The number of baseline observations was either taken directly from the figures provided in the text or counted in graphical displays of the data when this was determined to be a reliable approach. When the number of baseline data points could not be reliably determined from the graphical display, the “unavailable” code was assigned; the same code was assigned when the number of observations was unreported or ambiguous, or when only a range was provided and thus no mean could be determined. Finally, the mean number of baseline observations was calculated for each study prior to further descriptive statistical analyses because a number of studies reported only means.
  • The coding of the analytic method used in the reviewed studies is discussed later in the section titled Discussion of Review Results and Coding of Analytic Methods .
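The agreement statistics named in the coding criteria above (percent agreement, kappa) can be computed directly. A minimal sketch, assuming hypothetical binary ratings from two observers (not data from the review):

```python
# Illustrative sketch: percent agreement and Cohen's kappa for two observers.
# The ratings below are hypothetical, not data from the reviewed studies.

def percent_agreement(a, b):
    """Proportion of observations on which the two observers agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Chance-corrected agreement for categorical ratings."""
    n = len(a)
    po = percent_agreement(a, b)                  # observed agreement
    cats = set(a) | set(b)
    pe = sum((a.count(c) / n) * (b.count(c) / n)  # agreement expected by chance
             for c in cats)
    return (po - pe) / (1 - pe)

obs1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
obs2 = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]
print(percent_agreement(obs1, obs2))  # 0.8
print(cohens_kappa(obs1, obs2))
```

Cohen's kappa corrects the observed agreement for the agreement expected by chance from each observer's marginal rating rates, which is why it is generally preferred over raw percent agreement.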

Results of the Systematic Review

Descriptive statistics of the design, measurement, and analysis characteristics of the reviewed studies are presented in Table 2 . The results and their implications are discussed in the relevant sections throughout the remainder of the article.

Descriptive Statistics of Reviewed SCED Characteristics

Research design | n | Subjects M (SD) | Subjects range | Observer ratings % | IRR % | Diary/EMA % | Baseline obs. M (SD) | Baseline obs. range | Visual % | Statistical % | Visual & statistical % | Not reported %
Alternating condition | 26 | 4.77 (3.34) | 1–17 | 84.6 | 95.5 | 3.8 | 8.44 (9.50) | 2–39 | 23.1 | 7.7 | 19.2 | 46.2
Changing/shifting criterion | 18 | 1.94 (1.06) | 1–4 | 77.8 | 85.7 | 0.0 | 5.29 (2.93) | 2–10 | 27.8 | – | – | –
Multiple baseline/combined series | 283 | 7.29 (18.08) | 1–200 | 75.6 | 98.1 | 7.1 | 10.40 (8.84) | 2–89 | 21.6 | 13.4 | 6.4 | 55.8
Reversal | 70 | 6.64 (10.64) | 1–75 | 78.6 | 100.0 | 4.3 | 11.69 (13.78) | 1–72 | 17.1 | 12.9 | 5.7 | 62.9
Simultaneous condition | 2 | 8 | – | 50.0 | 100.0 | 0.0 | 2.00 | – | 50.0 | 50.0 | 0.0 | 0.0
Time-series | 10 | 26.78 (35.43) | 2–114 | 50.0 | 40.0 | 10.0 | 6.21 (2.59) | 3–10 | 0.0 | 70.0 | 30.0 | 0.0
Mixed designs
 Multiple baseline with reversal | 12 | 6.89 (8.24) | 1–32 | 92.9 | 100.0 | 7.1 | 13.01 (9.59) | 3–33 | 14.3 | 21.4 | 0.0 | 64.3
 Multiple baseline with changing criterion | 6 | 3.17 (1.33) | 1–5 | 83.3 | 80.0 | 16.7 | 11.00 (9.61) | 5–30 | – | – | – | –
 Multiple baseline with time-series | 6 | 5.00 (1.79) | 3–8 | 16.7 | 100.0 | 50.0 | 17.30 (15.68) | 4–42 | 0.0 | 66.7 | 16.7 | 16.7
Total of reviewed studies | 409 | 6.63 (14.61) | 1–200 | 76.0 | 97.1 | 6.1 | 10.22 (9.59) | 1–89 | 20.8 | 13.9 | 7.3 | 52.3

Note. % refers to the proportion of reviewed studies that satisfied criteria for this code: For example, the percent of studies reporting observer ratings.

Discussion of the Systematic Review Results in Context

The SCED is a very flexible methodology with many variants. Those mentioned here are the building blocks from which other designs are derived. For readers interested in the nuances of each design, Barlow et al. (2008); Franklin, Allison, and Gorman (1997); Kazdin (2010); and Kratochwill and Levin (1992), among others, provide cogent, in-depth discussions. Identifying the appropriate SCED depends upon many factors, including the specifics of the IV, the setting in which the study will be conducted, participant characteristics, the desired or hypothesized outcomes, and the research question(s). Similarly, the researcher’s selection of measurement and analysis techniques is determined by these factors.

Predominant Single-Case Experimental Designs

Alternating/simultaneous designs (6%; percentages denote the primary design of the studies reviewed)

Alternating and simultaneous designs involve an iterative manipulation of the IV(s) across different phases to show that changes in the DV vary systematically as a function of manipulating the IV(s). In these multielement designs, the researcher has the option to alternate the introduction of two or more IVs or present two or more IVs at the same time. In the alternating variation, the researcher is able to determine the relative impact of two different IVs on the DV, when all other conditions are held constant. Another variation of this design is to alternate IVs across various conditions that could be related to the DV (e.g., class period, interventionist). Similarly, the simultaneous design would occur when the IVs were presented at the same time within the same phase of the study.

Changing criterion design (4%)

Changing criterion designs are used to demonstrate a gradual change in the DV over the course of the phase involving the active manipulation of the IV. Change is demonstrated in a stepwise manner: the criterion shifts as the participant responds to the presence of the manipulated IV. The changing criterion design is particularly useful in applied intervention research for a number of reasons. The IV is continuous and never withdrawn, unlike the strategy used in a reversal design. This is particularly important in situations where removal of a psychological intervention would be detrimental or dangerous to the participant, or would be otherwise infeasible or unethical. The multiple baseline design also does not withdraw the intervention, but it requires replicating the effects of the intervention across participants, settings, or situations. A changing criterion design can be accomplished with one participant in one setting without withholding or withdrawing treatment.

Multiple baseline/combined series design (69%)

The multiple baseline or combined series design can be used to test within-subject change across conditions and often involves multiple participants in a replication context. The multiple baseline design is quite simple in many ways, essentially consisting of a number of repeated, miniature AB experiments or variations thereof. Introduction of the IV is staggered temporally across multiple participants or across multiple within-subject conditions, which allows the researcher to demonstrate that changes in the DV reliably occur only when the IV is introduced, thus controlling for the effects of extraneous factors. Multiple baseline designs can be used both within and across units (i.e., persons or groups of persons). When the baseline phase of each subject begins simultaneously, it is called a concurrent multiple baseline design. In a nonconcurrent variation, baseline periods across subjects begin at different points in time. The multiple baseline design is useful in many settings in which withdrawal of the IV would not be appropriate or when introduction of the IV is hypothesized to result in permanent change that would not reverse when the IV is withdrawn. The major drawback of this design is that the IV must be initially withheld for a period of time to ensure different starting points across the different units in the baseline phase. Depending upon the nature of the research questions, withholding an IV, such as a treatment, could be potentially detrimental to participants.

Reversal designs (17%)

Reversal designs, also known as introduction and withdrawal designs, are denoted as ABAB designs in their simplest form. As the name suggests, the reversal design involves collecting a baseline measure of the DV (the first A phase), introducing the IV (the first B phase), removing the IV while continuing to assess the DV (the second A phase), and then reintroducing the IV (the second B phase). This pattern can be repeated as many times as is necessary to demonstrate an effect or otherwise address the research question. Reversal designs are useful when the manipulation is hypothesized to result in changes in the DV that are expected to reverse or discontinue when the manipulation is not present. Maintenance of an effect is often necessary to uphold the findings of reversal designs. The demonstration of an effect is evident in reversal designs when improvement occurs during the first manipulation phase, compared to the first baseline phase, then reverts to or approaches original baseline levels during the second baseline phase when the manipulation has been withdrawn, and then improves again when the manipulation is reinstated. This pattern of reversal, when the manipulation is introduced and then withdrawn, is essential to attributing changes in the DV to the IV. However, maintenance of effects is not incompatible with a reversal design, even though the DV is hypothesized to reverse when the IV is withdrawn ( Kazdin, 2010 ). Maintenance is demonstrated by repeating introduction–withdrawal segments until improvement in the DV becomes permanent even when the IV is withdrawn. Maintenance need not be demonstrated in all applications, nor is it always possible or desirable, but it is paramount in learning and intervention research contexts.

Mixed designs (10%)

Mixed designs include a combination of more than one SCED (e.g., a reversal design embedded within a multiple baseline) or an SCED embedded within a group design (i.e., a randomized controlled trial comparing two groups of multiple baseline experiments). Mixed designs afford the researcher even greater flexibility in designing a study to address complex psychological hypotheses, but also capitalize on the strengths of the various designs. See Kazdin (2010) for a discussion of the variations and utility of mixed designs.

Related Nonexperimental Designs

Quasi-experimental designs.

In contrast to the designs previously described, all of which constitute “true experiments” ( Kazdin, 2010 ; Shadish et al., 2002 ), in quasi-experimental designs the conditions of a true experiment (e.g., active manipulation of the IV, replication of the effect) are approximated and are not readily under the control of the researcher. Because the focus of this article is on experimental designs, quasi-experiments are not discussed in detail; instead the reader is referred to Kazdin (2010) and Shadish et al. (2002) .

Ecological and naturalistic single-case designs

For a single-case design to be experimental, there must be active manipulation of the IV, but in some applications, such as those that might be used in social and personality psychology, the researcher might be interested in measuring naturally occurring phenomena and examining their temporal relationships. Thus, the researcher will not use a manipulation. An example of this type of research might be a study about the temporal relationship between alcohol consumption and depressed mood, which can be measured reliably using EMA methods. Psychotherapy process researchers also use this type of design to assess dyadic relationship dynamics between therapists and clients (e.g., Tschacher & Ramseyer, 2009 ).

Research Design Standards

Each of the reviewed standards provides some degree of direction regarding acceptable research designs. The WWC provides the most detailed and specific requirements regarding design characteristics. The guidelines presented in Tables 3, 4, and 5 are consistent with the methodological rigor necessary to meet the WWC distinction “meets standards.” The WWC also provides less-stringent standards for a “meets standards with reservations” distinction. When minimum criteria in the design, measurement, or analysis sections of a study are not met, it is rated “does not meet standards” ( Kratochwill et al., 2010 ). Many SCEDs are acceptable within the standards of DIV12, DIV16, NRP, and the Tate et al. SCED scale. DIV12 specifies that replication occur across a minimum of three successive cases, which differs from the WWC specifications, which allow for three replications within a single-subject design that need not be across multiple subjects. DIV16 does not require, but seems to prefer, a multiple baseline design with a between-subject replication. Tate et al. (2008, p. 400) state that the “design allows for the examination of cause and effect relationships to demonstrate efficacy.” Determining whether or not a design meets this requirement is left up to the evaluator, who might then refer to one of the other standards or another source for direction.

Research Design Standards and Guidelines

Criterion | What Works Clearinghouse | APA Division 12 Task Force on Psychological Interventions | APA Division 16 Task Force on Evidence-Based Interventions in School Psychology | National Reading Panel | The Single-Case Experimental Design Scale (Tate et al., 2008) | Ecological Momentary Assessment (Stone & Shiffman, 2002)
1. Experimental manipulation (independent variable; IV) | The independent variable (i.e., the intervention) must be systematically manipulated as determined by the researcher | Need a well-defined and replicable intervention for a specific disorder, problem behavior, or condition | Specified intervention according to the classification system | Specified intervention | Scale was designed to assess the quality of interventions; thus, an intervention is required | Manipulation in EMA is concerned with the sampling procedure of the study (see Measurement and Assessment table for more information)
2. Research designs
 General guidelines | At least 3 attempts to demonstrate an effect at 3 different points in time or with 3 different phase repetitions | Many research designs are acceptable beyond those mentioned | The stage of the intervention program must be specified (see ) | The design allows for the examination of cause and effect to demonstrate efficacy | EMA is almost entirely concerned with measurement of variables of interest; thus, the design of the study is determined solely by the research question(s)
 Reversal (e.g., ABAB) | Minimum of 4 A and B phases | (Mentioned as acceptable. See Analysis table for specific guidelines) | Mentioned as acceptable | N/A | Mentioned as acceptable | N/A
 Multiple baseline/combined series | At least 3 baseline conditions | At least 3 different, successive subjects | Both within and between subjects; considered the strongest because replication occurs across individuals | Single-subject or aggregated subjects | Mentioned as acceptable | N/A
 Alternating treatment | At least 3 alternating treatments compared with a baseline condition or two alternating treatments compared with each other | N/A | Mentioned as acceptable | N/A | Mentioned as acceptable | N/A
 Simultaneous treatment | Same as for alternating treatment designs | N/A | Mentioned as acceptable | N/A | Mentioned as acceptable | N/A
 Changing/shifting criterion | At least 3 different criteria | N/A | N/A | N/A | N/A | N/A
 Mixed designs | N/A | N/A | Mentioned as acceptable | N/A | N/A | N/A
 Quasi-experimental | N/A | N/A | N/A | Mentioned as acceptable | N/A
3. Baseline (see also Measurement and Assessment Standards) | Minimum of 3 data points | Minimum of 3 data points | Minimum of 3 data points, although more observations are preferred | No minimum specified | No minimum (“sufficient sampling of behavior occurred pretreatment”) | N/A
4. Randomization specifications provided | N/A | N/A | Yes | Yes | N/A | N/A

Measurement and Assessment Standards and Guidelines

Criterion | What Works Clearinghouse | APA Division 12 Task Force on Psychological Interventions | APA Division 16 Task Force on Evidence-Based Interventions in School Psychology | National Reading Panel | The Single-Case Experimental Design Scale (Tate et al., 2008) | Ecological Momentary Assessment (Stone & Shiffman, 2002)
1. Dependent variable (DV)
 Selection of DV | N/A | ≥ 3 clinically important behaviors that are relatively independent | Outcome measures that produce reliable scores (validity of measure reported) | Standardized or investigator-constructed outcome measures (report reliability) | Measure behaviors that are the target of the intervention | Determined by research question(s)
 Assessor(s)/reporter(s) | More than one (self-report not acceptable) | N/A | Multisource (not always applicable) | N/A | Independent (implied minimum of 2) | Determined by research question(s)
 Interrater reliability | On at least 20% of the data in each phase and in each condition; must meet minimal established thresholds | N/A | N/A | N/A | Interrater reliability is reported | N/A
 Method(s) of measurement/assessment | N/A | N/A | Multimethod (e.g., at least 2 assessment methods to evaluate primary outcomes; not always applicable) | Quantitative or qualitative measure | N/A | Description of prompting, recording, participant-initiated entries, data acquisition interface (e.g., diary)
 Interval of assessment | Must be measured repeatedly over time (no minimum specified) within and across different conditions and levels of the IV | N/A | N/A | List time points when dependent measures were assessed | Sampling of the targeted behavior (i.e., DV) occurs during the treatment period | Density and schedule are reported and consistent with addressing research question(s); define “immediate and timely response”
 Other guidelines | Raw data record provided (represent the variability of the target behavior)
2. Baseline measurement (see also Research Design Standards) | Minimum of 3 data points across multiple phases of a reversal or multiple baseline design; 5 data points in each phase for highest rating; 1 or 2 data points can be sufficient in alternating treatment designs | Minimum of 3 data points (to establish a linear trend) | No minimum specified | No minimum (“sufficient sampling of behavior [i.e., DV] occurred pretreatment”) | N/A
3. Compliance and missing data guidelines | N/A | N/A | N/A | N/A | N/A | Rationale for compliance decisions, rates reported, missing data criteria and actions

Analysis Standards and Guidelines

Criterion | What Works Clearinghouse | APA Division 12 Task Force on Psychological Interventions | APA Division 16 Task Force on Evidence-Based Interventions in School Psychology | National Reading Panel | The Single-Case Experimental Design Scale (Tate et al., 2008) | Ecological Momentary Assessment (Stone & Shiffman, 2002)
1. Visual analysis | 4-step, 6-variable procedure (based on ) | Acceptable (no specific guidelines or procedures offered) | N/A | Not acceptable (“use statistical analyses or describe effect sizes,” p. 389) | N/A
2. Statistical analysis procedures | Estimating effect sizes: nonparametric and parametric approaches, multilevel modeling, and regression (recommended) | Preferred when the number of data points warrants statistical procedures (no specific guidelines or procedures offered) | Rely on the guidelines presented by Wilkinson and the Task Force on Statistical Inference of the APA Board of Scientific Affairs (1999) | Type not specified – report value of the effect size, type of summary statistic, and number of people providing the effect size information | Specific statistical methods are not specified; only their presence or absence is of interest in completing the scale
3. Demonstrating an effect | ABAB: stable baseline established during the first A period, data must show improvement during the first B period, reversal or leveling of improvement during the second A period, and resumed improvement in the second B period (no other guidelines offered) | N/A | N/A | N/A
4. Replication | N/A | Replication occurs across subjects, therapists, or settings | N/A

The Stone and Shiffman (2002) standards for EMA are concerned almost entirely with the reporting of measurement characteristics and less so with research design. One way in which these standards differ from those of other sources is in the active manipulation of the IV. Many research questions in EMA, daily diary, and time-series designs are concerned with naturally occurring phenomena, and a researcher manipulation would run counter to this aim. The EMA standards become important when selecting an appropriate measurement strategy within the SCED. In EMA applications, as is also true in some other time-series and daily diary designs, researcher manipulation occurs as a function of the sampling interval in which DVs of interest are measured according to fixed time schedules (e.g., reporting occurs at the end of each day), random time schedules (e.g., the data collection device prompts the participant to respond at random intervals throughout the day), or on an event-based schedule (e.g., reporting occurs after a specified event takes place).
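The three sampling schedules just described (fixed, random, and event-based) can be sketched as follows. The 14-hour reporting window and prompt counts are illustrative assumptions, not part of the Stone and Shiffman standards:

```python
import random

# Illustrative sketch of the three EMA sampling schedules described above.
# Times are minutes after the start of an assumed 14-hour reporting window.

WINDOW = 14 * 60  # reporting window length in minutes (assumption)

def fixed_schedule(n_prompts):
    """Fixed time schedule: evenly spaced prompts (e.g., end of each block)."""
    step = WINDOW // n_prompts
    return [step * (k + 1) for k in range(n_prompts)]

def random_schedule(n_prompts, seed=0):
    """Random time schedule: the device prompts at random moments in the window."""
    rng = random.Random(seed)
    return sorted(rng.randrange(WINDOW) for _ in range(n_prompts))

def event_based_schedule(event_times):
    """Event-based schedule: reporting follows each specified event."""
    return sorted(event_times)

print(fixed_schedule(4))  # [210, 420, 630, 840]
```

In practice, the chosen schedule would be pushed to the participant's electronic data collection device, which can also timestamp responses so that compliance can be assessed, as discussed later in the diary/EMA section.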

Measurement

The basic measurement requirement of the SCED is a repeated assessment of the DV across each phase of the design in order to draw valid inferences regarding the effect of the IV on the DV. In other applications, such as those used by personality and social psychology researchers to study various human phenomena ( Bolger et al., 2003 ; Reis & Gable, 2000 ), sampling strategies vary widely depending on the topic area under investigation. Regardless of the research area, SCEDs are most typically concerned with within-person change and processes and involve a time-based strategy, most commonly to assess global daily averages or peak daily levels of the DV. Many sampling strategies, such as time-series, in which reporting occurs at uniform intervals or on event-based, fixed, or variable schedules, are also appropriate measurement methods and are common in psychological research (see Bolger et al., 2003 ).

Repeated-measurement methods permit the natural, even spontaneous, reporting of information ( Reis, 1994 ), which reduces the biases of retrospection by minimizing the amount of time elapsed between an experience and the account of this experience ( Bolger et al., 2003 ). Shiffman et al. (2008) aptly noted that the majority of research in the field of psychology relies heavily on retrospective assessment measures, even though retrospective reports have been found to be susceptible to state-congruent recall (e.g., Bower, 1981 ) and a tendency to report peak levels of the experience instead of giving credence to temporal fluctuations ( Redelmeier & Kahneman, 1996 ; Stone, Broderick, Kaell, Deles-Paul, & Porter, 2000 ). Furthermore, Shiffman et al. (1997) demonstrated that subjective aggregate accounts were a poor fit to daily reported experiences, which can be attributed to reductions in measurement error resulting in increased validity and reliability of the daily reports.

The necessity of measuring at least one DV repeatedly means that the selected assessment method, instrument, and/or construct must be sensitive to change over time and capable of reliably and validly capturing change. Horner et al. (2005) discuss the important features of outcome measures selected for use in these types of designs. Kazdin (2010) suggests that measures be dimensional, because dimensional measures can detect effects more readily than categorical or binary measures. Although using an established measure or scale, such as the Outcome Questionnaire System ( M. J. Lambert, Hansen, & Harmon, 2010 ), provides empirically validated items for assessing various outcomes, most validation studies of this type of instrument involve between-subject designs, which is no guarantee that these measures are reliable and valid for assessing within-person variability. Borsboom, Mellenbergh, and van Heerden (2003) suggest that researchers adapting validated measures should consider whether the items they propose using have a factor structure within subjects similar to that obtained between subjects. This is one of the reasons that SCEDs often use observational assessments from multiple sources and report the interrater reliability of the measure. Self-report measures are acceptable practice in some circles, but generally additional assessment methods or informants are necessary to uphold the highest methodological standards. The results of this review indicate that the majority of studies included observational measurement (76.0%). Within those studies, nearly all (97.1%) reported interrater reliability procedures and results. The results within each design were similar, with the exception of time-series designs, which used observer ratings in only half of the reviewed studies.

Time-series

Time-series designs are defined by repeated measurement of variables of interest over a period of time ( Box & Jenkins, 1970 ). Time-series measurement most often occurs in uniform intervals; however, this is no longer a constraint of time-series designs (see Harvey, 2001 ). Although uniform-interval reporting is not necessary in SCED research, repeated measures often occur at uniform intervals, such as once each day or each week, which constitutes a time-series design. The time-series design has been used in various basic science applications ( Scollon, Kim-Pietro, & Diener, 2003 ) across nearly all subspecialties in psychology (e.g., Bolger et al., 2003 ; Piasecki et al., 2007 ; for a review, see Reis & Gable, 2000 ; Soliday et al., 2002 ). The basic time-series formula for a two-phase (AB) data stream is presented in Equation 1. In this formula, α represents the step function of the data stream, equal to 0 at times i = 1, 2, 3…n1 and 1 at times i = n1+1, n1+2, n1+3…n; S represents the change in level between the first and second phases; n1 is the number of observations in the baseline phase; n is the total number of data points in the data stream; i indexes time; and εi = ρεi−1 + ei, which expresses the autoregressive relationship (ρ) among the errors in the data stream.
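Written out, the two-phase model described above takes the following form; this is a sketch assembled from the in-text definitions, with β₀ (the baseline level) as assumed notation alongside the S, α, ρ, and e defined in the text:

```latex
% Two-phase (AB) time-series model, assembled from the in-text definitions.
% \beta_0 (baseline level) is assumed notation; S, \alpha, \rho, e are as defined in the text.
y_i = \beta_0 + S\,\alpha_i + \varepsilon_i,
\qquad
\alpha_i =
\begin{cases}
  0, & i = 1, 2, \dots, n_1 \\
  1, & i = n_1 + 1, \dots, n
\end{cases}
\qquad
\varepsilon_i = \rho\,\varepsilon_{i-1} + e_i
```

Setting ρ = 0 reduces this to an ordinary two-phase regression, and the estimate of S quantifies the shift in level at the phase boundary.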

Time-series formulas become increasingly complex when seasonality and autoregressive processes are modeled in the analytic procedures, but these are rarely of concern for short time-series data streams in SCEDs. For a detailed description of other time-series design and analysis issues, see Borckardt et al. (2008) , Box and Jenkins (1970) , Crosbie (1993) , R. R. Jones et al. (1977) , and Velicer and Fava (2003) .

Time-series and other repeated-measures methodologies also enable examination of temporal effects. Borckardt et al. (2008) and others have noted that time-series designs have the potential to reveal how change occurs, not simply if it occurs. This distinction is what most interested Skinner (1938) , but it often falls below the purview of today’s researchers in favor of group designs, which Skinner felt obscured the process of change. In intervention and psychopathology research, time-series designs can assess mediators of change ( Doss & Atkins, 2006 ), treatment processes ( Stout, 2007 ; Tschacher & Ramseyer, 2009 ), and the relationship between psychological symptoms (e.g., Alloy, Just, & Panzarella, 1997 ; Hanson & Chen, 2010 ; Oslin, Cary, Slaymaker, Colleran, & Blow, 2009 ), and might be capable of revealing mechanisms of change ( Kazdin, 2007 , 2009 , 2010 ). Between- and within-subject SCED designs with repeated measurements enable researchers to examine similarities and differences in the course of change, both during and as a result of manipulating an IV. Temporal effects have been largely overlooked in many areas of psychological science ( Bolger et al., 2003 ): Examining temporal relationships is sorely needed to further our understanding of the etiology and amplification of numerous psychological phenomena.

Time-series studies were very infrequently found in this literature search (2%). Time-series studies traditionally occur in subfields of psychology in which single-case research is not often used (e.g., personality, physiological/biological). Recent advances in methods for collecting and analyzing time-series data (e.g., Borckardt et al., 2008 ) could expand the use of time-series methodology in the SCED community. One problem with drawing firm conclusions from this particular review finding is a semantic factor: Time-series is a specific term reserved for measurement occurring at a uniform interval. However, SCED research appears to not yet have adopted this language when referring to data collected in this fashion. When time-series data analytic methods are not used, the matter of measurement interval is of less importance and might not need to be specified or described as a time-series. An interesting extension of this work would be to examine SCED research that used time-series measurement strategies but did not label it as such. This is important because then it could be determined how many SCEDs could be analyzed with time-series statistical methods.

Daily diary and ecological momentary assessment methods

EMA and daily diary approaches represent methodological procedures for collecting repeated measurements in time-series and non-time-series experiments, which are also known as experience sampling. Presenting an in-depth discussion of the nuances of these sampling techniques is well beyond the scope of this paper. The reader is referred to the following review articles: daily diary ( Bolger et al., 2003 ; Reis & Gable, 2000 ; Thiele, Laireiter, & Baumann, 2002 ), and EMA ( Shiffman et al., 2008 ). Experience sampling in psychology has burgeoned in the past two decades as technological advances have permitted more precise and immediate reporting by participants (e.g., Internet-based, two-way pagers, cellular telephones, handheld computers) than do paper and pencil methods (for reviews see Barrett & Barrett, 2001 ; Shiffman & Stone, 1998 ). Both methods have practical limitations and advantages. For example, electronic methods are more costly and may exclude certain subjects from participating in the study, either because they do not have access to the necessary technology or they do not have the familiarity or savvy to successfully complete reporting. Electronic data collection methods enable the researcher to prompt responses at random or predetermined intervals and also accurately assess compliance. Paper and pencil methods have been criticized for their inability to reliably track respondents’ compliance: Palermo, Valenzuela, and Stork (2004) found better compliance with electronic diaries than with paper and pencil. On the other hand, Green, Rafaeli, Bolger, Shrout, & Reis (2006) demonstrated the psychometric data structure equivalence between these two methods, suggesting that the data collected in either method will yield similar statistical results given comparable compliance rates.

Daily diary/daily self-report and EMA measurement were somewhat rarely represented in this review, occurring in only 6.1% of the total studies. EMA methods had been used in only one of the reviewed studies. The recent proliferation of EMA and daily diary studies in psychology reported by others ( Bolger et al., 2003 ; Piasecki et al., 2007 ; Shiffman et al., 2008 ) suggests that these methods have not yet reached SCED researchers, which could in part have resulted from the long-held supremacy of observational measurement in fields that commonly practice single-case research.

Measurement Standards

As was previously mentioned, measurement in SCEDs requires the reliable assessment of change over time. As illustrated in Table 4 , DIV16 and the NRP explicitly require that reliability of all measures be reported. DIV12 provides little direction in the selection of the measurement instrument, except to require that three or more clinically important behaviors with relative independence be assessed. Similarly, the only item concerned with measurement on the Tate et al. scale specifies assessing behaviors consistent with the target of the intervention. The WWC and the Tate et al. scale require at least two independent assessors of the DV and that interrater reliability meeting minimum established thresholds be reported. Furthermore, WWC requires that interrater reliability be assessed on at least 20% of the data in each phase and in each condition. DIV16 expects that assessment of the outcome measures will be multisource and multimethod, when applicable. The interval of measurement is not specified by any of the reviewed sources. The WWC and the Tate et al. scale require that DVs be measured repeatedly across phases (e.g., baseline and treatment), which is a typical requirement of a SCED. The NRP asks that the time points at which DV measurement occurred be reported.

The baseline measurement represents one of the most crucial design elements of the SCED. Because subjects provide their own data for comparison, gathering a representative, stable sampling of behavior before manipulating the IV is essential to accurately inferring an effect. Some researchers have reported the typical length of the baseline period to range from 3 to 12 observations in intervention research applications (e.g., Center et al., 1986 ; Huitema, 1985 ; R. R. Jones et al., 1977 ; Sharpley, 1987 ); Huitema’s (1985) review of 881 experiments published in the Journal of Applied Behavior Analysis resulted in a modal number of three to four baseline points. Center et al. (1986) suggested five as the minimum number of baseline measurements needed to accurately estimate autocorrelation. Longer baseline periods suggest a greater likelihood of a representative measurement of the DVs, which has been found to increase the validity of the effects and reduce bias resulting from autocorrelation ( Huitema & McKean, 1994 ). The results of this review are largely consistent with those of previous researchers: The mean number of baseline observations was found to be 10.22 ( SD = 9.59), and 6 was the modal number of observations. Baseline data were available in 77.8% of the reviewed studies. Although the baseline assessment has tremendous bearing on the results of a SCED study, it was often difficult to locate the exact number of data points. Similarly, the number of data points assessed across all phases of the study was not easily identified.
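The autocorrelation that Center et al. warn about is typically estimated at lag 1. A minimal sketch of the lag-1 estimator, using hypothetical baseline values (not data from the review):

```python
def lag1_autocorrelation(series):
    """Lag-1 autocorrelation: correlation of the series with itself
    shifted by one observation. Estimates from very short baselines
    (fewer than about 5 points) tend to be unstable."""
    n = len(series)
    mean = sum(series) / n
    # Sum of cross-products of adjacent deviations from the mean
    num = sum((series[i] - mean) * (series[i - 1] - mean) for i in range(1, n))
    # Total sum of squared deviations
    den = sum((x - mean) ** 2 for x in series)
    return num / den

baseline = [4, 5, 4, 6, 5, 7]  # hypothetical baseline observations
print(lag1_autocorrelation(baseline))
```

With fewer than about five observations, the numerator contains too few cross-products for a stable estimate, which is consistent with the minimum suggested by Center et al. (1986).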

The WWC, DIV12, and DIV16 agree that a minimum of three data points during the baseline is necessary. However, to receive the highest rating from the WWC, five data points are necessary in each phase, including the baseline and any subsequent withdrawal baselines, as would occur in a reversal design. DIV16 explicitly states that more than three points are preferred and further stipulates that the baseline must demonstrate stability (i.e., limited variability), absence of overlap between the baseline and other phases, absence of trend, and a level of measurement severe enough to warrant intervention; each of these aspects of the data is important to inferential accuracy. Detrending techniques can be used to address trend in the baseline data. The integration option in ARIMA-based modeling and the empirical mode decomposition method (Wu, Huang, Long, & Peng, 2007) are two sophisticated detrending techniques. In regression-based analytic methods, detrending can be accomplished by simply regressing each variable in the model on time (i.e., the residuals become the detrended series), which is analogous to adding a linear, exponential, or quadratic term to the regression equation.
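
The regression-based detrending described above can be sketched in a few lines. The following example is illustrative only: the function name and the baseline series are hypothetical, and the sketch covers only the simple linear case (the residuals after regressing the series on time become the detrended series).

```python
# Minimal sketch of linear detrending by regressing a series on time;
# the residuals form the detrended series. Data are hypothetical.

def detrend_linear(y):
    """Regress y on time (0, 1, 2, ...) and return the residuals."""
    n = len(y)
    t = list(range(n))
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    # Closed-form OLS slope and intercept for a single predictor (time).
    slope = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y)) / \
            sum((ti - t_bar) ** 2 for ti in t)
    intercept = y_bar - slope * t_bar
    return [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]

# A hypothetical baseline series with an upward trend:
baseline = [2.0, 2.9, 4.1, 5.0, 6.1, 6.9]
residuals = detrend_linear(baseline)
```

After detrending, the residuals fluctuate around zero with the linear trend removed; an exponential or quadratic trend would require the corresponding term in the regression, as the text notes.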

The NRP does not provide a minimum number of data points, nor does the Tate et al. scale, which requires only a sufficient sampling of baseline behavior. Although the mean and modal numbers of baseline observations are well within these parameters, seven (1.7%) of the reviewed studies reported mean baselines of fewer than three data points.

Establishing a uniform minimum number of required baseline observations would give researchers and reviewers only a starting guide. The baseline phase is important in SCED research because it establishes a trend that can then be compared with those of subsequent phases. Although a minimum number of observations might be required to meet standards, many more might be necessary to establish a trend when the data are variable or already trending in the direction of the expected effect. The selected data analytic approach also has some bearing on the number of necessary baseline observations. This is discussed further in the Analysis section.

Reporting of repeated measurements

Stone and Shiffman (2002) provide a comprehensive set of guidelines for the reporting of EMA data, which can also be applied to other repeated-measurement strategies. Because the application of EMA is widespread and not confined to specific research designs, Stone and Shiffman intentionally place few restraints on researchers regarding selection of the DV and the reporter, which is determined by the research question under investigation. The methods of measurement, however, are specified in detail: Descriptions of prompting, recording of responses, participant-initiated entries, and the data acquisition interface (e.g., paper-and-pencil diary, PDA, cellular telephone) ought to be provided in sufficient detail for replication. Because EMA specifically, and time-series and daily diary methods more generally, are primarily concerned with the interval of assessment, Stone and Shiffman suggest reporting the density and schedule of assessment. The approach is generally determined by the nature of the research question and by pragmatic considerations, such as access to electronic data collection devices at certain times of the day and participant burden. Compliance and missing data are concerns in any longitudinal research design, but they are of particular importance in repeated-measurement applications with frequent measurement. When the research question pertains to temporal effects, compliance becomes paramount and timely, immediate responding is necessary. For this reason, compliance decisions, rates of missing data, and missing data management techniques must be reported. The effect of missing data in time-series data streams has been the topic of recent research in the social sciences (e.g., Smith, Borckardt, & Nash, in press; Velicer & Colby, 2005a, 2005b). The results and implications of these and other missing data studies are discussed in the next section.

Analysis of SCED Data

Visual analysis

Experts in the field generally agree about the majority of critical single-case design and measurement characteristics. Analysis, on the other hand, is an area of significant disagreement, although it has also received extensive recent attention and advancement. Debate regarding the appropriateness and accuracy of various methods for analyzing SCED data, the interpretation of single-case effect sizes, and other concerns vital to the validity of SCED results has been ongoing for decades, and no clear consensus has been reached. Visual analysis, following systematic procedures such as those provided by Franklin, Gorman, Beasley, and Allison (1997) and Parsonson and Baer (1978), remains the standard by which SCED data are most commonly analyzed (Parker, Cryer, & Byrns, 2006). Visual analysis can arguably be applied to all SCEDs. However, a number of baseline data characteristics must be met for effects obtained through visual analysis to be valid and reliable: The baseline phase must be relatively stable, be free of significant trend (particularly in the hypothesized direction of the effect), show minimal overlap of data with subsequent phases, and include a sufficient sampling of behavior to be considered representative (Franklin, Gorman, et al., 1997; Parsonson & Baer, 1978). The effect of baseline trend on visual analysis, and a technique for controlling it, are offered by Parker et al. (2006). Kazdin (2010) suggests using statistical analysis when a trend or significant variability appears in the baseline phase, two conditions that ought to preclude the use of visual analysis techniques. Visual analysis methods are especially adept at determining intervention effects and can be of particular relevance in real-world applications (e.g., Borckardt et al., 2008; Kratochwill, Levin, Horner, & Swoboda, 2011).

However, visual analysis has its detractors. It has been shown to be inconsistent, can be affected by autocorrelation, and tends to overestimate effects (e.g., Matyas & Greenwood, 1990). Relying on visual analysis to estimate an effect also precludes the results of SCED research from being included in meta-analyses and makes it very difficult to compare results with the effect sizes generated by other statistical methods. Yet visual analysis persists, in large part because SCED researchers are familiar with these methods, are generally unfamiliar with statistical approaches, and lack agreement about the appropriateness of the statistical alternatives. Still, top experts in single-case analysis champion the use of statistical methods alongside visual analysis whenever it is appropriate to do so (Kratochwill et al., 2011).

Statistical analysis

Statistical analysis of SCED data generally attempts to address one or more of three broad research questions: (1) Does introduction or manipulation of the IV result in a statistically significant change in the level of the DV (level-change or phase-effect analysis)? (2) Does introduction or manipulation of the IV result in a statistically significant change in the slope of the DV over time (slope-change analysis)? (3) Do meaningful relationships exist between the trajectory of the DV and other potential covariates? Level- and slope-change analyses are relevant to intervention effectiveness studies and other research questions in which the IV is expected to change the DV in a particular direction. Visual analysis methods are most adept at addressing research questions pertaining to changes in level and slope (Questions 1 and 2), most often using some form of graphical representation and standardized computation of a mean level or trend line within and between each phase of interest (e.g., Horner & Spaulding, 2010; Kratochwill et al., 2011; Matyas & Greenwood, 1990). Research questions in other areas of psychological science might address the relationship between DVs or the slopes of DVs (Question 3). A number of sophisticated modeling approaches (e.g., cross-lag, multilevel, panel, growth mixture, latent class analysis) may be used for this type of question, and some are discussed in greater detail later in this section. However, a discussion of the nuances of this type of analysis and all of its possible methods is well beyond the scope of this article.

The statistical analysis of SCEDs is a contentious issue in the field. Not only is there no agreed-upon statistical method, but the practice of statistical analysis in the context of the SCED is viewed by some as unnecessary (see Shadish, Rindskopf, & Hedges, 2008). Trends in the prevalence of statistical analysis among SCED researchers are revealing: Busk and Marascuilo (1992) found that only 10% of the published single-case studies they reviewed used statistical analysis; Brossart, Parker, Olson, and Mahadevan (2006) estimated that this figure had roughly doubled by 2006. A range of concerns regarding single-case effect size calculation and interpretation is discussed in significant detail elsewhere (e.g., Campbell, 2004; Cohen, 1994; Ferron & Sentovich, 2002; Ferron & Ware, 1995; Kirk, 1996; Manolov & Solanas, 2008; Olive & Smith, 2005; Parker & Brossart, 2003; Robey et al., 1999; Smith et al., in press; Velicer & Fava, 2003). One concern is the lack of a clearly superior method across datasets. Although statistical methods for analyzing SCEDs abound, few studies have examined their comparative performance on the same dataset. The most recent studies of this kind, performed by Brossart et al. (2006), Campbell (2004), Parker and Brossart (2003), and Parker and Vannest (2009), found that the more promising available statistical methods yielded moderately different results on the same data series, which led them to conclude that each available method is equipped to adequately address only a relatively narrow spectrum of data. Given these findings, analysts need to select a model appropriate to the research questions and data structure, while remaining mindful of how modeling results can be influenced by extraneous factors.

The current standards unfortunately provide little guidance in the way of statistical analysis options. This article presents an admittedly cursory introduction to the available statistical methods; many others are not covered in this review. The following sources provide more in-depth discussion and description of other methods: Barlow et al. (2008); Franklin et al. (1997); Kazdin (2010); and Kratochwill and Levin (1992, 2010). Shadish et al. (2008) summarize more recently developed methods. Similarly, a special issue of Evidence-Based Communication Assessment and Intervention (2008, Volume 2) provides articles and discussion of the more promising statistical methods for SCED analysis. An introduction to autocorrelation and its implications for statistical analysis is necessary before specific analytic methods can be discussed. It is also pertinent at this point to discuss the implications of missing data.

Autocorrelation

Many repeated measurements within a single subject or unit create a situation that most psychological researchers are unaccustomed to dealing with: autocorrelated data, which is the nonindependence of sequential observations, also known as serial dependence. Basic and advanced discussions of autocorrelation in single-subject data can be found in Borckardt et al. (2008), Huitema (1985), and Marshall (1980), and discussions of autocorrelation in multilevel models can be found in Snijders and Bosker (1999) and Diggle and Liang (2001). Along with trend and seasonal variation, autocorrelation is one example of the internal structure of repeated measurements. In the social sciences, autocorrelated data occur most naturally in the fields of physiological psychology, econometrics, and finance, where each phase of interest has potentially hundreds or even thousands of observations that are tightly packed across time (e.g., electroencephalography data, actuarial data, financial market indices). Applied SCED research in most areas of psychology is more likely to have measurement intervals of a day, week, or hour.

Autocorrelation is a direct result of the repeated-measurement requirements of the SCED, but its effect is most noticeable and problematic when one is attempting to analyze these data. Many commonly used data analytic approaches, such as analysis of variance, assume independence of observations and can produce spurious results when the data are nonindependent. Even statistically insignificant autocorrelation estimates are generally viewed as sufficient to cause inferential bias when conventional statistics are used (e.g., Busk & Marascuilo, 1988 ; R. R. Jones et al., 1977 ; Matyas & Greenwood, 1990 ). The effect of autocorrelation on statistical inference in single-case applications has also been known for quite some time (e.g., R. R. Jones et al., 1977 ; Kanfer, 1970 ; Kazdin, 1981 ; Marshall, 1980 ). The findings of recent simulation studies of single-subject data streams indicate that autocorrelation is a nontrivial matter. For example, Manolov and Solanas (2008) determined that calculated effect sizes were linearly related to the autocorrelation of the data stream, and Smith et al. (in press) demonstrated that autocorrelation estimates in the vicinity of 0.80 negatively affect the ability to correctly infer a significant level-change effect using a standardized mean differences method. Huitema and colleagues (e.g., Huitema, 1985 ; Huitema & McKean, 1994 ) argued that autocorrelation is rarely a concern in applied research. Huitema’s methods and conclusions have been questioned and opposing data have been published (e.g., Allison & Gorman, 1993 ; Matyas & Greenwood, 1990 ; Robey et al., 1999 ), resulting in abandonment of the position that autocorrelation can be conscionably ignored without compromising the validity of the statistical procedures. 
Procedures for removing autocorrelation in the data stream prior to calculating effect sizes are offered as one option: One of the more promising analysis methods, autoregressive integrated moving averages (discussed later in this article), was specifically designed to remove the internal structure of time-series data, such as autocorrelation, trend, and seasonality ( Box & Jenkins, 1970 ; Tiao & Box, 1981 ).
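
To make the notion of serial dependence concrete, the lag-1 autocorrelation discussed above can be computed directly as the correlation between each observation and its immediate predecessor. This is a minimal sketch with hypothetical data; the function name is illustrative and not drawn from any of the cited sources.

```python
# Minimal sketch of estimating lag-1 autocorrelation (serial dependence
# between each observation and the one before it). Data are hypothetical.

def lag1_autocorrelation(y):
    n = len(y)
    y_bar = sum(y) / n
    # Sum of cross-products of adjacent deviations over the sum of squares.
    num = sum((y[i] - y_bar) * (y[i - 1] - y_bar) for i in range(1, n))
    den = sum((yi - y_bar) ** 2 for yi in y)
    return num / den

smooth = [1, 2, 3, 4, 5, 4, 3, 2, 1, 2, 3, 4, 5, 4, 3, 2]  # slowly varying
alternating = [1, 5, 1, 5, 1, 5, 1, 5]                      # flips each step

r_smooth = lag1_autocorrelation(smooth)   # positive: neighbors are alike
r_alt = lag1_autocorrelation(alternating) # negative: neighbors alternate
```

Slowly varying series yield positive autocorrelation and alternating series yield negative autocorrelation; it is estimates of large positive magnitude (e.g., near 0.80, as in the simulation work cited above) that most threaten conventional inference.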

Missing observations

Another concern inherent in repeated-measures designs is missing data. Daily diary and EMA methods are intended to reduce the risk of retrospection error by eliciting accurate, real-time information ( Bolger et al., 2003 ). However, these methods are subject to missing data as a result of honest forgetfulness, not possessing the diary collection tool at the specified time of collection, and intentional or systematic noncompliance. With paper and pencil diaries and some electronic methods, subjects might be able to complete missed entries retrospectively, defeating the temporal benefits of these assessment strategies ( Bolger et al., 2003 ). Methods of managing noncompliance through the study design and measurement methods include training the subject to use the data collection device appropriately, using technology to prompt responding and track the time of response, and providing incentives to participants for timely compliance (for additional discussion of this topic, see Bolger et al., 2003 ; Shiffman & Stone, 1998 ).

Even when efforts are made to maximize compliance during the conduct of the research, the problem of missing data is often unavoidable. Numerous approaches exist for handling missing observations in group multivariate designs (e.g., Horton & Kleinman, 2007; Ibrahim, Chen, Lipsitz, & Herring, 2005). Ragunathan (2004) and others concluded that full-information and raw data maximum likelihood methods are preferable. Velicer and Colby (2005a, 2005b) established the superiority of maximum likelihood methods over listwise deletion, mean of adjacent observations, and series mean substitution in the estimation of various critical time-series data parameters. Smith et al. (in press) extended these findings regarding the effect of missing data on inferential precision. They found that managing missing data with the EM procedure (Dempster, Laird, & Rubin, 1977), a maximum likelihood algorithm, did not affect one's ability to correctly infer a significant effect. However, lag-1 autocorrelation estimates in the vicinity of 0.80 resulted in insufficient power sensitivity (< 0.80), regardless of the proportion of missing data (10%, 20%, 30%, or 40%). 1 Although maximum likelihood methods have garnered some empirical support, methodological strategies that minimize missing data, particularly systematically missing data, remain preferable to post hoc statistical remedies.
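
For illustration, the simplest of the imputation strategies compared above, mean of adjacent observations, can be sketched as follows. The function name and series are hypothetical, and, as the studies reviewed here found, this approach is inferior to maximum likelihood methods; it is shown only to make the mechanics concrete.

```python
# Illustrative sketch of mean-of-adjacent-observations imputation, one of
# the simple methods the reviewed studies found inferior to maximum
# likelihood approaches. Missing values are represented by None.

def impute_adjacent_mean(y):
    out = list(y)
    for i, v in enumerate(out):
        if v is None:
            # Nearest observed (or already filled) values on each side.
            prev = next((out[j] for j in range(i - 1, -1, -1)
                         if out[j] is not None), None)
            nxt = next((out[j] for j in range(i + 1, len(out))
                        if out[j] is not None), None)
            neighbors = [x for x in (prev, nxt) if x is not None]
            out[i] = sum(neighbors) / len(neighbors)
    return out

series = [4.0, None, 6.0, 5.0, None, 3.0]  # hypothetical diary data
completed = impute_adjacent_mean(series)
```

Each gap is replaced by the average of its nearest observed neighbors, which smooths the series and can distort autocorrelation estimates; this is one reason the maximum likelihood alternatives fare better in the cited comparisons.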

Nonnormal distribution of data

In addition to the autocorrelated nature of SCED data, typical measurement methods also present analytic challenges. Many statistical methods, particularly those involving model finding, assume that the data are normally distributed. This assumption is often not satisfied in SCED research, where measurements frequently involve count data, observer-rated behaviors, and other similar metrics that result in skewed distributions. Techniques are available to manage nonnormal distributions in regression-based analysis, such as zero-inflated Poisson regression (D. Lambert, 1992) and negative binomial regression (Gardner, Mulvey, & Shaw, 1995), but many other statistical analysis methods do not include these sophisticated techniques. A skewed data distribution is perhaps one of the reasons Kazdin (2010) suggests not using count, categorical, or ordinal measurement methods.

Available statistical analysis methods

Following is a basic introduction to the more promising and prevalent analytic methods for SCED research. Because there is little consensus regarding the superiority of any single method, the burden unfortunately falls on the researcher to select a method capable of addressing the research question and handling the data involved in the study. Some indications and contraindications are provided for each method presented here.

Multilevel and structural equation modeling

Multilevel modeling (MLM; e.g., Schmidt, Perels, & Schmitz, 2010) techniques represent the state of the art among parametric approaches to SCED analysis, particularly when synthesizing SCED results (Shadish et al., 2008). MLM and the related latent growth curve and factor mixture methods in structural equation modeling (SEM; e.g., Lubke & Muthén, 2005; B. O. Muthén & Curran, 1997) are particularly effective for evaluating trajectories and slopes in longitudinal data and for relating changes to potential covariates. MLM and related hierarchical linear models (HLM) can also illuminate the relationship between the trajectories of different variables under investigation and clarify whether these relationships differ among the subjects in the study. Time-series and cross-lag analyses can also be conducted in MLM and SEM frameworks (Chow, Ho, Hamaker, & Dolan, 2010; du Toit & Browne, 2007). However, these approaches generally require sophisticated model-fitting techniques, making them difficult for many social scientists to implement, and the structure (autocorrelation) and trend of the data can complicate many MLM methods. The short data streams and small numbers of subjects common in SCED research also present problems for MLM and SEM approaches, which were developed for datasets with many more observations per subject and, for model-fitting purposes, many more participants. Still, MLM and related techniques arguably represent the most promising analytic methods.

A number of software options 2 exist for SEM. Popular statistical packages in the social sciences provide SEM options, such as PROC CALIS in SAS (SAS Institute Inc., 2008), the AMOS module (Arbuckle, 2006) of SPSS (SPSS Statistics, 2011), and the sem package for R (R Development Core Team, 2005), the use of which is described by Fox (2006). A number of stand-alone software options are also available for SEM applications, including Mplus (L. K. Muthén & Muthén, 2010) and Stata (StataCorp., 2011). Each of these programs also provides options for estimating multilevel/hierarchical models (for a review of using these programs for MLM analysis, see Albright & Marinova, 2010). Hierarchical linear and nonlinear modeling can also be accomplished using the HLM 7 program (Raudenbush, Bryk, & Congdon, 2011).

Autoregressive moving averages (ARMA; e.g., Browne & Nesselroade, 2005 ; Liu & Hudack, 1995 ; Tiao & Box, 1981 )

Two primary concerns have been raised regarding ARMA modeling: the length of the data stream and the feasibility of the modeling technique. ARMA models generally require 30–50 observations in each phase when a single-subject experiment is analyzed (e.g., Borckardt et al., 2008; Box & Jenkins, 1970), a requirement that is often difficult to satisfy in applied psychological research. However, ARMA models in an SEM framework, such as those described by du Toit and Browne (2001), are well suited for longitudinal panel data with few observations and many subjects. Autoregressive SEM models are also applicable under similar conditions. Model-fitting options are available in SPSS, R, and SAS (via PROC ARIMA).

ARMA modeling also requires considerable training in the method and rather advanced knowledge of statistical methods (e.g., Kratochwill & Levin, 1992). However, Brossart et al. (2006) point out that ARMA-based approaches can produce excellent results when there is no "model finding" and a simple lag-1 model, with no differencing and no moving average, is used. This approach can be taken in many SCED applications when phase- or slope-change analyses are of interest for a single, or very few, subjects. As already mentioned, this method is particularly useful when one seeks to account for autocorrelation or other over-time variation that is not directly related to the experimental or intervention effect of interest (i.e., detrending). ARMA and other time-series analysis methods require missing data to be managed prior to analysis by means of options such as full-information maximum likelihood estimation, multiple imputation, or the Kalman filter (see Box & Jenkins, 1970; Hamilton, 1994; Shumway & Stoffer, 1982), because listwise deletion has been shown to result in inaccurate time-series parameter estimates (Velicer & Colby, 2005a).
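
The simple lag-1 model with no differencing and no moving-average term, as described above, can be sketched with a Yule-Walker-style estimate. This is a minimal illustration on simulated data, not an implementation from any cited source; the variable names and the simulated series are hypothetical.

```python
# Sketch of the simple lag-1 autoregressive approach described above
# (no model finding, no differencing, no moving-average term): estimate
# the AR(1) coefficient and remove the serial dependence via residuals.
# Data are simulated, not drawn from any study discussed here.
import random

random.seed(7)

# Simulate a hypothetical AR(1) series: y[t] = phi * y[t-1] + noise.
phi_true = 0.6
y = [0.0]
for _ in range(499):
    y.append(phi_true * y[-1] + random.gauss(0, 1))

# Yule-Walker lag-1 estimate of phi (equals the lag-1 autocorrelation).
mean_y = sum(y) / len(y)
num = sum((y[i] - mean_y) * (y[i - 1] - mean_y) for i in range(1, len(y)))
den = sum((v - mean_y) ** 2 for v in y)
phi_hat = num / den

# "Prewhitened" residuals with the lag-1 dependence removed.
residuals = [y[i] - phi_hat * y[i - 1] for i in range(1, len(y))]
```

With a long simulated series the estimate lands near the generating value; note that the 30–50 observations per phase mentioned above are needed precisely because such estimates are unstable in short data streams.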

Standardized mean differences

Standardized mean differences approaches include the common Cohen's d, Glass's delta, and Hedges' g used in the analysis of group designs. The computational properties of mean differences approaches for SCEDs are identical to those used for group comparisons, except that the results represent within-case variation instead of variation between groups, which suggests that the obtained effect sizes are not interpretively equivalent. The advantages of the mean differences approach are its computational simplicity and its familiarity to social scientists. The primary drawback of these approaches is that they were not developed to contend with autocorrelated data; however, Manolov and Solanas (2008) reported that autocorrelation least affected effect sizes calculated using standardized mean differences approaches. To the applied researcher this likely represents the most accessible analytic approach, because statistical software is not required to calculate these effect sizes. The resultant effect sizes of single-subject standardized mean differences analyses must nevertheless be interpreted cautiously, because their relation to standard effect size benchmarks, such as those provided by Cohen (1988), is unknown. Standardized mean differences approaches are appropriate only for examining differences between phases of the study and cannot illuminate trajectories or relationships between variables.
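
A within-case standardized mean difference of the kind described above can be computed by hand, which is part of its appeal to applied researchers. The following sketch uses a Cohen's d-style pooled-SD formula on hypothetical phase data; the function name is illustrative, and, per the caveat above, the resulting value should not be interpreted against group-design benchmarks.

```python
# Sketch of a within-case standardized mean difference: the difference
# between phase means divided by the pooled within-phase SD. Phase data
# are hypothetical, and the autocorrelation caveat in the text applies.
import math

def phase_d(baseline, treatment):
    """(treatment mean - baseline mean) / pooled within-phase SD."""
    def var(xs):  # unbiased sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    n1, n2 = len(baseline), len(treatment)
    pooled_sd = math.sqrt(((n1 - 1) * var(baseline) +
                           (n2 - 1) * var(treatment)) / (n1 + n2 - 2))
    return (sum(treatment) / n2 - sum(baseline) / n1) / pooled_sd

baseline = [3.0, 4.0, 3.5, 4.5, 4.0]   # hypothetical baseline phase
treatment = [6.0, 7.0, 6.5, 7.5, 7.0]  # hypothetical treatment phase
d = phase_d(baseline, treatment)
```

Because the denominator reflects within-case rather than between-group variation, values far exceeding conventional group-design effect sizes are common, underscoring why these estimates are not interpretively equivalent.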

Other analytic approaches

Researchers have offered other analytic methods to deal with the characteristics of SCED data, and a number of methods for analyzing N-of-1 experiments have been developed. Borckardt's Simulation Modeling Analysis (SMA; 2006) program provides a method for analyzing level and slope change in short (<30 observations per phase; see Borckardt et al., 2008), autocorrelated data streams that is statistically sophisticated yet accessible and freely available to typical psychological scientists and clinicians. A replicated single-case time-series design conducted by Smith, Handler, and Nash (2010) provides an example of SMA application. The Singwin package, described in Bloom et al. (2003), is another easy-to-use parametric approach for analyzing single-case experiments. A number of nonparametric approaches that emerged from the visual analysis tradition have also been developed: Examples include percent nonoverlapping data (Scruggs, Mastropieri, & Casto, 1987) and nonoverlap of all pairs (Parker & Vannest, 2009); however, these methods have come under scrutiny, and Wolery, Busick, Reichow, and Barton (2010) have suggested abandoning them altogether. Each of these methods appears to be well suited for managing specific data characteristics, but they should not be used to analyze data streams beyond their intended purpose until additional empirical research is conducted.
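
Percent nonoverlapping data, the first of the nonoverlap indices named above, is simple enough to sketch in a few lines. The example below assumes the intervention is expected to increase the DV and uses hypothetical data; the function name is illustrative, and the scrutiny these indices have received (noted above) applies.

```python
# Sketch of percent nonoverlapping data (PND), one of the nonparametric
# nonoverlap indices mentioned above (and one whose use has been
# questioned). Assumes the intervention should increase the DV;
# data are hypothetical.

def pnd_increase(baseline, treatment):
    """Percentage of treatment points exceeding the highest baseline point."""
    ceiling = max(baseline)
    exceeding = sum(1 for x in treatment if x > ceiling)
    return 100.0 * exceeding / len(treatment)

baseline = [2, 3, 4, 3, 2]   # hypothetical baseline observations
treatment = [5, 6, 4, 7, 6]  # hypothetical treatment observations
pnd = pnd_increase(baseline, treatment)
```

Because the index depends entirely on the single most extreme baseline point, one outlier can drive it to zero, which is one basis for the criticisms cited above.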

Combining SCED Results

Beyond the issue of single-case analysis lies the matter of integrating and meta-analyzing the results of single-case experiments. SCEDs have been given short shrift in the majority of the meta-analytic literature (Littell, Corcoran, & Pillai, 2008; Shadish et al., 2008), with only a few exceptions (Carr et al., 1999; Horner & Spaulding, 2010). Currently, few proven methods exist for integrating the results of multiple single-case experiments. Allison and Gorman (1993) and Shadish et al. (2008) present the problems associated with meta-analyzing single-case effect sizes, and W. P. Jones (2003), Manolov and Solanas (2008), Scruggs and Mastropieri (1998), and Shadish et al. (2008) offer four different potential statistical solutions, none of which appears to have achieved consensus among researchers. The ability to synthesize single-case effect sizes, and to compare them with effect sizes garnered through group design research, is undoubtedly necessary to increase SCED proliferation.

Discussion of Review Results and Coding of Analytic Methods

The coding criteria for this review were quite stringent in terms of what was considered to be either visual or statistical analysis. For visual analysis to be coded as present, the authors had to self-identify as having used a visual analysis method. In many cases it could be inferred that visual analysis had been used, but it was often not specified. Similarly, the statistical analysis code was reserved for analytic methods that produced an effect size. 3 Analyses that involved comparing magnitude of change using raw count data or percentages were not considered rigorous enough. These two narrow definitions of visual and statistical analysis contributed to the high rate of unreported analytic methods shown in Table 1 (52.3%). A better representation of the use of visual and statistical analysis is likely the percentage of studies among those that reported a method of analysis: Under these parameters, 41.5% used visual analysis and 31.3% used statistical analysis, with 11% using both. These findings are slightly higher than the estimate of Brossart et al. (2006), who suggested that statistical analysis is used in about 20% of SCED studies. Visual analysis undoubtedly continues to be the most prevalent method, but there appears to be a trend toward increased use of statistical approaches, which is likely only to gain momentum as innovations continue.

Analysis Standards

The standards selected for inclusion in this review offer minimal direction in the way of analyzing the results of SCED research. Table 5 summarizes the analysis-related information provided by the six reviewed sources of SCED standards. Visual analysis is acceptable to DIV12 and DIV16, along with unspecified statistical approaches. In the WWC standards, visual analysis is the acceptable method of determining an intervention effect, with statistical analyses and randomization tests permissible as complementary or supporting methods to the results of visual analysis. However, the authors of the WWC standards state, "As the field reaches greater consensus about appropriate statistical analyses and quantitative effect-size measures, new standards for effect demonstration will need to be developed" (Kratochwill et al., 2010, p. 16). The NRP and DIV12 seem to prefer statistical methods when they are warranted. The Tate et al. scale accepts only statistical analysis with the reporting of an effect size. Only the WWC and DIV16 provide guidance in the use of statistical analysis procedures: The WWC "recommends" nonparametric and parametric approaches, multilevel modeling, and regression when statistical analysis is used, whereas DIV16 refers the reader to Wilkinson and the Task Force on Statistical Inference of the APA Board of Scientific Affairs (1999) for direction in this matter. Statistical analysis of daily diary and EMA methods is similarly unsettled. Stone and Shiffman (2002) ask for a detailed description of the statistical procedures used, so that the approach can be replicated and evaluated, and they provide direction for analyzing aggregated and disaggregated data. They also aptly note that because many different modes of analysis exist, researchers must carefully match the analytic approach to the hypotheses being pursued.

Limitations and Future Directions

This review has a number of limitations that leave the door open for future study of SCED methodology. Publication bias is a concern in any systematic review. This is particularly true for this review because the search was limited to articles published in peer-reviewed journals. This strategy was chosen in order to inform changes in the practice of reporting and of reviewing, but it also is likely to have inflated the findings regarding the methodological rigor of the reviewed works. Inclusion of book chapters, unpublished studies, and dissertations would likely have yielded somewhat different results.

A second concern involves the stringent coding criteria for analytic methods and the broad categorization into visual and statistical approaches. The selection of an appropriate method for analyzing SCED data is perhaps the murkiest area of this type of research. Future reviews that evaluate the appropriateness of selected analytic strategies and provide specific decision-making guidelines for researchers would be a very useful contribution to the literature. Although six sources of standards applicable to SCED research were reviewed in this article, five of them were developed almost exclusively to inform psychological and behavioral intervention research. The principles of SCED research remain the same in different contexts, but there is a need for non-intervention scientists to weigh in on these standards.

Finally, this article provides a first step in the synthesis of the available SCED reporting guidelines; it does not resolve disagreements, nor does it purport to be a definitive source. In the future, an entity with the authority to construct such a document ought to convene and establish a foundational, adaptable, and agreed-upon set of guidelines that cuts across subspecialties and is applicable to many, if not all, areas of psychological research, although this is perhaps an idealistic goal. Certain preferences will undoubtedly continue to dictate what constitutes acceptable practice in each subspecialty of psychology, but uniformity along critical dimensions will help advance SCED research.

Conclusions

The first decade of the twenty-first century has seen an upwelling of SCED research across nearly all areas of psychology. This article contributes updated benchmarks in terms of the frequency with which SCED design and methodology characteristics are used, including the number of baseline observations, assessment and measurement practices, and data analytic approaches, most of which are largely consistent with previously reported benchmarks. However, this review is much broader than those of previous research teams and also breaks down the characteristics of single-case research by the predominant design. With the recent SCED proliferation came a number of standards for the conduct and reporting of such research. This article also provides a much-needed synthesis of recent SCED standards that can inform the work of researchers, reviewers, and funding agencies conducting and evaluating single-case research, which reveals many areas of consensus as well as areas of significant disagreement. It appears that the question of where to go next is very relevant at this point in time. The majority of the research design and measurement characteristics of the SCED are reasonably well established, and the results of this review suggest general practice that is in accord with existing standards and guidelines, at least in regard to published peer-reviewed works. In general, the published literature appears to be meeting the basic design and measurement requirement to ensure adequate internal validity of SCED studies.

Consensus regarding the superiority of any one analytic method stands out as an area of divergence. Judging by the current literature and lack of consensus, researchers will need to carefully select a method that matches the research design, hypotheses, and intended conclusions of the study, while also considering the most up-to-date empirical support for the chosen analytic method, whether it be visual or statistical. In some cases the number of observations and subjects in the study will dictate which analytic methods can and cannot be used. In the case of the true N-of-1 experiment, there are relatively few sound analytic methods, and even fewer that are robust with shorter data streams (see Borckardt et al., 2008). As the number of observations and subjects increases, sophisticated modeling techniques, such as MLM, SEM, and ARMA, become applicable. Trends in the data and autocorrelation further obfuscate the development of a clear statistical analysis selection algorithm, which currently does not exist. Autocorrelation was rarely addressed or discussed in the articles reviewed, except when the selected statistical analysis dictated consideration. Given the empirical evidence regarding the effect of autocorrelation on visual and statistical analysis, researchers need to address this more explicitly. Missing-data considerations are similarly left out when they are unnecessary for analytic purposes. As newly devised statistical analysis approaches mature and are compared with one another for appropriateness in specific SCED applications, guidelines for statistical analysis will necessarily be revised. Similarly, empirically derived guidance, in the form of a decision tree, must be developed to ensure application of appropriate methods based on characteristics of the data and the research questions being addressed. Researchers could also benefit from tutorials and comparative reviews of different software packages: This is a needed area of future research. Powerful and reliable statistical analyses help move the SCED up the ladder of experimental designs and attenuate the view that the method applies primarily to pilot studies and idiosyncratic research questions and situations.
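To make the lag-1 autocorrelation issue concrete, the following minimal sketch (with hypothetical data, not drawn from any study reviewed here) estimates r1 for a short data stream and shows how an unmodeled linear trend inflates the estimate:

```python
def lag1_autocorrelation(xs):
    """Estimate the lag-1 autocorrelation r1 of a data stream:
    r1 = sum((x_t - mean)(x_{t+1} - mean)) / sum((x_t - mean)^2)."""
    n = len(xs)
    m = sum(xs) / n
    num = sum((xs[t] - m) * (xs[t + 1] - m) for t in range(n - 1))
    den = sum((x - m) ** 2 for x in xs)
    return num / den

# Hypothetical 10-point baseline: the same noise, flat vs. riding on a trend.
flat = [5, 6, 5, 7, 6, 5, 6, 7, 5, 6]
trended = [x + 0.8 * t for t, x in enumerate(flat)]

print(round(lag1_autocorrelation(flat), 2))     # → -0.4 (noise-like)
print(round(lag1_autocorrelation(trended), 2))  # → 0.57 (inflated by the trend)
```

This is why a high lag-1 estimate alone cannot distinguish serial dependence from trend, and why both should be reported and addressed before an analytic method is chosen.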

Another potential future advancement of SCED research comes in the area of measurement. Currently, SCED research gives significant weight to observer ratings and seems to discourage other forms of data collection. This is likely due to the origins of the SCED in behavioral assessment and applied behavior analysis, which remains a present-day stronghold. The dearth of EMA and diary-like sampling procedures within the SCED research reviewed, contrasted with their ever-growing prevalence in the larger psychological research arena, highlights an area for potential expansion. Observational measurement, although reliable and valid in many contexts, is time and resource intensive and not feasible in all areas in which psychologists conduct research. It seems that numerous untapped research questions are stifled because of this measurement constraint. SCED researchers developing updated standards in the future should include guidelines for the appropriate measurement requirements of non-observer-reported data. For example, the results of this review indicate that reporting of repeated measurements, particularly the high-density type found in diary and EMA sampling strategies, ought to be more clearly spelled out, with specific attention paid to autocorrelation and trend in the data streams. In the event that SCED researchers adopt self-reported assessment strategies as viable alternatives to observation, a set of standards explicitly identifying the necessary psychometric properties of the measures and specific items used would be in order.

Along similar lines, SCED researchers could take a page from other areas of psychology that champion multimethod and multisource evaluation of primary outcomes. In this way, the long-standing tradition of observational assessment and the cutting-edge technological methods of EMA and daily diary could be married with the goal of strengthening conclusions drawn from SCED research and enhancing the validity of self-reported outcome assessment. The results of this review indicate that they rarely intersect today, and I urge SCED researchers to adopt other methods of assessment informed by time-series, daily diary, and EMA methods. The EMA standards could serve as a jumping-off point for refined measurement and assessment reporting standards in the context of multimethod SCED research.

One limitation of the current SCED standards is their relatively limited scope. To clarify, with the exception of the Stone & Shiffman EMA reporting guidelines, the other five sources of standards were developed in the context of designing and evaluating intervention research. Although intervention research is likely to remain their primary emphasis, SCEDs are capable of addressing other pertinent research questions in the psychological sciences, and the current standards only roughly approximate salient crosscutting SCED characteristics. I propose developing broad SCED guidelines that address the specific design, measurement, and analysis issues in a manner that makes them useful across applications, as opposed to focusing solely on intervention effects. To accomplish this task, methodology experts across subspecialties in psychology would need to convene. Admittedly this is no small task.

Perhaps funding agencies will also recognize the fiscal and practical advantages of SCED research in certain areas of psychology. One example is in the field of intervention effectiveness, efficacy, and implementation research. A few exemplary studies using robust forms of SCED methodology are needed in the literature. Case-based methodologies will never supplant the group design as the gold standard in experimental applications, nor should that be the goal. Instead, SCEDs provide a viable and valid alternative experimental methodology that could stimulate new areas of research and answer questions that group designs cannot. With the astonishing number of studies emerging every year that use single-case designs and explore the methodological aspects of the design, we are poised to witness and be a part of an upsurge in the sophisticated application of the SCED. When federal grant-awarding agencies and journal editors begin to use formal standards while making funding and publication decisions, the field will benefit.

Last, for the practice of SCED research to continue and mature, graduate training programs must provide students with instruction in all areas of the SCED. This is particularly true of statistical analysis techniques that are not often taught in departments of psychology and education, where the vast majority of SCED studies seem to be conducted. It is quite the conundrum that the best available statistical analytic methods are often cited as being inaccessible to social science researchers who conduct this type of research. This need not be the case. To move the field forward, emerging scientists must be able to apply the most state-of-the-art research designs, measurement techniques, and analytic methods.

Acknowledgments

Research support for the author was provided by research training grant MH20012 from the National Institute of Mental Health, awarded to Elizabeth A. Stormshak. The author gratefully acknowledges Robert Horner and Laura Lee McIntyre, University of Oregon; Michael Nash, University of Tennessee; John Ferron, University of South Florida; the Action Editor, Lisa Harlow, and the anonymous reviewers for their thoughtful suggestions and guidance in shaping this article; Cheryl Mikkola for her editorial support; and Victoria Mollison for her assistance in the systematic review process.

Appendix. Results of Systematic Review Search and Studies Included in the Review

PsycINFO search conducted July 2011.

  • Alternating treatment design
  • Changing criterion design
  • Experimental case*
  • Multiple baseline design
  • Replicated single-case design
  • Simultaneous treatment design
  • Time-series design
  • Quantitative study OR treatment outcome/randomized clinical trial
  • NOT field study OR interview OR focus group OR literature review OR systematic review OR mathematical model OR qualitative study
  • Publication range: 2000–2010
  • Published in peer-reviewed journals
  • Available in the English language

Bibliography

(* indicates inclusion in study: N = 409)

1 Autocorrelation estimates in this range can be caused by trends in the data streams, which create complications in terms of detecting level-change effects. The Smith et al. (in press) study used a Monte Carlo simulation to control for trends in the data streams, but trends are likely to exist in real-world data with high lag-1 autocorrelation estimates.

2 The author makes no endorsement regarding the superiority of any statistical program or package over another by their mention or exclusion in this article. The author also has no conflicts of interest in this regard.

3 However, it should be noted that it was often very difficult to locate an actual effect size reported in studies that used statistical analysis. Although this issue would likely have added little to this review, it does inhibit the inclusion of the results in meta-analysis.
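As one concrete illustration of an easily reported SCED effect size, the nonoverlap of all pairs (NAP) index of Parker and Vannest (2009, listed in the bibliography) can be computed directly from raw phase data; the AB-design data below are hypothetical, not taken from any reviewed study:

```python
def nap(baseline, treatment):
    """Nonoverlap of all pairs (NAP): the proportion of all
    baseline-treatment pairs in which the treatment observation
    exceeds the baseline one, counting ties as half an overlap."""
    pairs = [(a, b) for a in baseline for b in treatment]
    score = sum(1.0 if b > a else 0.5 if b == a else 0.0 for a, b in pairs)
    return score / len(pairs)

# Hypothetical AB data stream: 5 baseline and 6 treatment observations.
A = [3, 4, 3, 5, 4]
B = [6, 7, 5, 8, 7, 6]
print(round(nap(A, B), 2))  # → 0.98 (near-complete nonoverlap)
```

Because NAP needs only the raw observations, reporting the data streams themselves (or the phase data in a table) would let reviewers and meta-analysts recover an effect size even when the original authors omit one.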

  • Albright JJ, Marinova DM. Estimating multilevel models using SPSS, Stata, and SAS. Indiana University; 2010. Retrieved from http://www.iub.edu/%7Estatmath/stat/all/hlm/hlm.pdf
  • Allison DB, Gorman BS. Calculating effect sizes for meta-analysis: The case of the single case. Behavior Research and Therapy. 1993;31(6):621–631. doi: 10.1016/0005-7967(93)90115-B.
  • Alloy LB, Just N, Panzarella C. Attributional style, daily life events, and hopelessness depression: Subtype validation by prospective variability and specificity of symptoms. Cognitive Therapy and Research. 1997;21:321–344. doi: 10.1023/A:1021878516875.
  • Arbuckle JL. Amos (Version 7.0). Chicago, IL: SPSS, Inc; 2006.
  • Barlow DH, Nock MK, Hersen M. Single case research designs: Strategies for studying behavior change. 3. New York, NY: Allyn and Bacon; 2008.
  • Barrett LF, Barrett DJ. An introduction to computerized experience sampling in psychology. Social Science Computer Review. 2001;19(2):175–185. doi: 10.1177/089443930101900204.
  • Bloom M, Fisher J, Orme JG. Evaluating practice: Guidelines for the accountable professional. 4. Boston, MA: Allyn & Bacon; 2003.
  • Bolger N, Davis A, Rafaeli E. Diary methods: Capturing life as it is lived. Annual Review of Psychology. 2003;54:579–616. doi: 10.1146/annurev.psych.54.101601.145030.
  • Borckardt JJ. Simulation Modeling Analysis: Time series analysis program for short time series data streams (Version 8.3.3). Charleston, SC: Medical University of South Carolina; 2006.
  • Borckardt JJ, Nash MR, Murphy MD, Moore M, Shaw D, O’Neil P. Clinical practice as natural laboratory for psychotherapy research. American Psychologist. 2008;63:1–19. doi: 10.1037/0003-066X.63.2.77.
  • Borsboom D, Mellenbergh GJ, van Heerden J. The theoretical status of latent variables. Psychological Review. 2003;110(2):203–219. doi: 10.1037/0033-295X.110.2.203.
  • Bower GH. Mood and memory. American Psychologist. 1981;36(2):129–148. doi: 10.1037/0003-066x.36.2.129.
  • Box GEP, Jenkins GM. Time-series analysis: Forecasting and control. San Francisco, CA: Holden-Day; 1970.
  • Brossart DF, Parker RI, Olson EA, Mahadevan L. The relationship between visual analysis and five statistical analyses in a simple AB single-case research design. Behavior Modification. 2006;30(5):531–563. doi: 10.1177/0145445503261167.
  • Browne MW, Nesselroade JR. Representing psychological processes with dynamic factor models: Some promising uses and extensions of autoregressive moving average time series models. In: Maydeu-Olivares A, McArdle JJ, editors. Contemporary psychometrics: A festschrift for Roderick P. McDonald. Mahwah, NJ: Lawrence Erlbaum Associates Publishers; 2005. pp. 415–452.
  • Busk PL, Marascuilo LA. Statistical analysis in single-case research: Issues, procedures, and recommendations, with applications to multiple behaviors. In: Kratochwill TR, Levin JR, editors. Single-case research design and analysis: New directions for psychology and education. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc; 1992. pp. 159–185.
  • Busk PL, Marascuilo LA. Autocorrelation in single-subject research: A counterargument to the myth of no autocorrelation. Behavioral Assessment. 1988;10:229–242.
  • Campbell JM. Statistical comparison of four effect sizes for single-subject designs. Behavior Modification. 2004;28(2):234–246. doi: 10.1177/0145445503259264.
  • Carr EG, Horner RH, Turnbull AP, Marquis JG, Magito McLaughlin D, McAtee ML, Doolabh A. Positive behavior support for people with developmental disabilities: A research synthesis. Washington, DC: American Association on Mental Retardation; 1999.
  • Center BA, Skiba RJ, Casey A. A methodology for the quantitative synthesis of intra-subject design research. The Journal of Special Education. 1986;19:387–400. doi: 10.1177/002246698501900404.
  • Chambless DL, Hollon SD. Defining empirically supported therapies. Journal of Consulting and Clinical Psychology. 1998;66(1):7–18. doi: 10.1037/0022-006X.66.1.7.
  • Chambless DL, Ollendick TH. Empirically supported psychological interventions: Controversies and evidence. Annual Review of Psychology. 2001;52:685–716. doi: 10.1146/annurev.psych.52.1.685.
  • Chow S-M, Ho M-hR, Hamaker EL, Dolan CV. Equivalence and differences between structural equation modeling and state-space modeling techniques. Structural Equation Modeling. 2010;17(2):303–332. doi: 10.1080/10705511003661553.
  • Cohen J. Statistical power analysis for the behavioral sciences. 2. Hillsdale, NJ: Erlbaum; 1988.
  • Cohen J. The earth is round (p < .05). American Psychologist. 1994;49:997–1003. doi: 10.1037/0003-066X.49.12.997.
  • Crosbie J. Interrupted time-series analysis with brief single-subject data. Journal of Consulting and Clinical Psychology. 1993;61(6):966–974. doi: 10.1037/0022-006X.61.6.966.
  • Dattilio FM, Edwards JA, Fishman DB. Case studies within a mixed methods paradigm: Toward a resolution of the alienation between researcher and practitioner in psychotherapy research. Psychotherapy: Theory, Research, Practice, Training. 2010;47(4):427–441. doi: 10.1037/a0021181.
  • Dempster A, Laird N, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B. 1977;39(1):1–38.
  • Des Jarlais DC, Lyles C, Crepaz N. Improving the reporting quality of nonrandomized evaluations of behavioral and public health interventions: The TREND statement. American Journal of Public Health. 2004;94(3):361–366. doi: 10.2105/ajph.94.3.361.
  • Diggle P, Liang KY. Analyses of longitudinal data. New York: Oxford University Press; 2001.
  • Doss BD, Atkins DC. Investigating treatment mediators when simple random assignment to a control group is not possible. Clinical Psychology: Science and Practice. 2006;13(4):321–336. doi: 10.1111/j.1468-2850.2006.00045.x.
  • du Toit SHC, Browne MW. The covariance structure of a vector ARMA time series. In: Cudeck R, du Toit SHC, Sörbom D, editors. Structural equation modeling: Present and future. Lincolnwood, IL: Scientific Software International; 2001. pp. 279–314.
  • du Toit SHC, Browne MW. Structural equation modeling of multivariate time series. Multivariate Behavioral Research. 2007;42:67–101. doi: 10.1080/00273170701340953.
  • Fechner GT. Elemente der Psychophysik [Elements of psychophysics]. Leipzig, Germany: Breitkopf & Härtel; 1889.
  • Ferron J, Sentovich C. Statistical power of randomization tests used with multiple-baseline designs. The Journal of Experimental Education. 2002;70:165–178. doi: 10.1080/00220970209599504.
  • Ferron J, Ware W. Analyzing single-case data: The power of randomization tests. The Journal of Experimental Education. 1995;63:167–178.
  • Fox J. Teacher’s corner: Structural equation modeling with the sem package in R. Structural Equation Modeling: A Multidisciplinary Journal. 2006;13(3):465–486. doi: 10.1207/s15328007sem1303_7.
  • Franklin RD, Allison DB, Gorman BS, editors. Design and analysis of single-case research. Mahwah, NJ: Lawrence Erlbaum Associates; 1997.
  • Franklin RD, Gorman BS, Beasley TM, Allison DB. Graphical display and visual analysis. In: Franklin RD, Allison DB, Gorman BS, editors. Design and analysis of single-case research. Mahwah, NJ: Lawrence Erlbaum Associates, Publishers; 1997. pp. 119–158.
  • Gardner W, Mulvey EP, Shaw EC. Regression analyses of counts and rates: Poisson, overdispersed Poisson, and negative binomial models. Psychological Bulletin. 1995;118(3):392–404. doi: 10.1037/0033-2909.118.3.392.
  • Green AS, Rafaeli E, Bolger N, Shrout PE, Reis HT. Paper or plastic? Data equivalence in paper and electronic diaries. Psychological Methods. 2006;11(1):87–105. doi: 10.1037/1082-989X.11.1.87.
  • Hamilton JD. Time series analysis. Princeton, NJ: Princeton University Press; 1994.
  • Hammond D, Gast DL. Descriptive analysis of single-subject research designs: 1983–2007. Education and Training in Autism and Developmental Disabilities. 2010;45:187–202.
  • Hanson MD, Chen E. Daily stress, cortisol, and sleep: The moderating role of childhood psychosocial environments. Health Psychology. 2010;29(4):394–402. doi: 10.1037/a0019879.
  • Harvey AC. Forecasting, structural time series models and the Kalman filter. Cambridge, MA: Cambridge University Press; 2001.
  • Horner RH, Carr EG, Halle J, McGee G, Odom S, Wolery M. The use of single-subject research to identify evidence-based practice in special education. Exceptional Children. 2005;71:165–179.
  • Horner RH, Spaulding S. Single-case research designs. In: Salkind NJ, editor. Encyclopedia of research design. Thousand Oaks, CA: Sage Publications; 2010.
  • Horton NJ, Kleinman KP. Much ado about nothing: A comparison of missing data methods and software to fit incomplete data regression models. The American Statistician. 2007;61(1):79–90. doi: 10.1198/000313007X172556.
  • Hser Y, Shen H, Chou C, Messer SC, Anglin MD. Analytic approaches for assessing long-term treatment effects. Evaluation Review. 2001;25(2):233–262. doi: 10.1177/0193841X0102500206.
  • Huitema BE. Autocorrelation in applied behavior analysis: A myth. Behavioral Assessment. 1985;7(2):107–118.
  • Huitema BE, McKean JW. Reduced bias autocorrelation estimation: Three jackknife methods. Educational and Psychological Measurement. 1994;54(3):654–665. doi: 10.1177/0013164494054003008.
  • Ibrahim JG, Chen M-H, Lipsitz SR, Herring AH. Missing-data methods for generalized linear models: A comparative review. Journal of the American Statistical Association. 2005;100(469):332–346. doi: 10.1198/016214504000001844.
  • Institute of Medicine. Reducing risks for mental disorders: Frontiers for preventive intervention research. Washington, DC: National Academy Press; 1994.
  • Jacobson NS, Christensen A. Studying the effectiveness of psychotherapy: How well can clinical trials do the job? American Psychologist. 1996;51:1031–1039. doi: 10.1037/0003-066X.51.10.1031.
  • Jones RR, Vaught RS, Weinrott MR. Time-series analysis in operant research. Journal of Applied Behavior Analysis. 1977;10(1):151–166. doi: 10.1901/jaba.1977.10-151.
  • Jones WP. Single-case time series with Bayesian analysis: A practitioner’s guide. Measurement and Evaluation in Counseling and Development. 2003;36:28–39.
  • Kanfer H. Self-monitoring: Methodological limitations and clinical applications. Journal of Consulting and Clinical Psychology. 1970;35(2):148–152. doi: 10.1037/h0029874.
  • Kazdin AE. Drawing valid inferences from case studies. Journal of Consulting and Clinical Psychology. 1981;49(2):183–192. doi: 10.1037/0022-006X.49.2.183.
  • Kazdin AE. Mediators and mechanisms of change in psychotherapy research. Annual Review of Clinical Psychology. 2007;3:1–27. doi: 10.1146/annurev.clinpsy.3.022806.091432.
  • Kazdin AE. Evidence-based treatment and practice: New opportunities to bridge clinical research and practice, enhance the knowledge base, and improve patient care. American Psychologist. 2008;63(3):146–159. doi: 10.1037/0003-066X.63.3.146.
  • Kazdin AE. Understanding how and why psychotherapy leads to change. Psychotherapy Research. 2009;19(4):418–428. doi: 10.1080/10503300802448899.
  • Kazdin AE. Single-case research designs: Methods for clinical and applied settings. 2. New York, NY: Oxford University Press; 2010.
  • Kirk RE. Practical significance: A concept whose time has come. Educational and Psychological Measurement. 1996;56:746–759. doi: 10.1177/0013164496056005002.
  • Kratochwill TR. Preparing psychologists for evidence-based school practice: Lessons learned and challenges ahead. American Psychologist. 2007;62:829–843. doi: 10.1037/0003-066X.62.8.829.
  • Kratochwill TR, Hitchcock J, Horner RH, Levin JR, Odom SL, Rindskopf DM, Shadish WR. Single-case designs technical documentation. 2010. Retrieved from What Works Clearinghouse website: http://ies.ed.gov/ncee/wwc/pdf/wwc_scd.pdf .
  • Kratochwill TR, Levin JR. Single-case research design and analysis: New directions for psychology and education. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc; 1992.
  • Kratochwill TR, Levin JR. Enhancing the scientific credibility of single-case intervention research: Randomization to the rescue. Psychological Methods. 2010;15(2):124–144. doi: 10.1037/a0017736.
  • Kratochwill TR, Levin JR, Horner RH, Swoboda C. Visual analysis of single-case intervention research: Conceptual and methodological considerations (WCER Working Paper No. 2011-6). 2011. Retrieved from University of Wisconsin–Madison, Wisconsin Center for Education Research website: http://www.wcer.wisc.edu/publications/workingPapers/papers.php .
  • Lambert D. Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics. 1992;34(1):1–14.
  • Lambert MJ, Hansen NB, Harmon SC. Outcome Questionnaire System (The OQ System): Development and practical applications in healthcare settings. In: Developing and delivering practice-based evidence. John Wiley & Sons, Ltd; 2010. pp. 139–154.
  • Littell JH, Corcoran J, Pillai VK. Systematic reviews and meta-analysis. New York: Oxford University Press; 2008.
  • Liu LM, Hudack GB. The SCA statistical system: Vector ARMA modeling of multiple time series. Oak Brook, IL: Scientific Computing Associates Corporation; 1995.
  • Lubke GH, Muthén BO. Investigating population heterogeneity with factor mixture models. Psychological Methods. 2005;10(1):21–39. doi: 10.1037/1082-989x.10.1.21.
  • Manolov R, Solanas A. Comparing N = 1 effect sizes in presence of autocorrelation. Behavior Modification. 2008;32(6):860–875. doi: 10.1177/0145445508318866.
  • Marshall RJ. Autocorrelation estimation of time series with randomly missing observations. Biometrika. 1980;67(3):567–570. doi: 10.1093/biomet/67.3.567.
  • Matyas TA, Greenwood KM. Visual analysis of single-case time series: Effects of variability, serial dependence, and magnitude of intervention effects. Journal of Applied Behavior Analysis. 1990;23(3):341–351. doi: 10.1901/jaba.1990.23-341.
  • Kratochwill TR, Chair; Members of the Task Force on Evidence-Based Interventions in School Psychology. Procedural and coding manual for review of evidence-based interventions. 2003. Retrieved July 18, 2011 from http://www.sp-ebi.org/documents/_workingfiles/EBImanual1.pdf .
  • Moher D, Schulz KF, Altman DG; the CONSORT Group. The CONSORT statement: Revised recommendations for improving the quality of reports of parallel-group randomized trials. Journal of the American Medical Association. 2001;285:1987–1991. doi: 10.1001/jama.285.15.1987.
  • Morgan DL, Morgan RK. Single-participant research design: Bringing science to managed care. American Psychologist. 2001;56(2):119–127. doi: 10.1037/0003-066X.56.2.119.
  • Muthén BO, Curran PJ. General longitudinal modeling of individual differences in experimental designs: A latent variable framework for analysis and power estimation. Psychological Methods. 1997;2(4):371–402. doi: 10.1037/1082-989x.2.4.371.
  • Muthén LK, Muthén BO. Mplus (Version 6.11). Los Angeles, CA: Muthén & Muthén; 2010.
  • Nagin DS. Analyzing developmental trajectories: A semiparametric, group-based approach. Psychological Methods. 1999;4(2):139–157. doi: 10.1037/1082-989x.4.2.139.
  • National Institute of Child Health and Human Development. Report of the National Reading Panel. Teaching children to read: An evidence-based assessment of the scientific research literature on reading and its implications for reading instruction (NIH Publication No. 00-4769). Washington, DC: U.S. Government Printing Office; 2000.
  • Olive ML, Smith BW. Effect size calculations and single subject designs. Educational Psychology. 2005;25(2–3):313–324. doi: 10.1080/0144341042000301238.
  • Oslin DW, Cary M, Slaymaker V, Colleran C, Blow FC. Daily ratings measures of alcohol craving during an inpatient stay define subtypes of alcohol addiction that predict subsequent risk for resumption of drinking. Drug and Alcohol Dependence. 2009;103(3):131–136. doi: 10.1016/j.drugalcdep.2009.03.009.
  • Palermo TM, Valenzuela D, Stork PP. A randomized trial of electronic versus paper pain diaries in children: Impact on compliance, accuracy, and acceptability. Pain. 2004;107(3):213–219. doi: 10.1016/j.pain.2003.10.005.
  • Parker RI, Brossart DF. Evaluating single-case research data: A comparison of seven statistical methods. Behavior Therapy. 2003;34(2):189–211. doi: 10.1016/S0005-7894(03)80013-8.
  • Parker RI, Cryer J, Byrns G. Controlling baseline trend in single case research. School Psychology Quarterly. 2006;21(4):418–440. doi: 10.1037/h0084131.
  • Parker RI, Vannest K. An improved effect size for single-case research: Nonoverlap of all pairs. Behavior Therapy. 2009;40(4):357–367. doi: 10.1016/j.beth.2008.10.006.
  • Parsonson BS, Baer DM. The analysis and presentation of graphic data. In: Kratochwill TR, editor. Single subject research. New York, NY: Academic Press; 1978. pp. 101–166.
  • Parsonson BS, Baer DM. The visual analysis of data, and current research into the stimuli controlling it. In: Kratochwill TR, Levin JR, editors. Single-case research design and analysis: New directions for psychology and education. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc; 1992. pp. 15–40.
  • Piasecki TM, Hufford MR, Solham M, Trull TJ. Assessing clients in their natural environments with electronic diaries: Rationale, benefits, limitations, and barriers. Psychological Assessment. 2007;19(1):25–43. doi: 10.1037/1040-3590.19.1.25.
  • R Development Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2005.
  • Raghunathan TE. What do we do with missing data? Some options for analysis of incomplete data. Annual Review of Public Health. 2004;25:99–117. doi: 10.1146/annurev.publhealth.25.102802.124410.
  • Raudenbush SW, Bryk AS, Congdon R. HLM 7: Hierarchical linear and nonlinear modeling. Scientific Software International, Inc; 2011.
  • Redelmeier DA, Kahneman D. Patients’ memories of painful medical treatments: Real-time and retrospective evaluations of two minimally invasive procedures. Pain. 1996;66(1):3–8. doi: 10.1016/0304-3959(96)02994-6.
  • Reis HT. Domains of experience: Investigating relationship processes from three perspectives. In: Erber R, Gilmore R, editors. Theoretical frameworks in personal relationships. Mahwah, NJ: Erlbaum; 1994. pp. 87–110.
  • Reis HT, Gable SL. Event sampling and other methods for studying everyday experience. In: Reis HT, Judd CM, editors. Handbook of research methods in social and personality psychology. New York, NY: Cambridge University Press; 2000. pp. 190–222.
  • Robey RR, Schultz MC, Crawford AB, Sinner CA. Single-subject clinical-outcome research: Designs, data, effect sizes, and analyses. Aphasiology. 1999;13(6):445–473. doi: 10.1080/026870399402028.
  • Rossi PH, Freeman HE. Evaluation: A systematic approach. 5. Thousand Oaks, CA: Sage; 1993.
  • SAS Institute Inc. The SAS system for Windows, Version 9. Cary, NC: SAS Institute Inc; 2008.
  • Schmidt M, Perels F, Schmitz B. How to perform idiographic and a combination of idiographic and nomothetic approaches: A comparison of time series analyses and hierarchical linear modeling. Journal of Psychology. 2010;218(3):166–174. doi: 10.1027/0044-3409/a000026.
  • Scollon CN, Kim-Prieto C, Diener E. Experience sampling: Promises and pitfalls, strengths and weaknesses. Assessing Well-Being. 2003;4:5–35. doi: 10.1007/978-90-481-2354-4_8.
  • Scruggs TE, Mastropieri MA. Summarizing single-subject research: Issues and applications. Behavior Modification. 1998;22(3):221–242. doi: 10.1177/01454455980223001.
  • Scruggs TE, Mastropieri MA, Casto G. The quantitative synthesis of single-subject research. Remedial and Special Education. 1987;8(2):24–33. doi: 10.1177/074193258700800206.
  • Shadish WR, Cook TD, Campbell DT. Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin; 2002.
  • Shadish WR, Rindskopf DM, Hedges LV. The state of the science in the meta-analysis of single-case experimental designs. Evidence-Based Communication Assessment and Intervention. 2008; 3 :188–196. doi: 10.1080/17489530802581603. [ CrossRef ] [ Google Scholar ]
  • Shadish WR, Sullivan KJ. Characteristics of single-case designs used to assess treatment effects in 2008. Behavior Research Methods. 2011; 43 :971–980. doi: 10.3758/s13428-011-0111-y. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Sharpley CF. Time-series analysis of behavioural data: An update. Behaviour Change. 1987; 4 :40–45. [ Google Scholar ]
  • Shiffman S, Hufford M, Hickcox M, Paty JA, Gnys M, Kassel JD. Remember that? A comparison of real-time versus retrospective recall of smoking lapses. Journal of Consulting and Clinical Psychology. 1997; 65 :292–300. doi: 10.1037/0022-006X.65.2.292.a. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Shiffman S, Stone AA. Ecological momentary assessment: A new tool for behavioral medicine research. In: Krantz DS, Baum A, editors. Technology and methods in behavioral medicine. Mahwah, NJ: Erlbaum; 1998. pp. 117–131. [ Google Scholar ]
  • Shiffman S, Stone AA, Hufford MR. Ecological momentary assessment. Annual Review of Clinical Psychology. 2008; 4 :1–32. doi: 10.1146/annurev.clinpsy.3.022806.091415. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Shumway RH, Stoffer DS. An approach to time series smoothing and forecasting using the EM Algorithm. Journal of Time Series Analysis. 1982; 3 (4):253–264. doi: 10.1111/j.1467-9892.1982.tb00349.x. [ CrossRef ] [ Google Scholar ]
  • Skinner BF. The behavior of organisms. New York, NY: Appleton-Century-Crofts; 1938. [ Google Scholar ]
  • Smith JD, Borckardt JJ, Nash MR. Inferential precision in single-case time-series datastreams: How well does the EM Procedure perform when missing observations occur in autocorrelated data? Behavior Therapy. doi: 10.1016/j.beth.2011.10.001. (in press) [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Smith JD, Handler L, Nash MR. Therapeutic Assessment for preadolescent boys with oppositional-defiant disorder: A replicated single-case time-series design. Psychological Assessment. 2010; 22 (3):593–602. doi: 10.1037/a0019697. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Snijders TAB, Bosker RJ. Multilevel analysis: An introduction to basic and advanced multilevel modeling. Thousand Oaks, CA: Sage; 1999. [ Google Scholar ]
  • Soliday E, Moore KJ, Lande MB. Daily reports and pooled time series analysis: Pediatric psychology applications. Journal of Pediatric Psychology. 2002; 27 (1):67–76. doi: 10.1093/jpepsy/27.1.67. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • SPSS Statistics. Chicago, IL: SPSS Inc; 2011. (Version 20.0.0) [ Google Scholar ]
  • StataCorp. Stata Statistical Software: Release 12. College Station, TX: StataCorp LP; 2011. [ Google Scholar ]
  • Stone AA, Broderick JE, Kaell AT, Deles-Paul PAEG, Porter LE. Does the peak-end phenomenon observed in laboratory pain studies apply to real-world pain in rheumatoid arthritics? Journal of Pain. 2000; 1 :212–217. doi: 10.1054/jpai.2000.7568. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Stone AA, Shiffman S. Capturing momentary, self-report data: A proposal for reporting guidelines. Annals of Behavioral Medicine. 2002; 24 :236–243. doi: 10.1207/S15324796ABM2403_09. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Stout RL. Advancing the analysis of treatment process. Addiction. 2007; 102 :1539–1545. doi: 10.1111/j.1360-0443.2007.01880.x. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Tate RL, McDonald S, Perdices M, Togher L, Schultz R, Savage S. Rating the methodological quality of single-subject designs and N-of-1 trials: Introducing the Single-Case Experimental Design (SCED) Scale. Neuropsychological Rehabilitation. 2008; 18 (4):385–401. doi: 10.1080/09602010802009201. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Thiele C, Laireiter A-R, Baumann U. Diaries in clinical psychology and psychotherapy: A selective review. Clinical Psychology & Psychotherapy. 2002; 9 (1):1–37. doi: 10.1002/cpp.302. [ CrossRef ] [ Google Scholar ]
  • Tiao GC, Box GEP. Modeling multiple time series with applications. Journal of the American Statistical Association. 1981; 76 :802–816. [ Google Scholar ]
  • Tschacher W, Ramseyer F. Modeling psychotherapy process by time-series panel analysis (TSPA) Psychotherapy Research. 2009; 19 (4):469–481. doi: 10.1080/10503300802654496. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Velicer WF, Colby SM. A comparison of missing-data procedures for ARIMA time-series analysis. Educational and Psychological Measurement. 2005a; 65 (4):596–615. doi: 10.1177/0013164404272502. [ CrossRef ] [ Google Scholar ]
  • Velicer WF, Colby SM. Missing data and the general transformation approach to time series analysis. In: Maydeu-Olivares A, McArdle JJ, editors. Contemporary psychometrics. A festschrift to Roderick P McDonald. Hillsdale, NJ: Lawrence Erlbaum; 2005b. pp. 509–535. [ Google Scholar ]
  • Velicer WF, Fava JL. Time series analysis. In: Schinka J, Velicer WF, Weiner IB, editors. Research methods in psychology. Vol. 2. New York, NY: John Wiley & Sons; 2003. [ Google Scholar ]
  • Wachtel PL. Beyond “ESTs”: Problematic assumptions in the pursuit of evidence-based practice. Psychoanalytic Psychology. 2010; 27 (3):251–272. doi: 10.1037/a0020532. [ CrossRef ] [ Google Scholar ]
  • Watson JB. Behaviorism. New York, NY: Norton; 1925. [ Google Scholar ]
  • Weisz JR, Hawley KM. Finding, evaluating, refining, and applying empirically supported treatments for children and adolescents. Journal of Clinical Child Psychology. 1998; 27 :206–216. doi: 10.1207/s15374424jccp2702_7. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Weisz JR, Hawley KM. Procedural and coding manual for identification of beneficial treatments. Washinton, DC: American Psychological Association, Society for Clinical Psychology, Division 12, Committee on Science and Practice; 1999. [ Google Scholar ]
  • Westen D, Bradley R. Empirically supported complexity. Current Directions in Psychological Science. 2005; 14 :266–271. doi: 10.1111/j.0963-7214.2005.00378.x. [ CrossRef ] [ Google Scholar ]
  • Westen D, Novotny CM, Thompson-Brenner HK. The empirical status of empirically supported psychotherapies: Assumptions, findings, and reporting controlled clinical trials. Psychological Bulletin. 2004; 130 :631–663. doi: 10.1037/0033-2909.130.4.631. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Wilkinson L The Task Force on Statistical Inference. Statistical methods in psychology journals: Guidelines and explanations. American Psychologist. 1999; 54 :694–704. doi: 10.1037/0003-066X.54.8.594. [ CrossRef ] [ Google Scholar ]
  • Wolery M, Busick M, Reichow B, Barton EE. Comparison of overlap methods for quantitatively synthesizing single-subject data. The Journal of Special Education. 2010; 44 (1):18–28. doi: 10.1177/0022466908328009. [ CrossRef ] [ Google Scholar ]
  • Wu Z, Huang NE, Long SR, Peng C-K. On the trend, detrending, and variability of nonlinear and nonstationary time series. Proceedings of the National Academy of Sciences. 2007; 104 (38):14889–14894. doi: 10.1073/pnas.0701020104. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]

Descriptive Research and Case Studies

Learning Objectives

  • Explain the importance and uses of descriptive research, especially case studies, in studying abnormal behavior

Types of Research Methods

There are many research methods available to psychologists in their efforts to understand, describe, and explain behavior and the cognitive and biological processes that underlie it. Some methods rely on observational techniques. Other approaches involve interactions between the researcher and the individuals who are being studied—ranging from a series of simple questions; to extensive, in-depth interviews; to well-controlled experiments.

The three main categories of psychological research are descriptive, correlational, and experimental research. Research studies that do not test specific relationships between variables are called descriptive, or qualitative, studies. These studies are used to describe general or specific behaviors and attributes that are observed and measured. In the early stages of research, it can be difficult to form a hypothesis, especially when there is no existing literature in the area. In these situations, designing an experiment would be premature, as the question of interest is not yet clearly defined as a hypothesis. Often a researcher will begin with a non-experimental approach, such as a descriptive study, to gather more information about the topic before designing an experiment or correlational study to address a specific hypothesis. Descriptive research is distinct from correlational research , in which psychologists formally test whether a relationship exists between two or more variables. Experimental research goes a step further beyond descriptive and correlational research: it randomly assigns people to different conditions and uses hypothesis testing to make inferences about how those conditions affect behavior, aiming to determine whether one variable directly impacts and causes another. Correlational and experimental research both typically use hypothesis testing, whereas descriptive research does not.

Each of these research methods has unique strengths and weaknesses, and each method may only be appropriate for certain types of research questions. For example, studies that rely primarily on observation produce rich, detailed information, but the ability to apply this information to the larger population is somewhat limited because of small sample sizes. Survey research, on the other hand, allows researchers to easily collect data from relatively large samples. While surveys allow results to be generalized to the larger population more easily, the information that can be collected on any given survey is somewhat limited and subject to the problems associated with any type of self-reported data. Some researchers conduct archival research by using existing records. While existing records can be a fairly inexpensive way to collect data that can provide insight into a number of research questions, researchers using this approach have no control over how or what kind of data were collected.

Correlational research can find a relationship between two variables, but the only way a researcher can claim that the relationship between the variables is cause and effect is to perform an experiment. In experimental research, which will be discussed later, there is a tremendous amount of control over variables of interest. While performing an experiment is a powerful approach, experiments are often conducted in very artificial settings, which calls into question the validity of experimental findings with regard to how they would apply in real-world settings. In addition, many of the questions that psychologists would like to answer cannot be pursued through experimental research because of ethical concerns.

The three main types of descriptive studies are case studies, naturalistic observation, and surveys.

Clinical or Case Studies

Psychologists can develop a detailed description of one person or a small group based on careful observation.  Case studies  are intensive studies of individuals and have commonly been seen as a fruitful way to come up with hypotheses and generate theories. Case studies add descriptive richness. Case studies are also useful for formulating concepts, which are an important aspect of theory construction. Through fine-grained knowledge and description, case studies can fully specify causal mechanisms in a way that may be harder to achieve in a large study.

Sigmund Freud developed many theories from case studies (Anna O., Little Hans, Wolf Man, Dora, etc.). For example, he conducted a case study of a man, nicknamed “Rat Man,” in which he claimed that this patient had been cured by psychoanalysis. The nickname derives from the fact that, among the patient’s many compulsions, he had an obsession with nightmarish fantasies about rats.

Today, more commonly, case studies reflect an up-close, in-depth, and detailed examination of an individual’s course of treatment. Case studies typically include a complete history of the subject’s background and response to treatment. From the particular client’s experience in therapy, the therapist’s goal is to provide information that may help other therapists who treat similar clients.

Case studies are generally a single-case design, but can also be a multiple-case design, where replication instead of sampling is the criterion for inclusion. Like other research methodologies within psychology, the case study must produce valid and reliable results in order to be useful for the development of future research. Distinct advantages and disadvantages are associated with the case study in psychology.

A commonly described limitation of case studies is that they do not lend themselves to generalizability . Another issue is that the case study is subject to the bias of the researcher, both in how the case is written up and in that cases may be chosen because they are consistent with the researcher’s preconceived notions, resulting in biased research. A further common problem in case study research is that of reconciling conflicting interpretations of the same case history.

Despite these limitations, there are advantages to using case studies. One major advantage of the case study in psychology is the potential for the development of novel hypotheses about the causes of abnormal behavior for later testing. Second, the case study can provide detailed descriptions of specific and rare cases and help us study unusual conditions that occur too infrequently to study with large sample sizes. The major disadvantage is that case studies cannot be used to determine causation, as is the case in experimental research, where the factors or variables hypothesized to play a causal role are manipulated or controlled by the researcher.

Link to Learning: Famous Case Studies

Some well-known case studies that related to abnormal psychology include the following:

  • Harlow— Phineas Gage
  • Breuer & Freud (1895)— Anna O.
  • Cleckley’s case studies: on psychopathy ( The Mask of Sanity ) (1941) and multiple personality disorder ( The Three Faces of Eve ) (1957)
  • Freud and  Little Hans
  • Freud and the  Rat Man
  • John Money and the  John/Joan case
  • Genie (feral child)
  • Piaget’s studies
  • Rosenthal’s book on the  murder of Kitty Genovese
  • Washoe (sign language)
  • Patient H.M.

Naturalistic Observation

If you want to understand how behavior occurs, one of the best ways to gain information is to simply observe the behavior in its natural context. However, people might change their behavior in unexpected ways if they know they are being observed. How do researchers obtain accurate information when people tend to hide their natural behavior? As an example, imagine that your professor asks everyone in your class to raise their hand if they always wash their hands after using the restroom. Chances are that almost everyone in the classroom will raise their hand, but do you think hand washing after every trip to the restroom is really that universal?

This is very similar to the phenomenon mentioned earlier in this module: many individuals do not feel comfortable answering a question honestly. But if we are committed to finding out the facts about handwashing, we have other options available to us.

Suppose we send a researcher to a school playground to observe how aggressive or socially anxious children interact with peers. Will our observer blend into the playground environment by wearing a white lab coat, sitting with a clipboard, and staring at the swings? We want our researcher to be inconspicuous and unobtrusively positioned—perhaps pretending to be a school monitor while secretly recording the relevant information. This type of observational study is called naturalistic observation : observing behavior in its natural setting. To better understand peer exclusion, Suzanne Fanger collaborated with colleagues at the University of Texas to observe the behavior of preschool children on a playground. How did the observers remain inconspicuous over the duration of the study? They equipped a few of the children with wireless microphones (which the children quickly forgot about) and observed while taking notes from a distance. Also, the children in that particular preschool (a “laboratory preschool”) were accustomed to having observers on the playground (Fanger, Frankel, & Hazen, 2012).

It is critical that the observer be as unobtrusive and as inconspicuous as possible: when people know they are being watched, they are less likely to behave naturally. For example, psychologists have spent weeks observing the behavior of homeless people on the streets, in train stations, and bus terminals. They try to ensure that their naturalistic observations are unobtrusive, so as to minimize interference with the behavior they observe. Nevertheless, the presence of the observer may distort the behavior that is observed, and this must be taken into consideration (Figure 1).

The greatest benefit of naturalistic observation is the validity, or accuracy, of information collected unobtrusively in a natural setting. Having individuals behave as they normally would in a given situation means that we have a higher degree of ecological validity, or realism, than we might achieve with other research approaches. Therefore, our ability to generalize the findings of the research to real-world situations is enhanced. If done correctly, we need not worry about people modifying their behavior simply because they are being observed. Sometimes, people may assume that reality programs give us a glimpse into authentic human behavior. However, the principle of inconspicuous observation is violated as reality stars are followed by camera crews and are interviewed on camera for personal confessionals. Given that environment, we must doubt how natural and realistic their behaviors are.

The major downside of naturalistic observation is that such studies are often difficult to set up and control. Although something as simple as observation may seem like it would be a part of all research methods, participant observation is a distinct methodology that involves the researcher embedding themselves into a group in order to study its dynamics. For example, Festinger, Riecken, and Schachter (1956) were very interested in the psychology of a particular cult. However, this cult was very secretive and wouldn’t grant interviews to outsiders. So, in order to study these people, Festinger and his colleagues pretended to be cult members, allowing them access to the behavior and psychology of the cult. Despite this example, it should be noted that the people being observed in a participant observation study usually know that the researcher is there to study them. [1]

Another potential problem in observational research is observer bias . Generally, people who act as observers are closely involved in the research project and may unconsciously skew their observations to fit their research goals or expectations. To protect against this type of bias, researchers should have clear criteria established for the types of behaviors recorded and how those behaviors should be classified. In addition, researchers often compare observations of the same event by multiple observers, in order to test inter-rater reliability : a measure of reliability that assesses the consistency of observations by different observers.
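To make inter-rater reliability concrete, here is a minimal sketch of computing percent agreement and Cohen’s kappa for two observers. The ratings, category codes, and function are invented for illustration, not drawn from any study mentioned above:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    # Observed agreement: fraction of items both raters coded the same way
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions, summed over categories
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    expected = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)
    return (observed - expected) / (1 - expected)

# Hypothetical codings of ten playground episodes: "A" = aggressive, "P" = prosocial
obs1 = ["A", "P", "P", "A", "P", "P", "A", "P", "A", "P"]
obs2 = ["A", "P", "P", "P", "P", "P", "A", "P", "A", "P"]
print(round(cohens_kappa(obs1, obs2), 2))
```

A kappa near 1 indicates strong agreement; values near 0 mean the observers agree no more often than chance alone would predict.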

Surveys

Often, psychologists develop surveys as a means of gathering data. Surveys are lists of questions to be answered by research participants, and can be delivered as paper-and-pencil questionnaires, administered electronically, or conducted verbally (Figure 3). Generally, the survey itself can be completed in a short time, and the ease of administering a survey makes it easy to collect data from a large number of people.

Surveys allow researchers to gather data from larger samples than may be afforded by other research methods. A sample is a subset of individuals selected from a population , which is the overall group of individuals that the researchers are interested in. Researchers study the sample and seek to generalize their findings to the population.

Figure 3. A sample online survey invitation: visitors are told the survey gathers opinions on their news consumption habits, will take approximately 10–15 minutes, and can be launched or declined with “Yes” and “No” buttons.

There is both strength and weakness in surveys when compared to case studies. By using surveys, we can collect information from a larger sample of people. A larger sample is better able to reflect the actual diversity of the population, thus allowing better generalizability. Therefore, if our sample is sufficiently large and diverse, we can assume that the data we collect from the survey can be generalized to the larger population with more certainty than the information collected through a case study. However, given the greater number of people involved, we are not able to collect the same depth of information on each person that would be collected in a case study.

Another potential weakness of surveys is something we touched on earlier in this module: people do not always give accurate responses. They may lie, misremember, or answer questions in a way that they think makes them look good. For example, people may report drinking less alcohol than is actually the case.

Any number of research questions can be answered through the use of surveys. One real-world example is the research conducted by Jenkins, Ruppel, Kizer, Yehl, and Griffin (2012) about the backlash against the U.S. Arab-American community following the terrorist attacks of September 11, 2001. Jenkins and colleagues wanted to determine to what extent these negative attitudes toward Arab-Americans still existed nearly a decade after the attacks occurred. In one study, 140 research participants filled out a survey with 10 questions, including questions asking directly about the participant’s overt prejudicial attitudes toward people of various ethnicities. The survey also asked indirect questions about how likely the participant would be to interact with a person of a given ethnicity in a variety of settings (such as, “How likely do you think it is that you would introduce yourself to a person of Arab-American descent?”). The results of the research suggested that participants were unwilling to report prejudicial attitudes toward any ethnic group. However, there were significant differences between their pattern of responses to questions about social interaction with Arab-Americans compared to other ethnic groups: they indicated less willingness for social interaction with Arab-Americans compared to the other ethnic groups. This suggested that the participants harbored subtle forms of prejudice against Arab-Americans, despite their assertions that this was not the case (Jenkins et al., 2012).

Think it Over

Research has shown that parental depressive symptoms are linked to a number of negative child outcomes. A classmate of yours is interested in the associations between parental depressive symptoms and actual child behaviors in everyday life [2] because these associations remain largely unknown. After reading this section, what do you think is the best way to better understand such associations? Which method might result in the most valid data?

clinical or case study:  observational research study focusing on one or a few people

correlational research:  tests whether a relationship exists between two or more variables

descriptive research:  research studies that do not test specific relationships between variables; they are used to describe general or specific behaviors and attributes that are observed and measured

experimental research:  tests a hypothesis to determine cause-and-effect relationships

generalizability:  inferring that the results for a sample apply to the larger population

inter-rater reliability:  measure of agreement among observers on how they record and classify a particular event

naturalistic observation:  observation of behavior in its natural setting

observer bias:  when observations may be skewed to align with observer expectations

population:  overall group of individuals that the researchers are interested in

sample:  subset of individuals selected from the larger population

survey:  list of questions to be answered by research participants—given as paper-and-pencil questionnaires, administered electronically, or conducted verbally—allowing researchers to collect data from a large number of people

CC Licensed Content, Shared Previously

  • Descriptive Research and Case Studies . Authored by : Sonja Ann Miller for Lumen Learning.  Provided by : Lumen Learning.  License :  CC BY-SA: Attribution-ShareAlike
  • Approaches to Research.  Authored by : OpenStax College.  Located at :  http://cnx.org/contents/[email protected]:iMyFZJzg@5/Approaches-to-Research .  License :  CC BY: Attribution .  License Terms : Download for free at http://cnx.org/contents/[email protected]
  • Descriptive Research.  Provided by : Boundless.  Located at :  https://www.boundless.com/psychology/textbooks/boundless-psychology-textbook/researching-psychology-2/types-of-research-studies-27/descriptive-research-124-12659/ .  License :  CC BY-SA: Attribution-ShareAlike
  • Case Study.  Provided by : Wikipedia.  Located at :  https://en.wikipedia.org/wiki/Case_study .  License :  CC BY-SA: Attribution-ShareAlike
  • Rat man.  Provided by : Wikipedia.  Located at :  https://en.wikipedia.org/wiki/Rat_Man#Legacy .  License :  CC BY-SA: Attribution-ShareAlike
  • Case study in psychology.  Provided by : Wikipedia.  Located at :  https://en.wikipedia.org/wiki/Case_study_in_psychology .  License :  CC BY-SA: Attribution-ShareAlike
  • Research Designs.  Authored by : Christie Napa Scollon.  Provided by : Singapore Management University.  Located at :  https://nobaproject.com/modules/research-designs#reference-6 .  Project : The Noba Project.  License :  CC BY-NC-SA: Attribution-NonCommercial-ShareAlike
  • Single subject design.  Provided by : Wikipedia.  Located at :  https://en.wikipedia.org/wiki/Single-subject_design .  License :  CC BY-SA: Attribution-ShareAlike
  • Single subject research.  Provided by : Wikipedia.  Located at :  https://en.wikipedia.org/wiki/Single-subject_research#A-B-A-B .  License :  Public Domain: No Known Copyright
  • Pills.  Authored by : qimono.  Provided by : Pixabay.  Located at :  https://pixabay.com/illustrations/pill-capsule-medicine-medical-1884775/ .  License :  CC0: No Rights Reserved
  • ABAB Design.  Authored by : Doc. Yu.  Provided by : Wikimedia.  Located at :  https://commons.wikimedia.org/wiki/File:A-B-A-B_Design.png .  License :  CC BY-SA: Attribution-ShareAlike
  • Scollon, C. N. (2020). Research designs. In R. Biswas-Diener & E. Diener (Eds), Noba textbook series: Psychology. Champaign, IL: DEF publishers. Retrieved from http://noba.to/acxb2thy ↵
  • Slatcher, R. B., & Trentacosta, C. J. (2011). A naturalistic observation study of the links between parental depressive symptoms and preschoolers' behaviors in everyday life. Journal of family psychology : JFP : journal of the Division of Family Psychology of the American Psychological Association (Division 43), 25(3), 444–448. https://doi.org/10.1037/a0023728 ↵

Descriptive Research and Case Studies Copyright © by Meredith Palm is licensed under a Creative Commons Attribution 4.0 International License , except where otherwise noted.

Design of Experiments: A Case Study – Part 1

Co-Authored by Alberto Yáñez-Moreno, Ph.D and Russ Aikman, MBB

Recently, TMAC was coaching a Black Belt working in the Analyze Phase of his project. He was trying to figure out how to increase the output of a manufacturing process: the production of glass fibers used in the manufacture of different types of insulation, a highly complex process. The company was in a sold-out position, with demand exceeding supply, and the Black Belt determined that additional capacity could result in a sales increase of over $1M. Past efforts at determining the root cause of the capacity issue had been only partly successful. After discussion with the belt and his sponsor, a decision was made to use Design of Experiments (DOE).

Before going into detail about this particular project, let’s start with some fundamental questions:

  • What is a Designed Experiment?
  • When is DOE recommended?
  • What other root cause tools might be used?
  • What are the pros and cons of DOE compared to other tools?

To answer these questions, we should begin with a review of Analyze Phase concepts. All LSS practitioners are familiar with the equation Y = f(X1, X2, X3, …), where the “Y” is the process output or response variable. It is also called the Key Process Output Variable (KPOV) or the dependent variable. In terms of LSS, the “Y” is what we want to predict, or how we are going to measure the success of the project. The “Xs” represent process inputs, and may also be called factors, Key Process Input Variables (KPIVs), or independent variables. The “f” is simply the function, or formula, that would allow us to predict the “Y” from the “Xs”.
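In code form, the idea is just a function mapping inputs to a predicted output. The factor names and coefficients below are invented purely to illustrate the notation, not taken from the case study:

```python
# Hypothetical transfer function Y = f(X1, X2) for a process:
# X1 = oven temperature (deg C), X2 = line speed (m/min); coefficients are made up.
def predict_yield(temp_c, speed_m_min):
    return 40.0 + 0.05 * temp_c - 2.0 * speed_m_min

print(predict_yield(400, 5.0))
```

The whole point of the Analyze Phase tools discussed below is to discover a usable “f” from data.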

Before considering DOE, always start with more basic tools. If you have no hard data, then you can try tools like brainstorming, fishbone diagram, or C&E matrix. Such tools are always acceptable in the early stages of root cause analysis. They are simple and easy to use. And for less complex problems basic root cause tools work just fine.

But for more complex problems there is a big downside to these tools: We do not get a mathematical model. Another downside: Basic root cause tools are very dependent on process knowledge. And hence, are subject to people’s opinions. Still, basic root cause tools can be very helpful in determining what input variables to consider for more advanced tools like DOE.

Let’s say you do have historical X and Y data. In this case, a good place to start is with either Simple or Multiple Linear Regression. Both can be used to generate mathematical models. And with powerful statistical software like Minitab, development of a mathematical model is relatively easy.

The limitation of Simple Regression is it only has one X in the model. Most real-world systems are much too complex to be modeled with a single input. Hence, Multiple Regression is a better option because it allows models with two or more inputs. Very sophisticated models can be developed using Multiple Regression.
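As a concrete sketch of this idea, the snippet below fits a two-input linear model with ordinary least squares. Everything here is invented for illustration – the factor names, ranges, and coefficients are synthetic, not data from the TMAC project.

```python
import numpy as np

# Synthetic data for a hypothetical two-input process.
rng = np.random.default_rng(0)
pressure = rng.uniform(100, 200, size=30)     # X1, psi
temperature = rng.uniform(300, 400, size=30)  # X2, degrees
# Made-up "true" relationship plus measurement noise.
y = 5.0 + 0.20 * pressure - 0.10 * temperature + rng.normal(0, 1.0, size=30)

# Design matrix with an intercept column; solve Y = b0 + b1*X1 + b2*X2.
A = np.column_stack([np.ones_like(pressure), pressure, temperature])
(b0, b1, b2), *_ = np.linalg.lstsq(A, y, rcond=None)
print(f"Y ≈ {b0:.2f} + {b1:.3f}*pressure + {b2:.3f}*temperature")
```

With enough data, the fitted b1 and b2 land close to the coefficients used to generate the data – which is exactly the kind of predictive model the “f” in Y = f(X) represents.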

What are the downsides to Multiple Regression? A big negative is that, as typically applied, it does not identify interactions. An interaction occurs when the effect of one input on the output depends on the level of another input. To be fair, a person with deep knowledge of statistics can use regression to create a model which includes interactions. But for the typical LSS practitioner this is not easy to do.
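To make interactions concrete, here is a hedged sketch of how a statistician might add a product column X1*X2 to the model matrix by hand. The data and coefficients are again synthetic; the point is that fitting only main effects would miss the joint behavior entirely.

```python
import numpy as np

# Synthetic process in coded units where the effect of x1 depends on x2.
rng = np.random.default_rng(1)
x1 = rng.uniform(-1, 1, size=40)
x2 = rng.uniform(-1, 1, size=40)
y = 2.0 + 1.5 * x1 - 0.5 * x2 + 3.0 * x1 * x2 + rng.normal(0, 0.2, size=40)

# Adding the product column x1*x2 lets least squares estimate the interaction.
A = np.column_stack([np.ones(40), x1, x2, x1 * x2])
b, *_ = np.linalg.lstsq(A, y, rcond=None)
print("intercept, x1, x2, x1*x2 coefficients:", np.round(b, 2))
```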

There is another downside to all three of these tools: They are limited by the historical data available. Imagine a process where there is a big change in the Y when the X gets to a certain level. For example, an annealing process where oven temperature greatly impacts steel hardness once it gets above a certain level. You would never learn this if day to day operations limit oven temperatures to lower levels. Another example: Say there is a process involving inputs of time and pressure. If there is an interaction between these two inputs you might never learn of the interaction unless they were changed in a specific way.
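The annealing example can be sketched numerically. All numbers below are made up purely to show why historical data from a narrow operating range can hide a threshold effect that a designed experiment would expose.

```python
# Hypothetical hardness model: a sharp jump once oven temperature
# passes a threshold the plant never crosses in day-to-day operation.
def hardness(temp_c):
    base = 40.0 + 0.01 * temp_c
    return base + (15.0 if temp_c > 800 else 0.0)  # jump above 800 °C

historical = [hardness(t) for t in range(700, 800, 10)]  # normal operating range
doe_levels = [hardness(750), hardness(850)]              # DOE deliberately goes higher

print("spread within historical range:", round(max(historical) - min(historical), 2))
print("DOE low vs high:", doe_levels)
```

A model fit only to the historical runs would see an almost flat response; the DOE levels straddle the threshold and reveal the jump.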

Now, back to the questions at the start of this blog. First, what is Design of Experiments? DOE is a strategy for conducting scientific investigation of a process for the purpose of gaining information about the process.  This is accomplished by actively changing specific inputs to determine their effect on outputs.  A key thing to keep in mind: DOE will provide the most information possible using the least amount of resources. Stated differently, Designed Experiments allow you to learn more about how a process works with less data.

When is DOE a good tool to use?

This is the first key differentiator between DOE and regression: how data are collected. With regression, data are gathered as part of normal operations. In other words, when the process is running normally, output data are collected along with the regular input settings. The values for the input settings are typically chosen from standard operating procedures or product specifications.

With Designed Experiments data collection is different. First, the input values for the designed experiment are chosen so as to learn as much as possible about their effect on the process output. This means the input values chosen – known as the factor levels – are often outside the normal range of input settings. Most DOE experiments have just two levels – low and high for a continuous input. Each experimental run consists of a combination of input settings at these low and high values.

Imagine a process where the typical range for an input, say pressure, is 100 to 200 psi. For a Designed Experiment the two levels chosen might be 80 (low) and 220 (high) psi. Choosing input levels outside the regular operating range is likely to result in greater insight about how that input impacts the output.
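The idea above can be sketched as a two-level full factorial design. The three factors and their low/high levels are hypothetical examples, with pressure deliberately set outside its usual 100–200 psi operating range.

```python
from itertools import product

# Hypothetical factors, each with a (low, high) level pair.
factors = {
    "pressure_psi": (80, 220),   # outside the normal 100-200 psi range
    "temp_C": (150, 250),
    "time_min": (5, 15),
}

# Every combination of low/high settings: a 2^3 = 8-run design.
runs = [dict(zip(factors, combo)) for combo in product(*factors.values())]
print(f"{len(runs)} runs")
for run in runs:
    print(run)
```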

Another differentiator with Designed Experiments is that changes in the input levels from one experimental run to the next are made randomly. The use of randomization is of critical importance to DOE. By randomizing the experiment, a practitioner minimizes the risk of confounding. Confounding occurs when there appears to be an effect due to one input when in reality it is due to a different input, or to a ‘noise’ variable such as ambient temperature or barometric pressure. Confounding can – and does – occur as part of regular, day-to-day operations, and hence can greatly complicate the use of regression as a root cause tool.
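A minimal sketch of randomizing run order, assuming a small 2^2 design in coded units. The seed is fixed here only so the example is reproducible; in a real experiment the shuffled order is what matters.

```python
import random
from itertools import product

# 2^2 factorial in coded units: four (low/high, low/high) combinations.
design = list(product([-1, 1], repeat=2))

# Shuffle the run order so that drifting noise variables (ambient
# temperature, barometric pressure) are not confounded with the factors.
order = list(range(len(design)))
random.seed(42)  # reproducibility only
random.shuffle(order)

for position, idx in enumerate(order, start=1):
    print(f"run {position}: factor settings {design[idx]}")
```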

Finally, DOE is very good at separating out the effects of specific inputs and interactions. Other tools, like regression, are not as effective at achieving this goal, which is important in developing a useful mathematical model.
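As a rough illustration of how a factorial experiment separates main effects from interactions, the sketch below computes effect estimates from a 2^2 design. The response values are invented; each effect is simply the average response at a factor’s high level minus the average at its low level.

```python
from itertools import product

# 2^2 design in coded units and one (made-up) response per run.
runs = list(product([-1, 1], repeat=2))  # (A, B) settings
y = [52.0, 60.0, 54.0, 70.0]            # responses for (-1,-1), (-1,1), (1,-1), (1,1)

def effect(contrast):
    # Average response at the high (+1) level minus average at the low (-1) level.
    return sum(c * yi for c, yi in zip(contrast, y)) / (len(y) / 2)

eff_A = effect([a for a, b in runs])
eff_B = effect([b for a, b in runs])
eff_AB = effect([a * b for a, b in runs])
print(f"main effect A: {eff_A}, main effect B: {eff_B}, interaction AB: {eff_AB}")
# → main effect A: 6.0, main effect B: 12.0, interaction AB: 4.0
```

Because the design is balanced, each estimate isolates one effect: the A contrast cancels out B and AB, and so on. That clean separation is what unstructured historical data cannot guarantee.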

Other advantages of Designed Experiments:

  • Use a structured, planned approach
  • Statistical significance of results is known
  • Can be used for screening out factors (inputs) that have no effect on the response
  • Can be used to determine optimal values for inputs to achieve project goals
  • Can be applied in a wide variety of fields including agriculture, manufacturing, service, marketing, social work, and medical research

Comparison of Root Cause Tools


In Part 2 we will discuss how to plan and conduct a DOE. We’ll also share information about the experiment conducted by the Black Belt who received TMAC coaching, and the results he achieved.


  • Open access
  • Published: 08 September 2024

Longitudinal analysis of teacher self-efficacy evolution during a STEAM professional development program: a qualitative case study

  • Haozhe Jiang, ORCID: orcid.org/0000-0002-7870-0993
  • Ritesh Chugh, ORCID: orcid.org/0000-0003-0061-7206
  • Xuesong Zhai, ORCID: orcid.org/0000-0002-4179-7859
  • Ke Wang
  • Xiaoqin Wang

Humanities and Social Sciences Communications, volume 11, Article number: 1162 (2024)


Despite the widespread advocacy for the integration of arts and humanities (A&H) into science, technology, engineering, and mathematics (STEM) education on an international scale, teachers face numerous obstacles in practically integrating A&H into STEM teaching (IAT). To tackle the challenges, a comprehensive five-stage framework for teacher professional development programs focussed on IAT has been developed. Through the use of a qualitative case study approach, this study outlines the shifts in a participant teacher’s self-efficacy following their exposure to each stage of the framework. The data obtained from interviews and reflective analyses were analyzed using a seven-stage inductive method. The findings have substantiated the significant impact of a teacher professional development program based on the framework on teacher self-efficacy, evident in both individual performance and student outcomes observed over eighteen months. The evolution of teacher self-efficacy in IAT should be regarded as an open and multi-level system, characterized by interactions with teacher knowledge, skills and other entrenched beliefs. Building on our research findings, an enhanced model of teacher professional learning is proposed. The revised model illustrates that professional learning for STEAM teachers should be conceived as a continuous and sustainable process, characterized by the dynamic interaction among teaching performance, teacher knowledge, and teacher beliefs. The updated model further confirms the inseparable link between teacher learning and student learning within STEAM education. This study contributes to the existing body of literature on teacher self-efficacy, teacher professional learning models and the design of IAT teacher professional development programs.


Introduction

In the past decade, there has been a surge in the advancement and widespread adoption of Science, Technology, Engineering, and Mathematics (STEM) education on a global scale (Jiang et al. 2021 ; Jiang et al. 2022 ; Jiang et al. 2023 ; Jiang et al. 2024a , b ; Zhan et al. 2023 ; Zhan and Niu 2023 ; Zhong et al. 2022 ; Zhong et al. 2024 ). Concurrently, there has been a growing chorus of advocates urging the integration of Arts and Humanities (A&H) into STEM education (e.g., Alkhabra et al. 2023 ; Land 2020 ; Park and Cho 2022 ; Uştu et al. 2021 ; Vaziri and Bradburn 2021 ). STEM education is frequently characterized by its emphasis on logic and analysis; however, it may be perceived as deficient in emotional and intuitive elements (Ozkan and Umdu Topsakal 2021 ). Through the integration of Arts and Humanities (A&H), the resulting STEAM approach has the potential to become more holistic, incorporating both rationality and emotional intelligence (Ozkan and Umdu Topsakal 2021 ). Many studies have confirmed that A&H can help students increase interest and develop their understanding of the contents in STEM fields, and thus, A&H can attract potential underrepresented STEM learners such as female students and minorities (Land 2020 ; Park and Cho 2022 ; Perignat and Katz-Buonincontro 2019 ). Despite the increasing interest in STEAM, the approaches to integrating A&H, which represent fundamentally different disciplines, into STEM are theoretically and practically ambiguous (Jacques et al. 2020 ; Uştu et al. 2021 ). Moreover, studies have indicated that the implementation of STEAM poses significant challenges, with STEM educators encountering difficulties in integrating A&H into their teaching practices (e.g., Boice et al. 2021 ; Duong et al. 2024 ; Herro et al. 2019 ; Jacques et al. 2020 ; Park and Cho 2022 ; Perignat and Katz-Buonincontro 2019 ). Hence, there is a pressing need to provide STEAM teachers with effective professional training.

Motivated by this gap, this study proposes a novel five-stage framework tailored for teacher professional development programs specifically designed to facilitate the integration of A&H into STEM teaching (IAT). Following the establishment of this framework, a series of teacher professional development programs were implemented. To explain the framework, a qualitative case study is employed, focusing on examining a specific teacher professional development program’s impact on a pre-service teacher’s self-efficacy. The case narratives, with a particular focus on the pre-service teacher’s changes in teacher self-efficacy, are organized chronologically, delineating stages before and after each stage of the teacher professional development program. More specifically, meaningful vignettes of the pre-service teacher’s learning and teaching experiences during the teacher professional development program are offered to help understand the five-stage framework. This study contributes to understanding teacher self-efficacy, teacher professional learning model and the design of IAT teacher professional development programs.

Theoretical background

The conceptualization of STEAM education

STEM education can be interpreted through various lenses (e.g., Jiang et al. 2021 ; English 2016 ). As Li et al. (2020) claimed, on the one hand, STEM education can be defined as individual STEM disciplinary-based education (i.e., science education, technology education, engineering education and mathematics education). On the other hand, STEM education can also be defined as interdisciplinary or cross-disciplinary education where individual STEM disciplines are integrated (Jiang et al. 2021 ; English 2016 ). In this study, we view it as individual disciplinary-based education separately in science, technology, engineering and mathematics (English 2016 ).

STEAM education emerged as a new pedagogy during the Americans for the Arts-National Policy Roundtable discussion in 2007 (Perignat and Katz-Buonincontro 2019 ). This pedagogy was born out of the necessity to enhance students’ engagement, foster creativity, stimulate innovation, improve problem-solving abilities, and cultivate employability skills such as teamwork, communication and adaptability (Perignat and Katz-Buonincontro 2019 ). In particular, within the framework of STEAM education, the ‘A’ should be viewed as a broad concept that represents arts and humanities (A&H) (Herro and Quigley 2016 ; de la Garza 2021 , Park and Cho 2022 ). This conceptualization emphasizes the need to include humanities subjects alongside arts (Herro and Quigley 2016 ; de la Garza 2021 ; Park and Cho 2022 ). Sanz-Camarero et al. ( 2023 ) listed some important fields of A&H, including physical arts, fine arts, manual arts, sociology, politics, philosophy, history, psychology and so on.

In general, STEM education does not necessarily entail the inclusion of all STEM disciplines collectively (Ozkan and Umdu Topsakal 2021 ), and this principle also applies to STEAM education (Gates 2017 ; Perignat and Katz-Buonincontro 2019 ; Quigley et al. 2017 ; Smith and Paré 2016 ). As an illustration, Smith and Paré ( 2016 ) described a STEAM activity in which pottery (representing A&H) and mathematics were integrated, while other STEAM elements such as science, technology and engineering were not included. In our study, STEAM education is conceptualized as an interdisciplinary approach that involves the integration of one or more components of A&H into one or more STEM school subjects within educational activities (Ozkan and Umdu Topsakal 2021 ; Vaziri and Bradburn 2021 ). Notably, interdisciplinary collaboration entails integrating one or more elements from arts and humanities (A&H) with one or more STEM school subjects, cohesively united by a shared theme while maintaining their distinct identities (Perignat and Katz-Buonincontro 2019 ).

In our teacher professional development programs, we help mathematics, technology, and science pre-service teachers integrate one component of A&H into their disciplinary-based teaching practices. For instance, we help mathematics teachers integrate history (a component of A&H) into mathematics teaching. In other words, in our study, integrating A&H into STEM teaching (IAT) can be defined as integrating one component of A&H into the teaching of one of the STEM school subjects. The components of A&H and the STEM school subject are brought together under a common theme, but each of them remains discrete. Engineering is not taught as an individual subject in the K-12 curriculum in mainland China. Therefore, A&H is not integrated into engineering teaching in our teacher professional development programs.

Self-efficacy and teacher self-efficacy

Self-efficacy was initially introduced by Bandura ( 1977 ) as a key concept within his social cognitive theory. Bandura ( 1997 ) defined self-efficacy as “people’s beliefs about their capabilities to produce designated levels of performance that exercise influence over events that affect their lives” (p. 71). Based on Bandura’s ( 1977 ) theory, Tschannen-Moran et al. ( 1998 ) defined the concept of teacher self-efficacy as “a teacher’s belief in her or his ability to organize and execute the courses of action required to successfully accomplish a specific teaching task in a particular context” (p. 233). Blonder et al. ( 2014 ) pointed out that this definition implicitly included teachers’ judgment of their ability to bring about desired outcomes in terms of students’ engagement and learning. Moreover, OECD ( 2018 ) defined teacher self-efficacy as “the beliefs that teachers have of their ability to enact certain teaching behavior that influences students’ educational outcomes, such as achievement, interest, and motivation” (p. 51). This definition explicitly included two dimensions: teachers’ judgment of the ability related to their teaching performance (i.e., enacting certain teaching behavior) and their influence on student outcomes.

It is argued that teacher self-efficacy should not be regarded as a general or overarching construct (Zee et al. 2017 ; Zee and Koomen 2016 ). Particularly, in the performance-driven context of China, teachers always connect their beliefs in their professional capabilities with the educational outcomes of their students (Liu et al. 2018 ). Therefore, we operationally conceptualize teacher self-efficacy as having two dimensions: self-efficacy in individual performance and student outcomes (see Table 1 ).

Most importantly, given its consistent association with actual teaching performance and student outcomes (Bray-Clark and Bates 2003 ; Kelley et al. 2020 ), teacher self-efficacy is widely regarded as a pivotal indicator of teacher success (Kelley et al. 2020 ). Moreover, the enhancement of teaching self-efficacy reflects the effectiveness of teacher professional development programs (Bray-Clark and Bates 2003 ; Kelley et al. 2020 ; Wong et al. 2022 ; Zhou et al. 2023 ). For instance, Zhou et al. ( 2023 ) claimed that in STEM teacher education, effective teacher professional development programs should bolster teachers’ self-efficacy “in teaching the content in the STEM discipline” (p. 2).

It has been documented that teachers frequently experience diminished confidence and comfort when teaching subject areas beyond their expertise (Kelley et al. 2020 ; Stohlmann et al. 2012 ). This diminished confidence extends to their self-efficacy in implementing interdisciplinary teaching approaches, such as integrated STEM teaching and IAT (Kelley et al. 2020 ). For instance, Geng et al. ( 2019 ) found that STEM teachers in Hong Kong exhibited low levels of self-efficacy, with only 5.53% of teachers rating their overall self-efficacy in implementing STEM education as higher than a score of 4 out of 5. Additionally, Hunter-Doniger and Sydow ( 2016 ) found that teachers may experience apprehension and lack confidence when incorporating A&H elements into the classroom context, particularly within the framework of IAT. Considering the critical importance of teacher self-efficacy in STEM and STEAM education (Kelley et al. 2020 ; Zakariya, 2020 ; Zhou et al. 2023 ), it is necessary to explore effective measures, frameworks and teacher professional development programs to help teachers improve their self-efficacy regarding interdisciplinary teaching (e.g., IAT).

Teacher professional learning models

The relationship between teachers’ professional learning and students’ outcomes (such as achievements, skills and attitudes) has been a subject of extensive discussion and research for many years (Clarke and Hollingsworth 2002 ). For instance, Clarke and Hollingsworth ( 2002 ) proposed and validated the Interconnected Model of Professional Growth, which illustrates that teacher professional development is influenced by the interaction among four interconnected domains: the personal domain (teacher knowledge, beliefs and attitudes), the domain of practice (professional experimentation), the domain of consequence (salient outcomes), and the external domain (sources of information, stimulus or support). Sancar et al. ( 2021 ) emphasized that teachers’ professional learning or development never occurs independently. In practice, this process is inherently intertwined with many variables, including student outcomes, in various ways (Sancar et al. 2021 ). However, many current teacher professional development programs exclude real in-class teaching and fail to establish a comprehensive link between teachers’ professional learning and student outcomes (Cai et al. 2020 ; Sancar et al. 2021 ). Sancar et al. ( 2021 ) claimed that exploring the complex relationships between teachers’ professional learning and student outcomes should be grounded in monitoring and evaluating real in-class teaching, rather than relying on teachers’ self-assessment. It is essential to understand these relationships from a holistic perspective within the context of real classroom teaching (Sancar et al. 2021 ). However, as Sancar et al. ( 2021 ) pointed out, such efforts in teacher education are often considered inadequate. Furthermore, in the field of STEAM education, such efforts are further exacerbated.

Cai et al. ( 2020 ) proposed a teacher professional learning model where student outcomes are emphasized. This model was developed based on Cai ( 2017 ), Philipp ( 2007 ) and Thompson ( 1992 ). It has also been used and justified in a series of teacher professional development programs (e.g., Calabrese et al. 2024 ; Hwang et al. 2024 ; Marco and Palatnik 2024 ; Örnek and Soylu 2021 ). The model posits that teachers typically increase their knowledge and modify their beliefs through professional teacher learning, subsequently improving their classroom instruction, enhancing teaching performance, and ultimately fostering improved student learning outcomes (Cai et al. 2020 ). Notably, this model can be updated in several aspects. Firstly, prior studies have exhibited the interplay between teacher knowledge and beliefs (e.g., Basckin et al. 2021 ; Taimalu and Luik 2019 ). This indicates that the increase in teacher knowledge and the change in teacher belief may not be parallel. The two processes can be intertwined. Secondly, the Interconnected Model of Professional Growth highlights that the personal domain and the domain of practice are interconnected (Clarke and Hollingsworth 2002 ). Liu et al. ( 2022 ) also confirmed that improvements in classroom instruction may, in turn, influence teacher beliefs. This necessitates a reconsideration of the relationships between classroom instruction, teacher knowledge and teacher beliefs in Cai et al.’s ( 2020 ) model. Thirdly, the Interconnected Model of Professional Growth also exhibits the connections between the domain of consequence and the personal domain (Clarke and Hollingsworth 2002 ). Hence, the improvement of learning outcomes may signify the end of teacher learning. For instance, students’ learning feedback may be a vital source of teacher self-efficacy (Bandura 1977 ). Therefore, the improvement of student outcomes may, in turn, affect teacher beliefs. 
The aforementioned arguments highlight the need for an updated model that integrates Cai et al.’s ( 2020 ) teacher professional learning model with Clarke and Hollingsworth’s ( 2002 ) Interconnected Model of Professional Growth. This integration may provide a holistic view of the teacher’s professional learning process, especially within the complex contexts of STEAM teacher education.

The framework for teacher professional development programs of integrating arts and humanities into STEM teaching

In this section, we present a framework for IAT teacher professional development programs, aiming to address the practical challenges associated with STEAM teaching implementation. Our framework incorporates the five features of effective teacher professional development programs outlined by Archibald et al. ( 2011 ), Cai et al. ( 2020 ), Darling-Hammond et al. ( 2017 ), Desimone and Garet ( 2015 ) and Roth et al. ( 2017 ). These features include: (a) alignment with shared goals (e.g., school, district, and national policies and practice), (b) emphasis on core content and modeling of teaching strategies for the content, (c) collaboration among teachers within a community, (d) adequate opportunities for active learning of new teaching strategies, and (e) embedded follow-up and continuous feedback. It is worth noting that two concepts, namely community of practice and lesson study, have been incorporated into our framework. Below, we delineate how these features are reflected in our framework.

(a) The Chinese government has issued a series of policies to facilitate STEAM education in K-12 schools (Jiang et al. 2021 ; Li and Chiang 2019 ; Lyu et al. 2024 ; Ro et al. 2022 ). The new curriculum standards released in 2022 mandate that all K-12 teachers implement interdisciplinary teaching, including STEAM education. Our framework for teacher professional development programs, which aims to help teachers integrate A&H into STEM teaching, closely aligns with these national policies and practices supporting STEAM education in K-12 schools.

(b) The core content of the framework is IAT. Specifically, as A&H is a broad concept, we divide it into several subcomponents, such as history, culture, and visual and performing arts (e.g., drama). We are implementing a series of teacher professional development programs to help mathematics, technology and science pre-service teachers integrate these subcomponents of A&H into their teaching. Notably, pre-service teachers often lack teaching experience, making it challenging to master and implement new teaching strategies. Therefore, our framework provides five step-by-step stages designed to help them effectively model the teaching strategies of IAT.

(c) Our framework advocates for collaboration among teachers within a community of practice. Specifically, a community of practice is “a group of people who share an interest in a domain of human endeavor and engage in a process of collective learning that creates bonds between them” (Wenger et al. 2002 , p. 1). A teacher community of practice can be considered a group of teachers “sharing and critically observing their practices in growth-promoting ways” (Näykki et al. 2021 , p. 497). Long et al. ( 2021 ) claimed that in a teacher community of practice, members collaboratively share their teaching experiences and work together to address teaching problems. Our community of practice includes three types of members. (1) Mentors: These are professors and experts with rich experience in helping pre-service teachers practice IAT. (2) Pre-service teachers: Few have teaching experience before the teacher professional development programs. (3) In-service teachers: All in-service teachers are senior teachers with rich teaching experience. All the members work closely together to share and improve their IAT practice. Moreover, our community includes not only mentors and in-service teachers but also pre-service teachers. We encourage pre-service teachers to collaborate with experienced in-service teachers in various ways, such as developing IAT lesson plans, writing IAT case reports and so on. In-service teachers can provide cognitive and emotional support and share their practical knowledge and experience, which may significantly benefit the professional growth of pre-service teachers (Alwafi et al. 2020 ).

(d) Our framework offers pre-service teachers various opportunities to engage in lesson study, allowing them to actively design and implement IAT lessons. Based on the key points of effective lesson study outlined by Akiba et al. ( 2019 ), Ding et al. ( 2024 ), and Takahashi and McDougal ( 2016 ), our lesson study incorporates the following seven features. (1) Study of IAT materials: Pre-service teachers are required to study relevant IAT materials under the guidance of mentors. (2) Collaboration on lesson proposals: Pre-service teachers should collaborate with in-service teachers to develop comprehensive lesson proposals. (3) Observation and data collection: During the lesson, pre-service teachers are required to carefully observe and collect data on student learning and development. (4) Reflection and analysis: Pre-service teachers use the collected data to reflect on the lesson and their teaching effects. (5) Lesson revision and reteaching: If needed, pre-service teachers revise and reteach the lesson based on their reflections and data analysis. (6) Mentor and experienced in-service teacher involvement: Mentors and experienced in-service teachers, as knowledgeable others, are involved throughout the lesson study process. (7) Collaboration on reporting: Pre-service teachers collaborate with in-service teachers to draft reports and disseminate the results of the lesson study. Specifically, recognizing that pre-service teachers often lack teaching experience, we do not require them to complete all the steps of lesson study independently at once. Instead, we guide them through the lesson study process in a step-by-step manner, allowing them to gradually build their IAT skills and confidence. For instance, in Stage 1, pre-service teachers primarily focus on studying IAT materials. In Stage 2, they develop lesson proposals, observe and collect data, and draft reports. However, the implementation of IAT lessons is carried out by in-service teachers. 
This approach prevents pre-service teachers from experiencing failures due to their lack of teaching experience. In Stage 3, pre-service teachers implement, revise, and reteach IAT lessons, experiencing the lesson study process within a simulated environment. In Stage 4, pre-service teachers engage in lesson study in an actual classroom environment. However, their focus is limited to one micro-course during each lesson study session. It is not until the fifth stage that they experience a complete lesson study in an actual classroom environment.

(e) Our teacher professional development programs incorporate assessments specifically designed to evaluate pre-service teachers’ IAT practices. We use formative assessments to measure their understanding and application of IAT strategies. Pre-service teachers receive ongoing and timely feedback from peers, mentors, in-service teachers, and students, which helps them continuously refine their IAT practices throughout the program. Recognizing that pre-service teachers often have limited contact with real students and may not fully understand students’ learning needs, processes and outcomes, our framework requires them to actively collect and analyze student feedback. By doing so, they can make informed improvements to their instructional practice based on student feedback.

After undergoing three rounds of theoretical and practical testing and revision over the past five years, we have successfully finalized the optimization of the framework design (Zhou 2021 ). Throughout each cycle, we collected feedback from both participants and researchers on at least three occasions. Subsequently, we analyzed this feedback and iteratively refined the framework. For example, we enlisted the participation of in-service teachers to enhance the implementation of STEAM teaching, extended practice time through micro-teaching sessions, and introduced a stage of micro-course development within the framework to provide more opportunities for pre-service teachers to engage with real teaching situations. In this process, we continuously improved the coherence between each stage of the framework, ensuring that they mutually complement one another. The five-stage framework is described as follows.

Stage 1 Literature study

Pre-service teachers are provided with a series of reading materials from A&H. On a weekly basis, two pre-service teachers are assigned to present their readings and reflections to the entire group, followed by critical discussions thereafter. Mentors and all pre-service teachers discuss and explore strategies for translating the original A&H materials into viable instructional resources suitable for classroom use. Subsequently, pre-service teachers select topics of personal interest for further study under mentor guidance.

Stage 2 Case learning

Given that pre-service teachers have no teaching experience, in-service and pre-service teachers collaborate to design IAT lesson plans, which the in-service teachers then implement. Throughout this process, pre-service teachers are afforded opportunities to engage in lesson plan implementation. Figure 1 illustrates the role of pre-service teachers in case learning. In the first step, pre-service teachers read materials related to A&H, select suitable ones, and report their ideas on IAT lesson design to mentors, in-service teachers, and fellow pre-service teachers.

Figure 1 (Note: A&H refers to arts and humanities.)

In the second step, they liaise with the in-service teachers responsible for implementing the lesson plan, discussing the integration of A&H into teaching practices. Pre-service teachers then analyze student learning objectives aligned with curriculum standards, collaboratively designing the IAT lesson plan with in-service teachers. Subsequently, pre-service teachers present lesson plans for feedback from mentors and other in-service teachers.

In the third step, pre-service teachers observe the lesson plan’s implementation, gathering and analyzing feedback from students and in-service teachers using an inductive approach (Merriam 1998). Feedback includes opinions on the roles and values of A&H, perceptions of the teaching effect, and recommendations for lesson plan implementation and modification. The second and third steps may iterate multiple times to refine the IAT lesson plan. In the fourth step, pre-service teachers consolidate all data, including various versions of teaching instructions, classroom videos, feedback, and discussion notes, and compose reflection notes. Finally, pre-service teachers collaborate with in-service teachers to compile the IAT case report and submit it for publication.

Stage 3 Micro-teaching

Figure 2 illustrates the role of pre-service teachers in micro-teaching. Before entering the micro-classrooms Footnote 3 , all discussions and communications occur within the pre-service teacher group, excluding mentors and in-service teachers. After designing the IAT lesson plan, pre-service teachers take turns implementing 40-min lesson plans in a simulated micro-classroom setting. Within this simulated environment, one pre-service teacher acts as the teacher, while the others, including mentors, in-service teachers, and fellow pre-service teachers, assume the role of students Footnote 4 . Following the simulated teaching, the implementer reviews the video of their session and self-assesses their performance. Subsequently, the implementer receives feedback from other pre-service teachers, mentors, and in-service teachers. Based on this feedback, the implementer revisits steps 2 and 3, revising the lesson plan and conducting the simulated teaching again. This iterative process typically repeats at least three times, until the mentors, in-service teachers, and other pre-service teachers are satisfied with the implementation of the revised lesson plan. Finally, pre-service teachers complete reflection notes and submit a summary of their reflections on the micro-teaching experience. Each pre-service teacher is required to choose at least three topics and undergo at least nine simulated teaching sessions.

Figure 2

Stage 4 Micro-course development

While pre-service teachers may not have the opportunity to execute whole lesson plans in real classrooms, they can design and create five-minute micro-courses Footnote 5 before class and subsequently present these videos to actual students. The process of developing micro-courses closely mirrors that of developing IAT cases in the case learning stage (see Fig. 1). However, in Step 3, pre-service teachers assume dual roles, serving not only as observers of IAT lesson implementation but also as implementers of a five-minute IAT micro-course.

Stage 5 Classroom teaching

Pre-service teachers undertake the implementation of IAT lesson plans independently, a process resembling micro-teaching (see Fig. 2). However, they engage with real school students in partner schools Footnote 6 instead of simulated classrooms. Furthermore, they collect feedback not only from mentors, in-service teachers, and fellow pre-service teachers but also from real students.

To give readers a better understanding of the framework, we present vignettes of a pre-service teacher’s learning and teaching experiences in one of the teacher professional development programs based on the framework. In addition, we use teacher self-efficacy as an indicator of the framework’s effectiveness, detailing the changes in the pre-service teacher’s self-efficacy.

Research design

Research method

Teacher self-efficacy can be measured both quantitatively and qualitatively (Bandura 1986, 1997; Lee and Bobko 1994; Soprano and Yang 2013; Unfried et al. 2022). However, researchers and theorists in the area of teacher self-efficacy have called for more qualitative and longitudinal studies (Klassen et al. 2011). As critics have noted, most studies rely on correlational, cross-sectional data from self-report surveys, while qualitative studies of teacher efficacy have been overwhelmingly neglected (Henson 2002; Klassen et al. 2011; Tschannen-Moran et al. 1998; Xenofontos and Andrews 2020). There is an urgent need for more longitudinal studies to shed light on the development of teacher efficacy (Klassen et al. 2011; Xenofontos and Andrews 2020).

This study utilized a longitudinal qualitative case study methodology to delve deeply into the context (Corden and Millar 2007; Dicks et al. 2023; Henderson et al. 2012; Jiang et al. 2021; Matusovich et al. 2010; Shirani and Henwood 2011), presenting details grounded in real-life situations and analyzing inner relationships rather than generalizing findings about changes in a large group of pre-service teachers’ self-efficacy.

Participant

This study forms a component of a broader multi-case research initiative examining teachers’ professional learning in STEAM teacher professional development programs in China (Jiang et al. 2021; Wang et al. 2018; Wang et al. 2024). Within this context, one participant, Shuitao (pseudonym), is selected and reported on in this study. Shuitao was a first-year graduate student at a first-tier normal university in Shanghai, China; normal universities specialize in teacher education. Her graduate major was mathematics curriculum and instruction, in which teaching practice courses are offered exclusively during the third year of study. The selection of Shuitao was driven by three primary factors. Firstly, Shuitao attended the entire teacher professional development program and actively engaged in nearly all associated activities. Table 2 illustrates the timeline of the five stages in which Shuitao was involved. Secondly, her undergraduate major was applied mathematics, which is not related to mathematics teaching Footnote 7 . She possessed no prior teaching experience and had not systematically studied IAT before joining the teacher professional development program. Thirdly, her other master’s courses during her first two years of study focused on mathematics education theory and did not include IAT Footnote 8 . Additionally, she scarcely participated in any teaching practice outside the teacher professional development program. As a pre-service teacher, Shuitao harbored a keen interest in IAT. She also found that she possessed fewer teaching skills than peers who had majored in education as undergraduates, and hence had a strong desire to enhance her teaching skills. Consequently, Shuitao decided to participate in our teacher professional development program.

Shuitao was grouped with three other first-year graduate students during the teacher professional development program. She actively collaborated with them at every stage of the program. For instance, they advised each other on their IAT lesson designs, observed each other’s IAT practice and offered constructive suggestions for improvement.

Research questions

Shuitao was a mathematics pre-service teacher who participated in one of our teacher professional development programs, focusing on integrating history into mathematics teaching (IHT) Footnote 9 . Notably, this program was designed based on our five-stage framework for teacher professional development programs of IAT. To examine the program’s impact on Shuitao’s self-efficacy related to IHT, this case study addresses the following research questions:

What changes in Shuitao’s self-efficacy in individual performance regarding integrating history into mathematics teaching (SE-IHT-IP) may occur through participation in the teacher professional development program?

What changes in Shuitao’s self-efficacy in student outcomes regarding integrating history into mathematics teaching (SE-IHT-SO) may occur through participation in the teacher professional development program?

Data collection and analysis

Before Shuitao joined the teacher professional development program, a one-hour preliminary interview was conducted in which she was guided to narrate her psychological and cognitive state regarding IHT.

During the teacher professional development program, follow-up unstructured interviews were conducted once a month with Shuitao. All discussions in the development of IHT cases were recorded, Shuitao’s teaching and micro-teaching were videotaped, and the reflection notes, journals, and summary reports written by Shuitao were collected.

After completing the teacher professional development program, Shuitao participated in a semi-structured three-hour interview. The objectives of this interview were twofold: to reassess her self-efficacy and to explore the relationship between her self-efficacy changes and each stage of the teacher professional development program.

Interview data, discussion recordings, reflection notes, journals, summary reports, videos, and analysis records were archived and transcribed before, during, and after the teacher professional development program.

In this study, we primarily utilized data from seven interviews: one conducted before the teacher professional development program, five conducted after each stage of the program, and one conducted upon completion of the program. Additionally, we reviewed Shuitao’s five reflective notes, which were written after each stage, as well as her final summary report that encompassed the entire teacher professional development program.

Merriam’s (1998) approach to coding data and an inductive approach to retrieving possible concepts and themes were employed in a seven-stage method; considering theoretical underpinnings when interpreting data is common in qualitative research (Strauss and Corbin 1990). First, a list based on our conceptual framework of teacher self-efficacy (see Table 1) was developed; it comprised two codes (SE-IHT-IP and SE-IHT-SO). Second, all data were sorted chronologically and read and reread for understanding. Third, texts were coded using multi-colored highlighting and comment balloons. Fourth, the data were examined for groups of meanings, themes, and behaviors, and we confirmed how these groups connected within the conceptual framework of teacher self-efficacy. Fifth, after comparing, confirming, and modifying, the selective codes were extracted and mapped onto the two categories of the conceptual framework. Accordingly, changes in SE-IHT-IP and SE-IHT-SO at the five stages of the teacher professional development program were identified, yielding the preliminary findings (Strauss and Corbin 1990). In practice, SE-IHT-IP and SE-IHT-SO were frequently intertwined in Shuitao’s narratives. Our coding process differentiated between them, giving a more distinct picture of how these two aspects of teacher self-efficacy evolved over time and enabling us to address the two research questions.
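The bookkeeping behind the chronological sorting and code mapping described above can be sketched as follows. The segments, stages, and excerpts below are invented abbreviations for illustration only; the actual analysis was performed manually on full interview transcripts.

```python
from collections import defaultdict

# Hypothetical coded segments: (stage, code, abbreviated excerpt).
# The two codes come from the study's conceptual framework of teacher self-efficacy.
segments = [
    (1, "SE-IHT-IP", "I can complete the critical first step well"),
    (2, "SE-IHT-IP", "I could integrate the history into instructional designs"),
    (2, "SE-IHT-SO", "I have a little bit more confidence in the effects"),
    (3, "SE-IHT-IP", "My teaching proficiency and fluency was improved"),
    (4, "SE-IHT-SO", "My micro-course helped students understand"),
]

# Sort chronologically (second stage of the method) and group by code (fifth stage)
# so changes in SE-IHT-IP and SE-IHT-SO can be traced stage by stage.
by_code = defaultdict(list)
for stage, code, excerpt in sorted(segments):
    by_code[code].append((stage, excerpt))

for code, items in sorted(by_code.items()):
    print(code, "->", [stage for stage, _ in items])
# SE-IHT-IP -> [1, 2, 3]
# SE-IHT-SO -> [2, 4]
```

Grouping this way makes the intertwining of the two codes visible: a single stage can contribute evidence to both categories, which mirrors how the two aspects of self-efficacy co-occurred in the narratives.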

Reliability and validity

Two researchers independently analyzed the data to establish inter-rater reliability, which was kappa = 0.959. Stake (1995) suggested that the most critical assertions in a study require the greatest effort toward confirmation. In this study, three methods served this purpose and helped ensure the validity of the findings. The first way to substantiate statements about changes in self-efficacy was to revisit each transcript and confirm whether the participant explicitly acknowledged the changes (Yin 2003); this check was repeated throughout the analysis. The second way to confirm patterns in the data was to examine whether Shuitao’s statements were replicated across separate interviews (Morris and Usher 2011). The third approach involved presenting the preliminary conclusions to Shuitao and giving her the opportunity to provide feedback on the data and conclusions, to ascertain whether we had accurately grasped the intentions behind her statements and whether our subjective interpretations had inadvertently influenced the analysis. Additionally, data from diverse sources were analyzed by at least two researchers, with all researchers reaching consensus on each finding.
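As a rough illustration of how an agreement statistic of this kind is computed, Cohen’s kappa for two raters assigning transcript segments to codes can be sketched in a few lines. The rater lists below are hypothetical toy data, not the study’s actual codings.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters coding the same sequence of segments."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of segments both raters coded identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal code frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical codes ("IP" = SE-IHT-IP, "SO" = SE-IHT-SO) for ten segments.
rater_1 = ["IP", "IP", "SO", "SO", "IP", "SO", "IP", "SO", "IP", "SO"]
rater_2 = ["IP", "IP", "SO", "SO", "IP", "SO", "IP", "SO", "SO", "SO"]
print(round(cohens_kappa(rater_1, rater_2), 3))  # → 0.8
```

A kappa near 0.96, as reported here, indicates that the two coders agreed almost perfectly after accounting for chance agreement.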

As each stage of our teacher professional development programs spanned a minimum of three months, numerous documented statements regarding the enhancement of Shuitao’s self-efficacy regarding IHT were recorded. Notably, what we present here offers only a concise overview of findings derived from our qualitative analysis. The changes in Shuitao’s SE-IHT-IP and SE-IHT-SO are organized chronologically, delineating the period before and during the teacher professional development program.

Before the teacher professional development program: “I have no confidence in IHT”

Before the teacher professional development program, Shuitao frequently expressed her lack of confidence in IHT. On the one hand, Shuitao expressed considerable apprehension about her individual performance in IHT. “How can I design and implement IHT lesson plans? I do not know anything [about it]…” With a sense of doubt, confusion and anxiety, Shuitao voiced her lack of confidence in her ability to design and implement an IHT case that would meet the requirements of the curriculum standards. Regarding the reasons for her lack of confidence, Shuitao attributed it to her insufficient theoretical knowledge and practical experience in IHT:

I do not know the basic approaches to IHT that I could follow… it is very difficult for me to find suitable historical materials… I am very confused about how to organize [historical] materials logically around the teaching goals and contents… [Furthermore,] I am [a] novice, [and] I have no IHT experience.

On the other hand, Shuitao articulated very low confidence in the efficacy of her IHT on student outcomes:

I think my IHT will have a limited impact on student outcomes… I do not know any specific effects [of history] other than making students interested in mathematics… In fact, I always think it is difficult for [my] students to understand the history… If students cannot understand [the history], will they feel bored?

This statement suggests that Shuitao did not fully grasp the significance of IHT. In fact, she knew little about the educational significance of history for students, and she harbored no belief that her IHT approach could positively impact students. In sum, her SE-IHT-SO was very low.

After stage 1: “I can do well in the first step of IHT”

After Stage 1, Shuitao indicated a slight improvement in her confidence in IHT. She attributed this improvement to her acquisition of theoretical knowledge in IHT, the approaches for selecting history-related materials, and an understanding of the educational value of history.

One of Shuitao’s primary concerns about implementing IHT before the teacher professional development program was the challenge of sourcing suitable history-related materials. However, after Stage 1, Shuitao explicitly affirmed her capability in this aspect, sharing as an example her experience of organizing historical materials on logarithms.

Recognizing the significance of suitable history-related materials in effective IHT implementation, Shuitao acknowledged that conducting literature studies significantly contributed to enhancing her confidence in undertaking this initial step. Furthermore, she expressed increased confidence in designing IHT lesson plans by utilizing history-related materials aligned with teaching objectives derived from the curriculum standards. In other words, her SE-IHT-IP was enhanced. She said:

After experiencing multiple discussions, I gradually know more about what kinds of materials are essential and should be emphasized, what kinds of materials should be adapted, and what kinds of materials should be omitted in the classroom instructions… I have a little confidence to implement IHT that could meet the requirements [of the curriculum standards] since now I can complete the critical first step [of IHT] well…

However, despite the improvement in her confidence in IHT following Stage 1, Shuitao also expressed some concerns. She articulated uncertainty regarding her performance in the subsequent stages of the teacher professional development program. Consequently, her confidence in IHT experienced only a modest increase.

After stage 2: “I participate in the development of IHT cases, and my confidence is increased a little bit more”

Following Stage 2, Shuitao reported further increased confidence in IHT. She attributed this growth to two main factors. Firstly, she successfully developed several instructional designs for IHT through collaboration with in-service teachers. These collaborative experiences enabled her to gain a deeper understanding of IHT approaches and enhance her pedagogical content knowledge in this area, consequently bolstering her confidence in her ability to perform effectively. Secondly, Shuitao observed the tangible impact of IHT cases on students in real classroom settings, which reinforced her belief in the efficacy of IHT. These experiences instilled in her a greater sense of confidence in her capacity to positively influence her students through her implementation of IHT. Shuitao remarked that she gradually understood how to integrate suitable history-related materials into her instructional designs (e.g., employing a genetic approach Footnote 10 ), considering this the second important step of IHT. She shared her experience of developing an IHT instructional design on the concept of logarithms. After creating several iterations of IHT instructional designs, Shuitao emphasized that her SE-IHT-IP had strengthened. She expressed belief in her ability to apply these approaches to IHT, as well as the pedagogical content knowledge of IHT acquired through practical experience, in her future teaching endeavors. The following is an excerpt from the interview:

I learned some effective knowledge, skills, techniques and approaches [to IHT]… By employing these approaches, I thought I could [and] I had the confidence to integrate the history into instructional designs very well… For instance, [inspired] by the genetic approach, we designed a series of questions and tasks based on the history of logarithms. The introduction of the new concept of logarithms became very natural, and it perfectly met the requirements of our curriculum standards, [which] asked students to understand the necessity of learning the concept of logarithms…

Shuitao actively observed the classroom teaching conducted by her cooperating in-service teacher and helped her collect and analyze students’ feedback. Discussions then ensued on how to improve the instructional designs based on this feedback, and the refined IHT instructional designs were re-implemented by the in-service teacher. After three rounds of developing IHT cases, Shuitao became increasingly convinced of the significance and efficacy of integrating history into teaching practices, as evidenced by the following excerpt:

The impacts of IHT on students are visible… For instance, more than 93% of the students mentioned in the open-ended questionnaires that they became more interested in mathematics because of the [historical] story of Napier… For another example, according to the results of our surveys, more than 75% of the students stated that they knew log_a(M + N) = log_a M × log_a N was wrong because of history… I have a little bit more confidence in the effects of my IHT on students.

This excerpt highlights that Shuitao’s SE-IHT-SO was enhanced. She attributed this enhancement to her realization of the compelling nature of history and her belief in her ability to effectively leverage its power to positively influence her students’ cognitive and emotional development. This also underscores the importance of reinforcing pre-service teachers’ awareness of the significance of history. Nonetheless, Shuitao elucidated that she still retained concerns regarding the effectiveness of her IHT implementation. Her following statement sheds light on why her self-efficacy experienced only a marginal increase at this stage:

Knowing how to do it successfully and doing it successfully in practice are two totally different things… I can develop IHT instructional designs well, but I have no idea whether I can implement them well and whether I can introduce the history professionally in practice… My cooperating in-service teacher has a long history of teaching mathematics and gains rich experience in educational practices… If I cannot acquire some required teaching skills and capabilities, I still cannot influence my students powerfully.

After stage 3: “Practice makes perfect, and my SE-IHT-IP is steadily enhanced after a hit”

After successfully developing IHT instructional designs, the next critical step was the implementation of these designs. Drawing from her observations of her cooperating in-service teachers’ IHT implementations and discussions with other pre-service teachers, Shuitao developed her own IHT lesson plans. In Stage 3, she conducted simulated teaching sessions and evaluated her teaching performance ten times Footnote 11 . Shuitao claimed that her SE-IHT-IP steadily improved over the course of these sessions. According to Shuitao, two main processes in Stage 3 facilitated this steady enhancement of SE-IHT-IP.

On the one hand, through the repeated implementation of simulated teaching sessions, Shuitao’s teaching proficiency and fluency markedly improved. Shuitao first described the importance of teaching proficiency and fluency:

Since the detailed history is not included in our curriculum standards and textbooks, if I use my historical materials in class, I have to teach more contents than traditional teachers. Therefore, I have to teach proficiently so that teaching pace becomes a little faster than usual… I have to teach fluently so as to use each minute efficiently in my class. Otherwise, I cannot complete the teaching tasks required [by curriculum standards].

As Shuitao said, at the beginning of Stage 3, her self-efficacy even decreased because she lacked teaching proficiency and fluency and was unable to complete the required teaching tasks:

In the first few times of simulated teaching, I always needed to think for a second about what I should say next when I finish one sentence. I also felt very nervous when I stood in the front of the classrooms. This made my narration of the historical story between Briggs and Napier not fluent at all. I paused many times to look for some hints on my notes… All these made me unable to complete the required teaching tasks… My [teaching] confidence took a hit.

Shuitao quoted the proverb, “practice makes perfect”, and she emphasized that it was repeated practice that improved her teaching proficiency and fluency:

I thought I had no other choice but to practice IHT repeatedly… [At the end of Stage 3,] I could naturally remember most words that I should say when teaching the topics that I selected… My teaching proficiency and fluency was improved through my repeated review of my instructional designs and implementation of IHT in the micro-classrooms… With the improvement [of my teaching proficiency and fluency], I could complete the teaching tasks, and my confidence was increased as well.

In addition, Shuitao also mentioned that through this kind of self-exploration in simulated teaching practice, her teaching skills and capabilities (e.g., blackboard writing, language organization, etc.) improved. This process greatly helped her enhance her SE-IHT-IP.

On the other hand, Shuitao’s simulated teaching was assessed by herself as well as by mentors, in-service teachers, and fellow pre-service teachers. This comprehensive evaluation process played a pivotal role in enhancing her individual performance and self-efficacy. Reflecting on this aspect, Shuitao articulated the following sentiments in one of her reflection reports:

By watching the videos, conducting self-assessment, and collecting feedback from others, I can understand what I should improve or emphasize in my teaching. [Then,] I think my IHT can better meet the requirements [of curriculum standards]… I think my teaching performance is getting better and better.

After stage 4: “My micro-courses influenced students positively, and my SE-IHT-SO is steadily enhanced”

In Stage 4, Shuitao commenced by creating 5-min micro-course videos. She then played these videos in her cooperating in-service teacher’s authentic classroom and collected student feedback. The micro-course was played at the end of her cooperating in-service teacher’s lesson Footnote 12 . Shuitao wrote in her reflections that this micro-course on logarithms helped students better understand the nature of mathematics:

According to the results of our surveys, many students stated that they knew the development and evolution of the concept of logarithms is a long process and many mathematicians from different countries have contributed to the development of the concept of logarithms… This indicated that my micro-course helped students better understand the nature of mathematics… My micro-course about the history informed students that mathematics is an evolving and human subject and helped them understand the dynamic development of the [mathematics] concept…

Meanwhile, Shuitao’s micro-course positively influenced some students’ beliefs about mathematics. As evident from the quote below, integrating historical context into mathematics teaching transformed students’ perception of the subject, boosting Shuitao’s confidence as well.

Some students’ responses were very exciting… [O]ne [typical] response stated, he always regarded mathematics as [an] abstract, boring, and dreadful subject; but after seeing the photos of mathematicians and great men and learning the development of the concept of logarithms through the micro-course, he found mathematics could be interesting. He wanted to learn more [about] the interesting history… Such changes in students made me confident.

Furthermore, during post-class interviews, several students expressed their recognition of the significance of the logarithms concept to Shuitao, attributing this realization to the insights provided by prominent figures in the micro-courses. They also conveyed their intention to exert greater effort in mastering the subject matter. This feedback made Shuitao believe that her IHT had the potential to positively influence students’ attitudes towards learning mathematics.

In summary, Stage 4 marked Shuitao’s first opportunity to directly impact students through her IHT in authentic classroom settings. Despite implementing only brief 5-min micro-courses integrating history during each session, the effectiveness of her short IHT implementation was validated by student feedback. Shuitao unequivocally expressed that students actively engaged with her micro-courses and that these sessions positively influenced them, including attitudes and motivation toward mathematics learning, understanding of mathematics concepts, and beliefs regarding mathematics. These collective factors contributed to a steady enhancement of her confidence in SE-IHT-SO.

After stage 5: “My overall self-efficacy is greatly enhanced”

Following Stage 5, Shuitao reported a significant increase in her overall confidence in IHT, attributing it to gaining mastery through successful implementations of IHT in real classroom settings. On the one hand, Shuitao successfully designed and executed her IHT lesson plans, consistently achieving the teaching objectives mandated by curriculum standards. This significantly enhanced her SE-IHT-IP. On the other hand, as Shuitao’s IHT implementation directly influenced her students, her confidence in SE-IHT-SO experienced considerable improvement.

According to Bandura (1997), mastery experience is the most powerful source of self-efficacy. Shuitao’s statements confirmed this. As she claimed, her enhanced SE-IHT-IP in Stage 5 mainly came from the experience of successful implementations of IHT in real classrooms:

[Before the teacher professional development program,] I had no idea about implementing IHT… Now, I successfully implemented IHT in senior high school [classrooms] many times… I can complete the teaching tasks and even better completed the teaching objectives required [by the curriculum standards]… The successful experience greatly enhances my confidence to perform well in my future implementation of IHT… Yeah, I think the successful teaching practice experience is the strongest booster of my confidence.

At the end of Stage 5, Shuitao’s mentors and in-service teachers gave her a high evaluation. For instance, after Shuitao’s IHT implementation of the concept of logarithms, all mentors and in-service teachers consistently provided feedback that her teaching illustrated the necessity of learning the concept of logarithms and met the requirements of the curriculum standards very well. This kind of verbal persuasion (Bandura 1997) enhanced her SE-IHT-IP.

Similarly, Shuitao’s successful experience of influencing students positively through IHT, as one kind of mastery experience, powerfully enhanced her SE-IHT-SO. She described her changes in SE-IHT-SO as follows:

I could not imagine my IHT could be so influential [before]… But now, my IHT implementation directly influenced students in so many aspects… When I witnessed students’ real changes in various cognitive and affective aspects, my confidence was greatly improved.

Shuitao described the influence of her IHT implementation of the concept of logarithms on her students. The depiction is grounded in the outcomes of surveys conducted by Shuitao following her implementation. Shuitao asserted that these results filled her with excitement and confidence regarding her future implementation of IHT.

In summary, following Stage 5 of the teacher professional development program, Shuitao experienced a notable enhancement in her overall self-efficacy, primarily attributed to her successful practical experience in authentic classroom settings during this stage.

A primary objective of our teacher professional development programs is to equip pre-service teachers with the skills and confidence needed to implement IAT effectively. Our findings show that one teacher professional development program significantly augmented a participant's TSE-IHT across two dimensions: individual performance and student outcomes. Considering the pressing need to provide STEAM teachers with effective professional training (e.g., Boice et al. 2021; Duong et al. 2024; Herro et al. 2019; Jacques et al. 2020; Park and Cho 2022; Perignat and Katz-Buonincontro 2019), the proposed five-stage framework holds significant promise in both theoretical and practical realms. Furthermore, this study offers a viable way to address the prevalent issue of low teacher self-efficacy in interdisciplinary teaching, including IAT, which is critical in STEAM education (Zhou et al. 2023). This study thus holds the potential to make unique contributions to the existing literature on teacher self-efficacy, teacher professional learning models, and the design of teacher professional development programs of IAT.

Firstly, this study enhances our understanding of the development of teacher self-efficacy. Our findings further confirm the complexity of this development. On the one hand, the observed enhancement of the participant's teacher self-efficacy did not occur swiftly but unfolded gradually through a protracted, incremental process. It is also noteworthy that the participant's self-efficacy exhibited fluctuations, underscoring that the augmentation of teacher self-efficacy is neither straightforward nor linear. On the other hand, the study elucidated that the augmentation of teacher self-efficacy constitutes an intricate, multi-level system that interacts with teacher knowledge, skills, and other beliefs. This finding resonates with prior research on teacher self-efficacy (Morris et al. 2017; Xenofontos and Andrews 2020). For example, our study revealed that Shuitao's enhancement of SE-IHT-SO appeared to be interwoven with her continuous comprehension of the significance of the A&H in classroom settings. Similarly, the participant progressively acknowledged the educational value of A&H in classroom contexts in tandem with the stepwise enhancement of her SE-IHT-SO. Factors such as the participant's pedagogical content knowledge of IHT, instructional design, and teaching skills were also identified as pivotal components of SE-IHT-IP. This finding corroborates Morris and Usher's (2011) assertion that sustained improvements in self-efficacy stem from developing teachers' skills and knowledge. With the bolstering of SE-IHT-IP, the participant's related teaching skills and content knowledge also exhibited improvement.

Methodologically, many researchers advocate for qualitative investigations into self-efficacy (e.g., Klassen et al. 2011; Philippou and Pantziara 2015; Wyatt 2015; Xenofontos and Andrews 2020). While acknowledging limitations in sample scope and the generalizability of the findings, this study offers a longitudinal perspective on the stage-by-stage development of teacher self-efficacy and its interactions with different factors (i.e., teacher knowledge, skills, and beliefs), which are often ignored by quantitative studies. Considering that studies of self-efficacy have been predominantly quantitative, typically drawing on survey techniques and pre-determined scales (Xenofontos and Andrews 2020; Zhou et al. 2023), this study highlights the need for greater attention to qualitative studies so that more cultural, situational and contextual factors in the development of self-efficacy can be captured.

Our study provides valuable practical implications for enhancing pre-service teachers' self-efficacy. We conceptualize teacher self-efficacy in two primary dimensions: individual performance and student outcomes. On the one hand, pre-service teachers can enhance their teaching qualities, boosting their self-efficacy in individual performance. The adage "practice makes perfect" underscores the necessity of ample teaching practice opportunities for pre-service teachers who lack prior teaching experience. Engaging in consistent and reflective practice helps them develop confidence in their teaching qualities. On the other hand, pre-service teachers should attend to positive feedback from their students, reinforcing their self-efficacy in student outcomes. Positive student feedback serves as an affirmation of their teaching effectiveness and encourages continuous improvement. Furthermore, our findings highlight the significance of mentors' and peers' positive feedback as critical sources of teacher self-efficacy. Mentors and peers play a pivotal role in the professional growth of pre-service teachers by actively encouraging them and recognizing their teaching achievements. Constructive feedback from experienced mentors and supportive peers fosters a collaborative learning environment and bolsters the self-confidence of pre-service teachers. Additionally, our research indicates that pre-service teachers' self-efficacy may fluctuate. Therefore, mentors should be prepared to help pre-service teachers manage teaching challenges and setbacks, and alleviate any teaching-related anxiety. Through emotional support and guidance, mentors can help pre-service teachers build resilience and maintain a positive outlook on their teaching journey. Moreover, a strong correlation exists between teacher self-efficacy and teacher knowledge and skills. Enhancing pre-service teachers' knowledge base and instructional skills is therefore crucial for bolstering their overall self-efficacy.

Secondly, this study also responds to the appeal to understand teachers' professional learning from a holistic perspective and to interrelate teachers' professional learning process with student outcome variables (Sancar et al. 2021), and thus contributes to the understanding of the complexity of STEAM teachers' professional learning. On the one hand, we have confirmed Cai et al.'s (2020) teacher professional learning model in a new context, namely STEAM teacher education. Throughout the teacher professional development program, the pre-service teacher, Shuitao, demonstrated an augmentation in her knowledge, encompassing both content knowledge and pedagogical understanding concerning IHT. Moreover, her beliefs regarding IHT transformed as a result of her engagement in teacher learning across the five stages. This facilitated her in executing effective IHT teaching and improving her students' outcomes. On the other hand, notably, in our studies (including this current study and some follow-up studies), student feedback is a pivotal tool for assisting teachers in discerning the impact they are having. This enables pre-service teachers to grasp the actual efficacy of their teaching efforts and subsequently contributes significantly to the augmentation of their self-efficacy. Such steps have seldom been taken in prior studies (e.g., Cai et al. 2020), where student outcomes are often perceived solely as the results of teachers' instruction rather than as sources informing teacher beliefs. Additionally, this study has validated both the interaction between teaching performance and teacher beliefs and the interaction between teacher knowledge and teacher beliefs. These aspects were overlooked in Cai et al.'s (2020) model.
More importantly, while Clarke and Hollingsworth’s ( 2002 ) Interconnected Model of Professional Growth illustrates the connections between the domain of consequence and the personal domain, as well as between the personal domain and the domain of practice, it does not adequately clarify the complex relationships among the factors within the personal domain (e.g., the interaction between teacher knowledge and teacher beliefs). Therefore, our study also supplements Clarke and Hollingsworth’s ( 2002 ) model by addressing these intricacies. Based on our findings, an updated model of teacher professional learning has been proposed, as shown in Fig. 3 . This expanded model indicates that teacher learning should be an ongoing and sustainable process, with the enhancement of student learning not marking the conclusion of teacher learning, but rather serving as the catalyst for a new phase of learning. In this sense, we advocate for further research to investigate the tangible impacts of teacher professional development programs on students and how those impacts stimulate subsequent cycles of teacher learning.

Figure 3

Note: Paths in blue were proposed by Cai et al. ( 2020 ), and paths in yellow are proposed and verified in this study.

Thirdly, in light of the updated model of teacher professional learning (see Fig. 3 ), this study provides insights into the design of teacher professional development programs of IAT. According to Huang et al. ( 2022 ), to date, very few studies have set goals to “develop a comprehensive understanding of effective designs” for STEM (or STEAM) teacher professional development programs (p. 15). To fill this gap, this study proposes a novel and effective five-stage framework for teacher professional development programs of IAT. This framework provides a possible and feasible solution to the challenges of STEAM teacher professional development programs’ design and planning, and teachers’ IAT practice (Boice et al. 2021 ; Herro et al. 2019 ; Jacques et al. 2020 ; Park and Cho 2022 ; Perignat and Katz-Buonincontro 2019 ).

Specifically, our five-stage framework incorporates at least six important features. Firstly, teacher professional development programs should focus on specific STEAM content. Given the expansive nature of STEAM, teacher professional development programs cannot feasibly encompass all facets of its content. Consistent with recommendations by Cai et al. (2020), Desimone et al. (2002) and Garet et al. (2001), an effective teacher professional development program should prioritize content focus. Our five-stage framework is centered on IAT. Throughout its 18-month duration, each pre-service teacher is limited to selecting one subcomponent of A&H, such as history, for integration into their subject teaching (i.e., mathematics teaching, technology teaching or science teaching) within one teacher professional development program. Secondly, in response to the appeals that teacher professional development programs should shift from emphasizing teaching and instruction to emphasizing student learning (Cai et al. 2020; Calabrese et al. 2024; Hwang et al. 2024; Marco and Palatnik 2024; Örnek and Soylu 2021), our framework requires pre-service teachers to pay close attention to the effects of IAT on student learning outcomes and to use students' feedback as the basis for improving their instruction. Thirdly, prior studies found that teacher education with a preference for theory led to pre-service teachers' dissatisfaction with the quality of teacher professional development programs and hindered the development of pre-service teachers' teaching skills and teaching beliefs, which also widened the gap between theory and practice (Hennissen et al. 2017; Ord and Nuttall 2016). In this regard, our five-stage framework connects theory and teaching practice closely. In particular, pre-service teachers can experience the values of IAT not only through theoretical learning but also through diverse teaching practices.
Fourthly, we build a teacher community of practice tailored for pre-service teachers. Additionally, we aim to encourage greater participation of in-service teachers in such teacher professional development programs designed for pre-service educators in STEAM teacher education. By engaging in such programs, in-service teachers can offer valuable teaching opportunities for pre-service educators and contribute their insights and experiences from teaching practice. Importantly, pre-service teachers stand to gain from the in-service teachers' familiarity with textbooks, their subject matter expertise, and their better understanding of student dynamics. Fifthly, our five-stage framework lasts for an extended period, spanning 18 months. This duration ensures that pre-service teachers engage in a sustained and comprehensive learning journey. Lastly, our framework facilitates a practical understanding of “integration” by offering detailed, sequential instructions for blending two disciplines in teaching. For example, our teacher professional development programs prioritize systematic learning of pedagogical theories and simulated teaching experiences before pre-service teachers embark on real STEAM teaching endeavors. This approach is designed to mitigate the risk of unsuccessful experiences during initial teaching efforts, thereby safeguarding pre-service teachers' self-efficacy. Considering the complexity of “integration” in interdisciplinary teaching practices, including IAT (Han et al. 2022; Ryu et al. 2019), we believe detailed stage-by-stage and step-by-step instructions are crucial components of relevant pre-service teacher professional development programs. Notably, this aspect, emphasizing structured instructional guidance, has not been explicitly addressed in prior research (e.g., Cai et al. 2020).
Figure 4 illustrates the six important features outlined in this study, encompassing both established elements and the novel additions proposed herein, which together characterize an effective teacher professional development program.

Figure 4

Note: STEAM refers to science, technology, engineering, arts and humanities, and mathematics.

The successful implementation of this framework is also related to the Chinese teacher education system and cultural background. For instance, the Chinese government has promoted many university-school collaboration initiatives, encouraging in-service teachers to provide guidance and practical opportunities for pre-service teachers (Lu et al. 2019 ). Influenced by Confucian values emphasizing altruism, many experienced in-service teachers in China are eager to assist pre-service teachers, helping them better realize their teaching career aspirations. It is reported that experienced in-service teachers in China show significantly higher motivation than their international peers when mentoring pre-service teachers (Lu et al. 2019 ). Therefore, for the successful implementation of this framework in other countries, it is crucial for universities to forge close collaborative relationships with K-12 schools and actively involve K-12 teachers in pre-service teacher education.

Notably, approximately 5% of our participants dropped out midway as they found that the IAT practice was too challenging or felt overwhelmed by the number of required tasks in the program. Consequently, we are exploring options to potentially simplify this framework in future iterations.

Without minimizing the limitations of this study, it is important to recognize that a qualitative longitudinal case study can be a useful means of shedding light on the development of a pre-service STEAM teacher’s self-efficacy. However, this methodology did not allow for a pre-post or a quasi-experimental design, and the effectiveness of our five-stage framework could not be confirmed quantitatively. In the future, conducting more experimental or design-based studies could further validate the effectiveness of our framework and broaden our findings. Furthermore, future studies should incorporate triangulation methods and utilize multiple data sources to enhance the reliability and validity of the findings. Meanwhile, owing to space limitations, we could only report the changes in Shuitao’s SE-IHT-IP and SE-IHT-SO here, and we could not describe the teacher self-efficacy of other participants regarding IAT. While nearly all of the pre-service teachers experienced an improvement in their teacher self-efficacy concerning IAT upon participating in our teacher professional development programs, the processes of their change were not entirely uniform. We will need to report the specific findings of these variations in the future. Further studies are also needed to explore the factors contributing to these variations. Moreover, following this study, we are implementing more teacher professional development programs of IAT. Future studies can explore the impact of this framework on additional aspects of pre-service STEAM teachers’ professional development. This will help gain a more comprehensive understanding of its effectiveness and potential areas for further improvement. Additionally, our five-stage framework was initially developed and implemented within the Chinese teacher education system. Future research should investigate how this framework can be adapted in other educational systems and cultural contexts.

The impetus behind this study stems from the burgeoning discourse advocating for the integration of A&H disciplines into STEM education on a global scale (e.g., Land 2020 ; Park and Cho 2022 ; Uştu et al. 2021 ; Vaziri and Bradburn 2021 ). Concurrently, there exists a pervasive concern regarding the challenges teachers face in implementing STEAM approaches, particularly in the context of IAT practices (e.g., Boice et al. 2021 ; Herro et al. 2019 ; Jacques et al. 2020 ; Park and Cho 2022 ; Perignat and Katz-Buonincontro 2019 ). To tackle this challenge, we first proposed a five-stage framework designed for teacher professional development programs of IAT. Then, utilizing this innovative framework, we implemented a series of teacher professional development programs. Drawing from the recommendations of Bray-Clark and Bates ( 2003 ), Kelley et al. ( 2020 ) and Zhou et al. ( 2023 ), we have selected teacher self-efficacy as a key metric to examine the effectiveness of the five-stage framework. Through a qualitative longitudinal case study, we scrutinized the influence of a specific teacher professional development program on the self-efficacy of a single pre-service teacher over an 18-month period. Our findings revealed a notable enhancement in teacher self-efficacy across both individual performance and student outcomes. The observed enhancement of the participant’s teacher self-efficacy did not occur swiftly but unfolded gradually through a prolonged, incremental process. Building on our findings, an updated model of teacher learning has been proposed. The updated model illustrates that teacher learning should be viewed as a continuous and sustainable process, wherein teaching performance, teacher beliefs, and teacher knowledge dynamically interact with one another. The updated model also confirms that teacher learning is inherently intertwined with student learning in STEAM education. 
Furthermore, this study also summarizes effective design features of STEAM teacher professional development programs.

Data availability

The datasets generated and/or analyzed during this study are not publicly available due to general data protection regulations, but are available from the corresponding author on reasonable request.

In their review article, Morris et al. (2017) treated “teaching self-efficacy” and “teacher self-efficacy” as synonymous concepts. This perspective is also adopted in this study.

An effective teacher professional development program should have specific, focused, and clear content instead of broad and scattered ones. Therefore, each pre-service teacher can only choose to integrate one subcomponent of A&H into their teaching in one teacher professional development program. For instance, Shuitao, a mathematics pre-service teacher, participated in one teacher professional development program focused on integrating history into mathematics teaching. However, she did not explore the integration of other subcomponents of A&H into her teaching during her graduate studies.

In the micro-classrooms, multi-angle and multi-point high-definition video recorders are set up to record the teaching process.

In micro-teaching, mentors, in-service teachers, and other fellow pre-service teachers take on the roles of students.

In China, teachers can video-record one section of a lesson and play it in formal classes, a practice known as a micro-course. For instance, in one teacher professional development program of integrating history into mathematics teaching, micro-courses encompass various mathematics concepts, methods, ideas, history-related material and related topics. Typically, teachers use these micro-courses to broaden students' views, foster inquiry-based learning, and cultivate critical thinking skills. Such initiatives play an important role in improving teaching quality.

Many university-school collaboration initiatives in China focus on pre-service teachers’ practicum experiences (Lu et al. 2019 ). Our teacher professional development program is also supported by many K-12 schools in Shanghai. Personal information in videos is strictly protected.

In China, students are not required to pursue a graduate major that matches their undergraduate major. Most participants in our teacher professional development programs did not pursue undergraduate degrees in education-related fields.

Shuitao’s university reserves Wednesday afternoons for students to engage in various programs or clubs, as classes are not scheduled during this time. Similarly, our teacher professional development program activities are planned for Wednesday afternoons to avoid overlapping with participants’ other coursework commitments.

History is one of the most important components of A&H (Park and Cho 2022 ).

To learn more about genetic approach (i.e., genetic principle), see Jankvist ( 2009 ).

For the assessment process, see Fig. 2 .

Shuitao’s cooperating in-service teacher taught the concept of logarithms in Stage 2. In Stage 4, the teaching objective of her cooperating in-service teacher’s review lesson was to help students review the concept of logarithms to prepare students for the final exam.

Akiba M, Murata A, Howard C, Wilkinson B, Fabrega J (2019) Race to the top and lesson study implementation in Florida: District policy and leadership for teacher professional development. In: Huang R, Takahashi A, daPonte JP (eds) Theory and practice of lesson study in mathematics, pp. 731–754. Springer, Cham. https://doi.org/10.1007/978-3-030-04031-4_35

Alkhabra YA, Ibrahem UM, Alkhabra SA (2023) Augmented reality technology in enhancing learning retention and critical thinking according to STEAM program. Humanit Soc Sci Commun 10:174. https://doi.org/10.1057/s41599-023-01650-w

Alwafi EM, Downey C, Kinchin G (2020) Promoting pre-service teachers’ engagement in an online professional learning community: Support from practitioners. J Professional Cap Community 5(2):129–146. https://doi.org/10.1108/JPCC-10-2019-0027

Archibald S, Coggshall JG, Croft A, Goe L (2011) High-quality professional development for all teachers: effectively allocating resources. National Comprehensive Center for Teacher Quality, Washington, DC

Bandura A (1977) Self-efficacy: Toward a unifying theory of behavioral change. Psychol Rev 84:191–215. https://doi.org/10.1037/0033-295X.84.2.191

Bandura A (1986) The explanatory and predictive scope of self-efficacy theory. J Soc Clin Psychol 4:359–373. https://doi.org/10.1521/jscp.1986.4.3.359

Bandura A (1997) Self-efficacy: The exercise of control. Freeman, New York

Basckin C, Strnadová I, Cumming TM (2021) Teacher beliefs about evidence-based practice: A systematic review. Int J Educ Res 106:101727. https://doi.org/10.1016/j.ijer.2020.101727

Bray-Clark N, Bates R (2003) Self-efficacy beliefs and teacher effectiveness: Implications for professional development. Prof Educ 26(1):13–22

Blonder R, Benny N, Jones MG (2014) Teaching self-efficacy of science teachers. In: Evans R, Luft J, Czerniak C, Pea C (eds), The role of science teachers’ beliefs in international classrooms: From teacher actions to student learning, Sense Publishers, Rotterdam, Zuid-Holland, pp. 3–16

Boice KL, Jackson JR, Alemdar M, Rao AE, Grossman S, Usselman M (2021) Supporting teachers on their STEAM journey: A collaborative STEAM teacher training program. Educ Sci 11(3):105. https://doi.org/10.3390/educsci11030105

Cai J (2017) Longitudinally investigating the effect of teacher professional development on instructional practice and student learning: A focus on mathematical problem posing. The University of Delaware, Newark, DE

Cai J, Chen T, Li X, Xu R, Zhang S, Hu Y, Zhang L, Song N (2020) Exploring the impact of a problem-posing workshop on elementary school mathematics teachers’ conceptions on problem posing and lesson design. Int J Educ Res 102:101404. https://doi.org/10.1016/j.ijer.2019.02.004

Calabrese JE, Capraro MM, Viruru R (2024) Semantic structure and problem posing: Preservice teachers’ experience. School Sci Math. https://doi.org/10.1111/ssm.12648

Clarke D, Hollingsworth H (2002) Elaborating a model of teacher professional growth. Teach Teach Educ 18(8):947–967. https://doi.org/10.1016/S0742-051X(02)00053-7

Corden A, Millar J (2007) Time and change: A review of the qualitative longitudinal research literature for social policy. Soc Policy Soc 6(4):583–592. https://doi.org/10.1017/S1474746407003910

Darling-Hammond L, Hyler ME, Gardner M (2017) Effective teacher professional development. Learning Policy Institute, Palo Alto, CA

de la Garza A (2021) Internationalizing the curriculum for STEAM (STEM+ Arts and Humanities): From intercultural competence to cultural humility. J Stud Int Educ 25(2):123–135. https://doi.org/10.1177/1028315319888468

Desimone LM, Garet MS (2015) Best practices in teachers’ professional development in the United States. Psychol, Soc, Educ 7(3):252–263

Desimone LM, Porter AC, Garet MS, Yoon KS, Birman BF (2002) Effects of professional development on teachers’ instruction: Results from a three-year longitudinal study. Educ Evaluation Policy Anal 24(2):81–112. https://doi.org/10.3102/01623737024002081

Dicks SG, Northam HL, van Haren FM, Boer DP (2023) The bereavement experiences of families of potential organ donors: a qualitative longitudinal case study illuminating opportunities for family care. Int J Qualitative Stud Health Well-being 18(1):1–24. https://doi.org/10.1080/17482631.2022.2149100

Ding M, Huang R, Pressimone Beckowski C, Li X, Li Y (2024) A review of lesson study in mathematics education from 2015 to 2022: implementation and impact. ZDM Math Educ 56:87–99. https://doi.org/10.1007/s11858-023-01538-8

Duong NH, Nam NH, Trung TT (2024) Factors affecting the implementation of STEAM education among primary school teachers in various countries and Vietnamese educators: comparative analysis. Education 3–13. https://doi.org/10.1080/03004279.2024.2318239

English LD (2016) STEM education K-12: Perspectives on integration. Int J STEM Educ 3:3. https://doi.org/10.1186/s40594-016-0036-1

Garet MS, Porter AC, Desimone L, Birman BF, Yoon KS (2001) What makes professional development effective? Results from a national sample of teachers. Am Educ Res J 38(4):915–945. https://doi.org/10.3102/00028312038004915

Gates AE (2017) Benefits of a STEAM collaboration in Newark, New Jersey: Volcano simulation through a glass-making experience. J Geosci Educ 65(1):4–11. https://doi.org/10.5408/16-188.1

Geng J, Jong MSY, Chai CS (2019) Hong Kong teachers’ self-efficacy and concerns about STEM education. Asia-Pac Educ Researcher 28:35–45. https://doi.org/10.1007/s40299-018-0414-1

Han J, Kelley T, Knowles JG (2022) Building a sustainable model of integrated stem education: investigating secondary school STEM classes after an integrated STEM project. Int J Technol Design Educ. https://doi.org/10.1007/s10798-022-09777-8

Henderson S, Holland J, McGrellis S, Sharpe S, Thomson R (2012) Storying qualitative longitudinal research: Sequence, voice and motif. Qualitative Res 12(1):16–34. https://doi.org/10.1177/1468794111426232

Hennissen P, Beckers H, Moerkerke G (2017) Linking practice to theory in teacher education: A growth in cognitive structures. Teach Teach Educ 63:314–325. https://doi.org/10.1016/j.tate.2017.01.008

Henson RK (2002) From adolescent angst to adulthood: Substantive implications and measurement dilemmas in the development of teacher efficacy research. Educ Psychologist 37:137–150. https://doi.org/10.1207/S15326985EP3703_1

Herro D, Quigley C (2016) Innovating with STEAM in middle school classrooms: remixing education. Horizon 24(3):190–204. https://doi.org/10.1108/OTH-03-2016-0008

Herro D, Quigley C, Cian H (2019) The challenges of STEAM instruction: Lessons from the field. Action Teach Educ 41(2):172–190. https://doi.org/10.1080/01626620.2018.1551159

Huang B, Jong MSY, Tu YF, Hwang GJ, Chai CS, Jiang MYC (2022) Trends and exemplary practices of STEM teacher professional development programs in K-12 contexts: A systematic review of empirical studies. Comput Educ 189:104577. https://doi.org/10.1016/j.compedu.2022.104577

Hunter-Doniger T, Sydow L (2016) A journey from STEM to STEAM: A middle school case study. Clearing House 89(4-5):159–166. https://doi.org/10.1080/00098655.2016.1170461

Hwang S, Xu R, Yao Y, Cai J (2024) Learning to teach through problem posing: A teacher’s journey in a networked teacher−researcher partnership. J Math Behav 73:101120. https://doi.org/10.1016/j.jmathb.2023.101120

Jacques LA, Cian H, Herro DC, Quigley C (2020) The impact of questioning techniques on STEAM instruction. Action Teach Educ 42(3):290–308. https://doi.org/10.1080/01626620.2019.1638848


Acknowledgements

This research is funded by the 2021 National Natural Science Foundation of China (Grant No. 62177042), the 2024 Zhejiang Provincial Natural Science Foundation of China (Grant No. Y24F020039), and the 2024 Zhejiang Educational Science Planning Project (Grant No. 2024SCG247).

Author information

Xuesong Zhai

Present address: School of Education, City University of Macau, Macau, China

Authors and Affiliations

College of Education, Zhejiang University, Hangzhou, China

Haozhe Jiang & Xuesong Zhai

School of Engineering and Technology, CML‑NET & CREATE Research Centres, Central Queensland University, North Rockhampton, QLD, Australia

Ritesh Chugh

Hangzhou International Urbanology Research Center & Zhejiang Urban Governance Studies Center, Hangzhou, China

Department of Teacher Education, Nicholls State University, Thibodaux, LA, USA

School of Mathematical Sciences, East China Normal University, Shanghai, China

Xiaoqin Wang

College of Teacher Education, Faculty of Education, East China Normal University, Shanghai, China


Contributions

Conceptualization - Haozhe Jiang; methodology - Haozhe Jiang; software - Xuesong Zhai; formal analysis - Haozhe Jiang & Ke Wang; investigation - Haozhe Jiang; resources - Haozhe Jiang, Xuesong Zhai & Xiaoqin Wang; data curation - Haozhe Jiang & Ke Wang; writing—original draft preparation - Haozhe Jiang & Ritesh Chugh; writing—review and editing - Ritesh Chugh & Ke Wang; visualization - Haozhe Jiang, Ke Wang & Xiaoqin Wang; supervision - Xuesong Zhai & Xiaoqin Wang; project administration - Xuesong Zhai & Xiaoqin Wang; and funding acquisition - Xuesong Zhai & Xiaoqin Wang. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Xuesong Zhai .

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethical approval

This research was approved by the Committee for Human Research of East China Normal University (Number: HR 347-2022). The procedures used in this study adhere to the tenets of the Declaration of Helsinki.

Informed consent

Written informed consent was obtained from all participants in this study before data collection.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .


About this article

Cite this article

Jiang, H., Chugh, R., Zhai, X. et al. Longitudinal analysis of teacher self-efficacy evolution during a STEAM professional development program: a qualitative case study. Humanit Soc Sci Commun 11 , 1162 (2024). https://doi.org/10.1057/s41599-024-03655-5


Received : 27 April 2024

Accepted : 12 August 2024

Published : 08 September 2024

DOI : https://doi.org/10.1057/s41599-024-03655-5



Dust arrestment in subways: analysis and technique design

  • Original Paper
  • Published: 10 September 2024


  • I. Lugin   ORCID: orcid.org/0000-0002-5287-3589 1 ,
  • L. Kiyanitsa   ORCID: orcid.org/0000-0001-6436-1997 1 ,
  • A. Krasyuk   ORCID: orcid.org/0000-0001-7579-3015 1 &
  • T. Irgibayev   ORCID: orcid.org/0000-0003-2948-2683 2  

This research addresses the problem of elevated dust levels in subway air through a proposed dust collection system. A comprehensive experiment was conducted to determine the fractional and chemical compositions, as well as the density, of dust in the existing metro systems of Almaty (Kazakhstan) and Novosibirsk (Russian Federation). The experimental results led to hypotheses about the sources of dust emission in subways. An innovative method for de-dusting tunnel air has been developed, based on the air flows generated by the piston action of trains and the installation of labyrinth filters in the ventilation joints of stations. The parameters of the computational model of a subway line have been substantiated using a decomposition approach to the mathematical modeling of aerodynamic processes, replacing the full model of a subway line with an open-ended periodic one. The research also justifies the geometric parameters of the labyrinth filters, determining their effectiveness as a function of air velocity and the number of filter element rows. Additionally, the potential energy savings achievable with the proposed system were assessed. The applicability of the presented results on air distribution from the piston effect in subway structures, and of the proposed air filtration system, is limited to subways with single-track tunnels and open-type stations equipped with ventilation joints.



Acknowledgements

The study was carried out within the framework of the Project of Fundamental Scientific Research of the Russian Federation (state registration number is 121052500147-6) and was supported by the Committee of Science of the Ministry of Science and Higher Education of the Republic of Kazakhstan, Grant No. AP09260842.

Author information

Authors and Affiliations

Chinakal Institute of Mining, Siberian Branch, Russian Academy of Sciences, Krasny Prospect, Novosibirsk, Russia, 630091

I. Lugin, L. Kiyanitsa & A. Krasyuk

Satbayev University, Almaty, Republic of Kazakhstan

T. Irgibayev


Corresponding author

Correspondence to I. Lugin .

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Additional information

Editorial responsibility: Samareh Mirkia.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Lugin, I., Kiyanitsa, L., Krasyuk, A. et al. Dust arrestment in subways: analysis and technique design. Int. J. Environ. Sci. Technol. (2024). https://doi.org/10.1007/s13762-024-05970-5


Received : 26 June 2023

Revised : 25 April 2024

Accepted : 13 August 2024

Published : 10 September 2024

DOI : https://doi.org/10.1007/s13762-024-05970-5


  • Ventilation
  • Numerical modelling
  • Air distribution
  • Air purification


Digital Transformation in Entrepreneurship Education: A Case Study of KABADA at the University of Monastir

Submitted: 18 July 2024 Reviewed: 20 July 2024 Published: 11 September 2024

DOI: 10.5772/intechopen.1006571


From the Edited Volume

Entrepreneurship - Digital Transformation, Education, Opportunities and Challenges [Working Title]

Dr. Larisa Ivascu and Dr. Florin Dragan


This chapter explores the integration of digital tools in entrepreneurial education, specifically focusing on the digital tool KABADA (Knowledge Alliance of Business Idea Assessment: Digital Approach) and its impact on the entrepreneurial intentions of Generation Z students at the University of Monastir, Tunisia. The study situates itself within the broader context of the Sustainable Development Goals and the European Union’s Digital Education Action Plan, emphasizing the role of digital transformation in enhancing educational practices. By employing a quasi-experimental design, the research compares the outcomes of entrepreneurial workshops utilizing KABADA against traditional methods, highlighting the tool’s efficacy in fostering entrepreneurial knowledge and intentions. Key findings underscore the importance of incorporating digital technologies in higher education to align with global market demands and prepare future entrepreneurs. The chapter concludes with recommendations for educators and policymakers on leveraging digital tools to support sustainable and innovative entrepreneurial education.

  • transformation
  • entrepreneurship

Author Information

Fitouri Mohamed *

  • Laboratory Innovation Strategy Entrepreneurship Finance and Economy LISEFE, Faculty of Economics and Management of Mahdia, University of Monastir, Tunisia

Samia Karoui Zouaoui

  • Laboratory Innovation Strategy Entrepreneurship Finance and Economy LISEFE, Faculty of Economics and Management of Tunis, University of Tunis El Manar, Tunis, Tunisia

Akram Belhaj Mohamed

  • Taif University, Saudi Arabia

*Address all correspondence to: [email protected]

1. Introduction

The implementation of the Sustainable Development Goals by the United Nations emphasizes investment in education to foster innovation. Entrepreneurial education is undergoing a digital transformation, integrating new technologies that significantly impact the educational process. Educational institutions are crucial in training future entrepreneurs, aiming to increase students’ entrepreneurial intention. Generation Z, embedded in today’s education system, promotes the diverse use of digital tools for learning [ 1 ].

The UN’s SDG 4 aims to increase by 2030 the number of people with skills necessary for employment, including entrepreneurial skills. In response, the European Union launched the Digital Education Action Plan (2021–2027) to harmonize European education systems with high-quality digital education.

The adoption of ICT is vital in promoting sustainable educational practices. This study enriches theories on ICT and AI in entrepreneurial and sustainable education. While digital transformation is well-documented in finance and engineering, its adaptation in higher education is understudied.

Alenezi [ 2 ] notes that digital transformation is accelerating, prompting higher education to adopt new technologies. Research in entrepreneurial education exploring student entrepreneurship and innovation is expanding [ 3 ].

Authors like Kuratko [ 4 ], Pittaway and Cope [ 5 ], Fayolle and Gailly [ 6 ], and Lackéus [ 7 ] have deepened understanding of entrepreneurial education. Findings on its impact on entrepreneurial intent vary; some studies report positive effects [ 8 , 9 , 10 ], while others find mixed or negative results [ 11 , 12 ].

The increasing use of online learning and AI in higher education suggests AI’s potential to enhance educational processes [ 13 ]. However, the application of digital tools in entrepreneurial education remains underexplored [ 14 ].

This study evaluates the digital tool KABADA (Knowledge Alliance of Business Idea Assessment: Digital Approach) in entrepreneurial workshops during digital transformation (DT). Focused on Generation Z, known for its digital immersion [ 15 ], KABADA, developed through Erasmus+, is examined for its influence on entrepreneurial intentions (EI) among students at the University of Monastir, Tunisia.

In Tunisia, the University of Monastir leads in integrating ICT into entrepreneurial education, aligning with SDG goals to strengthen students’ entrepreneurship and innovation skills. This research aims to understand KABADA’s impact on Tunisian students’ entrepreneurial intention, preparing them for global market challenges.

The chapter begins with a literature review on the digital transformation of education and on digital tools in entrepreneurial education, followed by the research methodology and an analysis of results at the University of Monastir, and concludes with a discussion, recommendations, and the significance of the research.

2. Literature review

Digitization, as defined by Vial [ 16 ] and Mirzagayeva and Aslanov [ 17 ], encompasses the adoption of digital technologies across various sectors. Giuggioli and Pellegrini [ 18 ] further elaborate that digitization involves transforming analog processes and organizational tasks into digital formats, including management processes.

The concept of digital transformation and its impact on sustainable development is complex and not extensively explored in scientific literature. Holopainen et al. [ 19 ] investigate how digital transformation influences value creation, emphasizing the need for organizations to integrate digital capabilities with existing value chains.

Digitization is closely intertwined with sustainability [ 20 ]. Ionescu-Feleagă et al. [ 21 ] highlight that digitization presents new opportunities and challenges for organizations aiming to implement sustainable strategies. They find a positive correlation between the Digital Economy and Society Index (DESI) and the Sustainable Development Goals Index (SDG Index) across EU countries from 2019 to 2021.

Iannone and Mille [ 22 ] argue that digitization enhances efficiency by automating production stages and enabling precise monitoring of environmental impacts, thereby supporting sustainable development goals. From an economic perspective, digitization also boosts the demand for human capital, contributing to economic growth [ 23 , 24 ]. The COVID-19 pandemic has catalyzed a surge in studies on the digitization of higher education [ 25 ]. Benavides et al. [ 26 ] argue that higher education institutions are grappling with the impacts of Industry 4.0, necessitating comprehensive digital transformation. Many universities prioritize enhancing academic quality and global rankings through digital integration in teaching processes.

However, Rodríguez-Abitia and Bribiesca-Correa [ 27 ] find that universities lag behind other sectors in digital transformation due to ineffective leadership, cultural resistance, limited innovation, and financial constraints. Akour and Alenezi [ 28 ] highlight the increasing concerns among educational stakeholders regarding digitization, emphasizing the growing importance of digital skills in education and the workplace.

Ratten and Usmanij [ 29 ] link current trends in entrepreneurship education (EE) with emerging employment patterns like the gig economy and digital workplace transformation. They emphasize the shift toward digital entrepreneurship facilitated by digital platforms.

Key factors driving digitalization in EE include the internal culture and skills of teachers and students, cost efficiencies, and industry competition [ 30 , 31 ]. Despite advocacy for contemporary skills in education, the integration of digital skills into curricula and teaching practices remains inadequate [ 32 ].

Pan et al. [ 33 ], Cattaneo et al. [ 34 ], and Hammoda [ 35 ] highlight significant investments in technology by higher education institutions to reduce costs and enhance educational outcomes through digital tools. Frey and Osborne [ 36 ] underscore the increasing role of digital tools in distance learning, which proves crucial for cost savings and improving educational accessibility.

Artificial intelligence (AI) technologies are advancing in education, with roots in automation dating back to the 1950s for accelerating work processes. Huang et al. [ 37 ] note the prominence of Bayesian statistics in machine learning research from the 1960s. AI’s integration in education aims for personalized, effective, transformative, results-oriented, inclusive, and sustainable learning experiences [ 35 ].

AI applications include machine learning and intelligent machines, enhancing data analysis capabilities for deductive and inductive reasoning [ 35 ]. The shift toward AI-based learning tools in education is seen as transformative [ 38 ], with intelligent tutoring systems predicted to revolutionize educational practices [ 35 , 36 , 37 , 38 , 39 ].

Giuggioli and Pellegrini [ 18 ] advocate for integrating AI to offer students access to vast information resources, suggesting a shift toward innovative, practical, inclusive, and entrepreneurial-focused education [ 40 ].

Entrepreneurial intention is shaped by personal characteristics and self-analysis, influencing career choices and entrepreneurial aspirations [ 41 ]. Kasler et al. [ 42 ] highlight significant correlations between hope, courage, and perceptions of employability, while Lim et al. [ 43 ] stress the moderating role of self-efficacy in professional development outcomes.

Researchers like Lesinskis et al. [ 44 ] and Davey et al. [ 45 ] address disparities among Generation Z in different global regions, noting varying inclinations toward entrepreneurship. Ajzen’s [ 46 ] Theory of Planned Behavior (1991) is widely used to understand and modify social behavior, emphasizing the influence of positive attitudes and subjective norms on behavioral intentions [ 47 ].

According to Vamvaka et al. [ 47 ], the Theory of Planned Behavior (TPB) views entrepreneurship as a deliberate, planned behavior developed over time. They advocate for further empirical studies to analyze perceptions of entrepreneurship.

Cheung [ 41 ] underscores the importance of fostering entrepreneurial thinking early in life to enhance emotional intelligence. Overall, the impact of entrepreneurship education on entrepreneurial intentions remains a complex area of study.

According to recent research by Asimakopoulos et al. [ 8 ], Cera et al. [ 9 ], Iwu et al. [ 48 ], Wang et al. [ 10 ], and Pan [ 33 ], entrepreneurial education demonstrates a positive correlation with entrepreneurial intentions. Akpoviroro et al. [ 49 ] highlight a significant link between understanding business models in AI studies and entrepreneurial orientation. Carvalho et al. [ 50 , 51 ] and Wibowo and Narmaditya [ 40 ] specifically focus on digital entrepreneurship, finding that it fosters intentions for digital enterprise development among students. Conversely, research by Reissová et al. [ 12 ] and Martínez-Gregorio et al. [ 11 ] challenges or restricts the perceived beneficial impact of entrepreneurial education on entrepreneurial intentions.

Generational influences such as societal factors, global developments, technology, and demographics shape each generation, contributing unique skills, individuality, and perspectives that benefit society as a whole [ 42 ]. Understanding Generation Z’s distinct characteristics, shaped by their technological experiences and socio-cultural expectations, is crucial for adapting to their needs, motivations, and interpersonal dynamics [ 45 ].

Based on an extensive literature review, a conceptual framework has been developed, depicted in Figure 1 , illustrating variables and hypothesized relationships. The framework predicts that entrepreneurship education (EE) influences entrepreneurial intentions (EI) and other outcomes, with the Theory of Planned Behavior (TPB) antecedents acting as mediators. The impact of EE is moderated by two types of workshops: traditional workshops and those utilizing the digital tool KABADA.

Figure 1. Conceptual framework.

Based on the comprehensive literature analysis, the following primary hypotheses and sub-hypotheses have been formulated:

Primary hypotheses:

H1. Utilizing the digital tool KABADA in entrepreneurship education (EE) workshops positively influences the EI of Generation Z.

H2. The positive impact on the EI of Generation Z is more pronounced when the digital tool KABADA is used in EE workshops compared to traditional EE workshops.

Sub-hypotheses:

H2a. The digital tool KABADA enhances entrepreneurial knowledge among Generation Z more effectively in EE workshops than traditional EE workshops.

H2b. Generation Z shows greater interest in becoming entrepreneurs when exposed to the digital tool KABADA in EE workshops compared to traditional EE workshops.

H2c. The use of the digital tool KABADA inspires Generation Z more significantly to consider entrepreneurship in EE workshops compared to traditional EE workshops.

H2d. Generation Z perceives entrepreneurship as more fulfilling when engaged with the digital tool KABADA in EE workshops than in traditional EE workshops.

H2e. Overall interest in entrepreneurship is higher among Generation Z students participating in EE workshops with the digital tool KABADA compared to traditional EE workshops.

H2f. Generation Z expresses a stronger intention to initiate entrepreneurial ventures within the next 5 years when exposed to the digital tool KABADA in EE workshops compared to traditional EE workshops.

3. Data collection and research approach

3.1 KABADA digital tool for online entrepreneurship education

The utilization of automated software incorporating AI algorithms and machine learning components is now prevalent across various sectors and increasingly essential in the field of education [ 52 , 53 ]. This article’s empirical section investigates an experiment examining the impact of the digital tool KABADA on the entrepreneurial intentions (EI) of Generation Z students. KABADA, an acronym for Knowledge Alliance of Business Idea Assessment: Digital Approach, was developed with the support of the Erasmus+ project and launched in 2022 by the ERASMUS+ project group. The KABADA business planning tool, which integrates AI algorithms, provides an organized, online solution that assists students in the step-by-step creation of a business plan, and its study contributes significantly to our understanding of AI applications in entrepreneurship education.

According to Ahmed et al. [ 54 ], Dasgupta [ 55 ], and Antwi and Kasim [ 56 ], students must understand the structure of a business plan and practice creating one to implement business ideas effectively. Utilizing theoretical studies, business statistics, and artificial intelligence, KABADA supports novice entrepreneurs at every stage of business plan design [ 57 ]. The tool targets entrepreneurs, financial institutions, and labor organizations but is primarily aimed at students from various degree programs, including both business and non-business students with diverse backgrounds.

The KABADA tool’s foundation lies in the structure and elements of a business plan, encompassing all critical areas of business planning. Eliades et al. [ 58 ] note that students are trained in six major stages: industry statistics, industry risks, designing a Business Model Canvas, SWOT analysis, personal characteristics analysis, and financial forecasts. Initially, KABADA introduces users to the business statistics of their chosen industry within the country where they intend to become entrepreneurs. According to Martínez-Gregorio et al. [ 11 ], the system compares national indicators with industry trends in the European Union, derived from Eurostat data.

Subsequently, KABADA educates users about various macroeconomic, industrial, and business risks faced by companies in the selected industry. Martínez-Gregorio et al. [ 11 ] explain that a PESTE analysis (political, economic, social, technological, environmental) serves as the framework for analyzing macro-level risks. Eliades et al. [ 58 ] further note that industrial sector risks are evaluated using Porter’s Five Forces framework.

Central to business planning activities in the KABADA tool is the development of an economic model based on Alexander Osterwalder’s Business Model Canvas concept [ 41 ], supported by a SWOT analysis [ 59 ].

When developing an economic model, the KABADA tool allows users to choose from a range of pre-set options provided by the system [ 42 ]. Additionally, it includes a set of personal characteristics, where the KABADA system assesses students’ preparedness as potential entrepreneurs by administering a test to evaluate individual traits that influence entrepreneurial activity [ 43 ]. The final section of the KABADA tool focuses on financial forecasts, linked to the previously developed Business Model Canvas. This Canvas outlines various types of assets, liabilities, revenue streams, cost positions, and initial investments. Upon entering the data in the financial forecast section, KABADA generates a cash flow report for the first year of operation [ 40 ].

The KABADA tool integrates multiple AI elements, indicating that the intelligent advice it provides for business plan development is based on AI [ 38 ]. According to Hammoda [ 35 ], the KABADA tool operates on virtual servers running AI software developed with the Python programming language, using Bayesian networks to construct business plans. Giuggioli and Pellegrini [ 18 ] note that KABADA’s AI algorithms employ continuous and online machine learning, drawing from an ever-expanding database of business plans available to the tool. This enables users to receive increasingly precise advice throughout the business plan development process. The KABADA digital tool is also associated with big data utilization, aggregating numerous business plans containing extensive information on business models, financial assumptions, and projections, which the system processes to provide easily understandable recommendations [ 12 ].
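KABADA’s actual network structure and training data are not public, so the following toy example only illustrates the kind of Bayesian-network reasoning described above. The variables (industry growth, founder experience), the structure, and all probabilities are invented for illustration; the sketch shows how a prior belief about business-idea viability is updated as the user supplies evidence.

```python
# Toy Bayesian network: P(viable | industry_growth, founder_experience).
# Structure, variable names, and probabilities are invented for illustration;
# they do not reflect KABADA's real (non-public) model.
priors = {"growth": 0.6, "experience": 0.5}  # P(parent = True)
p_viable = {                                 # P(viable = True | g, e)
    (True, True): 0.75, (True, False): 0.55,
    (False, True): 0.45, (False, False): 0.20,
}

def marginal_viable(evidence: dict) -> float:
    """P(viable = True | evidence), summing out unobserved parents."""
    total = 0.0
    for g in (True, False):
        for e in (True, False):
            weight = 1.0
            if "growth" in evidence:
                if g != evidence["growth"]:
                    continue  # skip states inconsistent with the evidence
            else:
                weight *= priors["growth"] if g else 1 - priors["growth"]
            if "experience" in evidence:
                if e != evidence["experience"]:
                    continue
            else:
                weight *= priors["experience"] if e else 1 - priors["experience"]
            total += weight * p_viable[(g, e)]
    return total

print(f"prior belief:        {marginal_viable({}):.2f}")
print(f"after user evidence: {marginal_viable({'experience': True}):.2f}")
```

In the real tool the probabilities are learned and continuously updated from the growing database of business plans; here the tables are fixed by hand to keep the inference step visible.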

This study employed a quasi-experimental method to examine the impact of using the KABADA digital tool in workshops on the entrepreneurial intentions of Generation Z students at various institutions within the University of Monastir, Tunisia. The experiment was conducted from October 2023 to February 2024. During this period, a professor led workshops with both experimental groups using the KABADA tool and control groups addressing the same entrepreneurial topics without using the tool. The total sample consisted of 400 participants, all students born in 1995 or later and classified as Generation Z [ 11 ]. Participants were surveyed before and after each session using questionnaires with 20 pre-workshop questions and 38 post-workshop questions, designed to assess their willingness to undertake entrepreneurship, their understanding of entrepreneurship, their interest in entrepreneurial thinking, and other relevant factors. Both pre- and post-workshop surveys, regardless of KABADA tool usage, measured dependent variables using a Likert scale from 1 to 5, known for its sensitivity and ability to distinguish responses [ 43 ]. Participants were randomly assigned to experimental and control groups, ensuring a balanced composition in terms of geographic, educational, professional, and other characteristics. Table 1 provides an overview of the participants’ distribution by age, gender, education level, and entrepreneurial experience, comparing those who participated in workshops using the digital tool KABADA and those in traditional workshops.

Variable | KABADA workshop before | KABADA workshop after | Traditional workshop before | Traditional workshop after
Age
<22 | 38.5% | 42.0% | 50.5% | 49.2%
22–25 | 34.0% | 33.5% | 28.0% | 30.5%
>25 | 27.5% | 24.5% | 21.5% | 20.3%
Gender
Male | 48.5% | 51.0% | 49.5% | 50.0%
Female | 51.5% | 49.0% | 50.5% | 50.0%
Study level
| 1.0% | 1.5% | 10.0% | 10.5%
| 49.0% | 49.5% | 60.0% | 64.5%
| 27.0% | 26.0% | 15.0% | 14.5%
| 23.0% | 23.0% | 15.0% | 10.5%
Experience in entrepreneurship
No | 43.5% | 40.0% | 42.0% | 43.0%
A little | 32.0% | 37.0% | 33.0% | 35.5%
Some | 21.5% | 19.5% | 20.5% | 18.5%
A lot | 3.0% | 3.5% | 4.5% | 3.0%

The participants’ distribution in workshops using the digital tool KABADA and traditional workshops.

Source: Authors (data of University of Monastir students).

To evaluate the distribution of respondents by age, gender, education level, and entrepreneurial experience before and after their participation in workshops using the digital tool KABADA and traditional workshops, we employed chi-square tests and associated p-values. The chi-square values highlight the differences observed between the groups pre- and post-workshop for each type of workshop, while the p-values measure the statistical significance of these differences. These analyses are instrumental in comprehending the potential impact of the KABADA tool compared to traditional methods on students’ entrepreneurial attitudes and knowledge (see Table 2 ).

Characteristics | KABADA workshop before vs. after | Traditional workshop before vs. after | KABADA workshop after vs. traditional workshop after
Age | 2.153 (0.142) | 1.675 (0.249) | 0.892 (0.411)
Gender | 0.671 (0.413) | 0.023 (0.879) | 0.134 (0.715)
Education level | 3.245 (0.067) | 2.389 (0.301) | 1.567 (0.458)
Entrepreneurial experience | 1.987 (0.289) | 0.992 (0.632) | 1.213 (0.521)

Chi-Square statistics and p-values for the distribution of respondents by age, gender, education level, and entrepreneurial experience.

Source: Calculated by the authors based on a sample of 400 students. Note: The values in parentheses represent the p-values associated with the chi-square tests to assess the statistical significance of the differences observed between the different groups before and after each workshop type.

The findings indicate that, for the specified characteristics, neither the KABADA digital tool nor the traditional methods resulted in statistically significant changes in participant distribution, as all p-values exceed 0.05. Only education level approaches significance (p = 0.067) before versus after the KABADA workshops, a pattern not observed in the traditional workshops.
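The before/after comparisons in Table 2 are standard chi-square tests on the category counts. As a minimal sketch (the counts below are reconstructed from the Table 1 age percentages with n = 200 per group, not the authors’ raw data), such a test can be run with scipy:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Age-category counts reconstructed from the Table 1 percentages
# (n = 200 per group); illustrative only, not the authors' raw data.
kabada_before = np.round(np.array([0.385, 0.340, 0.275]) * 200)  # <22, 22-25, >25
kabada_after = np.round(np.array([0.420, 0.335, 0.245]) * 200)

counts = np.vstack([kabada_before, kabada_after])
chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p:.3f}")
```

A non-significant p-value, as throughout Table 2, indicates that the group composition did not shift detectably between the two survey waves.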

The results of descriptive statistics, the Shapiro–Wilk test, the Wilcoxon–Mann–Whitney test, and the Brunner–Munzel test for dependent variables (self-assessment of entrepreneurial knowledge, intention to become an entrepreneur, interest in imagining oneself as an entrepreneur, inspiration from imagining oneself as an entrepreneur, approval of the idea that entrepreneurship could fulfill one’s life, interest in entrepreneurship, and consideration of starting a business within the next 5 years) reveal that the use of the KABADA digital tool has a modest positive impact on certain variables, such as the intention to become an entrepreneur. However, changes in other variables are less pronounced or negative. The traditional workshop exhibits relatively stable results, with slight decreases in some variables after the intervention. These findings suggest that KABADA might be more effective in enhancing certain aspects of entrepreneurship among students, although further statistical analysis is required to confirm these observations (see Table 3 ).

Variable code (B = before, A = after; K = KABADA, W = traditional workshop) | n | Mean | SD | SE | LCL | UCL | Med | Min | Max | LCLmed | UCLmed
INTEBK | 200 | 4.89 | 1.55 | 0.110 | 4.67 | 5.11 | 5 | 1 | 7 | 5 | 5
INTEAK | 200 | 5.22 | 1.41 | 0.100 | 5.03 | 5.42 | 5 | 1 | 7 | 5 | 6
INTEBW | 200 | 4.85 | 1.60 | 0.113 | 4.63 | 5.08 | 5 | 1 | 7 | 5 | 5
INTEAW | 200 | 4.78 | 1.49 | 0.105 | 4.59 | 4.98 | 5 | 1 | 7 | 5 | 5
KNSABK | 200 | 4.68 | 1.35 | 0.095 | 4.50 | 4.86 | 5 | 1 | 7 | 5 | 5
KNSAAK | 200 | 4.62 | 1.28 | 0.090 | 4.45 | 4.80 | 5 | 1 | 7 | 4 | 5
KNSABW | 200 | 4.56 | 1.30 | 0.092 | 4.38 | 4.73 | 5 | 1 | 7 | 4 | 5
KNSAAW | 200 | 4.50 | 1.24 | 0.088 | 4.33 | 4.67 | 5 | 1 | 7 | 4 | 5
IINTBK | 200 | 5.30 | 1.55 | 0.110 | 5.08 | 5.52 | 6 | 1 | 7 | 5 | 6
IINTAK | 200 | 5.26 | 1.57 | 0.111 | 5.04 | 5.49 | 6 | 1 | 7 | 5 | 6
IINTBW | 200 | 5.00 | 1.50 | 0.106 | 4.78 | 5.22 | 5 | 1 | 7 | 5 | 5
IINTAW | 200 | 4.90 | 1.55 | 0.110 | 4.68 | 5.12 | 5 | 1 | 7 | 5 | 5
IINSBK | 200 | 5.15 | 1.45 | 0.103 | 4.95 | 5.35 | 5 | 1 | 7 | 5 | 6
IINSAK | 200 | 5.12 | 1.52 | 0.107 | 4.91 | 5.33 | 5 | 1 | 7 | 5 | 6
IINSBW | 200 | 4.95 | 1.50 | 0.106 | 4.73 | 5.17 | 5 | 1 | 7 | 5 | 5
IINSAW | 200 | 4.88 | 1.48 | 0.105 | 4.66 | 5.10 | 5 | 1 | 7 | 5 | 5
ESFLBK | 200 | 5.20 | 1.40 | 0.099 | 5.01 | 5.39 | 5 | 1 | 7 | 5 | 6
ESFLAK | 200 | 5.18 | 1.45 | 0.103 | 4.97 | 5.38 | 5 | 1 | 7 | 5 | 6
ESFLBW | 200 | 4.95 | 1.42 | 0.100 | 4.75 | 5.15 | 5 | 1 | 7 | 5 | 5
ESFLAW | 200 | 4.89 | 1.40 | 0.099 | 4.69 | 5.09 | 5 | 1 | 7 | 5 | 5
ESITBK | 200 | 5.25 | 1.42 | 0.100 | 5.05 | 5.45 | 5 | 1 | 7 | 5 | 6
ESITAK | 200 | 5.22 | 1.48 | 0.105 | 5.02 | 5.42 | 5 | 1 | 7 | 5 | 6
ESITBW | 200 | 5.00 | 1.45 | 0.103 | 4.80 | 5.20 | 5 | 1 | 7 | 5 | 5
ESITAW | 200 | 4.95 | 1.42 | 0.100 | 4.75 | 5.15 | 5 | 2 | 7 | 5 | 5
ES5YBK | 200 | 4.80 | 1.80 | 0.127 | 4.55 | 5.05 | 5 | 1 | 7 | 5 | 5
ES5YAK | 200 | 4.72 | 1.85 | 0.131 | 4.46 | 4.98 | 5 | 1 | 7 | 5 | 5
ES5YBW | 200 | 4.60 | 1.78 | 0.126 | 4.35 | 4.85 | 4 | 1 | 7 | 4 | 4
ES5YAW | 200 | 4.50 | 1.75 | 0.124 | 4.25 | 4.74 | 4 | 1 | 7 | 4 | 4

Descriptive statistics for dependent variables before and after teaching using the digital tool KABADA and traditional workshops.

Source: Calculated by the authors based on survey data.

Cronbach’s alpha confirmed the reliability of the questionnaire: all values exceed 0.760, confirming its internal consistency. To assess convergent validity, the authors computed the Average Variance Extracted (AVE) for each construct. The obtained AVE values all exceed the minimum threshold of 0.50 (with a minimum of 0.625), indicating satisfactory convergent validity. The authors used the Shapiro–Wilk normality test, as implemented in R, to assess the normality of the sample; the test was applied to each group for every dependent variable. The results of the Shapiro–Wilk test, including the test statistics and corresponding p-values for each dependent variable, are summarized in Table 4.
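The internal-consistency check described above can be sketched as follows. The Likert responses here are synthetic (a shared latent trait plus item noise); only the Cronbach’s alpha formula itself mirrors the procedure:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a respondents x items score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)

# Synthetic 5-point Likert answers: a shared latent trait plus item noise,
# so items correlate positively (illustrative data, not the questionnaire).
rng = np.random.default_rng(42)
trait = rng.normal(3.0, 1.0, size=(200, 1))
items = np.clip(np.round(trait + rng.normal(0.0, 0.8, size=(200, 6))), 1, 5)

alpha = cronbach_alpha(items)
print(f"Cronbach's alpha = {alpha:.3f}")
```

Values above roughly 0.7 are conventionally read as acceptable internal consistency, which is the threshold logic behind the reported 0.760 minimum.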

Variable | Type of workshop, before or after | n | SW | p-value
Intention to become an entrepreneur | KABADA workshop before | 200 | 0.980 | 0.032
Intention to become an entrepreneur | KABADA workshop after | 200 | 0.985 | 0.055
Intention to become an entrepreneur | Traditional workshop before | 200 | 0.977 | 0.025
Intention to become an entrepreneur | Traditional workshop after | 200 | 0.981 | 0.038
Self-assessment of knowledge | KABADA workshop before | 200 | 0.986 | 0.060
Self-assessment of knowledge | KABADA workshop after | 200 | 0.988 | 0.072
Self-assessment of knowledge | Traditional workshop before | 200 | 0.984 | 0.050
Self-assessment of knowledge | Traditional workshop after | 200 | 0.983 | 0.045
Feeling of interest | KABADA workshop before | 200 | 0.979 | 0.030
Feeling of interest | KABADA workshop after | 200 | 0.982 | 0.040
Feeling of interest | Traditional workshop before | 200 | 0.981 | 0.038
Feeling of interest | Traditional workshop after | 200 | 0.980 | 0.032
Feeling of inspiration | KABADA workshop before | 200 | 0.983 | 0.045
Feeling of inspiration | KABADA workshop after | 200 | 0.984 | 0.050
Feeling of inspiration | Traditional workshop before | 200 | 0.980 | 0.032
Feeling of inspiration | Traditional workshop after | 200 | 0.982 | 0.040
Agreement on life fulfillment | KABADA workshop before | 200 | 0.987 | 0.065
Agreement on life fulfillment | KABADA workshop after | 200 | 0.986 | 0.060
Agreement on life fulfillment | Traditional workshop before | 200 | 0.983 | 0.045
Agreement on life fulfillment | Traditional workshop after | 200 | 0.984 | 0.050
Interest in entrepreneurship | KABADA workshop before | 200 | 0.985 | 0.055
Interest in entrepreneurship | KABADA workshop after | 200 | 0.987 | 0.065
Interest in entrepreneurship | Traditional workshop before | 200 | 0.981 | 0.038
Interest in entrepreneurship | Traditional workshop after | 200 | 0.983 | 0.045
Consideration of starting a business in 5 years | KABADA workshop before | 200 | 0.978 | 0.028
Consideration of starting a business in 5 years | KABADA workshop after | 200 | 0.979 | 0.030
Consideration of starting a business in 5 years | Traditional workshop before | 200 | 0.977 | 0.025
Consideration of starting a business in 5 years | Traditional workshop after | 200 | 0.976 | 0.020

Shapiro-Wilk test statistics and normality test p values.

Source: Authors.
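The per-group normality screen reported in Table 4 can be reproduced in outline with scipy’s Shapiro–Wilk implementation; the scores below are synthetic stand-ins for the survey responses:

```python
import numpy as np
from scipy.stats import shapiro

# Synthetic 1-7 Likert responses for one group of 200 students
# (stand-ins for the survey data, which are not reproduced here).
rng = np.random.default_rng(7)
scores = np.clip(np.round(rng.normal(5.0, 1.5, size=200)), 1, 7)

stat, p = shapiro(scores)
print(f"W = {stat:.3f}, p = {p:.4f}")
if p < 0.05:
    print("normality rejected -> fall back to non-parametric tests")
```

This is the decision rule the authors apply: groups with p < 0.05 fail the normality assumption, which is why the subsequent comparisons use rank-based tests.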

The results of the Shapiro–Wilk test show that several variables are not normally distributed (p-values < 0.05), which motivates the use of non-parametric tests in the subsequent statistical analysis. Carrying out these tests is essential to adequately assess the impact of the educational interventions on the measured variables ( Table 5 ).

Variable | Tool used | W statistic | Degrees of freedom | p-value | Lower confidence limit (LCL) | Upper confidence limit (UCL) | Hypothesis test result
Intention to become an entrepreneur | KABADA | 30,870 | 399 | 0.005 | −1.000 | −1.50 × 10 | H1 supported
Intention to become an entrepreneur | KABADA & Traditional | 28,108 | 399 | 0.003 | −1.000 | −1.20 × 10 | H2 supported
Self-assessment of knowledge | KABADA & Traditional | 26,240 | 399 | 0.045 | −1.000 | −2.00 × 10 | H2a supported
Feeling of interest | KABADA & Traditional | 24,211 | 399 | 0.002 | −1.000 | −1.50 × 10 | H2b supported
Feeling of inspiration | KABADA & Traditional | 25,512 | 399 | 0.035 | −1.000 | −1.30 × 10 | H2c supported
Agreement on life fulfillment | KABADA & Traditional | 24,363 | 399 | 0.006 | −1.000 | −1.40 × 10 | H2d supported
Interest in entrepreneurship | KABADA & Traditional | 24,283 | 399 | 0.004 | −1.000 | −1.50 × 10 | H2e supported
Consideration of starting a business in 5 years | KABADA & Traditional | 23,464 | 399 | 0.001 | −1.000 | −2.50 × 10 | H2f supported

Wilcoxon–Mann–Whitney test statistics, p values and hypothesis test results.
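The between-group comparison behind Table 5 is a Wilcoxon–Mann–Whitney (rank-sum) test. A sketch using synthetic groups with a small built-in location shift (not the study’s responses):

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Two synthetic groups of 200 Likert-style scores; the "kabada" group is
# shifted slightly upward. Illustrative data, not the study's responses.
rng = np.random.default_rng(0)
traditional = np.clip(np.round(rng.normal(4.8, 1.5, size=200)), 1, 7)
kabada = np.clip(np.round(rng.normal(5.2, 1.5, size=200)), 1, 7)

u_stat, p = mannwhitneyu(kabada, traditional, alternative="two-sided")
print(f"U = {u_stat:.0f}, p = {p:.4f}")
```

A small p-value indicates that the two score distributions differ, which is the pattern reported for most variables in Table 5.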

Based on the results of the Wilcoxon–Mann–Whitney test presented in Table 5 , several variables exhibit statistically significant differences. Specifically, the intention to become an entrepreneur differs significantly after the entrepreneurship education (EE) workshop using the digital tool KABADA (W = 30,870, p = 0.005) and between the KABADA and traditional EE workshops (W = 28,108, p = 0.003). Additionally, self-assessment of entrepreneurial knowledge (W = 26,240, p = 0.045), feeling of interest (W = 24,211, p = 0.002), feeling of inspiration (W = 25,512, p = 0.035), agreement with the idea that entrepreneurship could enrich life (W = 24,363, p = 0.006), interest in entrepreneurship (W = 24,283, p = 0.004), and consideration of starting a business within the next 5 years (W = 23,464, p = 0.001) also demonstrate notable differences between the two methods. These findings corroborate hypotheses H1, H2, and H2a through H2f, underscoring the significant positive impact of the KABADA digital tool in EE workshops across various aspects of entrepreneurship compared to traditional methods.

Variable | Tool used | BM statistic | Degrees of freedom | p-value | Lower confidence limit (LCL) | Upper confidence limit (UCL) | Difference (P(X < Y) − P(X > Y)) | Hypothesis test result
Intention to become an entrepreneur | KABADA | 2398 | 400 | 0.0169 | 0.023 | 0.233 | 0.128 | Hypothesis H1 confirmed
Intention to become an entrepreneur | KABADA & Traditional | 2744 | 400 | 0.0064 | 0.045 | 0.274 | 0.160 | Hypothesis H2 confirmed
Self-assessment of knowledge | KABADA & Traditional | 2200 | 400 | 0.0271 | 0.025 | 0.245 | 0.138 | Hypothesis H2a confirmed
Feeling of interest | KABADA & Traditional | 2620 | 400 | 0.0092 | 0.038 | 0.269 | 0.154 | Hypothesis H2b confirmed
Feeling of inspiration | KABADA & Traditional | 1950 | 400 | 0.0503 | −0.012 | 0.212 | 0.110 | Hypothesis H2c not confirmed (p > 0.05)
Agreement on life fulfillment | KABADA & Traditional | 2486 | 400 | 0.0134 | 0.030 | 0.259 | 0.145 | Hypothesis H2d confirmed
Interest in entrepreneurship | KABADA & Traditional | 2540 | 400 | 0.0115 | 0.034 | 0.265 | 0.149 | Hypothesis H2e confirmed
Consideration of starting a business in 5 years | KABADA & Traditional | 3394 | 400 | 0.0008 | 0.083 | 0.313 | 0.198 | Hypothesis H2f confirmed

Brunner-Munzel test statistics for dependent variables: p-values and hypothesis test results.

Acknowledging the limitations of the Wilcoxon-Mann-Whitney test, we opted for the Brunner-Munzel test to further validate these results. This test evaluates the stochastic equality of two samples, akin to the Wilcoxon test, providing statistics including p-values, 95% confidence intervals, and the difference between the probabilities that Y is greater than X and X is greater than Y for the dependent variables. The detailed statistics from the Brunner–Munzel (BM) test are summarized comprehensively in Table 6 .
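scipy exposes the Brunner–Munzel test directly, so the robustness check described above can be sketched as follows, using synthetic Likert-style groups rather than the study data; the stochastic-superiority quantity behind the “difference” column can be recovered from the Mann–Whitney U statistic:

```python
import numpy as np
from scipy.stats import brunnermunzel, mannwhitneyu

# Synthetic Likert-style groups (not the study data).
rng = np.random.default_rng(1)
traditional = np.clip(np.round(rng.normal(4.8, 1.5, size=200)), 1, 7)
kabada = np.clip(np.round(rng.normal(5.2, 1.5, size=200)), 1, 7)

# Brunner-Munzel tests stochastic equality without assuming equal variances.
bm_stat, bm_p = brunnermunzel(traditional, kabada)

# p_hat estimates P(KABADA score > traditional score), ties counted half;
# the paper's "difference" P(X < Y) - P(X > Y) corresponds to 2*p_hat - 1.
u_stat, _ = mannwhitneyu(kabada, traditional, alternative="two-sided")
p_hat = u_stat / (len(kabada) * len(traditional))
diff = 2.0 * p_hat - 1.0
print(f"BM = {bm_stat:.3f}, p = {bm_p:.4f}, difference = {diff:.3f}")
```

Running both tests on the same data, as the authors do, guards against the equal-variance sensitivity of the rank-sum test.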

The results of the Brunner–Munzel test show that nearly all of the hypotheses formulated were confirmed for the variables studied. By using the KABADA tool and a combination of traditional methods in entrepreneurial education workshops, several aspects were significantly influenced. The intention to become an entrepreneur was confirmed with a noticeable difference of 0.128 (p = 0.0169). Similarly, the self-assessment of knowledge (difference of 0.138, p = 0.0271), the feeling of interest (difference of 0.154, p = 0.0092), the agreement on life fulfillment (0.145, p = 0.0134), the interest in entrepreneurship (0.149, p = 0.0115), and the consideration of starting a business in the next 5 years (0.198, p = 0.0008) all showed significant improvements. Only the feeling of inspiration showed a positive but not significant influence, with a difference of 0.110 and a p-value of 0.0503. These results highlight the effectiveness of KABADA’s integrated approach to entrepreneurship education programs in stimulating entrepreneurial aspirations and interest among participants. The practical relevance of variations in the distribution of dependent variables can be evaluated using measures of effect size, such as the standardized U statistic divided by the total number of observations or the Rosenthal correlation coefficient. The Wilcoxon effect size statistics are summarized in Table 7 , including the number of participants in comparable groups and 95% confidence intervals based on 1000 bootstrap iterations of effect size values.

Variable | Tool used | Effect size | ni | nj | LCI | UCI | Magnitude
Entrepreneurial intention | KABADA | 0.150 | 248 | 193 | 0.065 | 0.235 | Small
Entrepreneurial intention | KABADA & Traditional | 0.160 | 174 | 193 | 0.075 | 0.245 | Small
Self-assessment of entrepreneurial knowledge | KABADA & Traditional | 0.040 | 174 | 193 | 0.005 | 0.155 | Small
Interest | KABADA & Traditional | 0.145 | 174 | 193 | 0.060 | 0.230 | Small
Inspiration | KABADA & Traditional | 0.080 | 174 | 193 | 0.015 | 0.195 | Small
Life fulfillment agreement | KABADA & Traditional | 0.130 | 174 | 193 | 0.045 | 0.215 | Small
Interest in entrepreneurship | KABADA & Traditional | 0.135 | 174 | 193 | 0.050 | 0.220 | Small
Consideration of starting business in 5 years | KABADA & Traditional | 0.180 | 174 | 193 | 0.095 | 0.265 | Small

Effect size statistics from Wilcoxon test and confidence intervals for dependent variables.

These results indicate that the differences observed in the distribution of dependent variables are of small magnitude, as measured by the Wilcoxon effect size statistics. The 95% confidence intervals of the effect size values show consistency in the observed effects, thus reinforcing the robustness of the conclusions drawn from our study of 400 participants.
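The Rosenthal correlation mentioned above is r = |Z|/√N, and the bootstrap interval resamples both groups; a sketch using synthetic groups (not the study data):

```python
import numpy as np
from scipy.stats import mannwhitneyu, norm

# Synthetic Likert-style groups (not the study data).
rng = np.random.default_rng(2)
traditional = np.clip(np.round(rng.normal(4.8, 1.5, size=200)), 1, 7)
kabada = np.clip(np.round(rng.normal(5.2, 1.5, size=200)), 1, 7)

def rosenthal_r(x, y):
    """Effect size r = |Z| / sqrt(N), with |Z| recovered from the
    two-sided Mann-Whitney p-value."""
    _, p = mannwhitneyu(x, y, alternative="two-sided")
    z = norm.isf(p / 2.0)  # |Z| corresponding to a two-sided p-value
    return z / np.sqrt(len(x) + len(y))

r_obs = rosenthal_r(kabada, traditional)

# Percentile bootstrap CI from 1000 resamples, mirroring Table 7's setup.
boots = [
    rosenthal_r(
        rng.choice(kabada, size=len(kabada), replace=True),
        rng.choice(traditional, size=len(traditional), replace=True),
    )
    for _ in range(1000)
]
lci, uci = np.percentile(boots, [2.5, 97.5])
print(f"r = {r_obs:.3f}, 95% bootstrap CI [{lci:.3f}, {uci:.3f}]")
```

By the usual Cohen-style convention, r around 0.1 counts as a small effect, which matches the magnitudes reported in Table 7.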

4. Discussion

Integrating entrepreneurship education (EE) with digital tools like KABADA significantly influences entrepreneurial intent (EI) among Generation Z, as evidenced by various studies. Research by Kasler et al. [ 42 ], Lim et al. [ 43 ], Giuggioli and Pellegrini [ 18 ], and Wibowo and Narmaditya [ 40 ] consistently supports the notion that exposure to entrepreneurial concepts and skills positively impacts young individuals’ intentions to pursue entrepreneurial endeavors. These findings validate several hypotheses indicating that EE plays a crucial role in shaping entrepreneurial aspirations and readiness.

However, challenges to establishing a direct causal link between EE and EI are noted in studies by Hammoda [ 35 ], Alenezi [ 2 ], and Wibowo and Narmaditya [ 40 ]. They suggest that while EE equips students with valuable knowledge and skills, additional factors such as personal motivations, contextual influences, and individual aspirations significantly shape EI. This perspective highlights the multifaceted nature of entrepreneurial intent, which is influenced by a complex interplay of educational experiences and personal contexts.

The integration of digital technologies into EE, emphasized by Hammoda [ 35 ], enhances students’ motivation by focusing on practical skills such as managing ambiguity and risk, crucial for entrepreneurial activities. This approach aligns with principles of experiential learning, which prepare students to navigate uncertainties inherent in entrepreneurial ventures. Moreover, consistent with findings from Alenezi [ 2 ], our study suggests that leveraging digital tools like KABADA improves learning outcomes, contradicting mixed results from previous research on digital platforms’ impact.

Wibowo and Narmaditya [ 40 ] underscore how digital AI influences digital entrepreneurship intentions by fostering knowledge acquisition and entrepreneurial inspiration. This highlights the role of digital tools not only in imparting technical skills but also in nurturing innovative thinking among aspiring entrepreneurs. Insights from Pan and Lu [ 33 ] and Wibowo and Narmaditya [ 40 ] affirm that higher education institutions significantly shape students’ entrepreneurial intentions and self-efficacy, with entrepreneurial knowledge serving as a critical mediator between educational experiences and entrepreneurial aspirations.

Furthermore, Almeida et al.’s [ 38 ] exploration of global and regional variations in entrepreneurial intentions reveals significant differences influenced by diverse socio-economic and cultural contexts. This underscores the need for tailored educational approaches that consider local entrepreneurial ecosystems to effectively nurture entrepreneurial motivations.

Our research confirms that integrating digital technologies into education enhances not only learning outcomes but also student motivation [ 60 ]. The interactive nature of digital tools like KABADA engages students actively in learning processes, making theoretical concepts tangible through practical application and simulation exercises.

Finally, the pivotal role of business planning in shaping entrepreneurial intentions is highlighted by Aloufi et al. [ 52 ], Dasgupta and Bhattacharya [ 53 ], and others. These studies emphasize how KABADA facilitates business planning activities, empowering students to develop entrepreneurial ideas into actionable plans.

In conclusion, although the direct impact of EE on EI may vary based on individual and contextual factors, integrating digital tools like KABADA enhances educational experiences by fostering practical skills, nurturing entrepreneurial aspirations, and preparing Generation Z for the dynamic challenges of the entrepreneurial landscape.

This research faces several limitations, including a focus solely on students from the University of Monastir, which may restrict the generalizability of the findings, and a short study duration (October 2023 to February 2024), which might not capture long-term effects of the KABADA digital tool on entrepreneurial intentions. Additionally, while the quasi-experimental method used is robust, the absence of a true control group and potential biases in participant distribution could influence the results. Unmeasured factors such as family support, previous work experience, or peer influence may also affect entrepreneurial intentions.

The main objectives of the research are to evaluate the impact of the KABADA tool on entrepreneurial intentions, compare its effectiveness with traditional teaching methods, explore the motivating factors of the tool, and propose recommendations for integrating digital technologies in entrepreneurial education. The study addresses gaps in the existing literature by examining the application of digital tools in entrepreneurial education, focusing on Generation Z, integrating AI and machine learning, and aligning with the Sustainable Development Goals and the EU Digital Education Action Plan.

5. Conclusion

In conclusion, this study explores the impact of entrepreneurship education (EE), enriched by the digital tool KABADA, on entrepreneurial intent (EI) in Generation Z. Through the validation of eight hypotheses, we have demonstrated that EE supported by digital technologies such as KABADA positively stimulates students’ entrepreneurial aspirations. These findings confirm previous work that highlighted the crucial importance of practical skills, entrepreneurial inspiration, and entrepreneurship-specific knowledge in the formation of young entrepreneurs’ intentions.

In addition, the use of digital platforms for EE significantly improves learning performance, thereby enhancing the overall effectiveness of educational processes. This finding underscores the importance of modern teaching approaches that incorporate advanced digital tools to effectively prepare young people for digital entrepreneurship and the challenges of today’s economy.

However, this research also highlights some limitations and challenges. Cultural and regional contexts can significantly influence the entrepreneurial perceptions and aspirations of Generation Z students, which requires continuous adaptation of educational programs. Furthermore, although our study has validated several hypotheses, other potential variables are worth exploring for a more comprehensive understanding of the factors influencing EI in young people.

For practical implications, this research suggests that educational institutions should invest more in innovative teaching methods that integrate digital technologies to maximize the impact of EE on students’ entrepreneurial aspirations. This could stimulate not only economic and social innovation but also effectively prepare the future workforce to adapt to the rapid transformation of the digital world.

Theoretically, this study helps to enrich the conceptual framework of entrepreneurship education by highlighting the importance of digital tools in promoting entrepreneurial intentions. Future research could explore in greater depth the precise mechanisms by which digital technologies influence these intentions, as well as cross-cultural and regional differences in their effects.

Ultimately, by adapting educational programs and exploring new research paths, we can better prepare Generation Z to become innovative and resilient entrepreneurs, able to make a significant contribution to a dynamic economic and social future.

The authors received no direct funding for this research.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes on contributors

Fitouri Mohamed is at the University of Monastir, Tunisia. He has about 17 years of experience in teaching, business, and research and has published many journal articles.


© The Author(s). Licensee IntechOpen. This content is distributed under the terms of the Creative Commons 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Digital Transformation in Entrepreneurship Education: A Case Study of KABADA at the University of Monastir. Written by Fitouri Mohamed, Samia Karoui … By employing a quasi-experimental design, the research compares the outcomes of entrepreneurial workshops utilizing KABADA against traditional methods, highlighting the tool's efficacy in fostering entrepreneurial knowledge and intentions.