P-Value And Statistical Significance: What It Is & Why It Matters

Saul McLeod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul McLeod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.


Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.


The p-value in statistics quantifies the evidence against a null hypothesis. A low p-value suggests the data are inconsistent with the null hypothesis, potentially favoring the alternative hypothesis. Common significance thresholds are 0.05 or 0.01.

Figure: The p-value shown as a tail area under the normal distribution.

Hypothesis testing

When you perform a statistical test, a p-value helps you determine the significance of your results in relation to the null hypothesis.

The null hypothesis (H0) states that no relationship exists between the two variables being studied (one variable does not affect the other). It states that the results are due to chance and are not significant in supporting the idea being investigated. Thus, the null hypothesis assumes that the effect you are trying to demonstrate does not exist.

The alternative hypothesis (Ha or H1) is the one you would believe if the null hypothesis is concluded to be untrue.

The alternative hypothesis states that the independent variable affected the dependent variable, and the results are significant in supporting the theory being investigated (i.e., the results are not due to random chance).

What a p-value tells you

A p-value, or probability value, is a number describing the probability of obtaining data at least as extreme as yours by random chance, assuming the null hypothesis is true.

The level of statistical significance is often expressed as a p-value between 0 and 1.

The smaller the p-value, the less likely the results occurred by random chance, and the stronger the evidence that you should reject the null hypothesis.

Remember, a p-value doesn’t tell you if the null hypothesis is true or false. It just tells you how likely you’d see the data you observed (or more extreme data) if the null hypothesis were true. It’s a piece of evidence, not definitive proof.

Example: Test Statistic and p-Value

Suppose you’re conducting a study to determine whether a new drug has an effect on pain relief compared to a placebo. If the new drug has no impact, your test statistic will be close to the one predicted by the null hypothesis (no difference between the drug and placebo groups), and the resulting p-value will be close to 1. It may not be precisely 1 because real-world variations may exist. Conversely, if the new drug indeed reduces pain significantly, your test statistic will diverge further from what’s expected under the null hypothesis, and the p-value will decrease. The p-value will never reach zero because there’s always a slim possibility, though highly improbable, that the observed results occurred by random chance.
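
To make this example concrete, here is a minimal sketch in Python (using scipy; the pain-score data are invented purely for illustration) showing how a test statistic and its p-value are obtained:

```python
from scipy import stats

# Hypothetical pain scores (0-10 scale) after treatment; illustrative values only
drug_group    = [3.2, 4.1, 2.8, 3.9, 3.3, 4.0, 2.9, 3.6]
placebo_group = [5.1, 4.8, 5.6, 5.0, 4.9, 5.4, 5.2, 4.7]

# Two-sample t-test: the test statistic measures how far the observed difference
# in means diverges from the zero difference expected under the null hypothesis
t_stat, p_value = stats.ttest_ind(drug_group, placebo_group)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# The larger |t| is, the further the data diverge from the null expectation,
# and the smaller the p-value becomes.
```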

P-value interpretation

The significance level (alpha) is a set probability threshold (often 0.05), while the p-value is the probability you calculate based on your study or analysis.

A p-value less than or equal to a predetermined significance level (typically 0.05, sometimes 0.01) indicates a statistically significant result, meaning the observed data provide strong evidence against the null hypothesis.

This suggests the effect under study likely represents a real relationship rather than just random chance.

For instance, if you set α = 0.05, you would reject the null hypothesis if your p-value ≤ 0.05.

It indicates strong evidence against the null hypothesis: if the null hypothesis were true, results at least this extreme would occur less than 5% of the time.

Therefore, we reject the null hypothesis in favor of the alternative hypothesis.

Example: Statistical Significance

Upon analyzing the pain relief effects of the new drug compared to the placebo, the computed p-value is less than 0.01, which falls well below the predetermined alpha value of 0.05. Consequently, you conclude that there is a statistically significant difference in pain relief between the new drug and the placebo.

What does a p-value of 0.001 mean?

A p-value of 0.001 is highly statistically significant beyond the commonly used 0.05 threshold. It indicates strong evidence of a real effect or difference, rather than just random variation.

Specifically, a p-value of 0.001 means there is only a 0.1% chance of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is correct.

Such a small p-value provides strong evidence against the null hypothesis, leading to rejecting the null in favor of the alternative hypothesis.

A p-value greater than the significance level (typically p > 0.05) is not statistically significant. It indicates only that the data do not provide strong evidence against the null hypothesis; it is not strong evidence that the null hypothesis is true.

This means we retain (fail to reject) the null hypothesis. Note that failing to reject is not the same as accepting: we can only reject the null hypothesis or fail to reject it.

Note: even when the p-value is below your threshold of significance, it does not mean that there is a 95% probability that the alternative hypothesis is true.

One-Tailed Test

In a one-tailed test, the alternative hypothesis specifies a direction, so the entire rejection region lies in one tail of the distribution.

Two-Tailed Test

In a two-tailed test, the alternative hypothesis allows an effect in either direction, so the rejection region is split between both tails.

How do you calculate the p-value?

Most statistical software packages like R, SPSS, and others automatically calculate your p-value. This is the easiest and most common way.

Online resources and tables are available to estimate the p-value based on your test statistic and degrees of freedom.

These tables help you understand how often you would expect to see your test statistic under the null hypothesis.
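
Statistical software does exactly what those tables do: it converts a test statistic and its degrees of freedom into a tail probability. A minimal sketch with scipy (the statistics and degrees of freedom below are illustrative values, not results from a real study):

```python
from scipy import stats

# Two-tailed p-value from a t statistic and its degrees of freedom
t_stat, df = 2.5, 24
p_t = 2 * stats.t.sf(abs(t_stat), df)   # sf() is the upper-tail area (1 - CDF)

# p-value from a chi-square statistic and its degrees of freedom
chi2_stat, df_chi2 = 9.4, 3
p_chi2 = stats.chi2.sf(chi2_stat, df_chi2)

print(f"t-test: p = {p_t:.4f}")              # ~0.0197
print(f"chi-square test: p = {p_chi2:.4f}")  # ~0.0244
```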

Understanding the Statistical Test

Different statistical tests are designed to answer specific research questions or hypotheses. Each test has its own underlying assumptions and characteristics.

For example, you might use a t-test to compare means, a chi-squared test for categorical data, or a correlation test to measure the strength of a relationship between variables.

Be aware that the number of independent variables you include in your analysis can influence the magnitude of the test statistic needed to produce the same p-value.

This factor is particularly important to consider when comparing results across different analyses.

Example: Choosing a Statistical Test

If you’re comparing the effectiveness of just two different drugs in pain relief, a two-sample t-test is a suitable choice for comparing these two groups. However, when you’re examining the impact of three or more drugs, it’s more appropriate to employ an Analysis of Variance (ANOVA). Utilizing multiple pairwise comparisons in such cases can lead to artificially low p-values and an overestimation of the significance of differences between the drug groups.
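
A sketch of that choice in Python (scipy; the duration data are invented for illustration). A one-way ANOVA tests all three groups with a single p-value, avoiding the inflated false-positive risk of repeated pairwise t-tests:

```python
from scipy import stats

# Hypothetical pain-relief durations (hours) for three drugs; illustrative only
drug_a = [4.2, 4.8, 5.1, 4.5, 4.9]
drug_b = [5.6, 5.9, 6.1, 5.4, 5.8]
drug_c = [4.4, 4.6, 4.9, 4.3, 4.7]

# One-way ANOVA: a single test of the null that all three group means are equal
f_stat, p_value = stats.f_oneway(drug_a, drug_b, drug_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

# Three separate t-tests (A-B, A-C, B-C) would give three chances of a false
# positive, which is why a single ANOVA is preferred here.
```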

How to report

A statistically significant result cannot prove that a research hypothesis is correct (which implies 100% certainty).

Instead, we may state our results “provide support for” or “give evidence for” our research hypothesis (as there is still a slight probability that the results occurred by chance and the null hypothesis was correct – e.g., less than 5%).

Example: Reporting the results

In our comparison of the pain relief effects of the new drug and the placebo, we observed that participants in the drug group experienced a significant reduction in pain (M = 3.5, SD = 0.8) compared to those in the placebo group (M = 5.2, SD = 0.7), resulting in an average difference of 1.7 points on the pain scale (t(98) = -9.36, p < .001).

The 6th edition of the APA style manual (American Psychological Association, 2010) states the following on the topic of reporting p-values:

“When reporting p values, report exact p values (e.g., p = .031) to two or three decimal places. However, report p values less than .001 as p < .001.

The tradition of reporting p values in the form p < .10, p < .05, p < .01, and so forth, was appropriate in a time when only limited tables of critical values were available.” (p. 114)

  • Do not use 0 before the decimal point for the statistical value p, as it cannot be greater than 1. In other words, write p = .001 instead of p = 0.001.
  • Please pay attention to issues of italics ( p is always italicized) and spacing (either side of the = sign).
  • p = .000 (as output by some statistical packages such as SPSS) is impossible and should be written as p < .001 (see the sketch after this list).
  • The opposite of significant is “nonsignificant,” not “insignificant.”
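
These formatting rules are easy to automate. A small sketch (the helper name format_p is my own invention, not an APA or SPSS function):

```python
def format_p(p: float) -> str:
    """Format a p-value following the APA conventions above."""
    if p < 0.001:
        return "p < .001"  # never report p = .000
    # Report two or three decimals and drop the leading zero
    return f"p = {p:.3f}".replace("0.", ".", 1)

print(format_p(0.031))     # p = .031
print(format_p(0.000042))  # p < .001
```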

Why is the p-value not enough?

A lower p-value is sometimes interpreted as meaning there is a stronger relationship between two variables.

However, statistical significance only means that data this extreme would be unlikely (e.g., less than a 5% chance) if the null hypothesis were true; it says nothing about the size of the effect.

To understand the strength of the difference between the two groups (control vs. experimental), a researcher needs to calculate the effect size.
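
One common effect-size measure for a two-group comparison is Cohen's d. A minimal sketch using the means and standard deviations from the reporting example above (the group sizes of 50 are inferred from the reported t(98), since df = n1 + n2 - 2):

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d: standardized mean difference using a pooled SD."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd

# Drug group: M = 3.5, SD = 0.8; placebo group: M = 5.2, SD = 0.7
d = cohens_d(3.5, 0.8, 50, 5.2, 0.7, 50)
print(f"d = {d:.2f}")  # about -2.26, a very large effect
```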

When do you reject the null hypothesis?

In statistical hypothesis testing, you reject the null hypothesis when the p-value is less than or equal to the significance level (α) you set before conducting your test. The significance level is the probability of rejecting the null hypothesis when it is true. Commonly used significance levels are 0.01, 0.05, and 0.10.

Remember, rejecting the null hypothesis doesn’t prove the alternative hypothesis; it just suggests that the alternative hypothesis may be plausible given the observed data.

The p-value is conditional upon the null hypothesis being true but is unrelated to the truth or falsity of the alternative hypothesis.

What does a p-value of 0.05 mean?

If your p-value is less than or equal to 0.05 (the significance level), you would conclude that your result is statistically significant. This means the evidence is strong enough to reject the null hypothesis in favor of the alternative hypothesis.

Are all p-values below 0.05 considered statistically significant?

Not necessarily. The threshold of 0.05 is commonly used, but it’s just a convention. The appropriate significance level depends on factors like the study design, the field’s standards, and the consequences of a false positive, and some contexts demand stricter thresholds such as 0.01.

A p-value below 0.05 means there is evidence against the null hypothesis, suggesting a real effect. However, it’s essential to consider the context and other factors when interpreting results.

Researchers also look at effect size and confidence intervals to determine the practical significance and reliability of findings.

How does sample size affect the interpretation of p-values?

Sample size can impact the interpretation of p-values. A larger sample size provides more reliable and precise estimates of the population, leading to narrower confidence intervals.

With a larger sample, even small differences between groups or effects can become statistically significant, yielding lower p-values. In contrast, smaller sample sizes may not have enough statistical power to detect smaller effects, resulting in higher p-values.

Therefore, a larger sample size increases the chances of finding statistically significant results when there is a genuine effect, making the findings more trustworthy and robust.

Can a non-significant p-value indicate that there is no effect or difference in the data?

No, a non-significant p-value does not necessarily indicate that there is no effect or difference in the data. It means that the observed data do not provide strong enough evidence to reject the null hypothesis.

There could still be a real effect or difference, but it might be smaller or more variable than the study was able to detect.

Other factors like sample size, study design, and measurement precision can influence the p-value. It’s important to consider the entire body of evidence and not rely solely on p-values when interpreting research findings.

Can P values be exactly zero?

While a p-value can be extremely small, it cannot be exactly zero. When a p-value is reported as p = 0.000, the actual p-value is simply too small for the software to display. This is often interpreted as strong evidence against the null hypothesis. For p-values less than 0.001, report them as p < .001.

Further Information

  • P Value Calculator From T Score
  • P-Value Calculator For Chi-Square
  • P-values and significance tests (Khan Academy)
  • Hypothesis testing and p-values (Khan Academy)
  • Wasserstein, R. L., Schirm, A. L., & Lazar, N. A. (2019). Moving to a world beyond "p < 0.05".
  • Criticism of using the "p < 0.05" threshold
  • Publication manual of the American Psychological Association
  • Statistics for Psychology Book Download


What is p-value: How to Calculate It and Statistical Significance

“What is a p-value?” is a question often asked by early career researchers and sometimes even by more experienced ones. The p-value is an important and frequently used concept in quantitative research. It can also be confusing and easily misused. In this article, we delve into what a p-value is, how to calculate it, and its statistical significance.


What is a p-value

The p-value, or probability value, is the probability of obtaining results at least as extreme as those observed, given that the null hypothesis is true. P-values are used in hypothesis testing to find evidence that differences in values or groups exist. P-values are determined through the calculation of the test statistic for the test you are using and are based on the assumed or known probability distribution.

For example, you are researching a new pain medicine that is designed to last longer than the current commonly prescribed drug. Please note that this is an extremely simplified example, intended only to demonstrate the concepts. From previous research, you know that the underlying probability distribution for both medicines is the normal distribution, which is shown in the figure below.

Figure: The normal probability distribution of pain-free duration, with the upper tail shaded in green.

You are planning a clinical trial for your drug. If your results show that the average length of time patients are pain-free is longer for the new drug than that for the standard medicine, how will you know that this is not just a random outcome? If this result falls within the green shaded area of the graph, you may have evidence that your drug has a longer effect. But how can we determine this scientifically? We do this through hypothesis testing.

What is a null hypothesis

Stating your null and alternative hypotheses is the first step in conducting a hypothesis test. The null hypothesis (H0) is what you’re trying to disprove, usually a statement that there is no relationship between two variables or no difference between two groups. The alternative hypothesis (Ha) states that a relationship exists or that there is a difference between two groups. It represents what you’re trying to find evidence to support.

Before we conduct the clinical trial, we create the following hypotheses:

H0: the mean longevity of the new drug is equal to that of the standard drug

Ha: the mean longevity of the new drug is greater than that of the standard drug

Note that the null hypothesis states that there is no difference in the mean values for the two drugs. Because Ha includes “greater than,” this is an upper-tailed test. We are not interested in the area under the lower side of the curve.

Next, we need to determine our criterion for deciding whether or not the null hypothesis can be rejected. This is where the critical p-value comes in. If we assume the null hypothesis is true, how much longer does the new drug have to last?


Let’s say your results show that the new drug lasts twice as long as the standard drug. In theory, this could still be a random outcome, due to chance, even if the null hypothesis were true. However, at some point, you must consider that the new drug may just have a better longevity. The researcher will typically set that point, which is the probability of rejecting the null hypothesis given that it is true, prior to conducting the trial. This is the critical p-value. Typically, this value is set at p = .05, although, depending on the circumstances, it could be set at another value, such as .10 or .01.

Another way to consider the null hypothesis that might make the concept clearer is to compare it to the adage “innocent until proven guilty.” It is assumed that the null hypothesis is true unless enough strong evidence can be found to disprove it. Statistically significant p-value results can provide some of that evidence, which makes it important to know how to calculate p-values.

How to calculate p-values

The p-value that is determined from your results is based on the test statistic, which depends on the type of hypothesis test you are using. That is because the p-value is actually a probability, and its value, and calculation method, depends on the underlying probability distribution. The p-value also depends in part on whether you are conducting a lower-tailed test, upper-tailed test, or two-tailed test.

The actual p-value is calculated by integrating the probability distribution function to find the relevant areas under the curve using integral calculus. This process can be quite complicated. Fortunately, p-values are usually determined by using tables, which use the test statistic and degrees of freedom, or statistical software, such as SPSS, SAS, or R.

For example, with the simplified clinical test we are performing, we assumed the underlying probability distribution is normal; therefore, we decide to conduct a t-test to test the null hypothesis. The resulting t-test statistic will indicate where along the x-axis, under the normal curve, our result is located. The p-value will then be, in our case, the area under the curve to the right of the test statistic.
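
In code, “the area under the curve to the right of the test statistic” is the survival function of the t distribution. A minimal sketch (the statistic and degrees of freedom are illustrative, not results from a real trial):

```python
from scipy import stats

t_stat = 2.10   # hypothetical test statistic from the trial
df = 58         # degrees of freedom, e.g. n1 + n2 - 2

# Upper-tailed test: the p-value is the area to the right of the test statistic
p_upper = stats.t.sf(t_stat, df)
print(f"p = {p_upper:.4f}")  # roughly 0.02
```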

Many factors affect the hypothesis test you use and therefore the test statistic. Always make sure to use the test that best fits your data and the relationship you’re testing. The sample size and number of independent variables you use will also impact the p-value.

P-Value and statistical significance

You have completed your clinical trial and have determined the p-value. What’s next? How can the result be interpreted? What does a statistically significant result mean?

A statistically significant result means that the p-value you obtained is small enough that the result is not likely to have occurred by chance. P-values are reported in the range of 0–1, and the smaller the p-value, the less likely it is that the null hypothesis is true and the greater the indication that it can be rejected. The critical p-value, or the point at which a result can be considered to be statistically significant, is set prior to the experiment.

In our simplified clinical trial example, we set the critical p-value at 0.05. If the p-value obtained from the trial was found to be p = .0375, we can say that the results were statistically significant, and we have evidence for rejecting the null hypothesis. However, this does not mean that we can be absolutely certain that the null hypothesis is false. The results of the test only indicate that the null hypothesis is likely false.  


P-value table

So, how can we interpret the p-value results of an experiment or trial? A p-value table, prepared prior to the experiment, can sometimes be helpful. This table lists possible p-values and their interpretations.

| P-value range | Interpretation |
| --- | --- |
| > 0.05 | Results are not statistically significant; do not reject the null hypothesis |
| < 0.05 | Results are statistically significant; in general, reject the null hypothesis |
| < 0.01 | Results are highly statistically significant; reject the null hypothesis |

How to report p-values in research

P-values, like all experimental outcomes, are usually reported in the results section, and sometimes in the abstract, of a research paper. Enough information also needs to be provided so that the readers can place the p-values into context. For our example, the test statistic and effect size should also be included in the results.

To enable readers to clearly understand your results, the significance threshold you used (the critical p-value) should be reported in the methods section of your paper. For our example, we might state, “In this study, the statistical threshold was set at p = .05.” The sample sizes and assumptions should also be discussed there, as they will greatly impact the p-value.

How can one use p-values to compare the results of two hypothesis tests?

What if we conduct two experiments using the same null and alternative hypotheses? Or what if we conduct the same clinical trial twice with different drugs? Can we use the resulting p-values to compare them?

In general, it is not a good idea to compare results using only p-values. A p-value only reflects the probability that those specific results occurred by chance; it is not related to any other results and does not indicate degree. So, just because you obtained a p-value of .04 with one drug and a value of .025 with a second drug does not necessarily mean that the second drug is better.

Using p-values to compare two different results may be more feasible if the experiments are exactly the same and all other conditions are controlled except for the one being studied. However, so many different factors impact the p-value that it would be difficult to control them all.

Why p-values alone are not enough when interpreting results

P-values can indicate whether or not the null hypothesis should be rejected; however, p-values alone are not enough to show the relative size differences between groups. Therefore, both the statistical significance and the effect size should be reported when discussing the results of a study.

For example, suppose the sample size in our clinical trials was very large, maybe 1,000, and we found the p-value to be .035. The difference between the two drugs is statistically significant because the p-value was less than .05. However, if we looked at the difference in the actual times the drugs were effective, we might find that the new drug lasted only 2 minutes longer than the standard drug. Large sample sizes generally show even very small differences to be significant. We would need this information to make any recommendations based on the results of the trial.

Statistical significance, or p-values, are dependent on both sample size and effect size. Therefore, they all need to be reported for readers to clearly understand the results.
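
The arithmetic behind the example above can be sketched from summary statistics. The 2-minute difference and the 500-per-group split come from the example; the standard deviation of 15 minutes is an assumption added purely for illustration:

```python
import math
from scipy import stats

mean_diff = 2.0   # new drug lasts 2 minutes longer on average
sd = 15.0         # assumed standard deviation (not given in the example)
n = 500           # patients per group (1,000 total)

se = sd * math.sqrt(2 / n)                   # standard error of the difference
t_stat = mean_diff / se
p_value = 2 * stats.t.sf(t_stat, 2 * n - 2)  # two-tailed p-value
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")  # t ~ 2.11, p ~ .035

# Statistically significant, yet a 2-minute gain may be clinically meaningless:
# with large samples, tiny effects produce small p-values.
```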

Things to consider while using p-values

P-values are very useful tools for researchers. However, much care must be taken to avoid treating them as black and white indicators of a study’s results or misusing them. Here are a few other things to consider when using p-values:

  • When using p-values in your research report, it’s a good idea to pay attention to your target journal’s guidelines on formatting. Typically, p-values are written without a leading zero. For example, write p = .01 instead of p = 0.01. Also, p-values, like all other variables, are usually italicized, and spaces are included on both sides of the equal sign.
  • The significance threshold needs to be set prior to the experiment being conducted. Setting the significance level after looking at the data to ensure a positive result is considered unethical.
  • P-values have nothing to say about the alternative hypothesis. If your results indicate that the null hypothesis should be rejected, it does not mean that you accept the alternative hypothesis.
  • P-values never prove anything. All they can do is provide evidence to support rejecting or not rejecting the null hypothesis. Statistics are extremely non-committal.
  • “Nonsignificant” is the opposite of significant. Never report that the results were “insignificant.”

Frequently Asked Questions (FAQs) on p-value  

Q: What influences p-value?   

The primary factors that affect the p-value include the size of the observed effect, the sample size, and the variability within the data. A larger effect size, a larger sample size, and lower variability all contribute to a lower p-value, indicating stronger evidence against the null hypothesis. The chosen significance level (alpha) does not change the p-value itself; it only sets the threshold against which the p-value is judged.

Q: What does p-value of 0.05 mean?   

A p-value of 0.05 is a commonly used threshold in statistical hypothesis testing. It represents the level of significance, typically denoted as alpha, which is the probability of rejecting the null hypothesis when it is true. If the p-value is less than or equal to 0.05, it suggests that the observed results are statistically significant at the 5% level, meaning they are unlikely to occur by chance alone.  

Q: What is the p-value significance of 0.15?  

The significance of a p-value depends on the chosen threshold, typically called the significance level or alpha. If the significance level is set at 0.05, a p-value of 0.15 would not be considered statistically significant. In this case, there is insufficient evidence to reject the null hypothesis. However, it is important to note that significance levels can vary depending on the specific field or study design.  

Q: Which p-value to use in T-Test?   

When performing a T-Test, the p-value obtained indicates the probability of observing the data if the null hypothesis is true. The appropriate p-value to use in a T-Test is based on the chosen significance level (alpha). Generally, a p-value less than or equal to the alpha indicates statistical significance, supporting the rejection of the null hypothesis in favour of the alternative hypothesis.  

Q: Are p-values affected by sample size?   

Yes, sample size can influence p-values. Larger sample sizes tend to yield more precise estimates and narrower confidence intervals. This increased precision can affect the p-value calculations, making it easier to detect smaller effects or subtle differences between groups or variables. This can potentially lead to smaller p-values, indicating statistical significance. However, it’s important to note that sample size alone is not the sole determinant of statistical significance. Consider it along with other factors, such as effect size, variability, and chosen significance level (alpha), when determining the p-value.  


The p value – definition and interpretation of p-values in statistics

This article examines the most common statistic reported in scientific papers and used in applied statistical analyses – the p-value. It goes through the definition, illustrated with examples, and discusses its utility, interpretation, and common misinterpretations of observed statistical significance and significance levels. It is structured as follows:

  • What does ‘p’ in ‘p-value’ stand for?
  • What does p measure and how to interpret it
  • A p-value only makes sense under a specified null hypothesis
  • How to calculate a p-value
  • A practical example
  • p-values as convenient summary statistics: quantifying the relative uncertainty of data; easy comparison of different statistical tests
  • p-value interpretation in outcomes of experiments (randomized controlled trials)
  • p-value interpretation in regressions and correlations of observational data
  • Common misinterpretations: mistaking statistical significance with practical significance; treating the significance level as likelihood for the observed effect; treating p-values as likelihoods attached to hypotheses; a high p-value means the null hypothesis is true; lack of statistical significance suggests a small effect size

The technical definition of the p-value is (based on [4,5,6]):

A p-value is the probability that the data-generating mechanism corresponding to a specified null hypothesis would produce an outcome as extreme as, or more extreme than, the one observed.

However, it is only straightforward to understand for those already familiar in detail with terms such as ‘probability’, ‘null hypothesis’, ‘data-generating mechanism’, and ‘extreme outcome’. These, in turn, require knowledge of what a ‘hypothesis’, a ‘statistical model’, and a ‘statistic’ mean, and so on. While some of these will be explained on a cursory level in the following paragraphs, those looking for deeper understanding should consider consulting the glossary definitions of: statistical model, hypothesis, null hypothesis, statistic.

A slightly less technical and therefore more accessible definition is:

A p-value quantifies how likely it is to erroneously reject a specific statistical hypothesis, were it true, based on a given set of data.

Let us break these down and examine several examples to make both of these definitions make sense.

p stands for probability, where probability means the frequency with which an event occurs under certain assumptions. The most common example is the frequency with which a coin lands heads under the assumption that it is equally balanced (a fair coin toss). That frequency is 0.5 (50%).

Capital ‘P’ stands for probability in general, whereas lowercase ‘p’ refers to the probability of a particular data realization. To expand on the coin toss example: P would stand for the probability of heads in general, whereas p could refer to the probability of landing a series of five heads in a row, or the probability of landing less than or equal to 38 heads out of 100 coin flips.

Given that p stands for probability, it is easy to figure out that it measures a sort of probability.

In everyday language the term ‘probability’ might be used as synonymous with ‘chance’, ‘likelihood’, or ‘odds’, e.g. there is a 90% probability that it will rain tomorrow. However, in statistics one cannot speak of ‘probability’ without specifying a mechanism which generates the observed data. A simple example of such a mechanism is a device which produces fair coin tosses. A statistical model based on this data-generating mechanism can be put forth, and under that model the probability of 38 or fewer heads out of 100 tosses can be estimated to be 1.05%, for example by using a binomial calculator. The p-value against the model of a fair coin would be ~0.01 (rounded to 0.01 from here on for the purposes of the article).
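
That 1.05% figure can be reproduced directly from the binomial distribution, for example with scipy:

```python
from scipy import stats

# Probability of 38 or fewer heads in 100 tosses of a fair coin (the null)
p_value = stats.binom.cdf(38, 100, 0.5)
print(f"p = {p_value:.4f}")  # ~0.0105, i.e. about 1%
```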

The way to interpret that p-value is: observing 38 heads or fewer out of the 100 tosses could have happened in only 1% of infinitely many series of 100 fair coin tosses. The null hypothesis in this case is defined as the coin being fair, therefore having a 50% chance for heads and a 50% chance for tails on each toss.

Assuming the null hypothesis is true allows the comparison of the observed data to what would have been expected under the null. It turns out the particular observation of 38/100 heads is a rather improbable and thus surprising outcome under the assumption of the null hypothesis. This is measured by the low p -value which also accounts for more extreme outcomes such as 37/100, 36/100, and so on all the way to 0/100.

If one had a predefined level of statistical significance at 0.05, then one would claim that the outcome is statistically significant, since its p-value of 0.01 meets the 0.05 significance level (0.01 ≤ 0.05). The relationship between p-values, the significance level (p-value threshold), and the statistical significance of an outcome is illustrated in this graph:

P-value and significance level explained

In fact, had the significance threshold been at any value above 0.01, the outcome would have been statistically significant; therefore it is usually said that with a p-value of 0.01, the outcome is statistically significant at any level above 0.01.

Continuing with the interpretation: were one to reject the null hypothesis based on this p-value of 0.01, they would be acting as if a significance level of 0.01 or lower provides sufficient evidence against the hypothesis of the coin being fair. One could interpret this as a rule for a long-run series of experiments and inferences. In such a series, by using this p-value threshold one would incorrectly reject the fair coin hypothesis in at most 1 out of 100 cases, regardless of whether the coin is actually fair in any one of them. An incorrect rejection of the null is often called a type I error, as opposed to a type II error, which is to incorrectly fail to reject a null.

A more intuitive interpretation proceeds without reference to hypothetical long runs. This second interpretation comes in the form of a strong argument from coincidence:

  • there was a low probability (0.01 or 1%) that something would have happened assuming the null was true
  • it did happen, so it has to be an unusual (to the extent that the p-value is low) coincidence that it happened
  • this warrants the conclusion to reject the null hypothesis

This argument stems from the concept of severe testing as developed by Prof. Deborah Mayo in her various works [1,2,3,4,5] and reflects an error-probabilistic approach to inference.

A p-value only makes sense under a specified null hypothesis

It is important to understand why a specified ‘null hypothesis’ should always accompany any reported p-value and why p-values are crucial in so-called Null Hypothesis Statistical Tests (NHST). Statistical significance only makes sense when referring to a particular statistical model, which in turn corresponds to a given null hypothesis. A p-value calculation has a statistical model and a statistical null hypothesis defined within it as prerequisites, and a statistical null is only interesting because of some tightly related substantive null, such as ‘this treatment improves outcomes’. The relationship is shown in the chart below:

The relationship between a substantive hypothesis, a statistical model, the significance threshold, and the p-value

In the coin example, the substantive null that is interesting to (potentially) reject is the claim that the coin is fair. It translates to a statistical null hypothesis (model) with the following key properties:

  • heads having 50% chance and tails having 50% chance, on each toss
  • independence of each toss from any other toss. The outcome of any given coin toss does not depend on past or future coin tosses.
  • homogeneity of the coin behavior over time (the true chance does not change across infinitely many tosses)
  • a binomial error distribution

The resulting p-value of 0.01 from the coin toss experiment should be interpreted as the probability only under these particular assumptions.

What happens, however, if someone is interested in rejecting the claim that the coin is somewhat biased against heads? To be precise: the claim that it has a true frequency of heads of 40% or less (hence 60% for tails) is the one they are looking to deny with a certain evidential threshold.

The p-value needs to be recalculated under their null hypothesis, so now the same 38 heads out of 100 tosses result in a p-value of ~0.38. If they were interested in rejecting such a null hypothesis, then these data provide poor evidence against it, since a 38/100 outcome would not be unusual at all if it were in fact true (an outcome with p ≤ 0.38 would occur with probability 38%).

Similarly, the p-value needs to be recalculated for a claim of bias in the other direction, say that the coin produces heads with a frequency of 60% or more. The probability of observing 38 or fewer heads out of 100 under this null hypothesis is so extremely small (p-value ≈ 0.000007364, or 7.364 × 10⁻⁶ in standard form) that maintaining a claim for a 60/40 bias in favor of heads becomes near-impossible for most practical purposes.
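
The same 38/100 observation evaluated against all three nulls discussed above, in a short sketch that reproduces the article's figures:

```python
from scipy import stats

heads, tosses = 38, 100

# P(X <= 38 heads) under each null hypothesis about the chance of heads
for null_p in (0.5, 0.4, 0.6):
    p_value = stats.binom.cdf(heads, tosses, null_p)
    print(f"null: {null_p:.0%} heads -> p = {p_value:.7f}")

# ~0.0105 against the fair coin, ~0.38 against 40% heads, ~0.0000074 against
# 60% heads: the same data yield very different p-values under different nulls.
```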

A p-value can be calculated for any frequentist statistical test. Common types of statistical tests include tests for:

  • absolute difference in proportions;
  • absolute difference in means;
  • relative difference in means or proportions;
  • goodness-of-fit;
  • homogeneity
  • independence
  • analysis of variance (ANOVA)

and others. Different statistics would be computed depending on the error distribution of the parameter of interest in each case, e.g. a t-value, z-value, chi-square (χ²) value, F-value, and so on.

p-values can then be calculated based on the cumulative distribution functions (CDFs) of these statistics, whereas pre-test significance thresholds (critical values) can be computed based on the inverses of these functions. You can try these by plugging different inputs into our critical value calculator, and also by consulting its documentation.

In its generic form, a p-value formula can be written down as:

p = P(d(X) ≥ d(x₀); H₀)

where P stands for probability, d(X) is a test statistic (distance function) of a random variable X, x₀ is the observed realization of X, and H₀ is the selected null hypothesis. The semicolon means ‘assuming’. The distance function is the aforementioned cumulative distribution function for the relevant error distribution. In its generic form, a distance function equation can be written as:

d(x₀) = (X̄ − μ₀) / (σ / √n)

X̄ is the arithmetic mean of the observed values, μ₀ is a hypothetical or expected mean to which X̄ is compared, σ is the standard deviation, and n is the sample size. The result of a distance function will often be expressed in a standardized form: the number of standard deviations between the observed value and the expected value.
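
Putting the two formulas together for a simple one-sample z-test (all numbers are illustrative):

```python
import math
from scipy import stats

x_bar, mu_0 = 52.0, 50.0   # observed mean and hypothesized mean
sigma, n = 8.0, 64         # standard deviation and sample size

# Distance function: standardized gap between observed and expected mean
d = (x_bar - mu_0) / (sigma / math.sqrt(n))   # = 2.0 standard errors

# p = P(d(X) >= d(x0); H0): the upper-tail area of the standard normal
p_value = stats.norm.sf(d)
print(f"d = {d:.2f}, p = {p_value:.4f}")  # ~0.0228
```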

The p-value calculation is different in each case, and so a different formula will be applied depending on the circumstances. You can see examples in the p-values reported in our statistical calculators, such as the statistical significance calculator for difference of means or proportions, the Chi-square calculator, the risk ratio calculator, odds ratio calculator, hazard ratio calculator, and the normality calculator.

A very fresh (as of late 2020) example of the application of p-values in scientific hypothesis testing can be found in the recently concluded COVID-19 clinical trials. Multiple vaccines for the virus which spread from China in late 2019 and early 2020 have been tested on tens of thousands of volunteers split randomly into two groups – one gets the vaccine and the other gets a placebo. This is called a randomized controlled trial (RCT). The main parameter of interest is the difference between the rates of infections in the two groups. An appropriate test is the t-test for difference of proportions, but the same data can be examined in terms of risk ratios or odds ratios.

The null hypothesis in many of these medical trials is that the vaccine is at most 30% efficient. A statistical model can be built for the expected difference in proportions if the vaccine’s efficiency is 30% or less, and then the actual observed data from a medical trial can be compared to that null hypothesis. Most trials set their significance level at the minimum required by the regulatory bodies (FDA, EMA, etc.), which is usually 0.05. So, if the p-value from a vaccine trial is calculated to be below 0.05, the outcome would be statistically significant and the null hypothesis of the vaccine being less than or equal to 30% efficient would be rejected.

Let us say a vaccine trial results in a p-value of 0.0001 against that null hypothesis. As this is highly unlikely under the assumption of the null hypothesis being true, it provides very strong evidence against the hypothesis that the tested treatment has less than 30% efficiency.

However, many regulators stated that they require at least 50% proven efficiency. They posit a different null hypothesis, and so the p-value presented before these bodies needs to be calculated against it. This p-value would be somewhat increased since 50% is a higher null value than 30%, but given that the observed effects of the first vaccines to finalize their trials are around 95%, with 95% confidence interval bounds hovering around 90%, the p-value against a null hypothesis stating that the vaccine’s efficiency is 50% or less is likely to still be highly statistically significant, say at 0.001. Such an outcome is to be interpreted as follows: had the efficiency been 50% or below, such an extreme outcome would have most likely not been observed; therefore one can proceed to reject the claim that the vaccine has efficiency of 50% or less at a significance level of 0.001.

While this example is fictitious in that it doesn’t reference any particular experiment, it should serve as a good illustration of how null hypothesis statistical testing (NHST) operates based on p -values and significance thresholds.

The utility of p-values and statistical significance

It is not often appreciated how much utility p-values bring to the practice of performing statistical tests for scientific and business purposes.

Quantifying relative uncertainty of data

First and foremost, p-values are a convenient expression of the uncertainty in the data with respect to a given claim. They quantify how unexpected a given observation is, assuming the claim which is put to the test is true. If the p-value is low, the probability that the observed data would have occurred under the null hypothesis is low, and the uncertainty the data introduce with respect to the null is high. Therefore, anyone defending the substantive claim which corresponds to the statistical null hypothesis would be pressed to concede that their position is untenable in the face of such data.

If the p-value is high, then the uncertainty with regard to the null hypothesis is low and we are not in a position to reject it, hence the corresponding claim can still be maintained.

As evident from the generic p-value formula and the distance function equation which is a part of it, a p-value incorporates information about:

  • the observed effect size relative to the null effect size
  • the sample size of the test
  • the variance and error distribution of the statistic of interest

It would be much more complicated to communicate the outcomes of a statistical test if one had to communicate all three pieces of information. Instead, by way of a single value on the scale of 0 to 1 one can communicate how surprising an outcome is. This value is affected by any change in any of these variables.

Another quality is that a p-value from one statistical test can easily and directly be compared to another. The minimal assumptions behind significance tests mean that, given all of them are met, the strength of the statistical evidence offered by data relative to a null hypothesis of interest is the same in two tests if they have approximately equal p-values.

This is especially useful in conducting meta-analyses of various sorts, or for combining evidence from multiple tests.

p-value interpretation in outcomes of experiments

When a p-value is calculated for the outcome of a randomized controlled experiment, it is used to assess the strength of evidence against a null hypothesis of interest, such as that a given intervention does not have a positive effect. If H₀: μ₀ ≤ 0%, the observed effect is μ₁ = 30%, and the calculated p-value is 0.025, this can be used to reject the claim H₀: μ₀ ≤ 0% at any significance level ≥ 0.025. This, in turn, allows us to claim that H₁, a complementary hypothesis called the ‘alternative hypothesis’, is in fact true. In this case, since H₀: μ₀ ≤ 0%, then H₁: μ₁ > 0% in order to exhaust the parameter space, as illustrated below:

Composite null versus composite alternative hypothesis in NHST

A claim like the above corresponds to what is called a one-sided null hypothesis. There could be a point null as well; for example, the claim that an intervention has no effect whatsoever translates to H₀: μ₀ = 0%. In such a case the corresponding p-value refers to that point null and hence should be interpreted as rejecting the claim of the effect being exactly zero. For those interested in the differences between point null hypotheses and one-sided hypotheses, the articles on onesided.org should be an interesting read. TLDR: most of the time you’d want to reject a directional claim, and hence a one-tailed p-value should be reported [8].

These finer points aside, after observing a low enough p -value, one can claim the rejection of the null and hence the adoption of the complementary alternative hypothesis as true. The alternative hypothesis is simply a negation of the null and is therefore a composite claim such as ‘there is a positive effect’ or ‘there is some non-zero effect’. Note that any inference about a particular effect size within the alternative space has not been tested and hence claiming it has probability equal to p calculated against a zero effect null hypothesis (a.k.a. the nil hypothesis) does not make sense.

p-value interpretation in regressions and correlations of observational data

When performing statistical analyses of observational data, p-values are often calculated for regression coefficients and for correlation coefficients. A p-value falling below a specific statistical significance threshold measures how surprising the observed correlation or regression coefficient would be if the variable of interest were in fact orthogonal to the outcome variable. That is: how likely would it be to observe the apparent relationship if there were no actual relationship between the variable and the outcome variable.

Our correlation calculator outputs both p-values and confidence intervals for the calculated coefficients and is an easy way to explore the concept in the case of correlations. Extrapolating to regressions is then straightforward.
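
A correlation test is a one-liner in most packages. A minimal sketch with scipy (the paired data are invented for illustration):

```python
from scipy import stats

# Illustrative paired observations, e.g. hours studied vs. exam score
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [52, 55, 61, 60, 68, 70, 75, 74]

# pearsonr returns the correlation coefficient and the p-value against the
# null hypothesis of zero correlation (no linear relationship)
r, p_value = stats.pearsonr(x, y)
print(f"r = {r:.3f}, p = {p_value:.5f}")
```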

Misinterpretations of statistically significant p-values

There are several common misinterpretations [7] of p-values and statistical significance, and no calculator can save one from falling for them. The following errors are often committed when a result is seen as statistically significant.

Mistaking statistical significance with practical significance

A result may be highly statistically significant (e.g. a p-value of 0.0001) but it might still have no practical consequences due to a trivial effect size. This often happens with overpowered designs, but it can also happen in a properly designed statistical test. This error can be avoided by always reporting the effect size and confidence intervals around it.

Treating the significance level as likelihood for the observed effect

Observing a highly significant result, say a p-value of 0.01, does not mean that the observed effect is likely to be the true effect. In fact, the likelihood of that is much, much smaller. Remember that statistical significance has a strict meaning in the NHST framework.

For example, if the observed effect size μ₁ from an intervention is a 20% improvement in some outcome, and a p-value against the null hypothesis of μ₀ ≤ 0% has been calculated to be 0.01, it does not mean that one can reject μ₀ ≤ 20% with a p-value of 0.01. In fact, the p-value against μ₀ ≤ 20% would be 0.5, which is not statistically significant by any measure.

To make claims about a particular effect size, it is recommended to use confidence intervals or severity, or both.

Treating p-values as likelihoods attached to hypotheses

An example of this mistake is stating that a p-value of 0.02 means that there is a 98% probability that the alternative hypothesis is true, or that there is a 2% probability that the null hypothesis is true. This is a logical error.

By design, even if the null hypothesis is true, p-values equal to or lower than 0.02 would be observed exactly 2% of the time, so one cannot use the fact that a low p-value has been observed to argue that there is only a 2% probability that the null hypothesis is true. Frequentist and error-statistical methods do not allow one to attach probabilities to hypotheses or claims, only to events [4]. Doing so requires an exhaustive list of hypotheses and prior probabilities attached to them, which goes firmly into decision-making territory. Put in Bayesian terms, the p-value is not a posterior probability.

Misinterpretations of statistically non-significant outcomes

Statistically non-significant p-values – that is, p greater than the specified significance threshold α (alpha) – can lead to a different set of misinterpretations. Due to the ubiquitous use of p-values, these are committed often as well.

A high p-value means the null hypothesis is true

Treating a high p-value (a non-significant result), by itself, as evidence that the null hypothesis is true is a common mistake. For example, after observing p = 0.2, one may claim that this is evidence that there is no effect, e.g. no difference between two means.

However, it is trivial to demonstrate why it is wrong to interpret a high p-value as providing support for the null hypothesis. Take a simple experiment in which one measures only two people or objects in each of the control and treatment groups. The p-value for this test of significance will almost surely not be statistically significant. Does that mean that the intervention is ineffective? Of course not, since that claim has not been tested severely enough. Using a statistic such as severity can completely eliminate this error [4,5].

A more detailed response would say that failure to observe a statistically significant result, given that the test has enough statistical power, can be used to argue for accepting the null hypothesis to the extent warranted by the power and with reference to the minimum detectable effect for which it was calculated. For example, if the statistical test had 99% power to detect an effect of size μ₁ at level α and it failed, then it could be argued that it is quite unlikely that there exists an effect of size μ₁ or greater, as in that case one would have most likely observed a significant p-value.

Lack of statistical significance suggests a small effect size

This is a softer version of the above mistake, wherein instead of claiming support for the null hypothesis, a high p-value is taken, by itself, as indicating that the effect size must be small.

This is a mistake since the test might have simply lacked power to exclude many effects of meaningful size. Examining confidence intervals and performing severity calculations against particular hypothesized effect sizes would be a way to avoid this issue.

References:

[1] Mayo, D.G. 1983. “An Objective Theory of Statistical Testing.” Synthese 57 (3): 297–340. DOI:10.1007/BF01064701.
[2] Mayo, D.G. 1996. “Error and the Growth of Experimental Knowledge.” Chicago, Illinois: University of Chicago Press. DOI:10.1080/106351599260247.
[3] Mayo, D.G., and A. Spanos. 2006. “Severe Testing as a Basic Concept in a Neyman–Pearson Philosophy of Induction.” The British Journal for the Philosophy of Science 57 (2): 323–357. DOI:10.1093/bjps/axl003.
[4] Mayo, D.G., and A. Spanos. 2011. “Error Statistics.” In Handbook of Philosophy of Science, Volume 7 – Philosophy of Statistics, 1–46. Elsevier.
[5] Mayo, D.G. 2018. “Statistical Inference as Severe Testing.” Cambridge: Cambridge University Press. ISBN: 978-1107664647.
[6] Georgiev, G.Z. 2019. “Statistical Methods in Online A/B Testing.” ISBN: 978-1694079725.
[7] Greenland, S. et al. 2016. “Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations.” European Journal of Epidemiology 31:337–350. DOI:10.1007/s10654-016-0149-3.
[8] Georgiev, G.Z. 2018. “Directional claims require directional (statistical) hypotheses.” [online, accessed on Dec 07, 2020, at https://www.onesided.org/articles/directional-claims-require-directional-hypotheses.php]


An applied statistician, data analyst, and optimizer by calling, Georgi has expertise in web analytics, statistics, design of experiments, and business risk management. He covers a variety of topics where mathematical models and statistics are useful. Georgi is also the author of “Statistical Methods in Online A/B Testing”.


P-Value: What It Is, How to Calculate It, and Why It Matters


Yarilet Perez is an experienced multimedia journalist and fact-checker with a Master of Science in Journalism. She has worked in multiple cities covering breaking news, politics, education, and more. Her expertise is in personal finance and investing, and real estate.


In statistics, a p-value indicates the likelihood of obtaining a result at least as extreme as the one observed if the null hypothesis is true.

The p-value serves as an alternative to rejection points to provide the smallest level of significance at which the null hypothesis would be rejected. A smaller p-value means stronger evidence in favor of the alternative hypothesis.

P-value is often used to promote credibility for studies or reports by government agencies. For example, the U.S. Census Bureau stipulates that any analysis with a p-value greater than 0.10 must be accompanied by a statement that the difference is not statistically different from zero. The Census Bureau also has standards in place stipulating which p-values are acceptable for various publications.

Key Takeaways

  • A p-value is a statistical measurement used to validate a hypothesis against observed data.
  • A p-value measures the probability of obtaining the observed results, assuming that the null hypothesis is true.
  • The lower the p-value, the greater the statistical significance of the observed difference.
  • A p-value of 0.05 or lower is generally considered statistically significant.
  • P-value can serve as an alternative to—or in addition to—preselected confidence levels for hypothesis testing.


P-values are usually calculated using statistical software or p-value tables based on the assumed or known probability distribution of the specific statistic tested. While the sample size influences the reliability of the observed data, the p-value approach to hypothesis testing specifically involves calculating the p-value based on the deviation between the observed value and a chosen reference value, given the probability distribution of the statistic. A greater difference between the two values corresponds to a lower p-value.

Mathematically, the p-value is calculated using integral calculus from the area under the probability distribution curve for all values of statistics that are at least as far from the reference value as the observed value is, relative to the total area under the probability distribution curve. Standard deviations, which quantify the dispersion of data points from the mean, are instrumental in this calculation.

The calculation for a p-value varies based on the type of test performed. The three test types describe the location on the probability distribution curve: lower-tailed test, upper-tailed test, or two-tailed test . In each case, the degrees of freedom play a crucial role in determining the shape of the distribution and thus, the calculation of the p-value.

In a nutshell, the greater the difference between two observed values, the less likely it is that the difference is due to simple random chance, and this is reflected by a lower p-value.
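To make the tail-area idea concrete, here is a minimal sketch in Python using SciPy (not part of the original article), with a standard normal statistic for simplicity; in practice the distribution, and hence the degrees of freedom, depend on the test. The statistic value z is hypothetical.

```python
# Tail areas for the three test types, assuming a standard normal statistic.
from scipy.stats import norm

z = 1.7  # hypothetical observed test statistic (standardized)

p_lower = norm.cdf(z)          # lower-tailed test: area to the left of z
p_upper = norm.sf(z)           # upper-tailed test: area to the right of z
p_two   = 2 * norm.sf(abs(z))  # two-tailed test: both tails beyond |z|

print(p_lower, p_upper, p_two)  # ~0.955, ~0.045, ~0.089
```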

The P-Value Approach to Hypothesis Testing

The p-value approach to hypothesis testing uses the calculated probability to determine whether there is evidence to reject the null hypothesis. This determination relies heavily on the test statistic, which summarizes the information from the sample relevant to the hypothesis being tested. The null hypothesis, also known as the conjecture, is the initial claim about a population (or data-generating process). The alternative hypothesis states whether the population parameter differs from the value of the population parameter stated in the conjecture.

In practice, the significance level is stated in advance to determine how small the p-value must be to reject the null hypothesis. Because different researchers use different levels of significance when examining a question, a reader may sometimes have difficulty comparing results from two different tests. P-values provide a solution to this problem.

Even a low p-value is not definitive proof that an effect is real, since there is still a possibility that the observed data are the result of chance. Only repeated experiments or studies can confirm whether a relationship holds.

For example, suppose a study comparing returns from two particular assets was undertaken by different researchers who used the same data but different significance levels. The researchers might come to opposite conclusions regarding whether the assets differ.

If one researcher used a confidence level of 90% and the other required a confidence level of 95% to reject the null hypothesis, and if the p-value of the observed difference between the two returns was 0.08 (corresponding to a confidence level of 92%), then the first researcher would find that the two assets have a difference that is statistically significant , while the second would find no statistically significant difference between the returns.

To avoid this problem, the researchers could report the p-value of the hypothesis test and allow readers to interpret the statistical significance themselves. This is called a p-value approach to hypothesis testing. Independent observers could note the p-value and decide for themselves whether that represents a statistically significant difference or not.

Example of P-Value

An investor claims that their investment portfolio’s performance is equivalent to that of the Standard & Poor’s (S&P) 500 Index . To determine this, the investor conducts a two-tailed test.

The null hypothesis states that the portfolio’s returns are equivalent to the S&P 500’s returns over a specified period, while the alternative hypothesis states that the portfolio’s returns and the S&P 500’s returns are not equivalent—if the investor conducted a one-tailed test , the alternative hypothesis would state that the portfolio’s returns are either less than or greater than the S&P 500’s returns.

The p-value hypothesis test does not necessarily make use of a preselected confidence level at which the investor should reject the null hypothesis that the returns are equivalent. Instead, it provides a measure of how much evidence there is to reject the null hypothesis. The smaller the p-value, the greater the evidence against the null hypothesis.

Thus, if the investor finds that the p-value is 0.001, there is strong evidence against the null hypothesis, and the investor can confidently conclude that the portfolio’s returns and the S&P 500’s returns are not equivalent.

Although this does not provide an exact threshold as to when the investor should accept or reject the null hypothesis, it does have another very practical advantage. P-value hypothesis testing offers a direct way to compare the relative confidence that the investor can have when choosing among multiple different types of investments or portfolios relative to a benchmark such as the S&P 500.

For example, for two portfolios, A and B, whose performance differs from the S&P 500 with p-values of 0.10 and 0.01, respectively, the investor can be much more confident that portfolio B, with a lower p-value, will actually show consistently different results.

Is a 0.05 P-Value Significant?

A p-value less than 0.05 is typically considered to be statistically significant, in which case the null hypothesis should be rejected. A p-value greater than 0.05 means that deviation from the null hypothesis is not statistically significant, and the null hypothesis is not rejected.

What Does a P-Value of 0.001 Mean?

A p-value of 0.001 indicates that if the null hypothesis tested were indeed true, then there would be a one-in-1,000 chance of observing results at least as extreme. This leads the observer to reject the null hypothesis because either a highly rare data result has been observed or the null hypothesis is incorrect.

How Can You Use P-Value to Compare 2 Different Results of a Hypothesis Test?

If you have two different results, one with a p-value of 0.04 and one with a p-value of 0.06, the result with a p-value of 0.04 will be considered more statistically significant than the p-value of 0.06. Beyond this simplified example, you could compare a 0.04 p-value to a 0.001 p-value. Both are statistically significant, but the 0.001 example provides an even stronger case against the null hypothesis than the 0.04.

The p-value is used to measure the significance of observational data. When researchers identify an apparent relationship between two variables, there is always a possibility that this correlation might be a coincidence. A p-value calculation helps determine if the observed relationship could arise as a result of chance.

U.S. Census Bureau. “ Statistical Quality Standard E1: Analyzing Data .”



P-Value in Statistical Hypothesis Tests: What is it?

P value definition.

A p value is used in hypothesis testing to help you decide whether to support or reject the null hypothesis. It quantifies the evidence against the null hypothesis: the smaller the p-value, the stronger the evidence that you should reject the null hypothesis.

P values are expressed as decimals, although it may be easier to understand them if you convert them to percentages. For example, a p value of 0.0254 is 2.54%. This means there is a 2.54% chance of seeing results at least as extreme as yours if the null hypothesis were true. That’s pretty tiny. On the other hand, a large p value of .9 (90%) means data like yours would turn up 90% of the time by chance alone if the null hypothesis were true, so your results are entirely unsurprising under the null. Therefore, the smaller the p-value, the more important (“significant”) your results.

When you run a hypothesis test , you compare the p value from your test to the alpha level you selected when you ran the test. Alpha levels can also be written as percentages.


P Value vs Alpha level

Alpha levels are controlled by the researcher and are related to confidence levels . You get an alpha level by subtracting your confidence level from 100%. For example, if you want to be 98 percent confident in your research, the alpha level would be 2% (100% – 98%). When you run the hypothesis test, the test will give you a value for p. Compare that value to your chosen alpha level. For example, let’s say you chose an alpha level of 5% (0.05). If the results from the test give you:

  • A small p (≤ 0.05) means you reject the null hypothesis. This is strong evidence against the null hypothesis.
  • A large p (> 0.05) means the evidence against the null hypothesis is weak, so you fail to reject it (see the sketch below).
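As a minimal sketch of this decision rule (the numbers are hypothetical, not from any particular test):

```python
# Compare a test's p-value to the pre-chosen alpha level.
alpha = 0.05     # chosen before running the test (100% - 95% confidence)
p_value = 0.031  # hypothetical p-value returned by a hypothesis test

if p_value <= alpha:
    print("Reject the null hypothesis (statistically significant).")
else:
    print("Fail to reject the null hypothesis (not statistically significant).")
```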


What if I Don’t Have an Alpha Level?

In an ideal world, you’ll have an alpha level. But if you do not, you can still use the following rough guidelines in deciding whether to support or reject the null hypothesis:

  • If p > .10 → “not significant”
  • If .05 < p ≤ .10 → “marginally significant”
  • If .01 < p ≤ .05 → “significant”
  • If p ≤ .01 → “highly significant.”

How to Calculate a P Value on the TI 83

Example question: The average wait time to see an E.R. doctor is said to be 150 minutes. You think the wait time is actually less. You take a random sample of 30 people and find their average wait is 148 minutes with a standard deviation of 5 minutes. Assume the distribution is normal. Find the p value for this test.

  • Press STAT then arrow over to TESTS.
  • Press ENTER for Z-Test .
  • Arrow over to Stats. Press ENTER.
  • Arrow down to μ0 and type 150. This is our null hypothesis mean.
  • Arrow down to σ. Type in your std dev: 5.
  • Arrow down to xbar. Type in your sample mean : 148.
  • Arrow down to n. Type in your sample size : 30.
  • Arrow to <μ0 for a left tail test . Press ENTER.
  • Arrow down to Calculate. Press ENTER. P is given as .014, or about 1.4%.

The probability of getting a sample mean as low as 148 minutes, if the true average wait were 150 minutes, is tiny, so you should reject the null hypothesis.

Note: If you don’t want to run a full test, you could also use the TI 83’s normalcdf function to get the tail area (which is the same thing as the probability value).
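The same left-tailed z-test can also be reproduced in Python; this sketch uses SciPy, which the original steps do not mention, but the numbers come straight from the example question.

```python
# Left-tailed one-sample z-test for the E.R. wait-time example.
from math import sqrt
from scipy.stats import norm

mu0, sigma, xbar, n = 150, 5, 148, 30  # null mean, std dev, sample mean, sample size
z = (xbar - mu0) / (sigma / sqrt(n))   # standardized test statistic
p = norm.cdf(z)                        # left-tail area, since Ha: mu < 150
print(round(z, 3), round(p, 3))        # -2.191, 0.014, matching the calculator
```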



Understanding P-values | Definition and Examples

P-values, or probability values, play a crucial role in statistical hypothesis testing. They help researchers determine the significance of their findings and whether they can reject the null hypothesis. Here’s a comprehensive guide to understanding p-values, including their definition, interpretation, and examples:

What is a P-value?

A p-value is a statistical measure that helps assess the evidence against a null hypothesis. In hypothesis testing, the null hypothesis (often denoted as H0) represents a statement of no effect or no difference. The p-value quantifies the probability of observing a result as extreme as, or more extreme than, the one obtained if the null hypothesis were true.

Interpreting P-values:

The interpretation of a p-value is based on a predetermined significance level, commonly denoted as alpha (α). The significance level is the threshold below which the results are considered statistically significant.

If the p-value is less than or equal to the significance level (p ≤ α):

  • The result is considered statistically significant.
  • There is enough evidence to reject the null hypothesis.
  • Researchers may conclude that there is a significant effect or difference.

If the p-value is greater than the significance level (p > α):

  • The result is not considered statistically significant.
  • There is insufficient evidence to reject the null hypothesis.
  • Researchers fail to reject the null hypothesis, indicating a lack of significant effect or difference.

Common Significance Levels:

The choice of significance level depends on the researcher’s judgment and the field’s conventions. Commonly used significance levels include:

  • α = 0.05 (5%)
  • α = 0.01 (1%)
  • α = 0.10 (10%)

Examples of P-values:

Example 1: Drug effectiveness

  • H0: The new drug has no effect.
  • H1: The new drug is effective.
  • Result: p-value = 0.03 (less than 0.05).
  • Interpretation: The result is statistically significant at the 0.05 level. There is evidence to reject the null hypothesis, suggesting that the new drug is effective.

Example 2: Association between two variables

  • H0: There is no association between variables A and B.
  • H1: There is an association between variables A and B.
  • Result: p-value = 0.20 (greater than 0.05).
  • Interpretation: The result is not statistically significant at the 0.05 level. There is insufficient evidence to reject the null hypothesis, indicating no significant association.

Considerations and Limitations:

  • A low p-value does not prove that the research hypothesis is true. It only suggests that the evidence against the null hypothesis is strong.
  • Larger sample sizes may lead to smaller p-values, but significance should be interpreted in the context of practical importance.
  • Conducting multiple tests increases the likelihood of finding a significant result by chance. Adjustments may be applied to control for this (e.g., the Bonferroni correction, sketched after this list).
  • Significance should be interpreted in the context of the specific study and its practical implications.
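Here is a minimal sketch of that Bonferroni adjustment (the p-values are hypothetical): with m tests, each p-value is compared against α/m rather than α.

```python
# Bonferroni correction: stricter per-test threshold of alpha / m.
alpha = 0.05
p_values = [0.003, 0.020, 0.041]  # hypothetical p-values from m = 3 tests
m = len(p_values)

for p in p_values:
    verdict = "significant" if p <= alpha / m else "not significant"
    print(p, verdict)  # only 0.003 clears the corrected threshold of ~0.0167
```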

Conclusion:

Understanding p-values is essential for researchers conducting hypothesis tests. The p-value provides a quantitative measure of the evidence against the null hypothesis, helping researchers make informed decisions about the significance of their findings. Researchers should interpret p-values cautiously, considering the context, significance level, and practical implications of their results.


P-Value: A Complete Guide

Published by Owen Ingram on August 31st, 2021; revised on August 3, 2023

You might have come across this term many times in hypothesis testing. Can you tell what a p-value is and how to calculate it? For those who are new to this term, sit back and read this guide to find out all the answers. Those already familiar with it, continue reading, because you might get a chance to dig deeper into the p-value and its significance in statistics.

Before we start with what a p-value is, there are a few other terms you must be clear about, and these are the null hypothesis and the alternative hypothesis.

What are the Null Hypothesis and Alternative Hypothesis?

The alternative hypothesis is your first hypothesis, predicting a relationship between different variables. On the contrary, the null hypothesis predicts that there is no relationship between the variables you are working with.

For instance, suppose you want to check the impact of two fertilizers on the growth of two sets of plants: group A is given fertilizer A, while group B is given fertilizer B. By using a two-tailed t-test, you can then test for a difference between the two fertilizers.

Null Hypothesis : There is no difference in growth between the two sets of plants.

Alternative Hypothesis: There is a difference in growth between the two groups.
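A minimal sketch of this test in Python, using SciPy's two-sample t-test (the growth measurements below are invented purely for illustration):

```python
# Two-tailed two-sample t-test for the fertilizer example.
from scipy import stats

growth_a = [20.1, 22.3, 19.8, 21.5, 23.0, 20.9]  # plants given fertilizer A (cm)
growth_b = [24.2, 25.1, 23.8, 26.0, 24.9, 25.4]  # plants given fertilizer B (cm)

t_stat, p_value = stats.ttest_ind(growth_a, growth_b)
print(t_stat, p_value)  # a small p-value favours a real difference in growth
```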

What is the P-value?

The p-value in statistics is the probability of getting outcomes at least as extreme as the outcomes of a statistical hypothesis test, assuming the null hypothesis to be correct. To put it in simpler words, it is a calculated number from a statistical test that shows how likely you are to have found a set of observations if the null hypothesis were plausible.

This means that p-values are used as alternatives to rejection points, providing the smallest level of significance at which the null hypothesis can be rejected. If the p-value is small, the evidence in favour of the alternative hypothesis is stronger. Similarly, if the value is large, the evidence in favour of the alternative hypothesis is weaker.

How is the P-value Calculated?

You can either use the p-value tables or statistical software to calculate the p-value. The calculated numbers are based on the known probability distribution of the statistic being tested.

The online p-value tables show how frequently you can expect to see a given test statistic under the null hypothesis. The p-value also depends on which statistical test is used to test a hypothesis.

  • Different statistical tests can have different predictions, hence developing different test statistics. Researchers can choose a statistical test depending on what best suits their data and the effect they want to test
  • The number of independent variables in your test determines how large or small the test statistic must be to produce the same p-value


When is a P-value Statistically Significant?

Before we talk about when a p-value is statistically significant, let’s first find out what it means to be statistically significant.

Any guesses?

To be statistically significant is another way of saying that a p-value is so small that it leads you to reject the null hypothesis.

Now the question is how small?

If a p-value is smaller than 0.05, then it is statistically significant. This means that the evidence against the null hypothesis is strong: results this extreme would occur less than 5 per cent of the time if the null hypothesis were true, so we reject the null hypothesis in favour of the alternative.

Nevertheless, if the p-value is less than the threshold of significance, the null hypothesis can be rejected, but that does not mean there is a 95 per cent probability of the alternative hypothesis being true. Note that the p-value is conditioned on the null hypothesis being true; it says nothing directly about the correctness or falsity of the alternative hypothesis.

When the p-value is greater than 0.05, the result is not statistically significant. It indicates that the evidence against the null hypothesis is weak. So, the alternative hypothesis, in this case, is not supported, and the null hypothesis is retained. An important thing to keep in mind here is that you still cannot accept the null hypothesis: you can only fail to reject it or reject it.

Here is a table showing hypothesis interpretations:

P-value     Decision
p > 0.05    Not statistically significant; do not reject the null hypothesis.
p ≤ 0.05    Statistically significant; reject the null hypothesis in favour of the alternative hypothesis.
p ≤ 0.01    Highly statistically significant; reject the null hypothesis in favour of the alternative hypothesis.

Is it clear now? We thought so! Let’s move on to the next heading, then.

How to Use P-value in Hypothesis Testing?

Follow these three simple steps to use p-value in hypothesis testing .

Step 1: Find the level of significance. Make sure to choose the significance level during the initial steps of designing the hypothesis test. It is usually 0.10, 0.05, or 0.01.

Step 2: Now calculate the p-value. As we discussed earlier, there are two ways of calculating it. A simple way out would be using Microsoft Excel, which allows p-value calculation with Data Analysis ToolPak .

Step 3: Start comparing the p-value with the significance level and deduce conclusions accordingly. Following the general rule, if the value is less than the level of significance, there is enough evidence to reject the null hypothesis of an experiment.

FAQs About P-Value

What is a null hypothesis?

It is a statistical theory suggesting that there is no relationship between a set of variables .

What is an alternative hypothesis?

The alternative hypothesis is your first hypothesis predicting a relationship between different variables .

What is the p-value?

The p-value in statistics is the probability of getting outcomes at least as extreme as the outcomes of a statistical hypothesis test, assuming the null hypothesis to be correct. It is a calculated number from a statistical test that shows how likely you are to have found a set of observations if the null hypothesis were plausible.

What is the level of significance?

To be statistically significant is another way of saying that a p-value is so small that it leads you to reject the null hypothesis. The table above shows when a p-value is significant.


What is a p value and what does it mean?

  • Dorothy Anne Forbes
  • Correspondence to Dorothy Anne Forbes Faculty of Nursing, University of Alberta, Level 3, Edmonton Clinic Health Academy, Edmonton, Alberta, T6G 1C9, Canada; dorothy.forbes{at}ualberta.ca

https://doi.org/10.1136/ebnurs-2012-100524


Researchers aim to make the strongest possible conclusions from limited amounts of data. To do this, they need to overcome two problems. First, important differences in the findings can be obscured by natural variability and experimental imprecision. Thus, it is difficult to distinguish real differences from random variability. Second, researchers' natural inclination is to conclude that differences are real, and to minimise the contribution of random variability. Statistical probability guards against this tendency. 1

Statistical probability or p values reveal whether the findings in a research study are statistically significant, meaning that the findings are unlikely to have occurred by chance. To understand the p value concept, it is important to understand its relationship with the α level. Before conducting a study, researchers specify the α level which is most often set at 0.05 (5%). This conventional level was based on the writings of Sir Ronald Fisher, an influential statistician, who in 1926 reported that he preferred the 0.05 cut-off for separating the probable from the improbable. 2 Researchers who set α at 0.05 are willing to accept that there is a 5% chance that their findings are wrong. However, researchers may adopt probability cut-offs that are more generous (eg, an α set at 0.10 means there is a 10% chance that the conclusions are wrong) or more stringent (eg, an α set at 0.01 means there is a 1% chance that the conclusions are wrong). The design of the study, purpose or intuition may influence the researcher's setting of the α level. 2

To illustrate how setting the α level may affect the conclusions of a study, let us examine a research study that compared the annual incomes of hospital based nurses and community based nurses. The mean annual income for hospital based nurses was reported to be $70 000 and for community based nurses to be $60 000. The p value of this study was 0.08. If the researchers set the α level at 0.05, they would conclude that there was no significant difference between the annual incomes of hospital and community-based nurses, since the p value of 0.08 exceeded the α level of 0.05. However, if the α level had been set at 0.10, the p value of 0.08 would be less than the α level and the researchers would conclude that there was a significant difference between the annual incomes of hospital and community based nurses. Two very different conclusions. 3

It is easy to read far too much into the word significant because the statistical use of the word has a meaning entirely distinct from its usual meaning. Just because a difference is statistically significant does not mean that it is important or interesting. In the example above, at the 0.10 α level, although the findings are statistically significant, conclusions based on chance results will occur 1 out of 10 times. Thus, the chance of a conclusion error is higher than when the α level is set at 0.05, where results due to chance occur 5 out of 100 times, or 1 in 20 times. In the end, the reader must decide if the researchers selected the appropriate α level and whether the conclusions are meaningful or not.

  • Graphpad. What is a p value? 2011. http://www.graphpad.com/articles/pvalue.htm (accessed 10 Dec 2011).
  • Munroe BH, Jacobsen BS.
  • El-Masri MM.
Competing interests None.



What is P-Value? – Understanding the meaning, math and methods

P-value intuition and the simplest explanation.


Introduction

In Data Science interviews, one of the frequently asked questions is “What is P-Value?”

Believe it or not, even experienced Data Scientists often fail to answer this question. This is partly because of the way statistics is taught and the definitions available in textbooks and online sources.

According to the American Statistical Association, “a p-value is the probability under a specified statistical model that a statistical summary of the data (e.g., the sample mean difference between two compared groups) would be equal to or more extreme than its observed value.”

That’s hard to grasp, yes?

Alright, let’s understand what the p-value really is, in small, meaningful pieces, so that ultimately it all makes sense.

When and how is p-value used?

To understand p-value, you need to understand some background and context behind it. So, let’s start with the basics.

p-values are often reported whenever you perform a statistical significance test (like a t-test, a chi-square test, etc.). These tests typically return a computed test statistic and the associated p-value. This reported value is used to establish the statistical significance of the relationships being tested.

So, whenever you see a p-value, there is an associated statistical test.

That means there is hypothesis testing being conducted, with a defined null hypothesis (H0) and a corresponding alternate hypothesis (HA).

The p-value reported is used to make a decision on whether the null hypothesis being tested can be rejected or not.

Let’s understand a little bit more about the null and alternate hypothesis.

Now, how to frame a Null hypothesis in general?

While the null hypothesis itself changes with every statistical test, there is a general principle to frame it:

The null hypothesis assumes there is ‘no effect’ or ‘relationship’ by default .

For example: if you are testing whether a drug treatment is effective or not, then the null hypothesis will assume there is no difference in outcome between the treated and untreated groups. Likewise, if you are testing whether one variable influences another (say, car weight influences the mileage), then the null hypothesis will postulate that there is no relationship between the two.

It simply implies the absence of an effect.

Examples of Statistical Tests reporting out p-value

Here are some examples of Null hypothesis (H0) for popular statistical tests:

Welch Two Sample t-Test: The true difference in means of two samples is equal to 0

Linear Regression: The beta coefficient(slope) of the X variable is zero

Chi Square test: There is no difference between expected frequencies and observed frequencies.

Get the feel?

But what would the alternate hypothesis look like?

The alternate hypothesis (HA) is always framed to negate the null hypothesis. The corresponding HA for above tests are as follows:

Welch Two Sample t-Test: The true difference in means of two samples is NOT equal to 0

Linear Regression: The beta coefficient(slope) of the X variable is NOT zero

Chi Square test: The difference between expected frequencies and observed frequencies is NOT zero.
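To make these pairings concrete, here is a minimal sketch running all three tests with SciPy; all data are randomly generated stand-ins, not from any real study.

```python
# Three common tests and the p-values they report.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
a = rng.normal(24, 5, 50)  # hypothetical group A measurements
b = rng.normal(26, 5, 50)  # hypothetical group B measurements

# Welch two-sample t-test: H0 says the true difference in means is 0.
t_stat, p_ttest = stats.ttest_ind(a, b, equal_var=False)

# Linear regression: H0 says the slope (beta coefficient) of X is 0.
slope, intercept, r, p_slope, se = stats.linregress(a, 0.5 * a + rng.normal(0, 2, 50))

# Chi-square test: H0 says observed frequencies match expected frequencies.
chi2, p_chi2 = stats.chisquare(f_obs=[18, 22, 10], f_exp=[16, 24, 10])

print(p_ttest, p_slope, p_chi2)
```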

What p-value really is

Now, back to the discussion on p-value.

Along with every statistical test, you will get a corresponding p-value in the results output.

What is this meant for?

It is used to determine if the data is statistically incompatible with the null hypothesis.

Not clear eh?

Let me put it in another way.

The P Value basically helps to answer the question: ‘Is the observed effect real, or could the data have arisen by chance?’

This leads us to a more mathematical definition of P-Value.

The P Value is the probability of seeing the effect (E) when the null hypothesis is true.

\( p = P(\text{effect at least as extreme as } E \mid H_0 \text{ is true}) \)

If you think about it, we want this probability to be very low.

Having said that, it is important to remember that the p-value refers not only to what we observed but also to observations more extreme than what was observed. That is why the formal definition of the p-value contains the statement ‘would be equal to or more extreme than its observed value.’

How is p-value used to establish statistical significance

Now you know that the p-value measures the probability of seeing the effect when the null hypothesis is true.

A sufficiently low value is required to reject the null hypothesis.

Notice how I have used the term ‘Reject the Null Hypothesis’ instead of stating the ‘Alternate Hypothesis is True’.

That’s because, we have tested the effect against the null hypothesis only.

So, when the p-value is low enough, we reject the null hypothesis and conclude the observed effect holds.

But how low is ‘low enough’ for rejecting the null hypothesis?

This level of ‘low enough’ cutoff is called the alpha level, and you need to decide it before conducting a statistical test.

But how low is ‘low enough’?

Practical Guidelines to set the cutoff of Statistical Significance (alpha level)

Let’s first understand what the alpha level is.

It is the cutoff probability for p-value to establish statistical significance for a given hypothesis test. For an observed effect to be considered as statistically significant, the p-value of the test should be lower than the pre-decided alpha value.

Typically for most statistical tests (but not always), alpha is set to 0.05.

In that case, the p-value has to be less than 0.05 for the result to be considered statistically significant.

What happens if it is say, 0.051?

It is still considered not significant. We do NOT call it weakly statistically significant. It is either black or white: there is no gray with respect to statistical significance.

Now, how to set the alpha level?

Well, the usual practice is to set it to 0.05.

But when the occurrence of the event is rare, you may want to set a very low alpha. The rarer it is, the lower the alpha.

For example, in CERN’s Hadron Collider experiment to detect Higgs boson particles (a very rare event), the alpha level was set as low as the 5-sigma level, which means a p-value of less than 3 × 10^-7 is required to reject the null hypothesis.

Whereas for a more likely event, it can go up to 0.1.

Secondly, the more samples (observations) you have, the lower the alpha level should be, because even a small effect can be made to produce a lower p-value just by increasing the number of observations. The opposite is also true: a large effect can be made to produce a high p-value by reducing the sample size.
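This sample-size effect is easy to demonstrate by simulation; the sketch below (with made-up data) holds a tiny true effect fixed and lets n grow.

```python
# The same tiny effect produces ever-smaller p-values as n grows.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
effect = 0.1  # a small, fixed true difference in means
for n in (50, 500, 5000, 50000):
    a = rng.normal(0, 1, n)
    b = rng.normal(effect, 1, n)
    print(n, stats.ttest_ind(a, b).pvalue)
# p-values tend toward 0 as n grows, even though the effect stays tiny
```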

In case you don’t know how likely the event is to occur, it is common practice to set alpha to 0.05. But, as a rule of thumb, never set the alpha greater than 0.1.

Having said that, alpha = 0.05 is mostly an arbitrary choice. Then why do most people still use p = 0.05? Because that is what is taught in college courses and what has traditionally been used by the scientific community and publishers.

What P Value is Not

Given the uncertainty around the meaning of p-value, it is very common to misinterpret and use it incorrectly.

Some of the common misconceptions are as follows:

P-Value is the probability of making a mistake. Wrong!

P-Value measures the importance of a variable. Wrong!

P-Value measures the strength of an effect. Wrong!

A smaller p-value does not signify that a variable is more important or that the effect is stronger, because, as mentioned earlier, any effect, no matter how small, can be made to produce a smaller p-value simply by increasing the number of observations (sample size).

Likewise, a larger p-value does not imply that a variable is unimportant.

For sound communication, it is necessary to report not just the p-value but also the sample size along with it. This is especially necessary if the experiments involve different sample sizes.

Secondly, making inferences and business decisions should not be based only on the p-value being lower than the alpha level.

Analysts should understand the business sense, understand the larger picture and bring out the reasoning before making an inference and not just rely on the p-value to make the inference for you.

Does this mean the p-value is not useful anymore?

Not really. It is a useful tool because it provides an objective standard for everyone to assess. It’s just that you need to use it the right way.

Example: How to find p-value for linear regression

Linear regression is a traditional statistical modeling algorithm that is used to predict a continuous variable (a.k.a dependent variable) using one or more explanatory variables.

Let’s see an example of extracting the p-value with linear regression using the mtcars dataset. In this dataset, the specifications of the vehicles and their mileage performance are recorded.

We want to use linear regression to test whether one of the specs, the ‘weight’ ( wt ) of the vehicle, has a significant linear relationship with the ‘mileage’ ( mpg ).

This can be conveniently done using python’s statsmodels library. But first, let’s load the data.

With statsmodels library
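A minimal sketch of these steps, reconstructed from the description below (assuming the standard mtcars dataset shipped with R, which get_rdataset fetches over the internet):

```python
# Load mtcars and regress mileage (mpg) on vehicle weight (wt).
import statsmodels.api as sm
import statsmodels.formula.api as smf

mtcars = sm.datasets.get_rdataset("mtcars").data  # R's built-in mtcars dataset

model = smf.ols("mpg ~ wt", data=mtcars).fit()  # pass the formula, then fit
print(model.summary())      # comprehensive view of the statistics
print(model.pvalues["wt"])  # p-value for the wt coefficient (~1.3e-10)
```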


The X( wt ) and Y ( mpg ) variables are ready.

Null Hypothesis (H0): The slope of the line of best fit (a.k.a. the beta coefficient) is zero.
Alternate Hypothesis (H1): The beta coefficient is not zero.

To implement the test, use the smf.ols() function available in the formula.api of statsmodels . You can pass in the formula itself as the first argument and call fit() to train the linear model.

Once model is trained, call model.summary() to get a comprehensive view of the statistics.

The p-value is located under the P>|t| column, in the wt row. If you want to extract that value into a variable, use model.pvalues .

Since the p-value is much lower than the significance level (0.01), we reject the null hypothesis that the slope is zero and conclude that the data really represent the effect.

Well, that was just one example of computing p-value.

The p-value can be associated with numerous statistical tests. If you are interested in finding out more about how it is used, see more examples of statistical tests with p-values.

In this post, we covered what exactly a p-value is and how (and how not) to use it. We also saw a Python example of computing the p-value associated with linear regression.

Now, with this understanding, let’s conclude with the difference between a statistical model and a machine learning model.

Well, while both statistical and machine learning models are used to make predictions, there can be many differences between the two. But most simply put, any predictive model that has p-values associated with it is considered a statistical model.

Happy learning!


  • Open access
  • Published: 24 June 2020

P-values – a chronic conundrum

  • Jian Gao   ORCID: orcid.org/0000-0001-8101-740X 1  

BMC Medical Research Methodology volume  20 , Article number:  167 ( 2020 ) Cite this article


In medical research and practice, the p -value is arguably the most often used statistic and yet it is widely misconstrued as the probability of the type I error, which comes with serious consequences. This misunderstanding can greatly affect the reproducibility in research, treatment selection in medical practice, and model specification in empirical analyses. By using plain language and concrete examples, this paper is intended to elucidate the p -value confusion from its root, to explicate the difference between significance and hypothesis testing, to illuminate the consequences of the confusion, and to present a viable alternative to the conventional p -value.

The confusion with p -values has plagued the research community and medical practitioners for decades. However, efforts to clarify it have been largely futile, in part, because intuitive yet mathematically rigorous educational materials are scarce. Additionally, the lack of a practical alternative to the p -value for guarding against randomness also plays a role. The p -value confusion is rooted in the misconception of significance and hypothesis testing. Most, including many statisticians, are unaware that p -values and significance testing formed by Fisher are incomparable to the hypothesis testing paradigm created by Neyman and Pearson. And most otherwise great statistics textbooks tend to cobble the two paradigms together and make no effort to elucidate the subtle but fundamental differences between them. The p -value is a practical tool gauging the “strength of evidence” against the null hypothesis. It informs investigators that a p -value of 0.001, for example, is stronger than 0.05. However, p -values produced in significance testing are not the probabilities of type I errors as commonly misconceived. For a p -value of 0.05, the chance a treatment does not work is not 5%; rather, it is at least 28.9%.

Conclusions

A long-overdue effort to understand p -values correctly is much needed. However, in medical research and practice, just banning significance testing and accepting uncertainty are not enough. Researchers, clinicians, and patients alike need to know the probability a treatment will or will not work. Thus, the calibrated p -values (the probability that a treatment does not work) should be reported in research papers.


Without any exaggeration, humankind’s wellbeing is profoundly affected by p -values: Health depends on prevention and intervention, ascertaining their efficacies relies on research, and research findings hinge on p -values. The p-value is a sine qua non for deciding if a research finding is real or by chance, a treatment is effective or even harmful, a paper will be accepted or rejected, a grant will be funded or declined, or if a drug will be approved or denied by U.S. Food & Drug Administration (FDA).

Yet, the misconception of p -values is pervasive and virtually universal. “The P value is probably the most ubiquitous and at the same time, misunderstood, misinterpreted, and occasionally miscalculated index in all of biomedical research [ 1 ].” Even “among statisticians there is a near ubiquitous misinterpretation of p values as frequentist error probabilities [ 2 ].”

The extent of the p -value confusion is well illustrated by a survey of medical residents published in the Journal of the American Medical Association ( JAMA) . In the article, 88% of the residents expressed fair to complete confidence in understanding p -values, but 100% of them had the p-value interpretation wrong [ 1 , 3 ]. Make no mistake, they are the future experts and leaders in clinical research that will affect public health policies, treatment options, and ultimately people’s health.

The survey published in JAMA used multiple-choice format with four potential answers for a correct interpretation of p  > 0.05 [ 3 ]:

a. The chances are greater than 1 in 20 that a difference would be found again if the study were repeated.

b. The probability is less than 1 in 20 that a difference this large could occur by chance alone.

c. The probability is greater than 1 in 20 that a difference this large could occur by chance alone.

d. The chance is 95% that the study is correct.

How could it be possible that 100% of the residents selected incorrect answers when one of the possible choices was supposed to be correct? As reported in the paper [ 3 ], 58.8% of the residents selected choice c which was designated by the authors as the correct answer. The irony is that choice c is not correct either. In fact, none of the four choices are correct. So, not only were the residents who picked choice c wrong but also the authors as well. Keep in mind, the paper was peer-reviewed and published by one of the most prestigious medical journals in the world.

This is no coincidence -- most otherwise great statistics textbooks make no effort or fail to clarify the massive confusion about p -values, and even provide outright wrong interpretations. The confusion is near-universal among medical researchers and clinicians [ 4 , 5 , 6 ].

Unfortunately, the misunderstanding of p -values is not inconsequential. For a p-value of 0.05, the chance a treatment doesn’t work is not 5%; rather, it is at least 28.9% [ 7 ].

After decades of misunderstanding and inaction, the pendulum of p -values finally started to swing in 2014 when the American Statistical Association (ASA) was taunted by two pairs of questions and answers on its discussion forum:

Q: Why do so many colleges and grad schools teach p  = 0.05?

A: Because that’s still what the scientific community and journal editors use.

Q: Why do so many people still use p  = 0.05?

A: Because that’s what they were taught in college or grad school.

The questions and answers, posted by George Cobb, Professor Emeritus of Mathematics & Statistics from Mount Holyoke College, spurred the ASA Board into action. In 2015, for the first time, the ASA board decided to take on the challenge of developing a policy statement on p -values, much like the American Heart Association (AHA) policy statement on dietary fat and heart disease. After months of preparation, in October 2015, a group of 20 experts gathered at the ASA Office in Alexandria, Virginia and laid out the roadmap during a two-day meeting. Over the next three months, multiple drafts of the p -value statement were produced. On January 29, 2016, the ASA Executive Committee approved the p -value statement with six principles listed on what p -values are or are not [ 8 ].

Although the statement hardly made any ripples in medical journals, it grabbed many statisticians’ attention and ignited a rebellion against p -values among some scientists. In March 2019, Nature published a comment with over 800 signatories calling for an end of significance testing with p  < 0.05 [ 9 ]. At the same time, the American Statistician that carried the ASA’s p -value statement published a special issue with 43 articles exploring ways to report results without statistical significance testing. Unfortunately, no consensus was reached for a better alternative in gauging the reliability of studies, and the authors even disagreed on whether the p -value should continue to be used or abandoned. The only agreement reached was the abolishment of significance testing as summarized in the special issue’s editorial: “statistically significant” – don’t say it and don’t use it [ 10 ].

So, for researchers, practitioners, and journals in the medical field, what will replace significance testing? And what is significance testing anyway? Is it different from hypothesis testing? Should p -values be banned too? If not, how should p-values be used and interpreted? In healthcare or medicine, we must accept uncertainty as the editorial of the special issue urged, but do we need to know how likely a given treatment will work or not?

To answer these questions, we must get to the bottom of the misconception and confusion, and we must identify a practical alternative(s). However, despite numerous publications on this topic, few studies aimed for these goals are understandable to non-statisticians and retain mathematical rigor at the same time. This paper is intended to fill this gap by using plain language and concrete examples to elucidate the p -value confusion from its root, to intuitively describe the true meaning of p -values, to illuminate the consequences of the confusion, and to present a viable alternative to the conventional p -value.

The root of confusion

The p-value confusion began 100 years ago when the father of modern statistics, Ronald Aylmer Fisher, formed the paradigm of significance testing. But it should be noted Fisher bears no blame for the misconception; it is the users who tend to muddle Fisher’s significance testing with hypothesis testing developed by Jerzy Neyman and Egon Pearson. To clarify the confusion, this section uses concrete examples and plain language to illustrate the essence of significance and hypothesis testing and to explicate the difference between the p -value and the type I error.

  • Significance testing

Suppose a painkiller has a proven track record of lasting for 24 h and now another drug manufacturer claims its new over-the-counter painkiller lasts longer. An investigator wants to test if the claim is true. Instead of collecting data from all the patients who took the new medication, which is often infeasible, the investigator decided to randomly survey 50 patients to gather data on how long (hours) the new painkiller lasts. Thus, the investigator now has a random variable \( \overline{X} \) , the average hours from a sample of 50 patients. This is a random variable because the 50 patients are randomly selected, and nobody knows what value this variable will take before conducting the survey and calculating the average. Nevertheless, each survey does produce a fixed number, \( \overline{x} \) , which itself is not a random variable, rather it is a realization or observation of the random variable \( \overline{X} \) (hereafter, let \( \overline{X} \) denote a random variable and \( \overline{x} \) denote a fixed value, an observation of \( \overline{X} \) ).

Intuitively, if the survey yielded a value (average hours the painkiller lasts) very close to 24, say, 23 or 25, the investigator would not believe the new painkiller is worse or better. If the survey came to an average of 32 h the investigator would believe it indeed lasts longer. However, it would be hard to form an opinion if the survey showed an average of 22 or 26 h. Does the new painkiller really last shorter, longer, or it is due to random chance (after all, only 50 patients were surveyed)?

This is where the significance test formulated by Fisher in the 1920s comes in. Note that although modern significance testing began with the Student’s t -test in 1908, it was Fisher who extended the test to the testing of two samples, regression coefficients, as well as analysis of variance, and created the paradigm of significance testing.

In Fisher’s significance testing, the Central Limit Theorem (CLT) plays a vital role. It states that given a population with a mean of μ and a variance of σ², regardless of the shape of its distribution, the sample mean statistic \( \overline{X} \) has a normal distribution with the same mean μ and variance σ²/n, or \( \frac{\overline{X}-\mu}{\sigma/\sqrt{n}} \) has a standard normal distribution with a mean of 0 and a variance of 1, as long as the sample size n is large enough. In practice, the distribution of the study population is often unknown, and n ≥ 30 is considered sufficient for the sample mean statistic to have an approximately normal distribution.

In conducting the significance test, a null hypothesis is first formed, i.e., there is no difference between the new and old painkillers, or the new painkiller also lasts for 24 h (the mean of \( \overline{X} \) is μ = 24). Under this assumption and based on the CLT, \( \overline{X} \) is normally distributed with a mean of 24 and a variance of σ²/50. Assume σ² = 200 (typically σ² is unknown but can be estimated); then \( \overline{X} \) has a normal distribution with mean 24 and variance 4 (standard deviation 2), or \( Z = (\overline{X}-24)/2 \) has a standard normal distribution with a mean of 0 and a standard deviation of 1 (Z is a standardized random variable). The next step is to calculate \( z = |\overline{x}-24|/2 \) based on the survey data and then find the p-value, or the probability of |Z| > z, from a normal distribution table (z is a fixed value, an observation of Z). Fisher suggested that if the p-value is smaller than 0.05, then the hypothesis is rejected. He argued that the farther the sample mean \( \overline{x} \) is from the population mean μ, the smaller the p-value, and the less likely the null hypothesis is true. Just as Fisher stated, “Either an exceptionally rare chance has occurred or the theory [H 0 ] is not true [ 11 ].”

Based on this paradigm, if the survey came back with an average of 26 h, i.e., \( \overline{x}=26, \) then z  = 1 and p  = 0.3173, as a result, the investigator accepts the null hypothesis (orthodoxically, fails to reject the null hypothesis), i.e., the new painkiller does not last longer and the difference between 24 and 26 h is due to chance or random factors. On the other hand, if the survey revealed an average of 28 h, i.e., \( \overline{x}=28, \) then z  = 2, and p  = 0.0455, thus the null hypothesis is rejected. In other words, the new painkiller is deemed lasting longer.
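These two p-values can be checked quickly in Python (a sketch using SciPy, which the paper itself does not use):

```python
# Verify the two-sided p-values quoted above: under H0, Z ~ N(0, 1).
from scipy.stats import norm

for xbar in (26, 28):
    z = abs(xbar - 24) / 2     # standard error of the mean is 2 hours
    p = 2 * (1 - norm.cdf(z))  # two-sided tail area P(|Z| > z)
    print(xbar, z, round(p, 4))
# prints: 26 1.0 0.3173  and  28 2.0 0.0455
```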

Now, can the p -value, 0.0455, be interpreted as the probability of the type I error, or only 4.55% chance the new painkiller does not last longer (no difference), or the probability that the difference between 24 and 28 h is due to chance, or the investigator could make a mistake by rejecting the null hypothesis but only wrong about 4.55% of the time? The answer is No.

So, what is a p -value? Precisely, a p-value tells us how often we would see a difference as extreme as or more extreme than what is observed if there really were no difference . Drawing a bell curve with the p -value on it will readily delineate this definition or concept.

In the example above, if the new painkiller also lasts for 24 h, the p-value of 0.0455 means there is a 4.55% chance that the investigator would observe \( \overline{x}\le 20 \) or \( \overline{x}\ge 28 \) ; it is not 4.55% chance the new painkiller also lasts for 24 h. It is categorically wrong to believe the p -value is the probability of the null hypothesis being true (there is no difference), or 1 – p is the probability of the null hypothesis being false (there is a difference) because the p -value is deduced based on the premise that the null hypothesis is true. The p-value, a conditional probability given H 0 is true, is totally invalidated if the null hypothesis is deemed not true.

In addition, p -values are data-dependent: each test (survey) produces a different p-value; for the same analysis, it is illogical to say the error rate or the type I error is 31.73% based on one sample (survey) and 4.55% based on another. There is no theoretical or empirical basis for such frequency interpretations. In fact, Fisher himself was fully aware that his p -value, a relative measure of evidence against the null hypothesis, does not bear any interpretation of the long-term error rate. When the p -value was misinterpreted, he protested the p-value was not the type I error rate, had no long-run frequentist characteristics, and should not be explained as a frequency of error if the test was repeated [ 12 ].

Interestingly, Fisher was an abstract thinker at the highest level, but often developed solutions and tests without solid theoretical foundation. He was an obstinate proponent of inductive inference, i.e., reasoning from specific to general, or from sample to population, which is reflected by his significance testing.

  • Hypothesis testing

On the contrary, mathematicians Jerzy Neyman and Egon Pearson dismissed the idea of inductive inference all together and insisted reasoning should be deductive, i.e., from general to specific. In 1928, they published the landmark paper on the theoretical foundation for a statistical inference method that they called “hypothesis test [ 12 ].” In the paper, they introduced the concepts of alternative hypothesis H 1, type I and type II errors, which were groundbreaking. The Neyman and Pearson’s hypothesis test is deductive in nature, i.e., reasoning from general to particular. The type I and type II errors, which must be set ahead, formulate a “rule of behavior” such that “in the long run of experience, we shall not be too often wrong,” as stated by Neyman and Pearson [ 13 ].

The hypothesis test can be illustrated by a four-step process with the painkiller example.

The first step is to lay out what the investigator seeks to test, i.e., to establish a null hypothesis and an alternative hypothesis: \( H_0: \mu = 24 \) (the new painkiller also lasts 24 h) versus \( H_1: \mu \neq 24 \) (it does not).

The second step is to set the criteria for the decision, or to specify an acceptable rate of mistake if the test is conducted many times. Specifically, that is to set the probability of the type I error, α, and the probability of the type II error, β.

A type I error refers to the mistake of rejecting the null hypothesis when it is true (claiming the treatment works or the new drug lasts longer but actually it does not). Conventionally and almost universally, the probability of the type I error or α is set to 0.05, which means 5% of the time one will be wrong if carrying out the test many times. A type II error is the failure to reject the null hypothesis that is not true; the probability of the type II error, β, is conventionally set as 0.2, which is equivalent to a power of 0.8, the probability of detecting the difference if it exists. Table  1 summarizes the type I and type II errors.
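Table 1 (reconstructed here in outline, following the standard layout of the two error types):

Decision            | H0 is true           | H0 is false
Reject H0           | Type I error (α)     | Correct decision (power, 1 − β)
Do not reject H0    | Correct decision     | Type II error (β)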

The third step is to select a statistic and the associated distribution for the test. For the painkiller example, the statistic is \( Z = (\overline{X}-24)/2 \), and the distribution is the standard normal. Because the type I error has been set to 0.05 and Z has a standard normal distribution under the null hypothesis, as shown in Fig. 1, 1.96 becomes the critical value, −1.96 ≤ z ≤ 1.96 becomes the acceptance region, and z < −1.96 or z > 1.96 becomes the rejection regions.

[Fig. 1: Standard normal distribution with critical value 1.96]

The final step is to calculate the z value and make a decision. Similar to significance testing, if the survey resulted in \( \overline{x}=26 \), then z = 1 < 1.96 and the investigator accepts the null hypothesis; if the survey revealed \( \overline{x}=28 \), then z = 2 > 1.96 and the investigator rejects the null hypothesis and accepts the alternative hypothesis. It is interesting to note that in significance testing “one can never accept the null hypothesis, only fail to reject it,” while that is not the case in hypothesis testing.

Unlike Fisher’s significance test, the hypothesis test possesses a nice frequency explanation: one can be wrong by rejecting the null hypothesis but cannot be wrong more than 5% of the time in the long run if the test is performed many times. Quite intuitively, every time the null hypothesis is rejected (when z < −1.96 or z > 1.96) there is a chance that the null hypothesis is true, and a mistake is made. When the null hypothesis is true, \( Z = (\overline{X}-24)/2 \) is a random variable with a standard normal distribution as shown in Fig. 1; thus, 5% of the time \( z = (\overline{x}-24)/2 \) would fall outside (−1.96, 1.96), and the decision will be wrong 5% of the time. Of course, when the null hypothesis is not true, rejecting it is not a mistake.

Noticeably, the p-value plays no role in hypothesis testing under the Neyman-Pearson paradigm [12, 14]. However, most people, including many statisticians, are unaware that the p-value and significance testing created by Fisher are incompatible with the hypothesis testing paradigm created by Neyman and Pearson [14, 15], and many statistics textbooks cobble the two together [2, 14]. The near-universal confusion is caused, at least in part, by the subtle similarities and differences between the two tests:

Both the significance and hypothesis tests use the same statistic and distribution, for example, \( Z = (\overline{X} - 24)/2 \) and N(0, 1).

The hypothesis test compares the observed z with the critical value 1.96, while the significance test compares the p-value (based on z) to 0.05; the two are linked by \( P(|Z| > 1.96) = 0.05 \).

The hypothesis test sets the type I error α at 0.05, while the significance test also uses 0.05 as the significance level.

One of the key differences is that, for the p-value to be meaningful in significance testing, the null hypothesis must be true, while this is not the case for the critical value in hypothesis testing. Although the critical value is derived from α under the null hypothesis, rejecting the null hypothesis is not a mistake when the null is false; when it is true, there is a 5% chance that \( z = (\overline{x} - 24)/2 \) will fall outside (−1.96, 1.96), and the investigator will be wrong 5% of the time (bear in mind, the null hypothesis is either true or false at the moment a decision is made). In addition, the type I error rate and the resulting critical value are set in advance and fixed, while the p-value is a moving "target," varying from sample to sample even for the same test.

As if this were not confusing enough, the understanding and interpretation of p-values are further complicated in non-experimental studies, where model misspecification and even p-hacking are common; small p-values often mislead the audience into believing that the model and the findings are valid [16]. In fact, p-values have little value in assessing whether the relationship between an outcome and exposure(s) is causal or merely an artifact of confounding: one cannot claim that smartphone use causes gun violence even if the p-value for their correlation is close to zero. To see the p-value problem at its core and to clear up the confusion, the discussion of p-values should be set in the context of experimental designs, such as randomized controlled trials, where the model relating the outcome and exposure(s) is correctly specified.

The Link between P-values and Type I Errors

The p-value fallacy can be readily quantified under a Bayesian framework [7, 17, 18]. However, "those ominous words [Bayes theorem], with their associations of hazy prior probabilities and abstruse mathematical formulas, strike fear into the hearts of most of us, clinician, researcher, and editor alike" [19], as Frank Davidoff, former Editor of the Annals of Internal Medicine, wrote. It is understandable but still unfortunate that Bayesian methods such as Bayes factors, despite their merit, are still considered exotic by the medical research community.

Thanks to Sellke, Bayarri, and Berger, the difference between the p-value and the type I error has been quantified [7]. Based on the conditional frequentist approach, which was formalized by Kiefer and further developed by others [20, 21, 22, 23], Berger and colleagues established a lower bound on the error rate \( P(H_0 \mid |Z| > z_0) \), i.e., the type I error given the p-value [7]:

\( P(H_0 \mid |Z| > z_0) \ge \left(1 + \left[-e\, p \ln p\right]^{-1}\right)^{-1}, \quad \text{for } p < 1/e. \)

As shown, the lower-bound equation is mathematically straightforward. Noteworthy, the derivation of the lower bound is also ingeniously elegant (a simplified proof is provided in the Supplementary File for interested readers). The relationship between p-values and type I errors (lower bound) can be readily seen from Table 2, which shows some commonly reported results [7].

As seen in Table 2, the difference between p-values and the error probabilities (lower bound) is quite striking. A p-value of 0.05, commonly misinterpreted as meaning there is only a 5% chance that the treatment does not work, seems to offer strong evidence against the null hypothesis; in fact, the true probability that the treatment does not work is at least 0.289. Keep in mind that this relationship between the p-value and the type I error is a lower bound; indeed, many prefer to report the upper bound [6, 7].
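The lower bound is easy to compute. As a minimal R sketch (the function name calibrate_p is ours; the formula is the lower bound displayed above, valid for p < 1/e):

calibrate_p <- function(p) 1 / (1 + 1 / (-exp(1) * p * log(p)))
calibrate_p(c(0.05, 0.01, 0.001))
# 0.2893 0.1113 0.0184

A reported p-value of 0.05 thus corresponds to a type I error of at least about 29%, as Table 2 indicates.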

The discrepancy between the p-value and the lower-bound error rate explains the big puzzle of why so many wonder drugs and treatments worldwide lose their amazing power outside clinical trials [24, 25, 26]. This discrepancy likely also contributes to the frequently reported contradictory findings on risk factors and health outcomes in observational studies. For example, an early study published in the New England Journal of Medicine found that drinking coffee was associated with an elevated risk of pancreatic cancer [27]. The finding became a big headline in The New York Times [28], and the lead author, along with probably many frightened readers, stopped drinking coffee. Later studies, however, concluded the finding was a fluke [29, 30]. Likewise, the p-value fallacy may also contribute to the ongoing confusion over dietary fat intake and heart disease. On the one hand, a meta-analysis published in Annals of Internal Medicine in 2014 concluded that "Current evidence does not clearly support cardiovascular guidelines that encourage high consumption of polyunsaturated fatty acids and low consumption of total saturated fats" [31]. On the other hand, in its 2017 recommendation, the American Heart Association (AHA) stated, "Taking into consideration the totality of the scientific evidence, satisfying rigorous criteria for causality, we conclude strongly that lowering intake of saturated fat and replacing it with unsaturated fats, especially polyunsaturated fats, will lower the incidence of CVD" [32].

In short, the misunderstanding and misinterpretation of the relationship between the p-value and the type I error all too often exaggerate the true effects of treatments and risk factors, which in turn leads to conflicting findings with real public health consequences.

The Future of P-values

It is readily apparent that the p-value conundrum poses a serious challenge, with real-life consequences, to researchers and practitioners in healthcare. To address the p-value complications, some believe the use of p-values should be banned or discouraged [33, 34]. In fact, since 2015 Basic and Applied Social Psychology has officially banned significance tests and p-values [35], and Epidemiology has a longstanding policy discouraging the use of significance testing and p-values [36, 37]. On the other hand, many are against a total ban [38, 39]. P-values do possess practical utility: they offer insight into what is observed and are the first line of defense against being fooled by randomness. You would be more suspicious of a coin being fair if nine heads turned up after ten flips than if, say, seven did. Similarly, you would like to see how strong the evidence is against the null hypothesis: say, a p-value of 0.0499 versus 0.0001.

“It is hopeless to expect users to change their reliance on p-values unless they are offered an alternative way of judging the reliability of their conclusions” [40]. Rather than banning p-values, many believe the conventional significance level of 0.05 should be lowered for better research reproducibility [41]. In 2018, 72 statisticians and scientists made the case for changing p < 0.05 to p < 0.005 [42]. Inevitably, like most medical treatments, the proposed change comes with side effects: for instance, to achieve the same power of 80%, α = 0.005 requires a roughly 70% larger sample size than α = 0.05, which could lead to fewer studies being conducted with limited resources [43].
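The sample-size side effect is easy to verify. For a two-sided z test, the required sample size is proportional to \( (z_{\alpha/2} + z_{\beta})^2 \); a quick R check (an illustrative sketch, not the cited authors' calculation):

n_factor <- function(alpha, power = 0.8) (qnorm(1 - alpha / 2) + qnorm(power))^2
n_factor(0.005) / n_factor(0.05)
# ~1.70, i.e., roughly a 70% larger sample at alpha = 0.005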

Other alternatives (e.g., second-generation p-values [44] and the analysis of credibility [45]) have been proposed in the special issue of The American Statistician; however, no consensus was reached. As a result, instead of recommending a ban on p-values, the accompanying editorial of the special issue called for an end to statistical significance testing [46]: "'statistically significant' – don't say it and don't use it" [10].

Will researchers and medical journals heed the "mandate" banning significance testing? It does not seem likely, at least thus far. Even if they do, it would be no more than a quibble: a significance test is effectively performed whenever a p-value is produced or reported, since anyone seeing the result would know whether the p-value is greater or less than 0.05; the only difference is "Don't ask, don't tell."

In any event, it is the right call to end dichotomizing the p-value and using it as the sole criterion to judge results [47]. There is no practical difference between p = 0.049 and p = 0.051, and "God loves the .06 nearly as much as the .05" [48]. Furthermore, not all results with a p-value close to 0.05 are valueless. Doctors and patients need to put p-values into context when making treatment choices, which can be illustrated by a hypothetical but not unrealistic example. Suppose a study finds a new procedure (a kind of spine surgery) effective in relieving debilitating neck and back pain with a p-value of 0.05, but when the procedure fails, it cripples the patient. If the patient believes there is only a 5% chance the procedure will not work, he or she would probably take the chance. However, after learning that the actual chance of failure is nearly 30% or higher based on the calibrated p-value, he or she would probably think twice. On the other hand, even if the p-value is 0.1 and the real chance of failure is nearly 40% or higher, a patient might still give the procedure a try if failure carries no serious side effects.

Taken together, in medicine and healthcare, the use of p-values needs more context (the balance of harms and benefits) than thresholds. However, banning significance testing and accepting uncertainty, as called for by the editorial of the special issue, are not enough [10]. When making treatment decisions, what researchers, practitioners, and patients alike need to know is the probability that a treatment does or does not work (the type I error). In this regard, the calibrated p-value, compared with other proposals [44, 45], offers several advantages: (1) it provides a lower bound, (2) it is fully frequentist although it also has a Bayesian interpretation, (3) it is easy to understand, and (4) it is easy to implement. Of course, other recommendations for improving the use of p-values may work well in other circumstances, such as improving research reproducibility [49, 50].

In medical research and practice, the p-value produced by significance testing has been widely misconstrued as the probability of a type I error, i.e., the probability that a treatment does not work. This misunderstanding comes with serious consequences: poor research reproducibility and inflated medical treatment effects. For a p-value of 0.05, the type I error, or the chance that a treatment does not work, is not 5%; rather, it is at least 28.9%. Nevertheless, banning significance testing and accepting uncertainty, albeit well justified in many circumstances, offer little to apprise clinicians and patients of the probability that a treatment will or will not work. In this respect, the calibrated p-value, a link between the p-value and the type I error, is practical and instructive.

In short, a long-overdue effort to understand p-values correctly is urgently needed, and better education on statistical reasoning, including Bayesian methods, is desired [15]. Meanwhile, a rational action that medical journals can take is to require authors to report both conventional p-values and calibrated ones in research papers.

Availability of data and materials

Not applicable.

Abbreviations

FDA: U.S. Food & Drug Administration

JAMA: Journal of the American Medical Association

ASA: American Statistical Association

CLT: Central Limit Theorem

AHA: American Heart Association

References

1. Goodman S. A dirty dozen: twelve p-value misconceptions. Semin Hematol. 2008;45(3):135–40.

2. Hubbard R, Bayarri MJ. Confusion over measures of evidence (p's) versus errors (α's) in classical statistical testing. Am Stat. 2003;57(3):171–82.

3. Windish DM, Huot SJ, Green ML. Medicine residents' understanding of the biostatistics and results in the medical literature. JAMA. 2007;298:1010–22.

4. Berger JO, Sellke T. Testing a point null hypothesis: the irreconcilability of p-values and evidence (with discussion). J Am Stat Assoc. 1987;82(397):112–39.

5. Schervish MJ. P values: what they are and what they are not. Am Stat. 1996;50(3):203–6.

6. Berger JO. Could Fisher, Jeffreys and Neyman have agreed on testing? Stat Sci. 2003;18:1–32.

7. Sellke T, Bayarri MJ, Berger JO. Calibration of p values for testing precise null hypotheses. Am Stat. 2001;55(1):62–71.

8. Wasserstein RL, Lazar NA. The ASA's statement on p-values: context, process, and purpose. Am Stat. 2016;70(2):129–33.

9. Amrhein V, Greenland S, McShane B. Scientists rise up against statistical significance. Nature. 2019;567(7748):305–7.

10. Wasserstein RL, Schirm AL, Lazar NA. Moving to a world beyond "p < 0.05". Am Stat. 2019;73(S1):1–19.

11. Fisher RA. Statistical methods and scientific inference. 2nd ed. Edinburgh: Oliver and Boyd; 1959.

12. Neyman J, Pearson E. On the use and interpretation of certain test criteria for purposes of statistical inference. Biometrika. 1928;20:175–240.

13. Lehmann EL. Neyman's statistical philosophy. Probab Math Stat. 1995;15:29–36.

14. Lehmann EL. The Fisher, Neyman-Pearson theories of testing hypotheses: one theory or two? J Am Stat Assoc. 1993;88(424):1242–9.

15. Gigerenzer G. Mindless statistics. J Socio-Econ. 2004;33:587–606.

16. Greenland S, Senn SJ, Rothman KJ, et al. Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. Eur J Epidemiol. 2016;31(4):337–50.

17. Goodman SN. Toward evidence-based medical statistics. 1: the P value fallacy. Ann Intern Med. 1999;130(12):995–1004.

18. Goodman SN. Toward evidence-based medical statistics. 2: the Bayes factor. Ann Intern Med. 1999;130(12):1005–13.

19. Davidoff F. Standing statistics right side up. Ann Intern Med. 1999;130(12):1019–21.

20. Kiefer J. Admissibility of conditional confidence procedures. Ann Math Stat. 1976;4:836–65.

21. Kiefer J. Conditional confidence statements and confidence estimators (with discussion). J Am Stat Assoc. 1977;72:789–827.

22. Berger JO, Brown LD, Wolpert RL. A unified conditional frequentist and Bayesian test for fixed and sequential simple hypothesis testing. Ann Stat. 1994;22:1787–807.

23. Berger JO, Boukai B, Wang Y. Unified frequentist and Bayesian testing of a precise hypothesis (with discussion). Stat Sci. 1997;12:133–60.

24. Matthews R. The great health hoax. The Sunday Telegraph; 1998. Archived: http://junksciencearchive.com/sep98/matthews.html. Accessed 6 June 2020.

25. Ioannidis JP. Contradicted and initially stronger effects in highly cited clinical research. JAMA. 2005;294(2):218–28.

26. Ioannidis JP. Why most discovered true associations are inflated. Epidemiology. 2008;19(5):640–8.

27. MacMahon B, Yen S, Trichopoulos D, Warren K, Nardi G. Coffee and cancer of the pancreas. N Engl J Med. 1981;304(11):630–3.

28. Study links coffee use to pancreas cancer. The New York Times. 12 March 1981. http://www.nytimes.com/1981/03/12/us/study-links-coffee-use-to-pancreas-cancer.html. Accessed 2 May 2020.

29. Hsieh CC, MacMahon B, Yen S, Trichopoulos D, Warren K, Nardi G. Coffee and pancreatic cancer (chapter 2). N Engl J Med. 1986;315(9):587–9.

30. Turati F, Galeone C, Edefonti V. A meta-analysis of coffee consumption and pancreatic cancer. Ann Oncol. 2012;23(2):311–8.

31. Chowdhury R, Warnakula S, Kunutsor S, et al. Association of dietary, circulating, and supplement fatty acids with coronary risk: a systematic review and meta-analysis. Ann Intern Med. 2014;160:398–406.

32. Sacks FM, Lichtenstein AH, Wu JHY, et al. Dietary fats and cardiovascular disease: a presidential advisory from the American Heart Association. Circulation. 2017;136(3):e1–e23.

33. Goodman SN. Why is getting rid of p-values so hard? Musings on science and statistics. Am Stat. 2019;73(S1):26–30.

34. Tong C. Statistical inference enables bad science; statistical thinking enables good science. Am Stat. 2019;73(S1):246–61.

35. Trafimow D, Marks M. Editorial. Basic Appl Soc Psychol. 2015;37:1–2.

36. Lang JM, Rothman KJ, Cann CI. That confounded P-value. Epidemiology. 1998;9(1):7–8.

37. Epidemiology: information for authors. http://journals.lww.com/epidem/Pages/informationforauthors.aspx. Accessed 2 May 2020.

38. Krueger JI, Heck PR. Putting the p-value in its place. Am Stat. 2019;73(S1):122–8.

39. Greenland S. Valid p-values behave exactly as they should: some misleading criticisms of p-values and their resolution with s-values. Am Stat. 2019;73(S1):106–14.

40. Colquhoun D. The false positive risk: a proposal concerning what to do about p-values. Am Stat. 2019;73(S1):192–201.

41. Johnson VE. Revised standards for statistical evidence. Proc Natl Acad Sci U S A. 2013;110(48):19313–7.

42. Benjamin DJ, Berger JO, Johannesson M, et al. Redefine statistical significance. Nat Hum Behav. 2018;2:6–10.

43. Lakens D, Adolfi FG, Albers CJ, et al. Justify your alpha. Nat Hum Behav. 2018;2:168–71.

44. Blume JD, Greevy RA, Welty VF, et al. An introduction to second-generation p-values. Am Stat. 2019;73(S1):157–67.

45. Matthews RAJ. Moving towards the post p < 0.05 era via the analysis of credibility. Am Stat. 2019;73(S1):202–12.

46. McShane BB, Gal D, Gelman A, et al. Abandon statistical significance. Am Stat. 2019;73(S1):235–45.

47. Liao JM, Stack CB, Goodman S. Annals understanding clinical research: interpreting results with large p values. Ann Intern Med. 2018;169(7):485–6.

48. Rosnow RL, Rosenthal R. Statistical procedures and the justification of knowledge in psychological science. Am Psychol. 1989;44:1276–84.

49. Benjamin DJ, Berger JO. Three recommendations for improving the use of p-values. Am Stat. 2019;73(S1):186–91.

50. Held L, Ott M. On p-values and Bayes factors. Annu Rev Stat Appl. 2018;5:393–419.


Acknowledgements

This material is based upon work supported (or supported in part) by the Department of Veterans Affairs, Veterans Health Administration, Office of Research and Development. The author is indebted to Mr. Fred Malphurs, a retired senior healthcare executive, a visionary leader, who devoted his entire 38-year career to Veterans healthcare, for his unwavering support of research to improve healthcare efficiency and effectiveness. The author is also grateful to the Reviewers and Editorial Board members for their insightful and constructive comments and advice. The author would also like to thank Andrew Toporek and an anonymous reviewer for their helpful suggestions and assistance.

Author information

Authors and Affiliations

Department of Veterans Affairs, Office of Productivity, Efficiency and Staffing (OPES, RAPID), Albany, USA


Contributions

JG conceived/designed the study and wrote the manuscript. The author read and approved the final manuscript.

Author’s information

Director of Analytical Methodologies, Office of Productivity, Efficiency and Staffing, RAPID, U.S. Department of Veterans Affairs.

Corresponding author

Correspondence to Jian Gao .

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Gao, J. P-values – a chronic conundrum. BMC Med Res Methodol 20 , 167 (2020). https://doi.org/10.1186/s12874-020-01051-6


Received : 21 February 2020

Accepted : 12 June 2020

Published : 24 June 2020



Keywords

  • Type I error
  • Research reproducibility
  • Calibrated p-values



Frequently asked questions

What is a p-value?

A p -value , or probability value, is a number describing how likely it is that your data would have occurred under the null hypothesis of your statistical test .

Frequently asked questions: Statistics

As the degrees of freedom increase, Student’s t distribution becomes less leptokurtic , meaning that the probability of extreme values decreases. The distribution becomes more and more similar to a standard normal distribution .

The three categories of kurtosis are:

  • Mesokurtosis : An excess kurtosis of 0. Normal distributions are mesokurtic.
  • Platykurtosis : A negative excess kurtosis. Platykurtic distributions are thin-tailed, meaning that they have few outliers .
  • Leptokurtosis : A positive excess kurtosis. Leptokurtic distributions are fat-tailed, meaning that they have many outliers.

Probability distributions belong to two broad categories: discrete probability distributions and continuous probability distributions . Within each category, there are many types of probability distributions.

Probability is the relative frequency over an infinite number of trials.

For example, the probability of a coin landing on heads is .5, meaning that if you flip the coin an infinite number of times, it will land on heads half the time.

Since doing something an infinite number of times is impossible, relative frequency is often used as an estimate of probability. If you flip a coin 1000 times and get 507 heads, the relative frequency, .507, is a good estimate of the probability.

Categorical variables can be described by a frequency distribution. Quantitative variables can also be described by a frequency distribution, but first they need to be grouped into interval classes .

A histogram is an effective way to tell if a frequency distribution appears to have a normal distribution .

Plot a histogram and look at the shape of the bars. If the bars roughly follow a symmetrical bell or hill shape, like the example below, then the distribution is approximately normally distributed.

Figure: a histogram whose bars form a symmetrical bell shape, indicating an approximately normal frequency distribution
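To try this yourself, a minimal R sketch with simulated (made-up) data:

x <- rnorm(500, mean = 50, sd = 10)   # simulated, approximately normal data
hist(x)                                # the bars should form a rough bell shape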

You can use the CHISQ.INV.RT() function to find a chi-square critical value in Excel.

For example, to calculate the chi-square critical value for a test with df = 22 and α = .05, click any blank cell and type:

=CHISQ.INV.RT(0.05,22)

You can use the qchisq() function to find a chi-square critical value in R.

For example, to calculate the chi-square critical value for a test with df = 22 and α = .05:

qchisq(p = .05, df = 22, lower.tail = FALSE)

You can use the chisq.test() function to perform a chi-square test of independence in R. Give the contingency table as a matrix for the “x” argument. For example:

m = matrix(data = c(89, 84, 86, 9, 8, 24), nrow = 3, ncol = 2)

chisq.test(x = m)

You can use the CHISQ.TEST() function to perform a chi-square test of independence in Excel. It takes two arguments, CHISQ.TEST(observed_range, expected_range), and returns the p value.

Chi-square goodness of fit tests are often used in genetics. One common application is to check if two genes are linked (i.e., if the assortment is independent). When genes are linked, the allele inherited for one gene affects the allele inherited for another gene.

Suppose that you want to know if the genes for pea texture (R = round, r = wrinkled) and color (Y = yellow, y = green) are linked. You perform a dihybrid cross between two heterozygous ( RY / ry ) pea plants. The hypotheses you’re testing with your experiment are:

  • Null hypothesis (H0): the offspring phenotypes follow the expected 9:3:3:1 ratio. This would suggest that the genes are unlinked.
  • Alternative hypothesis (Ha): the offspring phenotypes deviate from the 9:3:3:1 ratio. This would suggest that the genes are linked.

You observe 100 peas:

  • 78 round and yellow peas
  • 6 round and green peas
  • 4 wrinkled and yellow peas
  • 12 wrinkled and green peas

Step 1: Calculate the expected frequencies

To calculate the expected values, you can make a Punnett square. If the two genes are unlinked, the probability of each genotypic combination is equal.

       RY     Ry     rY     ry
RY   RRYY  RRYy  RrYY  RrYy
Ry   RRYy  RRyy  RrYy  Rryy
rY   RrYY  RrYy  rrYY  rrYy
ry   RrYy  Rryy  rrYy  rryy

The expected phenotypic ratios are therefore 9 round and yellow: 3 round and green: 3 wrinkled and yellow: 1 wrinkled and green.

From this, you can calculate the expected phenotypic frequencies for 100 peas:

Phenotype            Observed   Expected
Round and yellow        78      100 × (9/16) = 56.25
Round and green          6      100 × (3/16) = 18.75
Wrinkled and yellow      4      100 × (3/16) = 18.75
Wrinkled and green      12      100 × (1/16) = 6.25

Step 2: Calculate chi-square

Phenotype             O      E      O − E    (O − E)²   (O − E)²/E
Round and yellow      78    56.25   21.75    473.06      8.41
Round and green        6    18.75  −12.75    162.56      8.67
Wrinkled and yellow    4    18.75  −14.75    217.56     11.60
Wrinkled and green    12     6.25    5.75     33.06      5.29

Χ² = 8.41 + 8.67 + 11.60 + 5.29 = 33.97

Step 3: Find the critical chi-square value

Since there are four groups (round and yellow, round and green, wrinkled and yellow, wrinkled and green), there are three degrees of freedom .

For a test of significance at α = .05 and df = 3, the Χ 2 critical value is 7.82.

Step 4: Compare the chi-square value to the critical value

Χ² = 33.97

Critical value = 7.82

The Χ² value is greater than the critical value.

Step 5: Decide whether to reject the null hypothesis

The Χ² value is greater than the critical value, so we reject the null hypothesis that the population of offspring has an equal probability of inheriting all possible genotypic combinations. There is a significant difference between the observed and expected phenotypic frequencies (p < .05).

The data supports the alternative hypothesis that the offspring do not have an equal probability of inheriting all possible genotypic combinations, which suggests that the genes are linked.

You can use the chisq.test() function to perform a chi-square goodness of fit test in R. Give the observed values in the “x” argument, give the expected values in the “p” argument, and set “rescale.p” to true. For example:

chisq.test(x = c(22,30,23), p = c(25,25,25), rescale.p = TRUE)
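Applied to the pea example above (the observed counts and the 9:3:3:1 expected proportions), the same function reproduces the worked calculation:

chisq.test(x = c(78, 6, 4, 12), p = c(9, 3, 3, 1) / 16)
# X-squared = 33.97, df = 3, p-value well below 0.001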

You can use the CHISQ.TEST() function to perform a chi-square goodness of fit test in Excel. It takes two arguments, CHISQ.TEST(observed_range, expected_range), and returns the p value .

Both correlations and chi-square tests can test for relationships between two variables. However, a correlation is used when you have two quantitative variables and a chi-square test of independence is used when you have two categorical variables.

Both chi-square tests and t tests can test for differences between two groups. However, a t test is used when you have a dependent quantitative variable and an independent categorical variable (with two groups). A chi-square test of independence is used when you have two categorical variables.

The two main chi-square tests are the chi-square goodness of fit test and the chi-square test of independence .

A chi-square distribution is a continuous probability distribution . The shape of a chi-square distribution depends on its degrees of freedom , k . The mean of a chi-square distribution is equal to its degrees of freedom ( k ) and the variance is 2 k . The range is 0 to ∞.

As the degrees of freedom ( k ) increases, the chi-square distribution goes from a downward curve to a hump shape. As the degrees of freedom increases further, the hump goes from being strongly right-skewed to being approximately normal.

To find the quartiles of a probability distribution, you can use the distribution’s quantile function.

You can use the quantile() function to find quartiles in R. If your data is called “data”, then “quantile(data, prob=c(.25,.5,.75), type=1)” will return the three quartiles.

You can use the QUARTILE() function to find quartiles in Excel. If your data is in column A, then click any blank cell and type “=QUARTILE(A:A,1)” for the first quartile, “=QUARTILE(A:A,2)” for the second quartile, and “=QUARTILE(A:A,3)” for the third quartile.

You can use the PEARSON() function to calculate the Pearson correlation coefficient in Excel. If your variables are in columns A and B, then click any blank cell and type “=PEARSON(A:A,B:B)”.

There is no function to directly test the significance of the correlation.

You can use the cor() function to calculate the Pearson correlation coefficient in R. To test the significance of the correlation, you can use the cor.test() function.
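For example, with two made-up vectors:

x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)
cor(x, y)        # Pearson's r (close to 1 here)
cor.test(x, y)   # r plus a significance test and confidence interval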

You should use the Pearson correlation coefficient when (1) the relationship is linear, (2) both variables are quantitative, (3) both variables are normally distributed, and (4) the data have no outliers.

The Pearson correlation coefficient ( r ) is the most common way of measuring a linear correlation. It is a number between –1 and 1 that measures the strength and direction of the relationship between two variables.

This table summarizes the most important differences between normal distributions and Poisson distributions :

Characteristic     Normal                                  Poisson
Type of variable   Continuous                              Discrete
Parameters         Mean (µ) and standard deviation (σ)     Lambda (λ)
Shape              Bell-shaped                             Depends on λ
Symmetry           Symmetrical                             Asymmetrical (right-skewed); as λ increases, the asymmetry decreases
Range              −∞ to ∞                                 0 to ∞

When the mean of a Poisson distribution is large (>10), it can be approximated by a normal distribution.

In the Poisson distribution formula, lambda (λ) is the mean number of events within a given interval of time or space. For example, λ = 0.748 floods per year.

The e in the Poisson distribution formula stands for the number 2.718, known as Euler’s number. You can simply substitute 2.718 for e when you’re calculating a Poisson probability. Euler’s number is a very useful constant and is especially important in calculus.
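For instance, the probability of observing exactly two events when λ = 0.748 can be computed from the formula directly or with R’s built-in function:

lambda <- 0.748; k <- 2
lambda^k * exp(-lambda) / factorial(k)   # the Poisson formula written out
dpois(k, lambda)                          # built-in equivalent, ~0.132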

The three types of skewness are:

  • Right skew (also called positive skew ) . A right-skewed distribution is longer on the right side of its peak than on its left.
  • Left skew (also called negative skew). A left-skewed distribution is longer on the left side of its peak than on its right.
  • Zero skew. It is symmetrical and its left and right sides are mirror images.


Skewness and kurtosis are both important measures of a distribution’s shape.

  • Skewness measures the asymmetry of a distribution.
  • Kurtosis measures the heaviness of a distribution’s tails relative to a normal distribution .


A research hypothesis is your proposed answer to your research question. The research hypothesis usually includes an explanation (“ x affects y because …”).

A statistical hypothesis, on the other hand, is a mathematical statement about a population parameter. Statistical hypotheses always come in pairs: the null and alternative hypotheses . In a well-designed study , the statistical hypotheses correspond logically to the research hypothesis.

The alternative hypothesis is often abbreviated as H a or H 1 . When the alternative hypothesis is written using mathematical symbols, it always includes an inequality symbol (usually ≠, but sometimes < or >).

The null hypothesis is often abbreviated as H 0 . When the null hypothesis is written using mathematical symbols, it always includes an equality symbol (usually =, but sometimes ≥ or ≤).

The t distribution was first described by statistician William Sealy Gosset under the pseudonym “Student.”

To calculate a confidence interval of a mean using the critical value of t , follow these four steps:

  • Choose the significance level based on your desired confidence level. The most common confidence level is 95%, which corresponds to α = .05 in the two-tailed t table .
  • Find the critical value of t in the two-tailed t table.
  • Multiply the critical value of t by s / √ n .
  • Add this value to the mean to calculate the upper limit of the confidence interval, and subtract this value from the mean to calculate the lower limit.
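Putting the four steps together in R, with a small made-up sample:

x <- c(5.1, 4.8, 5.5, 5.0, 4.9, 5.3)       # hypothetical sample
n <- length(x); m <- mean(x); s <- sd(x)
t_crit <- qt(0.975, df = n - 1)             # two-tailed critical value for 95% confidence
margin <- t_crit * s / sqrt(n)              # critical value × s / sqrt(n)
c(lower = m - margin, upper = m + margin)   # the confidence interval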

To test a hypothesis using the critical value of t , follow these four steps:

  • Calculate the t value for your sample.
  • Find the critical value of t in the t table .
  • Determine if the (absolute) t value is greater than the critical value of t .
  • Reject the null hypothesis if the sample’s t value is greater than the critical value of t . Otherwise, don’t reject the null hypothesis .

You can use the T.INV() function to find the critical value of t for one-tailed tests in Excel, and you can use the T.INV.2T() function for two-tailed tests.

You can use the qt() function to find the critical value of t in R. The function gives the critical value of t for the one-tailed test. If you want the critical value of t for a two-tailed test, divide the significance level by two.
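For example, for a hypothetical test with df = 19 and α = .05:

qt(0.95, df = 19)    # one-tailed critical value
qt(0.975, df = 19)   # two-tailed: the significance level is divided by two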

You can use the RSQ() function to calculate R² in Excel. If your dependent variable is in column A and your independent variable is in column B, then click any blank cell and type “=RSQ(A:A,B:B)”.

You can use the summary() function to view the R²  of a linear model in R. You will see the “R-squared” near the bottom of the output.

There are two formulas you can use to calculate the coefficient of determination (R²) of a simple linear regression. One is the square of the Pearson correlation coefficient:

R^2 = (r)^2

The other is one minus the ratio of the residual sum of squares to the total sum of squares:

R^2 = 1 − (SS_res / SS_tot)

The coefficient of determination (R²) is a number between 0 and 1 that measures how well a statistical model predicts an outcome. You can interpret the R² as the proportion of variation in the dependent variable that is predicted by the statistical model.

There are three main types of missing data .

Missing completely at random (MCAR) data are randomly distributed across the variable and unrelated to other variables .

Missing at random (MAR) data are not randomly distributed but they are accounted for by other observed variables.

Missing not at random (MNAR) data systematically differ from the observed values.

To tidy up your missing data , your options usually include accepting, removing, or recreating the missing data.

  • Acceptance: You leave your data as is
  • Listwise or pairwise deletion: You delete all cases (participants) with missing data from analyses
  • Imputation: You use other data to fill in the missing data

Missing data are important because, depending on the type, they can sometimes bias your results. This means your results may not be generalizable outside of your study because your data come from an unrepresentative sample .

Missing data , or missing values, occur when you don’t have data stored for certain variables or participants.

In any dataset, there’s usually some missing data. In quantitative research , missing values appear as blank cells in your spreadsheet.

There are two steps to calculating the geometric mean :

  • Multiply all values together to get their product.
  • Find the n th root of the product ( n is the number of values).

Before calculating the geometric mean, note that:

  • The geometric mean can only be found for positive values.
  • If any value in the data set is zero, the geometric mean is zero.
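A minimal R translation of these two steps, using a made-up vector:

x <- c(1.05, 1.10, 0.95)
prod(x)^(1 / length(x))   # nth root of the product of all values
exp(mean(log(x)))         # an equivalent, numerically safer form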

The arithmetic mean is the most commonly used type of mean and is often referred to simply as “the mean.” While the arithmetic mean is based on adding and dividing values, the geometric mean multiplies and finds the root of values.

Even though the geometric mean is a less common measure of central tendency , it’s more accurate than the arithmetic mean for percentage change and positively skewed data. The geometric mean is often reported for financial indices and population growth rates.

The geometric mean is an average that multiplies all values and finds a root of the number. For a dataset with n numbers, you find the n th root of their product.

Outliers are extreme values that differ from most values in the dataset. You find outliers at the extreme ends of your dataset.

It’s best to remove outliers only when you have a sound reason for doing so.

Some outliers represent natural variations in the population , and they should be left as is in your dataset. These are called true outliers.

Other outliers are problematic and should be removed because they represent measurement errors , data entry or processing errors, or poor sampling.

You can choose from four main ways to detect outliers :

  • Sorting your values from low to high and checking minimum and maximum values
  • Visualizing your data with a box plot and looking for outliers
  • Using the interquartile range to create fences for your data
  • Using statistical procedures to identify extreme values
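For example, the interquartile-range method in R, with a made-up vector containing one extreme value:

x <- c(2, 3, 3, 4, 5, 6, 18)
q <- quantile(x, c(0.25, 0.75))
fences <- c(q[1] - 1.5 * IQR(x), q[2] + 1.5 * IQR(x))   # lower and upper fences
x[x < fences[1] | x > fences[2]]                         # values flagged as outliers, here 18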

Outliers can have a big impact on your statistical analyses and skew the results of any hypothesis test if they are inaccurate.

These extreme values can impact your statistical power as well, making it hard to detect a true effect if there is one.

No, the steepness or slope of the line isn’t related to the correlation coefficient value. The correlation coefficient only tells you how closely your data fit on a line, so two datasets with the same correlation coefficient can have very different slopes.

To find the slope of the line, you’ll need to perform a regression analysis .

Correlation coefficients always range between -1 and 1.

The sign of the coefficient tells you the direction of the relationship: a positive value means the variables change together in the same direction, while a negative value means they change together in opposite directions.

The absolute value of a number is equal to the number without its sign. The absolute value of a correlation coefficient tells you the magnitude of the correlation: the greater the absolute value, the stronger the correlation.

These are the assumptions your data must meet if you want to use Pearson’s r :

  • Both variables are on an interval or ratio level of measurement
  • Data from both variables follow normal distributions
  • Your data have no outliers
  • Your data is from a random or representative sample
  • You expect a linear relationship between the two variables

A correlation coefficient is a single number that describes the strength and direction of the relationship between your variables.

Different types of correlation coefficients might be appropriate for your data based on their levels of measurement and distributions . The Pearson product-moment correlation coefficient (Pearson’s r ) is commonly used to assess a linear relationship between two quantitative variables.

There are various ways to improve power:

  • Increase the potential effect size by manipulating your independent variable more strongly,
  • Increase sample size,
  • Increase the significance level (alpha),
  • Reduce measurement error by increasing the precision and accuracy of your measurement devices and procedures,
  • Use a one-tailed test instead of a two-tailed test for t tests and z tests.

A power analysis is a calculation that helps you determine a minimum sample size for your study. It’s made up of four main components. If you know or have estimates for any three of these, you can calculate the fourth component.

  • Statistical power : the likelihood that a test will detect an effect of a certain size if there is one, usually set at 80% or higher.
  • Sample size : the minimum number of observations needed to observe an effect of a certain size with a given power level.
  • Significance level (alpha) : the maximum risk of rejecting a true null hypothesis that you are willing to take, usually set at 5%.
  • Expected effect size : a standardized way of expressing the magnitude of the expected result of your study, usually based on similar studies or a pilot study.
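Base R’s power.t.test() implements this relationship for t tests: supply three of the components (with the effect size given in raw units) and it solves for the fourth. For example, assuming a hypothetical standardized effect of 0.5:

power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.8)
# solves for n: about 64 participants per group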

Null and alternative hypotheses are used in statistical hypothesis testing . The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.

Statistical analysis is the main method for analyzing quantitative research data . It uses probabilities and models to test predictions about a population from sample data.

The risk of making a Type II error is inversely related to the statistical power of a test. Power is the extent to which a test can correctly detect a real effect when there is one.

To (indirectly) reduce the risk of a Type II error, you can increase the sample size or the significance level to increase statistical power.

The risk of making a Type I error is the significance level (or alpha) that you choose. That’s a value that you set at the beginning of your study to assess the statistical probability of obtaining your results ( p value ).

The significance level is usually set at 0.05 or 5%. This means that, if the null hypothesis is actually true, there is at most a 5% chance of obtaining results at least as extreme as yours.

To reduce the Type I error probability, you can set a lower significance level.

In statistics, a Type I error means rejecting the null hypothesis when it’s actually true, while a Type II error means failing to reject the null hypothesis when it’s actually false.

In statistics, power refers to the likelihood of a hypothesis test detecting a true effect if there is one. A statistically powerful test is more likely to reject a false negative (a Type II error).

If you don’t ensure enough power in your study, you may not be able to detect a statistically significant result even when it has practical significance. Your study might not have the ability to answer your research question.

While statistical significance shows that an effect exists in a study, practical significance shows that the effect is large enough to be meaningful in the real world.

Statistical significance is denoted by p -values whereas practical significance is represented by effect sizes .

There are dozens of measures of effect sizes . The most common effect sizes are Cohen’s d and Pearson’s r . Cohen’s d measures the size of the difference between two groups while Pearson’s r measures the strength of the relationship between two variables .

Effect size tells you how meaningful the relationship between variables or the difference between groups is.

A large effect size means that a research finding has practical significance, while a small effect size indicates limited practical applications.

Using descriptive and inferential statistics , you can make two types of estimates about the population : point estimates and interval estimates.

  • A point estimate is a single value estimate of a parameter . For instance, a sample mean is a point estimate of a population mean.
  • An interval estimate gives you a range of values where the parameter is expected to lie. A confidence interval is the most common type of interval estimate.

Both types of estimates are important for gathering a clear idea of where a parameter is likely to lie.

Standard error and standard deviation are both measures of variability . The standard deviation reflects variability within a sample, while the standard error estimates the variability across samples of a population.

The standard error of the mean , or simply standard error , indicates how different the population mean is likely to be from a sample mean. It tells you how much the sample mean would vary if you were to repeat a study using new samples from within a single population.

To figure out whether a given number is a parameter or a statistic , ask yourself the following:

  • Does the number describe a whole, complete population where every member can be reached for data collection ?
  • Is it possible to collect data for this number from every member of the population in a reasonable time frame?

If the answer is yes to both questions, the number is likely to be a parameter. For small populations, data can be collected from the whole population and summarized in parameters.

If the answer is no to either of the questions, then the number is more likely to be a statistic.

The arithmetic mean is the most commonly used mean. It’s often simply called the mean or the average. But there are some other types of means you can calculate depending on your research purposes:

  • Weighted mean: some values contribute more to the mean than others.
  • Geometric mean : values are multiplied rather than summed up.
  • Harmonic mean: reciprocals of values are used instead of the values themselves.

You can find the mean , or average, of a data set in two simple steps:

  • Find the sum of the values by adding them all up.
  • Divide the sum by the number of values in the data set.

This method is the same whether you are dealing with sample or population data or positive or negative numbers.

The median is the most informative measure of central tendency for skewed distributions or distributions with outliers. For example, the median is often used as a measure of central tendency for income distributions, which are generally highly skewed.

Because the median only uses one or two values, it’s unaffected by extreme outliers or non-symmetric distributions of scores. In contrast, the mean and mode can vary in skewed distributions.

To find the median, first order your data. Then calculate the middle position based on n, the number of values in your data set: the median sits at position

\dfrac{(n+1)}{2}

For example, with n = 5 ordered values, the median is the (5 + 1)/2 = 3rd value; with an even n, it is the average of the two middle values.

A data set can often have no mode, one mode or more than one mode – it all depends on how many different values repeat most frequently.

Your data can be:

  • without any mode
  • unimodal, with one mode,
  • bimodal, with two modes,
  • trimodal, with three modes, or
  • multimodal, with four or more modes.

To find the mode :

  • If your data is numerical or quantitative, order the values from low to high.
  • If it is categorical, sort the values by group, in any order.

Then you simply need to identify the most frequently occurring value.

The interquartile range is the best measure of variability for skewed distributions or data sets with outliers. Because it’s based on values that come from the middle half of the distribution, it’s unlikely to be influenced by outliers .

The two most common methods for calculating interquartile range are the exclusive and inclusive methods.

The exclusive method excludes the median when identifying Q1 and Q3, while the inclusive method includes the median as a value in the data set in identifying the quartiles.

For each of these methods, you’ll need different procedures for finding the median, Q1 and Q3 depending on whether your sample size is even- or odd-numbered. The exclusive method works best for even-numbered sample sizes, while the inclusive method is often used with odd-numbered sample sizes.

While the range gives you the spread of the whole data set, the interquartile range gives you the spread of the middle half of a data set.

Homoscedasticity, or homogeneity of variances, is an assumption of equal or similar variances in different groups being compared.

This is an important assumption of parametric statistical tests because they are sensitive to any dissimilarities. Uneven variances in samples result in biased and skewed test results.

Statistical tests such as variance tests or the analysis of variance (ANOVA) use sample variance to assess group differences of populations. They use the variances of the samples to assess whether the populations they come from significantly differ from each other.

Variance is the average squared deviations from the mean, while standard deviation is the square root of this number. Both measures reflect variability in a distribution, but their units differ:

  • Standard deviation is expressed in the same units as the original values (e.g., minutes or meters).
  • Variance is expressed in much larger units (e.g., meters squared).

Although the units of variance are harder to intuitively understand, variance is important in statistical tests .

The empirical rule, or the 68-95-99.7 rule, tells you where most of the values lie in a normal distribution :

  • Around 68% of values are within 1 standard deviation of the mean.
  • Around 95% of values are within 2 standard deviations of the mean.
  • Around 99.7% of values are within 3 standard deviations of the mean.

The empirical rule is a quick way to get an overview of your data and check for any outliers or extreme values that don’t follow this pattern.
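The three percentages follow from the normal cumulative distribution function; for example, in R:

diff(pnorm(c(-1, 1)))   # ~0.683
diff(pnorm(c(-2, 2)))   # ~0.954
diff(pnorm(c(-3, 3)))   # ~0.997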

In a normal distribution , data are symmetrically distributed with no skew. Most values cluster around a central region, with values tapering off as they go further away from the center.

The measures of central tendency (mean, mode, and median) are exactly the same in a normal distribution.


The standard deviation is the average amount of variability in your data set. It tells you, on average, how far each score lies from the mean .

In normal distributions, a high standard deviation means that values are generally far from the mean, while a low standard deviation indicates that values are clustered close to the mean.

No. Because the range formula subtracts the lowest number from the highest number, the range is always zero or a positive number.

In statistics, the range is the spread of your data from the lowest to the highest value in the distribution. It is the simplest measure of variability .

While central tendency tells you where most of your data points lie, variability summarizes how far apart your points lie from each other.

Data sets can have the same central tendency but different levels of variability or vice versa . Together, they give you a complete picture of your data.

Variability is most commonly measured with the following descriptive statistics :

  • Range : the difference between the highest and lowest values
  • Interquartile range : the range of the middle half of a distribution
  • Standard deviation : average distance from the mean
  • Variance : average of squared distances from the mean

Variability tells you how far apart points lie from each other and from the center of a distribution or a data set.

Variability is also referred to as spread, scatter or dispersion.

While interval and ratio data can both be categorized, ranked, and have equal spacing between adjacent values, only ratio scales have a true zero.

For example, temperature in Celsius or Fahrenheit is at an interval scale because zero is not the lowest possible temperature. In the Kelvin scale, a ratio scale, zero represents a total lack of thermal energy.

A critical value is the value of the test statistic which defines the upper and lower bounds of a confidence interval , or which defines the threshold of statistical significance in a statistical test. It describes how far from the mean of the distribution you have to go to cover a certain amount of the total variation in the data (i.e. 90%, 95%, 99%).

If you are constructing a 95% confidence interval and are using a threshold of statistical significance of p = 0.05, then your critical value will be identical in both cases.

The t -distribution gives more probability to observations in the tails of the distribution than the standard normal distribution (a.k.a. the z -distribution).

In this way, the t -distribution is more conservative than the standard normal distribution: to reach the same level of confidence or statistical significance , you will need to include a wider range of the data.

A t -score (a.k.a. a t -value) is equivalent to the number of standard deviations away from the mean of the t -distribution .

The t -score is the test statistic used in t -tests and regression tests. It can also be used to describe how far from the mean an observation is when the data follow a t -distribution.

The t-distribution is a way of describing a set of observations where most observations fall close to the mean, and the rest of the observations make up the tails on either side. It is similar in shape to the normal distribution but has heavier tails, and it is used for smaller sample sizes, where the variance in the data is unknown.

The t -distribution forms a bell curve when plotted on a graph. It can be described mathematically using the mean and the standard deviation .

In statistics, ordinal and nominal variables are both considered categorical variables .

Even though ordinal data can sometimes be numerical, not all mathematical operations can be performed on them.

Ordinal data has two characteristics:

  • The data can be classified into different categories within a variable.
  • The categories have a natural ranked order.

However, unlike with interval data, the distances between the categories are uneven or unknown.

Nominal and ordinal are two of the four levels of measurement . Nominal level data can only be classified, while ordinal level data can be classified and ordered.

Nominal data is data that can be labelled or classified into mutually exclusive categories within a variable. These categories cannot be ordered in a meaningful way.

For example, for the nominal variable of preferred mode of transportation, you may have the categories of car, bus, train, tram or bicycle.

If your confidence interval for a difference between groups includes zero, that means that if you run your experiment again you have a good chance of finding no difference between groups.

If your confidence interval for a correlation or regression includes zero, that means that if you run your experiment again there is a good chance of finding no correlation in your data.

In both of these cases, you will also find a high p -value when you run your statistical test, meaning that your results could have occurred under the null hypothesis of no relationship between variables or no difference between groups.

If you want to calculate a confidence interval around the mean of data that is not normally distributed , you have two choices:

  • Find a distribution that matches the shape of your data and use that distribution to calculate the confidence interval.
  • Perform a transformation on your data to make it fit a normal distribution, and then find the confidence interval for the transformed data.

The standard normal distribution , also called the z -distribution, is a special normal distribution where the mean is 0 and the standard deviation is 1.

Any normal distribution can be converted into the standard normal distribution by turning the individual values into z -scores. In a z -distribution, z -scores tell you how many standard deviations away from the mean each value lies.
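For example, converting a small made-up sample to z-scores in R:

x <- c(10, 12, 15, 20)
(x - mean(x)) / sd(x)   # z-scores by the definition
as.vector(scale(x))     # built-in equivalent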

The z -score and t -score (aka z -value and t -value) show how many standard deviations away from the mean of the distribution you are, assuming your data follow a z -distribution or a t -distribution .

These scores are used in statistical tests to show how far from the mean of the predicted distribution your statistical estimate is. If your test produces a z -score of 2.5, this means that your estimate is 2.5 standard deviations from the predicted mean.

The predicted mean and distribution of your estimate are generated by the null hypothesis of the statistical test you are using. The more standard deviations away from the predicted mean your estimate is, the less likely it is that the estimate could have occurred under the null hypothesis .

To calculate the confidence interval , you need to know:

  • The point estimate you are constructing the confidence interval for
  • The critical values for the test statistic
  • The standard deviation of the sample
  • The sample size

Then you can plug these components into the confidence interval formula that corresponds to your data. The formula depends on the type of estimate (e.g. a mean or a proportion) and on the distribution of your data.
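As a sketch of that formula for a mean, assuming roughly normal data and illustrative numbers, the interval is the point estimate plus or minus the critical value times the standard error:

```python
from math import sqrt
from scipy import stats

xbar, s, n = 0.52, 0.10, 25      # illustrative sample mean, SD, sample size
conf = 0.95

se = s / sqrt(n)                                    # standard error of mean
t_crit = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1)  # critical value
print(xbar - t_crit * se, xbar + t_crit * se)       # lower, upper bound
```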

The confidence level is the percentage of times you expect to get close to the same estimate if you run your experiment again or resample the population in the same way.

The confidence interval consists of the upper and lower bounds of the estimate you expect to find at a given level of confidence.

For example, if you are estimating a 95% confidence interval around the mean proportion of female babies born every year based on a random sample of babies, you might find an upper bound of 0.56 and a lower bound of 0.48. These are the upper and lower bounds of the confidence interval. The confidence level is 95%.

The mean is the most frequently used measure of central tendency because it uses all values in the data set to give you an average.

For data from skewed distributions, the median is better than the mean because it isn’t influenced by extremely large values.

The mode is the only measure you can use for nominal or categorical data that can’t be ordered.

The measures of central tendency you can use depend on the level of measurement of your data.

  • For a nominal level, you can only use the mode to find the most frequent value.
  • For an ordinal level or ranked data, you can also use the median to find the value in the middle of your data set.
  • For interval or ratio levels, in addition to the mode and median, you can use the mean to find the average value.

Measures of central tendency help you find the middle, or the average, of a data set.

The 3 most common measures of central tendency are the mean, median and mode; a short code sketch after the list below computes all three.

  • The mode is the most frequent value.
  • The median is the middle number in an ordered data set.
  • The mean is the sum of all values divided by the total number of values.
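A minimal illustration with Python's built-in statistics module (the data are made up):

```python
import statistics

data = [2, 3, 3, 5, 7, 8, 8, 8, 10]

print(statistics.mean(data))    # sum of values / number of values = 6.0
print(statistics.median(data))  # middle value of the ordered data = 7
print(statistics.mode(data))    # most frequent value = 8
```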

Some variables have fixed levels. For example, gender and ethnicity are always nominal level data because they cannot be ranked.

However, for other variables, you can choose the level of measurement. For example, income is a variable that can be recorded on an ordinal or a ratio scale:

  • At an ordinal level, you could create 5 income groupings and code the incomes that fall within them from 1–5.
  • At a ratio level, you would record exact numbers for income.

If you have a choice, the ratio level is always preferable because you can analyze data in more ways. The higher the level of measurement, the more precise your data is.

The level at which you measure a variable determines how you can analyze your data.

Depending on the level of measurement, you can perform different descriptive statistics to get an overall summary of your data and inferential statistics to see if your results support or refute your hypothesis.

Levels of measurement tell you how precisely variables are recorded. There are 4 levels of measurement, which can be ranked from low to high:

  • Nominal: the data can only be categorized.
  • Ordinal: the data can be categorized and ranked.
  • Interval: the data can be categorized, ranked, and evenly spaced.
  • Ratio: the data can be categorized, ranked, evenly spaced, and has a natural zero.

No. The p-value only tells you how likely the data you have observed is to have occurred under the null hypothesis.

If the p-value is below your threshold of significance (typically p < 0.05), then you can reject the null hypothesis, but this does not necessarily mean that your alternative hypothesis is true.

The alpha value, or the threshold for statistical significance, is arbitrary – which value you use depends on your field of study.

In most cases, researchers use an alpha of 0.05, meaning a result is declared significant only if data at least as extreme would occur less than 5% of the time under the null hypothesis.

P-values are usually automatically calculated by the program you use to perform your statistical test. They can also be estimated using p-value tables for the relevant test statistic.

P-values are calculated from the null distribution of the test statistic. They tell you how often a test statistic is expected to occur under the null hypothesis of the statistical test, based on where it falls in the null distribution.

If the test statistic is far from the mean of the null distribution, then the p-value will be small, showing that the test statistic is not likely to have occurred under the null hypothesis.
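A minimal sketch of this idea, assuming a test statistic whose null distribution is a t-distribution with 20 degrees of freedom (the statistic values are illustrative):

```python
from scipy import stats

# Two-tailed p-value: the probability, under the null distribution,
# of a test statistic at least this far from its mean.
for t in (0.5, 2.0, 3.5):              # increasingly extreme statistics
    p = 2 * stats.t.sf(abs(t), df=20)  # t null distribution, 20 df
    print(t, p)                        # larger |t| -> smaller p
```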

The test statistic you use will be determined by the statistical test.

You can choose the right statistical test by looking at what type of data you have collected and what type of relationship you want to test.

The test statistic will change based on the number of observations in your data, how variable your observations are, and how strong the underlying patterns in the data are.

For example, if one data set has higher variability while another has lower variability, the first data set will produce a test statistic closer to the null hypothesis, even if the true correlation between two variables is the same in both data sets.

The formula for the test statistic depends on the statistical test being used.

Generally, the test statistic is calculated as the pattern in your data (i.e. the correlation between variables or difference between groups) divided by the variability in the data (e.g. the standard deviation or standard error).

  • Univariate statistics summarize only one variable at a time.
  • Bivariate statistics compare two variables.
  • Multivariate statistics compare more than two variables.

The 3 main types of descriptive statistics concern the frequency distribution, central tendency, and variability of a dataset.

  • Distribution refers to the frequencies of different responses.
  • Measures of central tendency give you the average for each response.
  • Measures of variability show you the spread or dispersion of your dataset.

Descriptive statistics summarize the characteristics of a data set. Inferential statistics allow you to test a hypothesis or assess whether your data is generalizable to the broader population.

In statistics, model selection is a process researchers use to compare the relative value of different statistical models and determine which one is the best fit for the observed data.

The Akaike information criterion is one of the most common methods of model selection. AIC weights the ability of the model to predict the observed data against the number of parameters the model requires to reach that level of precision.

AIC model selection can help researchers find a model that explains the observed variation in their data while avoiding overfitting.

In statistics, a model is the collection of one or more independent variables and their predicted interactions that researchers use to try to explain variation in their dependent variable.

You can test a model using a statistical test. To compare how well different models fit your data, you can use Akaike’s information criterion for model selection.

The Akaike information criterion is calculated from the maximum log-likelihood of the model and the number of parameters (K) used to reach that likelihood. The AIC function is 2K – 2(log-likelihood) .

Lower AIC values indicate a better-fit model. When the delta-AIC (the difference between the two AIC values being compared) is more than 2, the model with the lower AIC is considered significantly better than the model it is being compared to.
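A minimal sketch of the calculation and comparison (the log-likelihoods and parameter counts are illustrative placeholders, not from a real fit):

```python
def aic(log_likelihood: float, k: int) -> float:
    """Akaike information criterion: 2K - 2(log-likelihood)."""
    return 2 * k - 2 * log_likelihood

# Illustrative models: model B fits slightly better but uses more parameters.
aic_a = aic(log_likelihood=-102.3, k=3)  # 210.6
aic_b = aic(log_likelihood=-101.9, k=5)  # 213.8

delta = aic_b - aic_a
print(delta)  # 3.2 > 2, so the lower-AIC model A is preferred
```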

The Akaike information criterion is a mathematical test used to evaluate how well a model fits the data it is meant to describe. It penalizes models which use more independent variables (parameters) as a way to avoid over-fitting.

AIC is most often used to compare the relative goodness-of-fit among different models under consideration and to then choose the model that best fits the data.

A factorial ANOVA is any ANOVA that uses more than one categorical independent variable. A two-way ANOVA is a type of factorial ANOVA.

Some examples of factorial ANOVAs include:

  • Testing the combined effects of vaccination (vaccinated or not vaccinated) and health status (healthy or pre-existing condition) on the rate of flu infection in a population.
  • Testing the effects of marital status (married, single, divorced, widowed), job status (employed, self-employed, unemployed, retired), and family history (no family history, some family history) on the incidence of depression in a population.
  • Testing the effects of feed type (type A, B, or C) and barn crowding (not crowded, somewhat crowded, very crowded) on the final weight of chickens in a commercial farming operation.

In ANOVA, the null hypothesis is that there is no difference among group means. If any group differs significantly from the overall group mean, then the ANOVA will report a statistically significant result.

Significant differences among group means are calculated using the F statistic, which is the ratio of the mean sum of squares (the variance explained by the independent variable) to the mean square error (the variance left over).

If the F statistic is higher than the critical value (the value of F that corresponds with your alpha value, usually 0.05), then the difference among groups is deemed statistically significant.
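A minimal sketch with scipy (the group measurements are made up): f_oneway returns the F statistic and its p-value, which you then compare against your alpha.

```python
from scipy import stats

# Made-up measurements for three groups.
g1 = [4.1, 5.0, 4.8, 5.5, 4.4]
g2 = [5.9, 6.3, 5.7, 6.8, 6.1]
g3 = [4.5, 4.9, 5.2, 4.7, 5.0]

f_stat, p = stats.f_oneway(g1, g2, g3)
print(f_stat, p)  # if p < 0.05, at least one group mean differs
```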

The only difference between one-way and two-way ANOVA is the number of independent variables. A one-way ANOVA has one independent variable, while a two-way ANOVA has two.

  • One-way ANOVA: Testing the relationship between shoe brand (Nike, Adidas, Saucony, Hoka) and race finish times in a marathon.
  • Two-way ANOVA: Testing the relationship between shoe brand (Nike, Adidas, Saucony, Hoka), runner age group (junior, senior, master’s), and race finish times in a marathon.

All ANOVAs are designed to test for differences among three or more groups. If you are only testing for a difference between two groups, use a t-test instead.

Multiple linear regression is a regression model that estimates the relationship between a quantitative dependent variable and two or more independent variables using a straight line.

Linear regression most often uses mean-square error (MSE) to calculate the error of the model. MSE is calculated by:

  • measuring the distance of the observed y-values from the predicted y-values at each value of x;
  • squaring each of these distances;
  • calculating the mean of the squared distances.

Linear regression fits a line to the data by finding the regression coefficient that results in the smallest MSE.
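A minimal numpy sketch of those steps for a simple linear fit (the x and y values are made up):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

slope, intercept = np.polyfit(x, y, deg=1)  # least-squares line
y_pred = slope * x + intercept

mse = np.mean((y - y_pred) ** 2)  # distances -> squared -> averaged
print(slope, intercept, mse)
```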

Simple linear regression is a regression model that estimates the relationship between one independent variable and one dependent variable using a straight line. Both variables should be quantitative.

For example, the relationship between temperature and the expansion of mercury in a thermometer can be modeled using a straight line: as temperature increases, the mercury expands. This linear relationship is so certain that we can use mercury thermometers to measure temperature.

A regression model is a statistical model that estimates the relationship between one dependent variable and one or more independent variables using a line (or a plane in the case of two or more independent variables).

A regression model can be used when the dependent variable is quantitative, except in the case of logistic regression, where the dependent variable is binary.

A t-test should not be used to measure differences among more than two groups, because the error structure for a t-test will underestimate the actual error when many groups are being compared.

If you want to compare the means of several groups at once, it’s best to use another statistical test such as ANOVA or a post-hoc test.

A one-sample t-test is used to compare a single population to a standard value (for example, to determine whether the average lifespan in a specific town differs from the country average).

A paired t-test is used to compare a single population before and after some experimental intervention or at two different points in time (for example, measuring student performance on a test before and after being taught the material).

A t-test measures the difference in group means divided by the pooled standard error of the two group means.

In this way, it calculates a number (the t-value) illustrating the magnitude of the difference between the two group means being compared, and estimates the likelihood that this difference exists purely by chance (p-value).
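A minimal two-sample sketch with scipy (the group data are made up): ttest_ind returns both the t-value and the p-value described above.

```python
from scipy import stats

group_a = [21.5, 23.1, 19.8, 24.2, 22.0, 20.7]
group_b = [18.9, 20.1, 17.5, 19.6, 18.2, 19.0]

t_value, p_value = stats.ttest_ind(group_a, group_b)
print(t_value, p_value)  # size of the difference, and chance it is random
```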

Your choice of t-test depends on whether you are studying one group or two groups, and whether you care about the direction of the difference in group means.

If you are studying one group, use a paired t-test to compare the group mean over time or after an intervention, or use a one-sample t-test to compare the group mean to a standard value. If you are studying two groups, use a two-sample t-test.

If you want to know only whether a difference exists, use a two-tailed test. If you want to know whether one group mean is greater or less than the other, use a one-tailed test (left-tailed or right-tailed, respectively).

A t-test is a statistical test that compares the means of two samples. It is used in hypothesis testing, with a null hypothesis that the difference in group means is zero and an alternative hypothesis that the difference in group means is different from zero.

Statistical significance is a term used by researchers to state that it is unlikely their observations could have occurred under the null hypothesis of a statistical test. Significance is usually denoted by a p-value, or probability value.

Statistical significance is arbitrary – it depends on the threshold, or alpha value, chosen by the researcher. The most common threshold is p < 0.05, which means that the data would be likely to occur less than 5% of the time under the null hypothesis.

When the p-value falls below the chosen alpha value, then we say the result of the test is statistically significant.

A test statistic is a number calculated by a statistical test. It describes how far your observed data is from the null hypothesis of no relationship between variables or no difference among sample groups.

The test statistic tells you how different two or more groups are from the overall population mean, or how different a linear slope is from the slope predicted by a null hypothesis. Different test statistics are used in different statistical tests.

Statistical tests commonly assume that:

  • the data are normally distributed
  • the groups that are being compared have similar variance
  • the data are independent

If your data do not meet these assumptions, you might still be able to use a nonparametric statistical test, which has fewer requirements but also makes weaker inferences.



In Statistics, the researcher checks the significance of the observed result, which is known as the test statistic. For this, a hypothesis test is also utilized. The P-value, or probability value, concept is used everywhere in statistical analysis and is used to determine statistical significance in significance testing. In this article, let us discuss its definition, formula, table, interpretation, and how to use the P-value to find the significance level, in detail.


P-value Definition

The P-value is known as the probability value. It is defined as the probability of getting a result that is either the same as or more extreme than the actual observations. The P-value is known as the level of marginal significance within hypothesis testing, representing the probability of occurrence of the given event. The P-value is used as an alternative to the rejection point to provide the smallest level of significance at which the null hypothesis would be rejected. If the P-value is small, then there is stronger evidence in favour of the alternative hypothesis.

P-value Table

The P-value table shows the hypothesis interpretations:

  • P-value > 0.05: the result is not statistically significant; do not reject the null hypothesis.
  • P-value < 0.05: the result is statistically significant; generally, reject the null hypothesis in favour of the alternative hypothesis.
  • P-value < 0.01: the result is highly statistically significant; reject the null hypothesis in favour of the alternative hypothesis.

Generally, the level of statistical significance is expressed as a p-value between 0 and 1. The smaller the p-value, the stronger the evidence against the null hypothesis and the more likely the result is to be statistically significant; rejection of the null hypothesis becomes increasingly justified as the p-value becomes smaller.

Let us look at an example to better comprehend the concept of P-value.

Let’s say a researcher flips a coin ten times with the null hypothesis that it is fair. The total number of heads is the test statistic, which is two-tailed. Assume the researcher notices alternating heads and tails on each flip (HTHTHTHTHT). As this is the predicted number of heads, the test statistic is 5 and the p-value is 1 (totally unexceptional).

Assume that the test statistic for this research was the “number of alternations” (i.e., the number of times H followed T or T followed H), which is again two-tailed. This would result in a test statistic of 9, which is extremely high and has a p-value of 1/2^8 = 1/256, or roughly 0.0039. This would be regarded as extremely significant, far beyond the 0.05 level. These findings suggest that, in terms of one test statistic, the data set is exceedingly unlikely to have occurred by chance, yet they do not imply that the coin is biased towards heads or tails.

The data have a high p-value according to the first test statistic, indicating that the number of heads observed is not surprising. The data have a low p-value according to the second test statistic, indicating that the pattern of flips observed is extremely unlikely. There is no “alternative hypothesis” here (therefore only the null hypothesis can be rejected), and such evidence could have a variety of explanations – the data could be falsified, or the coin could have been flipped by a magician who purposefully alternated outcomes.

This example shows that the p-value is entirely dependent on the test statistic used and that p-values can only be used to reject a null hypothesis, not to explore an alternate hypothesis.
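The two p-values in this example can be checked with a short Python sketch (scipy assumed). Under a fair coin, the number of heads in 10 flips is Binomial(10, 0.5), and the 9 alternation indicators between consecutive flips are independent Bernoulli(1/2), so the number of alternations is Binomial(9, 0.5):

```python
from scipy.stats import binomtest

flips = "HTHTHTHTHT"

# Test statistic 1: number of heads.
heads = flips.count("H")
print(binomtest(heads, n=len(flips), p=0.5).pvalue)  # 1.0 -> unexceptional

# Test statistic 2: number of alternations between consecutive flips.
alternations = sum(a != b for a, b in zip(flips, flips[1:]))
print(binomtest(alternations, n=len(flips) - 1, p=0.5).pvalue)  # 1/256
```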

P-value Formula

We know that the P-value is a statistical measure that helps to determine whether the hypothesis is correct or not. The P-value is a number that lies between 0 and 1. The level of significance (α) is a predefined threshold that should be set by the researcher; it is generally fixed at 0.05. The P-value (here, for a test of a population proportion) is calculated as follows:

Step 1: Find the test statistic Z:

Z = (p̂ − P0) / √(P0(1 − P0) / n)

where:

p̂ = observed sample proportion

P0 = assumed population proportion in the null hypothesis

n = sample size

Step 2: Look at the Z-table to find the level of P corresponding to the z-value obtained.
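A minimal Python sketch of these two steps (scipy assumed; the sample numbers are illustrative):

```python
from math import sqrt
from scipy import stats

p0, n = 0.50, 200   # null proportion and sample size (illustrative)
p_hat = 116 / n     # observed sample proportion = 0.58

# Step 1: test statistic.
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Step 2: P-value from the standard normal (z) distribution, two-tailed.
p_value = 2 * stats.norm.sf(abs(z))
print(z, p_value)   # z ~ 2.26, P ~ 0.024
```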

P-Value Example

An example to find the P-value is given here.

Question: A statistician wants to test the hypothesis H0: μ = 120 using the alternative hypothesis Ha: μ > 120, assuming that α = 0.05. For that, he took the sample values as

n = 40, σ = 32.17 and x̄ = 105.37. Determine the conclusion for this hypothesis test.

We know that the test statistic is

z = (x̄ − μ) / (σ / √n)

Now substitute the given values:

σ / √n = 32.17 / √40 ≈ 5.0865

z = (105.37 − 120) / 5.0865

Therefore, z ≈ −2.8762

Since the alternative hypothesis is μ > 120, the P-value is P(Z > −2.8762). From the Z-score table,

P(Z < −2.8762) = P(Z > 2.8762) ≈ 0.003

so P(Z > −2.8762) = 1 − 0.003 = 0.997

P-value = 0.997 > 0.05

Therefore, since p > 0.05, we fail to reject the null hypothesis.

Hence, the conclusion is “fail to reject H0.”
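The same computation can be checked in Python (scipy assumed), confirming the conclusion:

```python
from math import sqrt
from scipy import stats

n, sigma, xbar, mu0 = 40, 32.17, 105.37, 120

z = (xbar - mu0) / (sigma / sqrt(n))  # ~ -2.876
p_value = stats.norm.sf(z)            # P(Z > z) for one-sided Ha: mu > 120
print(z, p_value)                     # p ~ 0.998 > 0.05 -> fail to reject H0
```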

Frequently Asked Questions on P-Value

What is meant by p-value?

The p-value is defined as the probability of obtaining the result at least as extreme as the observed result of a statistical hypothesis test, assuming that the null hypothesis is true.

What does a smaller P-value represent?

The smaller the p-value, the greater the statistical significance of the observed difference, which results in the rejection of the null hypothesis in favour of alternative hypotheses.

What does the p-value greater than 0.05 represent?

If the p-value is greater than 0.05, then the result is not statistically significant.

Can the p-value be greater than 1?

P-value means probability value, which tells you the probability of achieving the result under a certain hypothesis. Since it is a probability, its value ranges between 0 and 1, and it cannot exceed 1.

What does the p-value less than 0.05 represent?

If the p-value is less than 0.05, then the result is statistically significant, and hence we can reject the null hypothesis in favour of the alternative hypothesis.



What is a Zestimate?

The Zestimate® home valuation model is Zillow’s estimate of a home’s market value. A Zestimate incorporates public, MLS and user-submitted data into Zillow’s proprietary formula, also taking into account home facts, location and market trends. It is not an appraisal and can’t be used in place of an appraisal.

How accurate is the Zestimate?

The nationwide median error rate for the Zestimate for on-market homes is 2.4%, while the Zestimate for off-market homes has a median error rate of 7.49%. The Zestimate’s accuracy depends on the availability of data in a home’s area. Some areas have more detailed home information available — such as square footage and number of bedrooms or bathrooms — and others do not. The more data available, the more accurate the Zestimate value will be. 

Zillow publishes tables breaking down the accuracy of Zestimates for both active and off-market listings (accuracy tables not reproduced here; last updated April 27, 2023). Note: The Zestimate’s accuracy is computed by comparing the final sale price to the Zestimate that was published on or just prior to the sale date.

How is the Zestimate calculated?

Zillow publishes Zestimate home valuations for 104 million homes across the country, and uses state of the art statistical and machine learning models that can examine hundreds of data points for each individual home.

To calculate a Zestimate, Zillow uses a sophisticated neural network-based model that incorporates data from county and tax assessor records and direct feeds from hundreds of multiple listing services and brokerages. The Zestimate also incorporates:

  • Home characteristics including square footage, location or the number of bathrooms.
  • On-market data such as listing price, description, comparable homes in the area and days on the market
  • Off-market data — tax assessments, prior sales and other publicly available records
  • Market trends, including seasonal changes in demand

Currently, we have data for over 110 million U.S. homes and we publish Zestimates for 104 million of them.

What changes are in the latest Zestimate?

The latest Zestimate model is our most accurate Zestimate yet. It’s based on a neural network model and uses even more historical data to produce off-market home valuations. This means the Zestimate is more responsive to market trends & seasonality that may affect a home’s market value. We also reduced overall errors and processing time in the Zestimate.

My Zestimate seems too low or too high. What gives?

The amount of data we have for your home and homes in your area directly affects the Zestimate’s accuracy, including the amount of demand in your area for homes. If the data is incorrect or incomplete, update your home facts — this may affect your Zestimate. To ensure the most accurate Zestimate, consider reporting any home updates to your local tax assessor. Unreported additions, updates and remodels aren’t reflected in the Zestimate.

Check that your tax history and price history (the sale price and date you bought your home) are accurate on Zillow. If data is missing or incorrect, let us know .

Be aware that the model that creates the Zestimate factors in changing market trends, including seasonal fluctuations in demand. So in some cases that may be the reason for a change in your Zestimate.

I just listed my home for sale. Why did my Zestimate change?

When a home goes on the market, new data can be incorporated into the Zestimate algorithm. In the simplest terms, the Zestimate for on-market homes includes listing data that provides valuable signals about the home’s eventual sale price. This data isn’t available for off-market homes.

My home is on the market. Why is the Zestimate so far off?

Properties that have been listed for a full year transition to off-market valuations because they have been listed longer than normal for that local market. This can result in a large difference between the list price and the Zestimate.

I just changed my home facts. When will my Zestimate update?

Updates to your home facts are factored into the Zestimate. However, if the updates are not significant enough to affect the home’s value (eg: paint colors), your Zestimate may not change. Zestimates for all homes update multiple times per week, but on rare occasions this schedule is interrupted by algorithmic changes or new analytical features.

How are changes to my home facts (like an additional bedroom or bathroom) valued?

The Zestimate is based on complex and proprietary algorithms that can incorporate millions of data points. The algorithms determine the approximate added value that an additional bedroom or bathroom contributes, though the amount of the change depends on many factors, including local market trends, location and other home facts.

Is the Zestimate an appraisal?

No. The Zestimate is not an appraisal and can’t be used in place of an appraisal. It is a computer-generated estimate of the value of a home today, given the available data.

We encourage buyers, sellers and homeowners to supplement the Zestimate with other research, such as visiting the home, getting a professional appraisal of the home, or requesting a comparative market analysis (CMA) from a real estate agent.

Why do I see home values for the past?

We generate historical Zestimates for most homes if we have sufficient data to do so.

Do you ever change historical Zestimates?

We occasionally recalculate historical Zestimate values along with major data upgrades or improvements to the algorithm.  These recalculations are based on a variety of considerations and, therefore, not every new algorithm release will get a corresponding update of historical values.

However, we never allow future information to influence a historical Zestimate (for example, a sale in 2019 could not influence a 2018 Zestimate). Historical Zestimates only use information known prior to the date of that Zestimate.

Does the Zestimate algorithm ever change?

Yes — Zillow’s team of researchers and engineers work every day to make the Zestimate more accurate. Since Zillow’s founding in 2006, we have deployed multiple major Zestimate algorithm updates and other incremental improvements are consistently released between major upgrades.

How often are Zestimates for homes updated?

We refresh Zestimates for all homes multiple times per week, but on rare occasions this schedule is interrupted by algorithmic changes or new analytical features.

Are foreclosure sales included in the Zestimate algorithm?

No. The Zestimate is intended to provide an estimate of the price that a home would fetch if sold for its full value, where the sale isn’t for partial ownership of the property or between family members. Our extensive analysis of foreclosure resale transactions supports the conclusion that these sales are generally made at substantial discounts compared to non-foreclosure sales. For this reason, the Zestimate does not incorporate data about these sales.

Who calculates the Zestimate? Can someone tamper with my home’s Zestimate?

The Zestimate is an automated valuation model calculated by a software process. It’s not possible to manually alter the Zestimate for a specific property.

Can the Zestimate be updated?

Yes. The Zestimate’s accuracy depends on the amount of data we have for the home. Public records can be outdated or lag behind what homeowners and real estate agents know about a property, so it’s best to update your home facts and fix any incorrect or incomplete information — this will help make your Zestimate as accurate as possible.

You can also add info about the architectural style, roof type, heat source, building amenities and more. Remember: updating home information doesn’t guarantee an increase in the value of Zestimate, but will increase the Zestimate’s accuracy.

Does Zillow delete Zestimates? Can I have my Zestimate reviewed if I believe there are errors?

We do not delete Zestimates. However, for some homes we may not have enough data to provide a home valuation that meets our standards for accuracy. In these instances, we do not publish the Zestimate until more data can be obtained. The Zestimate is designed to be a neutral estimate of the fair market value of a home, based on publicly available and user-submitted data. For this purpose, it is important that the Zestimate is based on information about all homes (e.g., beds, baths, square footage, lot size, tax assessment, prior sale price) and that the algorithm itself is consistently applied to all homes in a similar manner.

I don’t know of any homes that have sold recently in my area. How are you calculating my Zestimate?

Zestimates rely on much more than comparable sales in a given area. The home’s physical attributes, historical information and on-market data all factor into the final calculation. The more we know about homes in an area (including your home), the better the Zestimate. Our models can find neighborhoods similar to yours and use sales in those areas to extrapolate trends in your housing market. Our estimating method differs from that of a comparative market analysis completed by a real estate agent. We use data from a geographical area that is much larger than your neighborhood — up to the size of a county — to help calculate the Zestimate. Though there may not be any recent sales in your neighborhood, even a few sales in the area allow us to extrapolate trends in the local housing market.

I’m trying to sell my home and I think my Zestimate should be higher.

The Zestimate was created to give customers more information about homes and the housing market. It is intended to provide user-friendly data to promote transparent real estate markets and allow people to make more informed decisions — it should not be used to drive up the price of a home. Zestimates are designed to track the market, not drive it.

Can I use the Zestimate to get a loan?

No. The Zestimate is an automated value model and not an appraisal. Most lending professionals and institutions will only use professional appraisals when making loan-related decisions.

I have two Zestimates for my home. How do I fix this?

If you see two Zestimates for the same property, please let us know by visiting the Zillow Help Center and selecting Submit a request. You may see more than one Zestimate for your address if you are a homeowner with multiple parcels of land. Zillow matches the parcels on record with the county. If you officially combine parcels, the county will send us updated information.

What’s the Estimated Sale Range?

While the Zestimate is the estimated market value for an individual home, the Estimated Sale Range describes the range in which a sale price is predicted to fall, including low and high estimated values. For example, a Zestimate may be $260,503, while the Estimated Sale Range is $226,638 to $307,394. This range can vary for different homes and regions. A wider range generally indicates a more uncertain Zestimate, which might be the result of unique home factors or less data available for the region or that particular home. It’s important to consider the size of the Estimated Sale Range because it offers important context about the Zestimate’s anticipated accuracy.

How can real estate professionals work with the Zestimate?

Millions of consumers visit Zillow every month. When combined with the guidance of real estate professionals, the Zestimate can help consumers make more informed financial decisions about their homes. Real estate professionals can also help their clients claim their home on Zillow, update the home facts and account for any work they have done on the property. A home’s Zillow listing is often the first impression for prospective buyers, and accurate information helps attract interest.


Ann Ib Postgrad Med. 2008 Jun; 6(1).

P – VALUE, A TRUE TEST OF STATISTICAL SIGNIFICANCE? A CAUTIONARY NOTE

While it was not the intention of the founders of significance testing and hypothesis testing to have the two ideas intertwined as if they were complementary, the inconvenient marriage of the two practices into one coherent, convenient, incontrovertible and misinterpreted practice has dotted our standard statistics textbooks and medical journals. This paper examines factors contributing to this practice, traces the historical evolution of the Fisherian and Neyman-Pearsonian schools of hypothesis testing, and exposes the fallacies as well as the uncommon-ground and common-ground approaches to the problem. Finally, it offers recommendations on what is to be done to remedy the situation.

INTRODUCTION

The medical journals are replete with P values and tests of hypotheses. It is a common practice among medical researchers to quote whether the test of hypothesis they carried out is significant or non-significant, and many researchers get very excited when they discover a “statistically significant” finding without really understanding what it means. Additionally, while medical journals are florid with statements such as “statistically significant”, “unlikely due to chance”, “not significant”, “due to chance”, or notations such as “P > 0.05” and “P < 0.05”, the decision on whether a test of hypothesis is significant or not based on the P value has generated an intense debate among statisticians. It began among the founders of statistical inference more than 60 years ago 1-3. One contributing factor is that the medical literature shows a strong tendency to accentuate positive findings; many researchers would like to report positive findings, since “non-significant results should not take up” journal space 4-7.

The idea of significance testing was introduced by R.A. Fisher, but over the past six decades its utility, understanding and interpretation have been misunderstood, generating much scholarly writing aimed at remedying the situation 3. Alongside the statistical test of hypothesis is the P value, whose meaning and interpretation have similarly been misused. To delve into the subject matter, a short history of the evolution of the statistical test of hypothesis is warranted to clear some misunderstanding.

A Brief History of P Value and Significance Testing

Significance testing evolved from the idea and practice of the eminent statistician R.A. Fisher in the 1930s. His idea is simple: suppose we found an association between poverty level and malnutrition among children under the age of five years. This is a finding, but could it be a chance finding? Or perhaps we want to evaluate whether a new nutrition therapy improves the nutritional status of malnourished children. We study a group of malnourished children treated with the new therapy and a comparable group treated with the old nutritional therapy, and find in the new therapy group an improvement of nutritional status by 2 units over the old therapy group. This finding will obviously be welcomed, but it is also possible that it is purely due to chance. Thus, Fisher saw the P value as an index measuring the strength of evidence against the null hypothesis (in our examples, the hypothesis that there is no association between poverty level and malnutrition, or that the new therapy does not improve nutritional status). To quantify the strength of evidence against the null hypothesis “he advocated P < 0.05 (5% significance) as a standard level for concluding that there is evidence against the hypothesis tested, though not as an absolute rule” 8. Fisher did not stop there but graded the strength of evidence against the null hypothesis. He proposed “if P is between 0.1 and 0.9 there is certainly no reason to suspect the hypothesis tested. If it is below 0.02 it is strongly indicated that the hypothesis fails to account for the whole of the facts. We shall not often be astray if we draw a conventional line at 0.05” 9. Since Fisher made this statement over 60 years ago, the 0.05 cut-off point has been used by medical researchers worldwide, and its use has become ritualistic, as if other cut-off points could not be used. Through the 1960s it was standard practice in many fields to report P values with one star attached to indicate P < 0.05 and two stars to indicate P < 0.01; occasionally three stars were used to indicate P < 0.001. While Fisher developed this practice of quantifying the strength of evidence against the null hypothesis, some eminent statisticians were not accustomed to the subjective interpretation inherent in the method 7. This led Jerzy Neyman and Egon Pearson to propose a new approach, which they called “hypothesis tests”. They argued that there were two types of error that could be made in interpreting the results of an experiment, as shown in Table 1.

Table 1. Errors associated with the results of an experiment.

  • Reject the null hypothesis when it is true: type I error (rate α).
  • Reject the null hypothesis when it is false: correct decision (power = 1 − β).
  • Accept the null hypothesis when it is true: correct decision.
  • Accept the null hypothesis when it is false: type II error (rate β).

The outcome of the hypothesis test is one of two: reject one hypothesis and accept the other. Adopting this practice exposes one to two types of error: rejecting the null hypothesis when it should be accepted (i.e., concluding the two therapies differ when they are actually the same, also known as a false-positive result, a type I error or an alpha error), or accepting the null hypothesis when it should have been rejected (i.e., concluding that they are the same when in fact they differ, also known as a false-negative result, a type II error or a beta error).

What does P value Mean?

The P value is defined as the probability, under the assumption of no effect or no difference (the null hypothesis), of obtaining a result equal to or more extreme than what was actually observed. The P stands for probability and measures how likely it is that any observed difference between groups is due to chance. Being a probability, P can take any value between 0 and 1. Values close to 0 indicate that the observed difference is unlikely to be due to chance, whereas a P value close to 1 suggests no difference between the groups other than that due to chance. Thus, it is common in medical journals to see adjectives such as “highly significant” or “very significant” after quoting the P value, depending on how close to zero the value is.

Before the advent of computers and statistical software, researchers depended on tabulated values of P to make decisions. This practice is now obsolete, and the use of the exact P value is much preferred. Statistical software can give the exact P value and allows appreciation of the range of values that P can take between 0 and 1. Briefly, for example, the weights of 18 subjects were taken from a community to determine if their body weight was ideal (i.e., 100 kg). Using Student’s t test, t turned out to be 3.76 at 17 degrees of freedom. Comparing this t statistic with the tabulated values, t = 3.76 is more than the critical value of 2.11 at p = 0.05 and therefore falls in the rejection zone. Thus we reject the null hypothesis that μ = 100 and conclude that the difference is significant. But using SPSS (a statistical software package), the following information came out when the data were entered: t = 3.758, P = 0.0016, mean difference = 12.78, and the confidence interval is 5.60 to 19.95. Methodologists are now increasingly recommending that researchers report the precise P value, for example, P = 0.023 rather than P < 0.05 10. Further, to use P = 0.05 “is an anachronism. It was settled on when P values were hard to compute and so some specific values needed to be provided in tables. Now calculating exact P values is easy (i.e., the computer does it) and so the investigator can report (P = 0.04) and leave it to the reader to (determine its significance)” 11.
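A minimal Python sketch (scipy assumed) reproduces this result from the summary statistics quoted above; since the raw 18 weights are not given in the text, the sample standard deviation used here is back-calculated from the reported t value and mean difference:

```python
from math import sqrt
from scipy import stats

# Summary statistics from the example (n, sample mean, null mean, sample SD).
# The SD is back-calculated from the reported mean difference and t value.
n, xbar, mu0, s = 18, 112.78, 100.0, 14.43

t = (xbar - mu0) / (s / sqrt(n))      # ~ 3.76
p = 2 * stats.t.sf(abs(t), df=n - 1)  # two-tailed exact P ~ 0.0016
print(f"t = {t:.3f}, P = {p:.4f}")
```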

Hypothesis Tests

A statistical test provides a mechanism for making quantitative decisions about a process or processes. The purpose is to make inferences about a population parameter by analyzing differences between an observed sample statistic and the results one expects to obtain if some underlying assumption is true. This comparison may be a single observed value versus some hypothesized quantity, or it may be between two or more related or unrelated groups. The choice of statistical test depends on the nature of the data and the study design.

Neyman and Pearson proposed this process to circumvent Fisher’s subjective practice of assessing the strength of evidence against the null effect. In its usual form, two hypotheses are put forward: a null hypothesis (usually a statement of null effect) and an alternative hypothesis (usually the opposite of the null hypothesis). Based on the outcome of the hypothesis test, one hypothesis is rejected and the other accepted, based on a previously predetermined arbitrary benchmark; this benchmark is the significance level (α). However, one runs the risk of making an error: one may reject one hypothesis when in fact it should be accepted, and vice versa. There is the type I or α error (concluding there is a difference when actually there is none) and the type II or β error (concluding there is no difference when actually there is one). In its simple format, testing a hypothesis involves the following steps:

  • Identify null and alternative hypotheses.
  • Determine the appropriate test statistic and its distribution under the assumption that the null hypothesis is true.
  • Specify the significance level and determine the corresponding critical value of the test statistic under the assumption that null hypothesis is true.
  • Calculate the test statistic from the data.

Having discussed the P value and hypothesis testing, the fallacies of hypothesis testing and of the P value are now looked into.

Fallacies of Hypothesis Testing

In a paper I submitted for publication in one of the widely read medical journals in Nigeria, one of the reviewers commented on the age-sex distribution of the participants: “Is there any difference in sex distribution, subject to chi square statistics?” Statistically, this question does not convey any meaningful query, and this is one of many instances among medical researchers (postgraduate supervisors alike) in which a test of hypothesis is quickly and spontaneously resorted to without due consideration of its appropriate application. The aim of my research was to determine the prevalence of diabetes mellitus in a rural community; it was not part of my objectives to determine any association between sex and the prevalence of diabetes mellitus. To the inexperienced, this comment will definitely prompt conducting a test of hypothesis simply to satisfy the editor and reviewer so that the article will sail through. However, the results of such statistical tests become difficult to understand and interpret in the light of the data. (As it turned out, all participants with elevated fasting blood glucose were female.) There are several fallacies associated with hypothesis testing. Below is a small list that will help avoid them.

  • Failure to reject the null hypothesis leads to its acceptance. (No. When you fail to reject the null hypothesis, it means there is insufficient evidence to reject it.)
  • The use of α = 0.05 is a standard with an objective basis. (No. α = 0.05 is merely a convention that evolved from the practice of R.A. Fisher. There is no sharp distinction between “significant” and “not significant” results, only increasingly strong evidence against the null hypothesis as P becomes smaller; P = 0.02 is stronger evidence than P = 0.04.)
  • A small P value indicates a large effect. (No. The P value tells you nothing about the size of an effect.)
  • Statistical significance implies clinical importance. (No. Statistical significance says very little about the clinical importance of a relation. There is a big gulf between statistical significance and clinical significance. By definition, at α = 0.05, 1 in 20 comparisons in which the null hypothesis is true will result in P < 0.05!)

With these and many other fallacies of hypothesis testing, it is rather sad to read in journals how significance testing has become insignificance testing.

Fallacies of P Value

Just as the test of hypothesis is associated with some fallacies, so also is the P value, and with common root causes: “It comes to be seen as natural that any finding worth its salt should have a P value less than 0.05 flashing like a divinely appointed stamp of approval” 12. The inherent subjectivity of Fisher’s P value approach, and the subsequent poor understanding of this approach by the medical community, could be why the P value is associated with a myriad of fallacies. Thirdly, P values produced by researchers as mere “passports to publication” aggravated the situation 13. We were awakened early to the inadequacy of the P value in clinical trials by Feinstein 14:

“The method of making statistical decisions about ‘significance’ creates one of the most devastating ironies in modern biologic science. To avoid usual categorical data, a critical investigator will usually go to enormous efforts in mensuration. He will get special machines and elaborate technologic devices to supplement his old categorical statement with new measurements of ‘continuous’ dimensional data. After all this work in getting ‘continuous’ data, however, and after calculating all the statistical tests of the data, the investigator then makes the final decision about his results on the basis of a completely arbitrary pair of dichotomous categories. These categories, which are called ‘significant’ and ‘nonsignificant’, are usually demarcated by a P value of either 0.05 or 0.01, chosen according to the capricious dictates of the statistician, the editor, the reviewer or the granting agency. If the level demanded for ‘significant’ is 0.05 or lower and the P value that emerges is 0.06, the investigator may be ready to discard a well-designed, excellently conducted, thoughtfully analyzed, and scientifically important experiment because it failed to cross the Procrustean boundary demanded for statistical approbation.”

We should try to understand that Fisher wanted an index of measurement to help him decide the strength of evidence against the null effect. But, as has been said earlier, his idea was poorly understood and criticized, which led Neyman and Pearson to develop hypothesis testing to get round the problem. This, however, is the result of their attempt: “accept” or “reject” the null hypothesis, or alternatively “significant” or “non-significant”. The inadequacy of the P value in decision making pervades all epidemiological study designs. This heads-or-tails approach to the test of hypothesis has pushed the stakeholders in the field (statisticians, editors, reviewers and granting agencies) into ever increasing confusion and difficulty. It is accepted among statisticians that the P value is inadequate as a sole standard of judgment in the analysis of clinical trials 15. Just as hypothesis testing is not devoid of caveats, so also are P values. Some of these are exposed below.

  • The threshold value P < 0.05 is arbitrary. As has been said earlier, it was the practice of Fisher to assign P the value of 0.05 as a measure of evidence against the null effect. One can make the “significance test” more stringent by moving to 0.01 (1%) or less stringent by moving the borderline to 0.10 (10%). In dichotomizing P values into “significant” and “non-significant”, one loses information, in the same way as when demarcating laboratory findings into “normal” and “abnormal”; one may ask, what is the difference between a fasting blood glucose of 25 mmol/L and one of 15 mmol/L?
  • Statistically significant (P < 0.05) findings are assumed to result from real treatment effects, ignoring the fact that 1 in 20 comparisons of effects in which the null hypothesis is true will result in a significant finding (P < 0.05). This problem is more serious when several tests of hypothesis involving several variables are carried out without using the appropriate statistical procedure (e.g., using repeated t-tests where ANOVA is appropriate).
  • Statistical significance result does not translate into clinical importance. A large study can detect a small, clinically unimportant finding.
  • Chance is rarely the most important issue. Remember that when conducting research, a questionnaire is usually administered to participants. In most instances this questionnaire collects a large amount of information on the several variables it includes. The manner in which the questions were asked, and the manner in which they were answered, are important sources of error (systematic error) that are difficult to measure.

What Influences P Value?

Generally, the following factors influence the P value:

  • Effect size . It is a usual research objective to detect a difference between two drugs, procedures or programmes. Several statistics are employed to measure the magnitude of effect produced by these interventions. They range: r 2 , ç 2 , ù 2 , R 2 , Q 2 , Cohen’s d, and Hedge’s g. Two problems are encountered: the use of appropriate index for measuring the effect and secondly size of the effect. A 7kg or 10 mmHg difference will have a lower P value (and more likely to be significant) than a 2-kg or 4 mmHg difference.
  • Size of sample. The larger the sample, the more likely a difference is to be detected. Further, a 7-kg difference in a study with 500 participants per group will give a lower P value than the same 7-kg difference observed in a study with 250 participants per group.
  • Spread of the data. The spread of observations in a data set is most commonly measured by the standard deviation. The bigger the standard deviation, the greater the spread of observations and the higher the P value: a given difference is harder to distinguish from chance when the data are more variable.
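
The three factors above can be seen operating together in a short sketch. The numbers are purely illustrative, and the two-sided P value is computed with a normal approximation to the two-sample t-test rather than the exact t distribution.

```python
# Illustrative only: how effect size, sample size, and spread each move
# the P value of a two-sample comparison (normal approximation).
import math

def p_value(diff, sd, n_per_group):
    """Two-sided P value for a difference in means, normal approximation."""
    se = sd * math.sqrt(2.0 / n_per_group)          # SE of the difference
    z = abs(diff) / se
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

print(p_value(diff=7, sd=12, n_per_group=250))  # larger effect  -> smaller P
print(p_value(diff=2, sd=12, n_per_group=250))  # smaller effect -> larger P
print(p_value(diff=7, sd=12, n_per_group=500))  # larger sample  -> smaller P
print(p_value(diff=7, sd=20, n_per_group=250))  # more spread    -> larger P
```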

P Value and Statistical Significance: An Uncommon Ground

Neither the Fisherian nor the Neyman-Pearson (N-P) school upheld the practice of stating that “P values of less than 0.05 were regarded as statistically significant” or that “the P value was 0.02 and therefore there was a statistically significant difference.” These and many similar statements have criss-crossed medical journals and standard statistics textbooks, providing an uncommon ground on which the two schools were married. This marriage of inconvenience further deepened the confusion and misunderstanding of the Fisherian and Neyman-Pearson schools. The combination of Fisherian and N-P thinking (as exemplified in the statements above) sheds no light on the correct interpretation of statistical hypothesis tests and P values, and the hybrid of the two schools, as commonly read in medical journals and statistics textbooks, makes it appear as if they were, and are, compatible as a single coherent method of statistical inference 4, 23, 24. This confusion, perpetuated by medical journals, statistics textbooks, reviewers and editors, has made it almost impossible for a research report to be published without statements or notations such as “statistically significant”, “statistically insignificant”, “P < 0.05” or “P > 0.05”. Sterne then asked, “Can we get rid of P values?” His answer was that practical experience says no 21.

However, the next section, “P value and confidence interval: a common ground”, offers one possible way out of this seemingly insoluble problem. Goodman commented on the P-value-and-confidence-interval approach to statistical inference and its ability to solve the problem: “The few efforts to eliminate P values from journals in favor of confidence intervals have not generally been successful, indicating that the researchers’ need for a measure of evidence remains strong and that they often feel lost without one” 6.

P Value and Confidence Interval: A Common Ground

So far, this paper has examined the historical evolution of ‘significance’ testing as initially proposed by R. A. Fisher. Neyman and Pearson were uncomfortable with his subjective approach and therefore proposed ‘hypothesis testing’, with its binary outcome of “accept” or “reject” the null hypothesis. This, as we saw, did not “solve” the problem completely. A common ground was therefore needed, and the combination of the P value and confidence intervals provided it.

Before proceeding, we should briefly understand what confidence intervals (CIs) mean, having gone through what P values and hypothesis testing mean. Suppose two diets, A and B, are given to two groups of malnourished children, and an 8-kg mean increase in body weight is observed among children on diet A against a 3-kg mean increase on diet B. The effect on weight gain is therefore 5 kg on average. But this 5-kg figure is only an estimate: the true difference might well be smaller or larger, so a range of plausible values, together with the confidence attached to that range, can be reported as a confidence interval. A 95% confidence interval in this example means that if the study were repeated 100 times, the computed intervals would contain the true difference in weight gain about 95 times. Formally, a 95% CI is “the interval computed from the sample data which, when the study is repeated multiple times, would contain the true effect 95% of the time.”
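
As a concrete rendering of the diet example, the sketch below computes a 95% CI for the difference in mean weight gain. The sample sizes and standard deviations are assumptions made for illustration, since the text supplies only the two means, and the familiar normal-approximation critical value of 1.96 is used.

```python
# A minimal sketch of a 95% CI for a difference in means, using the diet
# example with hypothetical sample sizes and standard deviations.
import math

n_a, mean_a, sd_a = 30, 8.0, 2.5   # diet A: n and SD are assumptions
n_b, mean_b, sd_b = 30, 3.0, 2.5   # diet B: n and SD are assumptions

diff = mean_a - mean_b                           # observed effect: 5 kg
se = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)    # SE of the difference

# 1.96 is the normal-approximation critical value for 95% confidence.
lower, upper = diff - 1.96 * se, diff + 1.96 * se
print(f"difference = {diff:.1f} kg, 95% CI = ({lower:.2f}, {upper:.2f})")
```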

In the 1980s, a number of British statisticians tried to promote this common-ground approach to presenting statistical analyses 16, 17, 18. They encouraged the combined presentation of P values and confidence intervals. The use of confidence intervals in addressing hypothesis testing is one of four popular approaches, and journal editors and eminent statisticians have issued statements supporting it 19. In line with this, the American Psychological Association’s Board of Scientific Affairs commissioned a white paper, “Task Force on Statistical Inference”. The Task Force suggested:

“When reporting inferential statistics (e.g., t-tests, F-tests, and chi-square) include information about the obtained … value of the test statistic, the degrees of freedom, the probability of obtaining a value as extreme as or more extreme than the one obtained [i.e., the P value] … Be sure to include sufficient descriptive statistics [e.g., per-cell sample size, means, correlations, standard deviations] … The reporting of confidence intervals [for estimates of parameters, for functions of parameters such as differences in means, and for effect sizes] can be an extremely effective way of reporting results … because confidence intervals combine information on location and precision and can often be directly used to infer significance levels” 20.

Jonathan Sterne and Davey Smith offered their own suggested guidelines for reporting statistical analyses, shown in Box 1 21:

Box 1: Suggested guidelines for the reporting of results of statistical analyses in medical journals.

  • The description of differences as statistically significant is not acceptable.
  • Confidence intervals for the main results should always be included, but 90% rather than 95% levels should be used. Confidence intervals should not be used as a surrogate means of examining significance at the conventional 5% level. Interpretation of confidence intervals should focus on the implication (clinical importance) of the range of values in the interval.
  • When there is a meaningful null hypothesis, the strength of evidence against it should be indexed by the P value. The smaller the P value, the stronger is the evidence.
  • While it is impossible to substantially reduce the amount of data dredging that is carried out, authors should take a very skeptical view of subgroup analyses in clinical trials and observational studies. The strength of the evidence for interaction (that effects really differ between subgroups) should always be presented. Claims made on the basis of subgroup findings should be even more tempered than claims made about main effects.
  • In observational studies it should be remembered that considerations of confounding and bias are at least as important as the issues discussed in this paper.

Since the 1980s, when British statisticians championed the use of confidence intervals, journal after journal has issued statements regarding their use. An editorial in Clinical Chemistry reads as follows:

“There is no question that a confidence interval for the difference between two true (i.e., population) means or proportions, based on the observed difference between sample estimates, provides more useful information than a P value, no matter how exact, for the probability that the true difference is zero. The confidence interval reflects the precision of the sample values in terms of their standard deviation and the sample size …” 22

On a final note, it is important to know why it is statistically superior to use P values together with confidence intervals rather than P values and hypothesis testing alone:

  • Confidence intervals emphasize the importance of estimation over hypothesis testing. It is more informative to quote the magnitude of the effect than to adopt the significant/non-significant dichotomy of hypothesis testing.
  • The width of a CI provides a measure of the reliability or precision of the estimate.
  • Confidence intervals make it far easier to determine whether a finding has any substantive (e.g., clinical) importance, as opposed to mere statistical significance.
  • While statistical significance tests are vulnerable to Type I error, CIs are not.
  • Confidence intervals can be used as a significance test. The simple rule is that if the 95% CI does not include the null value (usually zero for a difference in means or proportions; one for a relative risk or odds ratio), the null hypothesis is rejected at the 0.05 level (see the sketch after this list).
  • Finally, the use of CIs promotes cumulative knowledge development by obliging researchers to think meta-analytically about estimation, replication and the comparison of intervals across studies 25. For example, a meta-analysis of trials of intravenous nitrates in acute myocardial infarction found a reduction in mortality of somewhere between one quarter and two thirds, whereas the six earlier trials 26 had shown conflicting results. For the six trials, the odds ratios, 95% CIs and P values were: OR = 0.33 (CI 0.09, 1.13; P = 0.08); OR = 0.24 (CI 0.08, 0.74; P = 0.01); OR = 0.83 (CI 0.33, 2.12; P = 0.07); OR = 2.04 (CI 0.39, 10.71; P = 0.04); OR = 0.58 (CI 0.19, 1.65; P = 0.29); and OR = 0.48 (CI 0.28, 0.82; P = 0.007). Judged by their intervals, the first, third, fourth and fifth studies are inconclusive (their CIs include 1), while the second and the sixth appear beneficial in reducing mortality.
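
The simple rule quoted in the fifth point, that a 95% CI excluding the null value corresponds to rejection at the 0.05 level, can be checked directly against the six nitrate trials above. The sketch below merely restates the intervals already given in the text:

```python
# Apply the CI-as-significance-test rule to the six odds ratios and
# 95% CIs quoted for the intravenous-nitrate trials. The null value
# for an odds ratio is 1.
trials = [
    (0.33, 0.09, 1.13),
    (0.24, 0.08, 0.74),
    (0.83, 0.33, 2.12),
    (2.04, 0.39, 10.71),
    (0.58, 0.19, 1.65),
    (0.48, 0.28, 0.82),
]

for i, (odds_ratio, lower, upper) in enumerate(trials, start=1):
    significant = not (lower <= 1.0 <= upper)   # does the CI exclude the null value?
    verdict = "reject H0 at the 0.05 level" if significant else "inconclusive"
    print(f"trial {i}: OR = {odds_ratio}, 95% CI = ({lower}, {upper}) -> {verdict}")
# Only trials 2 and 6 exclude 1, matching the interpretation above.
```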

What is to be done?

While it is possible to change and improve on this practice, Cohen nevertheless warns, “Don’t look for a magic alternative … It does not exist” 27.

  • The foundation for change in this practice should be laid where statistics is first taught: the classroom. The curriculum and classroom teaching should clearly differentiate between the two schools, explain their historical evolution, and make plain what “statistical significance” means. Teaching of the correct concepts should begin at the undergraduate level and continue through graduate instruction, even if it must start at an introductory level.
  • We should promote and encourage the use of confidence intervals around sample statistics and effect sizes. This duty lies in the hands of statistics teachers, medical journal editors, reviewers and granting agencies.
  • Generally, researchers preparing a study are encouraged to consult a statistician at the initial stage, to avoid misinterpreting the P value, especially if they are using statistical software for their data analysis.
