The 4 Types of Validity in Research | Definitions & Examples

Published on September 6, 2019 by Fiona Middleton . Revised on June 22, 2023.

Validity tells you how accurately a method measures something. If a method measures what it claims to measure, and the results closely correspond to real-world values, then it can be considered valid. There are four main types of validity:

  • Construct validity: Does the test measure the concept that it’s intended to measure?
  • Content validity: Is the test fully representative of what it aims to measure?
  • Face validity: Does the content of the test appear to be suitable to its aims?
  • Criterion validity: Do the results accurately measure the concrete outcome they are designed to measure?

In quantitative research, you have to consider the reliability and validity of your methods and measurements.

Note that this article deals with types of test validity, which determine the accuracy of the actual components of a measure. If you are doing experimental research, you also need to consider internal and external validity, which deal with the experimental design and the generalizability of results.


Construct validity

Construct validity evaluates whether a measurement tool really represents the thing we are interested in measuring. It’s central to establishing the overall validity of a method.

What is a construct?

A construct refers to a concept or characteristic that can’t be directly observed, but can be measured by observing other indicators that are associated with it.

Constructs can be characteristics of individuals, such as intelligence, obesity, job satisfaction, or depression; they can also be broader concepts applied to organizations or social groups, such as gender equality, corporate social responsibility, or freedom of speech.

There is no objective, observable entity called “depression” that we can measure directly. But based on existing psychological research and theory, we can measure depression based on a collection of symptoms and indicators, such as low self-confidence and low energy levels.

What is construct validity?

Construct validity is about ensuring that the method of measurement matches the construct you want to measure. If you develop a questionnaire to diagnose depression, you need to know: does the questionnaire really measure the construct of depression? Or is it actually measuring the respondent’s mood, self-esteem, or some other construct?

To achieve construct validity, you have to ensure that your indicators and measurements are carefully developed based on relevant existing knowledge. The questionnaire must include only relevant questions that measure known indicators of depression.

The other types of validity described below can all be considered as forms of evidence for construct validity.


Content validity

Content validity assesses whether a test is representative of all aspects of the construct.

To produce valid results, the content of a test, survey, or measurement method must cover all relevant parts of the subject it aims to measure. If some aspects are missing from the measurement (or if irrelevant aspects are included), the validity is threatened and the research is likely to suffer from omitted variable bias.

A mathematics teacher develops an end-of-semester algebra test for her class. The test should cover every form of algebra that was taught in the class. If some types of algebra are left out, then the results may not be an accurate indication of students’ understanding of the subject. Similarly, if she includes questions that are not related to algebra, the results are no longer a valid measure of algebra knowledge.

Face validity

Face validity considers how suitable the content of a test seems to be on the surface. It’s similar to content validity, but face validity is a more informal and subjective assessment.

You create a survey to measure the regularity of people’s dietary habits. You review the survey items, which ask questions about every meal of the day and snacks eaten in between for every day of the week. On its surface, the survey seems like a good representation of what you want to test, so you consider it to have high face validity.

As face validity is a subjective measure, it’s often considered the weakest form of validity. However, it can be useful in the initial stages of developing a method.

Criterion validity

Criterion validity evaluates how well a test can predict a concrete outcome, or how well the results of your test approximate the results of another test.

What is a criterion variable?

A criterion variable is an established and effective measurement that is widely considered valid, sometimes referred to as a “gold standard” measurement. Criterion variables can be very difficult to find.

What is criterion validity?

To evaluate criterion validity, you calculate the correlation between the results of your measurement and the results of the criterion measurement. If there is a high correlation, this gives a good indication that your test is measuring what it intends to measure.

A university professor creates a new test to measure applicants’ English writing ability. To assess how well the test really does measure students’ writing ability, she finds an existing test that is considered a valid measurement of English writing ability, and compares the results when the same group of students take both tests. If the outcomes are very similar, the new test has high criterion validity.


Other interesting articles

If you want to know more about statistics, methodology, or research bias, make sure to check out some of our other articles with explanations and examples.

Statistics

  • Normal distribution
  • Degrees of freedom
  • Null hypothesis

Methodology

  • Discourse analysis
  • Control groups
  • Mixed methods research
  • Non-probability sampling
  • Quantitative research
  • Ecological validity

Research bias

  • Rosenthal effect
  • Implicit bias
  • Cognitive bias
  • Selection bias
  • Negativity bias
  • Status quo bias

Frequently asked questions about types of validity

Face validity and content validity are similar in that they both evaluate how suitable the content of a test is. The difference is that face validity is subjective, and assesses content at surface level.

When a test has strong face validity, anyone would agree that the test’s questions appear to measure what they are intended to measure.

For example, most people looking at a 4th-grade math test consisting of problems in which students have to add and multiply would agree that it has strong face validity (i.e., it looks like a math test).

On the other hand, content validity evaluates how well a test represents all the aspects of a topic. Assessing content validity is more systematic and relies on expert evaluation of each question, analyzing whether each one covers the aspects that the test was designed to cover.

A 4th-grade math test would have high content validity if it covered all the skills taught in that grade. Experts (in this case, math teachers) would have to evaluate the content validity by comparing the test to the learning objectives.

Criterion validity evaluates how well a test measures the outcome it was designed to measure. An outcome can be, for example, the onset of a disease.

Criterion validity consists of two subtypes depending on the time at which the two measures (the criterion and your test) are obtained:

  • Concurrent validity is a validation strategy where the scores of a test and the criterion are obtained at the same time.
  • Predictive validity is a validation strategy where the criterion variables are measured after the scores of the test.

Convergent validity and discriminant validity are both subtypes of construct validity. Together, they help you evaluate whether a test measures the concept it was designed to measure.

  • Convergent validity indicates whether a test that is designed to measure a particular construct correlates with other tests that assess the same or similar construct.
  • Discriminant validity indicates whether two tests that should not be highly related to each other are indeed not related. This type of validity is also called divergent validity.

You need to assess both in order to demonstrate construct validity. Neither one alone is sufficient for establishing construct validity.

The purpose of theory-testing mode is to find evidence in order to disprove, refine, or support a theory. As such, generalizability is not the aim of theory-testing mode.

For this reason, researchers in theory-testing mode prioritize eliminating alternative causes for relationships between variables. In other words, they prioritize internal validity over external validity, including ecological validity.

It’s often best to ask a variety of people to review your measurements. You can ask experts, such as other researchers, or laypeople, such as potential participants, to judge the face validity of tests.

While experts have a deep understanding of research methods , the people you’re studying can provide you with valuable insights you may have missed otherwise.

Cite this Scribbr article


Middleton, F. (2023, June 22). The 4 Types of Validity in Research | Definitions & Examples. Scribbr. Retrieved September 3, 2024, from https://www.scribbr.com/methodology/types-of-validity/


Research Method


Validity – Types, Examples and Guide


Validity

Validity is a fundamental concept in research, referring to the extent to which a test, measurement, or study accurately reflects or assesses the specific concept that the researcher is attempting to measure. Ensuring validity is crucial as it determines the trustworthiness and credibility of the research findings.

Research Validity

Research validity pertains to the accuracy and truthfulness of the research. It examines whether the research truly measures what it claims to measure. Without validity, research results can be misleading or erroneous, leading to incorrect conclusions and potentially flawed applications.

How to Ensure Validity in Research

Ensuring validity in research involves several strategies:

  • Clear Operational Definitions: Define variables clearly and precisely.
  • Use of Reliable Instruments: Employ measurement tools that have been tested for reliability.
  • Pilot Testing: Conduct preliminary studies to refine the research design and instruments.
  • Triangulation: Use multiple methods or sources to cross-verify results.
  • Control Variables: Control extraneous variables that might influence the outcomes.

Types of Validity

Validity is categorized into several types, each addressing different aspects of measurement accuracy.

Internal Validity

Internal validity refers to the degree to which the results of a study can be attributed to the treatments or interventions rather than other factors. It is about ensuring that the study is free from confounding variables that could affect the outcome.

External Validity

External validity concerns the extent to which the research findings can be generalized to other settings, populations, or times. High external validity means the results are applicable beyond the specific context of the study.

Construct Validity

Construct validity evaluates whether a test or instrument measures the theoretical construct it is intended to measure. It involves ensuring that the test is truly assessing the concept it claims to represent.

Content Validity

Content validity examines whether a test covers the entire range of the concept being measured. It ensures that the test items represent all facets of the concept.

Criterion Validity

Criterion validity assesses how well one measure predicts an outcome based on another measure. It is divided into two types:

  • Predictive Validity: How well a test predicts future performance.
  • Concurrent Validity: How well a test correlates with a currently existing measure.

Face Validity

Face validity refers to the extent to which a test appears to measure what it is supposed to measure, based on superficial inspection. While it is the least scientific measure of validity, it is important for ensuring that stakeholders believe in the test’s relevance.

Importance of Validity

Validity is crucial because it directly affects the credibility of research findings. Valid results ensure that conclusions drawn from research are accurate and can be trusted. This, in turn, influences the decisions and policies based on the research.

Examples of Validity

  • Internal Validity: A randomized controlled trial (RCT) where the random assignment of participants helps eliminate biases.
  • External Validity: A study on educational interventions that can be applied to different schools across various regions.
  • Construct Validity: A psychological test that accurately measures depression levels.
  • Content Validity: An exam that covers all topics taught in a course.
  • Criterion Validity: A job performance test that predicts future job success.
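The RCT example relies on random assignment of participants to groups. A minimal Python sketch of simple two-group random assignment follows, assuming hypothetical participant IDs and a hypothetical function name `randomly_assign`. Real trials often use more elaborate schemes, such as block or stratified randomization, which this sketch does not cover.

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into treatment and control groups.

    With random assignment, participant characteristics (age, motivation,
    prior knowledge, and so on) are balanced between groups on average,
    which supports internal validity.
    """
    rng = random.Random(seed)  # a seeded generator makes the allocation reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical IDs for 20 participants
groups = randomly_assign([f"P{i:02d}" for i in range(1, 21)], seed=42)
print("Treatment:", groups["treatment"])
print("Control:  ", groups["control"])
```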

Where to Write About Validity in a Thesis

In a thesis, the methodology section should include discussions about validity. Here, you explain how you ensured the validity of your research instruments and design. Additionally, you may discuss validity in the results section, interpreting how the validity of your measurements affects your findings.

Applications of Validity

Validity has wide applications across various fields:

  • Education: Ensuring assessments accurately measure student learning.
  • Psychology: Developing tests that correctly diagnose mental health conditions.
  • Market Research: Creating surveys that accurately capture consumer preferences.

Limitations of Validity

While ensuring validity is essential, it has its limitations:

  • Complexity: Achieving high validity can be complex and resource-intensive.
  • Context-Specific: Some validity types may not be universally applicable across all contexts.
  • Subjectivity: Certain types of validity, like face validity, involve subjective judgments.

By understanding and addressing these aspects of validity, researchers can enhance the quality and impact of their studies, leading to more reliable and actionable results.

About the author


Muhammad Hassan

Researcher, Academic Writer, Web developer


Reliability and Validity – Definitions, Types & Examples

Published by Alvin Nicolas on August 16, 2021. Revised on October 26, 2023.

A researcher must test the collected data before drawing any conclusions. Every research design needs to address reliability and validity, which together indicate the quality of the research.

What is Reliability?

Reliability refers to the consistency of the measurement. It shows how trustworthy a test score is. If the data shows the same results when the measurement is repeated, for example at different times or by different researchers, the information is reliable. However, a reliable method is not automatically valid: it can produce results that are consistent but consistently wrong.

Example: If you weigh yourself on the same scale several times in a row, you’ll get the same result each time. These are considered reliable results obtained through repeated measures.

Example: A teacher gives students a math test and repeats it the following week with the same questions. If the students get the same scores, the reliability of the test is high.

What is Validity?

Validity refers to the accuracy of the measurement. It shows how suitable a specific test is for a particular situation. If the results accurately reflect what the researcher set out to measure, explain, or predict, then the research is valid.

If the method of measuring is accurate, it will produce accurate results. A valid method must also be reliable, so if a method is not reliable, it cannot be valid. The reverse does not hold, however: a reliable method is not necessarily valid.

Example: Your weighing scale shows different results each time you weigh yourself within a few minutes, even though you handle it carefully. Your weighing machine might be malfunctioning, which means your method has low reliability. As a result, you are getting inconsistent results, and inconsistent results cannot be valid.

Example: Suppose a questionnaire about the quality of a skincare product is distributed to a group of people, and the same questionnaire is repeated with many groups. If you get consistent responses from the various groups, the questionnaire has high reliability. This alone does not show that it is valid, that is, that it truly measures the quality of the product.

Validity is often difficult to establish even when the measurement process is reliable, because the true value being measured is not always known.

Example: If the weighing scale shows the same result each time, let’s say 70 kg, even though your actual weight is 55 kg, the scale is malfunctioning. Because its results are consistent, the scale is reliable, but it is not valid: it consistently measures the wrong value.
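The difference between consistency and accuracy can be made concrete with a short Python sketch. It assumes a set of invented readings from the miscalibrated scale in this example. The standard deviation of the repeated readings reflects reliability, while the gap between their mean and the true weight (the bias) reflects a lack of validity.

```python
from statistics import mean, stdev

TRUE_WEIGHT_KG = 55.0  # true weight from the example

# Hypothetical repeated readings from the miscalibrated scale (illustrative data)
readings = [70.1, 69.9, 70.0, 70.2, 69.8]

spread = stdev(readings)                # small spread: consistent, i.e. reliable
bias = mean(readings) - TRUE_WEIGHT_KG  # large offset: inaccurate, i.e. not valid

print(f"Spread (SD): {spread:.2f} kg")
print(f"Bias:        {bias:+.2f} kg")
```

A scale with a large spread but a mean near 55 kg would show the opposite problem: on average its readings would be close to the true weight, but individual readings would be too inconsistent to trust.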

Internal vs. External Validity

One of the key features of randomised designs is that they tend to have high internal validity. Their external validity still depends on how representative the sample and setting are.

Internal validity is the ability to draw a causal link between your treatment and the dependent variable of interest. It means the observed changes should be caused by the experiment conducted, and no external factor should influence the variables.

Examples of such external factors include age, level, height, and grade.

External validity is the extent to which your study outcomes can be generalised to the population at large. It concerns the relationship between the situation in the study and situations outside it.


Threats to Internal Validity

  • Confounding factors: unexpected events during the experiment that are not part of the treatment. Example: you attribute your participants’ weight gain to a lack of physical activity, but it was actually caused by drinking coffee with sugar.
  • Maturation: changes in the participants over time that influence the outcome. Example: during a long-term experiment, subjects may become tired, bored, or hungry.
  • Testing: the results of one test affect the results of another test. Example: participants in the first experiment may react differently during the second experiment.
  • Instrumentation: changes in the instrument’s calibration. Example: recalibrating a measuring instrument partway through a study may give results that differ from those obtained before the change.
  • Statistical regression: groups selected because of their extreme scores are not as extreme on subsequent testing. Example: students who scored very low on the pre-final exam are likely to score closer to the average on the final exam, even without any intervention.
  • Selection bias: choosing comparison groups without randomisation. Example: a group of trained and efficient teachers is selected to teach children communication skills instead of the teachers being selected at random.
  • Experimental mortality: participants leave the experiment, often because it runs longer than planned. Example: participants juggling other commitments may drop out because they are dissatisfied with the extended timeline, even if they were doing well.
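Statistical regression, also called regression to the mean, can occur even when nothing changes between two tests. The Python sketch below simulates this under simple, assumed conditions. Each observed score is a stable true ability plus independent random error. All numbers (class size, means, standard deviations, the 10% cut-off) and the function name are illustrative and not drawn from real data.

```python
import random
from statistics import mean

def simulate_regression_to_mean(n=10_000, seed=1):
    """Simulate a pre-test and a final test with no intervention in between.

    Each observed score = stable true ability + independent random error.
    Students selected for very low pre-test scores tend to score closer to
    the average on the final test, purely because of the error term.
    """
    rng = random.Random(seed)
    ability = [rng.gauss(60, 10) for _ in range(n)]
    pre = [a + rng.gauss(0, 10) for a in ability]
    final = [a + rng.gauss(0, 10) for a in ability]

    # Select the lowest-scoring 10% of students on the pre-test
    cutoff = sorted(pre)[n // 10]
    low = [i for i in range(n) if pre[i] < cutoff]
    return mean(pre[i] for i in low), mean(final[i] for i in low)

pre_mean, final_mean = simulate_regression_to_mean()
print(f"Low scorers: pre-test mean {pre_mean:.1f}, final mean {final_mean:.1f}")
```

The selected group "improves" on the final test without any treatment. A researcher who gave this group an intervention could mistake this statistical artifact for a treatment effect, which is why a randomised control group is needed.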

Threats to External Validity

  • Reactive or interactive effects of testing: taking a pre-test may make participants aware of what the experiment is about, so the treatment may not be as effective for people who never take the pre-test. Example: students who take a pre-test on a topic may pay closer attention during the following lesson, so the lesson may be less effective for students who skip the pre-test.
  • Selection of participants: participants are selected for specific characteristics, and the treatment may work only for people who share those characteristics. Example: if an experiment is conducted specifically on the health issues of pregnant women, its findings cannot be applied to male participants.

How to Assess Reliability and Validity?

Reliability can be assessed by checking how consistent the procedure and its results are. There are various methods for measuring reliability and validity. Reliability is usually estimated with statistical methods that depend on the type of reliability, as explained below:

Types of Reliability

  • Test-retest: measures the consistency of results at different points in time, that is, whether the results stay the same after repeated measurement. Example: a questionnaire about the quality of a skincare product is given to the same group of people twice, a few weeks apart. If their responses are similar both times, the questionnaire has high test-retest reliability.
  • Inter-rater: measures the consistency of results obtained at the same time by different raters (researchers). Example: five researchers assess the academic performance of the same student using questions from all academic subjects. If they arrive at very different results, the assessment has low inter-rater reliability.
  • Parallel forms: measures equivalence by giving different forms of the same test to the same participants. Example: the same researcher gives the same students two different tests on the same topic, such as a written and an oral test. If the results are the same, the parallel-forms reliability of the test is high; if the results differ, it is low.
  • Internal consistency (split-half): measures the consistency of the measurement within a single test. The results of the test are split into two halves and compared with each other. If the two halves differ a lot, the internal consistency of the test is low.

Types of Validity

As discussed above, the reliability of a measurement alone cannot establish its validity. Validity is difficult to measure even when the method is reliable. The following types of validity are commonly assessed:

  • Content validity: shows whether all aspects of the subject are covered by the test or measurement. Example: a language test designed to measure reading, writing, listening, and speaking skills covers all four skills, which indicates high content validity.
  • Face validity: concerns whether the test, or the way it is conducted, appears on the surface to be suitable. Example: looking at the types of questions in a question paper, the time and marks allotted, and the number and categories of questions, does it look like a good paper for measuring students’ academic performance?
  • Construct validity: shows whether the test measures the correct construct (an ability, attribute, trait, or skill). Example: is a test designed to measure communication skills actually measuring communication skills?
  • Criterion validity: shows whether the test scores are similar to other measures of the same concept. Example: if graduates’ pre-final exam results accurately predict their results on the later final exam, the pre-final exam has high criterion validity.


How to Increase Reliability?

  • Use an appropriate questionnaire to measure the competency level.
  • Ensure a consistent environment for participants.
  • Make the participants familiar with the criteria of assessment.
  • Train the participants appropriately.
  • Analyse the research items regularly to avoid poor performance.

How to Increase Validity?

Ensuring validity is not easy either. The following steps help:

  • Reactivity should be minimised as a first priority.
  • The Hawthorne effect should be reduced.
  • The respondents should be motivated.
  • The intervals between the pre-test and post-test should not be lengthy.
  • Dropout rates should be kept as low as possible.
  • The inter-rater reliability should be ensured.
  • Control and experimental groups should be matched with each other.
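The source does not say how inter-rater reliability should be checked. One widely used statistic for two raters who assign items to categories is Cohen’s kappa, which corrects the raw agreement rate for the agreement expected by chance alone. A minimal Python sketch follows, with hypothetical pass/fail ratings of ten essays and a hypothetical function name.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance agreement."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items where both raters chose the same category
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: probability that two independent raters with these
    # category frequencies would pick the same category
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n**2
    if expected == 1:
        return 1.0  # both raters used one identical category throughout (treated as perfect agreement here)
    return (observed - expected) / (1 - expected)

# Hypothetical ratings of ten essays by two raters
rater_1 = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail", "pass", "pass"]
rater_2 = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass", "pass", "pass"]
kappa = cohens_kappa(rater_1, rater_2)
print(f"Cohen's kappa = {kappa:.2f}")  # prints Cohen's kappa = 0.52
```

In this example the raters agree on 80% of essays, but because most essays pass, about 58% agreement would be expected by chance, so kappa is noticeably lower than the raw agreement rate.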

How to Implement Reliability and Validity in your Thesis?

It is helpful to address reliability and validity explicitly, especially in a thesis or dissertation. They can be covered in the following sections:

  • Methodology: Discuss all your planning for reliability and validity here, including the chosen sample and its size and the techniques used to measure reliability and validity.
  • Results: Discuss the level of reliability and validity of your results and how it affects the values you obtained.
  • Discussion: Discuss how the contributions of other researchers can help improve reliability and validity.

Frequently Asked Questions

What are reliability and validity in research?

Reliability in research refers to the consistency and stability of measurements or findings. Validity relates to the accuracy and truthfulness of results, measuring what the study intends to. Both are crucial for trustworthy and credible research outcomes.

What is validity?

Validity in research refers to the extent to which a study accurately measures what it intends to measure. It ensures that the results are truly representative of the phenomena under investigation. Without validity, research findings may be irrelevant, misleading, or incorrect, limiting their applicability and credibility.

What is reliability?

Reliability in research refers to the consistency and stability of measurements over time. If a study is reliable, repeating the experiment or test under the same conditions should produce similar results. Without reliability, findings become unpredictable and lack dependability, potentially undermining the study’s credibility and generalisability.

What is reliability in psychology?

In psychology, reliability refers to the consistency of a measurement tool or test. A reliable psychological assessment produces stable and consistent results across different times, situations, or raters. It ensures that an instrument’s scores are not due to random error, making the findings dependable and reproducible in similar conditions.

What is test-retest reliability?

Test-retest reliability assesses the consistency of measurements taken by a test over time. It involves administering the same test to the same participants at two different points in time and comparing the results. A high correlation between the scores indicates that the test produces stable and consistent results over time.

How to improve reliability of an experiment?

  • Standardise procedures and instructions.
  • Use consistent and precise measurement tools.
  • Train observers or raters to reduce subjective judgments.
  • Increase sample size to reduce random errors.
  • Conduct pilot studies to refine methods.
  • Repeat measurements or use multiple methods.
  • Address potential sources of variability.

What is the difference between reliability and validity?

Reliability refers to the consistency and repeatability of measurements, ensuring results are stable over time. Validity indicates how well an instrument measures what it’s intended to measure, ensuring accuracy and relevance. While a test can be reliable without being valid, a valid test must inherently be reliable. Both are essential for credible research.

Are interviews reliable and valid?

Interviews can be both reliable and valid, but they are susceptible to biases. The reliability and validity depend on the design, structure, and execution of the interview. Structured interviews with standardised questions improve reliability. Validity is enhanced when questions accurately capture the intended construct and when interviewer biases are minimised.

Are IQ tests valid and reliable?

IQ tests are generally considered reliable, producing consistent scores over time. Their validity, however, is a subject of debate. While they effectively measure certain cognitive skills, whether they capture the entirety of “intelligence” or predict success in all life areas is contested. Cultural bias and over-reliance on tests are also concerns.

Are questionnaires reliable and valid?

Questionnaires can be both reliable and valid if well-designed. Reliability is achieved when they produce consistent results over time or across similar populations. Validity is ensured when questions accurately measure the intended construct. However, factors like poorly phrased questions, respondent bias, and lack of standardisation can compromise their reliability and validity.


Validity in research: a guide to measuring the right things

Last updated: 27 February 2023

Reviewed by: Cathy Heath


Validity is necessary for all types of studies ranging from market validation of a business or product idea to the effectiveness of medical trials and procedures. So, how can you determine whether your research is valid? This guide can help you understand what validity is, the types of validity in research, and the factors that affect research validity.


What is validity?

In the most basic sense, validity is the quality of being based on truth or reason. Valid research strives to eliminate the effects of unrelated information and the circumstances under which evidence is collected. 

Validity in research is the ability to conduct an accurate study with the right tools and conditions to yield acceptable and reliable data that can be reproduced. Researchers rely on carefully calibrated tools for precise measurements. However, collecting accurate information can be more of a challenge.

To achieve and maintain validity, studies must be conducted in environments that don't sway the results. Validity can also be compromised by asking the wrong questions or relying on limited data.

Why is validity important in research?

Research is used to improve life for humans. Every product and discovery, from innovative medical breakthroughs to advanced new products, depends on accurate research to be dependable. Without it, the results couldn't be trusted, and products would likely fail. Businesses would lose money, and patients couldn't rely on medical treatments. 

While wasting money on a lousy product is a concern, a lack of validity paints a much grimmer picture in fields such as medicine or the manufacture of automobiles and airplanes. Whether you're launching an exciting new product or conducting scientific research, validity can determine success or failure.

  • What is reliability?

Reliability is the ability of a method to yield consistent results. If the same result can be consistently achieved by using the same method to measure something, the measurement method is said to be reliable. For example, a thermometer that shows the same temperatures each time in a controlled environment is reliable.

While high reliability is a part of measuring validity, it's only part of the puzzle. If the reliable thermometer hasn't been properly calibrated and reliably measures temperatures two degrees too high, it doesn't provide a valid (accurate) measure of temperature. 
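The thermometer example can be made concrete with a few numbers. The minimal Python sketch below uses hypothetical readings (they are not from the source) taken from a thermometer in a room held at a known 20.0 °C. The standard deviation of the readings stands in for reliability, and the gap between their mean and the true value stands in for validity. The `summarize` helper is illustrative only.

```python
from statistics import mean, stdev

TRUE_TEMPERATURE = 20.0  # hypothetical known temperature of the controlled room (°C)

# Hypothetical repeated readings from a thermometer that reads about two degrees too high
readings = [22.0, 22.1, 21.9, 22.0, 22.0]

def summarize(readings, true_value):
    """Return (spread, bias) for a set of repeated readings.

    spread: standard deviation; a small value means consistent (reliable) readings.
    bias: mean reading minus the true value; a large value means inaccurate (not valid) readings.
    """
    spread = stdev(readings)
    bias = mean(readings) - true_value
    return spread, bias

spread, bias = summarize(readings, TRUE_TEMPERATURE)
print(f"spread = {spread:.2f} °C, bias = {bias:+.2f} °C")
```

The small spread shows the thermometer is reliable. The two-degree bias shows that its readings are still not a valid measure of the room's temperature.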

Similarly, if a researcher uses a thermometer to measure weight, the results won't be accurate because it's the wrong tool for the job. 

  • How are reliability and validity assessed?

While measuring reliability is a part of measuring validity, there are distinct ways to assess both measurements for accuracy. 

How is reliability measured?

Reliability is assessed using measures of consistency and stability, including:

Consistency and stability of the same measure when it is repeated multiple times and under varying conditions

Consistency and stability of the measure across different test subjects

Consistency and stability of results from different parts of a test designed to measure the same thing
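The third kind of consistency in the list above, agreement between parts of a test that measure the same thing, is often checked with split-half reliability. The text doesn't name a specific method, so the Python sketch below shows one assumed approach. It splits hypothetical Likert items into odd- and even-numbered halves and correlates the half totals. It then applies the Spearman-Brown correction, 2r / (1 + r), to estimate the reliability of the full-length test.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_brown(r_half):
    """Estimate full-test reliability from the correlation between two half-tests."""
    return 2 * r_half / (1 + r_half)

def split_half_reliability(scores):
    """scores: one row per respondent, one column per item.

    Sums odd-numbered and even-numbered items separately, correlates the two
    half totals across respondents, then applies the Spearman-Brown correction.
    """
    odd_totals = [sum(row[0::2]) for row in scores]
    even_totals = [sum(row[1::2]) for row in scores]
    return spearman_brown(pearson(odd_totals, even_totals))

# Hypothetical responses: 5 respondents x 6 items, each item scored 1-5
scores = [
    [4, 5, 4, 4, 5, 4],
    [2, 2, 3, 2, 2, 3],
    [5, 4, 5, 5, 4, 5],
    [3, 3, 2, 3, 3, 3],
    [1, 2, 1, 1, 2, 2],
]
print(round(split_half_reliability(scores), 2))
```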

How is validity measured?

Since validity refers to how accurately a method measures what it is intended to measure, accuracy can be difficult to assess directly. Instead, validity can be estimated by comparing research results to other relevant data or theories. This involves examining:

The adherence of a measure to existing knowledge of how the concept is measured

The ability to cover all aspects of the concept being measured

The relationship between the result and other valid measures of the same concept

  • What are the types of validity in a research design?

Research validity is broadly divided into two groups: internal and external. Yet this grouping doesn't clearly define the different types of validity. Research validity can instead be divided into seven distinct types:

Face validity: The extent to which a test appears valid because of the apparent appropriateness or relevance of the testing method, the information included, or the tools used.

Content validity: The determination that the measure used in research covers the full domain of the content.

Construct validity: The assessment of the suitability of the measurement tool to measure the activity being studied.

Internal validity: The assessment of how your research environment affects measurement results. Internal validity is high when other factors can't explain an observed cause-and-effect response.

External validity: The extent to which the study's results hold beyond the sample and can be generalized to other settings, populations, and measures.

Statistical conclusion validity: The determination of whether a relationship exists between procedures and outcomes. This depends on appropriate sampling and measurement procedures along with appropriate statistical tests.

Criterion-related validity: A measurement of the quality of your testing methods against a criterion measure (like a “gold standard” test). The criterion is measured at the same time (concurrent validity) or at a later point (predictive validity).

  • Examples of validity

Like different types of research and the various ways to measure validity, examples of validity can vary widely. These include:

A questionnaire may be considered valid because each question addresses specific and relevant aspects of the study subject.

In a brand assessment study, researchers can use comparison testing to verify the results of an initial study. For example, the results from a focus group response about brand perception are considered more valid when they match those of a questionnaire answered by current and potential customers.

A test to measure a class of students' understanding of the English language contains reading, writing, listening, and speaking components to cover the full scope of how language is used.

  • Factors that affect research validity

Certain factors can affect research validity in both positive and negative ways. By understanding the factors that improve validity and those that threaten it, you can enhance the validity of your study. These include:

Random selection of participants vs. the selection of participants that are representative of your study criteria

Blinding, so that participants are unaware of which intervention they receive (for example, through the use of placebos)

Manipulating the experiment by inserting a variable that will change the results

Randomly assigning participants to treatment and control groups to avoid bias

Following specific procedures during the study to avoid unintended effects

Conducting a study in the field instead of a laboratory for more accurate results

Replicating the study with different factors or settings to compare results

Using statistical methods to adjust for inconclusive data
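Random assignment, mentioned in the list above, can be as simple as shuffling the participant list and splitting it in two. The following Python sketch shows one way to do this under that assumption. The participant IDs and the `randomly_assign` helper are hypothetical. Recording the seed lets the assignment be reproduced and checked later.

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into (treatment, control) groups.

    With an odd number of participants, the control group gets the extra person.
    """
    rng = random.Random(seed)        # independent generator; a fixed seed makes the shuffle repeatable
    shuffled = list(participants)    # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]

treatment, control = randomly_assign([f"P{i:02d}" for i in range(1, 11)], seed=2024)
print("treatment:", treatment)
print("control:  ", control)
```

Because every participant has the same chance of landing in either group, researcher preferences can't systematically shape who receives the treatment.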

What are the common validity threats in research, and how can their effects be minimized or nullified?

Research validity can be difficult to achieve because of internal and external threats that produce inaccurate results. These factors can jeopardize validity.

History: Events that occur between an early and a later measurement

Maturation: Over the course of a study, subjects may mature or change in ways that would have occurred naturally outside the study, and these natural changes are mistakenly attributed to the effects of the study

Repeated testing: Taking a test repeatedly can change the outcome of subsequent tests

Selection of subjects: Unconscious bias that can result in non-equivalent comparison groups

Statistical regression: Subjects chosen because of extreme scores tend to score closer to the average when measured again (regression to the mean), so the outcome isn't accurate for the majority of individuals

Attrition: When the sample group is diminished significantly during the course of the study
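The statistical regression threat is easy to see in a small simulation. The Python sketch below assumes, for illustration only, that each person has a stable true ability and that each test adds independent random error. The distributions and noise levels are arbitrary choices, not values from the source. People selected for extreme scores on the first test score closer to the average on the second test, even though nothing about them has changed.

```python
import random
from statistics import mean

def regression_to_mean_demo(n=10_000, top_fraction=0.1, seed=7):
    """Simulate two noisy tests of the same stable ability.

    Each observed score = true ability + independent measurement error.
    Returns (overall mean on test 1,
             selected group's mean on test 1,
             same group's mean on test 2).
    """
    rng = random.Random(seed)
    ability = [rng.gauss(100, 10) for _ in range(n)]
    test1 = [a + rng.gauss(0, 10) for a in ability]
    test2 = [a + rng.gauss(0, 10) for a in ability]
    # Select the subjects with the most extreme (highest) scores on test 1
    cutoff = sorted(test1, reverse=True)[int(n * top_fraction) - 1]
    selected = [i for i in range(n) if test1[i] >= cutoff]
    return (mean(test1),
            mean(test1[i] for i in selected),
            mean(test2[i] for i in selected))

overall, first, retest = regression_to_mean_demo()
print(f"overall {overall:.1f}, selected on test 1 {first:.1f}, same group on retest {retest:.1f}")
```

The selected group's retest mean falls back toward the overall mean. A study that selected these subjects and then applied an intervention could wrongly credit the intervention with that drop.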

While some validity threats can be minimized or wholly nullified, removing all threats from a study is impossible. For example, random selection can remove unconscious bias and statistical regression. 

Researchers can even hope to avoid attrition by using smaller study groups. Yet smaller study groups could affect the research in other ways. The best way for researchers to prevent validity threats is through careful environmental planning and reliable data-gathering methods.

  • How to ensure validity in your research

Researchers should be mindful of the importance of validity in the early planning stages of any study to avoid inaccurate results. They must take the time to consider tools and methods, as well as how closely the testing environment matches the natural environment in which results will be used.

The following steps can be used to ensure validity in research:

Choose appropriate methods of measurement

Use appropriate sampling to choose test subjects

Create an accurate testing environment

How do you maintain validity in research?

Accurate research is usually conducted over a period of time with different test subjects. To maintain validity across an entire study, you must take specific steps to ensure that gathered data has the same levels of accuracy. 

Consistency is crucial for maintaining validity in research. When researchers apply methods consistently and standardize the circumstances under which data is collected, validity can be maintained across the entire study.

Is there a need for validation of the research instrument before its implementation?

An essential part of validity is choosing the right research instrument or method for accurate results. Consider the thermometer that is reliable but still produces inaccurate results. You're unlikely to achieve research validity without activities like calibration and assessments of content and construct validity.

  • Understanding research validity for more accurate results

Without validity, research can't provide the accuracy necessary to deliver a useful study. By getting a clear understanding of validity in research, you can take steps to improve your research skills and achieve more accurate results.


Validity & Reliability In Research

A Plain-Language Explanation (With Examples)

By: Derek Jansen (MBA) | Expert Reviewer: Kerryn Warren (PhD) | September 2023

Validity and reliability are two related but distinctly different concepts within research. Understanding what they are and how to achieve them is critically important to any research project. In this post, we’ll unpack these two concepts as simply as possible.

This post is based on our popular online course, Research Methodology Bootcamp. In the course, we unpack the basics of methodology using straightforward language and loads of examples.

Overview: Validity & Reliability

  • The big picture
  • Validity 101
  • Reliability 101 
  • Key takeaways

First, The Basics…

First, let’s start with a big-picture view and then we can zoom in to the finer details.

Validity and reliability are two incredibly important concepts in research, especially within the social sciences. Both validity and reliability have to do with the measurement of variables and/or constructs – for example, job satisfaction, intelligence, productivity, etc. When undertaking research, you’ll often want to measure these types of constructs and variables and, at the simplest level, validity and reliability are about ensuring the quality and accuracy of those measurements .

As you can probably imagine, if your measurements aren’t accurate or there are quality issues at play when you’re collecting your data, your entire study will be at risk. Therefore, validity and reliability are very important concepts to understand (and to get right). So, let’s unpack each of them.


What Is Validity?

In simple terms, validity (also called “construct validity”) is all about whether a research instrument accurately measures what it’s supposed to measure .

For example, let’s say you have a set of Likert scales that are supposed to quantify someone’s level of overall job satisfaction. If this set of scales focused purely on only one dimension of job satisfaction, say pay satisfaction, this would not be a valid measurement, as it only captures one aspect of the multidimensional construct. In other words, pay satisfaction alone is only one contributing factor toward overall job satisfaction, and therefore it’s not a valid way to measure someone’s job satisfaction.


Oftentimes in quantitative studies, the way in which the researcher or survey designer interprets a question or statement can differ from how the study participants interpret it. Given that respondents don’t have the opportunity to ask clarifying questions when taking a survey, it’s easy for these sorts of misunderstandings to crop up. Naturally, if the respondents are interpreting the question in the wrong way, the data they provide will be pretty useless. Therefore, ensuring that a study’s measurement instruments are valid – in other words, that they are measuring what they intend to measure – is incredibly important.

There are various types of validity and we’re not going to go down that rabbit hole in this post, but it’s worth quickly highlighting the importance of making sure that your research instrument is tightly aligned with the theoretical construct you’re trying to measure. In other words, you need to pay careful attention to how the key theories within your study define the thing you’re trying to measure – and then make sure that your survey presents it in the same way.

For example, sticking with the “job satisfaction” construct we looked at earlier, you’d need to clearly define what you mean by job satisfaction within your study (and this definition would of course need to be underpinned by the relevant theory). You’d then need to make sure that your chosen definition is reflected in the types of questions or scales you’re using in your survey. Simply put, you need to make sure that your survey respondents are perceiving your key constructs in the same way you are. Or, even if they’re not, that your measurement instrument is capturing the necessary information that reflects your definition of the construct at hand.


What Is Reliability?

As with validity, reliability is an attribute of a measurement instrument – for example, a survey, a weight scale or even a blood pressure monitor. But while validity is concerned with whether the instrument is measuring the “thing” it’s supposed to be measuring, reliability is concerned with consistency and stability. In other words, reliability reflects the degree to which a measurement instrument produces consistent results when applied repeatedly to the same phenomenon, under the same conditions.

As you can probably imagine, a measurement instrument that achieves a high level of consistency is naturally more dependable (or reliable) than one that doesn’t – in other words, it can be trusted to provide consistent measurements. And that, of course, is what you want when undertaking empirical research. If you think about it within a more domestic context, just imagine if you found that your bathroom scale gave you a different number every time you hopped on and off of it – you wouldn’t feel too confident in its ability to measure the variable that is your body weight 🙂

It’s worth mentioning that reliability also extends to the person using the measurement instrument. For example, if two researchers use the same instrument (let’s say a measuring tape) and they get different measurements, there’s likely an issue in terms of how one (or both) of them are using the measuring tape. So, when you think about reliability, consider both the instrument and the researcher as part of the equation.

As with validity, there are various types of reliability and various tests that can be used to assess the reliability of an instrument. A popular one that you’ll likely come across for survey instruments is Cronbach’s alpha, which is a statistical measure that quantifies the degree to which items within an instrument (for example, a set of Likert scales) measure the same underlying construct. In other words, Cronbach’s alpha indicates how closely related the items are and whether they consistently capture the same concept.
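For readers who want to see the arithmetic, Cronbach’s alpha for k items is commonly computed as α = k / (k − 1) × (1 − sum of the item variances / variance of the respondents’ total scores). The Python sketch below implements that formula on a small set of hypothetical Likert responses. It is a teaching sketch, not a substitute for a statistics package.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents x items score matrix.

    scores: list of rows, one row per respondent, one column per item.
    Uses sample variances throughout; the (n - 1) factor cancels in the ratio.
    """
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*scores))  # transpose rows into item columns
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in scores])  # variance of each respondent's total
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical responses: 5 respondents x 3 Likert items (1-5)
scores = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
]
print(round(cronbach_alpha(scores), 2))  # → 0.96
```

Because the item variances and the total-score variance use the same estimator, switching to population variances would give the same alpha.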


Recap: Key Takeaways

Alright, let’s quickly recap to cement your understanding of validity and reliability:

  • Validity is concerned with whether an instrument (e.g., a set of Likert scales) is measuring what it’s supposed to measure
  • Reliability is concerned with whether that measurement is consistent and stable when measuring the same phenomenon under the same conditions.

In short, validity and reliability are both essential to ensuring that your data collection efforts deliver high-quality, accurate data that help you answer your research questions. So, be sure to always pay careful attention to the validity and reliability of your measurement instruments when collecting and analysing data. As the adage goes, “rubbish in, rubbish out” – make sure that your data inputs are rock-solid.


Validity In Psychology Research: Types & Examples

Saul McLeod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul McLeod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.


Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.

In psychology research, validity refers to the extent to which a test or measurement tool accurately measures what it’s intended to measure. It ensures that the research findings are genuine and not due to extraneous factors.

Validity can be categorized into different types based on internal and external validity.

The concept of validity was formulated by Kelley (1927, p. 14), who stated that a test is valid if it measures what it claims to measure. For example, a test of intelligence should measure intelligence and not something else (such as memory).

Internal and External Validity In Research

Internal validity refers to whether the effects observed in a study are due to the manipulation of the independent variable and not some other confounding factor.

In other words, internal validity concerns whether there is a genuine causal relationship between the independent and dependent variables.

Internal validity can be improved by controlling extraneous variables, using standardized instructions, counterbalancing, and eliminating demand characteristics and investigator effects.
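Counterbalancing means varying the order in which participants experience the conditions, so that order effects are not tied to a single condition. The minimal Python sketch below shows complete counterbalancing, where every possible order is used in rotation. The condition labels are placeholders, and `counterbalanced_orders` is a hypothetical helper rather than a standard function.

```python
from itertools import cycle, islice, permutations

def counterbalanced_orders(conditions, n_participants):
    """Complete counterbalancing: rotate through every possible order of the
    conditions so each order is used as equally often as the sample size allows."""
    all_orders = list(permutations(conditions))
    return list(islice(cycle(all_orders), n_participants))

for participant, order in enumerate(counterbalanced_orders(["A", "B"], 4), start=1):
    print(participant, "->", " then ".join(order))
```

With two conditions and four participants, the orders alternate between "A then B" and "B then A". With many conditions the number of orders grows factorially, so researchers often use a partial scheme instead.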

External validity refers to the extent to which the results of a study can be generalized to other settings (ecological validity), other people (population validity), and over time (historical validity).

External validity can be improved by setting experiments more naturally and using random sampling to select participants.

Types of Validity In Psychology

Two main categories of validity are used to assess the validity of a test (e.g., a questionnaire, interview, or IQ test): content and criterion.

  • Content validity refers to the extent to which a test or measurement represents all aspects of the intended content domain. It assesses whether the test items adequately cover the topic or concept.
  • Criterion validity assesses the performance of a test based on its correlation with a known external criterion or outcome. It can be further divided into concurrent (measured at the same time) and predictive (measuring future performance) validity.


Face Validity

Face validity is simply whether the test appears (at face value) to measure what it claims to. This is the least sophisticated measure of content-related validity, and is a superficial and subjective assessment based on appearance.

Tests wherein the purpose is clear, even to naïve respondents, are said to have high face validity. Accordingly, tests wherein the purpose is unclear have low face validity (Nevo, 1985).

A direct measurement of face validity is obtained by asking people to rate the validity of a test as it appears to them. Raters could use a Likert scale to assess face validity.

For example:

  • The test is extremely suitable for a given purpose
  • The test is very suitable for that purpose
  • The test is adequate
  • The test is inadequate
  • The test is irrelevant and, therefore, unsuitable

It is important to select suitable people to rate a test (e.g., questionnaire, interview, IQ test, etc.). For example, individuals who actually take the test would be well placed to judge its face validity.

Also, people who work with the test could offer their opinion (e.g., employers, university administrators). Finally, the researcher could use members of the general public with an interest in the test (e.g., parents of testees, politicians, teachers, etc.).

The face validity of a test can be considered a robust construct only if a reasonable level of agreement exists among raters.
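The text doesn't specify how rater agreement should be measured. One widely used statistic for two raters is Cohen's kappa, κ = (p_o − p_e) / (1 − p_e). Here p_o is the observed proportion of agreement and p_e is the agreement expected by chance. The Python sketch below applies kappa to hypothetical ratings of eight questionnaire items on the five-point suitability scale above, coded from 5 (extremely suitable) to 1 (irrelevant). Kappa is a swapped-in illustration here, not a method the source prescribes.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each rated the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same rating by both raters
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal counts, summed over categories
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = counts_a.keys() | counts_b.keys()
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    if p_expected == 1:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical ratings (5 = extremely suitable ... 1 = irrelevant)
rater_a = [4, 5, 3, 4, 2, 3, 4, 5]
rater_b = [4, 4, 3, 4, 2, 3, 3, 5]
print(round(cohens_kappa(rater_a, rater_b), 2))  # → 0.65
```

The raters agree on 75% of items, but kappa is lower because some of that agreement would be expected by chance alone.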

It should be noted that the term face validity should be avoided when the rating is done by an “expert,” as content validity is more appropriate.

Having face validity does not mean that a test really measures what the researcher intends to measure, only that, in the judgment of raters, it appears to do so. Consequently, it is a crude and basic measure of validity.

A test item such as “I have recently thought of killing myself” has obvious face validity as an item measuring suicidal cognitions and may be useful when measuring symptoms of depression.

However, items with clear face validity are more vulnerable to social desirability bias. Individuals may manipulate their responses to deny or hide problems or exaggerate behaviors to present a positive image of themselves.

It is possible for a test item to lack face validity but still have general validity and measure what it claims to measure. This is good because it reduces demand characteristics and makes it harder for respondents to manipulate their answers.

For example, the test item “I believe in the second coming of Christ” would lack face validity as a measure of depression (as the purpose of the item is unclear).

This item appeared on the first version of The Minnesota Multiphasic Personality Inventory (MMPI) and loaded on the depression scale.

Because most of the original normative sample of the MMPI were good Christians, only a depressed Christian would think Christ is not coming back. Thus, for this particular religious sample, the item does have general validity but not face validity.

Construct Validity

Construct validity assesses how well a test or measure represents and captures an abstract theoretical concept, known as a construct. It indicates the degree to which the test accurately reflects the construct it intends to measure, often evaluated through relationships with other variables and measures theoretically connected to the construct.

Construct validity was introduced by Cronbach and Meehl (1955). This type of content-related validity refers to the extent to which a test captures a specific theoretical construct or trait, and it overlaps with some of the other aspects of validity.

Construct validity does not concern the simple, factual question of whether a test measures an attribute.

Instead, it is about the complex question of whether test score interpretations are consistent with a nomological network involving theoretical and observational terms (Cronbach & Meehl, 1955).

To test for construct validity, it must be demonstrated that the phenomenon being measured actually exists. So, the construct validity of a test for intelligence, for example, depends on a model or theory of intelligence.

Construct validity entails demonstrating the power of such a construct to explain a network of research findings and to predict further relationships.

The more evidence a researcher can demonstrate for a test’s construct validity, the better. However, there is no single method of determining the construct validity of a test.

Instead, different methods and approaches are combined to present the overall construct validity of a test. For example, factor analysis and correlational methods can be used.

Convergent validity

Convergent validity is a subtype of construct validity. It assesses the degree to which two measures that theoretically should be related are related.

It demonstrates that measures of similar constructs are highly correlated. It helps confirm that a test accurately measures the intended construct by showing its alignment with other tests designed to measure the same or similar constructs.

For example, suppose there are two different scales used to measure self-esteem:

Scale A and Scale B. If both scales effectively measure self-esteem, then individuals who score high on Scale A should also score high on Scale B, and those who score low on Scale A should score similarly low on Scale B.

If the scores from these two scales show a strong positive correlation, then this provides evidence for convergent validity because it indicates that both scales seem to measure the same underlying construct of self-esteem.
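In practice, a "strong positive correlation" is usually quantified with Pearson's r. The Python sketch below computes r for hypothetical scores on Scale A and Scale B. The data are invented, and the source doesn't say how high r must be to count as convergent evidence.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical self-esteem scores for the same six people on two scales
scale_a = [10, 14, 18, 22, 25, 30]
scale_b = [12, 15, 20, 21, 27, 31]
print(f"r = {pearson_r(scale_a, scale_b):.2f}")
```

The same calculation applies to concurrent validity, where a new test is correlated with an established one administered at the same time.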

Concurrent Validity (i.e., occurring at the same time)

Concurrent validity evaluates how well a test’s results correlate with the results of a previously established and accepted measure, when both are administered at the same time.

It helps in determining whether a new measure is a good reflection of an established one without waiting to observe outcomes in the future.

If the new test is validated by comparison with a currently existing criterion, we have concurrent validity.

Very often, a new IQ or personality test might be compared with an older but similar test known to have good validity already.

Predictive Validity

Predictive validity assesses how well a test predicts a criterion that will occur in the future. It measures the test’s ability to foresee the performance of an individual on a related criterion measured at a later point in time. It gauges the test’s effectiveness in predicting subsequent real-world outcomes or results.

For example, a prediction may be made on the basis of a new intelligence test that high scorers at age 12 will be more likely to obtain university degrees several years later. If the prediction is borne out, then the test has predictive validity.
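A simple check of this kind of prediction is to compare how often high and low scorers later obtained a degree. The Python sketch below does this for hypothetical follow-up data, using an assumed cutoff of 115 to separate high from low scorers. A full analysis would more likely report a correlation or a regression model, so treat this only as an illustration.

```python
def degree_rates(scores_at_12, obtained_degree, cutoff):
    """Return (rate among high scorers, rate among low scorers).

    High scorers have scores >= cutoff; obtained_degree holds True/False outcomes.
    """
    pairs = list(zip(scores_at_12, obtained_degree))
    high = [degree for score, degree in pairs if score >= cutoff]
    low = [degree for score, degree in pairs if score < cutoff]
    if not high or not low:
        raise ValueError("cutoff must leave people in both groups")
    return sum(high) / len(high), sum(low) / len(low)

# Hypothetical follow-up data for eight children
scores_at_12 = [95, 130, 110, 125, 88, 140, 102, 118]
obtained_degree = [False, True, False, True, False, True, True, False]

high_rate, low_rate = degree_rates(scores_at_12, obtained_degree, cutoff=115)
print(f"high scorers: {high_rate:.0%}, low scorers: {low_rate:.0%}")  # → high scorers: 75%, low scorers: 25%
```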

Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281-302.

Hathaway, S. R., & McKinley, J. C. (1943). Manual for the Minnesota Multiphasic Personality Inventory. New York: Psychological Corporation.

Kelley, T. L. (1927). Interpretation of educational measurements. New York: Macmillan.

Nevo, B. (1985). Face validity revisited. Journal of Educational Measurement, 22(4), 287-293.


The 4 Types of Validity | Types, Definitions & Examples

Published on 3 May 2022 by Fiona Middleton. Revised on 10 October 2022.

In quantitative research, you have to consider the reliability and validity of your methods and measurements.

Validity tells you how accurately a method measures something. If a method measures what it claims to measure, and the results closely correspond to real-world values, then it can be considered valid. There are four main types of validity:

  • Construct validity: Does the test measure the concept that it’s intended to measure?
  • Content validity: Is the test fully representative of what it aims to measure?
  • Face validity: Does the content of the test appear to be suitable to its aims?
  • Criterion validity: Do the results accurately measure the concrete outcome they are designed to measure?

Note that this article deals with types of test validity, which determine the accuracy of the actual components of a measure. If you are doing experimental research, you also need to consider internal and external validity, which deal with the experimental design and the generalisability of results.


Construct validity evaluates whether a measurement tool really represents the thing we are interested in measuring. It’s central to establishing the overall validity of a method.

What is a construct?

A construct refers to a concept or characteristic that can’t be directly observed but can be measured by observing other indicators that are associated with it.

Constructs can be characteristics of individuals, such as intelligence, obesity, job satisfaction, or depression; they can also be broader concepts applied to organisations or social groups, such as gender equality, corporate social responsibility, or freedom of speech.

What is construct validity?

Construct validity is about ensuring that the method of measurement matches the construct you want to measure. If you develop a questionnaire to diagnose depression, you need to know: does the questionnaire really measure the construct of depression? Or is it actually measuring the respondent’s mood, self-esteem, or some other construct?

To achieve construct validity, you have to ensure that your indicators and measurements are carefully developed based on relevant existing knowledge. The questionnaire must include only relevant questions that measure known indicators of depression.

The other types of validity described below can all be considered as forms of evidence for construct validity.


Content validity assesses whether a test is representative of all aspects of the construct.

To produce valid results, the content of a test, survey, or measurement method must cover all relevant parts of the subject it aims to measure. If some aspects are missing from the measurement (or if irrelevant aspects are included), the validity is threatened.
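The article treats content validity qualitatively. One widely used way to quantify expert judgements about it, not described here, is Lawshe's content validity ratio (CVR): each of N experts rates whether an item is essential, and CVR = (n_e - N/2) / (N/2), where n_e is the number of "essential" votes. The minimal Python sketch below assumes that setup; the item names and vote counts are hypothetical.

```python
def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    """Lawshe's CVR: (n_e - N/2) / (N/2), ranging from -1 to +1."""
    if n_experts <= 0:
        raise ValueError("need at least one expert")
    if not 0 <= n_essential <= n_experts:
        raise ValueError("n_essential must be between 0 and n_experts")
    half = n_experts / 2
    return (n_essential - half) / half


# Hypothetical panel of 10 experts judging three questionnaire items.
essential_votes = {"item_sleep": 9, "item_mood": 10, "item_shoe_size": 2}
for item, votes in essential_votes.items():
    print(item, round(content_validity_ratio(votes, 10), 2))
```

An item that every expert rates essential scores +1, an even split scores 0, and the hypothetical shoe-size item scores below 0, flagging it as a candidate for removal. Published minimum CVR values depend on panel size and are not reproduced here.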

Face validity considers how suitable the content of a test seems to be on the surface. It’s similar to content validity, but face validity is a more informal and subjective assessment.

As face validity is a subjective measure, it’s often considered the weakest form of validity. However, it can be useful in the initial stages of developing a method.

Criterion validity evaluates how well a test can predict a concrete outcome, or how well the results of your test approximate the results of another test.

What is a criterion variable?

A criterion variable is an established and effective measurement that is widely considered valid, sometimes referred to as a ‘gold standard’ measurement. Criterion variables can be very difficult to find.

What is criterion validity?

To evaluate criterion validity, you calculate the correlation between the results of your measurement and the results of the criterion measurement. If there is a high correlation, this gives a good indication that your test is measuring what it intends to measure.
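As a minimal sketch of that calculation, the Python function below computes Pearson's correlation coefficient from scratch for paired scores. Pearson's r is an assumption here, since the text does not name a specific coefficient, and the six participants' scores are invented for illustration.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two paired lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from each mean
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for six participants on a new test and on a criterion measure.
new_test = [12, 15, 9, 20, 17, 11]
criterion = [30, 36, 25, 45, 40, 27]
print(round(pearson_r(new_test, criterion), 3))
```

A value near +1 suggests the new test orders and spaces people much as the criterion does. With real data you would also want a reasonably large sample, because a correlation computed from a handful of people is unstable.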

Cite this Scribbr article


Middleton, F. (2022, October 10). The 4 Types of Validity | Types, Definitions & Examples. Scribbr. Retrieved 3 September 2024, from https://www.scribbr.co.uk/research-methods/validity-types/


Fiona Middleton



Reliability and validity: Importance in Medical Research

Affiliations.

  • 1 Al-Nafees Medical College, Isra University, Islamabad, Pakistan.
  • 2 Fauji Foundation Hospital, Foundation University Medical College, Islamabad, Pakistan.
  • PMID: 34974579
  • DOI: 10.47391/JPMA.06-861

Reliability and validity are among the most important and fundamental domains in the assessment of any measuring methodology for data collection in good research. Validity is about what an instrument measures and how well it does so, whereas reliability concerns the truthfulness in the data obtained and the degree to which any measuring tool controls random error. The current narrative review was planned to discuss the importance of reliability and validity of data-collection or measurement techniques used in research. It comprehensively describes and explores the reliability and validity of research instruments and also discusses different forms of reliability and validity with concise examples. An attempt has been made to give a brief literature review regarding the significance of reliability and validity in medical sciences.

Keywords: Validity, Reliability, Medical research, Methodology, Assessment, Research tools.


Statistics By Jim

Making statistics intuitive

Reliability vs Validity: Differences & Examples

By Jim Frost

Reliability and validity are criteria by which researchers assess measurement quality. Measuring a person or item involves assigning scores to represent an attribute. This process creates the data that we analyze. However, to provide meaningful research results, that data must be good. And not all data are good!


For data to be good enough to allow you to draw meaningful conclusions from a research study, they must be reliable and valid. What are the properties of good measurements? In a nutshell, reliability relates to the consistency of measures, and validity addresses whether the measurements are quantifying the correct attribute.

In this post, learn about reliability vs. validity, their relationship, and the various ways to assess them.


Reliability

Reliability refers to the consistency of the measure. High reliability indicates that the measurement system produces similar results under the same conditions. If you measure the same item or person multiple times, you want to obtain comparable, reproducible values.

If you take measurements multiple times and obtain very different values, your data are unreliable. Numbers are meaningless if repeated measures do not produce similar values. What’s the correct value? No one knows! This inconsistency hampers your ability to draw conclusions and understand relationships.

Suppose you have a bathroom scale that displays very inconsistent results from one time to the next. It’s very unreliable. It would be hard to use your scale to determine your correct weight and to know whether you are losing weight.

Inadequate data collection procedures and low-quality or defective data collection tools can produce unreliable data. Additionally, some characteristics are more challenging to measure reliably. For example, the length of an object is concrete. On the other hand, psychological constructs, such as conscientiousness, depression, and self-esteem, can be trickier to measure reliably.

When assessing studies, evaluate data collection methodologies and consider whether any issues undermine their reliability.

Validity

Validity refers to whether the measurements reflect what they’re supposed to measure. This concept is a broader issue than reliability. Researchers need to consider whether they’re measuring what they think they’re measuring. Or do the measurements reflect something else? Does the instrument measure what it says it measures? It’s a question that addresses the appropriateness of the data rather than whether measurements are repeatable.

Validity is a smaller concern for tangible measurements like height and weight. You might have a biased bathroom scale if it tends to read too high or too low, but it still measures weight. Validity is a bigger concern in the social sciences, where researchers often measure elusive concepts such as positive outlook and self-esteem. If you’re assessing the psychological construct of conscientiousness, you need to confirm that the instrument poses questions that appraise this attribute rather than, say, obedience.

Reliability vs Validity

A measurement must be reliable before it has a chance of being valid. After all, if you don’t obtain consistent measurements for the same object or person under similar conditions, they can’t be valid. If your scale displays a different weight every time you step on it, it’s unreliable, and it is also invalid.

So, having reliable measurements is the first step towards having valid measures. Reliability is necessary for validity, but it is insufficient by itself.

Suppose you have a reliable measurement. You step on your scale a few times in a short period, and it displays very similar weights. It’s reliable. But the weight might be incorrect.

Being able to measure the same object multiple times and get consistent values does not necessarily indicate that the measurements reflect the desired characteristic.
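To make the bathroom-scale example concrete, here is a minimal Python simulation, assuming a scale with a constant +3 kg bias and very little random error. All numbers are hypothetical. The readings barely vary, so the scale looks reliable, yet every reading is about 3 kg too high, so it is not valid.

```python
import random
import statistics

TRUE_WEIGHT_KG = 70.0  # hypothetical true weight
BIAS_KG = 3.0          # hypothetical systematic error: the scale always reads high
NOISE_SD_KG = 0.1      # hypothetical small random error between weighings

rng = random.Random(42)  # fixed seed so the simulation is repeatable
readings = [TRUE_WEIGHT_KG + BIAS_KG + rng.gauss(0, NOISE_SD_KG) for _ in range(10)]

spread = statistics.stdev(readings)                # small spread: consistent (reliable)
error = statistics.mean(readings) - TRUE_WEIGHT_KG  # large average error: inaccurate (not valid)
print(f"spread = {spread:.2f} kg, average error = {error:+.2f} kg")
```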

How can you determine whether measurements are both valid and reliable? Assessing reliability vs. validity is the topic for the rest of this post!

In summary:

  • Reliability: similar measurements for the same person/item under the same conditions. It is assessed through the stability of results across time, between observers, and within the test. Unreliable measurements typically cannot be valid.
  • Validity: measurements reflect what they’re supposed to measure. It is assessed by checking that measures have appropriate relationships to theories, similar measures, and different measures. Valid measurements are also reliable.

How to Assess Reliability

Reliability relates to measurement consistency. To evaluate reliability, analysts assess consistency over time, within the measurement instrument, and between different observers. These types of consistency are known as test-retest, internal, and inter-rater reliability, respectively. Typically, appraising these forms of reliability involves taking multiple measures of the same person, object, or construct and assessing scatterplots and correlations of the measurements. Reliable measurements have high correlations because the scores are similar.

Test-Retest Reliability

Analysts often assume that measurements should be consistent across a short time. If you measure your height twice over a couple of days, you should obtain roughly the same measurements.

To assess test-retest reliability, the experimenters typically measure a group of participants on two occasions within a few days. Usually, you’ll evaluate the reliability of the repeated measures using scatterplots and correlation coefficients. You expect to see high correlations and tight lines on the scatterplot when the characteristic you measure is consistent over a short period and you have a reliable measurement system.

This type of reliability establishes the degree to which a test can produce stable, consistent scores across time. However, in practice, measurement instruments are never entirely consistent.

Keep in mind that some characteristics should not be consistent across time. A good example is your mood, which can change from moment to moment. A test-retest assessment of mood is not likely to produce a high correlation even though it might be a useful measurement instrument.

Internal Reliability

This type of reliability assesses consistency across items within a single instrument. Researchers evaluate internal reliability when they’re using instruments such as surveys or personality inventories. In these instruments, multiple items relate to a single construct. Questions that measure the same characteristic should have a high correlation. For example, people who indicate they are risk-takers should also report that they participate in dangerous activities. If items that supposedly measure the same underlying construct have a low correlation, they are not consistent with each other and might not measure the same thing.

Inter-Rater Reliability

This type of reliability assesses consistency across different observers, judges, or evaluators. When different observers produce similar measurements for the same item or person, their scores are highly correlated. Inter-rater reliability is essential when the subjectivity or skill of the evaluator plays a role. For example, assessing the quality of a writing sample involves subjectivity. Researchers can employ rating guidelines to reduce subjectivity. Comparing the scores from different evaluators for the same writing sample helps establish the measure’s reliability. Learn more about inter-rater reliability.
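The paragraph above describes inter-rater agreement in terms of correlated scores. When raters assign categories rather than numbers, one common alternative (not named in this post) is Cohen's kappa, which compares observed agreement with the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The Python sketch below assumes two raters labelling the same hypothetical writing samples as low, medium, or high.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters assigning categorical labels to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n**2
    if p_expected == 1:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical quality ratings of eight writing samples by two evaluators.
rater_1 = ["high", "medium", "low", "high", "medium", "medium", "low", "high"]
rater_2 = ["high", "medium", "low", "medium", "medium", "high", "low", "high"]
print(round(cohens_kappa(rater_1, rater_2), 2))
```

Here the raters agree on 6 of 8 samples (75%), but because some agreement would occur by chance alone, kappa is lower, about 0.62.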


Cronbach’s Alpha

Cronbach’s alpha measures the internal consistency, or reliability, of a set of survey items. Use this statistic to help determine whether a collection of items consistently measures the same characteristic. Learn more about Cronbach’s Alpha.
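A minimal sketch of the usual formula for Cronbach's alpha, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The five respondents and four Likert-type items are hypothetical. The code uses sample variances from Python's statistics module throughout; the ratio is the same whichever variance definition you use, as long as it is applied consistently.

```python
import statistics


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of scores."""
    if len(responses) < 2:
        raise ValueError("need at least 2 respondents")
    k = len(responses[0])  # number of items
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("need at least 2 items, answered by every respondent")
    # Sample variance of each item (column) across respondents
    item_variances = [statistics.variance(item) for item in zip(*responses)]
    # Sample variance of each respondent's total score
    total_variance = statistics.variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("alpha is undefined when all total scores are equal")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of five respondents to four 1-5 Likert items.
scores = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
print(round(cronbach_alpha(scores), 2))
```

For these made-up responses alpha is about 0.96, meaning the items move together closely. Cut-offs such as 0.7 are common rules of thumb rather than fixed standards, and a very high alpha can also signal redundant items.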

Gage R&R Studies

These studies evaluate a measurement system’s reliability and identify sources of variation, which can help you target improvement efforts effectively. Learn more about Gage R&R Studies.

How to Assess Validity

Validity is more difficult to evaluate than reliability. After all, with reliability, you only assess whether the measures are consistent across time, within the instrument, and between observers. On the other hand, evaluating validity involves determining whether the instrument measures the correct characteristic. This process frequently requires examining relationships between these measurements, other data, and theory. Validating a measurement instrument requires you to use a wide range of subject-area knowledge and different types of constructs to determine whether the measurements from your instrument fit in with the bigger picture!

An instrument with high validity produces measurements that correctly fit the larger picture with other constructs. Validity assesses whether the web of empirical relationships aligns with the theoretical relationships.

The measurements must have a positive relationship with other measures of the same construct. Additionally, they need to correlate in the correct direction (positively or negatively) with the theoretically correct constructs. Finally, the measures should have no relationship with unrelated constructs.

If you need more detailed information, read my post that focuses on Measurement Validity. In that post, I cover the various types and how to evaluate them, and I provide examples.

Experimental validity relates to experimental designs and methods. To learn about that topic, read my post about Internal and External Validity.

Whew, that’s a lot of information about reliability vs. validity. Using these concepts, you can determine whether a measurement instrument produces good data!


Validity vs. Reliability in Research: What's the Difference?


Introduction


In research, validity and reliability are crucial for producing robust findings. They provide a foundation that assures scholars, practitioners, and readers alike that the research's insights are both accurate and consistent. However, the nuanced nature of qualitative data often blurs the lines between these concepts, making it imperative for researchers to discern their distinct roles.

This article seeks to illuminate the intricacies of reliability and validity, highlighting their significance and distinguishing their unique attributes. By understanding these critical facets, qualitative researchers can ensure their work is not only authentic but also trustworthy.

In the domain of research, whether qualitative or quantitative, two concepts often arise when discussing the quality and rigor of a study: reliability and validity. These two terms, while interconnected, have distinct meanings that hold significant weight in the world of research.

Reliability, at its core, speaks to the consistency of a study. If a study or test measures the same concept repeatedly and yields the same results, it demonstrates a high degree of reliability. A common way to assess reliability is internal consistency reliability, which checks whether multiple items that measure the same concept produce similar scores.

Another method often used is inter-rater reliability, which gauges the consistency of scores given by different raters. This approach is especially amenable to qualitative research, and it can help researchers assess the clarity of their code system and the consistency of their codings. To make a study dependable, researchers need to demonstrate an adequate level of reliability.

On the other hand, validity is concerned with accuracy. It looks at whether a study truly measures what it claims to. Within the realm of validity, several types exist. Construct validity, for instance, verifies that a study measures the intended abstract concept or underlying construct. If a study aims to measure self-esteem and accurately captures this abstract trait, it demonstrates strong construct validity.

Content validity ensures that a test or study comprehensively represents the entire domain of the concept it seeks to measure. For instance, if a test aims to assess mathematical ability, it should cover arithmetic, algebra, geometry, and more to showcase strong content validity.

Criterion validity is another form of validity; it assesses whether the scores from a test correlate well with a measure of a related outcome. A subset of this is predictive validity, which checks whether the test can predict future outcomes. For instance, if an aptitude test can predict future job performance, it can be said to have high predictive validity.

The distinction between reliability and validity becomes clear when one considers the nature of their focus. While reliability is concerned with consistency and reproducibility, validity zeroes in on accuracy and truthfulness.

A research tool can be reliable without being valid. For instance, a faulty instrument might consistently give wrong readings (reliable but not valid). Conversely, a test administered multiple times to the same people could sometimes hit the mark and at other times miss it entirely, producing different scores each time. Such a test might happen to be accurate on some occasions, but it is not reliable, and an unreliable measure cannot be consistently valid.

For a study to be robust, it must achieve both reliability and validity. Reliability ensures the study's findings are reproducible while validity confirms that it accurately represents the phenomena it claims to. Ensuring both in a study means the results are both dependable and accurate, forming a cornerstone for high-quality research.


Understanding the nuances of reliability and validity becomes clearer when contextualized within a real-world research setting. Imagine a qualitative study where a researcher aims to explore the experiences of teachers in urban schools concerning classroom management. The primary method of data collection is semi-structured interviews.

To ensure the reliability of this qualitative study, the researcher crafts a consistent list of open-ended questions for the interview. This ensures that, while each conversation might meander based on the individual’s experiences, there remains a core set of topics related to classroom management that every participant addresses.

The essence of reliability in this context isn't necessarily about garnering identical responses but rather about achieving a consistent approach to data collection and subsequent interpretation. As part of this commitment to reliability, two researchers might independently transcribe and analyze a subset of these interviews. If they identify similar themes and patterns in their independent analyses, it suggests a consistent interpretation of the data, showcasing inter-rater reliability.

Validity, on the other hand, is anchored in ensuring that the research genuinely captures and represents the lived experiences and sentiments of teachers concerning classroom management. To establish content validity, the list of interview questions is thoroughly reviewed by a panel of educational experts. Their feedback ensures that the questions encompass the breadth of issues and concerns related to classroom management in urban school settings.

As the interviews are conducted, the researcher pays close attention to the depth and authenticity of responses. After the interviews, member checking could be employed, where participants review the researcher's interpretation of their responses to ensure that their experiences and perspectives have been accurately captured. This strategy helps in affirming the study's construct validity, ensuring that the abstract concept of "experiences with classroom management" has been truthfully and adequately represented.

In this example, we can see that while the interview study is rooted in qualitative methods and subjective experiences, the principles of reliability and validity can still meaningfully inform the research process. They serve as guides to ensure the research's findings are both dependable and genuinely reflective of the participants' experiences.

Ensuring validity and reliability in research, irrespective of its qualitative or quantitative nature, is pivotal to producing results that are both trustworthy and robust. Here's how you can integrate these concepts into your study to ensure its rigor:

Reliability is about consistency. One of the most straightforward ways to gauge it in quantitative research is using test-retest reliability. It involves administering the same test to the same group of participants on two separate occasions and then comparing the results.

A high degree of similarity between the two sets of results indicates good reliability. This can often be measured using a correlation coefficient, where a value closer to 1 indicates stronger consistency between the two test administrations.

Validity, on the other hand, ensures that the research genuinely measures what it intends to. There are various forms of validity to consider. Convergent validity ensures that two measures of the same construct, or measures that should theoretically be related, are indeed correlated. For example, two different measures assessing self-esteem should show similar results for the same group, highlighting that they are measuring the same underlying construct.

Face validity is the most basic form of validity and is gauged by the sheer appearance of the measurement tool. If, at face value, a test seems like it measures what it claims to, it has face validity. This is often the first step and is usually followed by more rigorous forms of validity testing.

Criterion-related validity (another name for the criterion validity discussed earlier) evaluates how well the outcomes of a particular test or measurement correlate with another related measure. For example, if a new tool is developed to measure reading comprehension, its results can be compared with those of an established reading comprehension test to assess its criterion-related validity. A strong correlation is evidence that the new tool is valid.

Ensuring both validity and reliability requires deliberate planning, meticulous testing, and constant reflection on the study's methods and results. This might involve using established scales or measures with proven validity and reliability, conducting pilot studies to refine measurement tools, and always staying cognizant of the fact that these two concepts are important considerations for research robustness.

While reliability and validity are foundational concepts in many traditional research paradigms, they have not escaped scrutiny, especially from critical and poststructuralist perspectives. These critiques often arise from the fundamental philosophical differences in how knowledge, truth, and reality are perceived and constructed.

From a poststructuralist viewpoint, the very pursuit of a singular "truth" or an objective reality is questionable. In such a perspective, multiple truths exist, each shaped by its own socio-cultural, historical, and individual contexts.

Reliability, with its emphasis on consistent replication, might then seem at odds with this understanding. If truths are multiple and shifting, how can consistency across repeated measures or observations be a valid measure of anything other than the research instrument's stability?

Validity, too, faces critique. In seeking to ensure that a study measures what it purports to measure, there's an implicit assumption of an observable, knowable reality. Poststructuralist critiques question this foundation, arguing that reality is too fluid, multifaceted, and influenced by power dynamics to be pinned down by any singular measurement or representation.

Moreover, the very act of determining "validity" often requires an external benchmark or "gold standard." This brings up the issue of who determines this standard and the power dynamics and potential biases inherent in such decisions.

Another point of contention is the way these concepts can inadvertently prioritize certain forms of knowledge over others. For instance, privileging research that meets stringent reliability and validity criteria might marginalize more exploratory, interpretive, or indigenous research methods. These methods, while offering deep insights, might not align neatly with traditional understandings of reliability and validity, potentially relegating them to the periphery of "accepted" knowledge production.

To be sure, reliability and validity serve as guiding principles in many research approaches. However, it's essential to recognize their limitations and the critiques posed by alternative epistemologies. Engaging with these critiques doesn't diminish the value of reliability and validity but rather enriches our understanding of the multifaceted nature of knowledge and the complexities of its pursuit.

Validity and reliability in quantitative studies

Volume 18, Issue 3

  • Roberta Heale 1 ,
  • Alison Twycross 2
  • 1 School of Nursing, Laurentian University , Sudbury, Ontario , Canada
  • 2 Faculty of Health and Social Care , London South Bank University , London , UK
  • Correspondence to : Dr Roberta Heale, School of Nursing, Laurentian University, Ramsey Lake Road, Sudbury, Ontario, Canada P3E2C6; rheale{at}laurentian.ca

https://doi.org/10.1136/eb-2015-102129


Evidence-based practice includes, in part, implementation of the findings of well-conducted quality research studies. So being able to critique quantitative research is an important skill for nurses. Consideration must be given not only to the results of the study but also to the rigour of the research. Rigour refers to the extent to which the researchers worked to enhance the quality of the studies. In quantitative research, this is achieved through measurement of validity and reliability. 1


Types of validity

The first category is content validity. This category looks at whether the instrument adequately covers all the content that it should with respect to the variable. In other words, does the instrument cover the entire domain related to the variable or construct it was designed to measure? In an undergraduate nursing course with instruction about public health, an examination with content validity would cover all the content in the course, with greater emphasis on the topics that had received greater coverage or more depth. A subset of content validity is face validity, where experts are asked their opinion about whether an instrument measures the concept intended.

Construct validity refers to whether you can draw inferences about test scores related to the concept being studied. For example, if a person has a high score on a survey that measures anxiety, does this person truly have a high degree of anxiety? In another example, a test of knowledge of medications that requires dosage calculations may instead be testing maths knowledge.

There are three types of evidence that can be used to demonstrate that a research instrument has construct validity:

Homogeneity: the instrument measures one construct.

Convergence: the instrument measures concepts similar to those measured by other instruments. If no similar instruments are available, however, this evidence cannot be gathered.

Theory evidence: behaviour is consistent with the theoretical propositions of the construct measured by the instrument. For example, when an instrument measures anxiety, one would expect participants who score high on the instrument for anxiety to also demonstrate symptoms of anxiety in their day-to-day lives. 2

The final measure of validity is criterion validity. A criterion is any other instrument that measures the same variable. Correlations can be calculated to determine the extent to which the different instruments measure the same variable. Criterion validity is measured in three ways:

Convergent validity: the instrument is highly correlated with instruments measuring similar variables.

Divergent validity: the instrument is poorly correlated with instruments that measure different variables. In this case, for example, there should be a low correlation between an instrument that measures motivation and one that measures self-efficacy.

Predictive validity: the instrument should correlate highly with future criteria. 2 For example, a high self-efficacy score related to performing a task should predict the likelihood of a participant completing the task.
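Convergent and divergent validity both come down to comparing correlation coefficients. The following is a minimal sketch with hypothetical scores for six participants; the instrument names are placeholders, and the Pearson correlation is written out by hand so the example needs only the standard library.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


# Hypothetical scores for the same six participants on three instruments.
new_anxiety_tool = [12, 18, 25, 30, 34, 40]
established_anxiety_tool = [10, 20, 22, 31, 35, 41]  # measures a similar construct
unrelated_measure = [7, 3, 9, 2, 8, 5]               # measures a different construct

r_convergent = pearson(new_anxiety_tool, established_anxiety_tool)
r_divergent = pearson(new_anxiety_tool, unrelated_measure)
print(f"convergent r = {r_convergent:.2f}, divergent r = {r_divergent:.2f}")
```

A high convergent correlation alongside a near-zero divergent correlation is the pattern the text describes. Predictive validity would use the same calculation with a criterion measured later.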

Reliability

Reliability relates to the consistency of a measure. A participant completing an instrument meant to measure motivation should have approximately the same responses each time the test is completed. Although it is not possible to give an exact calculation of reliability, an estimate of reliability can be achieved through different measures. The three attributes of reliability are outlined in table 2. How each attribute is tested for is described below.

Attributes of reliability

Homogeneity (internal consistency) is assessed using item-to-total correlation, split-half reliability, the Kuder-Richardson coefficient and Cronbach's α. In split-half reliability, the results of a test, or instrument, are divided in half, and the correlation between the two halves is calculated. Strong correlations indicate high reliability, while weak correlations indicate that the instrument may not be reliable. The Kuder-Richardson test is a more complicated version of the split-half test: the average of all possible split-half combinations is determined, producing a coefficient between 0 and 1. This test is more accurate than the split-half test, but it can only be used on questions with two answers (eg, yes or no, 0 or 1). 3
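The two homogeneity checks above can be sketched in a few lines. This example makes several assumptions: the yes/no answers are hypothetical, the halves are formed from odd- and even-numbered items (one common choice), and the Kuder-Richardson coefficient is computed with the KR-20 formula, which is how it is usually calculated in practice instead of literally enumerating every split.

```python
from math import sqrt
from statistics import pvariance


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def split_half(responses: list[list[int]]) -> float:
    """Correlate each person's total on odd-numbered items with their total on even-numbered items."""
    first = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    second = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    return pearson(first, second)


def kr20(responses: list[list[int]]) -> float:
    """Kuder-Richardson Formula 20 for items scored 0/1."""
    n, k = len(responses), len(responses[0])
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n  # proportion scoring 1 on item j
        sum_pq += p * (1 - p)
    total_variance = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum_pq / total_variance)


# Hypothetical yes/no (1/0) answers: 6 participants x 6 items.
answers = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
print(f"split-half r = {split_half(answers):.2f}, KR-20 = {kr20(answers):.2f}")
```

The half-test correlation is often stepped up with the Spearman-Brown formula to estimate full-length reliability. The article does not describe that step, so it is omitted here.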

Cronbach's α is the most commonly used test to determine the internal consistency of an instrument. In this test, the average of all correlations in every combination of split-halves is determined. Unlike the Kuder-Richardson test, it can be used for instruments with questions that have more than two responses. The Cronbach's α result is a number between 0 and 1. An acceptable reliability score is 0.7 or higher. 1, 3
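In practice, α is normally computed from item and total-score variances, which is the standard computational form of the coefficient described above. The sketch below uses hypothetical 5-point Likert answers and applies the 0.7 cut-off from the text.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(responses[0])
    items = list(zip(*responses))  # regroup the data: one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical 5-point Likert answers: 5 participants x 4 items on one construct.
likert = [
    [4, 5, 4, 5],
    [3, 3, 4, 3],
    [2, 2, 1, 2],
    [5, 4, 5, 5],
    [1, 2, 2, 1],
]
alpha = cronbach_alpha(likert)
verdict = "acceptable" if alpha >= 0.7 else "below the 0.7 threshold"
print(f"alpha = {alpha:.2f} ({verdict})")
```

Because the same variance function is used for items and totals, the sample-versus-population choice cancels out of the ratio.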

Stability is tested using test–retest and parallel or alternate-form reliability testing. Test–retest reliability is assessed when an instrument is given to the same participants more than once under similar circumstances. A statistical comparison is then made between each participant's scores across the administrations. This provides an indication of the reliability of the instrument. Parallel-form reliability (or alternate-form reliability) is similar to test–retest reliability except that a different form of the original instrument is given to participants in subsequent tests. The domain, or concepts being tested, are the same in both versions of the instrument, but the wording of items is different. 2 For an instrument to demonstrate stability, there should be a high correlation between the scores each time a participant completes the test. Generally speaking, a correlation coefficient of less than 0.3 signifies a weak correlation, 0.3–0.5 is moderate and greater than 0.5 is strong. 4
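A test–retest check reduces to correlating two administrations and reading the result against the bands above. This is a sketch with hypothetical scores. Applying the bands to the absolute value of r is our assumption, since the bands describe strength, not direction.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def strength(r: float) -> str:
    """Label |r| using the bands quoted above: <0.3 weak, 0.3-0.5 moderate, >0.5 strong."""
    size = abs(r)
    if size < 0.3:
        return "weak"
    if size <= 0.5:
        return "moderate"
    return "strong"


# Hypothetical motivation scores for the same five participants, two weeks apart.
first_administration = [20, 25, 30, 35, 40]
second_administration = [22, 24, 31, 33, 41]
r = pearson(first_administration, second_administration)
print(f"test-retest r = {r:.2f} ({strength(r)})")
```

For parallel-form reliability, the second list would simply hold scores from the alternate form instead of a repeat of the original.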

Equivalence is assessed through inter-rater reliability. This test includes a process for qualitatively determining the level of agreement between two or more observers. A good example of the process used in assessing inter-rater reliability is the scores of judges for a skating competition. The level of consistency across all judges in the scores given to skating participants is the measure of inter-rater reliability. An example in research is when researchers are asked to give a score for the relevancy of each item on an instrument. Consistency in their scores relates to the level of inter-rater reliability of the instrument.

Determining how rigorously the issues of reliability and validity have been addressed in a study is an essential component in the critique of research and influences the decision about whether to implement the study findings in nursing practice. In quantitative studies, rigour is determined through an evaluation of the validity and reliability of the tools or instruments utilised in the study. A good quality research study will provide evidence of how all these factors have been addressed. This will help you to assess the validity and reliability of the research and help you decide whether or not you should apply the findings in your area of clinical practice.

  • Lobiondo-Wood G.
  • Shuttleworth M.
  • Laerd Statistics. Determining the correlation coefficient. 2013. https://statistics.laerd.com/premium/pc/pearson-correlation-in-spss-8.php

Twitter Follow Roberta Heale at @robertaheale and Alison Twycross at @alitwy

Competing interests None declared.


Research-Methodology

Research validity in surveys relates to the extent to which the survey measures the right elements, that is, the elements that need to be measured. In simple terms, validity refers to how well an instrument measures what it is intended to measure.

Reliability alone is not enough: measures need to be both reliable and valid. For example, if a weighing scale is consistently wrong by 4 kg (it deducts 4 kg from the actual weight), it can be described as reliable, because it displays the same weight every time a specific item is measured. However, the scale is not valid, because it does not display the actual weight of the item.
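The faulty-scale example can be expressed as a tiny simulation. The true weight is a hypothetical value. The point is that consistency (zero spread between readings) and accuracy (zero bias from the true value) are separate properties.

```python
from statistics import mean, pstdev

TRUE_WEIGHT_KG = 70.0  # hypothetical true weight of the item
OFFSET_KG = -4.0       # the scale always reads 4 kg too low


def faulty_scale(true_weight: float) -> float:
    """A perfectly consistent but systematically wrong scale."""
    return true_weight + OFFSET_KG


readings = [faulty_scale(TRUE_WEIGHT_KG) for _ in range(5)]
spread = pstdev(readings)                # 0.0: readings agree with each other (reliable)
bias = mean(readings) - TRUE_WEIGHT_KG   # -4.0: readings miss the true value (not valid)
print(readings, spread, bias)
```

A reliability check on these readings alone would pass. Only a comparison against the true weight reveals the 4 kg error.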

Research validity can be divided into two groups: internal and external. According to Pelissier (2008, p.12), “internal validity refers to how the research findings match reality, while external validity refers to the extent to which the research findings can be replicated to other environments”.

Moreover, validity can also be divided into five types:

1. Face Validity is the most basic type of validity and is associated with the highest level of subjectivity, because it is not based on any scientific approach. In other words, a researcher may judge a test to be valid simply because it appears valid, without an in-depth scientific justification.

Example: a questionnaire designed for a study that analyses employee performance can be assessed as valid because each individual question appears to address specific and relevant aspects of employee performance.

2. Construct Validity relates to assessing the suitability of a measurement tool for measuring the phenomenon being studied. Construct validity can be established more effectively by involving a panel of ‘experts’ closely familiar with the measure and the phenomenon.

Example: with the application of construct validity, the levels of leadership competency in any given organisation can be assessed by devising a questionnaire for operational-level employees that asks about their motivation to carry out their duties on a daily basis.

3. Criterion-Related Validity involves comparing test results with an outcome. This type of validity correlates the results of an assessment with another criterion of assessment.

Example: customer perception of a specific company's brand image can be assessed by organising a focus group. The same issue can also be assessed by devising a questionnaire to be answered by current and potential customers of the brand. The higher the correlation between the focus group and questionnaire findings, the higher the criterion-related validity.

4. Formative Validity refers to how effectively the measure provides information that can be used to improve specific aspects of the phenomenon.

Example: when developing initiatives to improve the effectiveness of organisational culture, if the measure is able to identify specific weaknesses of the culture, such as employee-manager communication barriers, then the formative validity of the measure can be assessed as adequate.

5. Sampling Validity (similar to content validity) ensures that the measure covers a broad area within the research field. No measure can cover every item and element of the phenomenon; therefore, the important items and elements are selected using a sampling method chosen according to the aims and objectives of the study.

Example: when assessing a leadership style exercised in a specific organisation, assessment of decision-making style would not suffice, and other issues related to leadership style such as organisational culture, personality of leaders, the nature of the industry etc. need to be taken into account as well.

John Dudovskiy


J Bras Pneumol, v.44(3), May-Jun 2018

Internal and external validity: can you apply research study results to your patients?

Cecilia Maria Patino

1. Methods in Epidemiologic, Clinical, and Operations Research-MECOR-program, American Thoracic Society/Asociación Latinoamericana del Tórax, Montevideo, Uruguay.

2. Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA.

Juliana Carvalho Ferreira

3. Divisão de Pneumologia, Instituto do Coração, Hospital das Clínicas, Faculdade de Medicina, Universidade de São Paulo, São Paulo (SP) Brasil.

CLINICAL SCENARIO

In a multicenter study in France, investigators conducted a randomized controlled trial to test the effect of prone vs. supine positioning ventilation on mortality among patients with early, severe ARDS. They showed that prolonged prone-positioning ventilation decreased 28-day mortality [hazard ratio (HR) = 0.39; 95% CI: 0.25-0.63]. 1

STUDY VALIDITY

The validity of a research study refers to how well the results among the study participants represent true findings among similar individuals outside the study. This concept of validity applies to all types of clinical studies, including those about prevalence, associations, interventions, and diagnosis. The validity of a research study includes two domains: internal and external validity.

Internal validity is defined as the extent to which the observed results represent the truth in the population we are studying and, thus, are not due to methodological errors. In our example, if the authors can support that the study has internal validity, they can conclude that prone positioning reduces mortality among patients with severe ARDS. The internal validity of a study can be threatened by many factors, including errors in measurement or in the selection of participants in the study, and researchers should think about and avoid these errors.

Once the internal validity of the study is established, the researcher can proceed to judge its external validity by asking whether the study results apply to similar patients in a different setting (Figure 1). In the example, we would want to evaluate whether the results of the clinical trial apply to ARDS patients in other ICUs. If those patients have early, severe ARDS, probably yes, but the study results may not apply to patients with mild ARDS. External validity refers to the extent to which the results of a study are generalizable to patients in our daily practice, especially for the population that the sample is thought to represent.

[Figure 1]

Lack of internal validity implies that the results of the study deviate from the truth, and, therefore, we cannot draw any conclusions; hence, if the results of a trial are not internally valid, external validity is irrelevant. 2 Lack of external validity implies that the results of the trial may not apply to patients who differ from the study population and, consequently, could lead to low adoption of the treatment tested in the trial by other clinicians.

INCREASING VALIDITY OF RESEARCH STUDIES

To increase internal validity, investigators should ensure careful study planning and adequate quality control and implementation strategies, including adequate recruitment strategies, data collection, data analysis, and sample size. External validity can be increased by using broad inclusion criteria that result in a study population that more closely resembles real-life patients and, in the case of clinical trials, by choosing interventions that are feasible to apply. 2


Why is data validation important in research?


Data collection and analysis is one of the most important aspects of conducting research. High-quality data allows researchers to interpret findings accurately, acts as a foundation for future studies, and gives credibility to their research. As such, research often comes under scrutiny and must be free of any suspicion of fraud and data falsification. At times, even unintentional errors in data could be viewed as research misconduct. Hence, data integrity is essential to protect your reputation and the reliability of your study.

Owing to the very nature of research and the sheer volume of data collected in large-scale studies, errors are bound to occur. One way to avoid “bad” or erroneous data is through data validation.

What is data validation?

Data validation is the process of examining the quality and accuracy of the collected data before processing and analysing it. It not only ensures the accuracy but also confirms the completeness of your data. However, data validation is time-consuming and can delay analysis significantly. So, is this step really important?

Importance of data validation

Data validation is important for several aspects of a well-conducted study:

  • To ensure a robust dataset: The primary aim of data validation is to ensure an error-free dataset for further analysis. This is especially important if you or other researchers plan to use the dataset for future studies or to train machine learning models.
  • To get a clearer picture of the data: Data validation also includes ‘cleaning-up’ of data, i.e., removing inputs that are incomplete, not standardized, or not within the range specified for your study. This process could also shed light on previously unknown patterns in the data and provide additional insights regarding the findings.
  • To get accurate results: If your dataset has discrepancies, it will impact the final results and lead to inaccurate interpretations. Data validation can help identify errors, thus increasing the accuracy of your results.
  • To mitigate the risk of forming incorrect hypotheses: Only those inferences and hypotheses that are backed by solid data are considered valid. Thus, data validation can help you form logical and reasonable speculations.
  • To ensure the legitimacy of your findings: The integrity of your study is often determined by how reproducible it is. Data validation can enhance the reproducibility of your findings.
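The "clean-up" described above (flagging inputs that are incomplete, not standardized, or outside the study's range) can be automated as a set of per-record rules. This is a minimal sketch in which the field names, age range and group codes are hypothetical study rules, not taken from the article.

```python
# Hypothetical study rules (not from the article): every record needs these
# fields, age must fall inside the inclusion range, and group must use an agreed code.
REQUIRED_FIELDS = ("participant_id", "age", "group")
AGE_RANGE = (18, 65)
GROUP_CODES = {"control", "intervention"}


def validate(record: dict) -> list[str]:
    """Return the problems found in one record; an empty list means it passes."""
    problems = []
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")
    age = record.get("age")
    if isinstance(age, (int, float)) and not AGE_RANGE[0] <= age <= AGE_RANGE[1]:
        problems.append(f"age {age} outside {AGE_RANGE}")
    group = record.get("group")
    if group not in (None, "") and group not in GROUP_CODES:
        problems.append(f"non-standard group code {group!r}")
    return problems


records = [
    {"participant_id": "P01", "age": 34, "group": "control"},
    {"participant_id": "P02", "age": 17, "group": "intervention"},
    {"participant_id": "P03", "age": None, "group": "Control"},
]
for rec in records:
    print(rec["participant_id"], validate(rec) or "ok")
```

Returning a list of problems, instead of silently dropping records, leaves the decision about each flagged record to the researcher and keeps a trace of why it was excluded or corrected.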

Data validation in research

Data validation is necessary for all types of research. For quantitative research, which utilizes measurable data points, the quality of data can be enhanced by selecting the correct methodology, avoiding biases in the study design, choosing an appropriate sample size and type, and conducting suitable statistical analyses.

In contrast, qualitative research, which includes surveys or behavioural studies, is prone to the use of incomplete and/or poor-quality data. This is because survey participants may give inaccurate responses and because observational studies are subjective by nature. It is therefore extremely important to validate data by including a range of clear and objective questions in surveys, bullet-proofing multiple-choice questions, and setting standard parameters for data collection.

Importantly, for studies that use machine learning approaches or mathematical models, validating the data model is as important as validating the data inputs. When generating automated data validation protocols, one must rely on appropriate data structures, content, and file types to avoid errors due to automation.

Although data validation may seem like an unnecessary or time-consuming step, it is critical to the integrity of your study and is well worth the effort.



Research Data Management: Validate Data


What is Data Validation?

Data validation is important for ensuring that your data is regularly monitored and for assuring all stakeholders that your data is of high quality and reliably meets research integrity standards. It is also a crucial aspect of Yale's Research Data and Materials Policy, which states "The University deems appropriate stewardship of research data as fundamental to both high-quality research and academic integrity and therefore seeks to attain the highest standards in the generation, management, retention, preservation, curation, and sharing of research data."

Data Validation Methods

Basic methods to ensure data quality (all researchers should follow these practices):

  • Be consistent and follow other data management best practices, such as data organization and documentation
  • Document any data inconsistencies you encounter
  • Check all datasets for duplicates and errors
  • Use data validation tools (such as those in Excel and other software) where possible
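Checking for duplicates and documenting the inconsistencies found can be scripted in any language. The sketch below uses plain Python with a hypothetical dataset and an ID field named `participant_id`. It reports both IDs that occur more than once and rows that are exact copies, because the two cases usually call for different fixes.

```python
from collections import Counter


def repeated_ids(rows: list[dict], key: str) -> dict:
    """Map each value of `key` that appears more than once to its count."""
    counts = Counter(row[key] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}


def exact_duplicates(rows: list[dict]) -> list[int]:
    """Return the positions of rows that repeat an earlier row field for field."""
    seen, repeats = set(), []
    for i, row in enumerate(rows):
        signature = tuple(sorted(row.items()))
        if signature in seen:
            repeats.append(i)
        seen.add(signature)
    return repeats


# Hypothetical dataset with one exact duplicate and one conflicting entry.
rows = [
    {"participant_id": "P01", "score": 12},
    {"participant_id": "P02", "score": 15},
    {"participant_id": "P01", "score": 12},  # exact copy of row 0
    {"participant_id": "P02", "score": 9},   # same ID, different score
]
# Keep a written log of every inconsistency found, so it can be documented.
log = [f"ID {pid} appears {n} times" for pid, n in repeated_ids(rows, "participant_id").items()]
log += [f"row {i} is an exact duplicate" for i in exact_duplicates(rows)]
print("\n".join(log))
```

An exact copy can usually be dropped safely. A repeated ID with conflicting values needs to be checked against the original records before either entry is kept.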

Advanced methods to ensure data quality (these may be useful in more computationally focused research):

  • Establish processes to routinely inspect small subsets of your data
  • Perform statistical validation using software and/or programming languages
  • Use data validation applications at point of deposit in a data repository

Additional Resources for Data Validation

Data validation and quality assurance are often discipline-specific, and expectations and standards may vary. To learn more about data validation and data quality assurance, consider the information from the following U.S. government entities producing large amounts of public data:

  • U.S. Census Bureau Information Quality Guidelines
  • U.S. Geological Survey Data-Quality Management
  • Last Updated: Sep 27, 2023 1:15 PM
  • URL: https://guides.library.yale.edu/datamanagement


© 2022 Yale University Library • All Rights Reserved

Understanding survey validity and reliability: Key concepts and applications


In the realm of research and decision-making, surveys are indispensable tools. Whether gauging customer satisfaction, validating a new product, or conducting market research, the accuracy and consistency of survey results are paramount. This is where the concepts of survey validity and reliability come into play. Ensuring a survey is both valid and reliable is crucial for obtaining meaningful insights that can guide actions. In this blog post, we’ll explore what survey validity and reliability mean, why they are essential, and how to ensure your surveys meet these criteria.

What is survey validity?

Survey validity refers to the degree to which a survey measures what it is intended to measure. It concerns the accuracy and truthfulness of the results. A valid survey accurately reflects the reality it aims to capture, providing trustworthy data that can successfully inform decisions. Without validity, a study’s results can be misleading, leading to incorrect conclusions and potentially costly mistakes.

Survey validity is crucial because it ensures the accuracy of the data collected. When valid, survey results can be trusted to reflect the actual opinions or behaviors of respondents. Here are the key types of validity:

  • Content validity: This assesses whether a survey covers the full range of the concept it aims to measure. For example, a market validation survey designed to gauge customer interest in a new product must include questions that cover all aspects of the product’s features, benefits, and potential drawbacks.
  • Concurrent validity: The extent to which results correlate with other measures taken at the same time. For instance, the results of a customer satisfaction survey should align with sales data or customer retention rates.
  • Predictive validity: The extent to which results predict future outcomes. For example, a product validation survey should be able to predict future sales performance based on current customer feedback.
  • Construct validity: This evaluates whether a survey truly measures the theoretical construct it is intended to capture. For example, when surveying customer loyalty, questions should accurately reflect the components of loyalty, such as repeat purchases, brand advocacy, and emotional attachment.

To illustrate how these types of validity apply in practice, let’s look at market and customer validation surveys. A market validation survey might include questions about potential customers’ interest in a new product, willingness to pay for it, and preferences compared to existing products. On the other hand, a customer validation survey might focus on existing customers’ experiences with a product or service to identify strengths and areas for improvement.

What is survey reliability: Definition and significance

Survey reliability refers to the consistency of results over time. A reliable survey will yield the same results under consistent conditions, indicating that the data is dependable. Without reliability, even a valid survey can produce erratic results that undermine confidence in the findings.

Reliability is fundamental because it ensures that results are repeatable and consistent. Reliable data allows researchers to be confident that findings are stable and not influenced by external factors. Here are the primary types of reliability:

  • Test-retest reliability: This measures the consistency of survey results over time. Researchers can assess whether results are stable and consistent by administering the same survey to the same group of people at different times.
  • Internal consistency reliability: This assesses whether the items in a survey meant to measure the same concept produce similar results. Cronbach’s alpha is often used to evaluate internal consistency by analyzing the correlation between different survey items.

Reliability is critical in online surveys, where respondents' interpretation of questions may vary widely due to different contexts or distractions. Ensuring high reliability in online surveys helps obtain consistent and credible data. For product validation surveys, reliability ensures consistent feedback on product features and usability, allowing for better decision-making.

Assessing survey validity and reliability

To ensure that a questionnaire is valid and reliable, it’s essential to use appropriate assessment methods. This involves evaluating the survey’s design, its questions, and the data collected to ensure it meets necessary standards.


Assessing validity involves several techniques to ensure that the survey accurately measures the intended concept:

  • Face validity: This is a preliminary check to see if the survey appears to measure what it is supposed to measure. Although subjective, it’s an essential first step in validating a survey.
  • Concurrent validity: As mentioned earlier, this involves comparing the survey results with other relevant measures taken simultaneously to ensure they align.
  • Predictive validity: This consists of evaluating whether the survey can accurately predict future outcomes based on current responses.

Evaluating reliability requires methods that ensure the survey results are consistent:

  • Split-half method: This involves dividing the survey into two halves and comparing the results. If the results are similar, the survey has high internal consistency.
  • Cronbach’s alpha: This statistical measure evaluates the correlation between different items on the survey. A higher alpha indicates greater internal consistency and, therefore, higher reliability.

Imagine you’re conducting a market validation survey for a new tech gadget.

  • To ensure content validity, include questions about various features, potential use cases, and price points.
  • To assess criterion-related validity, compare responses with existing market data on similar products.
  • For construct validity, check that questions accurately reflect customer interest and purchase intentions.
  • To evaluate reliability, administer the survey to a sample group, then re-administer it after a few weeks to check test-retest reliability.
  • Finally, use Cronbach’s alpha to assess internal consistency, ensuring that questions about different features produce consistent responses.

Online surveys present unique challenges, such as varying respondent interpretations, distractions, and technical issues. To mitigate these challenges and ensure validity and reliability, consider the following strategies:

  • Clear and concise questions: Ensure that survey questions are straightforward and easy to understand. Avoid ambiguous language that could be interpreted differently.
  • Pilot testing: Conduct a pilot test with a small, representative sample to identify issues with question clarity or survey structure.
  • Consistent survey environment: Ensure that respondents complete the survey under similar conditions. This could involve specifying a time limit or providing instructions to minimize distractions.
  • Randomization: Randomize question order to reduce question order bias, which occurs when the sequence of questions influences responses.
  • Follow-up surveys: Use follow-up surveys to assess test-retest reliability, ensuring consistent results over time.
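Randomizing question order is straightforward to implement, but it helps to make each respondent's order reproducible so responses can later be analysed by position. This sketch makes several assumptions: the question bank is hypothetical, and seeding a per-respondent generator from a study seed plus the respondent ID is one possible design choice, not a feature of any particular survey platform.

```python
import random

# Hypothetical question bank for a product validation survey.
QUESTIONS = [
    "How easy was the product to set up?",
    "How often do you use the product?",
    "How satisfied are you with the price?",
    "How likely are you to recommend the product?",
    "How well does the product meet your needs?",
]


def question_order(respondent_id: str, questions: list[str], study_seed: int = 2024) -> list[str]:
    """Return a shuffled copy of `questions` that is fixed for each respondent.

    Seeding with the study seed plus the respondent ID makes the order
    reproducible, so each respondent's sequence can be reconstructed at analysis.
    """
    rng = random.Random(f"{study_seed}-{respondent_id}")
    order = list(questions)
    rng.shuffle(order)
    return order


for rid in ("R001", "R002", "R003"):
    print(rid, [QUESTIONS.index(q) + 1 for q in question_order(rid, QUESTIONS)])
```

Because the shuffle never modifies `QUESTIONS` itself, the same bank can be reused for the follow-up survey used to check test-retest reliability.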

Suppose a company launches a new software product. They conduct an online product validation survey.

  • To ensure content validity, comprehensive questions about functionality, user experience, and pricing are included.
  • To ensure reliability, they randomize question order and conduct a pilot test.
  • Cronbach’s alpha is used to assess internal consistency to ensure consistent responses about different features.

Survey validity and reliability are foundational to conducting effective research and making informed decisions. Validity ensures that a survey measures what it is intended to measure, while reliability ensures that results are consistent and dependable. Understanding and applying these concepts means creating surveys that provide accurate and trustworthy data that will guide the correct actions and decisions.

Now that you have a solid understanding of survey validity and reliability, it’s time to put these principles into practice. We invite you to try our survey tool to design surveys that deliver accurate and dependable insights. Our platform is designed to help you create highly valid and reliable surveys, offering features like customizable question formats, survey result filtering, a survey length estimator , and more.

Don’t leave the success of research to chance—experience the difference a well-designed survey can make. Sign up today for a free trial and see how our tools can help you achieve more reliable results and confidently make informed decisions.



Society for Epidemiologic Research

Article Contents

Predictive validity of the polysocial score, unraveling interplay: genetics, lifestyle, and social environment, explaining health disparities, future directions, conflict of interest, data availability.


Invited commentary: is the polysocial score approach valuable for advancing social determinants of health research?



Chenkai Wu, Invited commentary: is the polysocial score approach valuable for advancing social determinants of health research?, American Journal of Epidemiology , Volume 193, Issue 9, September 2024, Pages 1301–1304, https://doi.org/10.1093/aje/kwae057


Social determinants of health encompass the social environmental factors and lived experiences that collectively shape an individual’s health. Recently, the polysocial score approach has been introduced as an innovative method for capturing the cumulative impact of a broad spectrum of social factors. This approach offers a promising opportunity to complement and enhance conventional methodologies in the advancement of research on social determinants of health. In this issue of the Journal, Jawadekar et al (Am J Epidemiol. 2024;193(9):1296-1300) evaluate the value of the polysocial score for predicting cognitive performance and mortality among middle-aged and older adults. Models built on a smaller set of social determinants, including race/ethnicity, sex, and education, performed comparably to polysocial score models that included a more complex set of social factors. In this invited commentary, I evaluate the predictive ability of the polysocial score and discuss its merits and limitations. I also summarize the practical utility of the polysocial score in predicting health outcomes. In addition, I discuss its mechanistic significance for unveiling how genetics, social environment, and lifestyle jointly shape an individual’s health and for elucidating health disparities. Lastly, I propose several avenues for future research.

This article is linked to “A critique and examination of the polysocial risk score approach: predicting cognition in the Health and Retirement Study” ( https://doi.org/10.1093/aje/kwae074 ).

Editor’s note: The opinions expressed in this article are those of the author and do not necessarily reflect the views of the American Journal of Epidemiology.

Social determinants of health (SDOH) encompass the social environmental factors and lived experiences that collectively shape an individual’s health. The impact of social factors on health has long been recognized, and SDOH have become an increasingly high-priority research area. The prevailing approach in this field concentrates on evaluating the causal influence of specific social factors. Recently, the polysocial score approach has been introduced as an innovative method for comprehensively quantifying the cumulative influence of a wide range of social factors. This approach presents a promising opportunity to complement and enhance conventional methodologies, thereby advancing research in the SDOH domain. In this invited commentary, I assess the predictive capacity of the polysocial score and discuss its strengths and limitations. Additionally, I summarize the practical and mechanistic significance of the polysocial score approach and explore future research directions, calling for a more collaborative, integrated, and interdisciplinary approach to advancing SDOH research.

Ping et al 1 conducted a pioneering study to create a polysocial score and rigorously assess its predictive validity for all-cause mortality among over 7000 older adults in the Health and Retirement Study (HRS), one of the first empirical applications of the polysocial score approach. The polysocial score was built from 14 social factors selected by stepwise regression from 24 candidate social factors spanning 5 domains: economic stability, neighborhood and physical environment, education, community and social context, and health-care system. The results suggested that the polysocial score stratified social risk satisfactorily and outperformed a reference model including age, sex, and race/ethnicity in predicting 5-year mortality. Subsequently, a growing body of research has investigated the predictive validity of the polysocial score for a range of health outcomes across diverse study populations. Using data on over 160 000 adults from the 2013-2017 National Health Interview Survey, Javed et al 2 selected 7 SDOH from 38 factors to construct a polysocial score capturing the aggregated social risk of atherosclerotic cardiovascular disease. Individuals with the highest polysocial score had a nearly 4-fold greater prevalence of cardiovascular disease than those with the lowest polysocial score. More importantly, adding the polysocial score meaningfully improved the discrimination of a base model that included age, sex, race/ethnicity, and traditional clinical risk factors. The same research group conducted a follow-up study to examine how the polysocial score related to financial toxicity, that is, excess financial strain from health care. 3 The prevalence of financial toxicity among individuals in the lowest quartile of the polysocial score exceeded that among individuals in the highest quartile by over 4-fold (68% vs 15%). Evidence supporting the predictive value of the polysocial score has also emerged from developing countries. Chen et al 4 found that a polysocial score constructed from 9 SDOH was strongly associated with the risk of incident rosacea among nearly 4000 adults from 5 cities in Hunan province, China.
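Claims that a score "improved the discrimination" of a model usually refer to the C-statistic, which for a binary outcome equals the area under the ROC curve. It is the probability that a randomly chosen person who experienced the outcome has a higher predicted risk than a randomly chosen person who did not. The sketch below illustrates the binary-outcome version only and ignores censoring; time-to-event analyses typically use a survival-adapted C such as Harrell's. The scores, outcomes, and the `c_statistic` helper are invented and are not drawn from the cited studies.

```python
from itertools import product


def c_statistic(risk_scores: list[float], outcomes: list[int]) -> float:
    """Share of (case, non-case) pairs in which the case has the higher score.

    Tied scores count as half-concordant. Outcomes are coded 1 = event, 0 = no event.
    """
    cases = [s for s, y in zip(risk_scores, outcomes) if y == 1]
    non_cases = [s for s, y in zip(risk_scores, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case, non_case in product(cases, non_cases):
        if case > non_case:
            concordant += 1.0
        elif case == non_case:
            concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


# Hypothetical data for six people: outcome, a reference-model risk, and a
# polysocial-style score (e.g. a count of unfavorable social factors).
died = [1, 0, 1, 0, 0, 1]
reference_risk = [0.30, 0.25, 0.20, 0.10, 0.35, 0.40]
polysocial = [7, 3, 6, 2, 4, 8]

print(f"reference C = {c_statistic(reference_risk, died):.3f}")
print(f"polysocial C = {c_statistic(polysocial, died):.3f}")
```

A C of 0.5 means no better than chance, and 1.0 means perfect separation. With real data, the gain from adding a score is usually judged on held-out data, since in-sample C-statistics tend to be optimistic.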

In this issue of the Journal, Jawadekar et al 5 evaluate the value of the polysocial score for improving the prediction of cognitive performance and all-cause mortality. Using data from over 13 000 middle-aged and older adults in the HRS, they built a series of models of varying complexity to predict cognitive decline and all-cause mortality over a 10-year period. They found that models built on a smaller set of social determinants, including race/ethnicity, sex, and education, performed comparably to polysocial score models that included a more complex set of social factors for predicting both outcomes. Studies have varied in the combination of SDOH, study outcomes, population characteristics, reference models, length of follow-up, and research settings. Given these differences, it is not surprising that researchers have reported somewhat inconsistent results regarding the predictive value of the polysocial score.

It is well recognized that many common complex diseases and syndromes result from both genetic and environmental risk factors. Numerous studies have demonstrated that genetic variants explain only a small portion of individual variation in the risk of complex diseases. 6 Considerable evidence suggests that individuals with favorable social determinants have a lower risk of adverse health outcomes such as Alzheimer disease and cardiovascular disease, as well as disability and death. Inherited genetic risk for disease is not easily modifiable. A favorable social environment, by contrast, is potentially modifiable and may offset the risk that disease-associated genetic variants confer for complex diseases and syndromes. In addition to the social environment, lifestyle represents another cluster of modifiable factors that could reduce genetic risk for complex diseases. Little is known about whether the social environment and lifestyle act synergistically to modify the association between genetic predisposition and disease risk. The polysocial score provides a means to quantify the cumulative effect of social factors. It therefore presents an opportunity to dissect the complex interplay of genetic predisposition, social environment, and healthy lifestyle in shaping an individual’s health.

Using data from over 300 000 middle-aged and older adults in the UK Biobank Study, Zhao et al 7 created a polysocial score based on 12 SDOH. They examined its interactions with genetic susceptibility and healthy lifestyles in the development of incident type 2 diabetes. They found that adherence to a healthy lifestyle could substantially mitigate the adverse influence of an unfavorable social environment on incident type 2 diabetes. The additive interaction between the polysocial score and the genetic risk score explained about 15% of incident type 2 diabetes cases, suggesting that disadvantaged social status could exacerbate the detrimental effect of genetic risk. Similar findings were reported in a more recent study focusing on cardiovascular disease. Tang et al 8 examined the joint effect of genetics and social environment on incident myocardial infarction among more than 5000 adults aged ≥65 years in the HRS. A disadvantaged social environment, indicated by a low polysocial score, was linked to an increased risk of incident myocardial infarction among White participants with intermediate and high genetic risk. No such association was observed among those with low genetic risk. These findings suggest that living in a favorable social environment holds particular significance for individuals with intermediate and high genetic risk. Taken together, these studies demonstrate that the polysocial score approach provides new insights into disentangling the synergistic effects of genetics, social factors, and healthy lifestyles on health and disease development.
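Additive interaction is commonly quantified with two measures: the relative excess risk due to interaction (RERI) and the attributable proportion due to interaction (AP). Let RR11 be the relative risk for people exposed to both factors, and RR10 and RR01 the relative risks for each factor alone, all relative to people exposed to neither. Then RERI = RR11 - RR10 - RR01 + 1 and AP = RERI / RR11. The sketch assumes the relative risks (or hazard ratios used as approximations) have already been estimated. The values and the `additive_interaction` helper are hypothetical, not taken from the cited studies. The text does not state which additive-interaction measure underlies the 15% figure.

```python
def additive_interaction(rr_both: float, rr_a_only: float, rr_b_only: float) -> tuple[float, float]:
    """Return (RERI, AP) for two binary exposures.

    All relative risks are relative to the group exposed to neither factor.
    RERI > 0 indicates positive (super-additive) interaction.
    """
    reri = rr_both - rr_a_only - rr_b_only + 1.0
    ap = reri / rr_both  # share of the doubly exposed group's risk due to interaction
    return reri, ap


# Hypothetical relative risks: unfavorable social environment only, high genetic
# risk only, and both, each versus favorable environment with low genetic risk.
reri, ap = additive_interaction(rr_both=3.0, rr_a_only=1.5, rr_b_only=1.8)
print(f"RERI = {reri:.2f}, AP = {ap:.1%}")
```

In real analyses, confidence intervals for RERI and AP are typically obtained by the delta method or bootstrapping, because the point estimates alone can be unstable.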

Since its inception, the polysocial score has predominantly been regarded as best suited for risk stratification because it captures the aggregated effects of a variety of social factors. As research has advanced, however, an increasingly diverse range of applications has emerged. Racial and ethnic disparities in health persist as a significant and enduring public health concern, necessitating concerted efforts and actions at both the societal and health-care system levels. These disparities extend beyond health-care accessibility and are intricately interconnected with SDOH. 9 Previous research has demonstrated the importance of social factors in contributing to these disparities. 10, 11 However, most prior studies have not adequately addressed the multifaceted, numerous, and frequently interconnected nature of these social factors.

Several studies have begun to address this issue. Wu et al 12 used HRS data to examine the incidence rate of dementia in 3 racial and ethnic groups (Hispanic, non-Hispanic Black, and non-Hispanic White) according to the favorability of the social environment, as measured by a polysocial score. They found an additive interaction between race/ethnicity and the polysocial score. Among individuals with a low polysocial score, the dementia rate in non-Hispanic Black individuals was substantially higher than in their Hispanic and non-Hispanic White counterparts. Among individuals with a high polysocial score, these differences were largely attenuated and no longer statistically significant. A comparable pattern was observed in a subsequent study that aimed to elucidate racial and ethnic disparities in disability in activities of daily living. 13 Hispanic and non-Hispanic Black older adults with a low polysocial score had a nearly 6% higher risk of incident disability in activities of daily living over 2 years than non-Hispanic White individuals. The trend was reversed among individuals with intermediate and high polysocial scores. These findings imply that racial and ethnic health disparities could be mitigated through tailored interventions that comprehensively improve the social environment.

The polysocial score is one of the first attempts to capture the cumulative impact of a broad spectrum of multifaceted and interrelated SDOH. Conceptual frameworks would help encourage adoption of this approach and establish objectives and priorities for future research agendas. Such frameworks should describe the external interactions between SDOH and other health determinants (eg, lifestyles). They should also describe the internal relationships among SDOH across different domains (eg, economic stability and health-care system) and levels (eg, family and community). Figure 1 presents a conceptual framework that illustrates, through a life-course lens, the hierarchy linking SDOH to institutional, systemic, and structural racism, which serves as the fundamental driver of health disparities. Because the polysocial score reflects the consequences and processes of this underlying mechanism, it is imperative to develop theoretically grounded and methodologically robust approaches capable of accurately quantifying institutional, systemic, and structural racism. In addition, although the polysocial score was initially constructed for middle-aged and older populations, it may be applicable across different life stages. Characterizing trajectories of the polysocial score and investigating how changes in the score affect health would offer novel insights into the value of improving the social environment to promote health and mitigate health disparities. Lastly, it is crucial to assess the overall quality of the social environment for individuals in low- and middle-income countries and for marginalized populations, such as institutionalized and homeless persons.

Figure 1. Conceptual framework of an integrated approach for social determinants of health research.


In sum, the polysocial score approach offers a promising opportunity to complement and enhance conventional methodologies in the advancement of SDOH research. While we have garnered insights into the practical and mechanistic values of the polysocial score approach, collaborative, integrated, and interdisciplinary research initiatives are essential to facilitate its integration into a more comprehensive research framework.

This work was supported by the Jiangsu Provincial Department of Education (grant 22KJB320010), Jiangsu, China.

The author declares no conflicts of interest.

The views expressed in this article are those of the author and do not reflect those of the Jiangsu Provincial Department of Education.

No original data were used.

1. Ping Y, Oddén MC, Stawski RS, et al. Creation and validation of a polysocial score for mortality among community-dwelling older adults in the USA: the Health and Retirement Study. Age Ageing. 2021;50(6):2214–2221. https://doi.org/10.1093/ageing/afab174

2. Javed Z, Valero-Elizondo J, Dudum R, et al. Development and validation of a polysocial risk score for atherosclerotic cardiovascular disease. Am J Prev Cardiol. 2021;8:100251. https://doi.org/10.1016/j.ajpc.2021.100251

3. Valero-Elizondo J, Javed Z, Khera R, et al. Unfavorable social determinants of health are associated with higher burden of financial toxicity among patients with atherosclerotic cardiovascular disease in the US: findings from the National Health Interview Survey. Arch Public Health. 2022;80(1):248. https://doi.org/10.1186/s13690-022-00987-z

4. Chen P, Yang Z, Fan Z, et al. Associations of polysocial risk score with incident rosacea: a prospective cohort study of government employees in China. Front Public Health. 2023;11:1096687. https://doi.org/10.3389/fpubh.2023.1096687

5. Jawadekar N, Zimmerman S, Lu P, et al. A critique and examination of the polysocial risk score approach: predicting cognition in the Health and Retirement Study. Am J Epidemiol. 2024;193(9):1296–1300. https://doi.org/10.1093/aje/kwae074

6. Thomas D. Gene–environment-wide association studies: emerging approaches. Nat Rev Genet. 2010;11(4):259–272. https://doi.org/10.1038/nrg2764

7. Zhao Y, Li Y, Zhuang Z, et al. Associations of polysocial risk score, lifestyle and genetic factors with incident type 2 diabetes: a prospective cohort study. Diabetologia. 2022;65(12):2056–2065. https://doi.org/10.1007/s00125-022-05761-y

8. Tang J, Sheng C, Wu YY, et al. Association of joint genetic and social environmental risks with incident myocardial infarction: results from the Health and Retirement Study. J Am Heart Assoc. 2023;12(6):e028200. https://doi.org/10.1161/JAHA.122.028200

9. Bailey ZD, Krieger N, Agénor M, et al. Structural racism and health inequities in the USA: evidence and interventions. Lancet. 2017;389(10077):1453–1463. https://doi.org/10.1016/S0140-6736(17)30569-X

10. Yaffe K, Falvey C, Harris TB, et al. Effect of socioeconomic disparities on incidence of dementia among biracial older adults: prospective study. BMJ. 2013;347(5):f7051. https://doi.org/10.1136/bmj.f7051

11. Mayeda ER, Glymour MM, Quesenberry CP, et al. Inequalities in dementia incidence between six racial and ethnic groups over 14 years. Alzheimers Dement. 2016;12(3):216–224. https://doi.org/10.1016/j.jalz.2015.12.007

12. Wu C, Ping Y, Chen X, et al. Deciphering racial and ethnic disparities in dementia and cognitive function: a polysocial score approach [abstract]. Innovation Aging. 2022;6(suppl 1):1. https://doi.org/10.1093/geroni/igac059.001

13. Tang J, Chen Y, Liu H, et al. Examining racial and ethnic differences in disability among older adults: a polysocial score approach. Maturitas. 2023;172:1–8. https://doi.org/10.1016/j.maturitas.2023.03.010

  • ethnic group
  • middle-aged adult
  • social environment
  • older adult
  • health disparity
  • social determinants of health
  • social factors


  • Online ISSN 1476-6256
  • Print ISSN 0002-9262
  • Copyright © 2024 Johns Hopkins Bloomberg School of Public Health


On the validity of measuring change over time in routine clinical assessment: a close examination of item-level response shifts in psychosomatic inpatients

Research output : Contribution to journal › Article › Research › peer-review

Objective: Significant life events such as severe health status changes or intensive medical treatment often trigger response shifts that may hamper the comparison of measurements over time. Drawing on the Oort model, this study aims to detect response shift at the item level in psychosomatic inpatients and to evaluate its impact on the validity of comparing repeated measurements. Study design and setting: Complete pretest and posttest data were available from 1188 patients who had completed the ICD-10 Symptom Rating (ISR) scale at admission and at discharge, on average 24 days after intake. Reconceptualization, reprioritization, and recalibration response shifts were explored by applying tests of measurement invariance. In the item-level approach, all model parameters were constrained to be equal between pretest and posttest. Any detected non-invariance was linked to the corresponding type of response shift. Results: Constraining model parameters across occasions worsened model fit, as indicated by a significant Satorra–Bentler chi-square difference test, suggesting the potential presence of response shifts. Closer examination revealed two types of response shift, namely (non)uniform recalibration and both higher- and lower-level reconceptualization, which led to four model adjustments. Conclusions: Our analyses suggest that psychosomatic inpatients experienced some response shifts during their hospital stay. According to the hierarchy of measurement invariance, however, only one of the detected non-invariances is critical for unbiased mean comparisons over time, and it did not have a substantial impact on estimating change. Hence, the ISR can be recommended for outcomes assessment in routine clinical practice, as change score estimates do not seem to be hampered by response shift effects.
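The Satorra–Bentler scaled chi-square difference test named in the abstract compares a constrained model with a less constrained one when data are non-normal. Here the constrained model holds parameters equal across pretest and posttest. Below is a minimal sketch of the Satorra and Bentler (2001) formula. For each model it assumes three inputs from the SEM software: the ordinary (unscaled) maximum likelihood chi-square T, its degrees of freedom d, and the scaling correction factor c. If only the scaled statistic is reported, the ordinary chi-square equals the scaled value multiplied by c. The input values and the `sb_scaled_difference` helper are hypothetical and are not taken from this study.

```python
def sb_scaled_difference(
    t0: float, d0: int, c0: float, t1: float, d1: int, c1: float
) -> tuple[float, int]:
    """Satorra-Bentler (2001) scaled chi-square difference test.

    Model 0 is the nested, more constrained model (more df); model 1 is the
    comparison model. t = unscaled ML chi-square, d = df, c = scaling factor.
    Returns the scaled difference statistic and its degrees of freedom.
    """
    if d0 <= d1:
        raise ValueError("model 0 must be the nested model with more df")
    cd = (d0 * c0 - d1 * c1) / (d0 - d1)  # scaling factor for the difference
    if cd <= 0:
        # Can happen in small samples; Satorra and Bentler (2010) give a fix.
        raise ValueError("non-positive difference scaling factor")
    return (t0 - t1) / cd, d0 - d1


# Hypothetical fit results for an invariant (0) and a freely estimated (1) model.
trd, df = sb_scaled_difference(t0=100.0, d0=50, c0=1.2, t1=80.0, d1=45, c1=1.1)
print(f"scaled difference = {trd:.2f} on {df} df")
```

Compare the statistic with a chi-square distribution on `df` degrees of freedom, for example via `scipy.stats.chi2.sf(trd, df)` if SciPy is available.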

Original language: English
Pages (from-to): 1339-1347
Number of pages: 9
Journal: Quality of Life Research
Volume: 25
Issue number: 6
DOIs: 10.1007/s11136-015-1123-3
Publication status: Published - Jun 2016
Externally published: Yes
  • Measurement bias
  • Measurement invariance
  • Psychosomatic medicine
  • Response shift
  • Routine monitoring

Access to Document

  • 10.1007/s11136-015-1123-3


T1 - On the validity of measuring change over time in routine clinical assessment

T2 - a close examination of item-level response shifts in psychosomatic inpatients

AU - Nolte, S.

AU - Mierke, A.

AU - Fischer, H. F.

AU - Rose, M.

N1 - Publisher Copyright: © 2015, Springer International Publishing Switzerland.

PY - 2016/6

Y1 - 2016/6


KW - Evaluation

KW - Measurement bias

KW - Measurement invariance

KW - Psychosomatic medicine

KW - Response shift

KW - Routine monitoring

UR - http://www.scopus.com/inward/record.url?scp=84941346588&partnerID=8YFLogxK

U2 - 10.1007/s11136-015-1123-3

DO - 10.1007/s11136-015-1123-3

M3 - Article

C2 - 26353906

AN - SCOPUS:84941346588

SN - 0962-9343

JO - Quality of Life Research

JF - Quality of Life Research


Speaker 1: Welcome to this overview of quantitative research methods. This tutorial will give you the big picture of quantitative research and introduce key concepts that will help you determine if quantitative methods are appropriate for your project study. First, what is educational research? Educational research is a process of scholarly inquiry designed to investigate the process of instruction and learning, the behaviors, perceptions, and attributes of students and teachers, the impact of institutional processes and policies, and all other areas of the educational process. The research design may be quantitative, qualitative, or a mixed methods design. The focus of this overview is quantitative methods. The general purpose of quantitative research is to explain, predict, investigate relationships, describe current conditions, or examine possible impacts or influences on designated outcomes. Quantitative research differs from qualitative research in several ways. It works to achieve different goals and uses different methods and designs. This table illustrates some of the key differences. Qualitative research generally uses a small sample to explore and describe experiences through the use of thick, rich descriptions of detailed data in an attempt to understand and interpret human perspectives. It is less interested in generalizing to the population as a whole. For example, when studying bullying, a qualitative researcher might learn about the experience of the victims and the experience of the bully by interviewing both bullies and victims and observing them on the playground. Quantitative studies generally use large samples to test numerical data by comparing or finding correlations among sample attributes so that the findings can be generalized to the population. 
If quantitative researchers were studying bullying, they might measure the effects of a bully on the victim by comparing students who are victims and students who are not victims of bullying using an attitudinal survey. In conducting quantitative research, the researcher first identifies the problem. For Ed.D. research, this problem represents a gap in practice. For Ph.D. research, this problem represents a gap in the literature. In either case, the problem needs to be of importance in the professional field. Next, the researcher establishes the purpose of the study. Why do you want to do the study, and what do you intend to accomplish? This is followed by research questions that help focus the study. Once the study is focused, the researcher needs to review both seminal works and current peer-reviewed primary sources. Based on the research question and on a review of prior research, a hypothesis is created that predicts the relationship between the study's variables. Next, the researcher chooses a study design and methods to test the hypothesis. These choices should be informed by a review of methodological approaches used to address similar questions in prior research. Finally, appropriate analytical methods are used to analyze the data, allowing the researcher to draw conclusions and inferences about the data and answer the research question that was originally posed. In quantitative research, research questions are typically descriptive, relational, or causal. Descriptive questions constrain the researcher to describing what currently exists. With a descriptive research question, one can examine perceptions or attitudes as well as more concrete variables such as achievement. For example, one might describe a population of learners by gathering data on their age, gender, socioeconomic status, and attitudes toward their learning experiences. Relational questions examine the relationship between two or more variables. 
The X variable has some linear relationship to the Y variable. Causal inferences cannot be made from this type of research. For example, one could study the relationship between students' study habits and achievement. One might find that students using certain kinds of study strategies demonstrate greater learning, but one could not state conclusively that using certain study strategies will lead to or cause higher achievement. Causal questions, on the other hand, are designed to allow the researcher to draw a causal inference. A causal question seeks to determine if a treatment variable in a program had an effect on one or more outcome variables. In other words, the X variable influences the Y variable. For example, one could design a study that answers the question of whether a particular instructional approach caused students to learn more. The research question serves as a basis for posing a hypothesis, a predicted answer to the research question that incorporates operational definitions of the study's variables and is rooted in the literature. An operational definition matches a concept with a method of measurement, identifying how the concept will be quantified. For example, in a study of instructional strategies, the hypothesis might be that students of teachers who use Strategy X will exhibit greater learning than students of teachers who do not. In this study, one would need to operationalize learning by identifying a test or instrument that would measure learning. This approach allows the researcher to create a testable hypothesis. Relational and causal research relies on the creation of a null hypothesis, a version of the research hypothesis that predicts no relationship between variables or no effect of one variable on another. When writing the hypothesis for a quantitative question, the null hypothesis and the research or alternative hypothesis use parallel sentence structure. 
In this example, the null hypothesis states that there will be no statistically significant difference between groups, while the research or alternative hypothesis states that there will be a statistically significant difference between groups. Note also that both hypothesis statements operationalize the critical thinking skills variable by identifying the measurement instrument to be used. Once the research questions and hypotheses are solidified, the researcher must select a design that will create a situation in which the hypotheses can be tested and the research questions answered. Ideally, the research design will isolate the study's variables and control for intervening variables so that one can be certain of the relationships being tested. In educational research, however, it is extremely difficult to establish sufficient controls in the complex social settings being studied. Consider our example of investigating the impact of a certain instructional strategy on student achievement: each day, the teacher uses a specific instructional strategy in the classroom. After school, some of the students in her class receive tutoring. Other students have parents who are very involved in their child's academic progress and provide learning experiences at home. These students may do better because they received extra help, not because the teacher's instructional strategy is more effective. Unless the researcher can control for the intervening variable of extra help, it will be impossible to test the study's hypothesis effectively.

Quantitative research designs can fall into two broad categories: experimental and quasi-experimental. Classic experimental designs randomly assign subjects to either a control group or a treatment group. The researcher can then compare the treatment group to the control group to test for an intervention's effect; this is known as a between-subjects design.
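The random-assignment step in a classic experimental design can be sketched in a few lines of Python. This is a minimal illustration that assumes a simple two-group design with roughly equal group sizes. The function name `randomly_assign` and the student IDs are hypothetical, and a real study would follow its own documented assignment protocol.

```python
import random


def randomly_assign(subjects, seed=None):
    """Randomly split subjects into (control, treatment) groups.

    A copy of the list is shuffled, so the caller's list is not modified.
    With an odd number of subjects, the treatment group gets the extra one.
    """
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical student IDs S01..S10
students = [f"S{i:02d}" for i in range(1, 11)]
control, treatment = randomly_assign(students, seed=42)
print("control:  ", control)
print("treatment:", treatment)
```

Every subject has the same chance of landing in either group. Pre-existing differences, such as after-school tutoring, are therefore expected to balance out across the groups on average. A quasi-experimental design that uses intact classes cannot guarantee this.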
It is important to note that the control group may receive a standard treatment or no treatment at all. Quasi-experimental designs do not randomly assign subjects to groups. Instead, they take advantage of existing groups. A researcher can still have a control group and a treatment group, but assignment to the groups is not random. In fact, a control group is not required. The researcher may instead choose a design in which a single group is pre- and post-tested, known as a within-subjects design, or one in which a single group receives only a post-test. Because quasi-experimental designs lack random assignment, the researcher should be aware of the resulting threats to validity.

Educational research often attempts to measure abstract variables such as attitudes, beliefs, and feelings. Surveys can capture data about these hard-to-measure variables, as well as other self-reported information such as demographic factors. A survey is an instrument used to collect verifiable information from a sample population. In quantitative research, surveys typically include questions that ask respondents to choose a rating from a scale, select one or more items from a list, or give other responses that result in numerical data. Studies that use surveys or tests need to include strategies that establish the validity of the instrument used. There are many types of validity that need to be addressed:

  • Face validity. Does the test appear, at face value, to measure what it is supposed to measure?
  • Content validity. Content validity includes both item validity and sampling validity. Item validity ensures that the individual test items deal only with the subject being addressed. Sampling validity ensures that the range of item topics is appropriate to the subject being studied. For example, item validity might be high, but if all the items deal with only one aspect of the subject, then sampling validity is low. Content validity can be established by having experts in the field review the test.
  • Concurrent validity. Does a new test correlate with an older, established test that measures the same thing?
  • Predictive validity. Does the test correlate with a related measure taken later? For example, many colleges use the GRE because they believe that a good score on this test increases the probability that the student will do well at the college. Linear regression can be used to establish the predictive validity of a test.
  • Construct validity. Does the test measure the construct it is intended to measure? Establishing construct validity can be difficult when the constructs being measured are abstract. It can be established by conducting a number of studies in which you test hypotheses regarding the construct. It can also be established by conducting a factor analysis to confirm that the instrument measures the number of constructs you say it does.

In addition to ensuring the validity of instruments, the quantitative researcher needs to establish their reliability. Strategies for establishing reliability include:

  • Test-retest. Correlates scores from two different administrations of the same test.
  • Alternate forms. Correlates scores from administrations of two different forms of the same test.
  • Split-half reliability. Treats each half of one test or survey as a separate administration and correlates the results from each.
  • Internal consistency. Uses Cronbach's coefficient alpha, which can be interpreted as the average of all possible split-half reliability estimates.

Quantitative research almost always relies on a sample that is intended to be representative of a larger population. There are two basic sampling strategies, random and non-random, and a number of specific strategies within each of these approaches. This table provides examples of each of the major strategies. The next section of this tutorial provides an overview of the procedures for conducting quantitative data analysis.
There are specific procedures for collecting the data, preparing and analyzing the data, presenting the findings, and connecting them to the body of existing research. This process ensures that the research is conducted as a systematic investigation that leads to credible results. Data come in various shapes and sizes, and it is important to understand them so that the proper analysis can be used. In 1946, S. S. Stevens first described the properties of measurement systems, which determine the type of measurement and the attributes of objects that are preserved in numbers. These four levels of measurement are referred to as nominal, ordinal, interval, and ratio.

First, let's examine nominal data. With nominal data, there is no number value that indicates quantity. Instead, a number has been assigned to represent a certain attribute, such as the number 1 to represent male and the number 2 to represent female. In other words, the number is just a label. You could also assign numbers to represent race, religion, or any other categorical information. Nominal data only denote group membership. With ordinal data, there is again no indication of quantity; rather, a number is assigned to indicate rank order. For example, satisfaction surveys often ask respondents to rank their level of satisfaction with services or programs. The next level of measurement is interval data. With interval data, there are equal distances between values, but there is no natural zero. A common example is the Fahrenheit temperature scale. Differences between temperature measurements make sense, but ratios do not. For instance, 20 degrees Fahrenheit is not twice as hot as 10 degrees Fahrenheit. You can add and subtract interval-level data, but you cannot meaningfully multiply or divide them. Finally, we have ratio data. Ratio data have all the properties of interval data, but ratios between values are also meaningful, so multiplication, division, and other numerical formulas all make sense.
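The Fahrenheit example can be checked numerically. The sketch below converts both temperatures to kelvins, a ratio scale whose zero means an absence of thermal energy. It shows that the apparent 2:1 ratio of the Fahrenheit readings disappears, while the difference between them remains meaningful. The function and variable names are illustrative only.

```python
def fahrenheit_to_kelvin(f):
    """Convert degrees Fahrenheit to kelvins (a ratio scale with a true zero)."""
    return (f - 32) * 5 / 9 + 273.15


warm_f, cold_f = 20.0, 10.0

# Dividing the Fahrenheit readings gives 2.0, but the number is meaningless
# because 0 degrees F is an arbitrary point, not an absence of temperature.
print(warm_f / cold_f)  # 2.0

# On the Kelvin (ratio) scale, 20 F has only about 2% more absolute temperature.
ratio_k = fahrenheit_to_kelvin(warm_f) / fahrenheit_to_kelvin(cold_f)
print(round(ratio_k, 3))  # 1.021

# Differences are meaningful on an interval scale: a 10 F gap equals 50/9 K.
diff_k = fahrenheit_to_kelvin(warm_f) - fahrenheit_to_kelvin(cold_f)
print(round(diff_k, 2))  # 5.56
```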
For ratio data, zero has a logical meaning: it indicates the absence of the quantity being measured. Examples of ratio data are height, weight, speed, or any quantity measured on a scale with a natural zero. In summary, nominal data can only be counted. Ordinal data can be counted and ranked. Interval data can also be added and subtracted, and ratio data can also be used in ratios and other calculations. Determining what type of data you have is one of the most important aspects of quantitative analysis.

Depending on the research question, hypotheses, and research design, the researcher may choose to use descriptive and/or inferential statistics to begin analyzing the data. Descriptive statistics are best illustrated through America's pastimes: sports, weather, the economy, the stock market, and even our retirement portfolios are presented through descriptive analyses. The basic terminology of descriptive statistics is already familiar to most of us: frequency, mean, median, mode, range, variance, and standard deviation. Simply put, you are describing the data. Some of the most common graphic representations of data are bar graphs, pie charts, histograms, and box-and-whisker plots. Inferential statistics go further. They draw conclusions that reach beyond graphic representations or descriptive analyses, such as generalizing from a sample to a population or making causal inferences. For example, examining college enrollment over the past decade in a certain geographical region could help estimate what enrollment might be next year.

Frequently in education, the means of two or more groups are compared. When comparing means to help answer a research question, one can use a within-group, between-group, or mixed-subjects design. In a within-group design, the researcher compares measures of the same subjects across time or under different treatment conditions, hence the name. This can also be referred to as a dependent-group design.
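A within-group (dependent-group) comparison, such as a pretest/post-test study, can be sketched in Python. The example first reports the descriptive statistics named in the tutorial for each measurement occasion. It then computes a paired (dependent-samples) t statistic on each student's gain score. The paired t-test is a common choice for this design, but it is an assumption here because the tutorial does not name a specific test. The scores are invented.

```python
from math import sqrt
from statistics import mean, median, stdev

# Hypothetical pretest/post-test scores for the same six students
pretest = [62, 70, 55, 81, 68, 74]
posttest = [68, 75, 61, 83, 70, 80]

# Descriptive statistics for each measurement occasion
for label, scores in (("pretest", pretest), ("post-test", posttest)):
    print(f"{label:9} mean={mean(scores):.2f} median={median(scores)} "
          f"sd={stdev(scores):.2f} range={max(scores) - min(scores)}")

# Paired (dependent-samples) t statistic computed on the gain scores
gains = [post - pre for pre, post in zip(pretest, posttest)]
mean_gain = mean(gains)
se_gain = stdev(gains) / sqrt(len(gains))  # standard error of the mean gain
t_stat = mean_gain / se_gain
df = len(gains) - 1
print(f"mean gain={mean_gain}, t={t_stat:.2f}, df={df}")  # mean gain=4.5, t=5.58, df=5
```

With df = 5, the two-tailed critical value at the .05 level is about 2.571, so t = 5.58 would be statistically significant for these invented scores. Because the design has no control group, however, this result alone cannot rule out intervening variables such as outside tutoring.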
The most basic example of a within-group design, which is quasi-experimental, is a study in which a researcher pretests a group of students, subjects them to a treatment, and then post-tests them. The group has been measured at different points in time. In a between-group design, subjects are assigned to one of two or more groups (for example, Control, Treatment 1, and Treatment 2). Ideally, both sampling and assignment to groups would be random, which would make this an experimental design. The researcher can then compare the mean of each treatment group with that of the control group, which gives insight into the effects of the treatment. In a mixed-subjects design, the researcher tests for significant differences between two or more independent groups while subjecting them to repeated measures.

Choosing a statistical test to compare groups depends on the number of groups, whether the data are nominal, ordinal, interval, or ratio, and whether the data meet the assumptions for parametric tests. Nonparametric tests are typically used with nominal and ordinal data, while parametric tests use interval- and ratio-level data. Parametric tests also make three further assumptions:

  • the data are normally distributed in the population;
  • participant selection is independent, so the selection of one person does not determine the selection of another;
  • the variances of the groups being compared are equal.

The assumption of independent participant selection cannot be violated, but the others are more flexible. The t-test assesses whether the means of two groups are statistically different from each other. This analysis is appropriate whenever you want to compare the means of two groups, and it is especially appropriate as the method of analysis for a quasi-experimental design. A t-test assumes that the data meet the parametric assumptions.
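The two-group comparison can be sketched as Student's independent-samples t-test with pooled variance. Pooling relies on the equal-variance assumption listed in the tutorial. The function name and the scores below are hypothetical, and computing a p-value (for example, from a t-distribution table or a statistics package) is left out to keep the example dependency-free.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_test(group_a, group_b):
    """Student's two-sample t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Pooling assumes the two population
    variances are equal.
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err, n_a + n_b - 2


# Hypothetical achievement scores
treatment = [78, 85, 82, 88, 90]
control = [72, 75, 80, 70, 78]
t_stat, df = pooled_t_test(treatment, control)
print(f"t = {t_stat:.2f}, df = {df}")  # t = 3.40, df = 8
```

With df = 8, the two-tailed critical value at the .05 level is about 2.306, so these invented data would lead to rejecting the null hypothesis of no difference. When the group variances clearly differ, Welch's t-test, which does not pool the variances, is the usual alternative.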
The analysis of variance, or ANOVA, assesses whether the means of more than two groups are statistically different from each other. Like the t-test, ANOVA assumes that the data meet the parametric assumptions. The chi-square test can be used when you have nominal (categorical) data and want to compare differences between groups. The Kruskal-Wallis test can be used when there are more than two groups and the data do not meet the parametric assumptions.

Correlation analysis is a set of statistical tests that determine whether there are linear relationships between two or more variables measured on the same items or individuals, for example, students' achievement and performance. The tests give a statistical yes-or-no answer on whether a significant relationship or correlation exists between the variables. A correlation test consists of calculating a correlation coefficient between two variables. Again, there are parametric and non-parametric choices, depending on which assumptions the data meet. Pearson's r is widely used in statistics to measure the strength of the relationship between linearly related variables. Spearman's rank correlation is a non-parametric test used to measure the degree of association between two variables. It makes no assumptions about the distribution of the data and is used when the Pearson test would give misleading results. Kendall's tau is often included in this list of non-parametric correlation tests to examine the strength of the relationship when there are fewer than 20 rankings. Linear regression and correlation are similar and often confused, and your methodologist may encourage you to examine both. Calculate a linear correlation if you measured both variables, x and y. Use the Pearson parametric correlation coefficient if you are certain you are not violating the test's assumptions; otherwise, choose the Spearman non-parametric correlation coefficient.
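The correlation and regression calculations can be compared side by side in the following sketch, which uses only Python's standard library. Spearman's rho is computed the standard way, as Pearson's r on the ranks, with tied values given the average of their ranks. The regression line is fitted by ordinary least squares. The study-hours and score data are invented, and the function names are illustrative.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j share ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def least_squares_line(x, y):
    """Return (slope, intercept) of the line predicting y from x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


# Hypothetical weekly study hours and test scores for six students
hours = [2, 4, 5, 7, 8, 10]
scores = [55, 60, 66, 70, 75, 85]

print(f"Pearson r    = {pearson_r(hours, scores):.3f}")     # 0.989
print(f"Spearman rho = {spearman_rho(hours, scores):.3f}")  # 1.000
slope, intercept = least_squares_line(hours, scores)
print(f"score = {intercept:.2f} + {slope:.2f} * hours")     # 46.50 + 3.67
print(f"predicted score for 6 hours: {intercept + slope * 6:.1f}")  # 68.5
```

The regression line supports prediction, but with non-manipulated variables like these it still says nothing about causation.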
If either variable has been manipulated using an intervention, do not calculate a correlation. Like correlation, linear regression indicates the nature of the relationship between two variables. Unlike correlation, it can also be used to make predictions, because one variable is treated as explanatory (independent) and the other as dependent.

Establishing validity is a critical part of quantitative research. In keeping with the nature of quantitative research, there is a defined approach or process for establishing validity, which also supports the transferability of the findings. For a study to be valid, the evidence must support the interpretations of the data, the data must be accurate, and their use in drawing conclusions must be logical and appropriate. Construct validity concerns whether what you did for the program was what you wanted to do, or whether what you observed was what you wanted to observe. In other words, it concerns whether the operationalization of your variables is related to the theoretical concepts you are trying to measure. Are you actually measuring what you want to measure? Internal validity means that you have evidence that what you did in the study (the program) caused what you observed (the outcome) to happen. Conclusion validity is the degree to which conclusions drawn about relationships in the data are reasonable. External validity concerns generalizing: the degree to which the conclusions in your study would hold for other persons in other places and at other times.

Establishing reliability and validity for your study is one of the most critical elements of the research process. Once you have decided to conduct a quantitative study, use the following steps to get started. First, review research studies that have been conducted on your topic to determine what methods were used. Consider the strengths and weaknesses of the various data collection and analysis methods.
Next, review the literature on quantitative research methods. Every aspect of your research has a body of literature associated with it. Just as you would not confine yourself to your course textbooks for your review of research on your topic, you should not limit yourself to your course texts for your review of methodological literature. Read broadly and deeply from the scholarly literature to gain expertise in quantitative research. Additional self-paced tutorials have been developed on different methodologies and techniques associated with quantitative research. Make sure that you complete all of the self-paced tutorials and review them as often as needed. You will then be prepared to complete a literature review of the specific methodologies and techniques that you will use in your study. Thank you for watching.

Reliability and validity assessment of instrument to measure sustainability practices at shipping ports in India

  • Open access
  • Published: 03 September 2024
  • Volume 5, article number 236 (2024)

L. Kishore, Yogesh P. Pai & Parthesh Shanbhag

Sustainability has emerged as one of the most critical factors influencing the competitiveness of maritime shipping ports. This emergence has led to a surge in research publications on port sustainability-related topics. However, despite the increasing awareness and adoption of sustainability practices, documented literature on empirical studies with survey and interview data is very limited. Moreover, no validated instrument to objectively assess sustainability through sustainability practices at shipping ports in India can be traced in the literature. This study contributes by validating an instrument to objectively evaluate sustainability practices in shipping ports through a four-stage process:

  • item identification based on an extensive literature review;
  • instrument evaluation by subject matter experts;
  • assessment of the instrument with suitable content validation indices;
  • evaluation of the validity and reliability of the hypothesized theoretical model.

For content validation, the Content Validity Index, Cohen's kappa coefficient, and Lawshe's Content Validity Ratio were computed from the assessments of a six-member subject matter expert panel. The panel comprised members from the port industry as well as academicians cum researchers in the field of shipping ports. The content-validated instrument was administered to a sample of 200 officer-category port employees. The measurement model was evaluated and validated using Confirmatory Factor Analysis to assess the extent to which the measured variables represent the theoretical construct of the study and to ascertain the factor structure. The empirically validated instrument met the required guidelines for model fit, reliability, and construct validity and was found to be a confirmed model for measuring sustainability practices in shipping ports.
Structural Equation Modeling was adopted to explain the variance and the path relationships between the higher-order and lower-order constructs of sustainability. The results indicate that the economic dimensions are the major contributors to the overall sustainability of the port, as they drive investments in the environmental and social dimensions, leading to overall sustainable development. The study’s findings will be helpful for researchers, academicians, policymakers, and industry practitioners working towards sustainability practices that contribute to sustainable growth and development in the shipping industry.
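The abstract does not say which reliability and construct-validity measures the CFA guidelines refer to. In CFA-based instrument validation, two measures computed from standardized factor loadings are commonly reported, each with a widely cited rule of thumb:

  • composite reliability (CR), with a threshold of CR ≥ 0.70;
  • average variance extracted (AVE), with a threshold of AVE ≥ 0.50.

The sketch below assumes those measures and thresholds. The loadings are invented and are not results from this study.

```python
def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances).

    Assumes standardized loadings, so each item's error variance is 1 - loading^2.
    """
    total = sum(loadings)
    error_var = sum(1 - lam ** 2 for lam in loadings)
    return total ** 2 / (total ** 2 + error_var)


def average_variance_extracted(loadings):
    """AVE = mean of the squared standardized loadings."""
    return sum(lam ** 2 for lam in loadings) / len(loadings)


# Hypothetical standardized loadings for one lower-order construct
loadings = [0.78, 0.81, 0.74, 0.69]
cr = composite_reliability(loadings)
ave = average_variance_extracted(loadings)
print(f"CR  = {cr:.3f} (rule of thumb: >= 0.70)")   # CR  = 0.842
print(f"AVE = {ave:.3f} (rule of thumb: >= 0.50)")  # AVE = 0.572
```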

1 Introduction

In recent times, sustainability has increasingly been considered one of the most significant focus areas for society and industry [ 1 ], as well as for regulatory bodies, although the concept is only partially new [ 2 ]. Sustainable development is generally described as development that “satisfies the needs and wants of the present generation simultaneously without compromising the future generation’s needs and aspirations” [ 3 ]. The challenge, however, is balancing sustainability and economic growth [ 4 ]. This requirement has led organizations to look beyond mere economic performance and build socially and environmentally friendly business models by integrating sustainability principles and practices [ 5 , 6 , 7 ] to gain competitive advantages [ 8 , 9 ]. Against this backdrop, Lun et al. [ 3 ] highlighted the importance of shipping ports to a country’s sustainable growth and development in the long run, as they generate employment along with export–import trade. It is widely accepted that port-led economic growth and development have been the backbone of many developed and developing countries. For instance, the Indian maritime sector, one of the most promising and dynamic emerging markets in the world, has received a facelift through one of the government’s significant mega-initiatives, “Sagarmala”, which is focused on “port-led economic development” [ 10 ]. Shipping ports handle 95% of India’s overall goods trade by volume, contributing about 14% of GDP [ 11 ]. Subdued port performance is reflected in a country’s economic development [ 12 ]. Shipping ports are vital nodes that link other modes of transportation in global trade and are considered strategic assets demanding significant attention in maritime and transportation research and practice [ 4 , 13 , 14 ].

Lee et al. [ 15 ] raised the concern that sustainability has received less attention in the shipping, port, and maritime industries than in the aviation and road transport sectors. Many studies [ 14 , 16 , 17 , 18 , 19 ] have emphasized sustainability as one of the most crucial aspects influencing the competitiveness and long-term sustenance of shipping ports. However, it is not fully incorporated into the strategic decisions on port development and management, which demand a long-term view. Further, coastlines are densely populated, leading to higher levels of economic activity and rapid urbanization. They also face the byproducts of that development in the form of environmental, economic, and social concerns [ 14 , 20 ]. Many studies [ 14 , 18 , 19 , 21 , 22 , 23 , 24 , 25 ] discuss substantial problems in ports, including:

  • environmental problems: depletion of the marine ecosystem and biodiversity due to dredging and reclamation works, water pollution due to ballast operations, oil spillage during ship anchoring and cargo operations, wastewater spillage from ships, air pollution from various pollutants and particulate matter, dust and smoke pollution from heavy vehicles, climate change effects, and increased energy consumption for operations along with the cost of that energy;
  • economic problems: uncertainty in future economic returns on investment in the port, employment and diversity of jobs, and employee productivity;
  • social problems: displacement of the local community along with the impact on its livelihood, loss of agricultural land, a steep increase in the cost of living and land revenue rates due to swift urbanization and industrial and special economic zone development, inclusion of the community in developmental projects, safety and security in the port vicinity, the social working environment, trade union interference, and many more.
These challenges, complemented by the growing importance and focus on the sustainable development of shipping ports, have led to a surge in research publications on sustainability-related topics that concentrate on environmental, economic, and social aspects [ 26 , 27 ].

A literature review shows that extant studies on port sustainability focus mainly on quantifying and measuring various dimensions of sustainability against benchmarks and on developing indexes for multiple dimensions of sustainability. However, qualitative studies need to be conducted to understand the extent to which sustainability measures are adopted and how the measures interact, directing attention towards empirical, data-driven studies [ 27 , 28 ]. In their port sustainability framework development study, Alamoush et al. [ 2 ] found that only 16 percent of the published articles were empirical, based on questionnaire surveys and personal interview data. The largest share, 40 percent, were conceptual and theoretical reviews, and the remainder were equally distributed between simulation and case studies. Empirical, data-driven research on sustainability-related topics and port performance will be critical to the growing body of knowledge [ 28 ]. Further, the empirical studies on port sustainability [ 2 , 27 , 29 , 30 , 31 , 32 , 33 ] have adopted various sustainability indicators and evaluation criteria based on scales from studies not directly related to ports, adjusted to the context of the port industry.

Moreover, empirical studies on sustainability in the port sector are increasing, yet Ayre et al. [ 34 ] observe that researchers hardly ever report content validation. In line with that observation, the extant literature contains no traceable content validation process, and no instrument validated in this way, for objectively measuring sustainability and sustainability practices at shipping ports. The gap is especially marked for a rapidly developing economy like India. The study of Ashrafi et al. [ 17 ], which reported a pilot study for validation, only assessed perceptions of port sustainability in the US and Canada through an online survey to identify the primary factors and challenges in adopting and implementing sustainability strategies. However, that instrument was generic, capturing overall sustainability barriers and influencing factors, and it lacked a specific macro-level assessment of the three dimensions of sustainability.

Several studies [ 35 , 36 , 37 ] discuss the importance of content validity. It determines whether the measurement items used in a tool are relevant and how well the tool represents and addresses the domain of interest. Thus, the need for precise, validated measurement tools for sustainability practices at shipping ports indicates a critical knowledge gap in the existing literature. It should also be noted that most seaport-related studies in scholarly databases concentrate on specific geographical areas in Europe [ 32 , 38 ], not on leading and growing economies like India. Another concern is that, although sustainability is a widely discussed topic, there is still no single universally accepted and established definition of sustainability [ 39 , 40 , 41 ]. According to Maletič et al. [ 40 ], many have attempted to define and measure sustainability, yet the literature [ 42 ] still debates the multiple ways of measuring sustainability practices. Therefore, there is a vital need for clarity and substantial justification of the dimensions and indicators that define the sustainability construct. There is also a need to standardize the assessment of sustainability to a greater extent, especially for seaports. Beyond developing policies and schemes for sustainability, implementing them is essential to ensure sustainable development [ 28 , 43 ]. Measuring the extent to which a port has focused on various sustainability practices can therefore serve as a tool to assess its efforts towards sustainable development. Alamoush et al. [ 2 ] also pointed out, as a primary observation, the lack of studies linking port sustainability actions with the three sustainability dimensions represented by the three sustainability practices.

Considering this existing crucial gap and challenges discussed above, the novel contribution of this study is a validated instrument for assessing the sustainability practices followed at shipping ports covering the dimensions of sustainability. The measuring instrument can act as a guideline for seaport administrators and stakeholders to evaluate sustainability in shipping ports and develop seaport sustainability policy for sustainable maritime growth and sustainable development, specifically for an empirical and objective evaluation of sustainability practices adopted in shipping ports in India. Given this compelling necessity to have a content and construct validated instrument for sustainability assessment and strategy development, specifically in the context of Indian seaports, this study aims to explore, design, and develop an instrument for objectively assessing sustainability practices in shipping ports through a well-established content validation process. To achieve the aim of the study, the objectives are:

To identify the comprehensive list of dimensions of sustainability practice for shipping ports through an extensive literature survey

To validate the content of the measurement tool using globally accepted content validation indices, viz., the Content Validity Index, Cohen's kappa coefficient, and the Content Validity Ratio

To ascertain the factor structure of the measurement model using Confirmatory Factor Analysis

To estimate the relationship and contribution of three dimensions of sustainability practices to the higher-order sustainability practices construct
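The content validation indices named in the objectives can be computed from expert ratings with the standard formulas sketched below:

  • item-level CVI: the share of experts rating an item 3 or 4 on a 4-point relevance scale;
  • S-CVI/Ave: the average of the item-level CVIs;
  • chance-adjusted (modified) kappa: (I-CVI - Pc) / (1 - Pc), where Pc = C(N, A) x 0.5^N, N is the number of experts, and A is the number rating the item relevant;
  • Lawshe's CVR: (n_e - N/2) / (N/2), where n_e is the number of experts rating the item "essential".

The study reports "Cohen's kappa". Whether it used exactly this modified form is an assumption. The ratings below are hypothetical, for a six-member panel like the one described in the abstract, and are not the study's data.

```python
from math import comb


def item_cvi(ratings):
    """I-CVI: share of experts rating the item 3 or 4 on a 4-point relevance scale."""
    return sum(r >= 3 for r in ratings) / len(ratings)


def modified_kappa(ratings):
    """Chance-adjusted kappa for an I-CVI: (I-CVI - Pc) / (1 - Pc)."""
    n = len(ratings)
    agree = sum(r >= 3 for r in ratings)  # experts rating the item relevant
    p_chance = comb(n, agree) * 0.5 ** n  # probability of this agreement by chance
    return (agree / n - p_chance) / (1 - p_chance)


def lawshe_cvr(n_essential, n_experts):
    """Lawshe's CVR = (n_e - N/2) / (N/2); ranges from -1 to +1."""
    half = n_experts / 2
    return (n_essential - half) / half


# Hypothetical relevance ratings (1-4) from a six-member expert panel
ratings = {
    "item_1": [4, 4, 3, 4, 3, 2],
    "item_2": [4, 4, 4, 4, 4, 4],
    "item_3": [3, 2, 4, 2, 3, 4],
}
for item, r in ratings.items():
    print(f"{item}: I-CVI={item_cvi(r):.2f}, kappa*={modified_kappa(r):.2f}")

s_cvi_ave = sum(item_cvi(r) for r in ratings.values()) / len(ratings)
print(f"S-CVI/Ave = {s_cvi_ave:.2f}")  # 0.83
print(f"CVR with 5 of 6 experts rating 'essential': {lawshe_cvr(5, 6):.2f}")  # 0.67
```

Note that the CVR uses a separate "essential / useful but not essential / not necessary" rating. The relevance ratings used for the CVI cannot be reused for it.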

The rest of the study is structured into three major sections. The next section explores the theoretical foundations of the sustainability construct and the related conceptual framework that shapes shipping ports’ sustainability dimensions. The section after that covers the research methodology for achieving the study’s objectives. It outlines the steps followed in item identification through a literature review, instrument development, and instrument assessment based on globally accepted indices. It then describes the evaluation of the measurement model structure for validity and reliability using confirmatory factor analysis, leading to Partial Least Squares Structural Equation Modeling (PLS-SEM) for estimating and predicting the relationships among the variables. Finally, the results are critically discussed and the study’s findings are highlighted along with their implications, leading to a conclusion and future research directions.

2 Theoretical background

Although certain sustainability practices are compelled by regulatory compliance, organizations are encouraged to adopt and engage voluntarily and proactively in other sustainability practices to meet the needs of the broader society within which they operate [ 44 ]. Extant studies on sustainability [ 7 , 14 , 16 , 17 , 18 , 45 , 46 ] have discussed the need to integrate sustainability efforts into organizational goals, processes, and initiatives and to link them with organizational strategy, without which the efforts are likely to fail. The goal for any firm is to secure a competitive advantage over its competitors, create wealth, capture the highest possible market share, and add value for stakeholders while maintaining a balance between sustainability and economic growth [ 4 ]. Studies [ 16 , 47 ] have identified sustainability as one of the most crucial aspects that influence the competitiveness of ports. Moreover, to achieve and sustain competitive advantage, organizations have been increasingly grappling with sustainability practices [ 8 ]. Simpson et al. [ 48 ] define practices as “the customary, habitual, or expected procedure or way of doing something.” Practices focused on sustainability can differ from industry to industry. The shipping industry may concentrate on various practices, with relevant systems in place to support sustainable growth and development. Kang et al. [ 31 ] highlight best practices that embrace sustainability and suggest many practices related to operations, resource optimization, safety and security, finance, risk, infrastructure upgradation, stakeholder management, environmental management systems, and the port’s eco-friendly and social work environment.

Discussions in prior studies indicate mixed views on the definition of sustainability. There is no universally accepted definition [ 39 , 42 , 49 , 50 ]. A more generic definition describes sustainability as the set of business strategies, policies, and associated practices or activities that satisfy present requirements without impacting future requirements, in the best interests of the port and related stakeholders. Different schools of thought broadly agree that sustainability encompasses three significant dimensions, popularly termed the triple bottom line (TBL) dimensions of “economic, social, and environmental practices,” which make up the broad framework of sustainable development [ 39 , 49 , 50 , 51 ]. Elkington [ 51 , 52 ] introduced the triple-bottom-line (TBL) approach incorporating three interrelated dimensions: “environmental sustainability, economic sustainability, and social sustainability.” He urged organizations to adopt the TBL approach for long-term success [ 53 , 54 , 55 ] rather than short-term success focused only on the economic dimension. The TBL aspects are considered the critical dimensions of sustainability [ 56 ]. Environmental dimensions concentrate on policies, initiatives, and practices that promote environmental management. In contrast, economic dimensions focus on policies, initiatives, and practices related to investments, economic benefits, and returns from those investments [ 57 ]. Social sustainability focuses on policies, initiatives, and practices that promote the overall improvement of society at large, including all other stakeholders [ 58 ]. Bansal [ 59 ] asserted that the three pillars, i.e., environmental integrity, economic prosperity, and social equity, should intersect for sustainability. Alamoush et al. [ 2 ] further related these TBL dimensions to planet, profit, and people as synonyms for environmental, economic, and social sustainability.

In port-related studies, various environmental, economic, and social dimensions have been adopted to assess port sustainability using different methodologies [ 2 , 27 ]. Oh et al. [ 29 ] used the importance-performance analysis technique to evaluate the sustainability of South Korean ports with 27 vital measures of port sustainability adapted from the findings and discussions of previous research, and found those measures to be essential from a port sustainability point of view. Their study classified the indicators of port sustainability into the three dimensions of sustainability set out in the TBL concept. In contrast to this empirical quantitative approach, Vejvar et al. [ 33 ] adopted a case-study-based approach to study the institutional forces that compel organizations to adopt sustainability practices. They used open-ended questions to probe the sustainability practices adopted in the selected shipping ports and performed a cross-case analysis to make the study more generalizable and increase the validity of the findings. Narasimha et al. [ 32 ] conducted a thematic analysis of the sustainability performance of seaports, followed by semi-structured interviews, and then applied a fuzzy analytical hierarchy process to compute the weight of each port sustainability performance indicator. Their study also categorized the indicators into three dimensions of sustainability performance, namely social, environmental, and economic sustainability performance practices. Another multi-dimensional framework of sustainability practices was empirically tested by Maletič et al. [ 40 ], who defined sustainability exploitation and sustainability exploration as two distinct sustainability practices. According to them, sustainability exploitation practices aim at incremental improvement in organizational processes, whereas sustainability exploration challenges current practices with innovative concepts for developing competencies and capabilities for sustainability. However, they also acknowledged the suitability of more objective measures, such as TBL practices, for sustainability studies. Sustainability practices help organizations develop opportunities while managing the three dimensions of organizational processes (economic, environmental, and social) in value creation over the long term [ 51 ]. In this definition, profitability is the focus of economic sustainability, protection of and concern for the environment is the focus of environmental sustainability [ 60 , 61 ], and social sustainability focuses on sustained relations with all stakeholders, including suppliers, customers, employees, and the community [ 62 ].

Regarding sustainability indices, Laxe et al. [ 43 ] developed the “Port Sustainability Synthetic Index,” covering economic and environmental indicators, using a sample of 16 ports in Spain. Molavi et al. [ 25 ] developed a novel framework for a smart port index for achieving sustainability using key performance indicators (KPIs) that can assist in strategy development and policy framing. Their study identified several environmental sub-domains of the smart port index, along with other domains such as operations, energy, safety, and security. However, although their study specified quantitative environment-related KPIs and KPIs for the other domains that can be used to evaluate the smart port index, it did not explicitly address the economic and social dimensions, even though some sub-domains can be related to them. In contrast, Stanković et al. [ 63 ] developed a novel composite index for assessing the sustainability of seaports covering the environmental, economic, and social dimensions, with indicators based on secondary data available in the Eurostat and Organization for Economic Co-operation and Development databases. However, the environmental dimension captured particulate-matter air pollution as its only indicator. Their study also acknowledged the limitation of not covering many indicators, including social inclusion and waste management, because the data were unavailable. Such gaps in the quantitative data available in secondary databases are a further challenge for index construction. Data collection across ports is not yet standardized, and the diverse types of cargo handled in ports mean that no index is universally applicable. Mori et al. [ 64 ] likewise advised against a synthesized composite index because indicators can offset one another in the evaluation [ 65 ]. Although indices have benefits, the limited availability of standardized data for computing them further restricts their adoption for benchmarking and assessment, so sustainability indices should be adopted with caveats.
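The offset concern can be seen with a toy calculation. The Python sketch below uses made-up dimension scores for two hypothetical ports (not data from any cited study) and compares an equal-weight arithmetic mean, which lets a strong dimension compensate for a weak one, with an equal-weight geometric mean, a common less-compensatory alternative. Neither aggregation rule is proposed by the studies cited above; the function names are illustrative.

```python
from math import prod

# Hypothetical normalized scores (0 to 1) for the three TBL dimensions.
# Illustrative numbers only; they are not data from any port or study.
ports = {
    "port_a": {"environmental": 0.60, "economic": 0.60, "social": 0.60},
    "port_b": {"environmental": 0.20, "economic": 0.90, "social": 0.70},
}

def arithmetic_index(scores):
    """Equal-weight linear aggregation: fully compensatory."""
    return sum(scores.values()) / len(scores)

def geometric_index(scores):
    """Equal-weight geometric aggregation: a weak dimension pulls the index down more."""
    return prod(scores.values()) ** (1 / len(scores))

for name, scores in ports.items():
    print(f"{name}: arithmetic={arithmetic_index(scores):.3f}, "
          f"geometric={geometric_index(scores):.3f}")
```

Both hypothetical ports receive the same arithmetic score (0.600), although port_b is much weaker environmentally; the geometric mean (about 0.501 for port_b versus 0.600 for port_a) makes that gap visible.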

Therefore, following the justifications and established theoretical foundations discussed above, this study is grounded in sustainability theory as framed by the TBL view, which incorporates the three interrelated dimensions of sustainability (“environmental sustainability, economic sustainability, and social sustainability”) and the relevant sustainability practices focused on shipping ports. Accordingly, based on previous studies on sustainability and sustainability practices, this study considers three sustainability constructs: environmental sustainability practices (EnvSP), economic sustainability practices (EcoSP), and social sustainability practices (SocSP). The indicators thus identified serve as measurement scales in a survey instrument used to empirically measure and objectively evaluate the sustainability practices adopted in shipping ports in India.

3 Methodology

The authors adopted the content validation process prescribed by Barbosa and Cansino [ 35 ]. The process starts with item identification based on an extensive literature review, followed by instrument assessment by subject matter experts and instrument evaluation with suitable content validation indices. The validity and reliability of the hypothesized model were then assessed using Confirmatory Factor Analysis (CFA) to confirm the theory established in the literature [ 66 ]. CFA is a widely adopted statistical technique for confirming the hypothesized structure among a set of latent variables and for assessing how reliably the scale measures the proposed concept. Hair et al. [ 67 ] describe CFA as a technique for assessing the contribution of each scale item to its latent variable; these items can later be incorporated into the estimation of relationships in the structural model, while accounting for the associated measurement error, using the variance-based Partial Least Squares Structural Equation Modelling (PLS-SEM) framework. The primary focus of PLS-SEM is explaining the relationships between exogenous and endogenous variables and predicting the variance of the endogenous variables. The five-stage procedure adopted in the study is shown in the flow chart in Fig.  1 .

Fig. 1 Content validation process for the study (Author’s own)

3.1 Stage #1. Item identification and face validation

In the first stage, an extensive review of port-related articles was performed to compile a comprehensive list of items related to the three dimensions of sustainability, viz. environmental, economic, and social sustainability practices, using relevant keyword searches in the Scopus database in the context of shipping ports. Multiple iterations with different keyword combinations were run to gauge the diversity of articles that could be traced in the database, and the final set of keywords, [(“sustainability”) OR (“sustainability practices”)] AND [(“shipping ports”) OR (“maritime ports”) OR (“container ports”)] AND (scale OR items OR measurement OR indicators OR SEM), was used for article identification. The articles were then screened for item identification, and a list of items related to sustainability practices was compiled. The final set of related articles was critically reviewed to identify the relevant items for sustainability practices at shipping ports. Following the recommendation of Boateng et al. [ 68 ], face validation of the instrument was conducted with reviews and inputs from two senior academicians and experts with theoretical and practical knowledge of sustainability practices.
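The Boolean structure of the final search string can also be applied locally, for example when re-screening exported titles and abstracts. The Python sketch below is a hypothetical helper, not part of the authors' documented workflow: it treats the bracketed groups as AND-ed lists of OR-ed terms and uses simple case-insensitive whole-word matching, which does not reproduce Scopus's own query processing.

```python
import re

# Keyword groups transcribed from the search string above:
# groups are combined with AND, terms inside a group with OR.
QUERY = [
    ["sustainability", "sustainability practices"],
    ["shipping ports", "maritime ports", "container ports"],
    ["scale", "items", "measurement", "indicators", "SEM"],
]

def term_matches(term, text):
    """Case-insensitive whole-word (or whole-phrase) match of one keyword."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None

def matches_query(text, query=QUERY):
    """True if every AND-group has at least one matching OR-term."""
    return all(any(term_matches(term, text) for term in group) for group in query)

# Hypothetical title snippets, for illustration only.
records = [
    "Sustainability indicators for container ports: an SEM study",
    "Measuring port sustainability with a composite index",
]
for text in records:
    print(matches_query(text), "-", text)
```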

3.2 Stage #2. Instrument assessment

In the second stage of the content validation process, a panel of subject matter experts was selected to perform the instrument assessment. Content validity is typically evaluated by experts, and the recommended number of experts ranges from a minimum of three to a maximum of 10 [ 68 , 69 , 70 ]. In line with this requirement, a panel of six experts from academic and port industry backgrounds assessed and validated the items. Barbosa and Cansino [ 35 ] note that no unique formula or approach exists for selecting an expert panel; however, they point out the need for a heterogeneous panel to mitigate the risk of bias in the validation. Therefore, the study included subject experts from the port industry and academicians with experience in port-related research.

The experts assessed the relevance and the essentiality of the items identified from the various literature sources so that the items could be content validated. For content validation and evaluation, the Content Validity Index (CVI), Cohen’s Kappa coefficient, and Lawshe’s Content Validity Ratio (CVR) were adopted, as they are among the most widely used tools for quantifying expert opinion in content validation [ 69 , 71 , 72 , 73 , 74 ]. The relevance of every item in the measurement instrument was rated on a 4-point Likert scale: “1 = not relevant, 2 = somewhat relevant, 3 = quite relevant, and 4 = very relevant.” Essentiality was rated on a 3-point Likert scale: “1 = not essential; 2 = useful, but not essential; and 3 = essential.” An additional comments column was also provided for each item so that the experts could add feedback and remarks.

3.3 Stage #3. Content validation using CVI, Cohen’s Kappa, and CVR index

Following the recommendation of [ 75 ], the content validity of the instrument was assessed using the CVI, Cohen’s Kappa coefficient, and CVR indices. The CVI is a straightforward measure of agreement among the panelists and can be computed both at the individual item level and for the overall scale [ 71 , 72 , 73 , 76 ]. Accordingly, the I-CVI is the validity index for each item of the study’s constructs, whereas the S-CVI is the index for the overall scale, calculated as the average of the I-CVIs. The I-CVI is the number of panelists rating an item 3 or above on the relevance scale divided by the total number of panelists evaluating relevance. Along similar lines, the S-CVI can be computed using the number of measurement items in the assessment tool that receive a rating of 3 or above. To complement and strengthen the assessment of relevance through the CVI, Barbosa and Cansino [ 35 ] highlight the benefit of Cohen’s Kappa coefficient, which evaluates the degree of agreement on each measurement item beyond chance and thereby accounts for agreement that may be inflated merely by chance. The formula to compute Cohen’s Kappa coefficient is as follows:

\[ K = \frac{\text{I-CVI} - P_{c}}{1 - P_{c}} \]

where \( P_{c} \) is the probability of chance agreement, computed as:

\[ P_{c} = \left[ \frac{N!}{A!\,(N - A)!} \right] \times 0.5^{N} \]

in which N is the total number of experts and A is the number of experts who rate the item as relevant (3 or 4).

According to Lawshe (1975), the CVR index can be computed using the formula:

\[ \mathrm{CVR} = \frac{n_{e} - N/2}{N/2} \]

where \( n_{e} \) is the number of experts rating the item “essential” and N is the total number of experts.

The formulas described above were entered into a spreadsheet to compute the CVR, Cohen’s Kappa coefficient, and CVI from the experts’ ratings of the identified items. The relevance and essentiality ratings given by each expert were recorded and coded in the spreadsheet to compute the indices for every item and for the entire scale.
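The spreadsheet computations can equally be scripted. The Python sketch below assumes the commonly used definitions: I-CVI counts ratings of 3 or 4, the chance-corrected kappa is \( K = (\text{I-CVI} - P_{c})/(1 - P_{c}) \) with \( P_{c} = \binom{N}{A} 0.5^{N} \), and Lawshe's CVR treats 3 as “essential.” The function names and the ratings are hypothetical; they are not the study's data.

```python
from math import comb
from statistics import mean

def i_cvi(relevance):
    """Item-level CVI: share of experts rating relevance 3 or 4 on the 4-point scale."""
    return sum(r >= 3 for r in relevance) / len(relevance)

def modified_kappa(relevance):
    """Chance-corrected agreement: (I-CVI - Pc) / (1 - Pc)."""
    n = len(relevance)
    a = sum(r >= 3 for r in relevance)   # experts rating the item relevant
    pc = comb(n, a) * 0.5 ** n           # probability of chance agreement
    return (a / n - pc) / (1 - pc)

def cvr(essentiality):
    """Lawshe's CVR: (n_e - N/2) / (N/2), where 3 = 'essential' on the 3-point scale."""
    n = len(essentiality)
    n_e = sum(e == 3 for e in essentiality)
    return (n_e - n / 2) / (n / 2)

# Hypothetical ratings from a six-expert panel (one list per item, one entry per expert).
relevance = {
    "item_1": [4, 4, 3, 4, 4, 3],
    "item_2": [4, 3, 2, 4, 4, 4],
    "item_3": [2, 3, 2, 4, 1, 3],
}
essentiality = {
    "item_1": [3, 3, 3, 3, 3, 3],
    "item_2": [3, 2, 3, 3, 3, 3],
    "item_3": [1, 2, 2, 3, 1, 2],
}

for item in relevance:
    print(item,
          f"I-CVI={i_cvi(relevance[item]):.2f}",
          f"kappa={modified_kappa(relevance[item]):.2f}",
          f"CVR={cvr(essentiality[item]):.2f}")

# S-CVI as the average of the item-level CVIs.
s_cvi_ave = mean(i_cvi(r) for r in relevance.values())
print(f"S-CVI/Ave={s_cvi_ave:.2f}")
```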

3.4 Stage #4. Reliability and validity using confirmatory factor analysis

3.4.1 Sampling and data collection

The content-validated questionnaire was administered online through Microsoft Forms, as well as offline, to port employees at the mid- and senior-management (Officer category) levels of various major ports on both the western and eastern coasts of India, to collect data for testing the validity and reliability of the measurement model. The instrument captured the respondents’ demographics and their perceptions of how much their port focuses on the three pillars of sustainability practices. Many authors recommend determining sample size in business management and social science research with G*Power [ 77 , 78 , 79 , 80 ]. For a medium effect size (\( f^{2} = 0.15 \)), a 5% significance level, and a power level of 80 percent, the recommended minimum sample size was 166. Further, Hair et al. [ 67 ] give a minimum sample size of 150 for a model with seven or fewer constructs. However, to avert any possible statistical loss, the target sample size was set about 20 percent above 166, establishing the sample size required for the study as 200. The respondents indicated their level of agreement with each item on a 5-point Likert scale, with 1 indicating “Strongly Disagree” and 5 indicating “Strongly Agree.” Data were collected between January and December 2023, until the required number of usable responses had been received for further analysis.
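The 20 percent allowance works out as follows (rounding the result to 200 is our reading of the text):

\[ n_{\text{target}} = 166 \times 1.20 = 199.2 \approx 200 \]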

3.4.2 Reliability and validity of the measurement model

CFA was performed in IBM AMOS on the sample data to confirm the factor structure of the sustainability practice dimensions. In contrast to measurement error, reliability indicates the “degree to which the observed variable measures the true value and is error-free” [ 66 ]. It is also an “assessment of the degree of consistency between multiple measurements of a variable and the set of variables being measured.” Model fit, reliability, and construct validity indices were assessed following the recommendations in [ 81 , 82 ]. Construct reliability was evaluated using Cronbach’s alpha, composite reliability, and Average Variance Extracted (AVE) values, and construct validity was assessed using convergent and discriminant validity measures.
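For readers who want to reproduce these reliability statistics, the Python sketch below uses the standard formulas for composite reliability and AVE computed from standardized loadings, and Cronbach's alpha computed from raw item scores. It is a minimal illustration with hypothetical inputs and helper names; AMOS and SmartPLS report these values directly, and their computational details may differ.

```python
from statistics import pvariance

def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    total = sum(loadings)
    error_variance = sum(1 - lam ** 2 for lam in loadings)  # 1 - loading^2 per standardized item
    return total ** 2 / (total ** 2 + error_variance)

def average_variance_extracted(loadings):
    """AVE = mean of the squared standardized loadings."""
    return sum(lam ** 2 for lam in loadings) / len(loadings)

def cronbach_alpha(item_scores):
    """item_scores: one list of respondent scores per item, all the same length."""
    k = len(item_scores)
    totals = [sum(resp) for resp in zip(*item_scores)]      # each respondent's scale score
    item_variance = sum(pvariance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_variance / pvariance(totals))

# Hypothetical standardized loadings for one construct (not the study's Table 3 values).
loadings = [0.72, 0.68, 0.65, 0.75, 0.70]
print(f"CR={composite_reliability(loadings):.3f}, AVE={average_variance_extracted(loadings):.3f}")

# Hypothetical 5-point Likert responses: 3 items x 5 respondents.
scores = [[4, 5, 3, 4, 2],
          [4, 4, 3, 5, 2],
          [5, 5, 2, 4, 3]]
print(f"alpha={cronbach_alpha(scores):.3f}")
```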

3.5 Stage #5. Structural equation modeling

SEM allows unobserved latent variables to be measured indirectly through indicator items for the variables in the model structure [ 82 ]. It assesses how the latent variables in the model are related to one another while accounting for errors in the measurement of the observed variables. We therefore adopted the PLS-SEM technique to estimate the relationships between the three dimensions of sustainability and their contribution to the overall sustainability construct through the item indicators of each dimension. Further, following Hair et al. [ 83 ], the variance-based PLS-SEM framework suits our study because the sample size is comparatively small and the normality assumption is not essential, given the inherent nature of the items measuring the dimensions of sustainability. To specify the model parameters and estimate the relationships between the higher- and lower-order constructs, we used SmartPLS [ 84 ], a widely used tool for PLS-SEM.

In the first stage of the analysis, items were identified through an extensive literature review and face validation; the instrument was then assessed by subject matter experts and evaluated with suitable content validation indices. This was followed by assessing the validity and reliability of the hypothesized model with CFA to confirm the theory, and finally by assessing the path relationships with PLS-SEM.

4.1 Item identification and face validation

The literature review yielded a comprehensive initial list of 48 items, with their sources, as indicators of sustainability practices adopted in shipping ports (refer to Appendix 1 ). The initial list comprised 17 items as indicators of environmental sustainability practices, 19 as indicators of economic sustainability practices, and 12 as indicators of social sustainability practices. The initial draft of the measuring instrument was then subjected to face validation. The inputs from the two senior academicians and experts who carried out the face validation were incorporated; these included eliminating ambiguous terms, adding indicators missing from the initial list, rephrasing sentences in the instrument for a better understanding of the study context, and finalizing the layout [ 68 ]. After the face-validation corrections were incorporated, the items and their sources were compiled into the measurement instrument for content validation in the next stage.

4.2 Instrument assessment and content validation using CVI, Kappa, and CVR index

In the second stage, six selected subject matter experts conducted the content validation of the face-validated instrument. Content validity is a subjective approach that evaluates the extent to which the content of a scale measures the factors of study interest. Content validation evaluates whether the items in the questionnaire instrument are clear, readable, and relevant to the study context [ 85 ]. The panel of subject matter experts assessed both the relevance and the essentiality of the items, and the assessment tool was administered to the six empaneled experts. Table 1 summarizes the profiles of the experts who participated in validating the questionnaire items.

CVI and Kappa coefficients were calculated to assess the relevance of the items, and CVR was calculated to determine the items’ essentiality for the study context [ 74 ]. The experts’ responses on item relevance and essentiality were coded into spreadsheets to compute the CVI, Kappa coefficient, and CVR values using the respective formulas [ 69 , 71 , 72 , 73 , 74 ]. The computed CVI, Kappa coefficient, and CVR results are consolidated in Appendix 2 .

CVI indicates the proportion of experts who agree on the relevance of the items for a given construct, treating ratings of 1 and 2 as invalid and ratings of 3 and 4 as valid content consistent with the study’s conceptual framework [ 74 ]. Adopting the cut-offs suggested in established studies [ 69 , 74 ], items with a CVI of at least 0.84 were accepted. The S-CVI of the validation tool was 0.86, which satisfies the minimum of 0.80 required by Shrotryia et al. [ 86 ] for an instrument to be considered content valid. In addition, Cohen’s Kappa coefficient was computed with a cut-off of 0.74 to avoid errors due to chance agreement among the expert panel [ 71 , 72 , 73 , 75 , 76 ]. According to Lawshe’s benchmark, the cut-off CVR value for a panel of six is 0.99; a CVR at or above this value indicates agreement among the panel judges on the item’s necessity in the study questionnaire. Based on these CVI, Kappa, and CVR inclusion criteria, the most essential and relevant items were shortlisted, and the final questionnaire was administered for construct validation. The finalized instrument for measuring the extent of adoption of sustainability practices in shipping ports is shown in Appendix 3 .
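For a six-member panel, each cut-off corresponds to a minimum number of agreeing experts. The Python sketch below tabulates I-CVI, chance-corrected kappa, and CVR for every possible agreement count, assuming the standard formulas \( K = (\text{I-CVI} - P_{c})/(1 - P_{c}) \), \( P_{c} = \binom{N}{A} 0.5^{N} \), and \( \mathrm{CVR} = (n_{e} - N/2)/(N/2) \). For simplicity it uses one count for all three indices, although in practice relevance and essentiality ratings are counted separately; the helper names are hypothetical.

```python
from math import comb

N = 6  # panel size in this study
CUTOFFS = {"I-CVI": 0.84, "kappa": 0.74, "CVR": 0.99}  # cut-offs reported in the text

def indices_for_count(a, n=N):
    """Index values when a of n experts agree (relevant for I-CVI and kappa, essential for CVR)."""
    icvi = a / n
    pc = comb(n, a) * 0.5 ** n          # probability of chance agreement
    kappa = (icvi - pc) / (1 - pc)
    cvr = (a - n / 2) / (n / 2)
    return {"I-CVI": icvi, "kappa": kappa, "CVR": cvr}

def minimum_agreement(index_name, n=N):
    """Smallest number of agreeing experts for which an index meets its cut-off."""
    for a in range(n + 1):
        if indices_for_count(a, n)[index_name] >= CUTOFFS[index_name]:
            return a
    return None

for a in range(N, -1, -1):
    row = indices_for_count(a)
    print(a, "agree:", ", ".join(f"{k}={v:.3f}" for k, v in row.items()))

for name in CUTOFFS:
    print(name, "needs at least", minimum_agreement(name), "of", N, "experts")
```

Under these assumptions, the CVR cut-off of 0.99 is met only when all six experts rate an item essential, while a kappa of 0.74 is met once five or more experts rate it relevant.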

4.3 Construct validity and reliability using confirmatory factor analysis and PLS-SEM

Empirical studies attempt to validate and justify the research framework with primary data collected from respondents through a questionnaire instrument. Because the analysis depends solely on these data, which are not accurate measurements of the factors of interest but observations of respondents’ perceptions, the questionnaire should be subjected to validation and reliability checks [ 85 ]. These procedures aim to measure and address measurement error, that is, the difference between the true scores and the measured or observed scores [ 66 ]. Validity reflects the extent to which the collected data represent the study’s primary purpose, in other words, “measuring what it proposes to measure.” The content-validated measurement instrument was administered to port employees at Officer level and above across various major ports in India. Table 2 shows the demographic profiles of the respondents.

The goodness-of-fit indices of the reflective measurement model were evaluated following the recommendations in [ 81 , 82 ]. The model fit indices for the hypothesized model were acceptable against the recommended benchmark values [ 66 , 87 ]: \( \chi^{2}/df \) = 1.6; Goodness-of-Fit Index (GFI), Tucker–Lewis Index (TLI), and Comparative Fit Index (CFI) > 0.9; Standardized Root Mean Square Residual (SRMR) = 0.053; and Root Mean Square Error of Approximation (RMSEA) = 0.055. The standardized factor loadings and the construct validity and reliability values are shown in Table  3 .
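The reported RMSEA can be cross-checked against the reported \( \chi^{2}/df \) with the usual point-estimate formula \( \mathrm{RMSEA} = \sqrt{\max(\chi^{2} - df, 0)/(df\,(N - 1))} \). In the Python sketch below, N = 200 is an assumption (the target sample size stated in Section 3.4.1), and the degrees of freedom are a hypothetical placeholder that cancels out, because only the ratio \( \chi^{2}/df \) matters.

```python
from math import sqrt

def rmsea(chi2, df, n):
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

n = 200          # assumption: the target sample size stated in Section 3.4.1
df = 296         # hypothetical degrees of freedom; the result depends only on chi2/df
chi2 = 1.6 * df  # chi-square implied by the reported chi2/df of 1.6
print(round(rmsea(chi2, df, n), 3))  # prints 0.055
```

Under these assumptions the estimate is about 0.0549, in line with the reported 0.055; dividing by N instead of N − 1, as some software does, gives a practically identical value here.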

Although Hair et al. [ 67 , pp. 152–153] suggest a minimum factor loading of 0.70 in general, they also regard loadings of 0.50 or above as practically significant, and a further guideline treats loadings greater than 0.40 as statistically significant for a sample size of 200. Further, following the recommendation of Chin et al. [ 88 ] and considering the practical significance of items with loadings above 0.60, we consider all items with a factor loading above 0.60 acceptable in the model structure. Therefore, all 26 items were retained in the measurement instrument. Construct reliability was assessed using Cronbach’s alpha and composite reliability, with values between 0.85 and 0.90. Following the guidelines of Hair et al. [ 89 ], these values indicate good and acceptable internal consistency, establishing the scale’s reliability in measuring the constructs.

Construct validity was evaluated using convergent and discriminant validity measures. For convergent validity, EnvSP and SocSP had AVE values above the minimum benchmark of 0.50, whereas EcoSP was very close at 0.49; this value rounds to 0.50 and can be regarded as meeting the acceptable benchmark for the convergent validity of the measurement model [ 90 ]. Some recommendations also regard a marginal shortfall in AVE as adequate when Cronbach’s alpha and composite reliability exceed 0.60 [ 89 , 90 ]. These results indicate that the scale for measuring sustainability practices in ports has acceptable reliability.

Hair et al. [ 89 ] emphasize two established measures of discriminant validity: the Fornell–Larcker criterion and the heterotrait-monotrait (HTMT) ratio. In the Fornell–Larcker approach, the inter-construct correlations, which measure the variance shared between latent variables, are compared with the square root of each construct’s AVE. The square root of a construct’s AVE is expected to be greater than its highest correlation with any other construct in the model. For every construct, the square root of the AVE was found to be greater than its correlations with the other constructs, establishing discriminant validity. The HTMT ratio estimates the unattenuated correlation between constructs, and a value close to 1 implies an absence of discriminant validity. The benchmark value for the HTMT ratio is 0.90, and values above this threshold imply a lack of discriminant validity between the constructs [ 91 , 92 ]. All HTMT ratios were below 0.90, thus satisfying the discriminant validity requirement of the measurement scale.
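Both criteria can be sketched in a few lines of Python, assuming the standard definitions: Fornell–Larcker compares \( \sqrt{\mathrm{AVE}} \) with the inter-construct correlations, and HTMT divides the mean correlation between items of different constructs by the geometric mean of the mean within-construct item correlations. All inputs and helper names are hypothetical; SmartPLS computes these statistics from the data directly.

```python
from math import sqrt
from statistics import mean

def fornell_larcker_ok(ave, corr):
    """ave: {construct: AVE}; corr: {(construct_a, construct_b): latent correlation}.
    Passes if sqrt(AVE) of every construct exceeds its correlations with all others."""
    for construct, value in ave.items():
        others = [abs(r) for pair, r in corr.items() if construct in pair]
        if others and sqrt(value) <= max(others):
            return False
    return True

def htmt(item_corr, items_i, items_j):
    """Heterotrait-monotrait ratio for two constructs with at least two items each.
    item_corr: {(item_a, item_b): correlation}; either key order is accepted."""
    def r(a, b):
        return item_corr[(a, b)] if (a, b) in item_corr else item_corr[(b, a)]
    hetero = mean(r(a, b) for a in items_i for b in items_j)
    mono_i = mean(r(a, b) for k, a in enumerate(items_i) for b in items_i[k + 1:])
    mono_j = mean(r(a, b) for k, a in enumerate(items_j) for b in items_j[k + 1:])
    return hetero / sqrt(mono_i * mono_j)

# Hypothetical values for illustration (not the study's estimates).
ave = {"EnvSP": 0.55, "EcoSP": 0.49, "SocSP": 0.52}
corr = {("EnvSP", "EcoSP"): 0.55, ("EnvSP", "SocSP"): 0.50, ("EcoSP", "SocSP"): 0.60}
print("Fornell-Larcker satisfied:", fornell_larcker_ok(ave, corr))

item_corr = {("x1", "x2"): 0.60, ("y1", "y2"): 0.60,
             ("x1", "y1"): 0.30, ("x1", "y2"): 0.30,
             ("x2", "y1"): 0.30, ("x2", "y2"): 0.30}
print("HTMT:", round(htmt(item_corr, ["x1", "x2"], ["y1", "y2"]), 3))  # prints 0.5
```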

The variance inflation factor (VIF) was checked for possible multicollinearity [ 89 , 91 , 93 , 94 ]. Multicollinearity was ruled out, as all VIF values were less than three. These results support the reliability and validity of the sustainability constructs as collective indicators of the three dimensions of sustainability, viz. economic, environmental, and social sustainability, and confirm their relationship to the overall sustainability construct. Further, a bootstrapping procedure was run to test the significance of the paths. Table  4 shows the standardized path coefficients, t-statistics, and p-values for the three dimensions of the sustainability practice construct. All p-values are below 0.05, indicating that all structural model relationships are statistically significant.
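Both checks can be illustrated in plain Python. The VIF of a predictor equals \( 1/(1 - R_{j}^{2}) \), which is also the corresponding diagonal element of the inverse of the predictors' correlation matrix; the sketch uses the latter. The bootstrap part only shows the general logic of a bootstrap t-statistic (the estimate divided by the standard deviation of resampled estimates) for one standardized path with a single predictor, which equals the Pearson correlation. SmartPLS bootstraps the full PLS path model, so this is a simplified stand-in with hypothetical data and helper names, not the study's procedure.

```python
import random
from math import sqrt
from statistics import mean, stdev

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting; assumes a non-singular matrix."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vifs(corr_matrix):
    """VIF_j = 1 / (1 - R_j^2), read off the diagonal of the inverse correlation matrix."""
    inv = invert(corr_matrix)
    return [inv[j][j] for j in range(len(inv))]

def bootstrap_t(x, y, n_boot=1000, seed=1):
    """Bootstrap t = estimate / SD of resampled estimates, for one standardized path
    with a single predictor (equal to the Pearson correlation)."""
    rng = random.Random(seed)
    n = len(x)
    draws = []
    while len(draws) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xs, ys = [x[i] for i in idx], [y[i] for i in idx]
        if len(set(xs)) > 1 and len(set(ys)) > 1:   # skip degenerate resamples
            draws.append(pearson(xs, ys))
    return pearson(x, y) / stdev(draws)

# Hypothetical predictor correlation matrix (not the study's estimates).
corr = [[1.00, 0.45, 0.50],
        [0.45, 1.00, 0.55],
        [0.50, 0.55, 1.00]]
print("VIFs:", [round(v, 2) for v in vifs(corr)])

# Hypothetical data for one path.
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(100)]
y = [0.6 * xi + rng.gauss(0, 0.8) for xi in x]
print("bootstrap t:", round(bootstrap_t(x, y), 2))
```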

5 Discussion and implications of the study

The authors followed a systematic procedure: a comprehensive list of items for the three sustainability practice constructs was compiled through an extensive literature review, followed by face validation and content validation to assess the relevance and essentiality of the items in the context of shipping ports in India. Because the analysis relies entirely on respondents’ perceptions captured by the questionnaire, the instrument had to be subjected to validation and reliability checks [ 85 ]. The content-validated instrument was therefore evaluated empirically with the sample data, using the CFA technique to ascertain the reliability and validity of the model.

Specifically, the results indicate that the subject matter experts prioritized items that are essential and relevant in the contemporary business environment, giving nearly equal weight and importance to all three dimensions of sustainability practices: environmental, economic, and social. Among the items validated, the expert panel showed the least agreement on the relevance and necessity of the foreign direct investment and funding items, which suggests that shipping ports in India are primarily government funded: the minor and major ports, which make up most of India’s ports, are controlled and administered by the state and central governments, respectively. The same reason may explain the low relevance of job security in the context of Indian shipping ports. The items related to odor and smoke also received low relevance, reflecting the low degree of industrial development in shipping ports in India. Although rated relevant, cold-ironing power sources for vessels at berth received a low degree of agreement on necessity, as did the items on recognizing community requirements and supporting the community. However, the panelists’ remarks highlight that these focus areas are essentially part of corporate social responsibility and need not be assessed separately.

Content validation evaluates whether the items in the questionnaire instrument are clear, readable, and relevant to the study context [ 85 ]. After face and content validation of the instrument, the finalized list comprised eight items as indicators of EnvSP, ten as indicators of EcoSP, and eight as indicators of SocSP adopted in shipping ports. Thus, the content-validated questionnaire comprised 26 items for measuring the constructs of the study, close to the 27 measures that Oh et al. [ 29 ] adopted in their study of port sustainability. That study used the importance-performance analysis technique to evaluate the sustainability of South Korean ports with 27 vital measures adapted from the findings and discussions of previous research, found those measures to be essential from a port sustainability point of view, and classified them into the three dimensions of sustainability set out in the TBL concept. Along similar lines, Narasimha et al. [ 32 ] conducted a thematic analysis of the sustainability performance of seaports, followed by semi-structured interviews, and then applied a fuzzy analytical hierarchy process to compute the weight of each port sustainability performance indicator. Their study categorized the indicators into three dimensions of sustainability performance, namely social, environmental, and economic sustainability performance practices. The results therefore suggest that these content-validated items are reflective indicators of the sustainability practice constructs and collectively constitute latent variables for empirical studies, confirming the construct validity of the measurement model.

More specifically, this study supports the well-established Triple Bottom Line (TBL) theory of sustainability coined by Elkington [ 52 ]: the validated sustainability practice items in the measuring instrument adequately represent the seaport domain, and the instrument can be used to measure the constructs in empirical studies. The Sustainable Development Goals (SDGs) of the United Nations Development Programme (UNDP) likewise call for integrated sustainable development that balances the three pillars of sustainability: environmental, economic, and social. Chang and Kuo [ 95 ] advise organizations to pursue both short- and long-term sustainable practices, securing short-term earnings while safeguarding environmental and social integrity. Thus, at the strategic level, the TBL practices are the higher-order constructs of sustainability practices, focusing on the long term [ 96 ]. The findings of this study therefore contribute to the literature by providing empirical evidence on the practical and statistical relationship between the sustainability construct and its environmental, economic, and social practice dimensions in a TBL-based framework applied to shipping ports.

Yadav et al. [ 97 ] also emphasized the availability of several environmental management systems (EMS) for achieving environmental sustainability. They also recommended introducing methods that promote a green culture, support green behavior, and improve employee commitment to environmental sustainability. The social dimension of sustainability primarily focuses on providing equitable opportunities and ensuring the well-being of port employees and other stakeholders, including the local community, driven by the policies and practices of the port authority. Alamoush et al. [ 2 ] equated economic sustainability with generating revenue and monetary gains and considered it one of the primary drivers of the other two dimensions, environmental and social sustainability. The PLS-SEM results further indicate that the economic dimension makes the largest contribution to the overall sustainability of the port. In line with the findings of Alamoush et al. [ 2 ], financial investments in the port drive environmental and social sustainability. Poulsen et al. [ 98 ] showed with data that air quality improved in many European ports even as cargo throughput increased, mainly because of financial investments in air quality control systems.

The improvement in air quality around the port contributes to environmental sustainability. It also contributes to social sustainability, because the surrounding community and ecosystems, including the natural habitats of birds and animals, experience better living conditions. This affirms the indirect environmental and social benefits of implementing strategies and policies related to economic sustainability. Our findings also emphasize the need for an integrative approach to port sustainability, which can be achieved only when all three dimensions intersect and complement one another to support overall sustainable development.

This study has both novel theoretical and practical implications. First, it provides a comprehensive list of items describing indicators of sustainability practices in shipping ports, drawn from published scholarly articles and from domain experts working in the port industry. Second, as the first study of its kind in the seaport sector, it applies a systematic, index-based content validation procedure to assess the relevance and essentiality of items for shipping ports and for contemporary port-focused sustainability practices. The validated instrument for assessing sustainability practices in shipping ports is a significant step toward formulating policies and developing strategies for the sustainable development of ports. It can be adapted to determine the extent to which sustainability practices have been adopted and to drive their implementation through sustainability-centered port policy. The instrument can also serve as a guideline for practitioners, policymakers, and researchers concerned with the sustainable development of shipping ports through environmental, economic, and social sustainability practices; port authorities, in particular, can use it to assess their level of adoption of these practices, which will aid in developing policies and strategies for sustainable port development. Further, the Global Reporting Initiative (GRI) Standards, developed by the Global Sustainability Standards Board (GSSB) primarily for sustainability reporting, can be used alongside our validated instrument for sustainability evaluation and for reporting in compliance with the GRI Standards [ 99 ].

GRI Standards help organizations understand and report how they impact sustainability and contribute to sustainable development, considering the interests of all stakeholders, including investors, policymakers, capital markets, and civil society, thereby making the organization transparent and accountable for sustainability. Sector-specific standards have been developed; ports fall under Group 3, which comprises various transport, infrastructure, and tourism-related sectors. However, a standard specific to shipping ports is not yet readily available, although port authorities can develop and customize one. In doing so, they can use the validated instrument from our study as a guide when assessing and preparing sustainability reports under the applicable GRI Standards.

Further, sustainability assessment should not be a one-time activity in the port. Instead, port authorities should have strategies and policies for tracking trends and changes in the extent to which sustainability practices are adopted and in their impact on sustainable development. Each port should carry out this assessment through the team, department, or personnel responsible for sustainability assessment and policy implementation, and it should be repeated at regular intervals, for example every 3 or 6 months, depending on policy and management decisions. Such longitudinal assessment, which tracks the various aspects of sustainability, will help the port evaluate the effectiveness of the sustainability interventions it implements.

6 Limitations and scope for future work

Although the study achieved its objective of contributing a novel, content-validated instrument for assessing sustainability practices in seaports, it has a few limitations that also point to scope for future work. The literature search was confined to articles published in the Scopus database; future work could extend the search to other scholarly databases to further improve the relevance of the items for measuring the sustainability practices of shipping ports. The study was limited to government-controlled major ports on India's east and west coasts; because of permission and access challenges, data collection did not cover privately managed ports. The items are generic to shipping ports, and further research could refine them for specific types of cargo or for individual terminals, rather than applying them irrespective of the kind of cargo handled. The application of digital technology and automation, such as artificial intelligence and machine learning, big data, and blockchain technology, could be explored to assess their impact on sustainable port management and development. A different methodological approach could also be adopted, as in the study of Yadav et al. [ 97 ], which used a multi-criteria decision-making (MCDM) approach to identify the enablers of sustainability and determined their intensity using the Robust Best–Worst Method (RBWM). Their analysis identified economic and environmental enablers as the high-intensity enablers of sustainability on which organizations can focus. Other stakeholders, such as customers, port users, government agencies linked with port operations, and the local community, were not part of the validation panel.

In future studies, these stakeholders could also be included in the panel so that every aspect is covered in the evaluation. In this study, the items used a 5-point Likert scale and captured only port employees' perceptions of the sustainability practices adopted in the port. Triangulation and case studies could also be used to analyze the qualitative aspects of adopting sustainability practices in the port.

7 Conclusion

The study validated an instrument for assessing sustainability practices in shipping ports, a significant step in formulating strategies for the sustainable development of ports. The instrument can serve as a guideline for practitioners, policymakers, and researchers concerned with the sustainable development of shipping ports through environmental, economic, and social sustainability practices. The study compiled a comprehensive list of relevant items identified through a thorough review of articles published in the Scopus database. After face validation, the instrument was administered to six subject matter experts, who evaluated the relevance and essentiality of its items for measuring sustainability practices in shipping ports. Content validity was assessed using the most widely adopted indices: the content validity index (CVI), Cohen's kappa coefficient, and the content validity ratio (CVR). CVI and Cohen's kappa assess the relevance of the items for measuring sustainability practices, whereas CVR assesses their essentiality for measuring sustainability practices in shipping ports. Further, this study contributes to the extant literature by providing evidence on the empirical relationships among the environmental, economic, and social sustainability practices that make up the sustainability construct in a TBL-based framework applied to shipping ports.
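The three indices named above can be illustrated with a short sketch. It assumes the standard formulas from the content-validity literature cited in this article (Polit et al. for the item-level CVI and the chance-adjusted kappa; Lawshe's CVR as revisited by Ayre and Scally), where N is the number of experts, A the number rating an item relevant, and n_e the number rating it essential: I-CVI = A/N, Pc = [N!/(A!(N − A)!)] × 0.5^N, K = (I-CVI − Pc)/(1 − Pc), and CVR = (n_e − N/2)/(N/2). The function names are hypothetical and not taken from the authors' analysis, and that the authors used exactly these formulas is an assumption; with N = 6, however, the rounded outputs agree with the values reported in Appendix 2.

```python
from math import comb


def item_cvi(n_relevant: int, n_experts: int) -> float:
    """I-CVI: proportion of experts who rated the item as relevant."""
    return n_relevant / n_experts


def chance_agreement(n_relevant: int, n_experts: int) -> float:
    """Pc: binomial probability that exactly n_relevant of n_experts agree,
    assuming each expert rates 'relevant' with probability 0.5."""
    return comb(n_experts, n_relevant) * 0.5 ** n_experts


def modified_kappa(n_relevant: int, n_experts: int) -> float:
    """K: I-CVI adjusted for chance agreement, (I-CVI - Pc) / (1 - Pc)."""
    i_cvi = item_cvi(n_relevant, n_experts)
    pc = chance_agreement(n_relevant, n_experts)
    return (i_cvi - pc) / (1 - pc)


def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    """Lawshe's CVR: (n_e - N/2) / (N/2), ranging from -1 to +1."""
    half = n_experts / 2
    return (n_essential - half) / half


if __name__ == "__main__":
    N = 6  # panel size reported in this study
    for a in range(3, N + 1):
        print(
            f"{a} of {N} agree: I-CVI={item_cvi(a, N):.2f}, "
            f"Pc={chance_agreement(a, N):.2f}, K={modified_kappa(a, N):.2f}, "
            f"CVR={content_validity_ratio(a, N):.2f}"
        )
```

For example, four of six experts rating an item relevant gives I-CVI = 0.67, Pc = 0.23, and K = 0.56, and four rating it essential gives CVR = 0.33, which correspond to the EnvSP8 row of Appendix 2.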

Data availability

The data analyzed in this study were collected through a questionnaire instrument using Likert scales, administered in both online and offline modes between December 2022 and December 2023. The instrument included a declaration assuring participants' privacy; therefore, the data cannot be made public. The primary data collected in the study are not publicly accessible but are available from the corresponding author upon reasonable request.

References

Meixell MJ, Luoma P. Stakeholder pressure in sustainable supply chain management: a systematic review. Int J Phys Distrib Logist Manag. 2015;45:69–89. https://doi.org/10.1108/IJPDLM-05-2013-0155 .

Alamoush AS, Ballini F, Ölçer AI. Revisiting port sustainability as a foundation for the implementation of the United Nations Sustainable Development Goals (UN SDGs). J Shipp Trade. 2021;6(1):1–40. https://doi.org/10.1186/S41072-021-00101-6 .

Dyllick T, Hockerts K. Beyond the business case for corporate sustainability. Bus Strateg Environ. 2002;11(2):130–41. https://doi.org/10.1002/bse.323 .

Lun YHV, Lai K, Wong CWY, Cheng TCE. Green shipping management. Cham: Springer International Publishing; 2016. https://doi.org/10.1007/978-3-319-26482-0 .

Porter ME, Van Der Linde C. Green and competitive: ending the stalemate. In: Corporate environmental responsibility. 2017. p. 47–60. https://doi.org/10.1016/0024-6301(95)99997-e .

Russo MV, Fouts PA. A resource-based perspective on corporate environmental performance and profitability. Acad Manag J. 1997;40(3):534–59. https://doi.org/10.2307/257052 .

Roszkowska-Menkes M. Porter and Kramer’s (2006) “shared value.” In: Encyclopedia of sustainable management. Cham: Springer International Publishing; 2021. p. 1–6. https://doi.org/10.1007/978-3-030-02006-4_393-1 .

Zhu Q, Sarkis J. Relationships between operational practices and performance among early adopters of green supply chain management practices in Chinese manufacturing enterprises. J Oper Manag. 2004;22(3):265–89. https://doi.org/10.1016/j.jom.2004.01.005 .

Hong J, Zhang Y, Ding M. Sustainable supply chain management practices, supply chain dynamic capabilities, and enterprise performance. J Clean Prod. 2018;172:3508–19. https://doi.org/10.1016/J.JCLEPRO.2017.06.093 .

Ministry of Ports, Shipping and Waterways. Maritime India vision 2030. Sagarmala; 2021.

Pradhan RP, Rathi C, Gupta S. Sagarmala & India’s maritime big push approach: seaports as India’s geo-economic gateways & neighborhood maritime lessons. J Indian Ocean Reg. 2022;18(3):209–29. https://doi.org/10.1080/19480881.2022.2114195 .

Mantry S, Ghatak RR. Comparing and contrasting competitiveness of major Indian and select international ports. Int J Res Finance Mark. 2017;7(5):1–19.

Song DW, Panayides PM. Global supply chain and port/terminal: integration and competitiveness. In: Maritime policy and management. London: Taylor & Francis; 2008. p. 73–87. https://doi.org/10.1080/03088830701848953 .

Yap WY, Lam JSL. 80 million-twenty-foot-equivalent-unit container port? Sustainability issues in port and coastal development. Ocean Coast Manag. 2013;71:13–25. https://doi.org/10.1016/j.ocecoaman.2012.10.011 .

Lee PTW, Kwon OK, Ruan X. Sustainability challenges in maritime transport and logistics industry and its way ahead. Sustainability. 2019;11(5):1331. https://doi.org/10.3390/SU11051331 .

Dragović B, Tzannatos E, Park NK. Simulation modelling in ports and container terminals: literature overview and analysis by research field, application area and tool. Flex Serv Manuf J. 2017;29(1):4–34. https://doi.org/10.1007/s10696-016-9239-5 .

Ashrafi M, Acciaro M, Walker TR, Magnan GM, Adams M. Corporate sustainability in Canadian and US maritime ports. J Clean Prod. 2019;220:386–97. https://doi.org/10.1016/j.jclepro.2019.02.098 .

Peris-Mora E, Orejas JMD, Subirats A, Ibáñez S, Alvarez P. Development of a system of indicators for sustainable port management. Mar Pollut Bull. 2005;50(12):1649–60. https://doi.org/10.1016/j.marpolbul.2005.06.048 .

Ashrafi M, Walker TR, Magnan GM, Adams M, Acciaro M. A review of corporate sustainability drivers in maritime ports: a multi-stakeholder perspective. Marit Policy Manag. 2020;47(8):1027–44. https://doi.org/10.1080/03088839.2020.1736354 .

Stanković JJ, Marjanović I, Papathanasiou J, Drezgić S. Social, economic and environmental sustainability of port regions: MCDM approach in composite index creation. J Mar Sci Eng. 2021;9(1):74. https://doi.org/10.3390/JMSE9010074 .

Dinwoodie J, Tuck S, Knowles H, Benhin J, Sansom M. Sustainable development of maritime operations in ports. Bus Strateg Environ. 2011. https://doi.org/10.1002/bse.718 .

Ports primer: 7.1 environmental impacts | US EPA. https://www.epa.gov/community-port-collaboration/ports-primer-71-environmental-impacts . Accessed 28 Apr 2024.

Notteboom T, Pallis A, Rodrigue J-P. Port economics, management and policy. Port Econ Manag Policy. 2021. https://doi.org/10.4324/9780429318184 .

Notteboom T, van der Lugt L, van Saase N, Sel S, Neyens K. The role of seaports in green supply chain management: initiatives, attitudes, and perspectives in Rotterdam, Antwerp, North Sea Port, and Zeebrugge. Sustainability. 2020;12(4):1688. https://doi.org/10.3390/su12041688 .

Molavi A, Lim GJ, Race B. A framework for building a smart port and smart port index. Int J Sustain Transp. 2020;14(9):686–700. https://doi.org/10.1080/15568318.2019.1610919 .

Wu Q, He Q, Duan Y. Explicating dynamic capabilities for corporate sustainability. EuroMed J Bus. 2013;8(3):255–72. https://doi.org/10.1108/EMJB-05-2013-0025 .

Argyriou I, Daras T, Tsoutsos T. Challenging a sustainable port. A case study of Souda port, Chania, Crete. Case Stud Transp Policy. 2022;10(4):2125–37. https://doi.org/10.1016/J.CSTP.2022.09.007 .

Bjerkan KY, Seter H. Reviewing tools and technologies for sustainable ports: does research enable decision making in ports? Transp Res D Transp Environ. 2019;72:243–60. https://doi.org/10.1016/j.trd.2019.05.003 .

Oh H, Lee S-W, Seo Y-J. The evaluation of seaport sustainability: the case of South Korea. Ocean Coast Manag. 2018;161:50–6. https://doi.org/10.1016/j.ocecoaman.2018.04.028 .

Lu CS, Shang KC, Lin CC. Examining sustainability performance at ports: port managers’ perspectives on developing sustainable supply chains. Marit Policy Manag. 2016;43(8):909–27. https://doi.org/10.1080/03088839.2016.1199918 .

Kang D, Kim S. Conceptual model development of sustainability practices: the case of port operations for collaboration and governance. Sustainability. 2017;9(12):2333. https://doi.org/10.3390/su9122333 .

Narasimha PT, Jena PR, Majhi R. Sustainability performance assessment framework for major seaports in India. Int J Sustain Dev Plan. 2022;17(2):693–704. https://doi.org/10.18280/ijsdp.170235 .

Vejvar M, Lai K, Lo CKY, Fürst EWM. Strategic responses to institutional forces pressuring sustainability practice adoption: case-based evidence from inland port operations. Transp Res D Transp Environ. 2018;61:274–88. https://doi.org/10.1016/j.trd.2017.08.014 .

Ayre C, Scally AJ. Critical values for Lawshe’s content validity ratio: revisiting the original methods of calculation. Meas Eval Couns Dev. 2014;47(1):79–86. https://doi.org/10.1177/0748175613513808 .

Barbosa MW, Cansino JM. A water footprint management construct in agri-food supply chains: a content validity analysis. Sustainability. 2022;14(9):4928. https://doi.org/10.3390/su14094928 .

Waltz CF, Strickland OL, Lenz ER. Measurement in nursing and health research. 2016. https://doi.org/10.1891/9780826170620 .

Ibiyemi A, Mohd Adnan Y, Daud MN, Olanrele S, Jogunola A. A content validity study of the test of valuers’ support for capturing sustainability in the valuation process in Nigeria. Pac Rim Prop Res J. 2019;25(3):177–93. https://doi.org/10.1080/14445921.2019.1703700 .

Diniz NV, Cunha DR, de Santana Porte M, Oliveira CBM, de Freitas Fernandes F. A bibliometric analysis of sustainable development goals in the maritime industry and port sector. Reg Stud Mar Sci. 2024;69: 103319. https://doi.org/10.1016/j.rsma.2023.103319 .

Amui LBL, Jabbour CJC, de Sousa Jabbour ABL, Kannan D. Sustainability as a dynamic organizational capability: a systematic review and a future agenda toward a sustainable transition. J Clean Prod. 2017;142:308–22. https://doi.org/10.1016/j.jclepro.2016.07.103 .

Maletič M, Maletič D, Gomišček B. The impact of sustainability exploration and sustainability exploitation practices on the organisational performance: a cross-country comparison. J Clean Prod. 2016. https://doi.org/10.1016/j.jclepro.2016.02.132 .

Berns M, Hopkins MS, Townend A, Khayat Z, Balagopal B, Reeves M. The business of sustainability: what it means to managers now. MIT Sloan Manag Rev. 2009;51(1).

Montiel I, Delgado-Ceballos J. Defining and measuring corporate sustainability. Organ Environ. 2014;27(2):113–39. https://doi.org/10.1177/1086026614526413 .

Laxe FG, Bermúdez FM, Palmero FM, Novo-Corti I. Assessment of port sustainability through synthetic indexes. Application to the Spanish case. Mar Pollut Bull. 2017;119(1):220–5. https://doi.org/10.1016/j.marpolbul.2017.03.064 .

Torugsa NA, O’Donohue W, Hecker R. Proactive CSR: an empirical analysis of the role of its economic, social and environmental dimensions on the association between capabilities and performance. J Bus Ethics. 2013. https://doi.org/10.1007/s10551-012-1405-4 .

Lauring J, Thomsen C. Collective ideals and practices in sustainable development: managing corporate identity. Corp Soc Responsib Environ Manag. 2009;16(1):38–47. https://doi.org/10.1002/csr.181 .

Hallstedt SI, Thompson AW, Lindahl P. Key elements for implementing a strategic sustainability perspective in the product innovation process. J Clean Prod. 2013;51:277–88. https://doi.org/10.1016/J.JCLEPRO.2013.01.043 .

Parola F, Risitano M, Ferretti M, Panetti E. The drivers of port competitiveness: a critical review. Transp Rev. 2017;37(1):116–38. https://doi.org/10.1080/01441647.2016.1231232 .

Simpson J, Weiner E, Durkin P. The Oxford English dictionary today. Trans Philol Soc. 2004;102(3):335–81. https://doi.org/10.1111/j.0079-1636.2004.00140.x .

Ruggerio CA. Sustainability and sustainable development: a review of principles and definitions. Sci Total Environ. 2021;786: 147481. https://doi.org/10.1016/J.SCITOTENV.2021.147481 .

Moore JE, Mascarenhas A, Bain J, Straus SE. Developing a comprehensive definition of sustainability. Implement Sci. 2017;12(1):1–8. https://doi.org/10.1186/S13012-017-0637-1 .

Elkington J. Partnerships from cannibals with forks: the triple bottom line of 21st-century business. Environ Qual Manag. 1998;8(1):37–51. https://doi.org/10.1002/tqem.3310080106 .

Elkington J. Triple bottom line. In: Cannibals with forks. Oxford: Capstone; 1997.

Waddock SA, Graves SB. The corporate social performance-financial performance link. Strateg Manag J. 1997;18(4):303–19. https://doi.org/10.1002/(SICI)1097-0266(199704)18:4%3c303::AID-SMJ869%3e3.0.CO;2-G .

Sharma S, Vredenburg H. Proactive corporate environmental strategy and the development of competitively valuable organizational capabilities. Strateg Manag J. 1998;19(8):729–53. https://doi.org/10.1002/(sici)1097-0266(199808)19:8%3c729::aid-smj967%3e3.3.co;2-w .

Carroll AB, Shabana KM. The business case for corporate social responsibility: a review of concepts, research and practice. Int J Manag Rev. 2010;12(1):85–105. https://doi.org/10.1111/j.1468-2370.2009.00275.x .

Beske P. Dynamic capabilities and sustainable supply chain management. Int J Phys Distrib Logist Manag. 2012;42(4):372–87. https://doi.org/10.1108/09600031211231344 .

Lam JSL, Li KX. Green port marketing for sustainable growth and development. Transp Policy. 2019;84:73–81. https://doi.org/10.1016/j.tranpol.2019.04.011 .

Olakitan Atanda J. Developing a social sustainability assessment framework. Sustain Cities Soc. 2019;44:237–52. https://doi.org/10.1016/j.scs.2018.09.023 .

Bansal P. Evolving sustainably: a longitudinal study of corporate sustainable development. Strateg Manag J. 2005;26(3):197–218. https://doi.org/10.1002/smj.441 .

Carter CR, Liane Easton P. Sustainable supply chain management: evolution and future directions. Int J Phys Distrib Logist Manag. 2011;41(1):46–62. https://doi.org/10.1108/09600031111101420 .

Janic M. Sustainable transport in the European Union: a review of the past research and future ideas. Transp Rev. 2006;26(1):81–104. https://doi.org/10.1080/01441640500178908 .

Steurer R, Langer ME, Konrad A, Martinuzzi A. Corporations, stakeholders and sustainable development I: a theoretical exploration of business-society relations. J Bus Ethics. 2005;61(3):263–81. https://doi.org/10.1007/s10551-005-7054-0 .

Stanković JJ, Marjanović IM, Papathanasiou J, Drezgić SD. Social, economic and environmental sustainability of port regions: MCDM approach in composite index creation. J Mar Sci Eng. 2021. https://doi.org/10.3390/jmse9010074 .

Mori K, Christodoulou A. Review of sustainability indices and indicators: towards a new city sustainability index (CSI). Environ Impact Assess Rev. 2012;32(1):94–106. https://doi.org/10.1016/J.EIAR.2011.06.001 .

Mayer AL. Strengths and weaknesses of common sustainability indices for multidimensional systems. Environ Int. 2007. https://doi.org/10.1016/j.envint.2007.09.004 .

Hair J, Black W, Babin B, Anderson R. Multivariate data analysis: a global perspective. 7th ed. Upper Saddle River: Pearson Education; 2010.

Hair JF, Black WC, Babin BJ, Anderson RE. Multivariate data analysis. Hampshire: Cengage Learning; 2019.

Boateng GO, Neilands TB, Frongillo EA, Melgar-Quiñonez HR, Young SL. Best practices for developing and validating scales for health, social, and behavioral research: a primer. Front Public Health. 2018;6:149. https://doi.org/10.3389/fpubh.2018.00149 .

Elangovan N, Sundaravel E. Method of preparing a document for survey instrument validation by experts. MethodsX. 2021;8: 101326. https://doi.org/10.1016/J.MEX.2021.101326 .

Papadas KK, Avlonitis GJ, Carrigan M. Green marketing orientation: conceptualization, scale development and validation. J Bus Res. 2017;80:236–46. https://doi.org/10.1016/J.JBUSRES.2017.05.024 .

Polit DF, Beck CT. The content validity index: are you sure you know what’s being reported? Critique and recommendations. Res Nurs Health. 2008;31(4):489–97.

Polit DF, Beck CT, Owen SV. Focus on research methods: Is the CVI an acceptable indicator of content validity? Appraisal and recommendations. Res Nurs Health. 2007;30(4):459–67. https://doi.org/10.1002/nur.20199 .

Zamanzadeh V, Ghahramanian A, Rassouli M, Abbaszadeh A, Alavi-Majd H, Nikanfar A-R. Design and implementation content validity study: development of an instrument for measuring patient-centered communication. J Caring Sci. 2015;4(2):165. https://doi.org/10.15171/jcs.2015.017 .

de Souza AC, Alexandre NMC, Guirardello EDB. Propriedades psicométricas na avaliação de instrumentos: avaliação da confiabilidade e da validade [Psychometric properties in instrument evaluation: evaluation of reliability and validity]. Epidemiologia e Serviços de Saúde. 2017;26(3):649–59. https://doi.org/10.5123/S1679-49742017000300022 .

Rodrigues IB, Adachi JD, Beattie KA, MacDermid JC. Development and validation of a new tool to measure the facilitators, barriers and preferences to exercise in people with osteoporosis. BMC Musculoskelet Disord. 2017;18(1):540. https://doi.org/10.1186/s12891-017-1914-5 .

Bobos P, Pouliopoulou DVS, Harriss A, Sadi J, Rushton A, MacDermid JC. A systematic review and meta-analysis of measurement properties of objective structured clinical examinations used in physical therapy licensure and a structured review of licensure practices in countries with well-developed regulation systems. PLoS ONE. 2021;16(8): e0255696. https://doi.org/10.1371/journal.pone.0255696 .

Hair JF, Hult GTM, Ringle CM, Sarstedt M, Thiele KO. Mirror, mirror on the wall: a comparative evaluation of composite-based structural equation modeling methods. J Acad Mark Sci. 2017;45(5):616–32. https://doi.org/10.1007/s11747-017-0517-x .

Cohen J. A power primer. In: Methodological issues and strategies in clinical research. 4th ed. Washington: American Psychological Association; 2016. p. 279–84. https://doi.org/10.1037/14805-018 .

Roldán JL, Sánchez-Franco MJ. Variance-based structural equation modeling. In: Research methodologies, innovations and philosophies in software systems engineering and information systems. Pennsylvania: IGI Global; 2012. p. 193–221. https://doi.org/10.4018/978-1-4666-0179-6.ch010 .

Faul F, Erdfelder E, Buchner A, Lang A-G. Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses. Behav Res Methods. 2009;41(4):1149–60. https://doi.org/10.3758/BRM.41.4.1149 .

Goodboy AK, Kline RB. Statistical and practical concerns with published communication research featuring structural equation modeling. Commun Res Rep. 2017;34(1):68–77. https://doi.org/10.1080/08824096.2016.1214121 .

Crawford JA, Kelder J-A. Do we measure leadership effectively? Articulating and evaluating scale development psychometrics for best practice. Leadersh Q. 2019;30(1):133–44. https://doi.org/10.1016/j.leaqua.2018.07.001 .

Sarstedt M, Hair JF, Cheah JH, Becker JM, Ringle CM. How to specify, estimate, and validate higher-order constructs in PLS-SEM. Australas Mark J. 2019;27(3):197–211. https://doi.org/10.1016/J.AUSMJ.2019.05.003 .

Ringle CM, Wende S, Becker J-M. SmartPLS 4. http://www.smartpls.com .

Malhotra S. Study of features of mobile trading apps: a silver lining of pandemic. J Global Inf Bus Strateg. 2020. https://doi.org/10.5958/2582-6115.2020.00009.0 .

Shrotryia VK, Dhanda U. Content validity of assessment instrument for employee engagement. SAGE Open. 2019;9(1):2158244018821751. https://doi.org/10.1177/2158244018821751 .

Bagozzi RP, Yi Y. On the evaluation of structural equation models. J Acad Mark Sci. 1988;16(1):74–94. https://doi.org/10.1007/BF02723327 .

Chin WW, Gopal A, Salisbury WD. Advancing the theory of adaptive structuration: the development of a scale to measure faithfulness of appropriation. Inf Syst Res. 1997;8(4):342–67. https://doi.org/10.1287/isre.8.4.342 .

Hair JF, Hult Jr GTM, Ringle CM, Sarstedt M. A primer on partial least squares structural equations modeling (PLS-SEM). J Tour Res. 2021;6(2).

Fornell C, Larcker DF. Evaluating structural equation models with unobservable variables and measurement error. J Mark Res. 1981;18(1):39–50. https://doi.org/10.1177/002224378101800104 .

Henseler J, Ringle CM, Sarstedt M. A new criterion for assessing discriminant validity in variance-based structural equation modeling. J Acad Mark Sci. 2015;43(1):115–35. https://doi.org/10.1007/s11747-014-0403-8 .

Franke G, Sarstedt M. Heuristics versus statistics in discriminant validity testing: a comparison of four procedures. Internet Res. 2019;29(3):430–47. https://doi.org/10.1108/IntR-12-2017-0515 .

Hu LT, Bentler PM. Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Struct Equ Model. 1999;6(1):1–55. https://doi.org/10.1080/10705519909540118 .

Becker JM, Ringle CM, Sarstedt M, Völckner F. How collinearity affects mixture regression results. Mark Lett. 2015;26(4):643–59. https://doi.org/10.1007/s11002-014-9299-9 .

Chang DS, Kuo LCR. The effects of sustainable development on firms’ financial performance—an empirical approach. Sustain Dev. 2008;16(6):365–80. https://doi.org/10.1002/sd.351 .

Ogunbiyi O, Oladapo A, Goulding J. An empirical study of the impact of lean construction techniques on sustainable construction in the UK. Constr Innov. 2014;14(1):88–107. https://doi.org/10.1108/CI-08-2012-0045 .

Yadav G, Kumar A, Luthra S, Garza-Reyes JA, Kumar V, Batista L. A framework to achieve sustainability in manufacturing organisations of developing economies using industry 4.0 technologies’ enablers. Comput Ind. 2020;122: 103280. https://doi.org/10.1016/j.compind.2020.103280 .

Poulsen RT, Ponte S, Sornn-Friese H. Environmental upgrading in global value chains: the potential and limitations of ports in the greening of maritime transport. Geoforum. 2018;89:83–95. https://doi.org/10.1016/J.GEOFORUM.2018.01.011 .

GRI Standards. https://www.globalreporting.org/standards/ . Accessed 09 May 2024.

Lu C-S, Shang K-C, Lin C-C. Identifying crucial sustainability assessment criteria for container seaports. Marit Bus Rev. 2016;1(2):90–106. https://doi.org/10.1108/MABR-05-2016-0009 .

Acknowledgements

We acknowledge the contribution of the expert panel for their reviews and feedback that enabled us to optimize the items in the instrument.

Open access funding provided by Manipal Academy of Higher Education, Manipal. This study has not received any funding from any institutions or agencies.

Author information

Authors and affiliations.

Department of Commerce, Manipal Academy of Higher Education, Manipal, 576104, India

Department of Humanities and Management, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, 576104, India

Yogesh P. Pai

T A Pai Management Institute, Yelahanka, Govindapura, Bengaluru, 560064, Karnataka, India

Parthesh Shanbhag

Contributions

All authors contributed equally to the manuscript. K.L. conceptualized the study and carried out the data collection. All authors jointly performed the data analysis and wrote the manuscript. All authors reviewed the manuscript before submission.

Corresponding author

Correspondence to Yogesh P. Pai .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix 1. Construct-wise list of items and source

| Construct | Item | Source |
|---|---|---|
| Environmental sustainability practices | Avoiding the use of unpolluted land in the port area | [ , , , , , ] |
| | Developing and maintaining mangroves, gardens, and landscapes | |
| | Avoiding environmental destruction during dredging | |
| | Considering environmental protection when handling cargo | |
| | Using recyclable or environment-friendly materials in port construction | |
| | Protecting the ecological environment in the port area | |
| | Reduction of noise pollution | |
| | Mitigating light influence on neighboring residents | |
| | Controlling smoke level | |
| | Maintaining air quality | |
| | Reduction of greenhouse gas | |
| | Reduction of carbon emissions | |
| | Preventing odour pollution | |
| | Optimal utilization of renewables and alternate energy sources | |
| | Facilities for wastewater and sewage treatment | |
| | Implementation of dust suppression systems | |
| Economic sustainability practices | Facilitating economic growth and acting as a supply chain link in local and global trade | [ , , , , , ] |
| | Investments in port infrastructure development | |
| | Establishing port development funding | |
| | Attracting foreign direct investments | |
| | Promotion and development of cruise tourism services | |
| | Employment generation and career growth opportunities | |
| | Ensuring that cargo is handled safely and effectively | |
| | Low damage or loss record for cargo delivery | |
| | Usage of energy-efficient electrical and electronic appliances like LED lamps | |
| | Optimal utilization of infrastructure, land, and space in the port area | |
| | Offering one-stop logistics solutions, including freight forwarding and additional services | |
| | Optimizing the routing of vehicles in and out of port | |
| | Mitigating congestion in the port | |
| | Providing incentives for green shipping practices | |
| | Landlord activities | |
| | Investment in climate change adaptation strategies | |
| | Sustainable supply chain policy | |
| | Investment in innovation strategy | |
| | Transshipment and storage of dangerous goods | |
| Social sustainability practices | Recognizing the requirements of the neighboring community | [ , , , , , ] |
| | Giving support to community social activities | |
| | Providing training and education for employees regularly | |
| | Providing employees’ welfare benefits and other facilities | |
| | Staff job security even during uncertainties of the business | |
| | Strengthening safety and security management standards and protocols of the port | |
| | Accident prevention in the port area | |
| | Social equality and gender diversity in employment | |
| | Job satisfaction of employees | |
| | Consulting various interest groups such as labor unions and community leaders when making port project decision | |
| | Strengthening port infrastructure for social contribution | |
| | Engaging in corporate social responsibility practices | |

Appendix 2. Results of content validity

| Item code | Item description | Agreement count (CVI) | i-CVI | Pc | K | Agreement count (CVR) | CVR |
|---|---|---|---|---|---|---|---|
| EnvSP1 | Avoiding the use of unpolluted land in the port area | 4 | 0.67 | 0.23 | 0.56 | 6 | 1.00 |
| EnvSP2 | Developing and maintaining mangroves, gardens, and landscapes | 6 | 1 | 0.02 | 1 | 6 | 1.00 |
| EnvSP3 | Avoiding environmental destruction during dredging | 4 | 0.67 | 0.23 | 0.56 | 5 | 0.67 |
| EnvSP4 | Considering environmental protection when handling cargo | 4 | 0.67 | 0.23 | 0.56 | 6 | 1.00 |
| EnvSP5 | Using recyclable or environment-friendly materials in port construction | 5 | 0.83 | 0.09 | 0.82 | 5 | 0.67 |
| EnvSP6 | Protecting the ecological environment in the port area | 6 | 1 | 0.02 | 1 | 6 | 1.00 |
| EnvSP7 | Reduction of noise pollution | 5 | 0.83 | 0.09 | 0.82 | 5 | 0.67 |
| EnvSP8 | Mitigating light influence on neighboring residents | 4 | 0.67 | 0.23 | 0.56 | 4 | 0.33 |
| EnvSP9 | Controlling smoke level | 3 | 0.5 | 0.31 | 0.27 | 6 | 1.00 |
| EnvSP10 | Maintaining air quality | 6 | 1 | 0.02 | 1 | 6 | 1.00 |
| EnvSP11 | Reduction of greenhouse gas | 6 | 1 | 0.02 | 1 | 6 | 1.00 |
| EnvSP12 | Reduction of carbon emissions | 6 | 1 | 0.02 | 1 | 6 | 1.00 |
| EnvSP13 | Preventing odour pollution | 3 | 0.5 | 0.31 | 0.27 | 4 | 0.33 |
| EnvSP14 | Optimal utilization of renewables and alternate energy sources | 6 | 1 | 0.02 | 1 | 6 | 1.00 |
| EnvSP15 | Facilities for wastewater and sewage treatment | 6 | 1 | 0.02 | 1 | 6 | 1.00 |
| EnvSP16 | Implementation of dust suppression systems | 6 | 1 | 0.02 | 1 | 6 | 1.00 |
| EnvSP17 | Cold-ironing source of power for vessels on the berth | 4 | 0.67 | 0.23 | 0.56 | 4 | 0.33 |
| EcoSP1 | Facilitating economic growth and acting as a supply chain link in local and global trade | 6 | 1 | 0.02 | 1 | 6 | 1.00 |
| EcoSP2 | Investments in port infrastructure development | 6 | 1 | 0.02 | 1 | 6 | 1.00 |
| EcoSP3 | Establishing port development funding | 2 | 0.33 | 0.23 | 0.13 | 4 | 0.33 |
| EcoSP4 | Attracting foreign direct investments | 2 | 0.33 | 0.23 | 0.13 | 4 | 0.33 |
| EcoSP5 | Promotion and development of cruise tourism services | 6 | 1 | 0.02 | 1 | 6 | 1.00 |
| EcoSP6 | Employment generation and career growth opportunities | 6 | 1 | 0.02 | 1 | 6 | 1.00 |
| EcoSP7 | Ensuring that cargo is handled safely and effectively | 3 | 0.5 | 0.31 | 0.27 | 6 | 1.00 |
| EcoSP8 | Low damage or loss record for cargo delivery | 4 | 0.67 | 0.23 | 0.56 | 6 | 1.00 |
| EcoSP9 | Usage of energy-efficient electrical and electronic appliances | 6 | 1 | 0.02 | 1 | 6 | 1.00 |
| EcoSP10 | Optimal utilization of infrastructure, land, and space in the port area | 6 | 1 | 0.02 | 1 | 6 | 1.00 |
| EcoSP11 | Offering one-stop logistics solutions, including freight forwarding and additional services | 6 | 1 | 0.02 | 1 | 5 | 0.67 |
| EcoSP12 | Optimizing the routing of vehicles in and out of port | 6 | 1 | 0.02 | 1 | 6 | 1.00 |
| EcoSP13 | Mitigating congestion in the port | 6 | 1 | 0.02 | 1 | 6 | 1.00 |
| EcoSP14 | Providing incentives for green shipping practices | 6 | 1 | 0.02 | 1 | 5 | 0.67 |
| EcoSP15 | Landlord activities | 6 | 1 | 0.02 | 1 | 5 | 0.67 |
| EcoSP16 | Investment in climate change adaptation strategies | 6 | 1 | 0.02 | 1 | 6 | 1.00 |
| EcoSP17 | Sustainable supply chain policy | 6 | 1 | 0.02 | 1 | 6 | 1.00 |
| EcoSP18 | Investment in innovation strategy | 6 | 1 | 0.02 | 1 | 5 | 0.67 |
| EcoSP19 | Transshipment and storage of dangerous goods | 6 | 1 | 0.02 | 1 | 4 | 0.33 |
| SocSP1 | Recognizing the requirements of the neighboring community | 6 | 1 | 0.02 | 1 | 4 | 0.33 |
| SocSP2 | Giving support to community social activities | 6 | 1 | 0.02 | 1 | 4 | 0.33 |
| SocSP3 | Providing training and education for employees regularly | 6 | 1 | 0.02 | 1 | 6 | 1.00 |
| SocSP4 | Providing employees’ welfare benefits and other facilities | 6 | 1 | 0.02 | 1 | 6 | 1.00 |
| SocSP5 | Staff job security even during uncertainties of the business | 3 | 0.5 | 0.31 | 0.27 | 5 | 0.67 |
| SocSP6 | Strengthening safety and security management standards and protocols of the port | 6 | 1 | 0.02 | 1 | 6 | 1.00 |
| SocSP7 | Accident prevention in the port area | 6 | 1 | 0.02 | 1 | 6 | 1.00 |
| SocSP8 | Social equality and gender diversity in employment | 6 | 1 | 0.02 | 1 | 6 | 1.00 |
| SocSP9 | Job satisfaction of employees | 6 | 1 | 0.02 | 1 | 6 | 1.00 |
| SocSP10 | Consulting various interest groups such as labor unions and community leaders when making port project decisions | 6 | 1 | 0.02 | 1 | 6 | 1.00 |
| SocSP11 | Strengthening port infrastructure for social contribution | 6 | 1 | 0.02 | 1 | 5 | 0.67 |
| SocSP12 | Engaging in corporate social responsibility practices | 6 | 1 | 0.02 | 1 | 6 | 1.00 |

  • i-CVI is the item-level content validity index, Pc the probability of chance agreement, K the kappa statistic (i-CVI adjusted for chance agreement), and CVR the content validity ratio.
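Every row in Appendix 2 is consistent with a panel of N = 6 experts and with the standard formulas. Here A is the number of experts counted toward the CVI and nₑ the number counted toward the CVR:

- i-CVI = A/N
- Pc = C(N, A) · 0.5^N
- K = (i-CVI − Pc)/(1 − Pc)
- Lawshe's CVR = (nₑ − N/2)/(N/2)

The Python sketch below reproduces the reported values under those assumptions. The panel size is inferred from the table, not stated in this excerpt. It also includes a hypothetical `retain` filter. When Appendix 2 is compared with the instrument in Appendix 3, the items that were kept are exactly those with i-CVI = 1 and CVR = 1.00. That rule is only an observation drawn from the two appendices. It is not a cutoff stated by the authors here.

```python
from math import comb


def content_validity(n_experts: int, agree_cvi: int, agree_cvr: int) -> dict:
    """Item-level content validity indices, rounded to two decimals.

    agree_cvi: number of experts whose rating counts toward the CVI
    agree_cvr: number of experts whose rating counts toward the CVR
    """
    i_cvi = agree_cvi / n_experts
    # Probability that exactly agree_cvi of n_experts agree by chance,
    # assuming each expert agrees with probability 0.5 (binomial term).
    pc = comb(n_experts, agree_cvi) * 0.5 ** n_experts
    # Kappa: the i-CVI adjusted for chance agreement.
    kappa = (i_cvi - pc) / (1 - pc)
    # Lawshe's content validity ratio, ranging from -1 to +1.
    half = n_experts / 2
    cvr = (agree_cvr - half) / half
    return {
        "i_cvi": round(i_cvi, 2),
        "pc": round(pc, 2),
        "kappa": round(kappa, 2),
        "cvr": round(cvr, 2),
    }


def retain(indices: dict) -> bool:
    """Hypothetical retention rule: keep only items with i-CVI = 1 and CVR = 1."""
    return indices["i_cvi"] == 1.0 and indices["cvr"] == 1.0


# A few rows from Appendix 2: (item code, agreement count for CVI, for CVR).
SAMPLE_ROWS = [
    ("EnvSP1", 4, 6),
    ("EnvSP2", 6, 6),
    ("EnvSP5", 5, 5),
    ("EnvSP9", 3, 6),
    ("EcoSP3", 2, 4),
    ("EcoSP11", 6, 5),
]

if __name__ == "__main__":
    for code, a_cvi, a_cvr in SAMPLE_ROWS:
        idx = content_validity(6, a_cvi, a_cvr)
        print(code, idx, "retain" if retain(idx) else "drop")
```

With N = 6, each printed row matches the corresponding Appendix 2 row. The i-CVI and the CVR count different ratings, so an item can score well on one index and poorly on the other. Compare EnvSP9 (i-CVI 0.5, CVR 1.00) with EcoSP11 (i-CVI 1, CVR 0.67).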

Appendix 3. Instrument for data collection

4.1 Section A: Demographic profile

Figure A

4.2 Section B: Practices related to the port

Please indicate the extent to which you agree with each statement about your port on a scale of 1 to 5:

1 = strongly disagree, 2 = disagree, 3 = neutral, 4 = agree, 5 = strongly agree.

If you are unaware of the port’s practices, you may choose “3 = neutral”.

Environmental sustainability practices adopted in your port focus on:

| Practice | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Developing and maintaining mangroves, gardens, and landscapes | | | | | |
| Protecting the ecological environment in the port area | | | | | |
| Maintaining air quality | | | | | |
| Reduction of greenhouse gas | | | | | |
| Reduction of carbon emissions | | | | | |
| Optimal utilization of renewables and alternate energy sources | | | | | |
| Facilities for wastewater and sewage treatment | | | | | |
| Implementation of dust suppression systems | | | | | |

Economic sustainability practices adopted in your port focus on:

| Practice | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Facilitating economic growth and acting as a supply chain link in local and global trade | | | | | |
| Investments in port infrastructure development | | | | | |
| Promotion and development of cruise tourism services | | | | | |
| Employment generation and career growth opportunities | | | | | |
| Usage of energy-efficient electrical and electronic appliances like LED lamps | | | | | |
| Optimal utilization of infrastructure, land, and space in the port area | | | | | |
| Optimizing the routing of vehicles in and out of port | | | | | |
| Mitigating congestion in the port | | | | | |
| Investment in climate change adaptation strategies | | | | | |
| Sustainable supply chain policy | | | | | |

Social sustainability practices adopted in your port focus on:

| Practice | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Providing training and education for employees regularly | | | | | |
| Providing employees’ welfare benefits and other facilities | | | | | |
| Strengthening port safety management standards and protocols | | | | | |
| Accident prevention in the port area | | | | | |
| Social equality and gender diversity in employment | | | | | |
| Job satisfaction of employees | | | | | |
| Consulting various interest groups such as labor unions and community leaders when making port project decisions | | | | | |
| Engaging in corporate social responsibility practices | | | | | |

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Kishore, L., Pai, Y.P. & Shanbhag, P. Reliability and validity assessment of instrument to measure sustainability practices at shipping ports in India. Discov Sustain 5, 236 (2024). https://doi.org/10.1007/s43621-024-00395-z


Received: 27 January 2024

Accepted: 02 August 2024

Published: 03 September 2024

DOI: https://doi.org/10.1007/s43621-024-00395-z


Keywords

  • Shipping ports
  • Sustainability practices
  • Measurement instrument
  • Content Validity Index
  • Cohen’s Kappa Index
  • Content validity ratio
  • Confirmatory factor analysis
  • Structural equation model


Delphi technique on nursing competence studies: a scoping review.


1. Introduction
1.1. The Delphi technique
1.1.1. Selection and composition of the experts’ panel
1.1.2. Rounds
1.1.3. Data analysis and consensus
1.1.4. Reliability and validity
1.1.5. Advantages and disadvantages of the Delphi technique
1.2. Rationale, context and aim of the scoping review
2. Materials and methods
2.1. Eligibility criteria
2.2. Search strategy
2.3. Study selection
2.4. Data extraction and presentation
3.1. Preparatory procedures
3.2. Access and expert selection procedures
3.3. Acquisition of experts’ inputs
3.3.1. Instrumentation
3.3.2. First round
3.3.3. Subsequent rounds
3.3.4. Stability of the expert panel
3.4. Data analysis and consensus
3.5. Ethical–legal procedures and guarantees
4. Discussion
Limitations
5. Conclusions
Supplementary materials
Author contributions
Institutional review board statement
Informed consent statement
Conflicts of interest

  • Le Boterf, G. Ingénierie et Évaluation des Compétences, 5th ed.; Groupe Eyrolles: Paris, France, 2006.
  • Benner, P. De Iniciado a Perito-Excelência e Poder na Prática Clínica de Enfermagem; Edição Comemorativa; Quarteto: Coimbra, Portugal, 2005.
  • Meretoja, R.; Leino-Kilpi, H.; Kaira, M. Comparison of nurse competence in different hospital work environments. J. Nurs. Manag. 2004, 12, 329–336.
  • Dunn, S.; Lawson, D.; Robertson, S.; Underwood, M.; Clark, R.; Valentine, T.; Walker, N.; Wilson-Row, C.; Crowder, K.; Herewane, D. The development of competency standards for specialist critical care nurses. J. Adv. Nurs. 2000, 31, 339–346.
  • Beeckman, D.; Vanderwee, K.; Demarre, L.; Paquay, L.; Van Hecke, A.; Defloor, T. Pressure ulcer prevention: Development and psychometric validation of a knowledge assessment instrument. Int. J. Nurs. Stud. 2010, 47, 399–410.
  • Tang, Q.; Zhang, D.; Chen, J.; Liu, M.; Xiang, Y.; Luo, T.; Zhu, L. Tests on a scale for measuring the core competencies of paediatric specialist nurses: An exploratory quantitative study. Nurs. Open 2023, 10, 5098–5107.
  • Tay, C.; Yuh, A.; Lan, E.; Ong, C.; Aloweni, F.; Lopez, V. Development and validation of the incontinence associated dermatitis knowledge, attitude and practice questionnaire. J. Tissue Viability 2020, 29, 244–251.
  • Wheeler, K.; Phillips, K. The Development of Trauma and Resilience Competencies for Nursing Education. J. Am. Psychiatr. Nurses Assoc. 2021, 27, 322–333.
  • Keeney, S.; Hasson, F.; McKenna, H. Consulting the oracle: Ten lessons from using the Delphi technique in nursing research. J. Adv. Nurs. 2006, 53, 205–212.
  • Barrios, M.; Guilera, G.; Nuño, L.; Gómez-Benito, J. Consensus in the delphi method: What makes a decision change? Technol. Forecast. Soc. Chang. 2021, 163, 120484.
  • Foth, T.; Efstathiou, N.; Vanderspank-Wright, B.; Ufholz, L.; Dütthorn, N.; Zimansky, M.; Humphrey-Murto, S. The use of Delphi and Nominal Group Technique in nursing education: A review. Int. J. Nurs. Stud. 2016, 60, 112–120.
  • Avella, J. Delphi Panels: Research Design, Procedures, Advantages, and Challenges. Int. J. Dr. Stud. 2016, 11, 305–321.
  • James, D.; Warren-Forward, H. Research methods for formal consensus development. Nurse Res. 2015, 22, 35–40.
  • Hasson, F.; Keeney, S.; McKenna, H. Research guidelines for the Delphi survey technique. J. Adv. Nurs. 2000, 32, 1008–1015.
  • Fish, L.; Busby, D. The delphi technique. In Research Methods in Family Therapy, 2nd ed.; Sprenkle, D., Piercy, F., Eds.; Guilford: New York, NY, USA, 2005.
  • Linstone, H.; Turoff, M. The Delphi Method: Techniques and Applications; Addison-Wesley Publishing Company, Advanced Book Program: New York, NY, USA, 2002.
  • Beiderbeck, D.; Frevel, N.; von der Gracht, H.A.; Schmidt, S.L.; Schweitzer, V.M. Preparing, conducting, and analyzing Delphi surveys: Cross-disciplinary practices, new directions, and advancements. MethodsX 2021, 8, 101401.
  • Fink-Hafner, D.; Dagen, T.; Doušak, M.; Novak, M.; Hafner-Fink, M. Delphi Method: Strengths and Weaknesses. Adv. Methodol. Stat. 2019, 16, 1–19.
  • Grisham, T. The Delphi technique: A method for testing complex and multifaceted topics. Int. J. Manag. Proj. Bus. 2009, 2, 112–130.
  • Dalkey, N.; Helmer, O. An Experimental Application of the Delphi Method to the Use of Experts. Manag. Sci. 1963, 9, 458–467.
  • Hsu, C.; Sandford, B. The Delphi Technique: Making sense of consensus. Pract. Assess. Res. Eval. 2007, 12, 1–8.
  • Dalkey, N. An experimental study of group opinion: The Delphi method. Futures 1969, 1, 408–426.
  • Dalkey, N. Delphi; RAND Corporation: Santa Monica, CA, USA, 1967.
  • Adams, S. Projecting the next decade in safety management: A Delphi technique study. Prof. Saf. 2001, 46, 26–29.
  • Sossa, J.; William, H.; Hernandez-Zarta, R. Delphi method: Analysis of rounds, stakeholder and statistical indicators. Foresight 2019, 21, 525–544.
  • Donohoe, H.; Stellefson, M.; Tennant, B. Advantages and Limitations of the e-Delphi Technique. Am. J. Health Educ. 2012, 43, 38–46.
  • Keeney, S.; Hasson, F.; McKenna, H. The Delphi Technique in Nursing and Health Research; John Wiley & Sons Ltd.: London, UK, 2011.
  • Meijering, J.; Tobi, H. The effects of feeding back experts’ own initial ratings in Delphi studies: A randomized trial. Int. J. Forecast. 2018, 34, 216–224.
  • Keeney, S.; Hasson, F.; McKenna, H. A critical review of the Delphi technique as a research methodology for nursing. Int. J. Nurs. Stud. 2001, 38, 195–200.
  • McKenna, H. The Delphi technique: A worthwhile research approach for nursing? J. Adv. Nurs. 1994, 19, 1221–1225.
  • Thangaratinam, S.; Redman, C. The Delphi technique. Obstet. Gynaecol. 2005, 7, 120–125.
  • Meyrick, J. The Delphi method and health research. Health Educ. 2003, 103, 7–16.
  • Hasson, F.; Keeney, S. Enhancing rigour in the Delphi technique research. Technol. Forecast. Soc. Chang. 2011, 78, 1695–1704.
  • Custer, R.; Scarcella, J.; Stewart, B. The Modified Delphi Technique-A Rotational Modification. J. Edu. Voc. Stud. 1999, 15.
  • Mauksch, S.; von der Gracht, H.; Gordon, T. Who is an expert for foresight? A review of identification methods. Technol. Forecast. Soc. Chang. 2020, 154, 119982.
  • Humphrey-Murto, S.; Varpio, L.; Wood, T.; Gonsalves, C.; Ufholz, L.; Mascioli, K.; Wang, C.; Foth, T. The Use of the Delphi and Other Consensus Group Methods in Medical Education Research: A Review. Acad. Med. 2017, 92, 1491–1498.
  • Förster, B.; von der Gracht, H. Assessing Delphi panel composition for strategic foresight—A comparison of panels based on company-internal and external participants. Technol. Forecast. Soc. Chang. 2014, 84, 215–229.
  • Boulkedid, R.; Abdoul, H.; Loustau, M.; Sibony, O.; Alberti, C. Using and reporting the Delphi method for selecting healthcare quality indicators: A systematic review. PLoS ONE 2011, 6, e20476.
  • Giannarou, L.; Zervas, E. Using Delphi technique to build consensus in practice. Int. J. Appl. Manag. Sci. 2014, 9, 66–82.
  • Lau, P.; Ryan, S.; Abbott, P.; Tannous, K.; Trankle, S.; Peters, K.; Page, A.; Cochrane, N.; Usherwood, T.; Reath, J. Protocol for a Delphi consensus study to select indicators of high-quality general practice to achieve Quality Equity and Systems Transformation in Primary Health Care (QUEST-PHC) in Australia. PLoS ONE 2022, 17, e0268096.
  • Naisola-Ruiter, V. The Delphi technique: A tutorial. Hosp. Res. J. 2022, 12, 91–97.
  • Massaroli, A.; Martini, J.; Lino, M.; Spenassato, D.; Massaroli, R. Método Delphi como Referencial Metodológico para a Pesquisa em Enfermagem. Texto Contexto Enferm. 2017, 26, e1110017.
  • Winkler, J.; Moser, R. Biases in future-oriented Delphi studies: A cognitive perspective. Technol. Forecast. Soc. Chang. 2016, 105, 63–76.
  • Marques, J.; Freitas, D. Método DELPHI: Caracterização e potencialidades na pesquisa em Educação. Pro-Posições 2018, 29, 389–415.
  • Birko, S.; Dove, E.; Özdemir, V. Evaluation of Nine Consensus Indices in Delphi Foresight Research and Their Dependency on Delphi Survey Characteristics: A Simulation Study and Debate on Delphi Design and Interpretation. PLoS ONE 2015, 10, e0135162.
  • Meijering, J.; Kampen, J.; Tobi, H. Quantifying the development of agreement among experts in Delphi studies. Technol. Forecast. Soc. Chang. 2013, 80, 1607–1614.
  • von der Gracht, H. Consensus measurement in Delphi studies: Review and implications for future quality assurance. Technol. Forecast. Soc. Chang. 2012, 79, 1525–1536.
  • Diamond, I.; Grant, R.; Feldman, B.; Pencharz, P.; Ling, S.; Moore, A.; Wales, P. Defining consensus: A systematic review recommends methodologic criteria for reporting of Delphi studies. J. Clin. Epidemiol. 2014, 67, 401–409.
  • Collins, D. Pretesting survey instruments: An overview of cognitive methods. Qual. Life Res. 2003, 12, 229–238.
  • Aromataris, E.; Munn, Z. (Eds.) JBI Manual for Evidence Synthesis; JBI: Adelaide, Australia, 2020; Available online: www.synthesismanual.jbi.global (accessed on 12 March 2023).
  • Munn, Z.; Peters, M.; Stern, C.; Tufanaru, C.; McArthur, A.; Aromataris, E. Systematic review or scoping review? Guidance for authors when choosing between a systematic or scoping review approach. BMC Med. Res. Methodol. 2018, 18, 143.
  • Peters, M.; Godfrey, C.; McInerney, P.; Munn, Z.; Tricco, C.; Khalil, H. Chapter 11: Scoping Reviews (2020 version). In JBI Manual for Evidence Synthesis; Aromataris, E., Munn, Z., Eds.; JBI: Adelaide, Australia, 2020; Available online: www.synthesismanual.jbi.global (accessed on 12 March 2023).
  • Page, M.; McKenzie, J.; Bossuyt, P.; Boutron, I.; Hoffmann, T.; Mulrow, C.; Shamseer, L.; Tetzlaff, J.; Akl, E.; Brennan, S.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71.
  • Tricco, A.; Lillie, E.; Zarin, W.; O’Brien, K.; Colquhoun, H.; Levac, D.; Moher, D.; Peters, M.; Horsley, T.; Weeks, L.; et al. PRISMA Extension for Scoping Reviews (PRISMA-ScR): Checklist and Explanation. Ann. Intern. Med. 2018, 169, 467–473.
  • Furtado, L. Advancing the Delphi Technique: A Critical Review of Literature on Nursing Competence Studies. Available online: https://archive.org/details/osf-registrations-kp2vw-v1 (accessed on 9 December 2023).
  • Peters, M.; Marnie, C.; Colquhoun, H.; Garritty, C.; Hempel, S.; Horsley, T.; Langlois, E.; Lillie, E.; O’Brien, K.; Tunçalp, Ӧ.; et al. Scoping reviews: Reinforcing and advancing the methodology and application. Syst. Rev. 2021, 10, 263.
  • Tracy, M.; O’Grady, E. Hamric and Hanson’s Advanced Practice Nursing: An Integrative Approach; Elsevier: Amsterdam, The Netherlands, 2019.
  • Levac, D.; Colquhoun, H.; O’Brien, K. Scoping studies: Advancing the methodology. Implement. Sci. 2010, 5, 69.
  • Beauvais, A.; Phillips, K. Incorporating Future of Nursing Competencies Into a Clinical and Simulation Assessment Tool: Validating the Clinical Simulation Competency Assessment Tool. Nurs. Educ. Perspect. 2020, 41, 280–284.
  • He, H.; Zhou, T.; Zeng, D.; Ma, Y. Development of the competency assessment scale for clinical nursing teachers: Results of a Delphi study and validation. Nurse Educ. Today 2021, 101, 104876.
  • Janssens, I.; Van Hauwe, M.; Ceulemans, M.; Allegaert, K. Development and Pilot Use of a Questionnaire to Assess the Knowledge of Midwives and Pediatric Nurses on Maternal Use of Analgesics during Lactation. Int. J. Environ. Res. Public Health 2021, 18, 11555.
  • Zhang, J.; Zhou, X.; Wang, H.; Luo, Y.; Li, W. Development and Validation of the Humanistic Practice Ability of Nursing Scale. Asian Nurs. Res. (Korean Soc. Nurs. Sci.) 2021, 15, 105–112.
  • Penataro-Pintado, E.; Rodriguez-Higueras, E.; Llaurado-Serra, M.; Gomez-Delgado, N.; Llorens-Ortega, R.; Diaz-Agea, J. Development and Validation of a Questionnaire of the Perioperative Nursing Competencies in Patient Safety. Int. J. Environ. Res. Public Health 2022, 19, 2584.
  • Wang, S.; Tong, J.; Wang, Y.; Zhang, D. A Study on Nurse Manager Competency Model of Tertiary General Hospitals in China. Int. J. Environ. Res. Public Health 2022, 19, 8513.
  • Bing-Jonsson, P.; Bjork, I.; Hofoss, D.; Kirkevold, M.; Foss, C. Competence in advanced older people nursing: Development of ‘Nursing older people-Competence evaluation tool’. Int. J. Older People Nurs. 2015, 10, 59–72.
  • Fan, L.; Gui, L.; Xi, S.; Qiao, A. Core competence evaluation standards for emergency nurse specialist: Developing and testing psychometric properties. Int. J. Nurs. Sci. 2016, 3, 274–280.
  • Zheng, Y.; Shi, X.; Jiang, S.; Li, Z.; Zhang, X. Evaluation of core competencies of nurses by novel holistic assessment system. Biomed. Res. J. 2017, 28, 3259–3265.
  • Chen, H.; Pu, L.; Chen, Q.; Xu, X.; Bai, C.; Hu, X. Instrument Development for Evaluation of Gerontological Nurse Specialists Core Competencies in China. Clin. Nurse Spec. 2019, 33, 217–227.
  • Holanda, F.; Marra, C.; Cunha, I. Professional competence of nurses in emergency services: Evidence of content validity. Rev. Bras. Enferm. 2019, 72, 66–73.
  • Licen, S.; Plazar, N. Developing a Universal Nursing Competencies Framework for Registered Nurses: A Mixed-Methods Approach. J. Nurs. Scholarsh. 2019, 51, 459–469.
  • Mei, N.; Chang, L.; Zhu, Z.; Dong, M.; Zhang, M.; Zeng, L. Core competency scale for operating room nurses in China: Scale development, reliability and validity evaluation. Nurs. Open 2022, 9, 2814–2825.
  • Lakanmaa, R.; Suominen, T.; Perttilä, J.; Puukka, P.; Leino-Kilpi, H. Competence requirements in intensive and critical care nursing–Still in need of definition? A Delphi study. Intensive Crit. Care Nurs. 2012, 28, 329–336.
  • Liu, L.; Curtis, J.; Crookes, P. Identifying essential infection control competencies for newly graduated nurses: A three-phase study in Australia and Taiwan. J. Hosp. Infect. 2014, 86, 100–109.
  • Van Hecke, A.; Goeman, C.; Beeckman, D.; Heinen, M.; Defloor, T. Development and psychometric evaluation of an instrument to assess venous leg ulcer lifestyle knowledge among nurses. J. Adv. Nurs. 2011, 67, 2574–2585.
  • Hoyt, K.; Coyne, E.; Ramirez, E.; Peard, A.; Gisness, C.; Gacki-Smith, J. Nurse Practitioner Delphi Study: Competencies for practice in emergency care. J. Emerg. Nurs. 2010, 36, 439–449.
  • Chang, A.; Gardner, G.; Duffield, C.; Ramis, M. A Delphi study to validate an Advanced Practice Nursing tool. J. Adv. Nurs. 2010, 66, 2320–2330.
  • Jirwe, M.; Gerrish, K.; Keeney, S.; Emami, A. Identifying the core components of cultural competence: Findings from a Delphi study. J. Clin. Nurs. 2009, 18, 2622–2634.
  • Irvine, F. Exploring district nursing competencies in health promotion: The use of the Delphi technique. J. Clin. Nurs. 2005, 14, 965–975.
  • Hardy, D.; O’Brien, A.; Gaskin, C.; O’Brien, A.; Morrison-Ngatai, E.; Skews, G.; Ryan, T.; McNulty, N. Practical application of the Delphi technique in a bicultural mental health nursing study in New Zealand. J. Adv. Nurs. 2004, 46, 95–109.
  • European Parliament and of the Council. General Data Protection Regulation-Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016. Available online: https://gdpr-info.eu (accessed on 20 February 2024).
  • Cornock, M. General Data Protection Regulation (GDPR) and implications for research. Maturitas 2018, 111, A1–A2.
  • Centers for Disease Control and Prevention. Health Insurance Portability and Accountability Act of 1996. Available online: https://www.cdc.gov/phlp/php/resources/health-insurance-portability-and-accountability-act-of-1996-hipaa.html (accessed on 20 February 2024).
  • Niederberger, M.; Spranger, J. Delphi Technique in Health Sciences: A Map. Front. Public Health 2020, 8, 457.
  • Fitch, K.; Bernstein, S.; Aguilar, M.; Burnand, B.; LaCalle, J.; Lazaro, P.; van het Loo, M.; McDonnell, J.; Vader, J.; Kahan, J. The RAND/UCLA Appropriateness Method User’s Manual; RAND Corporation: Santa Monica, CA, USA, 2001.
  • Greatorex, J.; Dexter, T. An accessible analytical approach for investigating what happens between the rounds of a Delphi study. J. Adv. Nurs. 2000, 32, 1016–1024.
  • Casali, P.; Vyas, M. Data protection and research in the European Union: A major step forward, with a step back. Ann. Oncol. 2021, 32, 15–19.
  • Farah, M.; Helou, S.; Tufenkji, P.; El Helou, E. Data Protection in Healthcare Research: Medical Students’ Knowledge and Behavior. Stud. Health Technol. Inform. 2022, 295, 104–107.
  • Gattrell, W.; Logullo, P.; van Zuuren, E.; Price, A.; Hughes, E.; Blazey, P.; Winchester, C.; Tovey, D.; Goldman, K.; Hungin, A.; et al. ACCORD (ACcurate COnsensus Reporting Document): A reporting guideline for consensus methods in biomedicine developed via a modified Delphi. PLoS Med. 2024, 21, e1004326.


Search No. | Search Terms and Expressions | Results
S1 | MM “Delphi Technique” OR TI “delphi” OR AB “delphi” OR TI “delphi technique” OR AB “delphi technique” OR TI “delphi survey” OR AB “delphi survey” OR TI “delphi consensus” OR AB “delphi consensus” OR TI “delphi study” OR AB “delphi study” TI “delphi method” OR AB “delphi method” OR TI “expert consensus method” OR AB “expert consensus method” OR TI “modified nominal group technique” OR AB “modified nominal group technique” OR TI “forecasting method” OR AB “forecasting method” OR TI “decision-making method” OR AB “decision-making” | 185,322
S2 | TI “assessment scale” OR AB “assessment scale” OR TI “evaluation scale” OR AB “evaluation scale” OR TI “assessment instrument development” OR AB “assessment instrument development” OR TI “evaluation tool” OR AB “evaluation tool” OR TI “scale development” OR AB “scale development” OR TI “factor analysis” OR AB “factor analysis” OR TI “instrument design” OR AB “instrument design” OR TI “instrument development” OR AB “instrument development” OR TI “instrument validation” OR AB “instrument validation” OR TI “item analysis” OR AB “item analysis” OR TI “psychometric instrument development” OR AB “psychometric instrument development” OR TI “psychometric testing” OR AB “psychometric testing” OR TI “questionnaire development” OR AB “questionnaire development” OR TI “reliability testing” OR AB “reliability testing” OR TI “survey development” OR AB “survey development” OR TI “validation studies” OR AB “validation studies” | 85,186
S3 | MM “Professional Competence” OR TI “professional competence” OR AB “professional competence” OR TI “competenc*” OR AB “competenc*” OR TI “knowledge” OR AB “knowledge” OR TI “proficiency” OR AB “proficiency” OR TI “expertise” OR AB “expertise” OR TI “capability” OR AB “capability” OR TI “ability” OR AB “ability” OR TI “skill*” OR AB “skill*” | 2,309,012
S4 | MM “Nursing” OR TI “nurs*” OR AB “nurs*” OR TI “nursing practice” OR AB “nursing practice” OR TI “nursing research” OR AB “nursing research” OR TI “nursing education” OR AB “nursing education” OR TI “nursing management” OR AB “nursing management” OR TI “nursing care” OR AB “nursing care” OR TI “nursing interventions” OR AB “nursing interventions” | 526,118
S5 | S1 AND S2 AND S3 AND S4 | 136
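The search rows S1 to S5 follow a common pattern: within each concept, every term is searched in both the title (TI) and abstract (AB) fields and joined with OR, optionally alongside an MM subject-heading clause, and the final search intersects the concepts with AND. The Python sketch below illustrates that construction under stated assumptions: the field codes are taken from the table as written, the term lists are shortened, and the function names are hypothetical, not the authors' actual search procedure.

```python
def field_clause(term, fields=("TI", "AB")):
    """Expand one term into a title-OR-abstract clause, e.g. TI "x" OR AB "x"."""
    return " OR ".join(f'{field} "{term}"' for field in fields)


def concept_block(terms, subject_heading=None):
    """OR together every expansion of one concept's terms.

    If subject_heading is given, an MM clause is added first, mirroring the
    MM entries in the table (MM is assumed to be the database's major
    subject heading field code).
    """
    clauses = [f'MM "{subject_heading}"'] if subject_heading else []
    clauses += [field_clause(term) for term in terms]
    return " OR ".join(clauses)


def intersect(*blocks):
    """AND the concept blocks together; parentheses keep each OR group intact."""
    return " AND ".join(f"({block})" for block in blocks)


if __name__ == "__main__":
    # Shortened, illustrative term lists (the full lists are in rows S1-S4).
    s1 = concept_block(["delphi", "delphi technique"], "Delphi Technique")
    s2 = concept_block(["scale development", "instrument validation"])
    s3 = concept_block(["competenc*", "skill*"], "Professional Competence")
    s4 = concept_block(["nurs*"], "Nursing")
    print(intersect(s1, s2, s3, s4))
```

Spelling the blocks out with explicit parentheses should be equivalent to combining previously run search sets by number, which is how row S5 expresses the final search.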
Table: Steps and procedures of the Delphi technique in the included studies (columns: Steps and Procedures; Methodological Options; Study). Steps: preparatory procedures; expert access procedures; call for expert participation procedures; expert selection procedures; instrumentation; data analysis.

Cite as

Furtado, L.; Coelho, F.; Pina, S.; Ganito, C.; Araújo, B.; Ferrito, C. Delphi Technique on Nursing Competence Studies: A Scoping Review. Healthcare 2024, 12, 1757. https://doi.org/10.3390/healthcare12171757


COMMENTS

  1. Reliability vs. Validity in Research

    Validity refers to how accurately a method measures what it is intended to measure. If research has high validity, that means it produces results that correspond to real properties, characteristics, and variations in the physical or social world. High reliability is one indicator that a measurement is valid.

  2. The 4 Types of Validity in Research

    Construct validity. Construct validity evaluates whether a measurement tool really represents the thing we are interested in measuring. It's central to establishing the overall validity of a method. What is a construct? A construct refers to a concept or characteristic that can't be directly observed, but can be measured by observing other indicators that are associated with it.

  3. Validity

    Examples of Validity. Internal Validity: A randomized controlled trial (RCT) in which the random assignment of participants helps minimize selection bias and confounding. External Validity: A study on educational interventions whose findings can be applied to different schools across various regions. Construct Validity: A psychological test that accurately measures depression levels.

  4. Reliability and Validity

    Reliability refers to the consistency of the measurement: it shows how trustworthy the test score is. If the measurement produces the same results when it is repeated under comparable conditions, the information is reliable. Reliability is necessary but not sufficient for validity: a reliable method can still consistently measure the wrong thing. Example: If you weigh yourself on a ...

  5. What is Validity in Research?

    Validity is an important concept in establishing qualitative research rigor. At its core, validity in research speaks to the degree to which a study accurately reflects or assesses the specific concept that the researcher is attempting to measure or understand. It's about ensuring that the study investigates what it purports to investigate.

  6. Validity, reliability, and generalizability in qualitative research

    Validity. Validity in qualitative research means the "appropriateness" of the tools, processes, and data: whether the research question is valid for the desired outcome, whether the choice of methodology is appropriate for answering the research question, whether the design is valid for the methodology, whether the sampling and data analysis are appropriate, and ...

  7. Validity in Research: A Guide to Better Results

    Validity in research is the ability to conduct an accurate study with the right tools and conditions to yield acceptable and reliable data that can be reproduced. Researchers rely on carefully calibrated tools for precise measurements. However, collecting accurate information can be more of a challenge. Studies must be conducted in environments ...

  9. Validity in Research and Psychology: Types & Examples

    In this vein, there are many different types of validity and ways of thinking about it. Let's take a look at several of the more common types. Each kind is a line of evidence that can help support or refute a test's overall validity. In this post, learn about face, content, criterion, discriminant, concurrent, predictive, and construct ...

  10. Validity & Reliability In Research

    As with validity, reliability is an attribute of a measurement instrument, for example a survey, a weight scale, or even a blood pressure monitor. But while validity is concerned with whether the instrument is measuring the "thing" it's supposed to be measuring, reliability is concerned with consistency and stability.

  11. Validity In Psychology Research: Types & Examples

    In psychology research, validity refers to the extent to which a test or measurement tool accurately measures what it's intended to measure. It ensures that the research findings are genuine and not due to extraneous factors. Validity can be categorized into different types, including construct validity (measuring the intended abstract trait), internal validity (ensuring causal conclusions ...

  12. The 4 Types of Validity

    Face validity. Face validity considers how suitable the content of a test seems to be on the surface. It's similar to content validity, but face validity is a more informal and subjective assessment. Example: Face validity. You create a survey to measure the regularity of people's dietary habits. You review the survey items, which ask ...

  13. Reliability and validity: Importance in Medical Research

    Reliability and validity are among the most important and fundamental domains in the assessment of any data-collection methodology in sound research. Validity is about what an instrument measures and how well it does so, whereas reliability concerns the consistency of the data obtained and the degree to which any measuring tool ...

  14. Reliability vs Validity: Differences & Examples

    Typically, researchers need to collect data using an instrument and evaluate the quality of the measurements. In other words, they assess the instrument's reliability and validity before the primary research. For data to be good enough to allow you to draw meaningful conclusions from a research study, they must be reliable and valid.

  15. Validity vs. Reliability

    For a study to be robust, it must achieve both reliability and validity. Reliability ensures the study's findings are reproducible while validity confirms that it accurately represents the phenomena it claims to. Ensuring both in a study means the results are both dependable and accurate, forming a cornerstone for high-quality research.

  16. Validity and reliability in quantitative studies

    Validity. Validity is defined as the extent to which a concept is accurately measured in a quantitative study. For example, a survey designed to explore depression but which actually measures anxiety would not be considered valid. The second measure of quality in a quantitative study is reliability, or the consistency of an instrument. In other words, the extent to which a research instrument ...

  17. Validity

    Research validity in surveys relates to the extent to which the survey measures the elements that need to be measured. In simple terms, validity refers to how well an instrument measures what it is intended to measure. Reliability alone is not enough; measures need to be reliable as well as valid. For example, if a weight measuring scale ...

  18. Validity in Qualitative Evaluation: Linking Purposes, Paradigms, and

    Whether it concerns member checks, keeping an audit trail, or thick description of the data, respecting validity criteria for qualitative research is easier said than done (causing some researchers to present a "procedural charade" in their reports, see Whittemore, Chase, & Mandle, 2001). In the realm of policy and program evaluation, in ...

  19. Internal and external validity: can you apply research study results to

    The validity of a research study includes two domains: internal and external validity. Internal validity is defined as the extent to which the observed results represent the truth in the population we are studying and, thus, are not due to methodological errors. In our example, if the authors can support that the study has internal validity ...

  20. Why is data validation important in research?

    Importance of data validation. Data validation is important for several aspects of a well-conducted study: To ensure a robust dataset: The primary aim of data validation is to ensure an error-free dataset for further analysis. This is especially important if you or other researchers plan to use the dataset for future studies or to train machine ...

  21. Research Data Management: Validate Data

    Data validation is important for ensuring regular monitoring of your data and assuring all stakeholders that your data is of a high quality that reliably meets research integrity standards — and also a crucial aspect of Yale's Research Data and Materials Policy, which states "The University deems appropriate stewardship of research data as fundamental to both high-quality research and ...

  22. (PDF) Validity and Reliability in Quantitative Research

    The validity and reliability of the scales used in research are important factors that enable the research to yield sound results. For this reason, it is useful to understand how the reliability ...

  23. Understanding Survey Validity and Reliability

    A valid survey accurately reflects the reality it aims to capture, providing trustworthy data that can successfully inform decisions. Without validity, a study's results can be misleading, which results in incorrect conclusions and potentially costly mistakes. Survey validity is crucial because it ensures the accuracy of the data collected.

  24. 6 Strategies to Enhance Validity in Qualitative Research

    In quantitative research, validity and reliability are quite straightforward terms. So reliability refers to replicability and consistency of certain measurements and validity to whether this measurement is measuring what it's supposed to measure. ... and its possible influence on the data, on what the participants say, and so on and so forth ...

  25. Invited commentary: is the polysocial score approach valuable for

    Subsequently, a growing body of research has investigated the predictive validity of the polysocial score for a range of health outcomes across diverse study populations. Using data on over 160 000 adults from the 2013-2017 National Health Interview Survey, Javed et al 2 selected 7 SDOH from 38 factors to construct a polysocial score for ...

  26. On the validity of measuring change over time in routine clinical

    On the validity of measuring change over time in routine clinical assessment: a close examination of item-level response shifts in psychosomatic inpatients ... Study design and setting: Complete pretest and posttest data were available from 1188 patients who had filled out the ICD-10 Symptom Rating (ISR) scale at admission and discharge, on ...

  27. Comprehensive Guide to Quantitative Research Methods in Education

    Conclusion validity is the degree to which conclusions drawn about relationships in the data are reasonable. External validity concerns the process of generalizing, or the degree to which the conclusions in your study would hold for other persons in other places and at other times.

  28. Reliability and validity assessment of instrument to measure

    Sustainability has emerged as one of the most critical factors influencing the competitiveness of maritime shipping ports. This emergence has led to a surge in research publications on port sustainability-related topics. However, despite the increasing awareness and adoption of sustainability practices, documented literature on empirical studies with survey and interview data is very limited ...

  29. Delphi Technique on Nursing Competence Studies: A Scoping Review

    This scoping review was conducted under the Joanna Briggs Institute (JBI) framework. It included primary studies published until 30 April 2023, obtained through a systematic search across PubMed, Web of Science, CINAHL, and MEDLINE databases. The review focused on primary studies that used the Delphi technique in nursing competence research, especially those related to defining core competency ...

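Item 3 attributes the internal validity of a randomized controlled trial to the random assignment of participants. The Python sketch below illustrates random allocation to two arms of equal size, assuming a simple two-arm design. The arm names and participant IDs are hypothetical, and the shuffle-then-deal approach is one illustrative choice, not a prescribed trial protocol.

```python
import random


def randomly_allocate(participants, arms=("intervention", "control"), seed=None):
    """Randomly allocate participants to arms of (near-)equal size.

    The participant list is shuffled, then dealt round-robin across the arms,
    so arm sizes differ by at most one. A fixed seed makes the allocation
    reproducible, which helps when the allocation needs to be audited.
    """
    rng = random.Random(seed)
    order = list(participants)
    rng.shuffle(order)
    allocation = {arm: [] for arm in arms}
    for position, person in enumerate(order):
        allocation[arms[position % len(arms)]].append(person)
    return allocation


if __name__ == "__main__":
    # Hypothetical participant IDs P01..P10.
    ids = [f"P{n:02d}" for n in range(1, 11)]
    for arm, members in randomly_allocate(ids, seed=2024).items():
        print(arm, members)
```

Because chance alone decides each participant's arm, known and unknown characteristics should, on average, be balanced between the groups. This is the property the RCT example relies on.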
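Item 4 defines reliability as the consistency of a measurement. A common way to quantify test-retest reliability is to correlate scores from two administrations of the same instrument to the same people. The sketch below uses Pearson's r on invented questionnaire scores. Many researchers prefer an intraclass correlation coefficient for test-retest analyses, so treat r here as a simple illustration, not a recommended standard.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length score lists with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations and the two sums of squares.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores with zero variance have no defined correlation")
    return cov / sqrt(ss_x * ss_y)


if __name__ == "__main__":
    # Hypothetical total scores of six respondents on the same questionnaire,
    # administered twice, two weeks apart.
    first = [22, 35, 28, 41, 30, 25]
    second = [24, 34, 27, 40, 31, 26]
    print(f"test-retest r = {pearson_r(first, second):.2f}")
```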
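Item 17 notes that reliability alone is not enough. The sketch below illustrates this with two made-up thermometers in an ice-water bath, assumed to be exactly 0 °C. One gives tightly clustered readings that are offset from the true value: they are consistent but inaccurate. The other is centred on the true value but widely scattered. Spread, measured as the sample standard deviation, stands in for consistency. Bias, measured as the mean reading minus the true value, stands in for accuracy. This is only an analogy for instrument reliability and validity, and all readings are invented.

```python
from statistics import mean, stdev

TRUE_TEMP_C = 0.0  # assumed true temperature of an ice-water bath


def describe(readings, true_value=TRUE_TEMP_C):
    """Return (spread, bias) for repeated readings of a known quantity.

    spread: sample standard deviation, a rough index of consistency.
    bias:   mean reading minus the true value, a rough index of accuracy.
    """
    return stdev(readings), mean(readings) - true_value


# Invented readings from two hypothetical thermometers.
thermometer_a = [2.0, 2.1, 1.9, 2.0, 2.0]     # tightly clustered, about 2 degrees high
thermometer_b = [-2.0, 2.5, -1.0, 1.5, -1.0]  # centred on 0 but widely scattered

if __name__ == "__main__":
    for name, readings in [("A", thermometer_a), ("B", thermometer_b)]:
        spread, bias = describe(readings)
        print(f"thermometer {name}: spread = {spread:.2f} °C, bias = {bias:+.2f} °C")
```

Thermometer A would look highly reliable on a repeat test, yet every reading is wrong by about the same amount. That is the situation the item warns about.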
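Item 20 describes data validation as checking a dataset for errors before analysis. The Python sketch below implements three common rule types: required fields, allowed value ranges, and duplicate identifiers. It assumes survey rows are stored as dictionaries. The field names `id`, `age`, and `q1`, and their ranges, are hypothetical examples, not part of any cited standard.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    row: int      # zero-based index of the offending row
    field: str
    problem: str


def validate_rows(rows, required, ranges, id_field="id"):
    """Check rows for missing required fields, out-of-range values, and
    duplicate identifiers. Returns a list of Issue records in row order."""
    issues = []
    seen_ids = set()
    for i, row in enumerate(rows):
        for field in required:
            if row.get(field) is None:
                issues.append(Issue(i, field, "missing"))
        for field, (low, high) in ranges.items():
            value = row.get(field)
            if value is not None and not low <= value <= high:
                issues.append(Issue(i, field, f"outside {low}-{high}"))
        row_id = row.get(id_field)
        if row_id is not None:
            if row_id in seen_ids:
                issues.append(Issue(i, id_field, "duplicate"))
            seen_ids.add(row_id)
    return issues


if __name__ == "__main__":
    # Hypothetical responses; q1 is assumed to be a 1-5 Likert item.
    rows = [
        {"id": 1, "age": 34, "q1": 4},
        {"id": 2, "age": None, "q1": 7},
        {"id": 1, "age": 29, "q1": 3},
    ]
    for issue in validate_rows(rows, required=("id", "age", "q1"),
                               ranges={"age": (18, 120), "q1": (1, 5)}):
        print(issue)
```

Reporting issues instead of silently fixing them keeps the decision about each error with the researcher, and the report can be kept as part of an audit trail.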