The Classroom | Empowering Students in Their College Journey

The Relationship Between Scientific Method & Critical Thinking

Scott Neuffer

What Is the Function of the Hypothesis?

Critical thinking, the mind’s ability to analyze claims about the world, is the intellectual basis of the scientific method. The scientific method can be viewed as an extensive, structured mode of critical thinking that involves hypothesis, experimentation and conclusion.

Critical Thinking

Broadly speaking, critical thinking is any analytical thought aimed at determining the validity of a specific claim. It can be as simple as a nine-year-old questioning a parent’s claim that Santa Claus exists, or as complex as physicists questioning the relativity of space and time. Critical thinking is the point when the mind turns in opposition to an accepted truth and begins analyzing its underlying premises. As American philosopher John Dewey said, it is the “active, persistent and careful consideration of a belief or supposed form of knowledge in light of the grounds that support it, and the further conclusions to which it tends.”

Critical thinking initiates the act of hypothesis. In the scientific method, the hypothesis is the initial supposition, or theoretical claim about the world, based on questions and observations. If critical thinking asks the question, then the hypothesis is the best attempt at the time to answer the question using observable phenomena. For example, an astrophysicist may question existing theories of black holes based on his own observations. He may posit a contrary hypothesis, arguing that black holes actually produce white light. It is not a final conclusion, however, as the scientific method requires specific forms of verification.

Experimentation

The scientific method uses formal experimentation to analyze any hypothesis. The rigorous and specific methodology of experimentation is designed to gather unbiased empirical evidence that either supports or contradicts a given claim. Controlled variables are used to provide an objective basis of comparison. For example, researchers studying the effects of a certain drug may provide half the test population with a placebo pill and the other half with the real drug. The effects of the real drug can then be assessed relative to the control group.
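The placebo-versus-drug comparison described above can be sketched in a few lines of Python. The group size and recovery rates below are invented purely for illustration, not real trial data; the point is only that the control group supplies the baseline against which the drug's effect is estimated.

```python
import random

random.seed(0)  # fixed seed so the toy simulation is repeatable

def simulate_trial(n_per_group=100, placebo_rate=0.30, drug_rate=0.55):
    """Simulate recovery outcomes (1 = recovered) for two groups.

    The recovery rates are made-up illustration values, not real data.
    """
    placebo = [1 if random.random() < placebo_rate else 0 for _ in range(n_per_group)]
    drug = [1 if random.random() < drug_rate else 0 for _ in range(n_per_group)]
    return placebo, drug

placebo, drug = simulate_trial()

# The control (placebo) group provides the objective basis of comparison.
placebo_recovery = sum(placebo) / len(placebo)
drug_recovery = sum(drug) / len(drug)

print(f"placebo recovery rate: {placebo_recovery:.2f}")
print(f"drug recovery rate:    {drug_recovery:.2f}")
print(f"estimated effect:      {drug_recovery - placebo_recovery:+.2f}")
```

In a real trial the observed difference would then be subjected to a statistical test before any conclusion is drawn; the sketch stops at the raw comparison.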

In the scientific method, conclusions are drawn only after tested, verifiable evidence supports them. Even then, conclusions are subject to peer review and often retested before general consensus is reached. Thus, what begins as an act of critical thinking becomes, in the scientific method, a complex process of testing the validity of a claim. English philosopher Francis Bacon put it this way: “If a man will begin with certainties, he shall end in doubts; but if he will be content to begin with doubts, he shall end in certainties.”

  • John Dewey, How We Think
  • Francis Bacon, The Advancement of Learning

Scott Neuffer is an award-winning journalist and writer who lives in Nevada. He holds a bachelor's degree in English and spent five years as an education and business reporter for Sierra Nevada Media Group. His first collection of short stories, "Scars of the New Order," was published in 2014.

Scientific Method

Science is an enormously successful human enterprise. The study of scientific method is the attempt to discern the activities by which that success is achieved. Among the activities often identified as characteristic of science are systematic observation and experimentation, inductive and deductive reasoning, and the formation and testing of hypotheses and theories. How these are carried out in detail can vary greatly, but characteristics like these have been looked to as a way of demarcating scientific activity from non-science, on the view that only enterprises which employ some canonical form of scientific method or methods should be considered science (see also the entry on science and pseudo-science). Others have questioned whether there is anything like a fixed toolkit of methods which is common across science and only science. Some reject privileging one view of method as part of rejecting broader views about the nature of science, such as naturalism (Dupré 2004); some reject any restriction in principle (pluralism).

Scientific method should be distinguished from the aims and products of science, such as knowledge, predictions, or control. Methods are the means by which those goals are achieved. Scientific method should also be distinguished from meta-methodology, which includes the values and justifications behind a particular characterization of scientific method (i.e., a methodology) — values such as objectivity, reproducibility, simplicity, or past successes. Methodological rules are proposed to govern method and it is a meta-methodological question whether methods obeying those rules satisfy given values. Finally, method is distinct, to some degree, from the detailed and contextual practices through which methods are implemented. The latter might range over: specific laboratory techniques; mathematical formalisms or other specialized languages used in descriptions and reasoning; technological or other material means; ways of communicating and sharing results, whether with other scientists or with the public at large; or the conventions, habits, enforced customs, and institutional controls over how and what science is carried out.

While it is important to recognize these distinctions, their boundaries are fuzzy. Hence, accounts of method cannot be entirely divorced from their methodological and meta-methodological motivations or justifications. Moreover, each aspect plays a crucial role in identifying methods. Disputes about method have therefore played out at the detail, rule, and meta-rule levels. Changes in beliefs about the certainty or fallibility of scientific knowledge, for instance (which is a meta-methodological consideration of what we can hope for methods to deliver), have meant different emphases on deductive and inductive reasoning, or on the relative importance attached to reasoning over observation (i.e., differences over particular methods). Beliefs about the role of science in society will affect the place one gives to values in scientific method.

The issue which has shaped debates over scientific method the most in the last half century is the question of how pluralist we need to be about method. Unificationists continue to hold out for one method essential to science; nihilism is a form of radical pluralism, which considers the effectiveness of any methodological prescription to be so context sensitive as to render it not explanatory on its own. Some middle degree of pluralism regarding the methods embodied in scientific practice seems appropriate. But the details of scientific practice vary with time and place, from institution to institution, across scientists and their subjects of investigation. How significant are the variations for understanding science and its success? How much can method be abstracted from practice? This entry describes some of the attempts to characterize scientific method or methods, as well as arguments for a more context-sensitive approach to methods embedded in actual scientific practices.

  • 1. Overview and organizing themes
  • 2. Historical review: Aristotle to Mill
  • 3. Logic of method and critical responses
      • 3.1 Logical constructionism and operationalism
      • 3.2 H-D as a logic of confirmation
      • 3.3 Popper and falsificationism
      • 3.4 Meta-methodology and the end of method
  • 4. Statistical methods for hypothesis testing
  • 5.1 Creative and exploratory practices
  • 5.2 Computer methods and the ‘new ways’ of doing science
  • 6.1 “The scientific method” in science education and as seen by scientists
  • 6.2 Privileged methods and ‘gold standards’
  • 6.3 Scientific method in the court room
  • 6.4 Deviating practices
  • 7. Conclusion
  • Bibliography
  • Other Internet Resources
  • Related Entries

1. Overview and organizing themes

This entry could have been given the title Scientific Methods and gone on to fill volumes, or it could have been extremely short, consisting of a brief summary rejection of the idea that there is any such thing as a unique Scientific Method at all. Both unhappy prospects are due to the fact that scientific activity varies so much across disciplines, times, places, and scientists that any account which manages to unify it all will either consist of overwhelming descriptive detail, or trivial generalizations.

The choice of scope for the present entry is more optimistic, taking a cue from the recent movement in philosophy of science toward a greater attention to practice: to what scientists actually do. This “turn to practice” can be seen as the latest form of studies of methods in science, insofar as it represents an attempt at understanding scientific activity, but through accounts that are neither meant to be universal and unified, nor singular and narrowly descriptive. To some extent, different scientists at different times and places can be said to be using the same method even though, in practice, the details are different.

Whether the context in which methods are carried out is relevant, or to what extent, will depend largely on what one takes the aims of science to be and what one’s own aims are. For most of the history of scientific methodology the assumption has been that the most important output of science is knowledge and so the aim of methodology should be to discover those methods by which scientific knowledge is generated.

Science was seen to embody the most successful form of reasoning (but which form?) to the most certain knowledge claims (but how certain?) on the basis of systematically collected evidence (but what counts as evidence, and should the evidence of the senses take precedence, or rational insight?) Section 2 surveys some of the history, pointing to two major themes. One theme is seeking the right balance between observation and reasoning (and the attendant forms of reasoning which employ them); the other is how certain scientific knowledge is or can be.

Section 3 turns to 20th-century debates on scientific method. In the second half of the 20th century the epistemic privilege of science faced several challenges and many philosophers of science abandoned the reconstruction of the logic of scientific method. Views changed significantly regarding which functions of science ought to be captured and why. For some, the success of science was better identified with social or cultural features. Historical and sociological turns in the philosophy of science were made, with a demand that greater attention be paid to the non-epistemic aspects of science, such as sociological, institutional, material, and political factors. Even outside of those movements there was an increased specialization in the philosophy of science, with more and more focus on specific fields within science. The combined upshot was very few philosophers arguing any longer for a grand unified methodology of science. Sections 3 and 4 survey the main positions on scientific method in 20th-century philosophy of science, focusing on where they differ in their preference for confirmation or falsification or for waiving the idea of a special scientific method altogether.

In recent decades, attention has primarily been paid to scientific activities traditionally falling under the rubric of method, such as experimental design and general laboratory practice, the use of statistics, the construction and use of models and diagrams, interdisciplinary collaboration, and science communication. Sections 4–6 attempt to construct a map of the current domains of the study of methods in science.

As these sections illustrate, the question of method is still central to the discourse about science. Scientific method remains a topic for education, for science policy, and for scientists. It arises in the public domain where the demarcation or status of science is at issue. Some philosophers have recently returned, therefore, to the question of what it is that makes science a unique cultural product. This entry will close with some of these recent attempts at discerning and encapsulating the activities by which scientific knowledge is achieved.

Attempting a history of scientific method compounds the vast scope of the topic. This section briefly surveys the background to modern methodological debates. What can be called the classical view goes back to antiquity, and represents a point of departure for later divergences. [1]

We begin with a point made by Laudan (1968) in his historical survey of scientific method:

Perhaps the most serious inhibition to the emergence of the history of theories of scientific method as a respectable area of study has been the tendency to conflate it with the general history of epistemology, thereby assuming that the narrative categories and classificatory pigeon-holes applied to the latter are also basic to the former. (1968: 5)

To see knowledge about the natural world as falling under knowledge more generally is an understandable conflation. Histories of theories of method would naturally employ the same narrative categories and classificatory pigeon holes. An important theme of the history of epistemology, for example, is the unification of knowledge, a theme reflected in the question of the unification of method in science. Those who have identified differences in kinds of knowledge have often likewise identified different methods for achieving that kind of knowledge (see the entry on the unity of science ).

Different views on what is known, how it is known, and what can be known are connected. Plato distinguished the realms of things into the visible and the intelligible (The Republic, 510a, in Cooper 1997). Only the latter, the Forms, could be objects of knowledge. The intelligible truths could be known with the certainty of geometry and deductive reasoning. What could be observed of the material world, however, was by definition imperfect and deceptive, not ideal. The Platonic way of knowledge therefore emphasized reasoning as a method, downplaying the importance of observation. Aristotle disagreed, locating the Forms in the natural world as the fundamental principles to be discovered through the inquiry into nature (Metaphysics Z, in Barnes 1984).

Aristotle is recognized as giving the earliest systematic treatise on the nature of scientific inquiry in the western tradition, one which embraced observation and reasoning about the natural world. In the Prior and Posterior Analytics, Aristotle reflects first on the aims and then the methods of inquiry into nature. A number of features can be found which are still considered by most to be essential to science. For Aristotle, empiricism, careful observation (but passive observation, not controlled experiment), is the starting point. The aim is not merely recording of facts, though. For Aristotle, science (epistêmê) is a body of properly arranged knowledge or learning—the empirical facts, but also their ordering and display are of crucial importance. The aims of discovery, ordering, and display of facts partly determine the methods required of successful scientific inquiry. Also determinant is the nature of the knowledge being sought, and the explanatory causes proper to that kind of knowledge (see the discussion of the four causes in the entry on Aristotle on causality).

In addition to careful observation, then, scientific method requires a logic as a system of reasoning for properly arranging, but also inferring beyond, what is known by observation. Methods of reasoning may include induction, prediction, or analogy, among others. Aristotle’s system (along with his catalogue of fallacious reasoning) was collected under the title the Organon. This title would be echoed in later works on scientific reasoning, such as the Novum Organum of Francis Bacon and the Novum Organon Renovatum of William Whewell (see below). In Aristotle’s Organon reasoning is divided primarily into two forms, a rough division which persists into modern times. The division, known most commonly today as deductive versus inductive method, appears in other eras and methodologies as analysis/synthesis, non-ampliative/ampliative, or even confirmation/verification. The basic idea is there are two “directions” to proceed in our methods of inquiry: one away from what is observed, to the more fundamental, general, and encompassing principles; the other, from the fundamental and general to instances or implications of principles.

The basic aim and method of inquiry identified here can be seen as a theme running throughout the next two millennia of reflection on the correct way to seek after knowledge: carefully observe nature and then seek rules or principles which explain or predict its operation. The Aristotelian corpus provided the framework for a commentary tradition on scientific method independent of science itself (cosmos versus physics). During the medieval period, figures such as Albertus Magnus (1206–1280), Thomas Aquinas (1225–1274), Robert Grosseteste (1175–1253), Roger Bacon (1214/1220–1292), William of Ockham (1287–1347), Andreas Vesalius (1514–1564), and Giacomo Zabarella (1533–1589) all worked to clarify the kind of knowledge obtainable by observation and induction, the source of justification of induction, and the best rules for its application. [2] Many of their contributions we now think of as essential to science (see also Laudan 1968). As Aristotle and Plato had employed a framework of reasoning either “to the forms” or “away from the forms”, medieval thinkers employed directions away from the phenomena or back to the phenomena. In analysis, a phenomenon was examined to discover its basic explanatory principles; in synthesis, explanations of a phenomenon were constructed from first principles.

During the Scientific Revolution these various strands of argument, experiment, and reason were forged into a dominant epistemic authority. The 16th–18th centuries were a period not only of dramatic advance in knowledge about the operation of the natural world—advances in mechanical, medical, biological, political, and economic explanations—but also of self-awareness of the revolutionary changes taking place, and intense reflection on the source and legitimation of the method by which the advances were made. The struggle to establish the new authority included methodological moves. The Book of Nature, according to the metaphor of Galileo Galilei (1564–1642) or Francis Bacon (1561–1626), was written in the language of mathematics, of geometry and number. This motivated an emphasis on mathematical description and mechanical explanation as important aspects of scientific method. Through figures such as Henry More and Ralph Cudworth, a neo-Platonic emphasis on the importance of metaphysical reflection on nature behind appearances, particularly regarding the spiritual as a complement to the purely mechanical, remained an important methodological thread of the Scientific Revolution (see the entries on Cambridge platonists; Boyle; Henry More; Galileo).

In Novum Organum (1620), Bacon was critical of the Aristotelian method for leaping from particulars to universals too quickly. The syllogistic form of reasoning readily mixed those two types of propositions. Bacon aimed at the invention of new arts, principles, and directions. His method would be grounded in methodical collection of observations, coupled with correction of our senses (and particularly, directions for the avoidance of the Idols, as he called them, kinds of systematic errors to which naïve observers are prone.) The community of scientists could then climb, by a careful, gradual and unbroken ascent, to reliable general claims.

Bacon’s method has been criticized as impractical and too inflexible for the practicing scientist. Whewell would later criticize Bacon for paying too little attention to the actual practices of scientists. It is hard to find convincing examples of Bacon’s method being put into practice in the history of science, but a few figures have been held up as real examples of 17th-century scientific, inductive method, even if not in the rigid Baconian mold: Robert Boyle (1627–1691) and William Harvey (1578–1657) (see the entry on Bacon).

It is to Isaac Newton (1642–1727), however, that historians of science and methodologists have paid greatest attention. Given the enormous success of his Principia Mathematica and Opticks, this is understandable. The study of Newton’s method has had two main thrusts: the implicit method of the experiments and reasoning presented in the Opticks, and the explicit methodological rules given as the Rules for Philosophising (the Regulae) in Book III of the Principia. [3] Newton’s law of gravitation, the linchpin of his new cosmology, broke with explanatory conventions of natural philosophy, first for apparently proposing action at a distance, but more generally for not providing “true”, physical causes. The argument for his System of the World (Principia, Book III) was based on phenomena, not reasoned first principles. This was viewed (mainly on the continent) as insufficient for proper natural philosophy. The Regulae counter this objection, re-defining the aims of natural philosophy by re-defining the method natural philosophers should follow. (See the entry on Newton’s philosophy.)

To his list of methodological prescriptions should be added Newton’s famous phrase “hypotheses non fingo” (commonly translated as “I frame no hypotheses”). The scientist was not to invent systems but to infer explanations from observations, as Bacon had advocated. This would come to be known as inductivism. In the century after Newton, significant clarifications of the Newtonian method were made. Colin Maclaurin (1698–1746), for instance, reconstructed the essential structure of the method as having complementary analysis and synthesis phases, one proceeding away from the phenomena in generalization, the other from the general propositions to derive explanations of new phenomena. Denis Diderot (1713–1784) and the editors of the Encyclopédie did much to consolidate and popularize Newtonianism, as did Francesco Algarotti (1712–1764). The emphasis was often as much on the character of the scientist as on their process, a character which is still commonly assumed. The scientist is humble in the face of nature, not beholden to dogma, obeys only his eyes, and follows the truth wherever it leads. It was certainly Voltaire (1694–1778) and du Châtelet (1706–1749) who were most influential in propagating the latter vision of the scientist and their craft, with Newton as hero. Scientific method became a revolutionary force of the Enlightenment. (See also the entries on Newton, Leibniz, Descartes, Boyle, Hume, enlightenment, as well as Shank 2008 for a historical overview.)

Not all 18th-century reflections on scientific method were so celebratory. Famous also are George Berkeley’s (1685–1753) attack on the mathematics of the new science, as well as on the over-emphasis of Newtonians on observation; and David Hume’s (1711–1776) undermining of the warrant offered for scientific claims by inductive justification (see the entries on George Berkeley; David Hume; Hume’s Newtonianism and Anti-Newtonianism). Hume’s problem of induction motivated Immanuel Kant (1724–1804) to seek new foundations for empirical method, though as an epistemic reconstruction, not as any set of practical guidelines for scientists. Both Hume and Kant influenced the methodological reflections of the next century, such as the debate between Mill and Whewell over the certainty of inductive inferences in science.

The debate between John Stuart Mill (1806–1873) and William Whewell (1794–1866) has become the canonical methodological debate of the 19th century. Although often characterized as a debate between inductivism and hypothetico-deductivism, the role of the two methods on each side is actually more complex. On the hypothetico-deductive account, scientists work to come up with hypotheses from which true observational consequences can be deduced—hence, hypothetico-deductive. Because Whewell emphasizes both hypotheses and deduction in his account of method, he can be seen as a convenient foil to the inductivism of Mill. However, equally if not more important to Whewell’s portrayal of scientific method is what he calls the “fundamental antithesis”. Knowledge is a product of the objective (what we see in the world around us) and the subjective (the contributions of our mind to how we perceive and understand what we experience, which he called the Fundamental Ideas). Both elements are essential according to Whewell, and he was therefore critical of Kant for too much focus on the subjective, and of John Locke (1632–1704) and Mill for too much focus on the senses. Whewell’s fundamental ideas can be discipline relative. An idea can be fundamental even if it is necessary for knowledge only within a given scientific discipline (e.g., chemical affinity for chemistry). This distinguishes fundamental ideas from the forms and categories of intuition of Kant. (See the entry on Whewell.)

Clarifying fundamental ideas would therefore be an essential part of scientific method and scientific progress. Whewell called this process “Discoverer’s Induction”. It was induction, following Bacon or Newton, but Whewell sought to revive Bacon’s account by emphasizing the role of ideas in the clear and careful formulation of inductive hypotheses. Whewell’s induction is not merely the collecting of objective facts. The subjective plays a role through what Whewell calls the Colligation of Facts, a creative act of the scientist, the invention of a theory. A theory is then confirmed by testing, where more facts are brought under the theory, called the Consilience of Inductions. Whewell felt that this was the method by which the true laws of nature could be discovered: clarification of fundamental concepts, clever invention of explanations, and careful testing. Mill, in his critique of Whewell, and others who have cast Whewell as a forerunner of the hypothetico-deductivist view, seem to have underestimated the importance of this discovery phase in Whewell’s understanding of method (Snyder 1997a,b, 1999). Downplaying the discovery phase would come to characterize methodology of the early 20th century (see section 3).

Mill, in his System of Logic, put forward a narrower view of induction as the essence of scientific method. For Mill, induction is the search first for regularities among events. Among those regularities, some will continue to hold for further observations, eventually gaining the status of laws. One can also look for regularities among the laws discovered in a domain, i.e., for a law of laws. Which law of laws will hold is time and discipline dependent and open to revision. One example is the Law of Universal Causation, and Mill put forward specific methods for identifying causes—now commonly known as Mill’s methods. These five methods look for circumstances which are common among the phenomena of interest, those which are absent when the phenomena are absent, or those for which both vary together. Mill’s methods are still seen as capturing basic intuitions about experimental methods for finding the relevant explanatory factors (System of Logic (1843); see the entry on Mill). The methods advocated by Whewell and Mill, in the end, look similar. Both involve inductive generalization to covering laws. They differ dramatically, however, with respect to the necessity of the knowledge arrived at; that is, at the meta-methodological level (see the entries on Whewell and Mill).
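The intuitions behind two of Mill's methods can be made concrete in a short sketch. Each observation below records which candidate circumstances were present and whether the phenomenon occurred; the circumstances (heat, oxygen, fuel, wind) are invented for illustration, and the sketch is a toy rendering of the basic idea, not a serious causal-inference tool.

```python
def method_of_agreement(observations):
    """Circumstances present in every case where the phenomenon occurs."""
    positive = [set(circs) for circs, occurred in observations if occurred]
    if not positive:
        return set()
    common = positive[0]
    for circs in positive[1:]:
        common &= circs  # keep only what all positive cases share
    return common

def method_of_difference(observations):
    """Circumstances separating an occurrence from a non-occurrence that
    is otherwise as similar as possible (here: differs by one item)."""
    candidates = set()
    positives = [set(c) for c, occ in observations if occ]
    negatives = [set(c) for c, occ in observations if not occ]
    for p in positives:
        for n in negatives:
            diff = p - n
            if len(diff) == 1 and n <= p:
                candidates |= diff
    return candidates

observations = [
    ({"heat", "oxygen", "fuel"}, True),   # fire occurs
    ({"heat", "oxygen", "wind"}, True),   # fire occurs
    ({"heat", "fuel"}, False),            # no oxygen: no fire
    ({"oxygen", "fuel"}, False),          # no heat: no fire
]

print("agreement:", sorted(method_of_agreement(observations)))
print("difference:", sorted(method_of_difference(observations)))
```

On this toy data both methods single out heat and oxygen as the circumstances tied to the phenomenon, which is the kind of "relevant explanatory factor" Mill's methods were meant to isolate.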

3. Logic of method and critical responses

The quantum and relativistic revolutions in physics in the early 20th century had a profound effect on methodology. Conceptual foundations of both theories were taken to show the defeasibility of even the most seemingly secure intuitions about space, time and bodies. Certainty of knowledge about the natural world was therefore recognized as unattainable. Instead a renewed empiricism was sought which rendered science fallible but still rationally justifiable.

Analyses of the reasoning of scientists emerged, according to which the aspects of scientific method which were of primary importance were the means of testing and confirming of theories. A distinction in methodology was made between the contexts of discovery and justification. The distinction could be used as a wedge between the particularities of where and how theories or hypotheses are arrived at, on the one hand, and the underlying reasoning scientists use (whether or not they are aware of it) when assessing theories and judging their adequacy on the basis of the available evidence, on the other. By and large, for most of the 20th century, philosophy of science focused on the second context, although philosophers differed on whether to focus on confirmation or refutation as well as on the many details of how confirmation or refutation could or could not be brought about. By the mid-20th century these attempts at defining the method of justification and the context distinction itself came under pressure. During the same period, philosophy of science developed rapidly, and from section 4 this entry will therefore shift from a primarily historical treatment of the scientific method towards a primarily thematic one.

Advances in logic and probability held out promise of the possibility of elaborate reconstructions of scientific theories and empirical method, the best example being Rudolf Carnap’s The Logical Structure of the World (1928). Carnap attempted to show that a scientific theory could be reconstructed as a formal axiomatic system—that is, a logic. That system could refer to the world because some of its basic sentences could be interpreted as observations or operations which one could perform to test them. The rest of the theoretical system, including sentences using theoretical or unobservable terms (like electron or force), would then either be meaningful because they could be reduced to observations, or have purely logical meanings (called analytic, like mathematical identities). This has been referred to as the verifiability criterion of meaning. According to the criterion, any statement not either analytic or verifiable was strictly meaningless. Although the view was endorsed by Carnap in 1928, he would later come to see it as too restrictive (Carnap 1956). Another familiar version of this idea is the operationalism of Percy William Bridgman. In The Logic of Modern Physics (1927) Bridgman asserted that every physical concept could be defined in terms of the operations one would perform to verify the application of that concept. Making good on the operationalization of a concept even as simple as length, however, can easily become enormously complex (for measuring very small lengths, for instance) or impractical (measuring large distances like light-years).

Carl Hempel’s (1950, 1951) criticisms of the verifiability criterion of meaning had enormous influence. He pointed out that universal generalizations, such as most scientific laws, were not strictly meaningful on the criterion. Verifiability and operationalism both seemed too restrictive to capture standard scientific aims and practice. The tenuous connection between these reconstructions and actual scientific practice was criticized in another way. In both approaches, scientific methods were not studied in their own right but recast in methodological roles: measurements, for example, were looked to as ways of giving meanings to terms. The aim of the philosopher of science was not to understand the methods per se, but to use them to reconstruct theories, their meanings, and their relation to the world. When scientists perform these operations, however, they will not report that they are doing them to give meaning to terms in a formal axiomatic system. This disconnect between methodology and the details of actual scientific practice would seem to violate the empiricism the Logical Positivists and Bridgman were committed to. The view that methodology should correspond to practice (to some extent) has been called historicism, or intuitionism. We turn to these criticisms and responses in section 3.4.[4]

Positivism also had to contend with the recognition that a purely inductivist approach, along the lines of Bacon-Newton-Mill, was untenable. There was no pure observation, for starters. All observation was theory laden. Theory is required to make any observation, therefore not all theory can be derived from observation alone. (See the entry on theory and observation in science .) Even granting an observational basis, Hume had already pointed out that one could not deductively justify inductive conclusions without begging the question by presuming the success of the inductive method. Likewise, positivist attempts at analyzing how a generalization can be confirmed by observations of its instances were subject to a number of criticisms. Goodman (1965) and Hempel (1965) both point to paradoxes inherent in standard accounts of confirmation. Recent attempts at explaining how observations can serve to confirm a scientific theory are discussed in section 4 below.

The standard starting point for a non-inductive analysis of the logic of confirmation is known as the Hypothetico-Deductive (H-D) method. In its simplest form, a sentence of a theory which expresses some hypothesis is confirmed by its true consequences. As noted in section 2, this method had been advanced by Whewell in the 19th century, as well as by Nicod (1924) and others in the 20th century. Often, Hempel’s (1966) description of the H-D method, illustrated by the case of Semmelweis’ inferential procedures in establishing the cause of childbed fever, has been presented as a key account of H-D as well as a foil for criticism of the H-D account of confirmation (see, for example, Lipton’s (2004) discussion of inference to the best explanation; also the entry on confirmation). Hempel described Semmelweis’ procedure as examining various hypotheses explaining the cause of childbed fever. Some hypotheses conflicted with observable facts and could be rejected as false immediately. Others needed to be tested experimentally by deducing which observable events should follow if the hypothesis were true (what Hempel called the test implications of the hypothesis), then conducting an experiment and observing whether or not the test implications occurred. If the experiment showed a test implication to be false, the hypothesis could be rejected. If the experiment showed the test implications to be true, however, this did not prove the hypothesis true. The confirmation of a test implication does not verify a hypothesis, though Hempel did allow that “it provides at least some support, some corroboration or confirmation for it” (Hempel 1966: 8). The degree of this support then depends on the quantity, variety and precision of the supporting evidence.
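Hempel’s schema can be illustrated with a deliberately toy sketch. The hypotheses and “observations” below are invented placeholders, not Semmelweis’s actual data: the point is only the asymmetry Hempel describes, on which a failed test implication refutes a hypothesis while a passed one merely corroborates it.

```python
# Toy sketch of hypothetico-deductive testing: a hypothesis is rejected
# when a deduced test implication fails observation; a surviving
# hypothesis is corroborated (supported), never proven true.

def hd_test(hypotheses, observe):
    """hypotheses: dict mapping name -> test implication.
    observe: stand-in for running the experiment on an implication."""
    verdicts = {}
    for name, implication in hypotheses.items():
        if observe(implication):          # test implication holds
            verdicts[name] = "corroborated (not proven)"
        else:                             # implication fails: reject
            verdicts[name] = "refuted"
    return verdicts

# Invented example outcomes (hypothetical, for illustration only).
data = {"handwashing_lowers_mortality": True,
        "miasma_causes_fever": False}

result = hd_test(
    {h: h for h in data},            # implication named after hypothesis
    observe=lambda imp: data[imp],   # pretend experiment: look up outcome
)
print(result)
```

The asymmetry is visible in the two verdict strings: falsity of an implication transmits back to the hypothesis deductively, while truth does not.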

Another approach that took off from the difficulties with inductive inference was Karl Popper’s critical rationalism, or falsificationism (Popper 1959, 1963). Falsification is deductive and similar to H-D in that it involves scientists deducing observational consequences from the hypothesis under test. For Popper, however, the important point was not the degree of confirmation that successful prediction offered to a hypothesis. The crucial thing was the logical asymmetry between confirmation, based on inductive inference, and falsification, which can be based on a deductive inference. (This simple opposition was later questioned, by Lakatos, among others. See the entry on historicist theories of scientific rationality.)

Popper stressed that, regardless of the amount of confirming evidence, we can never be certain that a hypothesis is true without committing the fallacy of affirming the consequent. Instead, Popper introduced the notion of corroboration as a measure for how well a theory or hypothesis has survived previous testing—but without implying that this is also a measure for the probability that it is true.

Popper was also motivated by his doubts about the scientific status of theories like the Marxist theory of history or psycho-analysis, and so wanted to demarcate between science and pseudo-science. Popper saw this as importantly different from demarcating science from metaphysics. The latter demarcation was the primary concern of many logical empiricists. Popper used the idea of falsification to draw a line instead between pseudo- and proper science. Science was science because its method involved subjecting theories to rigorous tests which offered a high probability of failing and thus refuting the theory.

A commitment to the risk of failure was important. Avoiding falsification could be done all too easily. If a consequence of a theory is inconsistent with observations, an exception can be added by introducing auxiliary hypotheses designed explicitly to save the theory, so-called ad hoc modifications. This Popper saw done in pseudo-science, where ad hoc theories appeared capable of explaining anything in their field of application. In contrast, science is risky. If observations showed the predictions from a theory to be wrong, the theory would be refuted. Hence, scientific hypotheses must be falsifiable. Not only must there exist some possible observation statement which could falsify the hypothesis or theory, were it observed (Popper called these the hypothesis’ potential falsifiers); it is also crucial to the Popperian scientific method that such falsifications be sincerely attempted on a regular basis.

The more potential falsifiers of a hypothesis, the more falsifiable it would be, and the more the hypothesis claimed. Conversely, hypotheses without falsifiers claimed very little or nothing at all. Originally, Popper thought that this meant the introduction of ad hoc hypotheses only to save a theory should not be countenanced as good scientific method. These would undermine the falsifiability of a theory. However, Popper later came to recognize that the introduction of modifications (immunizations, he called them) was often an important part of scientific development. Responding to surprising or apparently falsifying observations often generated important new scientific insights. Popper’s own example was the observed motion of Uranus which originally did not agree with Newtonian predictions. The ad hoc hypothesis of an outer planet explained the disagreement and led to further falsifiable predictions. Popper sought to reconcile these views by blurring the distinction between the falsifiable and the not falsifiable, and speaking instead of degrees of testability (Popper 1985: 41f.).

From the 1960s on, sustained meta-methodological criticism emerged that drove philosophical focus away from scientific method. A brief look at those criticisms follows, with recommendations for further reading at the end of the entry.

Thomas Kuhn’s The Structure of Scientific Revolutions (1962) begins with a well-known shot across the bow for philosophers of science:

History, if viewed as a repository for more than anecdote or chronology, could produce a decisive transformation in the image of science by which we are now possessed. (1962: 1)

The image Kuhn thought needed transforming was the a-historical, rational reconstruction sought by many of the Logical Positivists, though Carnap and other positivists were actually quite sympathetic to Kuhn’s views. (See the entry on the Vienna Circle.) Kuhn shares with others among his contemporaries, such as Feyerabend and Lakatos, a commitment to a more empirical approach to philosophy of science. Namely, the history of science provides important data, and necessary checks, for philosophy of science, including any theory of scientific method.

The history of science reveals, according to Kuhn, that scientific development occurs in alternating phases. During normal science, the members of the scientific community adhere to the paradigm in place. Their commitment to the paradigm means a commitment to the puzzles to be solved and the acceptable ways of solving them. Confidence in the paradigm remains so long as steady progress is made in solving the shared puzzles. Method in this normal phase operates within a disciplinary matrix (Kuhn’s later concept of a paradigm) which includes standards for problem solving, and defines the range of problems to which the method should be applied. An important part of a disciplinary matrix is the set of values which provide the norms and aims for scientific method. The main values that Kuhn identifies are prediction, problem solving, simplicity, consistency, and plausibility.

An important by-product of normal science is the accumulation of puzzles which cannot be solved with the resources of the current paradigm. Once the accumulation of these anomalies has reached a critical mass, it can trigger a communal shift to a new paradigm and a new phase of normal science. Importantly, the values that provide the norms and aims for scientific method may have transformed in the meantime. Method may therefore be relative to discipline, time or place.

Feyerabend also identified the aim of science as progress, but argued that any methodological prescription would only stifle that progress (Feyerabend 1988). His arguments are grounded in re-examining accepted “myths” about the history of science. Heroes of science, like Galileo, are shown to be just as reliant on rhetoric and persuasion as they are on reason and demonstration. Others, like Aristotle, are shown to be far more reasonable and far-reaching in their outlooks than they are given credit for. As a consequence, the only rule that could provide what he took to be sufficient freedom was the vacuous “anything goes”. More generally, even the methodological restriction that science is the best way to pursue knowledge, and to increase knowledge, is too restrictive. Feyerabend suggested instead that science might, in fact, be a threat to a free society, because it and its myth had become so dominant (Feyerabend 1978).

An even more fundamental kind of criticism was offered by several sociologists of science from the 1970s onwards who rejected the methodology of providing philosophical accounts for the rational development of science and sociological accounts of its irrational mistakes. Instead, they adhered to a symmetry thesis on which any causal explanation of how scientific knowledge is established needs to be symmetrical, explaining truth and falsity, rationality and irrationality, success and mistakes by the same causal factors (see, e.g., Barnes and Bloor 1982, Bloor 1991). Movements in the Sociology of Science, like the Strong Programme, or in the social dimensions and causes of knowledge more generally, led to extended and close examination of detailed case studies in contemporary science and its history. (See the entries on the social dimensions of scientific knowledge and social epistemology.) Well-known examinations by Latour and Woolgar (1979/1986), Knorr-Cetina (1981), Pickering (1984), and Shapin and Schaffer (1985) seem to bear out that it was social ideologies (on a macro-scale) or individual interactions and circumstances (on a micro-scale) which were the primary causal factors in determining which beliefs gained the status of scientific knowledge. As they saw it, therefore, explanatory appeals to scientific method were not empirically grounded.

A late, and largely unexpected, criticism of scientific method came from within science itself. Beginning in the early 2000s, a number of scientists attempting to replicate the results of published experiments could not do so. There may be a close conceptual connection between reproducibility and method. For example, if reproducibility means that the same scientific methods ought to produce the same result, and all scientific results ought to be reproducible, then whatever it takes to reproduce a scientific result ought to be called scientific method. Space limits us to the observation that, insofar as reproducibility is a desired outcome of proper scientific method, it is not strictly a part of scientific method. (See the entry on reproducibility of scientific results.)

By the close of the 20th century the search for the scientific method was flagging. Nola and Sankey (2000b) could introduce their volume on method by remarking that “For some, the whole idea of a theory of scientific method is yester-year’s debate …”.

Despite the many difficulties philosophers encountered in trying to provide a clear methodology of confirmation (or refutation), important progress has been made on understanding how observation can provide evidence for a given theory. Work in statistics has been crucial for understanding how theories can be tested empirically, and in recent decades a huge literature has developed that attempts to recast confirmation in Bayesian terms. Here these developments can be covered only briefly, and we refer to the entry on confirmation for further details and references.

Statistics has come to play an increasingly important role in the methodology of the experimental sciences from the 19th century onwards. At that time, statistics and probability theory took on a methodological role as an analysis of inductive inference, and attempts to ground the rationality of induction in the axioms of probability theory have continued throughout the 20th century and into the present. Developments in the theory of statistics itself, meanwhile, have had a direct and immense influence on the experimental method, including methods for measuring the uncertainty of observations such as the Method of Least Squares developed by Legendre and Gauss in the early 19th century, criteria for the rejection of outliers proposed by Peirce by the mid-19th century, and the significance tests developed by Gosset (a.k.a. “Student”), Fisher, Neyman & Pearson and others in the 1920s and 1930s (see, e.g., Swijtink 1987 for a brief historical overview; and also the entry on C.S. Peirce).
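As a minimal illustration of the first of these techniques, the Method of Least Squares for fitting a line can be sketched from the closed-form normal equations. The data points below are invented for illustration and not tied to any historical dataset.

```python
# Ordinary least squares for a line y = a + b*x, using the closed-form
# solution of the normal equations (the method introduced by Legendre
# and Gauss): the chosen line minimizes the sum of squared residuals.
def least_squares_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope b = sum((x - x̄)(y - ȳ)) / sum((x - x̄)^2); intercept a = ȳ - b·x̄
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Noiseless toy data on the line y = 1 + 2x is recovered exactly.
a, b = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 1.0 2.0
```

With noisy observations the same formulas return the line with the smallest sum of squared errors, which is precisely the sense in which the method measures and manages the uncertainty of observations.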

These developments within statistics then in turn led to a reflective discussion among both statisticians and philosophers of science on how to perceive the process of hypothesis testing: whether it was a rigorous statistical inference that could provide a numerical expression of the degree of confidence in the tested hypothesis, or if it should be seen as a decision between different courses of actions that also involved a value component. This led to a major controversy among Fisher on the one side and Neyman and Pearson on the other (see especially Fisher 1955, Neyman 1956 and Pearson 1955, and for analyses of the controversy, e.g., Howie 2002, Marks 2000, Lenhard 2006). On Fisher’s view, hypothesis testing was a methodology for when to accept or reject a statistical hypothesis, namely that a hypothesis should be rejected by evidence if this evidence would be unlikely relative to other possible outcomes, given the hypothesis were true. In contrast, on Neyman and Pearson’s view, the consequence of error also had to play a role when deciding between hypotheses. Introducing the distinction between the error of rejecting a true hypothesis (type I error) and accepting a false hypothesis (type II error), they argued that it depends on the consequences of the error to decide whether it is more important to avoid rejecting a true hypothesis or accepting a false one. Hence, Fisher aimed for a theory of inductive inference that enabled a numerical expression of confidence in a hypothesis. To him, the important point was the search for truth, not utility. In contrast, the Neyman-Pearson approach provided a strategy of inductive behaviour for deciding between different courses of action. Here, the important point was not whether a hypothesis was true, but whether one should act as if it was.
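Fisher’s side of this dispute, on which a hypothesis should be rejected when the observed evidence would be unlikely were the hypothesis true, can be made concrete with a toy significance test. The coin-tossing scenario and the 5% threshold below are illustrative conventions, not details from the historical controversy; on the Neyman-Pearson side, that threshold would instead be read as a chosen type I error rate.

```python
# Fisher-style significance test of a fair-coin hypothesis: compute the
# exact probability, under H0 (p = 0.5), of an outcome at least as
# extreme as the one observed, and reject H0 if it is small.
from math import comb

def binomial_p_value(n, k, p0=0.5):
    """One-sided exact p-value: P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0)**(n - i)
               for i in range(k, n + 1))

# 14 heads in 16 tosses would be very unlikely were the coin fair.
p = binomial_p_value(16, 14)
print(p < 0.05)  # True: reject the fairness hypothesis at the 5% level
```

Nothing in the computation itself settles whether 5% is the right threshold; on the Neyman-Pearson view that choice depends on the relative costs of rejecting a true hypothesis versus accepting a false one.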

Similar discussions are found in the philosophical literature. On the one side, Churchman (1948) and Rudner (1953) argued that because scientific hypotheses can never be completely verified, a complete analysis of the methods of scientific inference includes ethical judgments, in which the scientist must decide whether the evidence is sufficiently strong, or the probability sufficiently high, to warrant the acceptance of the hypothesis; this again will depend on the importance of making a mistake in accepting or rejecting it. Others, such as Jeffrey (1956) and Levi (1960), disagreed and instead defended a value-neutral view of science, on which scientists should bracket their attitudes, preferences, temperament, and values when assessing the correctness of their inferences. For more details on this value-free ideal in the philosophy of science and its historical development, see Douglas (2009) and Howard (2003). For a broad set of case studies examining the role of values in science, see e.g. Elliott & Richards 2017.

In recent decades, philosophical discussions of the evaluation of probabilistic hypotheses by statistical inference have largely focused on Bayesianism, which understands probability as a measure of a person’s degree of belief in an event, given the available information, and frequentism, which instead understands probability as the long-run frequency of a repeatable event. Hence, for Bayesians probabilities refer to a state of knowledge, whereas for frequentists probabilities refer to frequencies of events (see, e.g., Sober 2008, chapter 1 for a detailed introduction to Bayesianism and frequentism as well as to likelihoodism). Bayesianism aims at providing a quantifiable, algorithmic representation of belief revision, where belief revision is a function of prior beliefs (i.e., background knowledge) and incoming evidence. Bayesianism employs a rule based on Bayes’ theorem, a theorem of the probability calculus which relates conditional probabilities. The probability that a particular hypothesis is true is interpreted as a degree of belief, or credence, of the scientist. There will also be a probability and a degree of belief that a hypothesis will be true conditional on a piece of evidence (an observation, say) being true. Bayesianism prescribes that it is rational for the scientist to update their belief in the hypothesis to that conditional probability should it turn out that the evidence is, in fact, observed (see, e.g., Sprenger & Hartmann 2019 for a comprehensive treatment of Bayesian philosophy of science). Originating in the work of Neyman and Pearson, frequentism aims at providing the tools for reducing long-run error rates, such as the error-statistical approach developed by Mayo (1996) that focuses on how experimenters can avoid both type I and type II errors by building up a repertoire of procedures that detect errors if and only if they are present.
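The Bayesian updating rule just described can be sketched in a few lines. The prior and likelihoods below are invented numbers for illustration, not drawn from any particular scientific case.

```python
# Bayesian belief revision via Bayes' theorem:
#   P(H|E) = P(E|H) * P(H) / P(E),
# where P(E) = P(E|H) * P(H) + P(E|not-H) * P(not-H).
def bayes_update(prior, likelihood, likelihood_alt):
    """Posterior credence in H after observing evidence E.
    prior: P(H); likelihood: P(E|H); likelihood_alt: P(E|not-H)."""
    evidence = likelihood * prior + likelihood_alt * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: prior credence 0.5, and the evidence is twice
# as likely if H is true as if it is false.
posterior = bayes_update(prior=0.5, likelihood=0.8, likelihood_alt=0.4)
print(posterior)  # ~0.667: observing E raises the credence in H
```

The update is conditionalization in miniature: upon observing the evidence, the agent’s new credence in the hypothesis simply becomes the old conditional probability of the hypothesis given that evidence.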
Both Bayesianism and frequentism have developed over time, they are interpreted in different ways by their various proponents, and their relations to previous criticisms of attempts at defining scientific method are seen differently by proponents and critics. The literature, surveys, reviews and criticism in this area are vast and the reader is referred to the entries on Bayesian epistemology and confirmation.

5. Method in Practice

Attention to scientific practice, as we have seen, is not itself new. However, the recent turn to practice in the philosophy of science can be seen as a correction to the pessimism with respect to method in philosophy of science in the later parts of the 20th century, and as an attempted reconciliation between sociological and rationalist explanations of scientific knowledge. Much of this work sees method as detailed and context-specific problem-solving procedures, and methodological analyses as at once descriptive, critical and advisory (see Nickles 1987 for an exposition of this view). The following section surveys some of these practice focuses. In this section we turn fully to topics rather than chronology.

A problem with the distinction between the contexts of discovery and justification that figured so prominently in philosophy of science in the first half of the 20th century (see section 2) is that no such distinction can be clearly seen in scientific activity (see Arabatzis 2006). Thus, in recent decades, it has been recognized that the study of conceptual innovation and change should not be confined to psychology and sociology of science, but that these are also important aspects of scientific practice which philosophy of science should address (see also the entry on scientific discovery). Looking for the practices that drive conceptual innovation has led philosophers to examine both the reasoning practices of scientists and the wide realm of experimental practices that are not directed narrowly at testing hypotheses, that is, exploratory experimentation.

Examining the reasoning practices of historical and contemporary scientists, Nersessian (2008) has argued that new scientific concepts are constructed as solutions to specific problems by systematic reasoning, and that analogy, visual representation and thought-experimentation are among the important reasoning practices employed. These ubiquitous forms of reasoning are reliable—but also fallible—methods of conceptual development and change. On her account, model-based reasoning consists of cycles of construction, simulation, evaluation and adaptation of models that serve as interim interpretations of the target problem to be solved. Often, this process will lead to modifications or extensions, and a new cycle of simulation and evaluation. However, Nersessian also emphasizes that

creative model-based reasoning cannot be applied as a simple recipe, is not always productive of solutions, and even its most exemplary usages can lead to incorrect solutions. (Nersessian 2008: 11)

Thus, while on the one hand she agrees with many previous philosophers that there is no logic of discovery, discoveries can derive from reasoned processes, such that a large and integral part of scientific practice is

the creation of concepts through which to comprehend, structure, and communicate about physical phenomena …. (Nersessian 1987: 11)

Similarly, work on heuristics for discovery and theory construction by scholars such as Darden (1991) and Bechtel & Richardson (1993) presents science as problem solving and investigates scientific problem solving as a special case of problem-solving in general. Drawing largely on cases from the biological sciences, much of their focus has been on reasoning strategies for the generation, evaluation, and revision of mechanistic explanations of complex systems.

Addressing another aspect of the context distinction, namely the traditional view that the primary role of experiments is to test theoretical hypotheses according to the H-D model, other philosophers of science have argued for additional roles that experiments can play. The notion of exploratory experimentation was introduced to describe experiments driven by the desire to obtain empirical regularities and to develop concepts and classifications in which these regularities can be described (Steinle 1997, 2002; Burian 1997; Waters 2007). However, the difference between theory-driven experimentation and exploratory experimentation should not be seen as a sharp distinction. Theory-driven experiments are not always directed at testing hypotheses, but may also be directed at various kinds of fact-gathering, such as determining numerical parameters. Vice versa, exploratory experiments are usually informed by theory in various ways and are therefore not theory-free. Instead, in exploratory experiments phenomena are investigated without first limiting the possible outcomes of the experiment on the basis of extant theory about the phenomena.

The development of high throughput instrumentation in molecular biology and neighbouring fields has given rise to a special type of exploratory experimentation that collects and analyses very large amounts of data, and these new ‘omics’ disciplines are often said to represent a break with the ideal of hypothesis-driven science (Burian 2007; Elliott 2007; Waters 2007; O’Malley 2007) and instead described as data-driven research (Leonelli 2012; Strasser 2012) or as a special kind of “convenience experimentation” in which many experiments are done simply because they are extraordinarily convenient to perform (Krohs 2012).

5.2 Computer methods and ‘new ways’ of doing science

The field of omics just described is possible because of the ability of computers to process, in a reasonable amount of time, the huge quantities of data required. Computers allow for more elaborate experimentation (higher speed, better filtering, more variables, sophisticated coordination and control), but also, through modelling and simulations, might constitute a form of experimentation themselves. Here, too, we can pose a version of the general question of method versus practice: does the practice of using computers fundamentally change scientific method, or merely provide a more efficient means of implementing standard methods?

Because computers can be used to automate measurements, quantifications, calculations, and statistical analyses where, for practical reasons, these operations cannot be otherwise carried out, many of the steps involved in reaching a conclusion on the basis of an experiment are now made inside a “black box”, without the direct involvement or awareness of a human. This has epistemological implications, regarding what we can know, and how we can know it. To have confidence in the results, computer methods are therefore subjected to tests of verification and validation.

The distinction between verification and validation is easiest to characterize in the case of computer simulations. In a typical computer simulation scenario computers are used to numerically integrate differential equations for which no analytic solution is available. The equations are part of the model the scientist uses to represent a phenomenon or system under investigation. Verifying a computer simulation means checking that the equations of the model are being correctly approximated. Validating a simulation means checking that the equations of the model are adequate for the inferences one wants to make on the basis of that model.
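The verification half of this distinction can be illustrated with a toy simulation. The decay equation dy/dt = -y is chosen only because its analytic solution e^(-t) is known, so the quality of the numerical approximation can actually be checked; validation, by contrast, would ask whether that equation adequately represents some real system, which no amount of in-code checking can settle.

```python
# Verification sketch: numerically integrate dy/dt = -y with explicit
# Euler steps, then compare against the known analytic solution e^(-t).
# Shrinking error under step refinement is evidence that the equations
# of the model are being correctly approximated -- verification only.
from math import exp

def euler_decay(y0, t_end, steps):
    dt = t_end / steps
    y = y0
    for _ in range(steps):
        y += dt * (-y)        # one explicit Euler step for dy/dt = -y
    return y

exact = exp(-1.0)                       # analytic solution at t = 1
coarse = euler_decay(1.0, 1.0, 10)      # 10 steps: visible error
fine = euler_decay(1.0, 1.0, 10000)     # 10000 steps: much smaller error
print(abs(coarse - exact) > abs(fine - exact))  # True
```

In realistic simulations no analytic solution is available, which is why verification must rely on indirect checks such as convergence under step refinement or benchmark problems with known answers.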

A number of issues related to computer simulations have been raised. The identification of verification and validation as the testing methods has been criticized. Oreskes et al. (1994) raise concerns that “validation”, because it suggests deductive inference, might lead to over-confidence in the results of simulations. The distinction itself is probably too clean, since actual practice in the testing of simulations mixes and moves back and forth between the two (Weissart 1997; Parker 2008a; Winsberg 2010). Computer simulations do seem to have a non-inductive character, given that the principles by which they operate are built in by the programmers, and any results of the simulation follow from those in-built principles in such a way that those results could, in principle, be deduced from the program code and its inputs. The status of simulations as experiments has therefore been examined (Kaufmann and Smarr 1993; Humphreys 1995; Hughes 1999; Norton and Suppe 2001). This literature considers the epistemology of these experiments: what we can learn by simulation, and also the kinds of justifications which can be given in applying that knowledge to the “real” world (Mayo 1996; Parker 2008b). As pointed out, part of the advantage of computer simulation derives from the fact that huge numbers of calculations can be carried out without requiring direct observation by the experimenter/simulator. At the same time, many of these calculations are approximations to the calculations which would be performed first-hand in an ideal situation. Both factors introduce uncertainties into the inferences drawn from what is observed in the simulation.

For many of the reasons described above, computer simulations do not seem to belong clearly to either the experimental or theoretical domain. Rather, they seem to crucially involve aspects of both. This has led some authors, such as Fox Keller (2003: 200), to argue that we ought to consider computer simulation a “qualitatively different way of doing science”. The literature in general tends to follow Kaufmann and Smarr (1993) in referring to computer simulation as a “third way” for scientific methodology (theoretical reasoning and experimental practice are the first two ways). It should also be noted that the debates around these issues have tended to focus on the form of computer simulation typical in the physical sciences, where models are based on dynamical equations. Other forms of simulation might not have the same problems, or have problems of their own (see the entry on computer simulations in science).

In recent years, the rapid development of machine learning techniques has prompted some scholars to suggest that the scientific method has become “obsolete” (Anderson 2008, Carrol and Goodstein 2009). This has resulted in an intense debate on the relative merits of data-driven and hypothesis-driven research (for samples, see e.g. Mazzocchi 2015 or Succi and Coveney 2018). For a detailed treatment of this topic, we refer to the entry on scientific research and big data.

6. Discourse on scientific method

Despite philosophical disagreements, the idea of the scientific method still figures prominently in contemporary discourse on many different topics, both within science and in society at large. Often, reference to scientific method is used in ways that either convey the legend of a single, universal method characteristic of all science, or grant a particular method or set of methods privileged status as a special ‘gold standard’, often with reference to particular philosophers to vindicate the claims. Discourse on scientific method also typically arises when there is a need to distinguish between science and other activities, or to justify the special status conveyed to science. In these areas, the philosophical attempts at identifying a set of methods characteristic of scientific endeavors are closely related to the philosophy of science’s classical problem of demarcation (see the entry on science and pseudo-science) and to the philosophical analysis of the social dimension of scientific knowledge and the role of science in democratic society.

One of the settings in which the legend of a single, universal scientific method has been particularly strong is science education (see, e.g., Bauer 1992; McComas 1996; Wivagg & Allchin 2002).[5] Often, ‘the scientific method’ is presented in textbooks and educational web pages as a fixed four- or five-step procedure: starting from observation and description of a phenomenon, progressing to formulation of a hypothesis which explains the phenomenon, designing and conducting experiments to test the hypothesis, analyzing the results, and ending with drawing a conclusion. Such references to a universal scientific method can be found in educational material at all levels of science education (Blachowicz 2009), and numerous studies have shown that the idea of a general and universal scientific method often forms part of both students’ and teachers’ conception of science (see, e.g., Aikenhead 1987; Osborne et al. 2003). In response, it has been argued that science education needs to focus more on teaching about the nature of science, although views have differed on whether this is best done through student-led investigations, contemporary cases, or historical cases (Allchin, Andersen & Nielsen 2014).

Although occasionally phrased with reference to the H-D method, important historical roots of the science-education legend of a single, universal scientific method are the American philosopher and psychologist John Dewey’s account of inquiry in How We Think (1910) and the British mathematician Karl Pearson’s account of science in The Grammar of Science (1892). On Dewey’s account, inquiry is divided into the five steps of

(i) a felt difficulty, (ii) its location and definition, (iii) suggestion of a possible solution, (iv) development by reasoning of the bearing of the suggestions, (v) further observation and experiment leading to its acceptance or rejection. (Dewey 1910: 72)

Similarly, on Pearson’s account, scientific investigations start with the measurement of data and observation of their correlation and sequence, from which scientific laws can be discovered with the aid of creative imagination. These laws have to be subjected to criticism, and their final acceptance will have equal validity for “all normally constituted minds”. Both Dewey’s and Pearson’s accounts should be seen as generalized abstractions of inquiry and not restricted to the realm of science, although both Dewey and Pearson referred to their respective accounts as ‘the scientific method’.

Occasionally, scientists make sweeping statements about a simple and distinct scientific method, as exemplified by Feynman’s simplified version of a conjectures-and-refutations method presented, for example, in the last of his 1964 Cornell Messenger Lectures.[6] However, just as often scientists have come to the same conclusion as recent philosophy of science: that there is no unique, easily described scientific method. For example, the physicist and Nobel laureate Steven Weinberg described in the paper “The Methods of Science … And Those By Which We Live” (1995) how

The fact that the standards of scientific success shift with time does not only make the philosophy of science difficult; it also raises problems for the public understanding of science. We do not have a fixed scientific method to rally around and defend. (1995: 8)

Interview studies with scientists on their conception of method show that scientists often find it hard to figure out whether available evidence confirms their hypothesis, and that there are no direct translations between general ideas about method and specific strategies to guide how research is conducted (Hangel & Schickore 2017; Schickore & Hangel 2019).

Reference to the scientific method has also often been used to argue for the scientific nature or special status of a particular activity. Philosophical positions that argue for a simple and unique scientific method as a criterion of demarcation, such as Popperian falsification, have often attracted practitioners who felt that they had a need to defend their domain of practice. For example, references to conjectures and refutation as the scientific method are abundant in much of the literature on complementary and alternative medicine (CAM)—alongside the competing position that CAM, as an alternative to conventional biomedicine, needs to develop its own methodology different from that of science.

Also within mainstream science, reference to the scientific method is used in arguments regarding the internal hierarchy of disciplines and domains. A frequently seen argument is that research based on the H-D method is superior to research based on induction from observations because in deductive inferences the conclusion follows necessarily from the premises. (See, e.g., Parascandola 1998 for an analysis of how this argument has been made to downgrade epidemiology compared to the laboratory sciences.) Similarly, based on an examination of the practices of major funding institutions such as the National Institutes of Health (NIH), the National Science Foundation (NSF) and the Biotechnology and Biological Sciences Research Council (BBSRC) in the UK, O’Malley et al. (2009) have argued that funding agencies seem to have a tendency to adhere to the view that the primary activity of science is to test hypotheses, while descriptive and exploratory research is seen as merely preparatory activity that is valuable only insofar as it fuels hypothesis-driven research.

In some areas of science, scholarly publications are structured in a way that may convey the impression of a neat and linear process of inquiry from stating a question, devising the methods by which to answer it, and collecting the data, to drawing a conclusion from the analysis of data. For example, the codified format of publications in most biomedical journals known as the IMRAD format (Introduction, Methods, Results, and Discussion) is explicitly described by the journal editors as “not an arbitrary publication format but rather a direct reflection of the process of scientific discovery” (see the so-called “Vancouver Recommendations”, ICMJE 2013: 11). However, scientific publications do not in general reflect the process by which the reported scientific results were produced. For example, under the provocative title “Is the scientific paper a fraud?”, Medawar argued that scientific papers generally misrepresent how the results have been produced (Medawar 1963/1996). Similar views have been advanced by philosophers, historians and sociologists of science (Gilbert 1976; Holmes 1987; Knorr-Cetina 1981; Schickore 2008; Suppe 1998) who have argued that scientists’ experimental practices are messy and often do not follow any recognizable pattern. Publications of research results, they argue, are retrospective reconstructions of these activities that often do not preserve the temporal order or the logic of these activities, but are instead often constructed in order to screen off potential criticism (see Schickore 2008 for a review of this work).

Philosophical positions on the scientific method have also made it into the courtroom, especially in the US, where judges have drawn on philosophy of science in deciding when to confer special status on scientific expert testimony. A key case is Daubert v. Merrell Dow Pharmaceuticals (92–102, 509 U.S. 579, 1993). In this case, the Supreme Court argued in its 1993 ruling that trial judges must ensure that expert testimony is reliable, and that in doing so the court must look at the expert’s methodology to determine whether the proffered evidence is actually scientific knowledge. Further, referring to the works of Popper and Hempel, the court stated that

ordinarily, a key question to be answered in determining whether a theory or technique is scientific knowledge … is whether it can be (and has been) tested. (Justice Blackmun, Daubert v. Merrell Dow Pharmaceuticals; see Other Internet Resources for a link to the opinion)

But as argued by Haack (2005a,b, 2010) and by Foster & Huber (1999), by equating the question of whether a piece of testimony is reliable with the question of whether it is scientific, as indicated by a special methodology, the court produced an inconsistent mixture of Popper’s and Hempel’s philosophies, and this later led to considerable confusion in subsequent case rulings that drew on the Daubert case (see Haack 2010 for a detailed exposition).

The difficulties around identifying the methods of science are also reflected in the difficulties of identifying scientific misconduct in the form of improper application of the method or methods of science. One of the first and most influential attempts at defining misconduct in science was the US definition from 1989 that defined misconduct as

fabrication, falsification, plagiarism, or other practices that seriously deviate from those that are commonly accepted within the scientific community. (Code of Federal Regulations, Part 50, Subpart A, August 8, 1989; italics added)

However, the “other practices that seriously deviate” clause was heavily criticized because it could be used to suppress creative or novel science. For example, the National Academy of Sciences stated in its report Responsible Science (1992) that it

wishes to discourage the possibility that a misconduct complaint could be lodged against scientists based solely on their use of novel or unorthodox research methods. (NAS 1992: 27)

This clause was therefore later removed from the definition. For an entry into the key philosophical literature on conduct in science, see Shamoo & Resnik (2009).

The question of the source of the success of science has been at the core of philosophy since the beginning of modern science. If viewed as a matter of epistemology more generally, scientific method is a part of the entire history of philosophy. Over that time, science and whatever methods its practitioners may employ have changed dramatically. Today, many philosophers have taken up the banners of pluralism or of practice to focus on what are, in effect, fine-grained and contextually limited examinations of scientific method. Others hope to shift perspectives in order to provide a renewed general account of what characterizes the activity we call science.

One such perspective has been offered recently by Hoyningen-Huene (2008, 2013), who argues from the history of philosophy of science that after three lengthy phases of characterizing science by its method, we are now in a phase where the belief in the existence of a positive scientific method has eroded and what is left to characterize science is only its fallibility. First was a phase from Plato and Aristotle up until the 17th century, in which the specificity of scientific knowledge was seen in its absolute certainty established by proof from evident axioms; next was a phase up to the mid-19th century, in which the means to establish the certainty of scientific knowledge had been generalized to include inductive procedures as well. In the third phase, which lasted until the last decades of the 20th century, it was recognized that empirical knowledge was fallible, but it was still granted a special status due to its distinctive mode of production. But now in the fourth phase, according to Hoyningen-Huene, historical and philosophical studies have shown how “scientific methods with the characteristics as posited in the second and third phase do not exist” (2008: 168) and there is no longer any consensus among philosophers and historians of science about the nature of science. For Hoyningen-Huene, this is too negative a stance, and he therefore urges that the question of the nature of science be raised anew. His own answer is that “scientific knowledge differs from other kinds of knowledge, especially everyday knowledge, primarily by being more systematic” (Hoyningen-Huene 2013: 14). Systematicity can have several different dimensions: among them are more systematic descriptions, explanations, predictions, defense of knowledge claims, epistemic connectedness, ideal of completeness, knowledge generation, representation of knowledge and critical discourse.
Hence, what characterizes science is the greater care in excluding possible alternative explanations, the more detailed elaboration with respect to data on which predictions are based, the greater care in detecting and eliminating sources of error, the more articulate connections to other pieces of knowledge, etc. On this position, what characterizes science is not that the methods employed are unique to science, but that the methods are more carefully employed.

Another, similar approach has been offered by Haack (2003). She starts, like Hoyningen-Huene, from a dissatisfaction with the recent clash between what she calls Old Deferentialism and New Cynicism. The Old Deferentialist position is that science progressed inductively by accumulating true theories confirmed by empirical evidence, or deductively by testing conjectures against basic statements; the New Cynics’ position is that science has no epistemic authority and no uniquely rational method and is merely politics. Haack insists that, contrary to the views of the New Cynics, there are objective epistemic standards and there is something epistemologically special about science, even though the Old Deferentialists pictured this in the wrong way. Instead, she offers a new Critical Commonsensist account on which standards of good, strong, supportive evidence and well-conducted, honest, thorough and imaginative inquiry are not exclusive to the sciences, but are the standards by which we judge all inquirers. In this sense, science does not differ in kind from other kinds of inquiry, but it may differ in the degree to which it requires broad and detailed background knowledge and a familiarity with a technical vocabulary that only specialists may possess.

  • Aikenhead, G.S., 1987, “High-school graduates’ beliefs about science-technology-society. III. Characteristics and limitations of scientific knowledge”, Science Education , 71(4): 459–487.
  • Allchin, D., H.M. Andersen and K. Nielsen, 2014, “Complementary Approaches to Teaching Nature of Science: Integrating Student Inquiry, Historical Cases, and Contemporary Cases in Classroom Practice”, Science Education , 98: 461–486.
  • Anderson, C., 2008, “The end of theory: The data deluge makes the scientific method obsolete”, Wired magazine , 16(7): 16–07
  • Arabatzis, T., 2006, “On the inextricability of the context of discovery and the context of justification”, in Revisiting Discovery and Justification , J. Schickore and F. Steinle (eds.), Dordrecht: Springer, pp. 215–230.
  • Barnes, J. (ed.), 1984, The Complete Works of Aristotle, Vols I and II , Princeton: Princeton University Press.
  • Barnes, B. and D. Bloor, 1982, “Relativism, Rationalism, and the Sociology of Knowledge”, in Rationality and Relativism , M. Hollis and S. Lukes (eds.), Cambridge: MIT Press, pp. 1–20.
  • Bauer, H.H., 1992, Scientific Literacy and the Myth of the Scientific Method , Urbana: University of Illinois Press.
  • Bechtel, W. and R.C. Richardson, 1993, Discovering complexity , Princeton, NJ: Princeton University Press.
  • Berkeley, G., 1734, The Analyst in De Motu and The Analyst: A Modern Edition with Introductions and Commentary , D. Jesseph (trans. and ed.), Dordrecht: Kluwer Academic Publishers, 1992.
  • Blachowicz, J., 2009, “How science textbooks treat scientific method: A philosopher’s perspective”, The British Journal for the Philosophy of Science , 60(2): 303–344.
  • Bloor, D., 1991, Knowledge and Social Imagery, Chicago: University of Chicago Press, 2nd edition.
  • Boyle, R., 1682, New experiments physico-mechanical, touching the air , Printed by Miles Flesher for Richard Davis, bookseller in Oxford.
  • Bridgman, P.W., 1927, The Logic of Modern Physics , New York: Macmillan.
  • –––, 1956, “The Methodological Character of Theoretical Concepts”, in The Foundations of Science and the Concepts of Science and Psychology , Herbert Feigl and Michael Scriven (eds.), Minnesota: University of Minneapolis Press, pp. 38–76.
  • Burian, R., 1997, “Exploratory Experimentation and the Role of Histochemical Techniques in the Work of Jean Brachet, 1938–1952”, History and Philosophy of the Life Sciences , 19(1): 27–45.
  • –––, 2007, “On microRNA and the need for exploratory experimentation in post-genomic molecular biology”, History and Philosophy of the Life Sciences , 29(3): 285–311.
  • Carnap, R., 1928, Der logische Aufbau der Welt , Berlin: Bernary, transl. by R.A. George, The Logical Structure of the World , Berkeley: University of California Press, 1967.
  • –––, 1956, “The methodological character of theoretical concepts”, Minnesota studies in the philosophy of science , 1: 38–76.
  • Carrol, S., and D. Goodstein, 2009, “Defining the scientific method”, Nature Methods , 6: 237.
  • Churchman, C.W., 1948, “Science, Pragmatics, Induction”, Philosophy of Science , 15(3): 249–268.
  • Cooper, J. (ed.), 1997, Plato: Complete Works , Indianapolis: Hackett.
  • Darden, L., 1991, Theory Change in Science: Strategies from Mendelian Genetics , Oxford: Oxford University Press
  • Dewey, J., 1910, How we think , New York: Dover Publications (reprinted 1997).
  • Douglas, H., 2009, Science, Policy, and the Value-Free Ideal , Pittsburgh: University of Pittsburgh Press.
  • Dupré, J., 2004, “Miracle of Monism ”, in Naturalism in Question , Mario De Caro and David Macarthur (eds.), Cambridge, MA: Harvard University Press, pp. 36–58.
  • Elliott, K.C., 2007, “Varieties of exploratory experimentation in nanotoxicology”, History and Philosophy of the Life Sciences , 29(3): 311–334.
  • Elliott, K. C., and T. Richards (eds.), 2017, Exploring inductive risk: Case studies of values in science , Oxford: Oxford University Press.
  • Falcon, Andrea, 2005, Aristotle and the science of nature: Unity without uniformity , Cambridge: Cambridge University Press.
  • Feyerabend, P., 1978, Science in a Free Society , London: New Left Books
  • –––, 1988, Against Method, London: Verso, 2nd edition.
  • Fisher, R.A., 1955, “Statistical Methods and Scientific Induction”, Journal of The Royal Statistical Society. Series B (Methodological) , 17(1): 69–78.
  • Foster, K. and P.W. Huber, 1999, Judging Science. Scientific Knowledge and the Federal Courts , Cambridge: MIT Press.
  • Fox Keller, E., 2003, “Models, Simulation, and ‘computer experiments’”, in The Philosophy of Scientific Experimentation , H. Radder (ed.), Pittsburgh: Pittsburgh University Press, 198–215.
  • Gilbert, G., 1976, “The transformation of research findings into scientific knowledge”, Social Studies of Science , 6: 281–306.
  • Gimbel, S., 2011, Exploring the Scientific Method , Chicago: University of Chicago Press.
  • Goodman, N., 1965, Fact, Fiction, and Forecast, Indianapolis: Bobbs-Merrill.
  • Haack, S., 1995, “Science is neither sacred nor a confidence trick”, Foundations of Science , 1(3): 323–335.
  • –––, 2003, Defending science—within reason , Amherst: Prometheus.
  • –––, 2005a, “Disentangling Daubert: an epistemological study in theory and practice”, Journal of Philosophy, Science and Law, 5, available online. doi:10.5840/jpsl2005513
  • –––, 2005b, “Trial and error: The Supreme Court’s philosophy of science”, American Journal of Public Health , 95: S66-S73.
  • –––, 2010, “Federal Philosophy of Science: A Deconstruction-and a Reconstruction”, NYUJL & Liberty , 5: 394.
  • Hangel, N. and J. Schickore, 2017, “Scientists’ conceptions of good research practice”, Perspectives on Science , 25(6): 766–791
  • Harper, W.L., 2011, Isaac Newton’s Scientific Method: Turning Data into Evidence about Gravity and Cosmology , Oxford: Oxford University Press.
  • Hempel, C., 1950, “Problems and Changes in the Empiricist Criterion of Meaning”, Revue Internationale de Philosophie , 41(11): 41–63.
  • –––, 1951, “The Concept of Cognitive Significance: A Reconsideration”, Proceedings of the American Academy of Arts and Sciences , 80(1): 61–77.
  • –––, 1965, Aspects of scientific explanation and other essays in the philosophy of science , New York–London: Free Press.
  • –––, 1966, Philosophy of Natural Science , Englewood Cliffs: Prentice-Hall.
  • Holmes, F.L., 1987, “Scientific writing and scientific discovery”, Isis , 78(2): 220–235.
  • Howard, D., 2003, “Two left turns make a right: On the curious political career of North American philosophy of science at midcentury”, in Logical Empiricism in North America , G.L. Hardcastle & A.W. Richardson (eds.), Minneapolis: University of Minnesota Press, pp. 25–93.
  • Hoyningen-Huene, P., 2008, “Systematicity: The nature of science”, Philosophia , 36(2): 167–180.
  • –––, 2013, Systematicity. The Nature of Science , Oxford: Oxford University Press.
  • Howie, D., 2002, Interpreting probability: Controversies and developments in the early twentieth century , Cambridge: Cambridge University Press.
  • Hughes, R., 1999, “The Ising Model, Computer Simulation, and Universal Physics”, in Models as Mediators , M. Morgan and M. Morrison (eds.), Cambridge: Cambridge University Press, pp. 97–145
  • Hume, D., 1739, A Treatise of Human Nature , D. Fate Norton and M.J. Norton (eds.), Oxford: Oxford University Press, 2000.
  • Humphreys, P., 1995, “Computational science and scientific method”, Minds and Machines , 5(1): 499–512.
  • ICMJE, 2013, “Recommendations for the Conduct, Reporting, Editing, and Publication of Scholarly Work in Medical Journals”, International Committee of Medical Journal Editors, available online, accessed 13 August 2014.
  • Jeffrey, R.C., 1956, “Valuation and Acceptance of Scientific Hypotheses”, Philosophy of Science , 23(3): 237–246.
  • Kaufmann, W.J., and L.L. Smarr, 1993, Supercomputing and the Transformation of Science , New York: Scientific American Library.
  • Knorr-Cetina, K., 1981, The Manufacture of Knowledge , Oxford: Pergamon Press.
  • Krohs, U., 2012, “Convenience experimentation”, Studies in History and Philosophy of Biological and Biomedical Sciences, 43: 52–57.
  • Kuhn, T.S., 1962, The Structure of Scientific Revolutions , Chicago: University of Chicago Press
  • Latour, B. and S. Woolgar, 1986, Laboratory Life: The Construction of Scientific Facts, Princeton: Princeton University Press, 2nd edition.
  • Laudan, L., 1968, “Theories of scientific method from Plato to Mach”, History of Science , 7(1): 1–63.
  • Lenhard, J., 2006, “Models and statistical inference: The controversy between Fisher and Neyman-Pearson”, The British Journal for the Philosophy of Science , 57(1): 69–91.
  • Leonelli, S., 2012, “Making Sense of Data-Driven Research in the Biological and the Biomedical Sciences”, Studies in the History and Philosophy of the Biological and Biomedical Sciences , 43(1): 1–3.
  • Levi, I., 1960, “Must the scientist make value judgments?”, Philosophy of Science , 57(11): 345–357
  • Lindley, D., 1991, Theory Change in Science: Strategies from Mendelian Genetics , Oxford: Oxford University Press.
  • Lipton, P., 2004, Inference to the Best Explanation, London: Routledge, 2nd edition.
  • Marks, H.M., 2000, The progress of experiment: science and therapeutic reform in the United States, 1900–1990 , Cambridge: Cambridge University Press.
  • Mazzochi, F., 2015, “Could Big Data be the end of theory in science?”, EMBO reports , 16: 1250–1255.
  • Mayo, D.G., 1996, Error and the Growth of Experimental Knowledge , Chicago: University of Chicago Press.
  • McComas, W.F., 1996, “Ten myths of science: Reexamining what we think we know about the nature of science”, School Science and Mathematics , 96(1): 10–16.
  • Medawar, P.B., 1963/1996, “Is the scientific paper a fraud”, in The Strange Case of the Spotted Mouse and Other Classic Essays on Science , Oxford: Oxford University Press, 33–39.
  • Mill, J.S., 1963, Collected Works of John Stuart Mill , J. M. Robson (ed.), Toronto: University of Toronto Press
  • NAS, 1992, Responsible Science: Ensuring the integrity of the research process , Washington DC: National Academy Press.
  • Nersessian, N.J., 1987, “A cognitive-historical approach to meaning in scientific theories”, in The process of science , N. Nersessian (ed.), Berlin: Springer, pp. 161–177.
  • –––, 2008, Creating Scientific Concepts , Cambridge: MIT Press.
  • Newton, I., 1726, Philosophiae naturalis Principia Mathematica (3rd edition), in The Principia: Mathematical Principles of Natural Philosophy: A New Translation, I.B. Cohen and A. Whitman (trans.), Berkeley: University of California Press, 1999.
  • –––, 1704, Opticks or A Treatise of the Reflections, Refractions, Inflections & Colors of Light , New York: Dover Publications, 1952.
  • Neyman, J., 1956, “Note on an Article by Sir Ronald Fisher”, Journal of the Royal Statistical Society. Series B (Methodological) , 18: 288–294.
  • Nickles, T., 1987, “Methodology, heuristics, and rationality”, in Rational changes in science: Essays on Scientific Reasoning , J.C. Pitt (ed.), Berlin: Springer, pp. 103–132.
  • Nicod, J., 1924, Le problème logique de l’induction , Paris: Alcan. (Engl. transl. “The Logical Problem of Induction”, in Foundations of Geometry and Induction , London: Routledge, 2000.)
  • Nola, R. and H. Sankey, 2000a, “A selective survey of theories of scientific method”, in Nola and Sankey 2000b: 1–65.
  • –––, 2000b, After Popper, Kuhn and Feyerabend. Recent Issues in Theories of Scientific Method , London: Springer.
  • –––, 2007, Theories of Scientific Method , Stocksfield: Acumen.
  • Norton, S., and F. Suppe, 2001, “Why atmospheric modeling is good science”, in Changing the Atmosphere: Expert Knowledge and Environmental Governance , C. Miller and P. Edwards (eds.), Cambridge, MA: MIT Press, 88–133.
  • O’Malley, M., 2007, “Exploratory experimentation and scientific practice: Metagenomics and the proteorhodopsin case”, History and Philosophy of the Life Sciences , 29(3): 337–360.
  • O’Malley, M., C. Haufe, K. Elliot, and R. Burian, 2009, “Philosophies of Funding”, Cell , 138: 611–615.
  • Oreskes, N., K. Shrader-Frechette, and K. Belitz, 1994, “Verification, Validation and Confirmation of Numerical Models in the Earth Sciences”, Science , 263(5147): 641–646.
  • Osborne, J., S. Simon, and S. Collins, 2003, “Attitudes towards science: a review of the literature and its implications”, International Journal of Science Education , 25(9): 1049–1079.
  • Parascandola, M., 1998, “Epidemiology—2nd-Rate Science”, Public Health Reports, 113(4): 312–320.
  • Parker, W., 2008a, “Franklin, Holmes and the Epistemology of Computer Simulation”, International Studies in the Philosophy of Science , 22(2): 165–83.
  • –––, 2008b, “Computer Simulation through an Error-Statistical Lens”, Synthese , 163(3): 371–84.
  • Pearson, K. 1892, The Grammar of Science , London: J.M. Dents and Sons, 1951
  • Pearson, E.S., 1955, “Statistical Concepts in Their Relation to Reality”, Journal of the Royal Statistical Society , B, 17: 204–207.
  • Pickering, A., 1984, Constructing Quarks: A Sociological History of Particle Physics , Edinburgh: Edinburgh University Press.
  • Popper, K.R., 1959, The Logic of Scientific Discovery , London: Routledge, 2002
  • –––, 1963, Conjectures and Refutations , London: Routledge, 2002.
  • –––, 1985, Unended Quest: An Intellectual Autobiography, La Salle: Open Court Publishing Co.
  • Rudner, R., 1953, “The Scientist Qua Scientist Making Value Judgments”, Philosophy of Science , 20(1): 1–6.
  • Rudolph, J.L., 2005, “Epistemology for the masses: The origin of ‘The Scientific Method’ in American Schools”, History of Education Quarterly , 45(3): 341–376
  • Schickore, J., 2008, “Doing science, writing science”, Philosophy of Science , 75: 323–343.
  • Schickore, J. and N. Hangel, 2019, “‘It might be this, it should be that…’ uncertainty and doubt in day-to-day science practice”, European Journal for Philosophy of Science , 9(2): 31. doi:10.1007/s13194-019-0253-9
  • Shamoo, A.E. and D.B. Resnik, 2009, Responsible Conduct of Research , Oxford: Oxford University Press.
  • Shank, J.B., 2008, The Newton Wars and the Beginning of the French Enlightenment , Chicago: The University of Chicago Press.
  • Shapin, S. and S. Schaffer, 1985, Leviathan and the air-pump , Princeton: Princeton University Press.
  • Smith, G.E., 2002, “The Methodology of the Principia”, in The Cambridge Companion to Newton , I.B. Cohen and G.E. Smith (eds.), Cambridge: Cambridge University Press, 138–173.
  • Snyder, L.J., 1997a, “Discoverers’ Induction”, Philosophy of Science , 64: 580–604.
  • –––, 1997b, “The Mill-Whewell Debate: Much Ado About Induction”, Perspectives on Science , 5: 159–198.
  • –––, 1999, “Renovating the Novum Organum: Bacon, Whewell and Induction”, Studies in History and Philosophy of Science , 30: 531–557.
  • Sober, E., 2008, Evidence and Evolution. The logic behind the science , Cambridge: Cambridge University Press
  • Sprenger, J. and S. Hartmann, 2019, Bayesian philosophy of science , Oxford: Oxford University Press.
  • Steinle, F., 1997, “Entering New Fields: Exploratory Uses of Experimentation”, Philosophy of Science (Proceedings), 64: S65–S74.
  • –––, 2002, “Experiments in History and Philosophy of Science”, Perspectives on Science , 10(4): 408–432.
  • Strasser, B.J., 2012, “Data-driven sciences: From wonder cabinets to electronic databases”, Studies in History and Philosophy of Science Part C: Studies in History and Philosophy of Biological and Biomedical Sciences , 43(1): 85–87.
  • Succi, S. and P.V. Coveney, 2018, “Big data: the end of the scientific method?”, Philosophical Transactions of the Royal Society A , 377: 20180145. doi:10.1098/rsta.2018.0145
  • Suppe, F., 1998, “The Structure of a Scientific Paper”, Philosophy of Science , 65(3): 381–405.
  • Swijtink, Z.G., 1987, “The objectification of observation: Measurement and statistical methods in the nineteenth century”, in The probabilistic revolution. Ideas in History, Vol. 1 , L. Kruger (ed.), Cambridge MA: MIT Press, pp. 261–285.
  • Waters, C.K., 2007, “The nature and context of exploratory experimentation: An introduction to three case studies of exploratory research”, History and Philosophy of the Life Sciences , 29(3): 275–284.
  • Weinberg, S., 1995, “The methods of science… and those by which we live”, Academic Questions , 8(2): 7–13.
  • Weissert, T., 1997, The Genesis of Simulation in Dynamics: Pursuing the Fermi-Pasta-Ulam Problem , New York: Springer Verlag.
  • Harvey, W., 1628, Exercitatio Anatomica de Motu Cordis et Sanguinis in Animalibus, in On the Motion of the Heart and Blood in Animals, R. Willis (trans.), Buffalo: Prometheus Books, 1993.
  • Winsberg, E., 2010, Science in the Age of Computer Simulation , Chicago: University of Chicago Press.
  • Wivagg, D. & D. Allchin, 2002, “The Dogma of the Scientific Method”, The American Biology Teacher , 64(9): 645–646
  • Blackmun opinion , in Daubert v. Merrell Dow Pharmaceuticals (92–102), 509 U.S. 579 (1993).
  • Scientific Method at philpapers. Darrell Rowbottom (ed.).
  • Recent Articles | Scientific Method | The Scientist Magazine

al-Kindi | Albert the Great [= Albertus magnus] | Aquinas, Thomas | Arabic and Islamic Philosophy, disciplines in: natural philosophy and natural science | Arabic and Islamic Philosophy, historical and methodological topics in: Greek sources | Arabic and Islamic Philosophy, historical and methodological topics in: influence of Arabic and Islamic Philosophy on the Latin West | Aristotle | Bacon, Francis | Bacon, Roger | Berkeley, George | biology: experiment in | Boyle, Robert | Cambridge Platonists | confirmation | Descartes, René | Enlightenment | epistemology | epistemology: Bayesian | epistemology: social | Feyerabend, Paul | Galileo Galilei | Grosseteste, Robert | Hempel, Carl | Hume, David | Hume, David: Newtonianism and Anti-Newtonianism | induction: problem of | Kant, Immanuel | Kuhn, Thomas | Leibniz, Gottfried Wilhelm | Locke, John | Mill, John Stuart | More, Henry | Neurath, Otto | Newton, Isaac | Newton, Isaac: philosophy | Ockham [Occam], William | operationalism | Peirce, Charles Sanders | Plato | Popper, Karl | rationality: historicist theories of | Reichenbach, Hans | reproducibility, scientific | Schlick, Moritz | science: and pseudo-science | science: theory and observation in | science: unity of | scientific discovery | scientific knowledge: social dimensions of | simulations in science | skepticism: medieval | space and time: absolute and relational space and motion, post-Newtonian theories | Vienna Circle | Whewell, William | Zabarella, Giacomo

Copyright © 2021 by Brian Hepburn <brian.hepburn@wichita.edu> and Hanne Andersen <hanne.andersen@ind.ku.dk>


Scientific thinking and critical thinking in science education: Two distinct but symbiotically related intellectual processes. Antonio García-Carmona, Universidad de Sevilla. Science & Education, online first, September 2023.


PLoS Comput Biol. 2019 Sep; 15(9).

Perspective: Dimensions of the scientific method

Eberhard O. Voit

Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, Georgia, United States of America

The scientific method has been guiding biological research for a long time. It not only prescribes the order and types of activities that give a scientific study validity and a stamp of approval but also has substantially shaped how we collectively think about the endeavor of investigating nature. The advent of high-throughput data generation, data mining, and advanced computational modeling has thrown the formerly undisputed, monolithic status of the scientific method into turmoil. On the one hand, the new approaches are clearly successful and expect the same acceptance as the traditional methods, but on the other hand, they replace much of the hypothesis-driven reasoning with inductive argumentation, which philosophers of science consider problematic. Intrigued by the enormous wealth of data and the power of machine learning, some scientists have even argued that significant correlations within datasets could make the entire quest for causation obsolete. Many of these issues have been passionately debated during the past two decades, often with scant agreement. It is proffered here that hypothesis-driven, data-mining–inspired, and “allochthonous” knowledge acquisition, based on mathematical and computational models, are vectors spanning a 3D space of an expanded scientific method. The combination of methods within this space will most certainly shape our thinking about nature, with implications for experimental design, peer review and funding, sharing of results, education, medical diagnostics, and even questions of litigation.

The traditional scientific method: Hypothesis-driven deduction

Research is the undisputed core activity defining science. Without research, the advancement of scientific knowledge would come to a screeching halt. While it is evident that researchers look for new information or insights, the term “research” is somewhat puzzling. Never mind the prefix “re,” which simply means “coming back and doing it again and again,” the word “search” seems to suggest that the research process is somewhat haphazard, that not much of a strategy is involved in the process. One might argue that research a few hundred years ago had the character of hoping for enough luck to find something new. The alchemists come to mind in their quest to turn mercury or lead into gold, or to discover an elixir for eternal youth, through methods we nowadays consider laughable.

Today’s sciences, in stark contrast, are clearly different. Yes, we still try to find something new—and may need a good dose of luck—but the process is anything but unstructured. In fact, it is prescribed in such rigor that it has been given the widely known moniker “scientific method.” This scientific method has deep roots going back to Aristotle and Herophilus (approximately 300 BC), Avicenna and Alhazen (approximately 1,000 AD), Grosseteste and Robert Bacon (approximately 1,250 AD), and many others, but solidified and crystallized into the gold standard of quality research during the 17th and 18th centuries [ 1 – 7 ]. In particular, Sir Francis Bacon (1561–1626) and René Descartes (1596–1650) are often considered the founders of the scientific method, because they insisted on careful, systematic observations of high quality, rather than metaphysical speculations that were en vogue among the scholars of the time [ 1 , 8 ]. In contrast to their peers, they strove for objectivity and insisted that observations, rather than an investigator’s preconceived ideas or superstitions, should be the basis for formulating a research idea [ 7 , 9 ].

Bacon and his 19th century follower John Stuart Mill explicitly proposed gaining knowledge through inductive reasoning: Based on carefully recorded observations, or from data obtained in a well-planned experiment, generalized assertions were to be made about similar yet (so far) unobserved phenomena [ 7 ]. Expressed differently, inductive reasoning attempts to derive general principles or laws directly from empirical evidence [ 10 ]. An example is the 19th century epigram of the physician Rudolf Virchow, Omnis cellula e cellula . There is no proof that indeed “every cell derives from a cell,” but like Virchow, we have made the observation time and again and never encountered anything suggesting otherwise.

In contrast to induction, the widely accepted, traditional scientific method is based on formulating and testing hypotheses. From the results of these tests, a deduction is made whether the hypothesis is presumably true or false. This type of hypothetico-deductive reasoning goes back to William Whewell, William Stanley Jevons, and Charles Peirce in the 19th century [ 1 ]. By the 20th century, the deductive, hypothesis-based scientific method had become deeply ingrained in the scientific psyche, and it is now introduced as early as middle school in order to teach students valid means of discovery [ 8 , 11 , 12 ]. The scientific method has not only guided most research studies but also fundamentally influenced how we think about the process of scientific discovery.

Alas, because biology has almost no general laws, deduction in the strictest sense is difficult. It may therefore be preferable to use the term abduction, which refers to the logical inference toward the most plausible explanation, given a set of observations, although this explanation cannot be proven and is not necessarily true.

Over the decades, the hypothesis-based scientific method did experience variations here and there, but its conceptual scaffold remained essentially unchanged ( Fig 1 ). Its key is a process that begins with the formulation of a hypothesis that is to be rigorously tested, either in the wet lab or computationally; nonadherence to this principle is seen as lacking rigor and can lead to irreproducible results [ 1 , 13 – 15 ].

Fig 1. The central concept of the traditional scientific method is a falsifiable hypothesis regarding some phenomenon of interest. This hypothesis is to be tested experimentally or computationally. The test results support or refute the hypothesis, triggering a new round of hypothesis formulation and testing.

Going further, the prominent philosopher of science Sir Karl Popper argued that a scientific hypothesis can never be verified but that it can be disproved by a single counterexample. He therefore demanded that scientific hypotheses had to be falsifiable, because otherwise, testing would be moot [ 16 , 17 ] (see also [ 18 ]). As Gillies put it, “successful theories are those that survive elimination through falsification” [ 19 ]. Kelley and Scott agreed to some degree but warned that complete insistence on falsifiability is too restrictive as it would mark many computational techniques, statistical hypothesis testing, and even Darwin’s theory of evolution as nonscientific [ 20 ].

While the hypothesis-based scientific method has been very successful, its exclusive reliance on deductive reasoning is dangerous because according to the so-called Duhem–Quine thesis, hypothesis testing always involves an unknown number of explicit or implicit assumptions, some of which may steer the researcher away from hypotheses that seem implausible, although they are, in fact, true [ 21 ]. According to Kuhn, this bias can obstruct the recognition of paradigm shifts [ 22 ], which require the rethinking of previously accepted “truths” and the development of radically new ideas [ 23 , 24 ]. The testing of simultaneous alternative hypotheses [ 25 – 27 ] ameliorates this problem to some degree but not entirely.

The traditional scientific method is often presented in discrete steps, but it should really be seen as a form of critical thinking, subject to review and independent validation [ 8 ]. It has proven very influential, not only by prescribing valid experimentation, but also for affecting the way we attempt to understand nature [ 18 ], for teaching [ 8 , 12 ], reporting, publishing, and otherwise sharing information [ 28 ], for peer review and the awarding of funds by research-supporting agencies [ 29 , 30 ], for medical diagnostics [ 7 ], and even in litigation [ 31 ].

A second dimension of the scientific method: Data-mining–inspired induction

A major shift in biological experimentation occurred with the -omics revolution of the early 21st century. All of a sudden, it became feasible to perform high-throughput experiments that generated thousands of measurements, typically characterizing the expression or abundances of very many—if not all—genes, proteins, metabolites, or other biological quantities in a sample.

The strategy of measuring large numbers of items in a nontargeted fashion is fundamentally different from the traditional scientific method and constitutes a new, second dimension of the scientific method. Instead of hypothesizing and testing whether gene X is up-regulated under some altered condition, the leading question becomes which of the thousands of genes in a sample are up- or down-regulated. This shift in focus elevates the data to the supreme role of revealing novel insights by themselves ( Fig 2 ). As an important, generic advantage over the traditional strategy, this second dimension is free of a researcher’s preconceived notions regarding the molecular mechanisms governing the phenomenon of interest, which are otherwise the key to formulating a hypothesis. The prominent biologists Patrick Brown and David Botstein commented that “the patterns of expression will often suffice to begin de novo discovery of potential gene functions” [ 32 ].
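The change in question, from testing a hypothesis about one gene to ranking all of them, can be sketched in a few lines. Everything here is illustrative: the data are simulated, and the number of planted genes and the effect size are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_ctrl, n_case = 5000, 6, 6

# Hypothetical expression matrix: most genes are unchanged, but the
# first 50 are up-regulated in the treated samples (planted effect).
ctrl = rng.normal(10.0, 1.0, size=(n_genes, n_ctrl))
case = rng.normal(10.0, 1.0, size=(n_genes, n_case))
case[:50] += 3.0

# Welch t-statistic per gene, computed over all genes at once --
# no single-gene hypothesis is formulated in advance.
m1, m2 = ctrl.mean(axis=1), case.mean(axis=1)
v1, v2 = ctrl.var(axis=1, ddof=1), case.var(axis=1, ddof=1)
t = (m2 - m1) / np.sqrt(v1 / n_ctrl + v2 / n_case)

# Rank genes by evidence of differential expression.
top = np.argsort(-np.abs(t))[:50]
print("fraction of planted genes recovered:", np.mean(top < 50))
```

The ranking itself then suggests hypotheses, inverting the traditional order in which a hypothesis precedes the measurement.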

Fig 2. Data-driven research begins with an untargeted exploration, in which the data speak for themselves. Machine learning extracts patterns from the data, which suggest hypotheses that are to be tested in the lab or computationally.

This data-driven, discovery-generating approach is at once appealing and challenging. On the one hand, very many data are explored simultaneously and essentially without bias. On the other hand, the large datasets supporting this approach create a genuine challenge to understanding and interpreting the experimental results because the thousands of data points, often superimposed with a fair amount of noise, make it difficult to detect meaningful differences between sample and control. This situation can only be addressed with computational methods that first “clean” the data, for instance, through the statistically valid removal of outliers, and then use machine learning to identify statistically significant, distinguishing molecular profiles or signatures. In favorable cases, such signatures point to specific biological pathways, whereas other signatures defy direct explanation but may become the launch pad for follow-up investigations [ 33 ].
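The clean-then-mine pipeline described above might look as follows in outline. The dataset, the robust z-score cutoff of 4, and the one-standard-deviation signature threshold are all illustrative assumptions, not prescriptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dataset: 40 samples x 200 features in two classes whose
# first 10 features differ; one measurement is deliberately corrupted.
X = rng.normal(0.0, 1.0, size=(40, 200))
y = np.repeat([0, 1], 20)
X[y == 1, :10] += 2.0
X[3, 7] = 80.0  # simulated measurement artifact

# Step 1: "clean" the data -- replace values beyond 4 robust z-scores
# (median absolute deviation scale) with the feature median.
med = np.median(X, axis=0)
mad = np.median(np.abs(X - med), axis=0) * 1.4826
z = (X - med) / mad
X_clean = np.where(np.abs(z) > 4, med, X)

# Step 2: extract a distinguishing signature -- features whose class
# means differ by more than one standard deviation.
diff = X_clean[y == 1].mean(axis=0) - X_clean[y == 0].mean(axis=0)
signature = np.flatnonzero(np.abs(diff) > 1.0)
print("signature features:", signature)
```

In favorable cases such a signature maps onto a known pathway; otherwise it becomes the launch pad for follow-up work, exactly as described in the text.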

Today’s scientists are very familiar with this discovery-driven exploration of “what’s out there” and might consider it a quaint quirk of history that this strategy was at first widely chastised and ridiculed as a “fishing expedition” [ 30 , 34 ]. Strict traditionalists were outraged that rigor was leaving science with the new approach and that sufficient guidelines were unavailable to assure the validity and reproducibility of results [ 10 , 35 , 36 ].

From the viewpoint of the philosophy of science, this second dimension of the scientific method uses inductive reasoning and reflects Bacon’s idea that observations can and should dictate the research question to be investigated [ 1 , 7 ]. Allen [ 36 ] forcefully rejected this type of reasoning, stating “the thinking goes, we can now expect computer programs to derive significance, relevance and meaning from chunks of information, be they nucleotide sequences or gene expression profiles… In contrast with this view, many are convinced that no purely logical process can turn observation into understanding.” His conviction goes back to the 18th century philosopher David Hume and again to Popper, who identified the overriding problem with inductive reasoning: it can never truly reveal causality, even if a phenomenon is observed time and again [ 16 , 17 , 37 , 38 ]. No number of observations, even if they always have the same result, can guard against an exception that would violate the generality of a law inferred from these observations [ 1 , 35 ]. Worse, Popper argued, through inference by induction, we cannot even know the probability of something being true [ 10 , 17 , 36 ].

Others argued that data-driven and hypothesis-driven research actually do not differ all that much in principle, as long as there is cycling between developing new ideas and testing them with care [ 27 ]. In fact, Kell and Oliver [ 34 ] maintained that the exclusive acceptance of hypothesis-driven programs misrepresents the complexities of biological knowledge generation. Similarly refuting the prominent rule of deduction, Platt [ 26 ] and Beard and Kushmerick [ 27 ] argued that repeated inductive reasoning, called strong inference, corresponds to a logically sound decision tree of disproving or refining hypotheses that can rapidly yield firm conclusions; nonetheless, Platt had to admit that inductive inference is not as certain as deduction, because it projects into the unknown. Lander compared the task of obtaining causality by induction to the problem of inferring the design of a microprocessor from input-output readings, which in a strict sense is impossible, because the microprocessor could be arbitrarily complicated; even so, inference often leads to novel insights and therefore is valuable [ 39 ].

An interesting special case of almost pure inductive reasoning is epidemiology, where hypothesis-driven reasoning is rare and instead, the fundamental question is whether data-based evidence is sufficient to associate health risks with specific causes [ 31 , 34 ].

Recent advances in machine learning and “big-data” mining have driven the use of inductive reasoning to unprecedented heights. As an example, machine learning can greatly assist in the discovery of patterns, for instance, in biological sequences [ 40 ]. Going a step further, a pithy article by Andersen [ 41 ] proffered that we may not need to look for causality or mechanistic explanations anymore if we just have enough correlation: “With enough data, the numbers speak for themselves, correlation replaces causation, and science can advance even without coherent models or unified theories.”

Of course, the proposal to abandon the quest for causality caused pushback on philosophical as well as mathematical grounds. Allen [ 10 , 35 ] considered the idea “absurd” that data analysis could enhance understanding in the absence of a hypothesis. He felt confident “that even the formidable combination of computing power with ease of access to data cannot produce a qualitative shift in the way that we do science: the making of hypotheses remains an indispensable component in the growth of knowledge” [ 36 ]. Succi and Coveney [ 42 ] refuted the “most extravagant claims” of big-data proponents very differently, namely by analyzing the theories on which machine learning is founded. They contrasted the assumptions underlying these theories, such as the law of large numbers, with the mathematical reality of complex biological systems. Specifically, they carefully identified genuine features of these systems, such as nonlinearities, nonlocality of effects, fractal aspects, and high dimensionality, and argued that they fundamentally violate some of the statistical assumptions implicitly underlying big-data analysis, like independence of events. They concluded that these discrepancies “may lead to false expectations and, at their nadir, even to dangerous social, economical and political manipulation.” To ameliorate the situation, the field of big-data analysis would need new strong theorems characterizing the validity of its methods and the numbers of data required for obtaining reliable insights. Succi and Coveney go as far as stating that too many data are just as bad as insufficient data [ 42 ].
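The statistical side of this caution can be made concrete with a toy calculation: when the number of features vastly exceeds the number of samples, impressive correlations arise from pure noise. The sample sizes and cutoffs below are arbitrary, chosen only to make the effect visible.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical scenario: 20 samples, 10,000 mutually unrelated features.
n, p = 20, 10_000
X = rng.normal(size=(n, p))
outcome = rng.normal(size=n)  # pure noise, no real signal anywhere

# Pearson correlation of every feature with the outcome, via
# standardized variables.
Xc = (X - X.mean(axis=0)) / X.std(axis=0)
yc = (outcome - outcome.mean()) / outcome.std()
r = Xc.T @ yc / n

# With enough features, strong "correlations" appear by chance alone.
print("strongest absolute correlation:", np.abs(r).max())
print("features with |r| > 0.6:", int(np.sum(np.abs(r) > 0.6)))
```

Dozens of features correlate "strongly" with an outcome that is, by construction, unrelated to all of them, which is one way of seeing why correlation alone cannot replace causal reasoning.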

While philosophical doubts regarding inductive methods will always persist, one cannot deny that -omics-based, high-throughput studies, combined with machine learning and big-data analysis, have been very successful [ 43 ]. Yes, induction cannot truly reveal general laws, no matter how large the datasets, but it does provide insights that are very different from what science had offered before and may at least suggest novel patterns, trends, or principles. As a case in point, if many transcriptomic studies indicate that a particular gene set is involved in certain classes of phenomena, there is probably some truth to the observation, even though it is not mathematically provable. Kepler’s laws of astronomy were arguably derived solely from inductive reasoning [ 34 ].

Notwithstanding the opposing views on inductive methods, successful strategies shape how we think about science. Thus, to take advantage of all experimental options while ensuring quality of research, we must not allow that “anything goes” but instead identify and characterize standard operating procedures and controls that render this emerging scientific method valid and reproducible. A laudable step in this direction was the wide acceptance of “minimum information about a microarray experiment” (MIAME) standards for microarray experiments [ 44 ].

A third dimension of the scientific method: Allochthonous reasoning

Parallel to the blossoming of molecular biology and the rapid rise in the power and availability of computing in the late 20th century, the use of mathematical and computational models became increasingly recognized as relevant and beneficial for understanding biological phenomena. Indeed, mathematical models eventually achieved cornerstone status in the new field of computational systems biology.

Mathematical modeling has been used as a tool of biological analysis for a long time [ 27 , 45 – 48 ]. Interesting for the discussion here is that the use of mathematical and computational modeling in biology follows a scientific approach that is distinctly different from the traditional and the data-driven methods, because it is distributed over two entirely separate domains of knowledge. One consists of the biological reality of DNA, elephants, and roses, whereas the other is the world of mathematics, which is governed by numbers, symbols, theorems, and abstract work protocols. Because the ways of thinking—and even the languages—are different in these two realms, I suggest calling this type of knowledge acquisition “allochthonous” (literally Greek: in or from a “piece of land different from where one is at home”; one could perhaps translate it into modern lingo as “outside one’s comfort zone”). De facto, most allochthonous reasoning in biology presently refers to mathematics and computing, but one might also consider, for instance, the application of methods from linguistics in the analysis of DNA sequences or proteins [ 49 ].

One could argue that biologists have employed “models” for a long time, for instance, in the form of “model organisms,” cell lines, or in vitro experiments, which more or less faithfully reflect features of the organisms of true interest but are easier to manipulate. However, this type of biological model use is rather different from allochthonous reasoning, as it does not leave the realm of biology and uses the same language and often similar methodologies.

A brief discussion of three experiences from our lab may illustrate the benefits of allochthonous reasoning. (1) In a case study of renal cell carcinoma, a dynamic model was able to explain an observed yet nonintuitive metabolic profile in terms of the enzymatic reaction steps that had been altered during the disease [ 50 ]. (2) A transcriptome analysis had identified several genes as displaying significantly different expression patterns during malaria infection in comparison to the state of health. Considered by themselves and focusing solely on genes coding for specific enzymes of purine metabolism, the findings showed patterns that did not make sense. However, integrating the changes in a dynamic model revealed that purine metabolism globally shifted, in response to malaria, from guanine compounds to adenine, inosine, and hypoxanthine [ 51 ]. (3) Data capturing the dynamics of malaria parasites suggested growth rates that were biologically impossible. Speculation regarding possible explanations led to the hypothesis that many parasite-harboring red blood cells might “hide” from circulation and therewith from detection in the blood stream. While experimental testing of the feasibility of the hypothesis would have been expensive, a dynamic model confirmed that such a concealment mechanism could indeed quantitatively explain the apparently very high growth rates [ 52 ]. In all three cases, the insights gained inductively from computational modeling would have been difficult to obtain purely with experimental laboratory methods.

Purely deductive allochthonous reasoning is the ultimate goal of the search for design and operating principles [ 53 – 55 ], which strives to explain why certain structures or functions are employed by nature time and again. An example is a linear metabolic pathway, in which feedback inhibition is essentially always exerted on the first step [ 56 , 57 ]. This generality allows the deduction that a so far unstudied linear pathway is most likely (or even certain to be) inhibited at the first step. Not strictly deductive—but rather abductive—was a study in our lab in which we analyzed time series data with a mathematical model that allowed us to infer the most likely regulatory structure of a metabolic pathway [ 58 , 59 ].
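The design principle of inhibiting the first step can be illustrated with a toy dynamic model. All rate constants and the Hill-type inhibition term are hypothetical, and forward Euler integration is used only for brevity; the point is the qualitative comparison, not the numbers.

```python
import numpy as np

# Minimal sketch: a linear pathway  --> X1 --> X2 --> X3 -->
# in which the end product X3 inhibits either the first or the last step.
def simulate(feedback_on="first", t_end=100.0, dt=0.005):
    x = np.array([0.1, 0.1, 0.1])
    inhibition = lambda x3: 1.0 / (1.0 + (x3 / 0.5) ** 4)
    for _ in range(int(t_end / dt)):
        x1, x2, x3 = x
        v1 = 1.0 * (inhibition(x3) if feedback_on == "first" else 1.0)
        v2 = 2.0 * x1
        v3 = 2.0 * x2 * (inhibition(x3) if feedback_on == "last" else 1.0)
        v4 = 1.0 * x3
        x = x + dt * np.array([v1 - v2, v2 - v3, v3 - v4])
    return x

# Inhibiting the first step keeps all intermediates low; inhibiting the
# last step lets the intermediates X1 and X2 pile up wastefully.
print("feedback on first step:", simulate("first"))
print("feedback on last step: ", simulate("last"))
```

The simulation reproduces the rationale behind the principle: regulating the entry point throttles the whole pathway, whereas regulating a downstream step causes upstream metabolites to accumulate.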

A typical allochthonous investigation begins in the realm of biology with the formulation of a hypothesis ( Fig 3 ). Instead of testing this hypothesis with laboratory experiments, the system encompassing the hypothesis is moved into the realm of mathematics. This move requires two sets of ingredients. One set consists of the simplification and abstraction of the biological system: Any distracting details that seem unrelated to the hypothesis and its context are omitted or represented collectively with other details. This simplification step carries the greatest risk of the entire modeling approach, as omission of seemingly negligible but, in truth, important details can easily lead to wrong results. The second set of ingredients consists of correspondence rules that translate every biological component or process into the language of mathematics [ 60 , 61 ].

Fig 3. This mathematical and computational approach is distributed over two realms, which are connected by correspondence rules.

Once the system is translated, it has become an entirely mathematical construct that can be analyzed purely with mathematical and computational means. The results of this analysis are also strictly mathematical. They typically consist of values of variables, magnitudes of processes, sensitivity patterns, signs of eigenvalues, or qualitative features like the onset of oscillations or the potential for limit cycles. Correspondence rules are used again to move these results back into the realm of biology. As an example, the mathematical result that “two eigenvalues have positive real parts” does not make much sense to many biologists, whereas the interpretation that “the system is not stable at the steady state in question” is readily explained. New biological insights may lead to new hypotheses, which are tested either by experiments or by returning once more to the realm of mathematics. The model design, diagnosis, refinements, and validation consist of several phases, which have been discussed widely in the biomathematical literature. Importantly, each iteration of a typical modeling analysis consists of a move from the biological to the mathematical realm and back.
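The round trip through the mathematical realm can be miniaturized in a few lines. The Jacobian matrix below is a made-up numerical example, not a model from the text; it merely shows the eigenvalue computation and the translation of the result back into biological language.

```python
import numpy as np

# Hypothetical Jacobian of a model, evaluated at a steady state:
# this matrix now lives entirely in the realm of mathematics.
J = np.array([[ 0.5, -2.0,  0.0],
              [ 2.0,  0.5,  0.0],
              [ 0.0,  0.0, -1.0]])

eigenvalues = np.linalg.eigvals(J)
print("eigenvalues:", eigenvalues)

# Correspondence rule: translate the mathematical result back into
# the realm of biology.
if np.all(eigenvalues.real < 0):
    print("the system is stable at the steady state in question")
else:
    print("the system is not stable at the steady state in question")
```

Here two eigenvalues have positive real parts, so the mathematical result "not all real parts are negative" becomes the biological statement "the system is not stable at the steady state in question."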

The reasoning within the realm of mathematics is often deductive, in the form of an Aristotelian syllogism, such as the well-known “All men are mortal; Socrates is a man; therefore, Socrates is mortal.” However, the reasoning may also be inductive, as is the case with large-scale Monte-Carlo simulations that generate arbitrarily many “observations,” although they cannot reveal universal principles or theorems. An example is a simulation randomly drawing numbers in an attempt to show that every real number has an inverse. The simulation will always attest to this hypothesis but fail to discover the truth because it will never randomly draw 0. Generically, computational models may be considered sets of hypotheses, formulated as equations or as algorithms that reflect our perception of a complex system [ 27 ].
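The Monte-Carlo example from the paragraph above can be written out directly: a simulation that inductively "verifies" the false claim that every real number has a multiplicative inverse.

```python
import random

random.seed(0)

# Inductive "verification" of a false hypothesis: every real number has
# a multiplicative inverse. Random draws attest to the claim every time,
# because the simulation essentially never draws exactly 0 -- the one
# counterexample that would falsify it.
counterexamples = 0
for _ in range(1_000_000):
    x = random.uniform(-1.0, 1.0)
    if x == 0.0:
        counterexamples += 1   # the single falsifying value
    else:
        _ = 1.0 / x            # the inverse exists for this draw

print("counterexamples found in 1,000,000 trials:", counterexamples)
```

A million supporting observations and zero counterexamples, yet the hypothesis is false: exactly Popper's point that induction cannot establish a universal claim.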

Impact of the multidimensional scientific method on learning

Almost all we know in biology has come from observation, experimentation, and interpretation. The traditional scientific method not only offered clear guidance for this knowledge gathering, but it also fundamentally shaped the way we think about the exploration of nature. When presented with a new research question, scientists were trained to think immediately in terms of hypotheses and alternatives, pondering the best feasible ways of testing them, and designing in their minds strong controls that would limit the effects of known or unknown confounders. Shaped by the rigidity of this ever-repeating process, our thinking became trained to move forward one well-planned step at a time. This modus operandi was rigid and exact. It also minimized the erroneous pursuit of long speculative lines of thought, because every step required testing before a new hypothesis was formed. While effective, the process was also very slow and driven by ingenuity—as well as bias—on the scientist’s part. This bias was sometimes a hindrance to necessary paradigm shifts [ 22 ].

High-throughput data generation, big-data analysis, and mathematical-computational modeling changed all that within a few decades. In particular, the acceptance of inductive principles and of the allochthonous use of nonbiological strategies to answer biological questions created an unprecedented mix of successes and chaos. To the horror of traditionalists, the importance of hypotheses became minimized, and the suggestion spread that the data would speak for themselves [ 36 ]. Importantly, within this fog of “anything goes,” the fundamental question arose of how to determine whether an experiment was valid.

Because agreed-upon operating procedures affect research progress and interpretation, thinking, teaching, and sharing of results, this question requires a deconvolution of scientific strategies. Here I proffer that the single scientific method of the past should be expanded toward a vector space of scientific methods, with spanning vectors that correspond to different dimensions of the scientific method ( Fig 4 ).

Fig 4. The traditional hypothesis-based deductive scientific method is expanded into a 3D space that allows for synergistic blends of methods that include data-mining–inspired, inductive knowledge acquisition, and mathematical model-based, allochthonous reasoning.

Obviously, all three dimensions have their advantages and drawbacks. The traditional, hypothesis-driven deductive method is philosophically “clean,” except that it is confounded by preconceptions and assumptions. The data-mining–inspired inductive method cannot offer universal truths but helps us explore very large spaces of factors that contribute to a phenomenon. Allochthonous, model-based reasoning can be performed mentally, with paper and pencil, through rigorous analysis, or with a host of computational methods that are precise and disprovable [ 27 ]. At the same time, these methods are incomparably faster, cheaper, and much more comprehensive than experiments in molecular biology. This reduction in cost and time, and the increase in coverage, may eventually have far-reaching consequences, as we can already fathom from much of modern physics.

Due to its long history, the traditional dimension of the scientific method is supported by clear and very strong standard operating procedures. Similarly, strong procedures need to be developed for the other two dimensions. The MIAME rules for microarray analysis provide an excellent example [ 44 ]. On the mathematical modeling front, no such rules are generally accepted yet, but trends toward them seem to be emerging on the horizon. For instance, it seems to be becoming common practice to include sensitivity analyses in typical modeling studies and to assess the identifiability or sloppiness of ensembles of parameter combinations that fit a given dataset well [ 62 , 63 ].
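A minimal version of the sensitivity analyses mentioned here, using a Michaelis-Menten rate law as a stand-in model; the parameter values and the 1% perturbation size are illustrative choices, not a standard.

```python
import numpy as np

# Stand-in model: Michaelis-Menten rate v = Vmax * S / (Km + S),
# evaluated at a fixed substrate concentration S.
def rate(vmax, km, s=2.0):
    return vmax * s / (km + s)

params = {"vmax": 1.0, "km": 0.5}
base = rate(**params)

for name, value in params.items():
    bumped = dict(params, **{name: value * 1.01})  # +1% perturbation
    # Relative (logarithmic) sensitivity: percent change in the output
    # per percent change in the parameter.
    sens = ((rate(**bumped) - base) / base) / 0.01
    print(f"sensitivity of v w.r.t. {name}: {sens:+.3f}")
```

The output matches the analytical values (+1 for Vmax, since the rate is proportional to it, and roughly -Km/(Km+S) for Km), which is a useful sanity check before applying the same finite-difference recipe to a model with no closed form.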

From a philosophical point of view, it seems unlikely that objections against inductive reasoning will disappear. However, instead of pitting hypothesis-based deductive reasoning against inductivism, it seems more beneficial to determine how the different methods can be synergistically blended ( cf . [ 18 , 27 , 34 , 42 ]) as linear combinations of the three vectors of knowledge acquisition ( Fig 4 ). It is at this point unclear to what degree the identified three dimensions are truly independent of each other, whether additional dimensions should be added [ 24 ], or whether the different versions could be amalgamated into a single scientific method [ 18 ], especially if it is loosely defined as a form of critical thinking [ 8 ]. Nobel Laureate Percy Bridgman even concluded that “science is what scientists do, and there are as many scientific methods as there are individual scientists” [ 8 , 64 ].

Combinations of the three spanning vectors of the scientific method have been emerging for some time. Many biologists already use inductive high-throughput methods to develop specific hypotheses that are subsequently tested with deductive or further inductive methods [34, 65]. In terms of including mathematical modeling, physics and geology have been leading the way for a long time, often by beginning an investigation in theory, before any actual experiment is performed. It will benefit biology to look into this strategy and to develop best practices of allochthonous reasoning.

The blending of methods may take quite different shapes. Early on, Ideker and colleagues [65] proposed an integrated experimental approach for pathway analysis that offered a glimpse of new experimental strategies within the space of scientific methods. In a similar vein, Covert and colleagues [66] included computational methods in such an integrated approach. Additional examples of blended analyses in systems biology can be seen in other works, such as [43, 67–73]. Generically, it is often beneficial to start with big data, determine patterns in associations and correlations, and then switch to the mathematical realm in order to filter out spurious correlations in a high-throughput fashion. If this procedure is executed in an iterative manner, the “surviving” associations have an increased level of confidence and are good candidates for further experimental or computational testing (personal communication from S. Chandrasekaran).
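The generic procedure described above (find associations in data, then filter out spurious correlations computationally) can be sketched in a few lines. The simulation below is purely illustrative: two variables x and y correlate only because a hidden confounder z drives both, and a first-order partial correlation removes the spurious link. All variable names and the data-generating assumptions are hypothetical.

```python
import random

random.seed(0)
n = 500
z = [random.gauss(0, 1) for _ in range(n)]      # hidden confounder
x = [zi + random.gauss(0, 0.5) for zi in z]     # driven by z
y = [zi + random.gauss(0, 0.5) for zi in z]     # also driven by z

def corr(a, b):
    """Pearson correlation coefficient."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    sa = sum((ai - ma) ** 2 for ai in a) ** 0.5
    sb = sum((bi - mb) ** 2 for bi in b) ** 0.5
    return cov / (sa * sb)

def partial_corr(a, b, c):
    """Correlation of a and b after removing the linear effect of c."""
    rab, rac, rbc = corr(a, b), corr(a, c), corr(b, c)
    return (rab - rac * rbc) / (((1 - rac**2) ** 0.5) * ((1 - rbc**2) ** 0.5))

print(round(corr(x, y), 2))             # strong raw association
print(round(partial_corr(x, y, z), 2))  # near zero once z is controlled for
```

In a real high-throughput setting, the same filtering idea is applied across thousands of variable pairs, and the associations that survive repeated rounds gain confidence.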

If each component of a blended scientific method follows strict, commonly agreed guidelines, “linear combinations” within the 3D space can also be checked objectively, via deconvolution. In addition, guidelines for synergistic blends of component procedures should be developed. If we carefully monitor such blends, time will presumably indicate which method is best for which task and how the different approaches optimally inform each other. For instance, it will be interesting to study whether there is an optimal sequence of experiments along the three axes for a particular class of tasks. Big-data analysis together with inductive reasoning might be optimal for creating initial hypotheses and possibly refuting wrong speculations (“we had thought this gene would be involved, but apparently it isn’t”). If the logic of an emerging hypothesis can be tested with mathematical and computational tools, this will almost certainly be faster and cheaper than an immediate launch into wet-lab experimentation. It is also likely that mathematical reasoning will be able to refute some apparently feasible hypotheses and suggest amendments. Ultimately, the “surviving” hypotheses must still be tested for validity through conventional experiments. Deconvolving current practices and optimizing the combination of methods within the 3D or higher-dimensional space of scientific methods will likely result in better planning of experiments and in synergistic blends of approaches that have the potential to address some of the grand challenges in biology.
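The idea of checking “linear combinations” within the 3D space by deconvolution can be caricatured numerically. Assuming, purely for illustration, that the three axes are encoded as orthonormal unit vectors, recovering the mixture weights of a blended study reduces to simple projections; the labels and weights below are invented.

```python
# Hypothetical encoding of the three axes as orthonormal unit vectors.
deductive     = (1.0, 0.0, 0.0)
inductive     = (0.0, 1.0, 0.0)
allochthonous = (0.0, 0.0, 1.0)
basis = (deductive, inductive, allochthonous)

def blend(weights):
    """Linear combination of the basis vectors with the given weights."""
    return tuple(sum(w * v[i] for w, v in zip(weights, basis)) for i in range(3))

study = blend((0.5, 0.3, 0.2))   # a study blending all three methods

# With an orthonormal basis, deconvolution is just projection (dot products).
recovered = tuple(sum(s * b for s, b in zip(study, axis)) for axis in basis)
print(recovered)  # (0.5, 0.3, 0.2)
```

With non-orthogonal or noisy "axes," the projection would be replaced by a least-squares fit, but the principle of recovering mixture weights is the same.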

Acknowledgments

The author is very grateful to Dr. Sriram Chandrasekaran and Ms. Carla Kumbale for superb suggestions and invaluable feedback.

Funding Statement

This work was supported in part by National Science Foundation ( https://www.nsf.gov/div/index.jsp?div=MCB ) grants NSF-MCB-1517588 (PI: EOV) and NSF-MCB-1615373 (PI: Diana Downs), and by National Institute of Environmental Health Sciences ( https://www.niehs.nih.gov/ ) grant NIH-2P30ES019776-05 (PI: Carmen Marsit). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.


Defining Critical Thinking


Everyone thinks; it is our nature to do so. But much of our thinking, left to itself, is biased, distorted, partial, uninformed or downright prejudiced. Yet the quality of our life and of what we produce, make, or build depends precisely on the quality of our thought. Shoddy thinking is costly, both in money and in quality of life. Excellence in thought, however, must be systematically cultivated.


Critical thinking is that mode of thinking - about any subject, content, or problem - in which the thinker improves the quality of his or her thinking by skillfully taking charge of the structures inherent in thinking and imposing intellectual standards upon them.



(Foundation for Critical Thinking Press, 2008)

(Teacher’s College, Columbia University, 1941)




flow chart of scientific method

scientific method

scientific method, mathematical and experimental technique employed in the sciences. More specifically, it is the technique used in the construction and testing of a scientific hypothesis.

The process of observing, asking questions, and seeking answers through tests and experiments is not unique to any one field of science. In fact, the scientific method is applied broadly in science, across many different fields. Many empirical sciences, especially the social sciences, use mathematical tools borrowed from probability theory and statistics, together with outgrowths of these, such as decision theory, game theory, utility theory, and operations research. Philosophers of science have addressed general methodological problems, such as the nature of scientific explanation and the justification of induction.


The scientific method is critical to the development of scientific theories, which explain empirical (experiential) laws in a scientifically rational manner. In a typical application of the scientific method, a researcher develops a hypothesis, tests it through various means, and then modifies the hypothesis on the basis of the outcome of the tests and experiments. The modified hypothesis is then retested, further modified, and tested again, until it becomes consistent with observed phenomena and testing outcomes. In this way, hypotheses serve as tools by which scientists gather data. From that data and the many different scientific investigations undertaken to explore hypotheses, scientists are able to develop broad general explanations, or scientific theories.
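The test-modify-retest cycle described here can be sketched as a toy loop: a “hypothesis” (a guessed proportionality constant) is tested against observations and revised on the basis of the outcome until it is consistent with them. The data and the crude update rule are invented for illustration; real hypothesis revision is, of course, conceptual rather than numerical.

```python
# Invented observations, roughly following y ≈ 2 * x.
observations = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8)]

def misfit(slope):
    """How badly the hypothesis y = slope * x disagrees with the observations."""
    return sum((y - slope * x) ** 2 for x, y in observations)

slope = 1.0                # initial hypothesis: y = 1.0 * x
for _ in range(100):       # test, then modify on the basis of the outcome
    step = 0.01
    if misfit(slope + step) < misfit(slope):
        slope += step      # revised hypothesis fits better; keep it
    elif misfit(slope - step) < misfit(slope):
        slope -= step
    else:
        break              # consistent with observations; stop revising

print(round(slope, 2))  # settles near the underlying slope of about 2
```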

See also Mill’s methods; hypothetico-deductive method.


Dany S. Adams, Department of Biology, Smith College, Northampton, MA 01063



Sidelight in Gilbert's (2000, Sinauer Associates); that is, I harp on correlation, necessity, and sufficiency, and the kinds of experiments required to gather each type of evidence. In my own class, an upper division Developmental Biology lecture class, I use these techniques, which include both verbal and written reinforcement, to encourage students to evaluate claims about cause and effect, that is, to distinguish between correlation and causation; however, I believe that with very slight modifications, these tricks can be applied in a much greater array of situations.







I am impressed over and over again by the improvement in my students' ability to UNDERSTAND the primary literature, to ASSESS the validity of claims, and to THINK critically about how to answer questions.



for one of my other classes and am reading this book on the microbes. I came across this paragraph, part of which I have to share with you!! It talks about how... 'the intimin of was shown to be NECESSARY BUT NOT SUFFICIENT to induce lesions.' I just thought it was so cool that I am reading this highly scientific book and can make sense of concepts that would have been so foreign to me not all that long ago!!"



warning the students that they will be asked to think about the experimental basis of knowledge. I read this out loud during the first class. Difference: it takes an extra two minutes.

Every time a technique is mentioned in class, we pull out the toolbox and write notes about the technique in the appropriate box. Difference: by the end of the semester, the students have been introduced to, and thought about how to use, an impressive number of techniques, and they UNDERSTAND the power and the limitations of those techniques. On a very practical level, they end up with a list of techniques and controls they can consult in the future.

Difference: students actually UNDERSTAND controls.

, always worth 50%, that asks the students to make a hypothesis about an unfamiliar observation then design experiments to test the hypothesis:

(Three exam worksheets follow, each with rows for Ion, DNA, RNA, Protein, Cell, and Tissue; the column headers are missing in this copy, and only the filled-in row of each worksheet is shown.)

Worksheet 1 — Protein: Immunocytochemistry | Western blot w/ pure protein | Stain known positive cells | Pre-immune serum; 2nd Ab only

Worksheet 2 — Tissue: Remove tissue | Stain for marker; histology | Remove then return

Worksheet 3 — DNA: Transfect gene (w/ inducible promoter & reporter) | Look for reporter; northern &/or western | Transfect with neutral DNA
Posted on the SDB Web Site Monday, July 26, 1999, Modified Wednesday, December 27, 2000


An empirical analysis of the relationship between nature of science and critical thinking through science definitions and thinking skills

  • Original Paper
  • Open access
  • Published: 08 December 2022
  • Volume 2, article number 270 (2022)


  • María Antonia Manassero-Mas (ORCID: orcid.org/0000-0002-7804-7779)
  • Ángel Vázquez-Alonso (ORCID: orcid.org/0000-0001-5830-7062)


Critical thinking (CRT) skills transversally pervade education, and nature of science (NOS) knowledge is a key component of science literacy. Some science education researchers advocate that CRT skills and NOS knowledge have a mutual impact and relationship. However, few research studies have undertaken the empirical confirmation of this relationship, and most fail to match the two terms of the relationship adequately. This paper aims to test the relationship by applying correlation, regression and ANOVA procedures to students’ answers to two tests that measure thinking skills and science definitions. The results partly confirm the hypothesised relationship, which displays some complex features: on the one hand, the relationship is positive and significant for the NOS variables that express adequate ideas about science; on the other hand, it is non-significant when the NOS variables depict misinformed ideas about science. Furthermore, the comparison of the two student cohorts reveals that two years of science instruction do not seem to contribute to advancing students’ NOS conceptions. Finally, some interpretations and consequences of these results are discussed for scientific literacy, for teaching NOS (paying attention to both informed and misinformed ideas), for connecting NOS with general epistemic knowledge, and for assessing CRT skills.
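As a concrete illustration of the correlation procedure mentioned in the abstract, the sketch below computes a Pearson correlation between scores on a thinking-skills test and a science-definitions test. The scores are fabricated placeholders, not the study's data, and the positive coefficient merely mimics the reported direction of the relationship.

```python
# Fabricated placeholder scores for two tests (not the study's data).
skills = [12, 15, 9, 20, 17, 11, 14, 18]   # thinking-skills test
nos    = [30, 34, 25, 41, 38, 27, 31, 39]  # science-definitions test

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

r = pearson(skills, nos)
print(round(r, 2))  # a positive r is what the hypothesised relationship predicts
```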


Introduction

Among other objectives, school science education perennially aims to improve scientific literacy for all, which involves making science useful and functional for sound personal and social decisions in daily life. An essential component of scientific literacy is knowledge “about” science, that is, knowledge about how science works, validates its knowledge, and intervenes in the world (along with technology). This study focuses on the knowledge about science, which is often referred to in the literature as nature of science (NOS), scientific practice, ideas about science, etc., in turn related to a continuous innovative teaching tradition (Vesterinen et al., 2014; Khishfe, 2012; Lederman, 2007; Matthews, 2012; McComas, 1996; Olson, 2018; among others).

On the other hand, some international reports and experts state that critical thinking (CRT) skills are key and transversal competencies for all educational levels, subjects and jobs in the 21st century. For instance, the European Union (2014) proposes seven key competencies that require developing a set of transversal skills, namely CRT, creativity, initiative, problem-solving, risk assessment, decision-making, communication and constructive management of emotions. In the same vein, the National Research Council (2012) proposes transferable knowledge and skills for life and work, which explicitly include argumentation, problem-solving, decision-making, analysis, interpretation, creativity, and others. In short, these and many other proposals converge in pointing out that teaching students to think and educating them in CRT skills is an innovative and significant challenge for 21st-century education and, of course, for science education. The CRT construct has been widely developed within psychological research. Yet the field is complex and terminologically bewildering (i.e., higher-order skills, cognitive skills, thinking skills, CRT, and other terms are used interchangeably), and some controversies are still unresolved. For instance, scholars do not agree on a common definition of CRT, and the most appropriate set of skills and dispositions to depict CRT is also disputed. As these differences among scholars persist, the term CRT will be adopted hereafter to describe, in general, the variety of higher-order thinking skills that are usually associated with CRT in the literature.

Further, some science education research currently suggests connections between NOS and CRT, arguing that CRT skills and NOS knowledge are related. Some claim that thinking skills are key to learning NOS (Erduran & Kaya, 2018; Ford & Yore, 2014; García-Mila & Andersen, 2008; Simonneaux, 2014), and specifically that argumentation skills may enhance NOS understanding (Khishfe et al., 2017). Conversely, as argumentation skills are a key competence for the construction and validation of scientific knowledge, other studies claim that NOS knowledge (i.e., understanding the differences between data and claims) is also key to learning CRT skills such as argumentation (Allchin & Zemplén, 2020; Greene et al., 2016; Settlage & Southerland, 2020). Both directions of this intuitive relationship between CRT skills and NOS are fruitful ways to enhance scientific literacy and general learning. Hence, this study aims to empirically explore the NOS-CRT relationship, as the prior literature is somewhat muddled and its contributions are limited, as will be shown below.

Theoretical contextualization

This study engages two different, vast and rich realms of research, namely NOS and CRT, and their theoretical frameworks: the interdisciplinary context of philosophy, sociology, and history of science and science education for NOS; and psychology and general education for CRT skills. Both frameworks are summarized below to meet the journal's space limitations.

Under the NOS label, science education has developed a fertile and vast realm of “knowledge about scientific knowledge and knowing”, which is obviously a particular case of human thinking, and probably the most developed to date. NOS represents the meta-cognitive, multifaceted and dynamic knowledge about what science is and how science works as a social way of knowing and explaining the natural world (knowledge construction and validation). This knowledge has been elaborated interdisciplinarily from history, philosophy, sociology of science and technology, and other disciplines. Scholars have raised many and varied NOS issues (Matthews, 2012), which are relevant to scientific research and go well beyond the reduced consensus view (Lederman, 2007). Despite NOS complexity, it has been systematized across two broad dimensions: epistemological and social (Erduran & Dagher, 2014; Manassero-Mas & Vázquez-Alonso, 2019). The epistemological dimension refers to the principles and values underlying knowledge construction and validation, which are often described as the scientific method, empirical basis, observation, data and inference, tentativeness, theory and law, creativity, subjectivity, demarcation, and many others. The social dimension refers to the social construction of scientific knowledge and its social impact. It often deals with the scientific community and institutions, social influences, and general science-technology-society interactions (peer evaluation, communication, gender, innovation, development, funding, technology, psychology, etc.).

From its beginnings, NOS research has agreed that students (and teachers) hold inadequate and misinformed beliefs on NOS issues across different educational levels and contexts. Further, researchers agree that effective NOS teaching requires explicit and reflective methods to overcome the many learning barriers (Bennássar et al., 2010; García et al., 2011; Cofré et al., 2019; Deng et al., 2011). These barriers relate to the basic processes of gathering (observation) and elaborating (analysis) data and decision-making in science, and specifically to the inability to differentiate facts from explanations and to adequately coordinate evidence, justifications, arguments and conclusions; the lack of elementary meta-cognitive and self-regulation skills (i.e., the quick jump to conclusions as self-evident); and the introduction of personal opinions, inferences, and reinterpretations, together with the dismissal of counter-arguments or evidence that may contradict personal ideas (García-Mila & Andersen, 2008; McDonald & McRobbie, 2012).

As these barriers point directly to the general abilities involved in thinking (observation, analysis, answering questions, solving problems, decision-making and the like), researchers attribute those difficulties to the lack of the cognitive skills involved in the adequate management of the barriers, whose higher-order cognitive nature corresponds to many CRT skills (Kolstø, 2001; Zeidler et al., 2002). Thus, the solutions to overcome the barriers imply mastering the CRT skills, and, consequently, achieving successful NOS learning (Ford & Yore, 2014; McDonald & McRobbie, 2012; Simonneaux, 2014). Erduran and Kaya (2018) argue that the perennial aim of developing students’ and teachers’ NOS epistemic insights still remains a challenge for science education, despite decades of NOS research, due to the many aspects involved. They conclude that NOS knowledge critically demands higher-order cognitive skills. The paragraphs below elaborate on these higher-order cognitive skills or CRT skills.

Critical thinking

As previously stated, the CRT field shows many differences in scholarly knowledge on the conceptualization and composition of CRT. Ennis’ (1996) simple definition of CRT as reasonable reflective thinking focused on deciding what to believe or do is likely the most celebrated definition among many others. A Delphi panel of experts defined CRT as an intentional and self-regulated judgment, which results in interpretation, analysis, evaluation and inference, as well as the explanation of the evidentiary, conceptual, methodological, criterial or contextual considerations on which that judgment is based (American Psychological Association, 1990).

However, the varied set of skills associated with CRT is controversial (Fisher, 2009). For instance, Ennis (2019) developed an extensive conception of CRT through a broad set of dispositions and abilities. Similarly, Madison (2004) proposed an extensive and comprehensive list of skills (Table 1).

The development of CRT tests has contributed to clarifying the relevance of the many CRT skills, as a test’s functionality requires concentrating on a few skills. For instance, Halpern’s (2010) questionnaire assesses, through everyday situations, problem-solving, verbal reasoning, probability and uncertainty, hypothesis-testing, argument analysis and decision-making. Watson and Glaser’s (2002) instrument assesses deduction, recognition of assumptions, interpretation, inference, and evaluation of arguments. The California Critical Thinking Skills Test assesses analysis, evaluation, inference, deduction and induction (Facione et al., 1998). It is also worth mentioning that most CRT tests target adults, although the Cornell Critical Thinking Tests (Ennis & Millman, 2005) were developed for a variety of young people and address several CRT skills (X test: induction, deduction, credibility, and identification of assumptions; Class Test: classical logical reasoning from premises to conclusion; etc.). The large number of CRT skills led scholars to perform efforts of synthesis and refinement that are summarized through some exemplary proposals (Table 1).

The CRT psychological framework presented above places the complex set of skills within the high-level cognitive constructs whose practice involves a self-directed, self-disciplined, self-supervised, and self-corrective way of thinking that presupposes conscious mastery of skills and conformity with rigorous quality standards. In addition to skills, CRT also involves effective communication and attitudinal commitment to intellectual standards to overcome the natural tendencies to fallacy and bias (self-centeredness and socio-centrism).

Science education and thinking skills

CRT skills mirror the scientific reasoning skills of scientific practice, and vice versa, based on their similar contents. This intuitive resemblance may launch expectations of their mutual relationship. Science education research has paid increasing attention to CRT skills as promoters of meaningful learning, especially when involving NOS and understanding of socio-scientific issues (Vieira et al., 2011; Torres & Solbes, 2016; Vázquez-Alonso & Manassero-Mas, 2018; Yacoubian & Khishfe, 2018; among others). Furthermore, Yacoubian (2015) elaborated several reasons to consider CRT a fundamental pillar for NOS learning.

Some authors stress the convergence between science and CRT based on the word critical, as thinking and science are both critical. Critical approaches have always been considered consubstantial to science (and likely a key factor of its success), as their range spreads from specific critical social issues (i.e., scientific controversies, social acceptance of scientific knowledge, social coping with a virus pandemic) to the socially organized scepticism of science (i.e., peer evaluation, scientific communication). The latter is considered a universal value of scientific practice to guarantee the validity of knowledge (Merton, 1968; Osborne, 2014). In the context of CRT research, the term critical involves normative ways to ensure the quality of good thinking, such as open-minded abilities and a disposition for relentless scrutiny of ideas, criteria for evaluating the goodness of thinking, adherence to the norms, standards of excellence, and avoidance of errors and fallacies (traits of poor thinking). These obviously also apply to scientific knowledge through peer evaluation practice, which represents a superlative form of good normative thinking (Bailin, 2002; Paul & Elder, 2008).

Another important feature of the convergence of CRT and science is the broad set of common skills sharing the same semantic content in both fields, even though their names may seem different. Induction, deduction, abduction, and, in general, all kinds of argumentation skills, as well as problem-solving and decision-making, exemplify key tools of scientific practice to validate and defend ideas and develop controversies, discussions, and debates. Concurrently, they, too, are CRT skills (Sprod, 2014; Vieira et al., 2011; Yacoubian & Khishfe, 2018). In addition, Santos’ (2017) review suggests the following tentative list of skills: observation, exploration, research, problem-solving, decision-making, information-gathering, critical questions, reliable knowledge-building, evaluation, rigorous checks, acceptance and rejection of hypotheses, clarification of meanings, and true conclusions. Going beyond skill names and focusing on their semantic content, Manassero-Mas and Vázquez-Alonso (2020a) developed a deeper analysis of the skills usually attributed to scientific thinking and critical thinking, concluding that their constituent skills are deeply intertwined and much more coincident than different. This suggests that scientific and critical thinking may be considered equivalent concepts across the many shared skills they put into practice. However, equivalence does not mean identity, as important differences may still exist. For instance, the evaluation and judgment of ideas involved in organized scientific skepticism (i.e., peer evaluation) are much more demanding and deeper in scientific practice than in daily life thinking realms.

In sum, research on the CRT and NOS constructs is plural, as they draw from two different fields and traditions, general education and cognitive psychology, and science education, respectively. However, CRT and NOS share many skills, processes, and thinking strategies, as they both pursue the same general goal, namely, to establish the true value of knowledge claims. These shared features provide further reasons to investigate the possible relationships between NOS and CRT skills.

Research involving nature of science and thinking skills

The research involving both constructs is heterogeneous, as the operationalisations and methods are quite varied, given the pluralized nature of NOS and thinking. For example, Yang and Tsai (2012) reviewed 37 empirical studies on the relationship between personal epistemologies and science learning, concluding that research was heterogeneous along different NOS orientations: applications of Kuhn’s (2012) evolutionary epistemic categories, use of general epistemic knowledge categories, studies on epistemological beliefs about science (empiricism, tentativeness, etc.), and applications of other epistemic frameworks. The studies dealing with epistemological beliefs about science were a minority. Another example of heterogeneity comes from Koray and Köksal’s (2009) study on the effect of laboratory instruction versus traditional teaching on creativity and logical thinking in prospective primary school teachers, where the laboratory group showed a significant effect in comparison to the traditional group; however, the NOS contents involved in the laboratory instruction remain unclear. Dowd et al. (2018) examined the relationship between written scientific reasoning and eight specific CRT skills, finding that only three aspects of reasoning were significantly related to one skill (inference), and negatively related to argument.

A series of studies suggest implicit relationships between NOS and thinking skills. Yang and Tsai (2010) interviewed sixth-graders about two uncertain science-related issues, finding that children who developed more complex (multiplistic) NOS knowledge displayed better reflective thinking and coordination of theory and evidence. Dogan et al. (2020) compared the impact of two epistemic-based methodologies (problem-based and history of science) on the creativity skills of prospective primary school teachers, finding that the problem-solving approach was more effective in increasing students’ creative thinking. Khishfe (2012) and Khishfe et al. (2017) found no differences in decision-making and argumentation on socio-scientific issues with respect to NOS knowledge, although more participants in the treatment groups than in the other groups related their post-decision-making factors to NOS. Other studies found relationships between NOS understanding and variables that do not match CRT skills precisely. For instance, Bogdan (2020) found that inference and tentativeness relate to attitudes toward the role of science in social progress, but creativity does not, and the same applies to the acceptance of the theory of evolution (Cofré et al., 2017; Sinatra et al., 2003).

Another set of studies comes from science education research on argumentation, which is based on the rationale that argumentation is a key scientific skill for validating knowledge in scientific practice; thus, reasoning skills should be related to NOS understanding. Students who viewed science as dynamic and changeable were likely to develop more complex arguments (Stathopoulou & Vosniadou, 2007). In a floatation experience, Zeineddin and Abd-El-Khalick (2010) found that the stronger the epistemic commitments, the greater the quality of the scientific reasoning produced by the individuals. Accordingly, the term epistemic cognition of scientific argumentation has been coined, although specific research on argumentation and epistemic cognition is still relatively scarce (He et al., 2020).

Weinstock’s (2006) review suggested that people’s argumentation skills develop in proportion to their epistemic development, which Noroozi (2016) also confirmed. Further, Mason and Scirica (2006) studied the contribution of general epistemological comprehension to argumentation skills in two readings, finding that participants at the highest level of epistemic comprehension (evaluative) generated better quality arguments than participants at the previous multiplistic stage (Kuhn, 2012). In addition, the review of Rapanta et al. (2013) on argumentative competence proposed a three-dimensional hierarchical framework, where the highest level is epistemological (the ability to evaluate the relevance, sufficiency, and acceptability of arguments). Again, Henderson et al. (2018) discussed the key challenges of argumentation research and pointed to students’ shifting epistemologies about what might count as a claim or evidence or what might make an argument persuasive or convincing, as well as developing valid and reliable assessments of argumentation. On the contrary, Yang et al. (2019) found no significant associations between general epistemic knowledge and the performance of scientific reasoning in a controversial case with undergraduates.

From science education, González‐Howard and McNeill (2020) analysed middle-school classroom interactions in argumentation critique when epistemic agency is incorporated, indicating that developing students’ epistemic agency involves multiple, sometimes conflicting approaches to addressing the tensions inherent in critiquing practices and fostering equitable learning environments. This idea is further developed in the special section on epistemic tools of Science Education (2020), which highlights the continual need to accommodate and adapt the epistemic tools and agencies of scientific practices within classrooms while taking into account teaching, engineering, sustainability, equity and justice (González‐Howard & McNeill, 2020; Settlage & Southerland, 2020).

Finally, some of the above-mentioned research used the noteworthy concept of epistemic knowledge (EK) as “knowledge about knowledge and knowing” (Hofer & Pintrich, 1997), which has been developed in mainstream general education research and involves meta-cognitions about human knowledge that research has largely connected to general learning and CRT skills (Greene et al., 2016). EK and NOS knowledge evidently share many common (epistemic) aspects, suggesting considerable overlap between them. However, it is noteworthy that NOS research is oriented toward CRT skills impacting NOS learning, while EK research is oriented toward EK impacting CRT skills and general learning.

Regarding Likert formats for research tools, test makers are concerned about controlling response biases, which prevent true reflection on statement content and may damage the fidelity of data and correlations. Respondents’ tendency to agree with statements (acquiescence bias) is widespread. Further, neutrality and polarity biases reflect respondents’ propensity to choose fixed score points of the scale: the midpoints (neutrality bias) or the extreme scores (polarity bias), whether extremely high (positive bias) or extremely low (negative bias). To mitigate biases, experts recommend avoiding the exclusive use of positively worded statements within instruments and combining positive and reversed items. This recommendation has been implemented here using three categories of NOS phrases that operationalize positive, intermediate and reversed statements (Vázquez et al., 2006; Kreitchmann et al., 2019; Suárez-Alvarez et al., 2018; Vergara & Balluerka, 2000). However, mixing phrase styles can also harm an instrument’s reliability and validity, and reliability tends to be underestimated (Suárez-Alvarez et al., 2018).

All in all, the theoretical framework is twofold: CRT and NOS research. The above-mentioned research shares the hypothesis that the relationship between NOS and CRT skills matters. However, it displays a broad heterogeneity of research methods, variables, instruments and mixed results on the NOS-CRT relationship that does not allow a common methodological standpoint. Further, mainstream research focuses on college students and argumentation skills. In this regard, this study aims to empirically research the NOS-CRT relationship by applying standardized assessment tools for both constructs, which promotes comparability among researchers and provides quick diagnostic tools for teachers. Secondly, this study addresses younger students, which requires creating NOS and CRT tools adapted to young participants, for which some test validity and reliability data are provided. The research questions within this framework are: Do NOS knowledge and CRT skills correlate? What are the traits and limit conditions of this relationship, if any?

Materials and methods

The data gathering took place in Spain in 2018. At that time, the enacted school curriculum lacked the international standards and specific curriculum proposals on CRT and NOS issues, so NOS issues were only implicitly related to some curricular contents about scientific research. Despite this lack of curricular emphasis, the principals of the participant Spanish schools expressed interest in diagnosing students’ thinking skills and NOS knowledge and agreed with the authors on the specific CRT and NOS skills to be tested. As the Spanish school curriculum does not emphasize CRT and NOS issues, the students can be expected to be equally (un)trained, and this context conditioned the design of tentative tests with simple contents and an open-ended format, as these are cheap and easy to administer and interpret.

Participants

The participant schools (17) included some public (4) and state-funded private schools (13) spread across mixed socio-cultural contexts and large, medium, and small Spanish townships. The participant students were tested in their natural school classes (29) of the two target grades. The valid convenience samples are two cohorts of students, representing 6th grade of Primary Education (PE6) (n = 434; 54.8% girls and 45.2% boys; mean age 11.3 years) and 8th grade of Secondary Compulsory Education (SCE8) (n = 347; 48.5% girls and 51.5% boys; mean age 13.3 years). In Spain, 6th grade is the last year of the primary stage (11–12-year-old students), and 8th grade is the second year of the lower secondary compulsory stage (13–14-year-old students).

Instruments

Two assessment tools were tailored by the researchers (a CRT skill test and a NOS scenario) to operationalise CRT and NOS and empirically check their relationship. As the Spanish school curriculum lacks CRT standards, the specific thinking skills representing the CRT construct were agreed upon between principals and researchers. The design of the tool to assess NOS knowledge took into account that NOS was not explicitly taught in Spanish schools. Both tools were designed to match the schools’ interests and the students’ developmental level; the latter particularly led to choosing a simple NOS issue (the definition of science) to better match the primary students’ capabilities.

Thinking challenge tests

Two CRT thinking skill tests were developed for the two participant cohorts (PE6 and SCE8). The design aligns with the tradition of most standardised CRT tests, which concentrate assessment on a few selected thinking skills (e.g., Ennis & Millman, 2005; Halpern, 2010). The test for the 6th-graders (PE6) assesses five skills: prediction, comparison and contrast, classification, problem-solving and logical reasoning. The test for the 8th-graders (SCE8) assesses causal explanation, decision-making, parts-all relationships, sequence and logical reasoning.

As most CRT tests are designed for adults, many tests and item pools were reviewed to select items suitable for younger students. The selection criteria were the fit of an item’s cognitive demand to the students’ age, the skill addressed, and the motivational challenge for students. Moreover, items had to be readable, understandable, appropriate, and interesting for the participant students. Then, a 45-item and a 38-item test were agreed on and piloted; their results are described elsewhere (Manassero-Mas & Vázquez-Alonso, 2020b). The items were examined by the authors through reliability, correlation and factor analyses to eliminate malfunctioning items. The former criteria were then used again to add new items, forming the two new 35-item Thinking Challenge Tests (TCT) that assess the CRT skills of this study.

The items of the first two skills were drawn from the Cornell (Nicoma) test, which evaluates four CRT skills through the information provided by a fictional story about some explorers of the Nicoma planet and asks questions about the story. Some items from prediction and comparison skills were drawn for the 6th-grade TCT (PE6), and some items from causal explanation and decision-making skills were drawn for the 8th-grade TCT (SCE8). The two TCT include three additional items on logical reasoning that were selected from the 78-item Class-Reasoning Cornell Test (Ennis & Millman, 2005 ). One item was also drawn from the 25-situation Halpern CRT test (Halpern, 2010 ) for the problem-solving skill of the PE6 test. The authors adapted the remaining figurative items (Table 2 ) to enhance students’ challenge, understanding, and motivation and make the TCT free of school knowledge (Appendix).

Overall, the TCT items pose authentic culture-free challenges, as their contents and cognitive demands are not related to or anchored in any prior school curricular knowledge, especially language and mathematics. Therefore, the TCT are intended to assess culture-free thinking skills.

The item formats involve multiple-choice and Likert scales with appropriate ranges, plus rubrics that facilitate quick, objective scoring and a progressively better fit between items’ cognitive demand and their corresponding skill, thereby supporting further revision to improve validity and reliability. This format also allows setting standardised baselines for hypothesis testing through comparisons across studies, educational programs, and teaching methodologies.

Nature of science assessment

A scenario on science definitions is used to assess the participants’ NOS understanding because this simple issue may better fit the lack of explicit NOS teaching and the developmental stage of the young students, especially the youngest 6th-graders. The scenario provides nine phrases that convey an epistemic, plural and varied range of science definitions, and respondents rate their agreement-disagreement with each phrase on a 9-point Likert scale (1 = strongly disagree, 9 = strongly agree), allowing better nuancing of their NOS beliefs and avoiding psychometric objections to the scale intervals. The scenario is drawn from the “Views on Science-Technology-Society” (VOSTS) pool that Aikenhead and Ryan (1992) developed empirically by synthesizing many students’ interviews and open answers into scenarios written in simple, understandable, and non-technical language. They consider that VOSTS items have intrinsic validity due to their empirical development, as the scenario phrases come from students, not from researchers or a particular philosophy, thus avoiding the immaculate-perception bias and ensuring students’ understanding. Lederman et al. (1998) also consider VOSTS a valid and reliable tool for investigating NOS conceptions. Manassero et al. (2003) adapted the scenarios to the Spanish language and context, and developed a multiple-rating assessment rubric based on the phrase scaling achieved through expert judges’ consensus. The rubric assigns indices whose empirical reliability has been presented elsewhere (Vázquez et al., 2006; Bennássar et al., 2010).

The students completed the two tests on digital devices, guided by their teachers, within their natural classroom groups during 2018–19. To enhance students’ effort and motivation, the test administrations were embedded in curricular learning activities, where students were encouraged to ask about problems and difficulties. During the administrations, students did not ask the teacher questions that might reflect difficulty in understanding the tests. The database was processed with SPSS 25 and the Factor program (Baglin, 2014) for exploratory and confirmatory factor analysis through polychoric correlations and the Robust Unweighted Least Squares (RULS) method, which relaxes assumptions about the score distributions of the variables. Effect size statistics use a cut-off point (d = 0.30) to discriminate relevant differences.
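The d = 0.30 cut-off can be applied to the standardised mean difference between any two groups. A minimal pure-Python sketch (the helper names and any data fed to them are ours, not the study's):

```python
import math

def cohens_d(group_a, group_b):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    ma, mb = sum(group_a) / na, sum(group_b) / nb
    va = sum((x - ma) ** 2 for x in group_a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in group_b) / (nb - 1)
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / pooled_sd

def relevant_difference(d, cutoff=0.30):
    """Apply the study's d = 0.30 cut-off for flagging a relevant difference."""
    return abs(d) >= cutoff
```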

There was no time limit for students to complete the tests, and the applications took between 25 and 50 min. Correct answers score one point, incorrect answers zero points, and no correction for guessing was applied. The skill scores were computed by adding the scores of the items belonging to each skill, which are mutually independent. The addition of the five skill scores makes up a test score (thinking total) that estimates students’ global CRT competence and is dependent on the skill scores (Table 2).
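The scoring scheme just described (dichotomous items summed into independent skill scores, whose sum yields the thinking total) can be sketched as follows; the item-to-skill mapping and answer key below are hypothetical placeholders, not the actual TCT content:

```python
# Hypothetical mapping of item positions to skills; the real TCT assigns
# 35 items to five skills per cohort.
SKILL_ITEMS = {
    "prediction": [0, 1, 2],
    "classification": [3, 4],
    "logical_reasoning": [5, 6, 7],
}

def score_test(answers, key):
    """Dichotomous scoring: 1 point for a correct answer, 0 otherwise
    (no correction for guessing). Skill scores sum their items; the
    thinking total sums the skill scores."""
    item_scores = [1 if a == k else 0 for a, k in zip(answers, key)]
    skill_scores = {skill: sum(item_scores[i] for i in items)
                    for skill, items in SKILL_ITEMS.items()}
    skill_scores["thinking_total"] = sum(skill_scores[s] for s in SKILL_ITEMS)
    return skill_scores
```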

The different types of validity maintain a reciprocal influence and represent the various parts of a whole, so they are not mutually independent. The Thinking Challenge tests’ validity relies on the quality of the CRT pools and tests examined by the authors, their agreement to choose the items that best matched the criteria, and the reviewed pilot results (Manassero-Mas & Vázquez-Alonso, 2020b ). The Factor program computes several reliability statistics (Cronbach alpha, EAP, Omega, etc.).

Nature of science scenario

The nine phrases describe different science definitions, and students rated each one on a 1–9 agreement scale. According to experts’ current views on NOS, a panel of qualified judges reached a two-thirds consensus to categorize each phrase within a three-level scheme (Adequate, Plausible, Naive), which has been widely used in NOS assessment (Khishfe, 2012; Liang et al., 2008; Rubba et al., 1996). The scheme means the phrases express informed (Adequate), partially informed (Plausible), or uninformed (Naive) NOS knowledge (see Appendix). According to this scheme, an evaluation rubric transforms the students’ direct ratings (1–9) into an index [− 1 to + 1], which is proportionally higher when the person agrees with an Adequate phrase, partially agrees with a Plausible phrase, or disagrees with a Naive phrase. All the rubric indices balance positive and negative scores, which are symmetrical for Adequate and Naïve phrases, but the Plausible indices are somewhat loaded toward agreement, as higher agreement would be expected. The index unifies the NOS measurements to make them homogeneous (positive indices mean informed conceptions), invariant (measurement independent of scenario/phrase/category), and standardised (all measures within the same interval [− 1, + 1]). The index proportionally values the adjustment of students’ NOS knowledge to current views of science: the higher (or lower) the index, the better (or worse) informed their NOS knowledge is (Vázquez et al., 2006).
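As an illustration of how such a rubric can turn 1–9 ratings into [−1, +1] indices, the sketch below uses simple linear and triangular shapes; the published rubric relies on judge-scaled values (Vázquez et al., 2006), so the exact mapping here is an assumption:

```python
def nos_index(rating, category):
    """Map a 1-9 agreement rating to a [-1, +1] NOS index.

    The shapes are illustrative stand-ins for the judge-scaled rubric:
    agreement is rewarded for Adequate phrases, disagreement for Naive
    phrases, and intermediate (slightly agreeing) ratings for Plausible
    phrases, whose peak sits above the scale midpoint.
    """
    if category == "adequate":
        return (rating - 5) / 4
    if category == "naive":
        return (5 - rating) / 4
    if category == "plausible":
        return 1 - 2 * abs(rating - 6) / 5
    raise ValueError(f"unknown category: {category}")
```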

Three category variables (Adequate, Plausible, and Naïve) are computed by averaging their phrase indices, which are mutually independent. Averaging the three category variables yields a global NOS index representing the student’s overall NOS knowledge (Global). The use of three categories aligns with test makers’ recommendation to avoid using only positively worded phrases in order to elude acquiescence bias, which harms reliability and validity (Suárez-Alvarez et al., 2018).

The links between thinking skills and NOS are empirically explored through correlational methods and one-way ANOVA procedures of the variables of the Thinking Challenge test and science definitions.

The results include the descriptive statistics of the target variables, twelve thinking variables (five skills plus thinking total for each group) and four variables of the science definitions (adequate, plausible, naive, and global), the analysis of the correlations, a linear regression analysis among these variables, and a comparison of thinking skills between NOS categorical groups through a one-way ANOVA.

Descriptive statistics

Most mean scores on the thinking variables fell near the midpoint of the scale range. Four skills (classification, problem-solving, causal explanation and sequence) scored above the midpoints of their ranges, whereas two variables (logical reasoning and decision-making) scored slightly below their midpoints. Overall, these results indicate the medium difficulty of the tests for the students, neither easy nor difficult, which means the CRT tests can be acceptable for assessing young students’ thinking skills (Table 3).

The EAP reliability indices were excellent for the classification, problem-solving, sequence, parts-all (mainly figurative items) and thinking total scales, good for the remaining scales, but poor for logical reasoning. Low reliability indicates a need for item revision and limited applicability (i.e., the scale is inappropriate for individual diagnosis), but is insufficient grounds to reject the test for research purposes (U.S. Department of Labor, 1999). As test reliability critically depends on the number of items, increasing the length of the logical reasoning scale beyond its three current items should improve its reliability.
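The expected gain from lengthening the three-item logical-reasoning scale can be estimated with the Spearman-Brown prophecy formula; the reliability value used below is illustrative, not the observed one:

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after multiplying test length by `length_factor`
    (Spearman-Brown prophecy formula: k*r / (1 + (k-1)*r))."""
    k, r = length_factor, reliability
    return k * r / (1 + (k - 1) * r)

# Tripling a 3-item scale (to 9 items) with an assumed reliability of 0.40
predicted = spearman_brown(0.40, 3)
```

For example, tripling the scale from three to nine items would raise an assumed reliability of 0.40 to roughly 0.67, other things being equal.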

The descriptive results for the direct scores of the NOS variables (Table 4 ) showed a biased pattern toward agreement (average phrases between 4.9 and 7.4), which suggests some acquiescence bias in spite of presenting varied phrases. The average indices obtained positive scores for the adequate category, slightly negative ones for the naïve category, and close-to-zero for the plausible phrases (the effect size of the differences concerning a zero score was low). The overall weighted average index for the whole sample (global variable) was close-to-zero and slightly positive, meaning that the students’ overall epistemic conception of science definition was not significantly informed. The overall average index of Adequate phrases obtained the highest positive score for both samples of students, which means that most students agreed with the Adequate phrases (expressing informed beliefs about science). In contrast, the Naïve overall average index obtained the lowest negative mean score, indicating that the students agreed instead of disagreeing with phrases expressing uninformed views about science. The Plausible variable (phrases expressing partially informed beliefs, neither adequate nor naive) obtained a close-to-zero average score, meaning that the students’ beliefs about these variables were far from informed. Overall, the students presented slightly informed views on Adequate phrases, close-to-zero average indices scores (not informed views) for Plausible phrases and slightly uninformed views on Naive statements.

Polychoric correlations among NOS direct scores computed through Factor attained good values for all NOS items, indicating a unidimensional structure (except phrase I). The exploratory factor analysis (EFA) applied to the phrase scores displayed a dominant eigenvalue, whose general factor had acceptable loadings for all phrases (only phrase I had a low loading). The unidimensional model obtained fair statistics in the confirmatory factor analysis. These results suggest one general factor underlying students’ scores and justify a global score representing the variance of all the NOS phrases. The expected a posteriori (EAP) reliability scores for the entire NOS scale were good (Table 4).

The comparison of NOS scores between primary and secondary grades highlights that the four NOS variable scores on science definitions did not differ significantly between the two cohorts of students, despite the two-year separation. Thus, the educational impact of the two-year period on NOS seems almost null, given the close-to-zero differences in science definitions. This result could be expected, as NOS is not explicitly planned in Spanish science curricula and is not usually taught in the classroom.

Both cohorts answered the same anchoring CRT item (see Appendix), whose correct-answer rate (27% primary; 33% secondary) suggests a slight improvement in CRT skills that sharply contrasts with the former NOS comparison. Summing up, although neither CRT nor NOS had been explicitly taught to the Spanish students, developmental learning may increase CRT skills but does not improve NOS knowledge. This reinforces the claim for explicit and reflective teaching of NOS, as implicit developmental maturation alone seems ineffective.

Correlations between nature of science and thinking skills

The empirical analysis of the hypothesised relationships between thinking skills and NOS epistemic variables (Adequate, Plausible, Naive) was performed through correlational methods (Pearson’s bivariate correlation coefficients and linear regression analysis) and one-way analysis of variance.
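As a minimal sketch of the bivariate step (pure Python for self-containment; in practice SPSS or a library routine such as `scipy.stats.pearsonr` would be used, and the data below are invented, not the study's):

```python
import math

def pearson_r(x, y):
    """Pearson's bivariate correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy data: one thinking-skill score and one NOS Adequate index per student.
problem_solving = [3, 5, 2, 8, 6, 7, 4, 9]
adequate_index = [0.1, 0.3, -0.2, 0.6, 0.2, 0.5, 0.0, 0.7]
r = pearson_r(problem_solving, adequate_index)  # positive r mirrors the Adequate pattern
```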

The Pearson correlation coefficients revealed a pattern in the relationships between NOS and thinking skills (Table 5): all thinking skills positively correlated with the Adequate variable, and most correlations were significant, except for prediction and logical reasoning in EP6, which were non-significant. The correlations with the Naive and Plausible variables, in contrast, were overall non-significant, with some exceptions: first, the Plausible/problem-solving correlation in EP6 was significant (and negative); second, the correlations between Naïve and logical reasoning (positive in EP6) and also between Naïve and decision-making, logical reasoning and the thinking total score (negative in SCE8) were significant.

Thus, the noteworthy pattern for the NOS-CRT relationship showed that the Adequate variable positively correlated with all the thinking variables and was mostly statistically significant (83%); the highest positive correlations corresponded to problem-solving (EP6), sequence and parts-all (ES8), and the thinking total skills for both groups ( p  < 0.01). This pattern means that students with higher (lower) thinking skill scores expressed higher (lower) agreement with Adequate phrases.

The correlation pattern between thinking skills and the Plausible and Naive variables was mainly non-significant (75%). Only two correlations were significant in the EP6 group: the Plausible/problem-solving correlation was negative (higher scorers on problem-solving did not recognize the intermediate value of Plausible science definitions), whereas the Naïve/logical reasoning correlation was positive (higher scorers on logical reasoning tended to disagree with Naive science definitions). Three Naïve correlations were significant and negative in the secondary group (SCE8): parts-all, logical reasoning and thinking total.

Overall, the positive and significant correlation pattern of the Adequate variable was stronger than the mainly non-significant and somewhat negative Naive and Plausible correlation pattern.

Linear regression analysis between nature of science and thinking skills

Regression analysis (RA) compares the power of a set of variables to predict a dependent variable and quantifies their common variance. Two linear regression analyses were carried out to test the mutual contribution of the CRT and NOS variables. The first RA uses the NOS variables (Adequate, Plausible, Naive and Global) as the dependent variables and the five independent thinking skills as predictors (Table 6). The second RA (Table 7) reverses the roles of the variables, establishing the thinking skills as the dependent variables and the three independent NOS variables (Adequate, Plausible and Naive) as the predictors. Collinearity tests were negative for all RAs, as checked through tolerance, variance inflation factor and condition index statistics.
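Of the collinearity statistics mentioned (tolerance, variance inflation factor, condition index), the VIF is the simplest to sketch: VIF_j = 1/(1 − R_j²), where R_j² comes from regressing predictor j on the remaining predictors, and tolerance is the reciprocal of the VIF. With only two predictors, R_j² reduces to their squared correlation; the helper names here are ours:

```python
def vif_two_predictors(r):
    """Variance inflation factor when a predictor's squared multiple
    correlation with the other predictors is r**2 (VIF = 1 / (1 - R^2))."""
    return 1.0 / (1.0 - r ** 2)

def tolerance(r):
    """Tolerance: the reciprocal of the VIF, i.e. 1 - R^2."""
    return 1.0 - r ** 2
```

Uncorrelated predictors give VIF = 1; a common rule of thumb flags VIF values above 5 or 10 as problematic.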

The first RA (Table 6 ) showed that the NOS Adequate variable achieved the highest proportion of common variance with thinking skill predictors at both educational levels (4.2% in PE6 and 9.2% in SCE8), whereas the other two NOS variables achieved much lower levels of explained variance. In PE6, the most significant predictor skill of NOS was problem-solving, whereas the other predictor skills did not reach statistical significance in any case. In SCE8, the most significant predictors were three skills (sequencing, reasoning, and parts-all), whereas the remaining skills did not reach statistical significance (the predictors of the Plausible variable were negative).

The second RA (Table 7 ) showed that the Adequate variable achieved the greatest predictive power, as most thinking skills displayed statistically significant standardised beta coefficients at the two educational levels, while Plausible and Naïve variables had a much lower predictive power, and Plausible standardised coefficients were non-significant for any skill predictor. The common variance displayed a similar amount to the first analysis; the thinking total variable displayed the largest variance at both educational levels (4.8% PE6; 9.6% SCE8), and the problem-solving skills at PE6 (5.3%) and parts-all at SCE8 (7.1%).

In summary, with respect to the research question about the positive relationship between NOS and thinking skills, the Adequate variable together with the classification and problem-solving skills (PE6) and the sequencing and parts-all skills (SCE8) presented the largest standardised coefficients and statistical significance.

Analysis of variance between nature of science and thinking skills

Further exploration of the NOS-skills relationship was conducted through one-way between-groups analysis of variance. According to performance on the Adequate, Plausible and Naive variables, the participants were allocated to four percentile groups (low group: 0–25%; medium–low: 25–50%; medium–high: 50–75%; high: 75–100%), which made up the independent variable of the ANOVA for testing the differences in thinking skills (dependent variable) among these four groups.
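The four-group percentile allocation can be sketched with a simple quartile split (any statistical package's quantile function would serve; the cut-point convention below is one of several):

```python
def percentile_groups(scores):
    """Allocate each score to one of four quartile groups:
    low (0-25%), medium-low (25-50%), medium-high (50-75%), high (75-100%)."""
    ordered = sorted(scores)
    n = len(ordered)
    q1, q2, q3 = ordered[n // 4], ordered[n // 2], ordered[3 * n // 4]

    def group(s):
        if s < q1:
            return "low"
        if s < q2:
            return "medium-low"
        if s < q3:
            return "medium-high"
        return "high"

    return [group(s) for s in scores]
```

The resulting group labels then serve as the independent variable of the one-way ANOVA.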

The Adequate groups yielded a statistically significant main effect for the thinking total in primary education [F(3, 429) = 7.745, p < 0.001] and a marginally significant one in secondary education [F(3, 343) = 2.607, p = 0.052]. The effect size of the differences in the thinking total scores between the high and low groups was large for the primary (d = 0.69) and secondary (d = 0.86) cohorts. Furthermore, the comparison, classification, and problem-solving skills replicated this pattern of large high-low differences, which supports the positive NOS-CRT relationship. However, prediction (p = 0.069) and logical reasoning (p = 0.504) did not display differences among the Adequate groups.

Post-hoc comparisons (Scheffé test) showed that the Adequate low group scored significantly lower on the thinking total, comparison, classification, and problem-solving skills than the other three Adequate groups, whereas the differences among the Adequate groups on prediction and logical reasoning scores were non-significant.

The main effect of the Plausible groups on the thinking total variable did not reach statistical significance for the primary [F(3, 430) = 1.805, p = 0.145] and secondary groups [F(3, 343) = 2.607, p = 0.052]. The effect size was small (d = − 0.31 primary; d = − 0.32 secondary) and negative (the thinking total mean score of the low group was higher than that of the high group). Post-hoc comparisons (Scheffé test) confirmed the trend, as they did not yield significant differences among the Plausible groups, although the mean score of the Plausible high group was lower than that of the other three groups. Exceptionally, problem-solving skill (primary) displayed a statistically significant difference between the Plausible high group (the lowest mean score) and the remaining three groups.

The main effect of the Naive groups on the thinking total variable did not reach statistical significance [F(3, 430) = 1.075, p = 0.367 primary; F(3, 343) = 1.642, p = 0.179 secondary], and the effect size of the differences was small (d = 0.32 primary; d = − 0.31 secondary). The opposite direction of the differences in primary (positive) and secondary education (negative) is noteworthy, as it means that the highest mean score corresponded to the Naive high group in primary (positive) or the Naive low group in secondary (negative). Post-hoc comparisons (Scheffé test) showed no significant differences among the Naive groups. However, the ranking of mean scores across the Naive groups revealed differences between the primary and secondary cohorts. Overall, the primary Naive groups followed the pattern of the Adequate variable (the low group displayed the lowest score), whereas the secondary Naive groups followed the pattern of the Plausible variable (the high group tended to display the lowest score).

The empirical findings of this study quantify, through correlations, some significant and positive relationships between thinking skills and NOS beliefs about science definitions, as the main answer to the research question. However, the analysis shows a complex pattern that depends on the kind of NOS variable under consideration: the NOS Adequate variable, which represents phrases expressing informed views on science, is positively and significantly related to most thinking skills, whereas the uninformed Naive and intermediate Plausible variables show lower predictive power for thinking skills. Summing up, the positive and significant CRT-NOS relationship is not displayed by all NOS variables; it is limited to those that express an Adequate view of science, while the other NOS variables do not significantly correlate with CRT skills.

The implications of this study for research are twofold. On the one hand, the variables of this study specifically operationalise the two constructs under investigation, namely CRT skills and NOS knowledge, whose mixed operationalisations have been a challenge throughout the reviewed research. On the other hand, via Pearson correlations and regression analysis, this study quantifies the amount of common variance between specific CRT skills and specific NOS knowledge, which is significant in many cases. Both contributions improve on the features of previous studies, as most of them investigated the relationship from varied methodological frameworks: some reported group comparisons, fewer analysed correlations, and most of the latter used a diversity of variables, which often did not match either CRT skills or NOS variables. For instance, Vieira et al. (2011) correlated thinking skills with science literacy (not NOS) and reported Pearson correlations lower than those obtained herein, even though they used a smaller sample, which favours higher correlations.

The findings reveal the complexity of the NOS-CRT relationship, which limits the positive and relevant relationship to the NOS Adequate variables about science definitions, but not to the Plausible or Naive conceptualizations, which mainly display non-significant and somewhat negative correlations. The positive relationship between thinking and Adequate science definitions is a remarkable finding, which empirically supports the hypothesis that better thinking skills involve better NOS knowledge and confirms the concomitant intuitions and claims of some studies about the importance of thinking skills for learning NOS epistemic topics (Erduran & Kaya, 2018; Ford & Yore, 2014; Simonneaux, 2014; Torres & Solbes, 2016; Yacoubian, 2015). The findings also help establish the limit of the significant relationship, which holds when NOS is conveyed by informed statements (Adequate phrases) but not by non-adequate NOS statements, which are a minority in the NOS literature, most of which conveys informed statements on NOS (Cofré et al., 2019).

The implications of the collateral finding on the lack of differences in science definitions between the primary and secondary cohorts deserve further comment. Obviously, the finding confirms that two educational years have a scant impact on improving Spanish students’ understanding of science definitions; that is, NOS teaching seems ineffective and stagnant, probably due to poor curriculum development and the lack of teacher training and educational resources. Besides, the students’ higher performance on adequate phrases than on plausible and naïve phrases also suggests that Spanish students may achieve some mild knowledge of the informed traits of science because these are implicitly displayed in teaching, textbooks and media. However, plausible and naïve knowledge is not usually available from those sources, as it requires explicit and reflective teaching, which Spanish students usually lack. Both findings suggest the need for further attention to misinformed NOS knowledge to invigorate explicit and reflective NOS teaching (Cofré et al., 2019; McDonald & McRobbie, 2012).

The unexpected non-significant/negative relationships between thinking and the Plausible and Naive variables call for some elaboration, given the complexity of students’ NOS conceptions. For instance, Bennássar et al. (2010) described students’ inconsistent agreement when rating opposite statements; Bogdan (2020) found that epistemic conceptions of science creativity did not relate to attitudes toward science; and Khishfe (2012) reported complex relationships between epistemic aspects of science and decision-making about genetically modified organisms or the acceptance of the theory of evolution (Cofré et al., 2017; Sinatra et al., 2003). A tentative interpretation of these paradoxical relationships is therefore elaborated below.

Higher-thinking-skill students might develop better-quality reflections that elicit more confident, more polarized scores on NOS phrases, whereas lower-thinking-skill students tend toward less confident, lower-quality reflection that elicits intermediate, less polarized scores. This differential response pattern can explain the complex pattern of relationships between the CRT and NOS variables. For the Adequate phrases, where the rubric assigns the best indices to the highest scores, higher-thinking students achieve higher NOS indices than lower-thinking students, which explains the observed positive CRT-NOS correlations in the Adequate variables and the ANOVA results. Conversely, for the Naive and, especially, the Plausible phrases, which obtain their highest indices at low and intermediate scores respectively, the same pattern leads lower-thinking students to achieve higher NOS indices than higher-thinking students, shifting the correlations toward the observed non-significant or negative values. In short, the contrast between the unconfident, lower-quality reflection of lower-thinking students and the confident, higher-quality reflection of higher-thinking students would explain the shift from the positive, significant CRT-Adequate relationship to the non-significant correlations for the Plausible and Naive phrases. This interpretation agrees with the striking finding of O’Brien et al. (2021) of an unexpectedly higher adherence to pseudoscientific claims among students with higher trust in science, which the authors attributed to the uncritical acceptance of any scientific content. Similarly, mastery of CRT skills is a desirable learning outcome, but it may make the students who master them vulnerable to positive polarization in science definitions. Further research is needed, however, to confirm the non-significant correlations and the differential-response-pattern interpretation.
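The differential response pattern described above can be made concrete with a small simulation. This is a minimal sketch under assumed, hypothetical rubrics and score distributions: the 0–9 agreement scale, the index formulas and the cohort model below are illustrative inventions, not the instrument or data actually used in the study.

```python
import random
import statistics

random.seed(7)

def pearson(xs, ys):
    # Plain Pearson correlation coefficient.
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical rubrics mapping a 0-9 agreement score to an index in [0, 1]:
def adequate_index(score):
    # Adequate phrases: the highest agreement earns the best index.
    return score / 9

def plausible_index(score):
    # Plausible phrases: intermediate agreement earns the best index.
    return 1 - abs(score - 4.5) / 4.5

# Hypothetical cohort: higher-CRT students give confident, polarized (high)
# ratings; lower-CRT students cluster near the scale midpoint.
crt = [random.random() for _ in range(500)]
ratings = [max(0, min(9, 4.5 + 4.5 * c + random.gauss(0, 0.5))) for c in crt]

r_adequate = pearson(crt, [adequate_index(r) for r in ratings])
r_plausible = pearson(crt, [plausible_index(r) for r in ratings])
print(f"CRT vs Adequate index:  r = {r_adequate:+.2f}")   # positive
print(f"CRT vs Plausible index: r = {r_plausible:+.2f}")  # negative
```

Because the Plausible rubric rewards intermediate scores, the very polarization that helps higher-thinking students on Adequate phrases penalizes them on Plausible ones, flipping the sign of the correlation without any change in the underlying response behaviour.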

As the previous reference suggests, the findings about the complex CRT-NOS relationship connect with some pending controversies about NOS teaching, namely the marginal attention paid to misinformed ideas or myths about science in favour of informed ideas, a bias that reveals implicit, non-reflective NOS teaching, since obviously misinformed ideas trigger more reflection than informed ones (Acevedo et al., 2007; McComas, 1996). The effect of this under-exposure is students’ under-training in misinformed NOS ideas, which may act as obstacles to authentic NOS epistemic learning and would explain the differences presented herein. The remedy to this situation, and to the unconfident bias, may lie in devoting more time and explicit attention to uninformed or incomplete NOS claims through reflective teaching.

This study is shaped and limited by the contextual conditions of its correlational methodology. First, the research question implied measurements of both thinking skills and NOS knowledge; second, the young participants (12–14 years old) required measurement tools appropriate to their age; third, the thinking-skill tests had to match the thinking skills targeted by the participating school; fourth, the choice of NOS tool was conditioned by the students’ age and the lack of appropriate NOS assessment tools. Suggestions to overcome these limitations therefore focus on expanding the empirical support for the NOS-CRT relationship. On the one hand, new NOS issues, such as additional epistemological and social aspects of science, should be explored to extend the representativeness of NOS knowledge; similar considerations apply to including new skills to expand the scope of the CRT tool. Furthermore, the number of items in the logical reasoning scale should be increased to improve its reliability. Finally, the perennial debate between open-ended and closed formats is also noteworthy for future research, as quantitative methods could be complemented with qualitative ones (such as student interviews).
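The expected gain from lengthening a scale can be estimated with the Spearman-Brown prophecy formula, which projects the reliability of a test extended with parallel items. A minimal sketch with hypothetical values: the initial reliability of 0.55 is illustrative, not a figure reported for the logical reasoning scale in this study.

```python
def spearman_brown(reliability, length_factor):
    """Projected reliability of a test lengthened by `length_factor`
    times with parallel items (Spearman-Brown prophecy formula)."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Hypothetical scale with reliability 0.55; project the effect of
# doubling and tripling its length.
for factor in (1, 2, 3):
    print(f"length x{factor}: projected reliability = {spearman_brown(0.55, factor):.2f}")
```

Under these assumptions, doubling the scale would raise the projected reliability from 0.55 to about 0.71, which illustrates why adding items to the logical reasoning scale is a plausible remedy for its modest reliability.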

Finally, the main educational implication of this study is that students may need some competence in CRT skills to learn NOS knowledge, or general epistemic knowledge, and, conversely, that mastery of CRT skills may foster learning NOS knowledge. Although this study focuses on epistemic NOS knowledge drawn from science education, educational research has developed, in parallel, the epistemic knowledge (EK) construct for general education (Hofer & Pintrich, 1997), which opens further prospective research on NOS comprehension and CRT skills. On the one hand, studying the NOS-EK relationship may shed light on convergent epistemic teaching and learning, both in science and in general education. On the other hand, the importance of CRT skills for NOS, and vice versa, may help coordinate the teaching of NOS-EK issues (Erduran & Kaya, 2018; Ford & Yore, 2014; McDonald & McRobbie, 2012; Simonneaux, 2014). This joint NOS-EK perspective may also provide new answers on two fronts: the mutual connections between CRT skills and NOS-EK issues, and EK assessment tools that may also advance the evaluation of CRT skills and NOS.

Data availability

The Spanish State Research Agency and the University of the Balearic Islands hold the property of all data and materials of this study, which may be made available upon reasonable request to them.

Code availability

Not applicable.

Acevedo JA, Vázquez A, Manassero MA, Acevedo P (2007) Consensus on the nature of science: epistemological aspects. Revista Eureka sobre Enseñanza y Divulgación de las Ciencias 4:202–225. http://www.apac-eureka.org/revista/Larevista.htm

Aikenhead GS, Ryan AG (1992) The development of a new instrument: “Views on Science-Technology-Society” (VOSTS). Sci Educ 76:477–491

Allchin D, Zemplén GÁ (2020) Finding the place of argumentation in science education: Epistemics and whole science. Sci Educ 104(5):907–933. https://doi.org/10.1002/sce.21589

American Psychological Association (1990) Critical thinking: A statement of expert consensus for purposes of educational assessment and instruction. Executive Summary “The Delphi Report”. www.insightassessment.com/dex.html

Baglin J (2014) Improving your exploratory factor analysis for ordinal data: a demonstration using factor. Pract Assess Res Eval 19(5):2

Bailin S (2002) Critical thinking and science education. Sci Educ 11:361–375

Bennássar A, Vázquez A, Manassero MA, García-Carmona A (Coor.). (2010) Ciencia, tecnología y sociedad en Iberoamérica [Science, technology society in Latin America]. Organización de Estados Iberoamericanos. http://www.oei.es/salactsi/DOCUMENTO5vf.pdf

Bogdan R (2020) Understanding of epistemic aspects of NOS and appreciation of its social dimension. Revista Eureka sobre Enseñanza y Divulgación de las Ciencias, 17, Article 2303. https://doi.org/10.25267/Rev_Eureka_ensen_divulg_cienc.2020.v17.i2.2303

Cofré H, Cuevas E, Becerra B (2017) The relationship between biology teachers’ understanding of the NOS and the understanding and acceptance of the theory of evolution. Int J Sci Educ 39:2243–2260. https://doi.org/10.1080/09500693.2017.1373410

Cofré H, Nuñez P, Santibáñez D, Pavez JM, Valencia M, Vergara C (2019) A critical review of students’ and teachers’ understandings of NOS. Sci Educ 28:205–248. https://doi.org/10.1007/s11191-019-00051-3

Deng F, Chen D-T, Tsai C-C, Chai C-S (2011) Students’ views of the NOS: a critical review of research. Sci Educ 95:961–999

Dogan N, Manassero MA, Vázquez A (2020) Creative thinking in prospective science teachers: effects of problem and history of science based learning. Tecné, Episteme y Didaxis 48. https://doi.org/10.17227/ted.num48-10926

Dowd JE, Thompson RJ Jr, Schiff LA, Reynolds JA (2018) Understanding the complex relationship between critical thinking and science reasoning among undergraduate thesis writers. CBE Life Sci Educ. https://doi.org/10.1187/cbe.17-03-0052

Ennis RH (1996) Critical thinking. Prentice, Hoboken

Ennis RH, Millman J (2005) Cornell Critical Thinking Test Level X. The Critical Thinking Company.

Ennis RH (2019) Long definition of critical thinking. http://criticalthinking.net/definition/long-definition-of-critical-thinking/

Erduran S, Dagher ZR (eds) (2014) Reconceptualizing the Nature of Science for Science Education. Scientific Knowledge, Practices and Other Family Categories. Springer, Dordrecht

Erduran S, Kaya E (2018) Drawing nature of science in pre-service science teacher education: epistemic insight through visual representations. Res Sci Educ 48(6):1133–1149. https://doi.org/10.1007/s11165-018-9773-0

European Union (2014). Key competence development in school education in Europe. KeyCoNet’s review of the literature: A summary . http://keyconet.eun.org

Facione PA, Facione RN, Blohm SW, Howard K, Giancarlo CAF (1998) California Critical Thinking Skills Test: Manual (Revised). California Academic Press, California

Fisher A (2009) Critical thinking: an introduction. Cambridge University Press, Cambridge

Fisher A (2021) What critical thinking is. In: Blair JA (ed) Studies in critical thinking, 2nd edn. University of Windsor, Canada, pp 7–26

Ford CL, Yore LD (2014) Toward convergence of critical thinking, metacognition, and reflection: Illustrations from natural and social sciences, teacher education, and classroom practice. In: Zohar A, Dori YJ (eds) Metacognition in science education. Springer, Berlin, pp 251–271

García-Mila M, Andersen C (2008) Cognitive foundations of learning argumentation. In: Erduran S, Jiménez-Aleixandre MP (eds) Argumentation in science education: perspectives from classroom-based research. Springer, Berlin, pp 29–45

García-Carmona A, Vázquez A, Manassero MA (2011) Current status and perspectives on teaching the nature of science: a review of teachers’ beliefs obstacles. Enseñanza de las Ciencias 28:403–412

González-Howard M, McNeill KL (2020) Acting with epistemic agency: characterizing student critique during argumentation discussions. Sci Educ 104:953–982

Greene JA, Sandoval WA, Bråten I (2016) Handbook of epistemic cognition. Routledge, London

Halpern DF (2010) Halpern Critical Thinking Assessment. Schuhfried, Modling

He X, Deng Y, Saisai Y, Wang H (2020) The influence of context on the large-scale assessment of high school students’ epistemic cognition of scientific argumentation. Sci Educ 29:7–41. https://doi.org/10.1007/s11191-019-00088-4

Henderson JB, McNeill KL, Gonzalez-Howard M, Close K, Evans M (2018) Key challenges and future directions for educational research on scientific argumentation. J Res Sci Teach 55(1):5–18. https://doi.org/10.1002/tea.21412

Hofer BK, Pintrich PR (1997) The development of epistemological theories: beliefs about knowledge and knowing and their relation to learning. Rev Educ Res 67:88–140. https://doi.org/10.3102/00346543067001088

Khishfe R (2012) Nature of science and decision-making. Int J Sci Educ 34:67–100. https://doi.org/10.1080/09500693.2011.559490

Khishfe R, Alshaya FS, BouJaoude S, Mansour N, Alrudiyan KI (2017) Students’ understandings of nature of science and their arguments in the context of four socio-scientific issues. Int J Sci Educ 39:299–334

Kolstø SD (2001) Scientific literacy for citizenship: Tools for dealing with the science dimension of controversial socio-scientific issues. Sci Educ 85:291–310

Koray Ö, Köksal MS (2009) The effect of creative and critical thinking based laboratory applications on creative logical thinking abilities of prospective teachers. Asia-Pacific Forum Sci Learn Teach 10, Article 2. https://www.eduhk.hk/apfslt/download/v10_issue1_files/koksal.pdf

Kreitchmann RS, Abad FJ, Ponsoda V, Nieto MD, Morillo D (2019) Controlling for response biases in self-report scales: Forced-choice vs psychometric modeling of Likert items. Front Psychol. https://doi.org/10.3389/fpsyg.2019.02309

Kuhn D (2012) Enseñar a pensar [Education for thinking]. Amorrortu, Argentina

Lederman NG (2007) Nature of science: past, present, and future. In: Abell SK, Lederman NG (eds) Handbook of research on science education. Lawrence Erlbaum Associates, USA, pp 831–879

Lederman NG, Wade PD, Bell RL (1998) Assessing understanding of the NOS: a historical perspective. In: McComas WF (ed) The NOS in science education: rationales and strategies. Kluwer, The Netherlands, pp 331–350

Liang LL, Chen S, Chen X, Kaya ON, Adams AD, Macklin M, Ebenezer J (2008) Assessing preservice elementary teachers’ views on the nature of scientific knowledge: a dual-response instrument. Asia- Pacific Forum Sci Learn Teach 9(1). http://www.ied.edu.hk/apfslt/v9_issue1/liang/index.htm

Madison J (2004) James Madison Critical Thinking Course. The Critical Thinking Co. https://www.criticalthinking.com/james-madison-critical-thinking-course.html

Manassero MA, Vázquez A, Acevedo JA (2003) Cuestionario de opiniones sobre ciencia, tecnologia y sociedad (COCTS) [Questionnaire of opinions on science, technology and society]. Educational Testing Service. https://store.ets.org/store/ets/en_US/pd/ThemeID.12805600/productID.39407800

Manassero-Mas MA, Vázquez-Alonso A (2019) Conceptualization and taxonomy to structure knowledge about science. Revista Eureka sobre Enseñanza y Divulgación de las Ciencias 16(3):3104. https://doi.org/10.25267/Rev_Eureka_ensen_divulg_cienc.2019.v16.i3.3104

Manassero-Mas M, Vázquez-Alonso Á (2020a) Scientific thinking and critical thinking: transversal competences for learning. Indag Didact 12(4):401–420. https://doi.org/10.34624/id.v12i4.21808

Manassero-Mas MA, Vásquez-Alonso Á (2020b) Assessment of critical thinking skills: validation of free-culture tools. Tecné, Epistemé y Didaxis, 47:15–32. https://doi.org/10.17227/ted.num47-9801

Mason L, Scirica F (2006) Prediction of students’ argumentation skills about controversial topics by epistemological understanding. Learn Instr 16:492–509. https://doi.org/10.1016/j.learninstruc.2006.09.007

Matthews MR (2012) Changing the focus: From nature of science (NOS) to features of science (FOS). In: Khine MS (ed) Advances in nature of science research Concepts and methodologies. Springer, Berlin, pp 3–26

McComas WF (1996) Ten myths of science: reexamining what we think we know about the NOS. Sch Sci Math 96:10–16

McDonald CV, McRobbie CJ (2012) Utilising argumentation to teach NOS. In: Fraser BJ, Tobin KG, McRobbie CJ (eds) Second international handbook of science education. Springer, Berlin, pp 969–986

Merton RK (1968) Social theory and social structure. Simon and Schuster, New York

National Research Council (2012) Education for life and work: Developing transferable knowledge and skills in the 21st century. The National Academies Press, USA

Noroozi O (2016) Considering students’ epistemic beliefs to facilitate their argumentative discourse and attitudinal change with a digital dialogue game. Innov Educ Teach Int 55(3):357–365. https://doi.org/10.1080/14703297.2016.1208112

O’Brien TC, Palmer R, Albarracin D (2021) Misplaced trust: When trust in science fosters belief in pseudoscience and the benefits of critical evaluation. J Exp Soc Psychol 96:104184. https://doi.org/10.1016/J.JESP.2021.104184

Olson JK (2018) The inclusion of the NOS in nine recent international science education standards documents. Sci Educ 27:637–660. https://doi.org/10.1007/s11191-018-9993-8

Osborne J (2014) Teaching critical thinking? New directions in science education. Sch Sci Rev 95:53–62

Paul R, Elder L (2008) The miniature guide to critical thinking: concepts and tools (5th ed.). Foundation for Critical Thinking Press

Rapanta C, Garcia-Mila M, Gilabert S (2013) What is meant by argumentative competence? an integrative review of methods of analysis and assessment in education. Rev Educ Res 83:483–520

Rubba PA, Schoneweg CS, Harkness WL (1996) A new scoring procedure for the views on science-technology-society instrument. Int J Sci Educ 18(4):387–400. https://doi.org/10.1080/0950069960180401

Santos LF (2017) The role of critical thinking in science education. J Educ Pract 8:159–173

Settlage J, Southerland SA (2020) Epistemic tools for science classrooms: the continual need to accommodate and adapt. Sci Educ 103(4):1112–1119. https://doi.org/10.1002/sce.21510

Simonneaux L (2014) From promoting the techno-sciences to activism – A variety of objectives involved in the teaching of SSIS. In: Bencze L, Alsop S (eds) Activist science and technology education. Springer, Berlin, pp 99–112

Sinatra GM, Southerland SA, McConaughy F, Demastes JW (2003) Intentions and beliefs in students’ understanding and acceptance of biological evolution. J Res Sci Teach 40:510–528. https://doi.org/10.1002/tea.10087

Sprod T (2014) Philosophical Inquiry and Critical Thinking in Science Education. In: Matthews MR (ed) International Handbook of Research in History, Philosophy and Science Teaching. Springer, Berlin, pp 1531–1564

Stathopoulou C, Vosnidou S (2007) Conceptual change in physics and physics-related epistemological beliefs: A relationship under scrutiny. In: Vosnidou S, Baltas A, Vamvakoussi X (eds) Re-framing the problem of conceptual change in learning and instruction. Elsevier, Amsterdam, pp 145–163

Suárez-Alvarez J, Pedrosa I, Lozano LM, García-Cueto E, Cuesta M, Muñiz J (2018) Using reversed items in likert scales: a questionable practice. Psicothema 30:149–158. https://doi.org/10.7334/psicothema2018.33

Torres N, Solbes J (2016) Contributions of a teaching intervention using socio-scientific issues to develop critical thinking. Enseñanza De Las Ciencias 34:43–65. https://doi.org/10.5565/rev/ensciencias.1638

U.S. Department of Labor Employment and Training Administration (1999). Understanding test quality-concepts of reliability and validity . https://hr-guide.com/Testing_and_Assessment/Reliability_and_Validity.htm

Vázquez-Alonso Á, Manassero-Mas MA (2018) Beyond science understanding: science education to develop thinking. Revista Electrónica de Enseñanza de las Ciencias 17:309–336. http://www.saum.uvigo.es/reec

Vázquez A, Manassero MA, Acevedo JA (2006) An analysis of complex multiple-choice science-technology-society items: Methodological development and preliminary results. Sci Educ 90: 681–706

Vergara AI, Balluerka N (2000) Methodology in cross-cultural research: current perspectives. Psicothema 12:557–562

Vesterinen VM, Manassero-Mas MA, Vázquez-Alonso Á (2014) History, philosophy, and sociology of science and science-technology-society traditions in science education: continuities and discontinuities. In: Matthews MR (ed) International Handbook of Research in History, Philosophy and Science Teaching. Springer, Berlin, pp 1895–1925

Vieira RM, Tenreiro-Vieira C, Martins IP (2011) Critical thinking: Conceptual clarification and its importance in science education. Sci Educ Int 22:43–54

Watson G, Glaser EM (2002) Watson-Glaser Critical Thinking Appraisal-II Form E. Pearson, London

Weinstock MP (2006) Psychological research and the epistemological approach to argumentation. Informal Logic 26:103–120

Yacoubian HA (2015) A framework for guiding future citizens to think critically about NOS and socioscientific issues. Can J Sci Math Technol Educ 15:248–260

Yacoubian HA, Khishfe R (2018) Argumentation, critical thinking, NOS and socioscientific issues: a dialogue between two researchers. Int J Sci Educ 40:796–807

Yang FY, Tsai CC (2010) Reasoning on the science-related uncertain issues and epistemological perspectives among children. Instr Sci 38:325–354

Yang FY, Tsai CC (2012) Personal epistemology and science learning: A review of studies. In: Fraser BJ, Tobin KG, McRobbie CJ (eds) Second international handbook of science education. Springer, Berlin, pp 259–280

Yang F-Y, Bhagat KK, Cheng C-H (2019) Associations of epistemic beliefs in science and scientific reasoning in university students from Taiwan and India. Int J Sci Educ 41:1347–1365. https://doi.org/10.1080/09500693.2019.1606960

Zeidler DL, Walker KA, Ackett WA, Simmons ML (2002) Tangled up in views: beliefs in the NOS and responses to socioscientific dilemmas. Sci Educ 86:343–367

Zeineddin A, Abd-El-Khalick F (2010) Scientific reasoning and epistemological commitments: coordination of theory and evidence among college science students. J Res Sci Teach 47:1064–1093. https://doi.org/10.1002/tea.20368

Acknowledgments

Grant EDU2015-64642-R of the Spanish State Research Agency and the European Regional Development Fund, European Union.

Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This study is part of a research project funded by Grant No EDU2015-64642-R of the Spanish State Research Agency and the European Regional Development Fund, European Union.

Author information

Authors and Affiliations

Department of Psychology, University of the Balearic Islands, Palma, Spain

María Antonia Manassero-Mas

Centre for Postgraduate Studies, University of the Balearic Islands, Edificio Guillem Cifre de Colonya, Carretera de Valldemossa, Km. 7.5, 07122, Palma, Spain

Ángel Vázquez-Alonso

Contributions

Both authors declare their contribution to this study, their agreement with the content, their explicit consent to submit and that they obtained consent from the responsible authorities at the organization where the work has been carried out before the work was submitted. All authors contributed to the study conception and design, material preparation, data collection and analysis of the first draft of the manuscript, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Ángel Vázquez-Alonso .

Ethics declarations

Conflict of interest

The authors have no conflicts of interest or competing interests to declare regarding this article.

Ethical approval

This study was performed in accordance with the Declaration of Helsinki, and the whole research project was approved by the Ethics Committee of the University of the Balearic Islands. Participants’ informed consent was deemed unnecessary because only the participants’ teachers developed the tasks involved in the study, as ordinary classroom learning tasks, without any intervention by the researchers. This manuscript is original, has not been published elsewhere and has not been submitted simultaneously to any other journal for consideration.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Manassero-Mas, M.A., Vázquez-Alonso, Á. An empirical analysis of the relationship between nature of science and critical thinking through science definitions and thinking skills. SN Soc Sci 2, 270 (2022). https://doi.org/10.1007/s43545-022-00546-x

Received: 11 December 2021

Accepted: 10 October 2022

Published: 08 December 2022


  • Nature of science
  • Critical thinking skills
  • Scientific literacy
  • Assessment of thinking skills
  • Epistemic assessment

Mona S. Weissmark Ph.D.

The Power of Scientific Thinking in a Polarized World

Science is not just about facts; it's a way of thinking and interacting.

Posted March 20, 2023 | Reviewed by Davia Sills

  • In today’s polarized society, conversations on so many topics often end up being debates, arguments, and politicized.
  • By contrast, scientific thinking is designed to facilitate conversations on contentious topics to foster understanding.
  • Students learn that the scientific thinking approach to contentious topics is a prerequisite for productive conversations.
  • The scientific thinking approach transforms polarizing views on diversity and social justice issues into mutual exploration and open dialogue.

The first woman scientist to win the Nobel Prize, Marie Curie, lived by the credo that scientific thinking could be of great value to society. She stressed that science education was the key to developing people's moral and intellectual strengths and that this would lead to a better national situation.

Curie and other Nobel Prize-winning scientists, such as Albert Einstein and Richard Feynman, held similar views on the value of science education to society and were concerned about the rise in militarism, fascism, and authoritarianism in World War II. They witnessed ethnic and national fanaticism bubbling around them and habits of thought familiar from ages past, reaching for control of people's minds. Because of such concerns, they stressed the humanizing power of scientific thinking and its vital role in a democratic society (Feynman, 1999).

Today, we are once again witnessing worries about the rise of militarism, fascism, and authoritarianism. National security experts have testified on Capitol Hill about the rise of authoritarianism to warn lawmakers that in some countries, leaders are seeking to gain power by undermining democratic systems.

Many scientists have been galvanized by what has been dubbed the "post-truth" era to speak out on the importance of science. Oxford Dictionaries defines "post-truth" as "Relating to or denoting circumstances in which objective facts are less influential in shaping public opinion than appeals to emotion and personal belief" (Flood, 2016).

To counteract the effects of post-truth speech, disinformation, and misinformation, scientists are now urged to be trained in communication skills to convey public trust in science, rationality, and objective facts rather than appeals to emotions and beliefs. For the past few years, The National Academy of Sciences has held a series of colloquia in an effort to identify ways that might help scientists communicate more effectively with the public.

Most efforts to date have focused on improving the content, accessibility, and delivery of scientific communications. This communication skills approach relies on a "knowledge deficit model," the idea that the lack of support for science and "good policies" merely reflects a lack of scientific information.

Though it is important to supply the public with more scientific information on a range of topics, some evidence suggests that efforts to persuade the public often fail. For instance, communication strategies on vaccines and G.M.O.s intended to persuade the public that their religious or personal beliefs do not align with scientific facts often wind up backfiring. If people feel forced to accept scientific information, they may do the opposite to assert their autonomy (Weissmark, 2018a).

However, when areas of science are contentious, a missing fact is not the sole core of the problem. What is equally or more important is teaching the public to think scientifically (Weissmark, 2018b).

What is scientific thinking?

For nearly 20 years at Harvard, I have been teaching a course on advanced research methods and on the psychology of diversity, and, with my team of researchers, have been conducting research on the Science of Diversity using the scientific thinking approach. I have seen how attainable this skill is. Yet, when I first began to research scientific thinking, I was surprised by the striking gap between science education and scientific thinking education.

So, how is it different from everyday thinking?

Foremost, scientific thinking does not rely on armchair theorizing, political conviction, or personal opinion but on methods of empirical research (external observations) independently available to anyone as a means for opening up the world to scrutiny. All opinions are hypotheses to be tested empirically rather than appeals to emotion.

For instance, recent high-profile police shooting deaths of black men and women have raised contentious questions about the extent to which law enforcement officers are affected by racial biases. Some people think there are racial differences in police use of force due to racial bias and discrimination, whereas others think this can be explained by other factors. Researchers test these as working hypotheses.

Second, there is a feature of scientific thinking that is often not talked about explicitly. We might term this feature scientific integrity, or honesty, and we usually just hope that students will catch on by example. When conducting a study, researchers are expected to report everything that might make it invalid or unreliable and to give alternative interpretations of the data. Scientific thinking requires reporting the specific details that could cast doubt on those interpretations and what could potentially be wrong with the conclusions.

For instance, if a researcher is reporting on police shootings and claims that studies have shown no racial bias in shootings and minorities are not in mortal danger from racist police, such a report would be incomplete because other studies have come to the opposite conclusion.

To encourage scientific thinking, the researcher would ask the public to consider the question: Why did studies on racial bias in police shootings reach such different conclusions?

This requires an awareness of the conflicting findings and reasons for the conflict. Guiding policy or activism by citing one-sided facts in support of an opinion without reference to conflicting data would then come to be seen as suspicious by the public (Weissmark, 2018b).

Third, scientific thinking considers all the facts and information to help others evaluate the value of the research, not simply the information that persuades judgment in one specific way. This approach encourages scientists to examine their assumptions and to be honest with themselves.

For instance, if I reported only on the studies showing that there is no evidence of racial bias in police shootings, why did I do so? We might term this principle of scientific thinking self-awareness: the ability to see our intentions and ourselves clearly.

Fourth, scientific thinking always remains tentative and refutable, subject to possible disconfirmation. These limitations of scientific thinking make us mindful of the errors in research—and of the limitations of all human understanding.

Fifth, all scientific thinking is subject to error. It is better to study the causes and assess the importance of potential errors than to remain unaware of the errors concealed in the data and in the mind of the scientist.
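One concrete source of such error is chance. Even when two groups are drawn from the very same population, a predictable fraction of studies will report a "significant" difference. The simulation below is a minimal sketch of this, assuming normally distributed measurements with known spread and the conventional 0.05 threshold; all names and numbers are illustrative:

```python
import random

random.seed(42)

def simulate_false_positives(n_studies=2000, n=50, crit_z=1.96):
    """Simulate studies comparing two groups drawn from the SAME population,
    so any 'significant' difference is a chance finding."""
    false_positives = 0
    for _ in range(n_studies):
        a = [random.gauss(0, 1) for _ in range(n)]
        b = [random.gauss(0, 1) for _ in range(n)]
        mean_diff = sum(a) / n - sum(b) / n
        se = (2 / n) ** 0.5          # standard error, known sigma = 1
        if abs(mean_diff / se) > crit_z:
            false_positives += 1
    return false_positives / n_studies

rate = simulate_false_positives()
print(f"false-positive rate: {rate:.3f}")  # typically close to 0.05
```

Roughly one study in twenty comes out "significant" with no real effect present, which is exactly why a single striking finding warrants tentative language rather than a final conclusion.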

Sixth, scientific thinking takes discipline and diligence. Thinking like a scientist keeps us open to new ideas and questions before we reach conclusions. The scientific method encourages us to change our minds when the data suggest we should, and to be persistent in studying the question again. When the results are not what we expected, we are pressed to find out why and to devise a better approach.

In today's polarized society, conversations on many topics devolve into debates, arguments, and politicized exchanges. By contrast, scientific thinking is designed to facilitate conversations on contentious topics between divergent viewpoints and to foster understanding.

Data from many years of our course evaluations show that facilitated conversations using scientific thinking may have a transformational impact on people's lives.

When conversations in our classes on diversity get bogged down by opinions, we remind our students: "Let's use our scientific thinking life raft." It is an apt analogy. A raft can keep us from sinking and becoming stuck in the workings of our own minds.

The list below highlights the differences between these two discourse approaches, rational debate and scientific thinking. Scientific thinking can transform one-dimensional, polarizing views into mutual understanding and open dialogue.

Rational Debating Discourse Versus Scientific Thinking Discourse

  • Argument versus hypothetical
  • Convince and persuade versus consider and investigate
  • Win/lose versus open to being wrong (chance findings)
  • Cherry-picking the data versus considering all the evidence
  • Eliminating contradictions versus calculating the contradictions (meta-analysis)
  • Final conclusion versus provisional
  • Claim versus suggest
  • Exaggerating versus citing the limitations
  • Personal viewpoint versus impersonal hypothesis
  • Belief versus doubt
  • Convinced versus skeptical
  • Seeking to prove a theory/belief/view versus disprove the null hypothesis
  • Come to the right "conclusion" versus do not jump to conclusions
  • One view/belief to prove versus holding that all hypothetical views have an equal value
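The "calculating the contradictions (meta-analysis)" entry above can be sketched numerically. The example below pools three hypothetical, partly conflicting effect sizes using inverse-variance (fixed-effect) weights; every number here is invented for illustration:

```python
import math

def fixed_effect_meta(effects, ses):
    """Inverse-variance-weighted pooled effect and its standard error."""
    weights = [1 / se ** 2 for se in ses]   # precise studies count for more
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se

# Hypothetical effect sizes from three conflicting studies
effects = [0.40, -0.10, 0.15]   # two positive findings, one negative
ses     = [0.20, 0.15, 0.10]    # standard errors of those estimates
est, se = fixed_effect_meta(effects, ses)
lo, hi = est - 1.96 * se, est + 1.96 * se
print(f"pooled effect = {est:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
# prints: pooled effect = 0.12, 95% CI [-0.03, 0.27]
```

Rather than cherry-picking one study or "eliminating" the contradictions, the conflicting estimates are weighed together; here the pooled interval straddles zero, which is itself informative.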

In conclusion, it is a universal truth that diversity is a feature of nature. This is true of individuals, families, social classes, religious groups, ethnic groups, and nations. There will always be diverse, polarized opinions with which people passionately identify.

Scientific thinking is a fair, two-sided method for evaluating diverse views, fake news, misinformation, and disinformation, and for engaging citizens in civic conversations that advance collective understanding. If the purpose of education is to increase our knowledge so we can get closer to the objective truth, then scientific thinking is a valuable tool.

Feynman, R. P. (1999). The pleasure of finding things out. Perseus Books Group.

Flood, A. (2016, November 15). ‘Post-truth’ named word of the year by Oxford Dictionaries. The Guardian. https://www.theguardian.com/books/2016/nov/15/post-truth-named-word-of-the-year-by-oxford-dictionaries

Weissmark, M. S. (2018a, February 7). Outlawing bias is doomed to fail. Psychology Today. https://www.psychologytoday.com/us/blog/justice-matters/201802/outlawing-bias-is-doomed-fail

Weissmark, M. S. (2018b, August 8). Evaluating psychology research. Psychology Today. https://www.psychologytoday.com/us/blog/justice-matters/201808/evaluating-psychology-research

Mona S. Weissmark Ph.D.

Mona Sue Weissmark, Ph.D., is a psychology professor and founder of the Program Initiative for Global Mental Health Studies at Northwestern University.
