Persuasive Writing In Three Steps: Thesis, Antithesis, Synthesis
April 14, 2021 by Ryan Law
Great writing persuades. It persuades the reader that your product is right for them, that your process achieves the outcome they desire, that your opinion supersedes all other opinions.
But spend an hour clicking around the internet and you’ll quickly realise that most content is passive, presenting facts and ideas without context or structure. The reader must connect the dots and create a convincing argument from the raw material presented to them. They rarely do, and for good reason: It’s hard work. The onus of persuasion falls on the writer, not the reader.
Persuasive communication is a timeless challenge with an ancient solution. Zeno of Elea cracked it in the 5th century B.C. Georg Hegel gave it a lick of paint in the 1800s. You can apply it to your writing in three simple steps: thesis, antithesis, synthesis.
Use Dialectic to Find Logical Bedrock
“Dialectic” is a complicated-sounding idea with a simple meaning: It’s a structured process for taking two seemingly contradictory viewpoints and, through reasoned discussion, reaching a satisfactory conclusion. Over centuries of use the term has been burdened with the baggage of philosophy and academia. But at its heart, dialectics reflects a process similar to every spirited conversation or debate humans have ever had:
- Person A presents an idea: “We should travel to the Eastern waterhole because it’s closest to camp.”
- Person B disagrees and shares a counterargument: “I saw wolf prints on the Eastern trail, so we should go to the Western waterhole instead.”
- Person A responds to the counterargument, either disproving it or modifying their own stance to accommodate the criticism: “I saw those same wolf prints, but our party is large enough that the wolves won’t risk an attack.”
- Person B responds in a similar vein: “Ordinarily that would be true, but half of our party had dysentery last week so we’re not at full strength.”
- Person A responds: “They got dysentery from drinking at the Western waterhole.”
This process continues until conversational bedrock is reached: an idea that both parties understand and agree to, helped by the fact they’ve both been a part of the process that shaped it.
Dialectic is intended to help draw closer to the “truth” of an argument, tempering any viewpoint by working through and resolving its flaws. This same process can also be used to persuade.
Create Inevitability with Thesis, Antithesis, Synthesis
The philosopher Georg Hegel is most famous for popularizing a type of dialectics that is particularly well-suited to writing: thesis, antithesis, synthesis (also known, unsurprisingly, as Hegelian Dialectic).
- Thesis: Present the status quo, the viewpoint that is currently accepted and widely held.
- Antithesis: Articulate the problems with the thesis. (Hegel also called this phase “the negative.”)
- Synthesis: Share a new viewpoint (a modified thesis) that resolves the problems.
Hegel’s method focused less on the search for absolute truth and more on replacing old ideas with newer, more sophisticated versions. That, in a nutshell, is the same objective as much of content marketing (and particularly thought leadership content): We’re persuading the reader that our product, processes, and ideas are better and more useful than the “old” way of doing things. Thesis, antithesis, synthesis (or TAS) is a persuasive writing structure because it:
- Reduces complex arguments into a simple three-act structure. Complicated, nuanced arguments are simplified into a clear, concise format that anyone can follow. This simplification reflects well on the author: It takes mastery of a topic to explain it in the simplest terms.
- Presents a balanced argument by “steelmanning” the best objection. Strong, one-sided arguments can trigger reactance in the reader: They don’t want to feel duped. TAS gives voice to their doubts, addressing their best objection and “giv[ing] readers the chance to entertain the other side, making them feel as though they have come to an objective conclusion.”
- Creates a sense of inevitability. Like a story building to a satisfying conclusion, articles written with TAS take the reader on a structured, logical journey that culminates in precisely the viewpoint we wish to advocate for. Doubts are voiced, ideas challenged, and the conclusion reached feels more valid and concrete as a result.
There are two main ways to apply TAS to your writing: Use it to beef up your introductions, or apply it to your article’s entire structure.
Writing Article Introductions with TAS
Take a moment to scroll back to the top of this article. If I’ve done my job correctly, you’ll notice a now familiar formula staring back at you: The first three paragraphs are built around Hegel’s thesis, antithesis, synthesis structure. Here’s what the introduction looked like during the outlining process. The first paragraph shares the thesis, the accepted idea that great writing should be persuasive:
Next up, the antithesis introduces a complicating idea, explaining why most content marketing isn’t all that persuasive:
Finally, the synthesis shares a new idea that serves to reconcile the two previous paragraphs: Content can be made persuasive by using the thesis, antithesis, synthesis framework. The meat of the article is then focused on the nitty-gritty of the synthesis.
Introductions are hard, but thesis, antithesis, synthesis offers a simple way to write consistently persuasive opening copy. In the space of three short paragraphs, the article’s key ideas are shared, the entire argument is summarised, and, hopefully, the reader is hooked.
Best of all, most articles (whether how-to’s, thought leadership content, or even list content) can benefit from Hegelian Dialectic, for the simple reason that every article introduction should be persuasive enough to encourage the reader to stick around.
Structuring Entire Articles with TAS
Harder, but most persuasive, is to use thesis, antithesis, synthesis to structure your entire article. This works best for thought leadership content. Here, your primary objective is to advocate for a new idea and disprove the old, tired way of thinking—exactly the use case Hegel intended for his dialectic. It’s less useful for content that explores and illustrates a process, because the primary objective is to show the reader how to do something (like this article—otherwise, I would have written the whole darn thing using the framework). Arjun Sethi’s article The Hive is the New Network is a great example.
The article’s primary purpose is to explain why the “old” model of social networks is outmoded and offer a newer, better framework. (It would be equally valid, but less punchy, to publish this with the title “Why the Hive is the New Network.”) The thesis, antithesis, synthesis structure shapes the entire article:
- Thesis: Facebook, Twitter, and Instagram grew by creating networks “that brought existing real-world relationships online.”
- Antithesis: The more these networks grow, the less useful they become, skewing towards bots, “celebrity, meme and business accounts.”
- Synthesis: To survive continued growth, these networks need to embrace a new structure and become hives.
With the argument established, the vast majority of the article is focused on synthesis. After all, it requires little elaboration to share the status quo in a particular situation, and it’s relatively easy to point out the problems with a given idea. The synthesis—the solution that needs to reconcile both thesis and antithesis—is the hardest part to tackle and requires the greatest word count. Throughout the article, Arjun is systematically addressing the “best objections” to his theory and demonstrating why the “Hive” is the best solution:
- Antithesis: Why now? Why didn’t Hives emerge in the first place?
- Thesis: We were limited by technology, but today, we have the necessary infrastructure: “We’re no longer limited to a broadcast radio model, where one signal is received by many nodes. ...We sync with each other instantaneously, and all the time.”
- Antithesis: If the Hive is so smart, why aren’t our brightest and best companies already embracing it?
- Thesis: They are, and autonomous cars are a perfect example: “Why are all these vastly different companies converging on the autonomous car? That’s because for these companies, it’s about platform and hive, not just about roads without drivers.”
It takes bravery to tackle objections head-on and an innate understanding of the subject matter to even identify objections in the first place, but the effort is worthwhile. The end result is a structured journey through the arguments for and against the “Hive,” with the reader eventually reaching the same conclusion as the author: that “Hives” are superior to traditional networks.
Destination: Persuasion
Persuasion isn’t about cajoling or coercing the reader. Statistics and anecdotes alone aren’t all that persuasive. Simply sharing a new idea and hoping that it will trigger an about-turn in the reader’s beliefs is wishful thinking. Instead, you should take the reader on a journey: the same journey you travelled to arrive at your newfound beliefs, whether it’s about the superiority of your product or the zeitgeist-changing trend that’s about to break. Hegelian Dialectic (thesis, antithesis, synthesis) is a structured process for doing precisely that. It contextualises your ideas and explains why they matter. It challenges the idea and strengthens it in the process. Using centuries-old processes, it nudges the 21st-century reader onto a well-worn path that takes them exactly where they need to go.
Ryan is the Content Director at Ahrefs and former CMO of Animalz.
- Scholarly Community Encyclopedia
Version | Summary | Created by | Modification | Content Size | Created at | Operation |
---|---|---|---|---|---|---|
1 | handwiki | | -- | 1044 | 2022-10-31 01:38:44 | |
In philosophy, the triad of thesis, antithesis, synthesis (German: These, Antithese, Synthese; originally: Thesis, Antithesis, Synthesis) is a progression of three ideas or propositions. The first idea, the thesis, is a formal statement illustrating a point; it is followed by the second idea, the antithesis, that contradicts or negates the first; and lastly, the third idea, the synthesis, resolves the conflict between the thesis and antithesis. It is often used to explain the dialectical method of German philosopher Georg Wilhelm Friedrich Hegel, but Hegel never used the terms himself; instead his triad was concrete, abstract, absolute. The thesis, antithesis, synthesis triad actually originated with Johann Fichte.
1. History of the Idea
Thomas McFarland (2002), in his Prolegomena to Coleridge's Opus Maximum,[1] identifies Immanuel Kant's Critique of Pure Reason (1781) as the genesis of the thesis/antithesis dyad. Kant concretises his ideas into:
- Thesis: "The world has a beginning in time, and is limited with regard to space."
- Antithesis: "The world has no beginning and no limits in space, but is infinite, in respect to both time and space."
Inasmuch as conjectures like these can be said to be resolvable, Fichte's Grundlage der gesamten Wissenschaftslehre (Foundations of the Science of Knowledge, 1794) resolved Kant's dyad by synthesis, posing the question thus:[1]
- No synthesis is possible without a preceding antithesis. As little as antithesis without synthesis, or synthesis without antithesis, is possible; just as little possible are both without thesis.
Fichte employed the triadic idea "thesis–antithesis–synthesis" as a formula for the explanation of change.[2] Fichte was the first to use the trilogy of words together,[3] in his Grundriss des Eigentümlichen der Wissenschaftslehre, in Rücksicht auf das theoretische Vermögen (1795, Outline of the Distinctive Character of the Wissenschaftslehre with respect to the Theoretical Faculty): "Die jetzt aufgezeigte Handlung ist thetisch, antithetisch und synthetisch zugleich." ["The action here described is simultaneously thetic, antithetic, and synthetic."[4]]
Still according to McFarland, Schelling then, in his Vom Ich als Prinzip der Philosophie (1795), arranged the terms schematically in pyramidal form.
According to Walter Kaufmann (1966), although the triad is often thought to form part of an analysis of historical and philosophical progress called the Hegelian dialectic, the assumption is erroneous:[5]
Whoever looks for the stereotype of the allegedly Hegelian dialectic in Hegel's Phenomenology will not find it. What one does find on looking at the table of contents is a very decided preference for triadic arrangements. ... But these many triads are not presented or deduced by Hegel as so many theses, antitheses, and syntheses. It is not by means of any dialectic of that sort that his thought moves up the ladder to absolute knowledge.
Gustav E. Mueller (1958) concurs that Hegel was not a proponent of thesis, antithesis, and synthesis, and clarifies what the concept of dialectic might have meant in Hegel's thought.[6]
"Dialectic" does not for Hegel mean "thesis, antithesis, and synthesis." Dialectic means that any "ism" – which has a polar opposite, or is a special viewpoint leaving "the rest" to itself – must be criticized by the logic of philosophical thought, whose problem is reality as such, the "World-itself".
According to Mueller, the attribution of this tripartite dialectic to Hegel is the result of "inept reading" and simplistic translations which do not take into account the genesis of Hegel's terms:
Hegel's greatness is as indisputable as his obscurity. The matter is due to his peculiar terminology and style; they are undoubtedly involved and complicated, and seem excessively abstract. These linguistic troubles, in turn, have given rise to legends which are like perverse and magic spectacles – once you wear them, the text simply vanishes. Theodor Haering's monumental and standard work has for the first time cleared up the linguistic problem. By carefully analyzing every sentence from his early writings, which were published only in this century, he has shown how Hegel's terminology evolved – though it was complete when he began to publish. Hegel's contemporaries were immediately baffled, because what was clear to him was not clear to his readers, who were not initiated into the genesis of his terms. An example of how a legend can grow on inept reading is this: Translate "Begriff" by "concept," "Vernunft" by "reason" and "Wissenschaft" by "science" – and they are all good dictionary translations – and you have transformed the great critic of rationalism and irrationalism into a ridiculous champion of an absurd pan-logistic rationalism and scientism. The most vexing and devastating Hegel legend is that everything is thought in "thesis, antithesis, and synthesis."[7]
Karl Marx (1818–1883) and Friedrich Engels (1820–1895) adopted and extended the triad, especially in Marx's The Poverty of Philosophy (1847). Here, in Chapter 2, Marx is obsessed by the word "thesis";[8] it forms an important part of the basis for the Marxist theory of history.[9]
2. Writing Pedagogy
In modern times, the dialectic of thesis, antithesis, and synthesis has been implemented across the world as a strategy for organizing expositional writing. For example, this technique is taught as a basic organizing principle in French schools:[10]
The French learn to value and practice eloquence from a young age. Almost from day one, students are taught to produce plans for their compositions, and are graded on them. The structures change with fashions. Youngsters were once taught to express a progression of ideas. Now they follow a dialectic model of thesis-antithesis-synthesis. If you listen carefully to the French arguing about any topic they all follow this model closely: they present an idea, explain possible objections to it, and then sum up their conclusions. ... This analytical mode of reasoning is integrated into the entire school corpus.
Thesis, Antithesis, and Synthesis has also been used as a basic scheme to organize writing in the English language. For example, the website WikiPreMed.com advocates the use of this scheme in writing timed essays for the MCAT standardized test:[11]
For the purposes of writing MCAT essays, the dialectic describes the progression of ideas in a critical thought process that is the force driving your argument. A good dialectical progression propels your arguments in a way that is satisfying to the reader. The thesis is an intellectual proposition. The antithesis is a critical perspective on the thesis. The synthesis solves the conflict between the thesis and antithesis by reconciling their common truths, and forming a new proposition.
[1] Samuel Taylor Coleridge: Opus Maximum. Princeton University Press, 2002, p. 89.
[2] Harry Ritter, Dictionary of Concepts in History. Greenwood Publishing Group (1986), p. 114.
[3] Williams, Robert R. (1992). Recognition: Fichte and Hegel on the Other. SUNY Press. p. 46, note 37.
[4] Fichte, Johann Gottlieb; Breazeale, Daniel (1993). Fichte: Early Philosophical Writings. Cornell University Press. p. 249.
[5] Walter Kaufmann (1966). "§ 37". Hegel: A Reinterpretation. Anchor Books. ISBN 978-0-268-01068-3. OCLC 3168016. https://archive.org/details/hegelreinterpret00kauf.
[6] Mueller, Gustav (1958). "The Hegel Legend of 'Thesis-Antithesis-Synthesis'". Journal of the History of Ideas 19 (4): 411–414. doi:10.2307/2708045. https://dx.doi.org/10.2307%2F2708045
[7] Mueller 1958, p. 411.
[8] marxists.org: Chapter 2 of "The Poverty of Philosophy", by Karl Marx. https://www.marxists.org/archive/marx/works/1847/poverty-philosophy/ch02.htm
[9] Shrimp, Kaleb (2009). "The Validity of Karl Marx's Theory of Historical Materialism". Major Themes in Economics 11 (1): 35–56. https://scholarworks.uni.edu/mtie/vol11/iss1/5/. Retrieved 13 September 2018.
[10] Nadeau, Jean-Benoit; Barlow, Julie (2003). Sixty Million Frenchmen Can't Be Wrong: Why We Love France But Not The French. Sourcebooks, Inc. p. 62. https://archive.org/details/sixtymillionfren00nade_041.
[11] "The MCAT writing assignment". Wisebridge Learning Systems, LLC. http://www.wikipremed.com/mcat_essay.php. Retrieved 1 November 2015.
Literature Review Survival Library Guide: Thesis, antithesis and synthesis
Thesis, antithesis, synthesis
The classic pattern of academic arguments is:
Thesis, antithesis, synthesis.
An idea (thesis) is proposed, an opposing idea (antithesis) is proposed, and a revised idea that incorporates the opposing idea (synthesis) is arrived at. This revised idea sometimes sparks another opposing idea, another synthesis, and so on…
If you can show this pattern at work in your literature review, and, above all, if you can suggest a new synthesis of two opposing views, or demolish one of the opposing views, then you are almost certainly on the right track.
- Last Updated: Apr 2, 2024 12:22 PM
- URL: https://libguides.lib.uct.ac.za/litreviewsurvival
Hegel's Dialectic: A Comprehensive Overview
Georg Wilhelm Friedrich Hegel's dialectic is one of the most influential philosophical theories of the modern era. It has been studied and debated for some two centuries, and its influence can be seen in many aspects of modern thought. Hegel's dialectic has been used to explain a wide range of topics from politics to art, from science to religion. In this comprehensive overview, we will explore the major tenets of Hegel's dialectic and its implications for our understanding of the world.
Hegel's dialectic is based on the premise that all things contain an inherent contradiction between opposing forces. It follows that any idea or concept can be understood through a synthesis of the two opposing forces. This synthesis creates a new and higher understanding, which then leads to further progress and development. Hegel's dialectic has been used in many different fields, from philosophy to economics, and it provides an important framework for understanding how our world works.
In this article, we will explore the historical origins and development of Hegel's dialectic. We will also examine its application in various fields, from politics to art, from science to religion. Finally, we will consider the implications of Hegel's dialectic for our understanding of the world today.
Hegel's dialectic is a philosophical theory developed by German philosopher Georg Wilhelm Friedrich Hegel in the early 19th century. It is based on the concept of thesis, antithesis and synthesis, which are steps in the process of progress. The thesis is an idea or statement that is the starting point of an argument. The antithesis is a statement that contradicts or negates the thesis. The synthesis is a combination of the two opposing ideas, which produces a new idea or statement. This process can be repeated multiple times, leading to an evolution of ideas.
Hegel's dialectic has been used in many fields, such as politics and economics. It has been used to explain how ideas progress through debate and discussion. In politics, it has been used to explain how different points of view can lead to compromise or resolution. In economics, it has been used to explain how different economic theories can lead to new solutions and strategies. Hegel's dialectic can also be applied to everyday life. For example, it can be used to resolve conflicts between people or groups.
Thesis, Antithesis and Synthesis
Thesis and antithesis are two conflicting ideas, while synthesis is the result of their interaction. The dialectic process is a way of understanding how the world works, as it helps to explain the constant flux of ideas and events. It also helps to explain how change and progress are possible. Thesis and antithesis can be thought of as two sides of a coin. One side represents an idea or opinion, while the other side represents its opposite. When the two sides come together, they create a synthesis that incorporates both sides. This synthesis can then be used to create new ideas or opinions.
The dialectic process can be applied in various contexts, such as politics and economics. In politics, it can be used to explain how different factions come together to create policies that are beneficial to all parties. In economics, it can be used to explain how supply and demand interact to create a stable market. Hegel's dialectic can also be used in everyday life.
Applications of Hegel's Dialectic
In the political sphere, Hegel's dialectic can be used to explore how different ideologies can be reconciled or how compromises can be reached. In economics, Hegel's dialectic has been used to explain the process of economic growth and development. It can be seen as a way of understanding how different economic systems interact with each other and how different economic actors are affected by changes in the marketplace. For example, it can help to explain how different economic policies can lead to different outcomes. Hegel's dialectic has also been applied to other social sciences, such as sociology and anthropology. In particular, it has been used to explore how different social systems interact with each other and how different social groups are affected by changes in their environment.
Using Hegel's Dialectic in Everyday Life
The dialectic process can be used to explain how various aspects of life, such as career or relationships, evolve over time. Thesis represents an idea or concept, while antithesis represents the opposite of that idea or concept. Synthesis is the resolution between the two opposing forces. This process is repeated until a conclusion is reached.
For example, in a career conflict between two people, one might present an idea while the other presents the opposite idea. Through discussion and negotiation, the two parties can come to a synthesis that meets both their needs. Hegel's dialectic can also be used to resolve conflicts between groups of people. It involves each party presenting their ideas and opinions, then engaging in dialogue to reach a compromise or agreement.
This process can be applied to any area of life, from politics and economics to relationships and personal growth. It helps to create understanding and respect between different perspectives, allowing everyone to come together in a meaningful way. By understanding and applying Hegel's dialectic in everyday life, we can better navigate our relationships and interactions with others. Through dialogue, negotiation, and compromise we can work towards resolutions that benefit all parties involved.
In economics, the dialectic has been used to explain how market forces interact with each other and how different economic theories can be used to explain the same phenomenon. The dialectic has also been used in other fields such as philosophy, science, and psychology. In philosophy, it has been used to explain the relationship between theory and practice and how theories evolve over time. In science, it has been used to explain the relationship between empirical evidence and logical reasoning.
Hegel's dialectic can be applied to any area of life, from career to relationships. The core of Hegel's dialectic involves the concept of thesis, antithesis, and synthesis, which is a way of understanding how ideas evolve over time. In this way, the dialectic helps to identify contradictions in a situation and find a resolution through synthesis. In terms of its application to everyday life, the dialectic can be used to find common ground between two opposing sides. For example, if two people are in disagreement, the dialectic can help them identify the underlying issues and then work to resolve them.
Additionally, it can help individuals and groups identify areas where they have common interests, which can lead to more productive conversations and outcomes. The dialectic is also useful in understanding how different perspectives can lead to different solutions. By recognizing different points of view, individuals and groups can gain insight into why certain solutions may not work for everyone involved. This can help to create a more productive environment for collaboration. Finally, the dialectic can be used as a tool for self-reflection. By understanding how different ideas evolve over time and how different perspectives interact, individuals can gain insight into their own views and values.
In politics, for example, Hegel's dialectic can be used to explain the development of a new policy proposal or a new form of government. In economics, it can be used to explain the dynamics of supply and demand, or the emergence of a new economic system. Hegel's dialectic has also been applied in other areas, such as education and religion. In education, it can be used to explain the process of learning and understanding new concepts. In religion, it can be used to explain the evolution of religious beliefs and practices over time.
In Hegel's dialectic, a thesis is opposed by its antithesis. This is followed by a synthesis of the two, which creates a new, higher form of understanding. This new understanding then forms the basis for further analysis, which can lead to further synthesis and resolution. Hegel's dialectic can be applied to any area of life, such as career or relationships. For example, if two people have different approaches to a problem, they can use the dialectic to work together to find a solution that works for both of them.
This could involve identifying their respective points of view and then looking for common ground where they can agree. As the synthesis forms, it can provide a basis for further discussion, which may eventually lead to a resolution. The same process can be used to resolve conflicts between groups, such as political parties or countries. By recognizing each side's point of view and then looking for common ground, it is possible to find ways to bridge the divide between them.
This can help create an atmosphere of mutual understanding and respect, which can lead to constructive dialogue and positive outcomes. Hegel's dialectic is a valuable tool for helping people and groups come to agreement and harmony despite their differences. By recognizing both sides' points of view and then looking for common ground, it is possible to create a synthesis that can provide a basis for further discussion and resolution. Hegel's dialectic is a powerful philosophical tool that helps to explain how ideas evolve over time. Through the concept of thesis, antithesis and synthesis, it provides a framework for understanding how opposing forces interact and ultimately create new ideas and solutions.
This theory has been applied to many areas, such as politics and economics, and can be used in everyday life. The article has provided a comprehensive overview of Hegel's dialectic and its various applications.
Hegel’s Dialectics
“Dialectics” is a term used to describe a method of philosophical argument that involves some sort of contradictory process between opposing sides. In what is perhaps the most classic version of “dialectics”, the ancient Greek philosopher, Plato (see entry on Plato ), for instance, presented his philosophical argument as a back-and-forth dialogue or debate, generally between the character of Socrates, on one side, and some person or group of people to whom Socrates was talking (his interlocutors), on the other. In the course of the dialogues, Socrates’ interlocutors propose definitions of philosophical concepts or express views that Socrates challenges or opposes. The back-and-forth debate between opposing sides produces a kind of linear progression or evolution in philosophical views or positions: as the dialogues go along, Socrates’ interlocutors change or refine their views in response to Socrates’ challenges and come to adopt more sophisticated views. The back-and-forth dialectic between Socrates and his interlocutors thus becomes Plato’s way of arguing against the earlier, less sophisticated views or positions and for the more sophisticated ones later.
“Hegel’s dialectics” refers to the particular dialectical method of argument employed by the 19th Century German philosopher, G.W.F. Hegel (see entry on Hegel ), which, like other “dialectical” methods, relies on a contradictory process between opposing sides. Whereas Plato’s “opposing sides” were people (Socrates and his interlocutors), however, what the “opposing sides” are in Hegel’s work depends on the subject matter he discusses. In his work on logic, for instance, the “opposing sides” are different definitions of logical concepts that are opposed to one another. In the Phenomenology of Spirit , which presents Hegel’s epistemology or philosophy of knowledge, the “opposing sides” are different definitions of consciousness and of the object that consciousness is aware of or claims to know. As in Plato’s dialogues, a contradictory process between “opposing sides” in Hegel’s dialectics leads to a linear evolution or development from less sophisticated definitions or views to more sophisticated ones later. The dialectical process thus constitutes Hegel’s method for arguing against the earlier, less sophisticated definitions or views and for the more sophisticated ones later. Hegel regarded this dialectical method or “speculative mode of cognition” (PR §10) as the hallmark of his philosophy and used the same method in the Phenomenology of Spirit [PhG], as well as in all of the mature works he published later—the entire Encyclopaedia of Philosophical Sciences (including, as its first part, the “Lesser Logic” or the Encyclopaedia Logic [EL]), the Science of Logic [SL], and the Philosophy of Right [PR].
Note that, although Hegel acknowledged that his dialectical method was part of a philosophical tradition stretching back to Plato, he criticized Plato’s version of dialectics. He argued that Plato’s dialectics deals only with limited philosophical claims and is unable to get beyond skepticism or nothingness (SL-M 55–6; SL-dG 34–5; PR, Remark to §31). According to the logic of a traditional reductio ad absurdum argument, if the premises of an argument lead to a contradiction, we must conclude that the premises are false—which leaves us with no premises or with nothing. We must then wait around for new premises to spring up arbitrarily from somewhere else, and then see whether those new premises put us back into nothingness or emptiness once again, if they, too, lead to a contradiction. Because Hegel believed that reason necessarily generates contradictions, as we will see, he thought new premises will indeed produce further contradictions. As he puts the argument, then,
the scepticism that ends up with the bare abstraction of nothingness or emptiness cannot get any further from there, but must wait to see whether something new comes along and what it is, in order to throw it too into the same empty abyss. (PhG-M §79)
Hegel argues that, because Plato’s dialectics cannot get beyond arbitrariness and skepticism, it generates only approximate truths, and falls short of being a genuine science (SL-M 55–6; SL-dG 34–5; PR, Remark to §31; cf. EL Remark to §81). The following sections examine Hegel’s dialectics as well as these issues in more detail.
1. Hegel’s description of his dialectical method
2. Applying Hegel’s dialectical method to his arguments
3. Why does Hegel use dialectics?
4. Is Hegel’s dialectical method logical?
5. Syntactic patterns and special terminology in Hegel’s dialectics
- English translations of key texts by Hegel
- English translations of other primary sources
- Secondary literature
- Other Internet Resources
- Related Entries
Hegel provides the most extensive, general account of his dialectical method in Part I of his Encyclopaedia of Philosophical Sciences, which is often called the Encyclopaedia Logic [EL]. The form or presentation of logic, he says, has three sides or moments (EL §79). These sides are not parts of logic, but, rather, moments of “every concept”, as well as “of everything true in general” (EL Remark to §79; we will see why Hegel thought dialectics is in everything in section 3). The first moment, the moment of the understanding, is the moment of fixity, in which concepts or forms have a seemingly stable definition or determination (EL §80).
The second moment, the “dialectical” (EL §§79, 81) or “negatively rational” (EL §79) moment, is the moment of instability. In this moment, a one-sidedness or restrictedness (EL Remark to §81) in the determination from the moment of understanding comes to the fore, and the determination that was fixed in the first moment passes into its opposite (EL §81). Hegel describes this process as a process of “self-sublation” (EL §81). The English verb “to sublate” translates Hegel’s technical use of the German verb aufheben, which is a crucial concept in his dialectical method. Hegel says that aufheben has a doubled meaning: it means both to cancel (or negate) and to preserve at the same time (PhG §113; SL-M 107; SL-dG 81–2; cf. EL Addition to §95). The moment of understanding sublates itself because its own character or nature (its one-sidedness or restrictedness) destabilizes its definition and leads it to pass into its opposite. The dialectical moment thus involves a process of self-sublation, or a process in which the determination from the moment of understanding sublates itself, or both cancels and preserves itself, as it pushes on to or passes into its opposite.
The third moment, the “speculative” or “positively rational” (EL §§79, 82) moment, grasps the unity of the opposition between the first two determinations, or is the positive result of the dissolution or transition of those determinations (EL §82 and Remark to §82). Here, Hegel rejects the traditional, reductio ad absurdum argument, which says that when the premises of an argument lead to a contradiction, then the premises must be discarded altogether, leaving nothing. As Hegel suggests in the Phenomenology, such an argument
is just the skepticism which only ever sees pure nothingness in its result and abstracts from the fact that this nothingness is specifically the nothingness of that from which it results. (PhG-M §79)
Although the speculative moment negates the contradiction, it is a determinate or defined nothingness because it is the result of a specific process. There is something particular about the determination in the moment of understanding (a specific weakness, or some specific aspect that was ignored in its one-sidedness or restrictedness) that leads it to fall apart in the dialectical moment. The speculative moment has a definition, determination or content because it grows out of and unifies the particular character of those earlier determinations, or is “a unity of distinct determinations” (EL Remark to §82). The speculative moment is thus “truly not empty, abstract nothing, but the negation of certain determinations” (EL-GSH §82). When the result “is taken as the result of that from which it emerges”, Hegel says, then it is “in fact, the true result; in that case it is itself a determinate nothingness, one which has a content” (PhG-M §79). As he also puts it, “the result is conceived as it is in truth, namely, as a determinate negation [bestimmte Negation]; a new form has thereby immediately arisen” (PhG-M §79). Or, as he says, “[b]ecause the result, the negation, is a determinate negation [bestimmte Negation], it has a content” (SL-dG 33; cf. SL-M 54). Hegel’s claim in both the Phenomenology and the Science of Logic that his philosophy relies on a process of “determinate negation [bestimmte Negation]” has sometimes led scholars to describe his dialectics as a method or doctrine of “determinate negation” (see entry on Hegel, section on Science of Logic; cf. Rosen 1982: 30; Stewart 1996, 2000: 41–3; Winfield 1990: 56).
There are several features of this account that Hegel thinks raise his dialectical method above the arbitrariness of Plato’s dialectics to the level of a genuine science. First, because the determinations in the moment of understanding sublate themselves , Hegel’s dialectics does not require some new idea to show up arbitrarily. Instead, the movement to new determinations is driven by the nature of the earlier determinations and so “comes about on its own accord” (PhG-P §79). Indeed, for Hegel, the movement is driven by necessity (see, e.g., EL Remarks to §§12, 42, 81, 87, 88; PhG §79). The natures of the determinations themselves drive or force them to pass into their opposites. This sense of necessity —the idea that the method involves being forced from earlier moments to later ones—leads Hegel to regard his dialectics as a kind of logic . As he says in the Phenomenology , the method’s “proper exposition belongs to logic” (PhG-M §48). Necessity—the sense of being driven or forced to conclusions—is the hallmark of “logic” in Western philosophy.
Second, because the form or determination that arises is the result of the self-sublation of the determination from the moment of understanding, there is no need for some new idea to show up from the outside. Instead, the transition to the new determination or form is necessitated by earlier moments and hence grows out of the process itself. Unlike in Plato’s arbitrary dialectics, then—which must wait around until some other idea comes in from the outside—in Hegel’s dialectics “nothing extraneous is introduced”, as he says (SL-M 54; cf. SL-dG 33). His dialectics is driven by the nature, immanence or “inwardness” of its own content (SL-M 54; cf. SL-dG 33; cf. PR §31). As he puts it, dialectics is “the principle through which alone immanent coherence and necessity enter into the content of science” (EL-GSH Remark to §81).
Third, because later determinations “sublate” earlier determinations, the earlier determinations are not completely cancelled or negated. On the contrary, the earlier determinations are preserved in the sense that they remain in effect within the later determinations. When Being-for-itself, for instance, is introduced in the logic as the first concept of ideality or universality and is defined by embracing a set of “something-others”, Being-for-itself replaces the something-others as the new concept, but those something-others remain active within the definition of the concept of Being-for-itself. The something-others must continue to do the work of picking out individual somethings before the concept of Being-for-itself can have its own definition as the concept that gathers them up. Being-for-itself replaces the something-others, but it also preserves them, because its definition still requires them to do their work of picking out individual somethings (EL §§95–6).
The concept of “apple”, for example, as a Being-for-itself, would be defined by gathering up individual “somethings” that are the same as one another (as apples). Each individual apple can be what it is (as an apple) only in relation to an “other” that is the same “something” that it is (i.e., an apple). That is the one-sidedness or restrictedness that leads each “something” to pass into its “other” or opposite. The “somethings” are thus both “something-others”. Moreover, their defining processes lead to an endless process of passing back and forth into one another: one “something” can be what it is (as an apple) only in relation to another “something” that is the same as it is, which, in turn, can be what it is (an apple) only in relation to the other “something” that is the same as it is, and so on, back and forth, endlessly (cf. EL §95). The concept of “apple”, as a Being-for-itself, stops that endless, passing-over process by embracing or including the individual something-others (the apples) in its content. It grasps or captures their character or quality as apples . But the “something-others” must do their work of picking out and separating those individual items (the apples) before the concept of “apple”—as the Being-for-itself—can gather them up for its own definition. We can picture the concept of Being-for-itself like this:
Later concepts thus replace, but also preserve, earlier concepts.
Fourth, later concepts both determine and also surpass the limits or finitude of earlier concepts. Earlier determinations sublate themselves —they pass into their others because of some weakness, one-sidedness or restrictedness in their own definitions. There are thus limitations in each of the determinations that lead them to pass into their opposites. As Hegel says, “that is what everything finite is: its own sublation” (EL-GSH Remark to §81). Later determinations define the finiteness of the earlier determinations. From the point of view of the concept of Being-for-itself, for instance, the concept of a “something-other” is limited or finite: although the something-others are supposed to be the same as one another, the character of their sameness (e.g., as apples) is captured only from above, by the higher-level, more universal concept of Being-for-itself. Being-for-itself reveals the limitations of the concept of a “something-other”. It also rises above those limitations, since it can do something that the concept of a something-other cannot do. Dialectics thus allows us to get beyond the finite to the universal. As Hegel puts it, “all genuine, nonexternal elevation above the finite is to be found in this principle [of dialectics]” (EL-GSH Remark to §81).
Fifth, because the determination in the speculative moment grasps the unity of the first two moments, Hegel’s dialectical method leads to concepts or forms that are increasingly comprehensive and universal. As Hegel puts it, the result of the dialectical process
is a new concept but one higher and richer than the preceding—richer because it negates or opposes the preceding and therefore contains it, and it contains even more than that, for it is the unity of itself and its opposite. (SL-dG 33; cf. SL-M 54)
Like Being-for-itself, later concepts are more universal because they unify or are built out of earlier determinations, and include those earlier determinations as part of their definitions. Indeed, many other concepts or determinations can also be depicted as literally surrounding earlier ones (cf. Maybee 2009: 73, 100, 112, 156, 193, 214, 221, 235, 458).
Finally, because the dialectical process leads to increasing comprehensiveness and universality, it ultimately produces a complete series, or drives “to completion” (SL-dG 33; cf. SL-M 54; PhG §79). Dialectics drives to the “Absolute”, to use Hegel’s term, which is the last, final, and completely all-encompassing or unconditioned concept or form in the relevant subject matter under discussion (logic, phenomenology, ethics/politics and so on). The “Absolute” concept or form is unconditioned because its definition or determination contains all the other concepts or forms that were developed earlier in the dialectical process for that subject matter. Moreover, because the process develops necessarily and comprehensively through each concept, form or determination, there are no determinations that are left out of the process. There are therefore no left-over concepts or forms—concepts or forms outside of the “Absolute”—that might “condition” or define it. The “Absolute” is thus unconditioned because it contains all of the conditions in its content, and is not conditioned by anything else outside of it. This Absolute is the highest concept or form of universality for that subject matter. It is the thought or concept of the whole conceptual system for the relevant subject matter. We can picture the Absolute Idea (EL §236), for instance—which is the “Absolute” for logic—as an oval that is filled up with and surrounds numerous, embedded rings of smaller ovals and circles, which represent all of the earlier and less universal determinations from the logical development (cf. Maybee 2009: 30, 600):
Since the “Absolute” concepts for each subject matter lead into one another, when they are taken together, they constitute Hegel’s entire philosophical system, which, as Hegel says, “presents itself therefore as a circle of circles” (EL-GSH §15). We can picture the entire system like this (cf. Maybee 2009: 29):
Together, Hegel believes, these characteristics make his dialectical method genuinely scientific. As he says, “the dialectical constitutes the moving soul of scientific progression” (EL-GSH Remark to §81). He acknowledges that a description of the method can be more or less complete and detailed, but because the method or progression is driven only by the subject matter itself, this dialectical method is the “only true method” (SL-M 54; SL-dG 33).
So far, we have seen how Hegel describes his dialectical method, but we have yet to see how we might read this method into the arguments he offers in his works. Scholars often use the first three stages of the logic as the “textbook example” (Forster 1993: 133) to illustrate how Hegel’s dialectical method should be applied to his arguments. The logic begins with the simple and immediate concept of pure Being, which is said to illustrate the moment of the understanding. We can think of Being here as a concept of pure presence. It is not mediated by any other concept—or is not defined in relation to any other concept—and so is undetermined or has no further determination (EL §86; SL-M 82; SL-dG 59). It asserts bare presence, but what that presence is like has no further determination. Because the thought of pure Being is undetermined and so is a pure abstraction, however, it is really no different from the assertion of pure negation or the absolutely negative (EL §87). It is therefore equally a Nothing (SL-M 82; SL-dG 59). Being’s lack of determination thus leads it to sublate itself and pass into the concept of Nothing (EL §87; SL-M 82; SL-dG 59), which illustrates the dialectical moment.
But if we focus for a moment on the definitions of Being and Nothing themselves, their definitions have the same content. Indeed, both are undetermined, so they have the same kind of undefined content. The only difference between them is “something merely meant ” (EL-GSH Remark to §87), namely, that Being is an undefined content, taken as or meant to be presence, while Nothing is an undefined content, taken as or meant to be absence. The third concept of the logic—which is used to illustrate the speculative moment—unifies the first two moments by capturing the positive result of—or the conclusion that we can draw from—the opposition between the first two moments. The concept of Becoming is the thought of an undefined content, taken as presence (Being) and then taken as absence (Nothing), or taken as absence (Nothing) and then taken as presence (Being). To Become is to go from Being to Nothing or from Nothing to Being, or is, as Hegel puts it, “the immediate vanishing of the one in the other” (SL-M 83; cf. SL-dG 60). The contradiction between Being and Nothing thus is not a reductio ad absurdum , or does not lead to the rejection of both concepts and hence to nothingness—as Hegel had said Plato’s dialectics does (SL-M 55–6; SL-dG 34–5)—but leads to a positive result, namely, to the introduction of a new concept—the synthesis—which unifies the two, earlier, opposed concepts.
We can also use the textbook Being-Nothing-Becoming example to illustrate Hegel’s concept of aufheben (to sublate), which, as we saw, means to cancel (or negate) and to preserve at the same time. Hegel says that the concept of Becoming sublates the concepts of Being and Nothing (SL-M 105; SL-dG 80). Becoming cancels or negates Being and Nothing because it is a new concept that replaces the earlier concepts; but it also preserves Being and Nothing because it relies on those earlier concepts for its own definition. Indeed, it is the first concrete concept in the logic. Unlike Being and Nothing, which had no definition or determination as concepts themselves and so were merely abstract (SL-M 82–3; SL-dG 59–60; cf. EL Addition to §88), Becoming is a “ determinate unity in which there is both Being and Nothing” (SL-M 105; cf. SL-dG 80). Becoming succeeds in having a definition or determination because it is defined by, or piggy-backs on, the concepts of Being and Nothing.
This “textbook” Being-Nothing-Becoming example is closely connected to the traditional idea that Hegel’s dialectics follows a thesis-antithesis-synthesis pattern, which, when applied to the logic, means that one concept is introduced as a “thesis” or positive concept, which then develops into a second concept that negates or is opposed to the first or is its “antithesis”, which in turn leads to a third concept, the “synthesis”, that unifies the first two (see, e.g., McTaggart 1964 [1910]: 3–4; Mure 1950: 302; Stace 1955 [1924]: 90–3, 125–6; Kosek 1972: 243; E. Harris 1983: 93–7; Singer 1983: 77–79). Versions of this interpretation of Hegel’s dialectics continue to have currency (e.g., Forster 1993: 131; Stewart 2000: 39, 55; Fritzman 2014: 3–5). On this reading, Being is the positive moment or thesis, Nothing is the negative moment or antithesis, and Becoming is the moment of aufheben or synthesis, the concept that cancels and preserves, or unifies and combines, Being and Nothing.
We must be careful, however, not to apply this textbook example too dogmatically to the rest of Hegel’s logic or to his dialectical method more generally (for a classic criticism of the thesis-antithesis-synthesis reading of Hegel’s dialectics, see Mueller 1958). There are other places where this general pattern might describe some of the transitions from stage to stage, but there are many more places where the development does not seem to fit this pattern very well. One place where the pattern seems to hold, for instance, is where the Measure (EL §107)—as the combination of Quality and Quantity—transitions into the Measureless (EL §107), which is opposed to it, which then in turn transitions into Essence, which is the unity or combination of the two earlier sides (EL §111). This series of transitions could be said to follow the general pattern captured by the “textbook example”: Measure would be the moment of the understanding or thesis, the Measureless would be the dialectical moment or antithesis, and Essence would be the speculative moment or synthesis that unifies the two earlier moments. However, before the transition to Essence takes place, the Measureless itself is redefined as a Measure (EL §109)—undercutting a precise parallel with the textbook Being-Nothing-Becoming example, since the transition from Measure to Essence would not follow a Measure-Measureless-Essence pattern, but rather a Measure-(Measureless?)-Measure-Essence pattern.
Other sections of Hegel’s philosophy do not fit the triadic, textbook example of Being-Nothing-Becoming at all, as even interpreters who have supported the traditional reading of Hegel’s dialectics have noted. After using the Being-Nothing-Becoming example to argue that Hegel’s dialectical method consists of “triads” whose members “are called the thesis, antithesis, synthesis” (Stace 1955 [1924]: 93), W.T. Stace, for instance, goes on to warn us that Hegel does not succeed in applying this pattern throughout the philosophical system. It is hard to see, Stace says, how the middle terms of some of Hegel’s triads are the opposites or antitheses of the first terms, “and there are even ‘triads’ which contain four terms!” (Stace 1955 [1924]: 97). As a matter of fact, one section of Hegel’s logic—the section on Cognition—violates the thesis-antithesis-synthesis pattern because it has only two sub-divisions, rather than three. “The triad is incomplete”, Stace complains. “There is no third. Hegel here abandons the triadic method. Nor is any explanation of his having done so forthcoming” (Stace 1955 [1924]: 286; cf. McTaggart 1964 [1910]: 292).
Interpreters have offered various solutions to the complaint that Hegel’s dialectics sometimes seems to violate the triadic form. Some scholars apply the triadic form fairly loosely across several stages (e.g. Burbidge 1981: 43–5; Taylor 1975: 229–30). Others have applied Hegel’s triadic method to whole sections of his philosophy, rather than to individual stages. For G.R.G. Mure, for instance, the section on Cognition fits neatly into a triadic, thesis-antithesis-synthesis account of dialectics because the whole section is itself the antithesis of the previous section of Hegel’s logic, the section on Life (Mure 1950: 270). Mure argues that Hegel’s triadic form is easier to discern the more broadly we apply it. “The triadic form appears on many scales”, he says, “and the larger the scale we consider the more obvious it is” (Mure 1950: 302).
Scholars who interpret Hegel’s description of dialectics on a smaller scale—as an account of how to get from stage to stage—have also tried to explain why some sections seem to violate the triadic form. J.N. Findlay, for instance—who, like Stace, associates dialectics “with the triad, or with triplicity”—argues that stages can fit into that form in “more than one sense” (Findlay 1962: 66). The first sense of triplicity echoes the textbook, Being-Nothing-Becoming example. In a second sense, however, Findlay says, the dialectical moment or “contradictory breakdown” is not itself a separate stage, or “does not count as one of the stages”, but is a transition between opposed, “but complementary”, abstract stages that “are developed more or less concurrently” (Findlay 1962: 66). This second sort of triplicity could involve any number of stages: it “could readily have been expanded into a quadruplicity, a quintuplicity and so forth” (Findlay 1962: 66). Still, like Stace, he goes on to complain that many of the transitions in Hegel’s philosophy do not seem to fit the triadic pattern very well. In some triads, the second term is “the direct and obvious contrary of the first”—as in the case of Being and Nothing. In other cases, however, the opposition is, as Findlay puts it, “of a much less extreme character” (Findlay 1962: 69). In some triads, the third term obviously mediates between the first two terms. In other cases, however, he says, the third term is just one possible mediator or unity among other possible ones; and, in yet other cases, “the reconciling functions of the third member are not at all obvious” (Findlay 1962: 70).
Let us look more closely at one place where the “textbook example” of Being-Nothing-Becoming does not seem to describe the dialectical development of Hegel’s logic very well. In a later stage of the logic, the concept of Purpose goes through several iterations, from Abstract Purpose (EL §204), to Finite or Immediate Purpose (EL §205), and then through several stages of a syllogism (EL §206) to Realized Purpose (EL §210). Abstract Purpose is the thought of any kind of purposiveness, where the purpose has not been further determined or defined. It includes not just the kinds of purposes that occur in consciousness, such as needs or drives, but also the “internal purposiveness” or teleological view proposed by the ancient Greek philosopher, Aristotle (see entry on Aristotle ; EL Remark to §204), according to which things in the world have essences and aim to achieve (or have the purpose of living up to) their essences. Finite Purpose is the moment in which an Abstract Purpose begins to have a determination by fixing on some particular material or content through which it will be realized (EL §205). The Finite Purpose then goes through a process in which it, as the Universality, comes to realize itself as the Purpose over the particular material or content (and hence becomes Realized Purpose) by pushing out into Particularity, then into Singularity (the syllogism U-P-S), and ultimately into ‘out-thereness,’ or into individual objects out there in the world (EL §210; cf. Maybee 2009: 466–493).
Hegel’s description of the development of Purpose does not seem to fit the textbook Being-Nothing-Becoming example or the thesis-antithesis-synthesis model. According to the example and model, Abstract Purpose would be the moment of understanding or thesis, Finite Purpose would be the dialectical moment or antithesis, and Realized Purpose would be the speculative moment or synthesis. Although Finite Purpose has a different determination from Abstract Purpose (it refines the definition of Abstract Purpose), it is hard to see how it would qualify as strictly “opposed” to or as the “antithesis” of Abstract Purpose in the way that Nothing is opposed to or is the antithesis of Being.
There is an answer, however, to the criticism that many of the determinations are not “opposites” in a strict sense. The German term that is translated as “opposite” in Hegel’s description of the moments of dialectics (EL §§81, 82)—entgegensetzen—has three root words: setzen (“to posit or set”), gegen (“against”), and the prefix ent-, which indicates that something has entered into a new state. The verb entgegensetzen can therefore literally be translated as “to set over against”. The “entgegengesetzte” into which determinations pass, then, do not need to be the strict “opposites” of the first, but can be determinations that are merely “set against” or are different from the first ones. And the prefix ent-, which suggests that the first determinations are put into a new state, can be explained by Hegel’s claim that the finite determinations from the moment of understanding sublate (cancel but also preserve) themselves (EL §81): later determinations put earlier determinations into a new state by preserving them.
At the same time, there is a technical sense in which a later determination would still be the “opposite” of the earlier determination. Since the second determination is different from the first one, it is the logical negation of the first one, or is not-the-first-determination. If the first determination is “e”, for instance, because the new determination is different from that one, the new one is “not-e” (Kosek 1972: 240). Since Finite Purpose, for instance, has a definition or determination that is different from the definition that Abstract Purpose has, it is not-Abstract-Purpose, or is the negation or opposite of Abstract Purpose in that sense. There is therefore a technical, logical sense in which the second concept or form is the “opposite” or negation of—or is “not”—the first one—though, again, it need not be the “opposite” of the first one in a strict sense.
Other problems remain, however. Because the concept of Realized Purpose is defined through a syllogistic process, it is itself the product of several stages of development (at least four, by my count, if Realized Purpose counts as a separate determination), which would seem to violate a triadic model. Moreover, the concept of Realized Purpose does not, strictly speaking, seem to be the unity or combination of Abstract Purpose and Finite Purpose. Realized Purpose is the result of (and so unifies) the syllogistic process of Finite Purpose, through which Finite Purpose focuses on and is realized in a particular material or content. Realized Purpose thus seems to be a development of Finite Purpose, rather than a unity or combination of Abstract Purpose and Finite Purpose, in the way that Becoming can be said to be the unity or combination of Being and Nothing.
These sorts of considerations have led some scholars to interpret Hegel’s dialectics in a way that is implied by a more literal reading of his claim, in the Encyclopaedia Logic, that the three “sides” of the form of logic—namely, the moment of understanding, the dialectical moment, and the speculative moment—“are moments of each [or every; jedes] logically-real, that is each [or every; jedes] concept” (EL Remark to §79; this is an alternative translation). The quotation suggests that each concept goes through all three moments of the dialectical process—a suggestion reinforced by Hegel’s claim, in the Phenomenology, that the result of the process of determinate negation is that “a new form has thereby immediately arisen” (PhG-M §79). According to this interpretation, the three “sides” are not three different concepts or forms that are related to one another in a triad—as the textbook Being-Nothing-Becoming example suggests—but rather different momentary sides or “determinations” in the life, so to speak, of each concept or form as it transitions to the next one. The three moments thus involve only two concepts or forms: the one that comes first, and the one that comes next (examples of philosophers who interpret Hegel’s dialectics in this second way include Maybee 2009; Priest 1989: 402; Rosen 2014: 122, 132; and Winfield 1990: 56).
For the concept of Being, for example, its moment of understanding is its moment of stability, in which it is asserted to be pure presence. This determination is one-sided or restricted, however, because, as we saw, it ignores another aspect of Being’s definition, namely, that Being has no content or determination, which is how Being is defined in its dialectical moment. Being thus sublates itself because the one-sidedness of its moment of understanding undermines that determination and leads to the definition it has in the dialectical moment. The speculative moment draws out the implications of these moments: it asserts that Being (as pure presence) implies nothing. It is also the “unity of the determinations in their comparison [Entgegensetzung]” (EL §82; alternative translation): since it captures a process from one to the other, it includes Being’s moment of understanding (as pure presence) and dialectical moment (as nothing or undetermined), but also compares those two determinations, or sets (-setzen) them up against (-gegen) each other. It even puts Being into a new state (as the prefix ent- suggests) because the next concept, Nothing, will sublate (cancel and preserve) Being.
The concept of Nothing also has all three moments. When it is asserted to be the speculative result of the concept of Being, it has its moment of understanding or stability: it is Nothing, defined as pure absence, as the absence of determination. But Nothing’s moment of understanding is also one-sided or restricted: like Being, Nothing is also an undefined content, which is its determination in its dialectical moment. Nothing thus sublates itself: since it is an undefined content, it is not pure absence after all, but has the same presence that Being did. It is present as an undefined content. Nothing thus sublates Being: it replaces (cancels) Being, but also preserves Being insofar as it has the same definition (as an undefined content) and presence that Being had. We can picture Being and Nothing like this (the circles have dashed outlines to indicate that, as concepts, they are each undefined; cf. Maybee 2009: 51):
In its speculative moment, then, Nothing implies presence or Being, which is the “unity of the determinations in their comparison [Entgegensetzung]” (EL §82; alternative translation), since it both includes but—as a process from one to the other—also compares the two earlier determinations of Nothing, first, as pure absence and, second, as just as much presence.
The dialectical process is driven to the next concept or form—Becoming—not by a triadic, thesis-antithesis-synthesis pattern, but by the one-sidedness of Nothing—which leads Nothing to sublate itself—and by the implications of the process so far. Since Being and Nothing have each been exhaustively analyzed as separate concepts, and since they are the only concepts in play, there is only one way for the dialectical process to move forward: whatever concept comes next will have to take account of both Being and Nothing at the same time. Moreover, the process revealed that an undefined content taken to be presence (i.e., Being) implies Nothing (or absence), and that an undefined content taken to be absence (i.e., Nothing) implies presence (i.e., Being). The next concept, then, takes Being and Nothing together and draws out those implications—namely, that Being implies Nothing, and that Nothing implies Being. It is therefore Becoming, defined as two separate processes: one in which Being becomes Nothing, and one in which Nothing becomes Being. We can picture Becoming this way (cf. Maybee 2009: 53):
In a similar way, a one-sidedness or restrictedness in the determination of Finite Purpose together with the implications of earlier stages leads to Realized Purpose. In its moment of understanding, Finite Purpose particularizes into (or presents) its content as “ something-presupposed ” or as a pre-given object (EL §205). I go to a restaurant for the purpose of having dinner, for instance, and order a salad. My purpose of having dinner particularizes as a pre-given object—the salad. But this object or particularity—e.g. the salad—is “inwardly reflected” (EL §205): it has its own content—developed in earlier stages—which the definition of Finite Purpose ignores. We can picture Finite Purpose this way:
In the dialectical moment, Finite Purpose is determined by the previously ignored content, or by that other content. The one-sidedness of Finite Purpose requires the dialectical process to continue through a series of syllogisms that determines Finite Purpose in relation to the ignored content. The first syllogism links the Finite Purpose to the first layer of content in the object: the Purpose or universality (e.g., dinner) goes through the particularity (e.g., the salad) to its content, the singularity (e.g., lettuce as a type of thing)—the syllogism U-P-S (EL §206). But the particularity (e.g., the salad) is itself a universality or purpose, “which at the same time is a syllogism within itself [ in sich ]” (EL Remark to §208; alternative translation), in relation to its own content. The salad is a universality/purpose that particularizes as lettuce (as a type of thing) and has its singularity in this lettuce here—a second syllogism, U-P-S. Thus, the first singularity (e.g., “lettuce” as a type of thing)—which, in this second syllogism, is the particularity or P —“ judges ” (EL §207) or asserts that “ U is S ”: it says that “lettuce” as a universality ( U ) or type of thing is a singularity ( S ), or is “this lettuce here”, for instance. This new singularity (e.g. “this lettuce here”) is itself a combination of subjectivity and objectivity (EL §207): it is an Inner or identifying concept (“lettuce”) that is in a mutually-defining relationship (the circular arrow) with an Outer or out-thereness (“this here”) as its content. In the speculative moment, Finite Purpose is determined by the whole process of development from the moment of understanding—when it is defined by particularizing into a pre-given object with a content that it ignores—to its dialectical moment—when it is also defined by the previously ignored content. We can picture the speculative moment of Finite Purpose this way:
Finite Purpose’s speculative moment leads to Realized Purpose. As soon as Finite Purpose presents all the content, there is a return process (a series of return arrows) that establishes each layer and redefines Finite Purpose as Realized Purpose. The presence of “this lettuce here” establishes the actuality of “lettuce” as a type of thing (an Actuality is a concept that captures a mutually-defining relationship between an Inner and an Outer [EL §142]), which establishes the “salad”, which establishes “dinner” as the Realized Purpose over the whole process. We can picture Realized Purpose this way:
If Hegel’s account of dialectics is a general description of the life of each concept or form, then any section can include as many or as few stages as the development requires. Instead of trying to squeeze the stages into a triadic form (cf. Solomon 1983: 22)—a technique Hegel himself rejects (PhG §50; cf. section 3)—we can see the process as driven by each determination on its own account: what it succeeds in grasping (which allows it to be stable, for a moment of understanding), what it fails to grasp or capture (in its dialectical moment), and how it leads (in its speculative moment) to a new concept or form that tries to correct for the one-sidedness of the moment of understanding. This sort of process might reveal a kind of argument that, as Hegel had promised, might produce a comprehensive and exhaustive exploration of every concept, form or determination in each subject matter, as well as raise dialectics above a haphazard analysis of various philosophical views to the level of a genuine science.
We can begin to see why Hegel was motivated to use a dialectical method by examining the project he set for himself, particularly in relation to the work of David Hume and Immanuel Kant (see entries on Hume and Kant ). Hume had argued against what we can think of as the naïve view of how we come to have scientific knowledge. According to the naïve view, we gain knowledge of the world by using our senses to pull the world into our heads, so to speak. Although we may have to use careful observations and do experiments, our knowledge of the world is basically a mirror or copy of what the world is like. Hume argued, however, that naïve science’s claim that our knowledge corresponds to or copies what the world is like does not work. Take the scientific concept of cause, for instance. According to that concept of cause, to say that one event causes another is to say that there is a necessary connection between the first event (the cause) and the second event (the effect), such that, when the first event happens, the second event must also happen. According to naïve science, when we claim (or know) that some event causes some other event, our claim mirrors or copies what the world is like. It follows that the necessary, causal connection between the two events must itself be out there in the world. However, Hume argued, we never observe any such necessary causal connection in our experience of the world, nor can we infer that one exists based on our reasoning (see Hume’s A Treatise of Human Nature , Book I, Part III, Section II; Enquiry Concerning Human Understanding , Section VII, Part I). There is nothing in the world itself that our idea of cause mirrors or copies.
Kant thought Hume’s argument led to an unacceptable, skeptical conclusion, and he rejected Hume’s own solution to the skepticism (see Kant’s Critique of Pure Reason , B5, B19–20). Hume suggested that our idea of causal necessity is grounded merely in custom or habit, since it is generated by our own imaginations after repeated observations of one sort of event following another sort of event (see Hume’s A Treatise of Human Nature , Book I, Section VI; Hegel also rejected Hume’s solution, see EL §39). For Kant, science and knowledge should be grounded in reason, and he proposed a solution that aimed to reestablish the connection between reason and knowledge that was broken by Hume’s skeptical argument. Kant’s solution involved proposing a Copernican revolution in philosophy ( Critique of Pure Reason , Bxvi). Nicholas Copernicus was the Polish astronomer who said that the earth revolves around the sun, rather than the other way around. Kant proposed a similar solution to Hume’s skepticism. Naïve science assumes that our knowledge revolves around what the world is like, but, Hume’s criticism argued, this view entails that we cannot then have knowledge of scientific causes through reason. We can reestablish a connection between reason and knowledge, however, Kant suggested, if we say—not that knowledge revolves around what the world is like—but that knowledge revolves around what we are like . For the purposes of our knowledge, Kant said, we do not revolve around the world—the world revolves around us. Because we are rational creatures, we share a cognitive structure with one another that regularizes our experiences of the world. This intersubjectively shared structure of rationality—and not the world itself—grounds our knowledge.
However, Kant’s solution to Hume’s skepticism led to a skeptical conclusion of its own that Hegel rejected. While the intersubjectively shared structure of our reason might allow us to have knowledge of the world from our perspective, so to speak, we cannot get outside of our mental, rational structures to see what the world might be like in itself. As Kant had to admit, according to his theory, there is still a world in itself or “Thing-in-itself” (Ding an sich) about which we can know nothing (see, e.g., Critique of Pure Reason, Bxxv–xxvi). Hegel rejected Kant’s skeptical conclusion that we can know nothing about the world- or Thing-in-itself, and he intended his own philosophy to be a response to this view (see, e.g., EL §44 and the Remark to §44).
How did Hegel respond to Kant’s skepticism—especially since Hegel accepted Kant’s Copernican revolution, or Kant’s claim that we have knowledge of the world because of what we are like, because of our reason? How, for Hegel, can we get out of our heads to see the world as it is in itself? Hegel’s answer is very close to the ancient Greek philosopher Aristotle’s response to Plato. Plato argued that we have knowledge of the world only through the Forms. The Forms are perfectly universal, rational concepts or ideas. Because the world is imperfect, however, Plato exiled the Forms to their own realm. Although things in the world get their definitions by participating in the Forms, those things are, at best, imperfect copies of the universal Forms (see, e.g., Parmenides 131–135a). The Forms are therefore not in this world, but in a separate realm of their own. Aristotle argued, however, that the world is knowable not because things in the world are imperfect copies of the Forms, but because the Forms are in things themselves as the defining essences of those things (see, e.g., De Anima [ On the Soul ], Book I, Chapter 1 [403a26–403b18]; Metaphysics , Book VII, Chapter 6 [1031b6–1032a5] and Chapter 8 [1033b20–1034a8]).
In a similar way, Hegel’s answer to Kant is that we can get out of our heads to see what the world is like in itself—and hence can have knowledge of the world in itself—because the very same rationality or reason that is in our heads is in the world itself. As Hegel apparently put it in a lecture, the opposition or antithesis between the subjective and objective disappears by saying, as the Ancients did,
that nous governs the world, or by our own saying that there is reason in the world, by which we mean that reason is the soul of the world, inhabits it, and is immanent in it, as its own, innermost nature, its universal. (EL-GSH Addition 1 to §24)
Hegel used an example familiar from Aristotle’s work to illustrate this view:
“to be an animal”, the kind considered as the universal, pertains to the determinate animal and constitutes its determinate essentiality. If we were to deprive a dog of its animality we could not say what it is. (EL-GSH Addition 1 to §24; cf. SL-dG 16–17, SL-M 36–37)
Kant’s mistake, then, was that he regarded reason or rationality as only in our heads, Hegel suggests (EL §§43–44), rather than in both us and the world itself (see also below in this section and section 4). We can use our reason to have knowledge of the world because the very same reason that is in us is in the world itself as its own defining principle. The rationality or reason in the world makes reality understandable, and that is why we can have knowledge of, or can understand, reality with our rationality. Dialectics—which is Hegel’s account of reason—characterizes not only logic, but also “everything true in general” (EL Remark to §79).
But why does Hegel come to define reason in terms of dialectics, and hence adopt a dialectical method? We can begin to see what drove Hegel to adopt a dialectical method by returning once again to Plato’s philosophy. Plato argued that we can have knowledge of the world only by grasping the Forms, which are perfectly universal, rational concepts or ideas. Because things in the world are so imperfect, however, Plato concluded that the Forms are not in this world, but in a realm of their own. After all, if a human being were perfectly beautiful, for instance, then he or she would never become not-beautiful. But human beings change, get old, and die, and so can be, at best, imperfect copies of the Form of beauty—though they get whatever beauty they have by participating in that Form. Moreover, for Plato, things in the world are such imperfect copies that we cannot gain knowledge of the Forms by studying things in the world, but only through reason, that is, only by using our rationality to access the separate realm of the Forms (as Plato argued in the well-known parable of the cave; Republic , Book 7, 514–516b).
Notice, however, that Plato’s conclusion that the Forms cannot be in this world and so must be exiled to a separate realm rests on two claims. First, it rests on the claim that the world is an imperfect and messy place—a claim that is hard to deny. But it also rests on the assumption that the Forms—the universal, rational concepts or ideas of reason itself—are static and fixed, and so cannot grasp the messiness within the imperfect world. Hegel is able to link reason back to our messy world by changing the definition of reason. Instead of saying that reason consists of static universals, concepts or ideas, Hegel says that the universal concepts or forms are themselves messy . Against Plato, Hegel’s dialectical method allows him to argue that universal concepts can “overgrasp” (from the German verb übergreifen ) the messy, dialectical nature of the world because they, themselves, are dialectical . Moreover, because later concepts build on or sublate (cancel, but also preserve) earlier concepts, the later, more universal concepts grasp the dialectical processes of earlier concepts. As a result, higher-level concepts can grasp not only the dialectical nature of earlier concepts or forms, but also the dialectical processes that make the world itself a messy place. The highest definition of the concept of beauty, for instance, would not take beauty to be fixed and static, but would include within it the dialectical nature or finiteness of beauty, the idea that beauty becomes, on its own account, not-beauty. This dialectical understanding of the concept of beauty can then overgrasp the dialectical and finite nature of beauty in the world, and hence the truth that, in the world, beautiful things themselves become not-beautiful, or might be beautiful in one respect and not another. Similarly, the highest determination of the concept of “tree” will include within its definition the dialectical process of development and change from seed to sapling to tree. 
As Hegel says, dialectics is “the principle of all natural and spiritual life” (SL-M 56; SL-dG 35), or “the moving soul of scientific progression” (EL §81). Dialectics is what drives the development of both reason as well as of things in the world. A dialectical reason can overgrasp a dialectical world.
Two further journeys into the history of philosophy will help to show why Hegel chose dialectics as his method of argument. As we saw, Hegel argues against Kant’s skepticism by suggesting that reason is not only in our heads, but in the world itself. To show that reason is in the world itself, however, Hegel has to show that reason can be what it is without us human beings to help it. He has to show that reason can develop on its own, and does not need us to do the developing for it (at least for those things in the world that are not human-created). As we saw (cf. section 1), central to Hegel’s dialectics is the idea that concepts or forms develop on their own because they “self-sublate”, or sublate (cancel and preserve) themselves, and so pass into subsequent concepts or forms on their own accounts, because of their own, dialectical natures. Thus reason, as it were, drives itself, and hence does not need our heads to develop it. Hegel needs an account of self-driving reason to get beyond Kant’s skepticism.
Ironically, Hegel derives the basic outlines of his account of self-driving reason from Kant. Kant divided human rationality into two faculties: the faculty of the understanding and the faculty of reason. The understanding uses concepts to organize and regularize our experiences of the world. Reason’s job is to coordinate the concepts and categories of the understanding by developing a completely unified, conceptual system, and it does this work, Kant thought, on its own, independently of how those concepts might apply to the world. Reason coordinates the concepts of the understanding by following out necessary chains of syllogisms to produce concepts that achieve higher and higher levels of conceptual unity. Indeed, this process will lead reason to produce its own transcendental ideas, or concepts that go beyond the world of experience. Kant calls this necessary, concept-creating reason “speculative” reason (cf. Critique of Pure Reason , Bxx–xxi, A327/B384). Reason creates its own concepts or ideas—it “speculates”—by generating new and increasingly comprehensive concepts of its own, independently of the understanding. In the end, Kant thought, reason will follow out such chains of syllogisms until it develops completely comprehensive or unconditioned universals—universals that contain all of the conditions or all of the less-comprehensive concepts that help to define them. As we saw (cf. section 1 ), Hegel’s dialectics adopts Kant’s notion of a self-driving and concept-creating “speculative” reason, as well as Kant’s idea that reason aims toward unconditioned universality or absolute concepts.
Ultimately, Kant thought, reason’s necessary, self-driving activity will lead it to produce contradictions—what he called the “antinomies”, which consist of a thesis and antithesis. Once reason has generated the unconditioned concept of the whole world, for instance, Kant argued, it can look at the world in two contradictory ways. In the first antinomy, reason can see the world (1) as the whole totality or as the unconditioned, or (2) as the series of syllogisms that led up to that totality. If reason sees the world as the unconditioned or as a complete whole that is not conditioned by anything else, then it will see the world as having a beginning and end in terms of space and time, and so will conclude (the thesis) that the world has a beginning and end or limit. But if reason sees the world as the series, in which each member of the series is conditioned by the previous member, then the world will appear to be without a beginning and infinite, and reason will conclude (the antithesis) that the world does not have a limit in terms of space and time (cf. Critique of Pure Reason, A417–18/B445–6). Reason thus leads to a contradiction: it holds both that the world has a limit and that it does not have a limit at the same time. Because reason’s own process of self-development will lead it to develop contradictions or to be dialectical in this way, Kant thought that reason must be kept in check by the understanding. Any conclusions that reason draws that do not fall within the purview of the understanding cannot be applied to the world of experience, Kant said, and so cannot be considered genuine knowledge (Critique of Pure Reason, A506/B534).
Hegel adopts Kant’s dialectical conception of reason, but he liberates reason for knowledge from the tyranny of the understanding. Kant was right that reason speculatively generates concepts on its own, and that this speculative process is driven by necessity and leads to concepts of increasing universality or comprehensiveness. Kant was even right to suggest—as he had shown in the discussion of the antinomies—that reason is dialectical, or necessarily produces contradictions on its own. Again, Kant’s mistake was that he fell short of saying that these contradictions are in the world itself. He failed to apply the insights of his discussion of the antinomies to “ things in themselves ” (SL-M 56; SL-dG 35; see also section 4 ). Indeed, Kant’s own argument proves that the dialectical nature of reason can be applied to things themselves. The fact that reason develops those contradictions on its own, without our heads to help it , shows that those contradictions are not just in our heads, but are objective, or in the world itself. Kant, however, failed to draw this conclusion, and continued to regard reason’s conclusions as illusions. Still, Kant’s philosophy vindicated the general idea that the contradictions he took to be illusions are both objective—or out there in the world—and necessary. As Hegel puts it, Kant vindicates the general idea of “the objectivity of the illusion and the necessity of the contradiction which belongs to the nature of thought determinations” (SL-M 56; cf. SL-dG 35), or to the nature of concepts themselves.
The work of Johann Gottlieb Fichte (see entry on Fichte ) showed Hegel how dialectics can get beyond Kant—beyond the contradictions that, as Kant had shown, reason (necessarily) develops on its own, beyond the reductio ad absurdum argument (which, as we saw above, holds that a contradiction leads to nothingness), and beyond Kant’s skepticism, or Kant’s claim that reason’s contradictions must be reined in by the understanding and cannot count as knowledge. Fichte argued that the task of discovering the foundation of all human knowledge leads to a contradiction or opposition between the self and the not-self (it is not important, for our purposes, why Fichte held this view). The kind of reasoning that leads to this contradiction, Fichte said, is the analytical or antithetical method of reasoning, which involves drawing out an opposition between elements (in this case, the self and not-self) that are being compared to, or equated with, one another. While the traditional reductio ad absurdum argument would lead us to reject both sides of the contradiction and start from scratch, Fichte argued that the contradiction or opposition between the self and not-self can be resolved. In particular, the contradiction is resolved by positing a third concept—the concept of divisibility—which unites the two sides ( The Science of Knowledge , I: 110–11; Fichte 1982: 108–110). The concept of divisibility is produced by a synthetic procedure of reasoning, which involves “discovering in opposites the respect in which they are alike ” ( The Science of Knowledge , I: 112–13; Fichte 1982: 111). Indeed, Fichte argued, not only is the move to resolve contradictions with synthetic concepts or judgments possible, it is necessary . As he says of the move from the contradiction between self and not-self to the synthetic concept of divisibility,
there can be no further question as to the possibility of this [synthesis], nor can any ground for it be given; it is absolutely possible, and we are entitled to it without further grounds of any kind. ( The Science of Knowledge , I: 114; Fichte 1982: 112)
Since the analytical method leads to oppositions or contradictions, he argued, if we use only analytic judgments, “we not only do not get very far, as Kant says; we do not get anywhere at all” ( The Science of Knowledge , I: 113; Fichte 1982: 112). Without the synthetic concepts or judgments, we are left, as the classic reductio ad absurdum argument suggests, with nothing at all. The synthetic concepts or judgments are thus necessary to get beyond contradiction without leaving us with nothing.
Fichte’s account of the synthetic method provides Hegel with the key to moving beyond Kant. Fichte suggested that a synthetic concept that unifies the results of a dialectically-generated contradiction does not completely cancel the contradictory sides, but only limits them. As he said, in general, “[t]o limit something is to abolish its reality, not wholly , but in part only” ( The Science of Knowledge , I: 108; Fichte 1982: 108). Instead of concluding, as a reductio ad absurdum requires, that the two sides of a contradiction must be dismissed altogether, the synthetic concept or judgment retroactively justifies the opposing sides by demonstrating their limit, by showing which part of reality they attach to and which they do not ( The Science of Knowledge , I: 108–10; Fichte 1982: 108–9), or by determining in what respect and to what degree they are each true. For Hegel, as we saw (cf. section 1 ), later concepts and forms sublate—both cancel and preserve —earlier concepts and forms in the sense that they include earlier concepts and forms in their own definitions. From the point of view of the later concepts or forms, the earlier ones still have some validity, that is, they have a limited validity or truth defined by the higher-level concept or form.
Dialectically generated contradictions are therefore not a defect to be reined in by the understanding, as Kant had said, but invitations for reason to “speculate”, that is, for reason to generate precisely the sort of increasingly comprehensive and universal concepts and forms that Kant had said reason aims to develop. Ultimately, Hegel thought, as we saw (cf. section 1), the dialectical process leads to a completely unconditioned concept or form for each subject matter: the Absolute Idea (logic), Absolute Spirit (phenomenology), Absolute Idea of right and law (Philosophy of Right), and so on. Taken together, these form the “circle of circles” (EL §15) that constitutes the whole philosophical system or “Idea” (EL §15) that both overgrasps the world and makes it understandable (for us).
Note that, while Hegel was clearly influenced by Fichte’s work, he never adopted Fichte’s triadic “thesis—antithesis—synthesis” language in his descriptions of his own philosophy (Mueller 1958: 411–2; Solomon 1983: 23), though he did apparently use it in his lectures to describe Kant’s philosophy (LHP III: 477). Indeed, Hegel criticized formalistic uses of the method of “ triplicity [Triplizität]” (PhG-P §50) inspired by Kant—a criticism that could well have been aimed at Fichte. Hegel argued that Kantian-inspired uses of triadic form had been reduced to “a lifeless schema” and “an actual semblance [ eigentlichen Scheinen ]” (PhG §50; alternative translation) that, like a formula in mathematics, was simply imposed on top of subject matters. Instead, a properly scientific use of Kant’s “triplicity” should flow—as he said his own dialectical method did (see section 1 )—out of “the inner life and self-movement” (PhG §51) of the content.
Scholars have often questioned whether Hegel’s dialectical method is logical. Some of their skepticism grows out of the role that contradiction plays in his thought and argument. While many of the oppositions embedded in the dialectical development and the definitions of concepts or forms are not contradictions in the strict sense, as we saw (section 2, above), scholars such as Graham Priest have suggested that some of them arguably are (Priest 1989: 391). Hegel even holds, against Kant (cf. section 3 above), that there are contradictions, not only in thought, but also in the world. Motion, for instance, Hegel says, is an “existent contradiction”. As he describes it:
Something moves, not because now it is here and there at another now, but because in one and the same now it is here and not here, because in this here, it is and is not at the same time. (SL-dG 382; cf. SL-M 440)
Kant’s sorts of antinomies (cf. section 3 above) or contradictions more generally are therefore, as Hegel puts it in one place, “in all objects of all kinds, in all representations, concepts and ideas” (EL-GSH Remark to §48). Hegel thus seems to reject, as he himself explicitly claims (SL-M 439–40; SL-dG 381–82), the law of non-contradiction, which is a fundamental principle of formal logic—the classical, Aristotelian logic (see entries on Aristotle’s Logic and Contradiction ) that dominated during Hegel’s lifetime as well as the dominant systems of symbolic logic today (cf. Priest 1989: 391; Düsing 2010: 97–103). According to the law of non-contradiction, something cannot be both true and false at the same time or, put another way, “x” and “not-x” cannot both be true at the same time.
Hegel’s apparent rejection of the law of non-contradiction has led some interpreters to regard his dialectics as illogical, even “absurd” (Popper 1940: 420; 1962: 330; 2002: 443). Karl R. Popper, for instance, argued that accepting Hegel’s and other dialecticians’ rejection of the law of non-contradiction as part of both a logical theory and a general theory of the world “would mean a complete breakdown of science” (Popper 1940: 408; 1962: 317; 2002: 426). Since, according to today’s systems of symbolic logic, he suggested, the truth of a contradiction leads logically to any claim (any claim can logically be inferred from two contradictory claims), if we allow contradictory claims to be valid or true together, then we would have no reason to rule out any claim whatsoever (Popper 1940: 408–410; 1962: 317–319; 2002: 426–429).
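The inference Popper relies on, that any claim whatsoever follows from a pair of contradictory claims, can be checked mechanically in classical two-valued logic. The following is a minimal sketch for illustration only; the helper name `entails` is hypothetical and does not come from Popper's or Hegel's texts.

```python
from itertools import product

def entails(premises, conclusion, n_vars):
    """Classical entailment: no valuation makes every premise true
    while making the conclusion false."""
    for vals in product([True, False], repeat=n_vars):
        if all(p(*vals) for p in premises) and not conclusion(*vals):
            return False
    return True

# Premises: P and not-P. Conclusions: an arbitrary claim Q, and its negation.
contradiction = [lambda p, q: p, lambda p, q: not p]
explosion_holds = entails(contradiction, lambda p, q: q, 2)
negation_also_follows = entails(contradiction, lambda p, q: not q, 2)

print(explosion_holds, negation_also_follows)
```

Because no classical valuation makes both P and not-P true, the check never finds a counterexample, so both Q and not-Q count as consequences. This is the sense in which, on classical assumptions, a true contradiction would leave no claim ruled out.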
Popper was notoriously hostile toward Hegel’s work (cf. Popper 2013: 242–289; for a scathing criticism of Popper’s analysis see Kaufmann 1976 [1972]), but, as Priest has noted (Priest 1989: 389–91), even some sympathetic interpreters have been inspired by today’s dominant systems of symbolic logic to hold that the kind of contradiction that is embedded in Hegel’s dialectics cannot be genuine contradiction in the strict sense. While Dieter Wandschneider, for instance, grants that his sympathetic theory of dialectic “is not presented as a faithful interpretation of the Hegelian text” (Wandschneider 2010: 32), he uses the same logical argument that Popper offered in defense of the claim that “dialectical contradiction is not a ‘normal’ contradiction, but one that is actually only an apparent contradiction” (Wandschneider 2010: 37). The suggestion (by the traditional, triadic account of Hegel’s dialectics, cf. section 2 , above) that Being and Nothing (or non-being) is a contradiction, for instance, he says, rests on an ambiguity. Being is an undefined content, taken to mean being or presence, while Nothing is an undefined content, taken to mean nothing or absence ( section 2 , above; cf. Wandschneider 2010: 34–35). Being is Nothing (or non-being) with respect to the property they have as concepts, namely, that they both have an undefined content. But Being is not Nothing (or non-being) with respect to their meaning (Wandschneider 2010: 34–38). The supposed contradiction between them, then, Wandschneider suggests, takes place “in different respects ”. It is therefore only an apparent contradiction. “Rightly understood”, he concludes, “there can be no talk of contradiction ” (Wandschneider 2010: 38).
Inoue Kazumi also argues that dialectical contradiction in the Hegelian sense does not violate the law of non-contradiction (Inoue 2014: 121–123), and he rejects Popper’s claim that Hegel’s dialectical method is incompatible with good science. A dialectical contradiction, Inoue says, is a contradiction that arises when the same topic is considered from different vantage points, but each vantage point by itself does not violate the law of non-contradiction (Inoue 2014: 120). The understanding leads to contradictions, as Hegel said (cf. section 3 above), because it examines a topic from a fixed point of view; reason embraces contradictions because it examines a topic from multiple points of view (Inoue 2014: 121). The geocentric theory that the sun revolves around the Earth and the heliocentric theory that the Earth revolves around the sun, for instance, Inoue suggests, are both correct from certain points of view. We live our everyday lives from a vantage point in which the sun makes a periodic rotation around the Earth roughly every 24 hours. Astronomers make their observations from a geocentric point of view and then translate those observations into a heliocentric one. From these points of view, the geocentric account is not incorrect. But physics, particularly in its concepts of mass and force, requires the heliocentric account. For science, which takes all these points of view into consideration, both theories are valid: they are dialectically contradictory, though neither theory, by itself, violates the law of non-contradiction (Inoue 2014: 126–127). To insist that the Earth really revolves around the sun is, on Inoue’s view, merely an irrational, reductive prejudice, theoretically and practically (Inoue 2014: 126). Dialectical contradictions, Inoue says, are, as Hegel said, constructive: they lead to concepts or points of view that grasp the world from ever wider and more encompassing perspectives, culminating ultimately in the “Absolute” (Inoue 2014: 121; cf. section 1, above). Hegel’s claim that motion violates the law of non-contradiction, Inoue suggests, is an expression of the idea that contradictory claims can be true when motion is described from more than one point of view (Inoue 2014: 123). (For a similar reading of Hegel’s conception of dialectical contradiction, which influenced Inoue’s account [Inoue 2014: 121], see Düsing 2010: 102–103.)
Other interpreters, however, have been inspired by Hegel’s dialectics to develop alternative systems of logic that do not subscribe to the law of non-contradiction. Priest, for instance, has defended Hegel’s rejection of the law of non-contradiction (cf. Priest 1989; 1997 [2006: 4]). The acceptance of some contradictions, he has suggested, does not require the acceptance of all contradictions (Priest 1989: 392). Popper’s logical argument, Priest holds, is also unconvincing. Contradictions lead logically to any claim whatsoever, as Popper said, only if we presuppose that nothing can be both true and false at the same time (i.e. only if we presuppose that the law of non-contradiction is correct), which is just what Hegel denies. Popper’s logical argument thus assumes what it is supposed to prove, that is, it begs the question (Priest 1989: 392; 1997 [2006: 5–6]). Moreover, consistency (not allowing contradictions), Priest suggests, is actually “a very weak constraint” (Priest 1997 [2006: 104]) on what counts as a rational inference. Other principles or criteria, such as being strongly disproved (or supported) by the data, are more important for determining whether a claim or inference is rational (Priest 1997 [2006: 105]). And, as Hegel pointed out, Priest says, the data, namely “the world as it appears” (as Hegel puts it in EL) or “ordinary experience itself” (as Hegel puts it in SL), suggest that there are indeed contradictions (EL Remark to §48; SL-dG 382; cf. SL-M 440; Priest 1989: 389, 399–400). Hegel is right, for instance, Priest argues, that change, and motion in particular, are examples of real or existing contradictions (Priest 1985; 1989: 396–97; 1997 [2006: 172–181, 213–15]). 
What distinguishes motion, as a process, from a situation in which something is simply here at one time and then some other place at some other time is the embodiment of contradiction: that, in a process of motion, there is one (span of) time in which something is both here and not here at the same time (in that span of time) (Priest 1985: 340–341; 1997 [2006: 172–175, 213–214]). A system of logic, Priest suggests, is always just a theory about what good reasoning should be like (Priest 1989: 392). A dialectical logic that admits that there are “dialetheia” or true contradictions (Priest 1989: 388), he says, is a broader theory or version of logic than traditional, formal logics that subscribe to the law of non-contradiction. Those traditional logics apply only to topics or domains that are consistent, primarily domains that are “static and changeless” (Priest 1989: 391; cf. 395); dialectical/dialetheic logic handles consistent domains, but also applies to domains in which there are dialetheia. Thus Priest, extending Hegel’s own concept of aufheben (“to sublate”; cf. section 1 , above), suggests that traditional “formal logic is perfectly valid in its domain, but dialectical (dialetheic) logic is more general” (Priest 1989: 395). (For an earlier example of a logical system that allows contradiction and was inspired in part by Hegel [and Marx], see Jaśkowski 1999: 36 [1969: 143] [cf. Inoue 2014: 128–129]. For more on dialetheic logic generally, see the entry on Dialetheism .)
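The claim that accepting some contradictions need not mean accepting every claim, and that classical logic remains valid within consistent domains, can be illustrated with a small three-valued sketch. The text above does not specify a formal system; the scheme below, modeled on Priest's three-valued "Logic of Paradox", is an assumption made here for illustration, and the names `neg`, `valid`, and `DESIGNATED` are hypothetical.

```python
from itertools import product

# Truth values ordered F < B < T, where B means "both true and false".
F, B, T = 0, 1, 2
DESIGNATED = {B, T}  # values that count as "at least true"

def neg(a):
    """Negation swaps T and F and leaves B fixed."""
    return 2 - a

def valid(premises, conclusion, values, n_vars=2):
    """Valid iff every valuation that designates all premises
    also designates the conclusion."""
    for vals in product(values, repeat=n_vars):
        premises_hold = all(p(*vals) in DESIGNATED for p in premises)
        if premises_hold and conclusion(*vals) not in DESIGNATED:
            return False
    return True

premises = [lambda p, q: p, lambda p, q: neg(p)]  # P and not-P
conclusion = lambda p, q: q                        # an arbitrary claim Q

explosion_with_dialetheia = valid(premises, conclusion, [F, B, T])
explosion_in_consistent_domain = valid(premises, conclusion, [F, T])

print(explosion_with_dialetheia, explosion_in_consistent_domain)
```

When P takes the value B and Q the value F, both premises are designated but the conclusion is not, so the inference from a contradiction to an arbitrary claim fails. Restricted to the two classical values, the same inference remains valid, which mirrors the suggestion that formal logic holds in its own domain while the dialetheic version is more general.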
Worries that Hegel’s arguments fail to fit his account of dialectics (see section 2 , above) have led some interpreters to conclude that his method is arbitrary or that his works have no single dialectical method at all (Findlay 1962: 93; Solomon 1983: 21). These interpreters reject the idea that there is any logical necessity to the moves from stage to stage. “[T]he important point to make here, and again and again”, Robert C. Solomon writes, for instance,
is that the transition from the first form to the second, or the transition from the first form of the Phenomenology all the way to the last, is not in any way a deductive necessity. The connections are anything but entailments, and the Phenomenology could always take another route and other starting points. (Solomon 1983: 230)
In a footnote to this passage, Solomon adds “that a formalization of Hegel’s logic, however ingenious, is impossible” (Solomon 1983: 230).
Some scholars have argued that Hegel’s necessity is not intended to be logical necessity. Walter Kaufmann suggested, for instance, that the necessity at work in Hegel’s dialectic is a kind of organic necessity. The moves in the Phenomenology , he said, follow one another “in the way in which, to use a Hegelian image from the preface, bud, blossom and fruit succeed each other” (Kaufmann 1965: 148; 1966: 132). Findlay argued that later stages provide what he called a “ higher-order comment ” on earlier stages, even if later stages do not follow from earlier ones in a trivial way (Findlay 1966: 367). Solomon suggested that the necessity that Hegel wants is not “‘necessity’ in the modern sense of ‘logical necessity,’” (Solomon 1983: 209), but a kind of progression (Solomon 1983: 207), or a “necessity within a context for some purpose ” (Solomon 1983: 209). John Burbidge defines Hegel’s necessity in terms of three senses of the relationship between actuality and possibility, only the last of which is logical necessity (Burbidge 1981: 195–6).
Other scholars have defined the necessity of Hegel’s dialectics in terms of a transcendental argument. A transcendental argument begins with uncontroversial facts of experience and tries to show that other conditions must be present—or are necessary—for those facts to be possible. Jon Stewart argues, for instance, that “Hegel’s dialectic in the Phenomenology is a transcendental account” in this sense, and thus has the necessity of that form of argument (Stewart 2000: 23; cf. Taylor 1975: 97, 226–7; for a critique of this view, see Pinkard 1988: 7, 15).
Some scholars have avoided these debates by interpreting Hegel’s dialectics in a literary way. In his examination of the epistemological theory of the Phenomenology , for instance, Kenneth R. Westphal offers “a literary model” of Hegel’s dialectics based on the story of Sophocles’ play Antigone (Westphal 2003: 14, 16). Ermanno Bencivenga offers an interpretation that combines a narrative approach with a concept of necessity. For him, the necessity of Hegel’s dialectical logic can be captured by the notion of telling a good story—where “good” implies that the story is both creative and correct at the same time (Bencivenga 2000: 43–65).
Debate over whether Hegel’s dialectical logic is logical may also be fueled in part by discomfort with his particular brand of logic. Unlike today’s symbolic logics, Hegel’s logic is not only syntactic, but also semantic (cf. Berto 2007; Maybee 2009: xx–xxv; Margolis 2010: 193–94). Hegel’s interest in semantics appears, for instance, in the very first stages of his logic, where the difference between Being and Nothing is “something merely meant” (EL-GSH Remark to §87; cf. section 2 above). While some of the moves from stage to stage are driven by syntactic necessity, other moves are driven by the meanings of the concepts in play. Indeed, Hegel rejected what he regarded as the overly formalistic logics that dominated the field during his day (EL Remark to §162; SL-M 43–44; SL-dG 24). A logic that deals only with the forms of logical arguments and not the meanings of the concepts used in those argument forms will do no better in terms of preserving truth than the old joke about computer programs suggests: garbage in, garbage out. In those logics, if we (using today’s versions of formal, symbolic logic) plug in something for the P or Q (in the proposition “if P then Q” or “P → Q”, for instance) or for the “F”, “G”, or “x” (in the proposition “if F is x, then G is x” or “Fx → Gx”, for instance) that means something true, then the syntax of formal logics will preserve that truth. But if we plug in something for those terms that is untrue or meaningless (garbage in), then the syntax of formal logic will lead to an untrue or meaningless conclusion (garbage out). Today’s versions of propositional logic also assume that we know what the meaning of “is” is. Against these sorts of logics, Hegel wanted to develop a logic that not only preserved truth, but also determined how to construct truthful claims in the first place. 
A logic that defines concepts (semantics) as well as their relationships with one another (syntax) will show, Hegel thought, how concepts can be combined into meaningful forms. Because interpreters are familiar with modern logics focused on syntax, however, they may regard Hegel’s syntactic and semantic logic as not really logical (cf. Maybee 2009: xvii–xxv).
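The "garbage in, garbage out" point can be made concrete with a short sketch. It first checks that the argument form "P, if P then Q, therefore Q" is valid over every valuation, and then applies the same form purely syntactically to a premise that is in fact false. The function `derive` and the `actual_truth` table are hypothetical illustrations, not anything drawn from Hegel's texts.

```python
from itertools import product

def implies(a, b):
    """Material conditional of classical propositional logic."""
    return (not a) or b

# The form is valid: whenever P and (P -> Q) are both true, Q is true.
modus_ponens_valid = all(
    q
    for p, q in product([True, False], repeat=2)
    if p and implies(p, q)
)

def derive(asserted):
    """Purely syntactic step: if 'P' and 'P -> Q' are asserted, add 'Q'.
    Nothing here inspects what P or Q mean or whether they are true."""
    derived = set(asserted)
    if "P" in derived and "P -> Q" in derived:
        derived.add("Q")
    return derived

# Hypothetical state of the world: the sentence plugged in for P is false,
# and so is the sentence plugged in for Q.
actual_truth = {"P": False, "Q": False}

conclusions = derive({"P", "P -> Q"})
garbage_out = "Q" in conclusions and not actual_truth["Q"]

print(modus_ponens_valid, garbage_out)
```

The form itself passes the validity check, yet the derivation still delivers a false conclusion, because validity only guarantees that truth is preserved when the inputs are true. Deciding what may legitimately be plugged in is the semantic task that, on the account above, Hegel wanted logic to take on as well.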
In Hegel’s other works, the moves from stage to stage are often driven, not only by syntax and semantics—that is, by logic (given his account of logic)—but also by considerations that grow out of the relevant subject matter. In the Phenomenology , for instance, the moves are driven by syntax, semantics, and by phenomenological factors. Sometimes a move from one stage to the next is driven by a syntactic need—the need to stop an endless, back-and-forth process, for instance, or to take a new path after all the current options have been exhausted (cf. section 5 ). Sometimes, a move is driven by the meaning of a concept, such as the concept of a “This” or “Thing”. And sometimes a move is driven by a phenomenological need or necessity—by requirements of consciousness , or by the fact that the Phenomenology is about a consciousness that claims to be aware of (or to know) something. The logic of the Phenomenology is thus a phenomeno -logic, or a logic driven by logic—syntax and semantics—and by phenomenological considerations. Still, interpreters such as Quentin Lauer have suggested that, for Hegel,
phenomeno-logy is a logic of appearing, a logic of implication, like any other logic, even though not of the formal entailment with which logicians and mathematicians are familiar. (Lauer 1976: 3)
Lauer warns us against dismissing the idea that there is any implication or necessity in Hegel’s method at all (Lauer 1976: 3). (Other scholars who also believe there is a logical necessity to the dialectics of the Phenomenology include Hyppolite 1974: 78–9 and H.S. Harris 1997: xii.)
We should also be careful not to exaggerate the “necessity” of formal, symbolic logics. Even in these logics, there can often be more than one path from some premises to the same conclusion, logical operators can be dealt with in different orders, and different sets of operations can be used to reach the same conclusions. There is therefore often no strict, necessary “entailment” from one step to the next, even though the conclusion might be entailed by the whole series of steps, taken together. As in today’s logics, then, whether Hegel’s dialectics counts as logical depends on the degree to which he shows that we are forced—necessarily—from earlier stages or series of stages to later stages (see also section 5 ).
Although Hegel’s dialectics is driven by syntax, semantics and considerations specific to the different subject matters (section 4 above), several important syntactic patterns appear repeatedly throughout his works. In many places, the dialectical process is driven by a syntactic necessity that is really a kind of exhaustion: when the current strategy has been exhausted, the process is forced, necessarily, to employ a new strategy. As we saw (section 2), once the strategy of treating Being and Nothing as separate concepts is exhausted, the dialectical process must, necessarily, adopt a different strategy, namely, one that takes the two concepts together. The concept of Becoming captures the first way in which Being and Nothing are taken together. In the stages of Quantum through Number, the concepts of One and Many take turns defining the whole quantity as well as the quantitative bits inside that make it up: first, the One is the whole, while the Many are the bits; then the whole and the bits are all Ones; then the Many is the whole, while the bits are each a One; and finally the whole and the bits are all a Many. We can picture the development like this (cf. Maybee 2009, xviii–xix):
[Figure: the four stages in which One and Many take turns defining the whole quantity and the bits inside it]
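The exhaustion at work in these four stages can also be tabulated. The sketch below is only an illustration under the assumption that each stage can be represented as a pair (definition of the whole, definition of the bits); the variable names are hypothetical.

```python
from itertools import product

ROLES = ("One", "Many")

# The four stages in the order given in the text:
# (how the whole is defined, how the bits are defined).
stages = [
    ("One", "Many"),   # the One is the whole, the Many are the bits
    ("One", "One"),    # the whole and the bits are all Ones
    ("Many", "One"),   # the Many is the whole, each bit is a One
    ("Many", "Many"),  # the whole and the bits are all a Many
]

# Every possible way of assigning One or Many to the whole and to the bits.
all_combinations = set(product(ROLES, repeat=2))

# The strategy is exhausted when the stages cover every combination
# exactly once, leaving no further option using only One and Many.
exhausted = (
    set(stages) == all_combinations
    and len(stages) == len(all_combinations)
)

print(exhausted)
```

With two roles and two positions there are only four combinations, and the stages run through all of them without repetition, so nothing further can be tried with One and Many alone.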
Since One and Many have been exhausted, the next stage, Ratio, must, necessarily, employ a different strategy to grasp the elements in play. Just as Being-for-itself is a concept of universality for Quality and captures the character of a set of something-others in its content (see section 1 ), so Ratio (the whole rectangle with rounded corners) is a concept of universality for Quantity and captures the character of a set of quantities in its content (EL §105–6; cf. Maybee 2009, xviii–xix, 95–7). In another version of syntactic necessity driven by exhaustion, the dialectical development will take account of every aspect or layer, so to speak, of a concept or form—as we saw in the stages of Purpose outlined above, for instance ( section 2 ). Once all the aspects or layers of a concept or form have been taken account of and so exhausted, the dialectical development must also, necessarily, employ a different strategy in the next stage to grasp the elements in play.
In a second, common syntactic pattern, the dialectical development leads to an endless, back-and-forth process—a “bad” (EL-BD §94) or “spurious” (EL-GSH §94) infinity—between two concepts or forms. Hegel’s dialectics cannot rest with spurious infinities. So long as the dialectical process is passing endlessly back and forth between two elements, it is never finished, and the concept or form in play cannot be determined. Spurious infinities must therefore be resolved or stopped, and they are always resolved by a higher-level, more universal concept. In some cases, a new, higher-level concept is introduced that stops the spurious infinity by grasping the whole, back-and-forth process. Being-for-itself (cf. section 1 ), for instance, is introduced as a new, more universal concept that embraces—and hence stops—the whole, back-and-forth process between “something-others”. However, if the back-and-forth process takes place between a concept and its own content—in which case the concept already embraces the content—then that embracing concept is redefined in a new way that grasps the whole, back-and-forth process. The new definition raises the embracing concept to a higher level of universality—as a totality (an “all”) or as a complete and completed concept. Examples from logic include the redefinition of Appearance as the whole World of Appearance (EL §132; cf. SL-M 505–7, SL-dG 443–4), the move in which the endless, back-and-forth process of Real Possibility redefines the Condition as a totality (EL §147; cf. SL-M 547, SL-dG 483), and the move in which a back-and-forth process created by finite Cognition and finite Willing redefines the Subjective Idea as Absolute Idea (EL §§234–5; cf. SL-M 822–3, SL-dG 733–4).
Some of the most famous terms in Hegel’s works—“in itself [ an sich ]”, “for itself [ für sich ]” and “in and for itself [ an und für sich ]”—capture other, common, syntactic patterns. A concept or form is “in itself” when it has a determination that it gets by being defined against its “other” (cf. Being-in-itself, EL §91). A concept or form is “for itself” when it is defined only in relation to its own content, so that, while it is technically defined in relation to an “other”, the “other” is not really an “other” for it. As a result, it is really defined only in relation to itself. Unlike an “in itself” concept or form, then, a “for itself” concept or form seems to have its definition on its own, or does not need a genuine “other” to be defined (like other concepts or forms, however, “for itself” concepts or forms turn out to be dialectical too, and hence push on to new concepts or forms). In the logic, Being-for-itself (cf. section 1 ), which is defined by embracing the “something others” in its content, is the first, “for itself” concept or form.
A concept or form is “in and for itself” when it is doubly “for itself”, or “for itself” not only in terms of content —insofar as it embraces its content—but also in terms of form or presentation, insofar as it also has the activity of presenting its content. It is “for itself” (embraces its content) for itself (through its own activity), or not only embraces its content (the “for itself” of content) but also presents its content through its own activity (the “for itself” of form). The second “for itself” of form provides the concept with a logical activity (i.e., presenting its content) and hence a definition that goes beyond—and so is separate from—the definition that its content has. Since it has a definition of its own that is separate from the definition of its content, it comes to be defined—in the “in itself” sense— against its content, which has become its “other”. Because this “other” is still its own content, however, the concept or form is both “in itself” but also still “for itself” at the same time, or is “in and for itself” (EL §§148–9; cf. Maybee 2009: 244–6). The “in and for itself” relationship is the hallmark of a genuine Concept (EL §160), and captures the idea that a genuine concept is defined not only from the bottom up by its content, but also from the top down through its own activity of presenting its content. The genuine concept of animal, for instance, is not only defined by embracing its content (namely, all animals) from the bottom up, but also has a definition of its own, separate from that content, that leads it to determine (and so present), from the top down, what counts as an animal.
Other technical, syntactic terms include aufheben (“to sublate”), which we already saw ( section 1 ), and “abstract”. To say that a concept or form is “abstract” is to say that it is only a partial definition. Hegel describes the moment of understanding, for instance, as abstract (EL §§79, 80) because it is a one-sided or restricted definition or determination ( section 1 ). Conversely, a concept or form is “concrete” in the most basic sense when it has a content or definition that it gets from being built out of other concepts or forms. As we saw ( section 2 ), Hegel regarded Becoming as the first concrete concept in the logic.
Although Hegel’s writing and his use of technical terms can make his philosophy notoriously difficult, his work can also be very rewarding. In spite of—or perhaps because of—the difficulty, there are a surprising number of fresh ideas in his work that have not yet been fully explored in philosophy.
- [EL], The Encyclopedia Logic [Enzyklopädie der philosophischen Wissenschaften I] . Because the translations of EL listed below use the same section numbers as well as sub-paragraphs (“Remarks”) and sub-sub-paragraphs (“Additions”), citations simply to “EL” refer to either translation. If the phrasing in English is unique to a specific translation, the translators’ initials are added.
- [EL-BD], Encyclopedia of the Philosophical Sciences in Basic Outline Part I: Science of Logic [Enzyklopädie der philosophischen Wissenschaften I] , translated by Klaus Brinkmann and Daniel O. Dahlstrom, Cambridge: Cambridge University Press, 2010.
- [EL-GSH], The Encyclopedia Logic: Part 1 of the Encyclopaedia of Philosophical Sciences [Enzyklopädie der philosophischen Wissenschaften I] , translated by T.F. Geraets, W.A. Suchting, and H.S. Harris, Indianapolis: Hackett, 1991.
- [LHP], Lectures on the History of Philosophy [Geschichte der Philosophie] , in three volumes, translated by E.S. Haldane and Frances H. Simson, New Jersey: Humanities Press, 1974.
- [PhG], Phenomenology of Spirit [Phänomenologie des Geistes] . Because the translations of PhG listed below use the same section numbers, citations simply to “PhG” refer to either translation. If the phrasing in English is unique to a specific translation, the translator’s initial is added.
- [PhG-M], Hegel’s Phenomenology of Spirit [Phänomenologie des Geistes] , translated by A.V. Miller, Oxford: Oxford University Press, 1977.
- [PhG-P], Georg Wilhelm Friedrich Hegel: The Phenomenology of Spirit [Phänomenologie des Geistes] , translated and edited by Terry Pinkard, Cambridge: Cambridge University Press, 2018.
- [PR], Elements of the Philosophy of Right [Philosophie des Rechts] , edited by Allen W. Wood and translated by H.B. Nisbet, Cambridge: Cambridge University Press, 1991.
- [SL-dG], Georg Wilhelm Friedrich Hegel: The Science of Logic [Wissenschaft der Logik] , translated by George di Giovanni, New York: Cambridge University Press, 2010.
- [SL-M], Hegel’s Science of Logic [Wissenschaft der Logik] , translated by A.V. Miller, Oxford: Oxford University Press, 1977.
- Aristotle, 1954, The Complete Works of Aristotle: The Revised Oxford Translation (in two volumes), edited by Jonathan Barnes. Princeton: Princeton University Press. (Citations to Aristotle’s text use the Bekker numbers, which appear in the margins of many translations of Aristotle’s works.)
- Fichte, J.G., 1982 [1794/95], The Science of Knowledge , translated by Peter Heath and John Lachs, Cambridge: Cambridge University Press. (Citations to Fichte’s work include references to the volume and page number in the German edition of Fichte’s collected works edited by I.H. Fichte, which are used in the margins of many translations of Fichte’s works.)
- Kant, Immanuel, 1999 [1781], Critique of Pure Reason , translated and edited by Paul Guyer and Allen Wood. Cambridge: Cambridge University Press. (Citations to Kant’s text use the “Ak.” numbers, which appear in the margins of many translations of Kant’s works.)
- Plato, 1961, The Collected Dialogues of Plato: Including the Letters , edited by Edith Hamilton and Huntington Cairns. Princeton: Princeton University Press. (Citations to Plato’s text use the Stephanus numbers, which appear in the margins of many translations of Plato’s works.)
- Bencivenga, Ermanno, 2000, Hegel’s Dialectical Logic , New York: Oxford University Press.
- Berto, Francesco, 2007, “Hegel’s Dialectics as a Semantic Theory: An Analytic Reading”, European Journal of Philosophy , 15(1): 19–39.
- Burbidge, John, 1981, On Hegel’s Logic: Fragments of a Commentary , Atlantic Highlands, NJ: Humanities Press.
- Düsing, Klaus, 2010, “Ontology and Dialectic in Hegel’s Thought”, translated by Andrés Colapinto, in The Dimensions of Hegel’s Dialectic , Nectarios G. Limmnatis (ed.), London: Continuum, pp. 97–122.
- Findlay, J.N., 1962, Hegel: A Re-Examination , New York: Collier Books.
- –––, 1966, Review of Hegel: Reinterpretation, Texts, and Commentary , by Walter Kaufmann. The Philosophical Quarterly , 16(65): 366–68.
- Forster, Michael, 1993, “Hegel’s Dialectical Method”, in The Cambridge Companion to Hegel , Frederick C. Beiser (ed.), Cambridge: Cambridge University Press, pp. 130–170.
- Fritzman, J.M., 2014, Hegel , Cambridge: Polity Press.
- Harris, Errol E., 1983, An Interpretation of the Logic of Hegel , Lanham, MD: University Press of America.
- Harris, H.S. (Henry Silton), 1997, Hegel’s Ladder (in two volumes: vol. I, The Pilgrimage of Reason , and vol. II, The Odyssey of Spirit ), Indianapolis, IN: Hackett.
- Hyppolite, Jean, 1974, Genesis and Structure of Hegel’s “Phenomenology of Spirit” , Evanston, IL: Northwestern University Press.
- Inoue, Kazumi, 2014, “Dialectical Contradictions and Classical Formal Logic”, International Studies in the Philosophy of Science , 28(2), 113–132.
- Jaśkowski, Stanislaw, 1999 [1969], “A Propositional Calculus for Inconsistent Deductive Systems”, translated by Olgierd Wojtasiewicz and A. Pietruszczak, Logic and Logical Philosophy (7)7: 35–56. (This article is a republication, with some changes, of a 1969 translation by Wojtasiewicz entitled “Propositional Calculus for Contradictory Deductive Systems (Communicated at the Meeting of March 19, 1948)”, published in Studia Logica , 24, 143–160.)
- Kaufmann, Walter Arnold, 1965, Hegel: Reinterpretation, Texts, and Commentary , Garden City, NY: Doubleday and Company Inc.
- –––, 1966, A Reinterpretation , Garden City, NY: Anchor Books. (This is a republication of the first part of Hegel: Reinterpretation, Texts, and Commentary .)
- –––, 1976 [1972], “The Hegel Myth and its Method”, in Hegel: A Collection of Critical Essays , Alasdair MacIntyre (ed.), Notre Dame, IN: University of Notre Dame Press: 21–60. (This is a republication of the 1972 Anchor Books/Doubleday edition.)
- Kosok, Michael, 1972, “The Formalization of Hegel’s Dialectical Logic: Its Formal Structure, Logical Interpretation and Intuitive Foundation”, in Hegel: A Collection of Critical Essays , Alasdair MacIntyre (ed.), Notre Dame, IN: University of Notre Dame Press: 237–87.
- Lauer, Quentin, 1976, A Reading of Hegel’s “Phenomenology of Spirit” , New York: Fordham University Press.
- Margolis, Joseph, 2010, “The Greening of Hegel’s Dialectical Logic”, in The Dimensions of Hegel’s Dialectic , Nectarios G. Limmnatis (ed.), London: Continuum, pp. 193–215.
- Maybee, Julie E., 2009, Picturing Hegel: An Illustrated Guide to Hegel’s “Encyclopaedia Logic” , Lanham, MD: Lexington Books.
- McTaggart, John McTaggart Ellis, 1964 [1910], A Commentary on Hegel’s Logic , New York: Russell and Russell Inc. (This edition is a reissue of McTaggart’s book, which was first published in 1910.)
- Mueller, Gustav, 1958, “The Hegel Legend of ‘Thesis-Antithesis-Synthesis’”, Journal of the History of Ideas , 19(3): 411–14.
- Mure, G.R.G., 1950, A Study of Hegel’s Logic , Oxford: Oxford University Press.
- Pinkard, Terry, 1988, Hegel’s Dialectic: The Explanation of a Possibility , Philadelphia: Temple University Press.
- Priest, Graham, 1985, “Inconsistencies in Motion”, American Philosophical Quarterly , 22(4): 339–346.
- –––, 1989, “Dialectic and Dialetheic”, Science and Society , 53(4): 388–415.
- –––, 1997 [2006], In Contradiction: A Study of the Transconsistent , expanded edition, Oxford: Oxford University Press; first edition, Martinus Nijhoff, 1997.
- Popper, Karl R., 1940, “What is Dialectic?”, Mind , 49(196): 403–426. (This article was reprinted, with some changes, in two different editions of Conjectures and Refutations: The Growth of Scientific Knowledge , listed below.)
- –––, 1962, Conjectures and Refutations: The Growth of Scientific Knowledge , New York: Basic Books.
- –––, 2002, Conjectures and Refutations: The Growth of Scientific Knowledge , second edition, London: Routledge Classics.
- –––, 2013, The Open Society and its Enemies , Princeton: Princeton University Press. (This is a one-volume republication of the original, two-volume edition first published by Princeton University Press in 1945.)
- Rosen, Michael, 1982, Hegel’s Dialectic and its Criticism , Cambridge: Cambridge University Press.
- Rosen, Stanley, 2014, The Idea of Hegel’s “Science of Logic” , Chicago: University of Chicago Press.
- Singer, Peter, 1983, Hegel , Oxford: Oxford University Press.
- Solomon, Robert C., 1983, In the Spirit of Hegel: A Study of G.W.F. Hegel’s “Phenomenology of Spirit” , New York: Oxford University Press.
- Stace, W.T., 1955 [1924], The Philosophy of Hegel: A Systematic Exposition , New York: Dover Publications. (This edition is a reprint of the first edition, published in 1924.)
- Stewart, Jon, 1996, “Hegel’s Doctrine of Determinate Negation: An Example from ‘Sense-certainty’ and ‘Perception’”, Idealistic Studies , 26(1): 57–78.
- –––, 2000, The Unity of Hegel’s “Phenomenology of Spirit”: A Systematic Interpretation , Evanston, IL: Northwestern University Press.
- Taylor, Charles, 1975, Hegel , Cambridge: Cambridge University Press.
- Wandschneider, Dieter, 2010, “Dialectic as the ‘Self-Fulfillment’ of Logic”, translated by Anthony Jensen, in The Dimensions of Hegel’s Dialectic , Nectarios G. Limmnatis (ed.), London: Continuum, pp. 31–54.
- Westphal, Kenneth R., 2003, Hegel’s Epistemology: A Philosophical Introduction to the “Phenomenology of Spirit” , Indianapolis, IN: Hackett Publishing Company.
- Winfield, Richard Dien, 1990, “The Method of Hegel’s Science of Logic ”, in Essays on Hegel’s Logic , George di Giovanni (ed.), Albany, NY: State University of New York, pp. 45–57.
- Hegel on Dialectic , Philosophy Bites podcast interview with Robert Stern
- Hegel , Philosophy Talks preview video, interview notes and recorded radio interview with Allen Wood, which includes a discussion of Hegel’s dialectics
Copyright © 2020 by Julie E. Maybee < julie . maybee @ lehman . cuny . edu >
The Stanford Encyclopedia of Philosophy is copyright © 2023 by The Metaphysics Research Lab , Department of Philosophy, Stanford University
Library of Congress Catalog Data: ISSN 1095-5054
Marx, Marxism, and Communism
Overview: marx & engels , graphs and outlines, karl marx, friedrich engels, and the communist manifesto.
, but they are based on Marx's ideas.
written (at the instructions of the International Communist League) in 1848, rpt. German 1872, Russian trans. in 1882, Engl. trans. in 1888
across Europe and the extremely horrible working conditions of the 19th century form the historical context for the , which was written just as socialist thought and workers' unions were uniting in protest. The rebellions in 1848 were failed revolutions in the sense that they did not elicit lasting political change. As with most historical movements, the effects of this publication were long ranging. has had a profound .
· patrician/plebeian · lord/serf · oppressor/oppressed · bourgeoisie/proletariat
"Evolutionary Progress" According to Marx:
· aristocracy / feudalism · bourgeoisie / capitalism · citizens / socialism · proletariat / communism
Ch. I: Bourgeois and Proletarians
Dialectical Materialism
Commodification
Commodification refers to the process of turning the means of production (labor and laborers) into commodities.
Other key terms:
"World market" and the means of production:
"Meantime, the markets kept ever growing, the demand ever rising. Even manufacturers no longer sufficed. Thereupon, steam and machinery revolutionized industrial production. The place of manufacture was taken by the giant, Modern Industry; the place of the industrial middle class by industrial millionaires, the leaders of the whole industrial armies, the modern bourgeois.
Modern industry has established the world market, for which the discovery of America paved the way. This market has given an immense development to commerce, to navigation, to communication by land. This development has, in turn, reacted on the extension of industry; and in proportion as industry, commerce, navigation, railways extended, in the same proportion the bourgeoisie developed, increased its capital, and pushed into the background every class handed down from the Middle Ages."
Modern terms and Marxist roots
"The bourgeoisie, wherever it has got the upper hand, has put an end to all feudal, patriarchal , idyllic relations. It has pitilessly torn asunder the motley feudal ties that bound man to his 'natural superiors,' and has left no other nexus between man and man than naked self-interest, than callous 'cash payment.' It has drowned out the most heavenly ecstasies of religious fervour, of chivalrous enthusiasm, of philistine sentimentalism, in the icy water of egotistical calculation. It has resolved personal worth into exchange value, and in place of the numberless indefeasible chartered freedoms, has set up that single, unconscionable freedom — Free Trade. In one word, for exploitation, veiled by religious and political illusions, it has substituted naked, shameless, direct, brutal exploitation " (20, my emphasis).
"The bourgeoisie, by the rapid improvement of all instruments of production , by the immensely facilitated means of communication, draws all, even the most barbarian, nations into civilization. The cheap prices of commodities are the heavy artillery with which it forces the barbarians' intensely obstinate hatred of foreigners to capitulate. It compels all nations, on pain of extinction, to adopt the bourgeois mode of production ; it compels them to introduce what it calls civilization into their midst, i.e., to become bourgeois themselves. In one word, it creates a world after its own image" (22, my emphasis).
The Bourgeois Empire
Technology and surplus value.
"Owing to the extensive use of machinery, and to the division of labor, the work of the proletarians has lost all individual character, and, consequently, all charm for the workman. He becomes an appendage of the machine , and it is only the most simple, most monotonous, and most easily acquired knack, that is required of him. Hence, the cost of production of a workman is restricted, almost entirely, to the means of subsistence that he requires for maintenance, and for the propagation of his race. But the price of a commodity, and therefore also of labor, is equal to its cost of production. In proportion, therefore, as the repulsiveness of the work increases, the wage decreases. What is more, in proportion as the use of machinery and division of labor increases, in the same proportion the burden of toil also increases, whether by prolongation of the working hours, by the increase of the work exacted in a given time, or by increased speed of machinery, etc." (26).
Ch. II-IV :
Social democracy.
"1. Abolition of property in land and application of all rents of land to public purposes. 2. A heavy progressive or graduated income tax. 3. Abolition of all rights of inheritance. 4. Confiscation of the property of all emigrants and rebels. 5. Centralization of credit in the banks of the state, by means of a national bank with state capital and an exclusive monopoly. 6. Centralization of the means of communication and transport in the hands of the state. 7. Extension of factories and instruments of production owned by the state; the bringing into cultivation of waste lands, and the improvement of the soil generally in accordance with a common plan. 8. Equal obligation of all to work. Establishment of industrial armies, especially for agriculture. 9. Combination of agriculture with manufacturing industries; gradual abolition of all the distinction between town and country by a more equable distribution of the populace over the country. 10. Free education for all children in public schools. Abolition of children's factory labor in its present form. Combination of education with industrial production, etc." (42-3).
Women's Liberation Movement
"Bourgeois marriage is, in reality, a system of wives in common and thus, at the most, what the Communists might possibly be reproached with is that they desire to introduce, in substitution for a hypocritically concealed, an openly legalized system of free love. For the rest, it is self-evident that the abolition of the present system of production must bring with it the abolition of free love springing from that system, i.e., of prostitution both public and private" (39).
Thesis/Antithesis: Synthesis?
- Thomas A. Kerns 2
To summarize: the Thesis position, then, can be characterized as follows:
Many of these problems will be dealt with and solved by beginning the process of designing the trials. We must initiate phase III HIV vaccine efficacy trials as soon as possible, and follow the advice of the great experimental surgeon, Dr John Hunter (professor and friend of Edward Jenner): “Don’t just speculate; try the experiment.”
Author information
Authors and affiliations.
North Seattle Community College, Seattle, Washington, USA
Thomas A. Kerns
Copyright information
© 1997 Thomas A. Kerns
About this chapter
Kerns, T.A. (1997). Thesis/Antithesis: Synthesis?. In: Ethical Issues in HIV Vaccine Trials. Palgrave Macmillan, London. https://doi.org/10.1057/9780230380011_26
DOI : https://doi.org/10.1057/9780230380011_26
Publisher Name : Palgrave Macmillan, London
Print ISBN : 978-0-333-67492-5
Online ISBN : 978-0-230-38001-1
eBook Packages : Palgrave Social & Cultural Studies Collection Social Sciences (R0)
- Technical Report
- Open access
- Published: 06 September 2024
Dissociative and prioritized modeling of behaviorally relevant neural dynamics using recurrent neural networks
- Omid G. Sani ORCID: orcid.org/0000-0003-3032-5669 1 ,
- Bijan Pesaran ORCID: orcid.org/0000-0003-4116-0038 2 &
- Maryam M. Shanechi ORCID: orcid.org/0000-0002-0544-7720 1 , 3 , 4 , 5
Nature Neuroscience (2024)
- Brain–machine interface
- Dynamical systems
- Machine learning
- Neural decoding
- Neural encoding
Understanding the dynamical transformation of neural activity to behavior requires new capabilities to nonlinearly model, dissociate and prioritize behaviorally relevant neural dynamics and test hypotheses about the origin of nonlinearity. We present dissociative prioritized analysis of dynamics (DPAD), a nonlinear dynamical modeling approach that enables these capabilities with a multisection neural network architecture and training approach. Analyzing cortical spiking and local field potential activity across four movement tasks, we demonstrate five use-cases. DPAD enabled more accurate neural–behavioral prediction. It identified nonlinear dynamical transformations of local field potentials that were more behavior predictive than traditional power features. Further, DPAD achieved behavior-predictive nonlinear neural dimensionality reduction. It enabled hypothesis testing regarding nonlinearities in neural–behavioral transformation, revealing that, in our datasets, nonlinearities could largely be isolated to the mapping from latent cortical dynamics to behavior. Finally, DPAD extended across continuous, intermittently sampled and categorical behaviors. DPAD provides a powerful tool for nonlinear dynamical modeling and investigation of neural–behavioral data.
Understanding how neural population dynamics give rise to behavior is a major goal in neuroscience. Many methods that relate neural activity to behavior use static mappings or embeddings, which do not describe the temporal structure in how neural population activity evolves over time 1 . In comparison, dynamical models can describe these temporal structures in terms of low-dimensional latent states embedded in the high-dimensional space of neural recordings. Prior dynamical models have often been linear or generalized linear 1 , 2 , 3 , 4 , 5 , 6 , 7 , thus motivating recent work to develop support for piece-wise linear 8 , locally linear 9 , switching linear 10 , 11 , 12 , 13 or nonlinear 14 , 15 , 16 , 17 , 18 , 19 , 20 , 21 , 22 , 23 , 24 , 25 , 26 , 27 models of neural dynamics, especially in applications such as single-trial smoothing of neural population activity 9 , 14 , 15 , 16 , 17 , 18 , 19 and decoding behavior 20 , 21 , 22 , 23 , 24 , 26 . Once trained, the latent states of these models can subsequently be mapped to behavior 1 , 25 to learn an overall dynamical transformation from neural activity to behavior. However, multiple challenges hinder the dynamical modeling and interpretation of neural–behavioral transformations.
First, the neural–behavioral transformation can exhibit nonlinearities, which the dynamical model should capture. Moreover, these nonlinearities can be in one or more different elements within the dynamical model, for example, in the dynamics of the latent state or in its embedding. Enabling hypothesis testing regarding the origin of nonlinearity (that is, where the nonlinearity can be isolated to within the model) is important for interpreting neural computations and developing neurotechnology but remains largely unaddressed in current nonlinear models. Second, neural dynamics related to a given behavior often constitute a minority of the total neural variance 28 , 29 , 30 , 31 , 32 , 33 . To avoid missing or confounding these dynamics, nonlinear dynamical models need to dissociate behaviorally relevant neural dynamics from other neural dynamics and prioritize the learning of the former, which is currently not possible. Indeed, existing nonlinear methods for modeling neural activity either do not explicitly model temporal dynamics 34 , 35 , 36 or do not prioritize behaviorally relevant dynamics 16 , 37 , 38 , or have a mixed objective 18 that may mix behaviorally relevant and other neural dynamics in the same latent states ( Discussion and Extended Data Table 1 ). Our prior method, termed PSID 6 , has enabled prioritized dissociation of behaviorally relevant neural dynamics but for linear dynamical models. Third, for broad applicability, in addition to continuous behaviors, dynamical models should admit categorical (for example, choices) or intermittently sampled behaviors (for example, mood reports), which are not supported by existing dynamical methods with a mixed objective 18 or by PSID. To date, learning nonlinear dynamical models of neural population activity that can address the above challenges has not been achieved.
Here, we develop dissociative prioritized analysis of dynamics (DPAD), a nonlinear dynamical modeling framework using recurrent neural networks (RNNs) that addresses all the above challenges. DPAD models both behaviorally relevant and other neural dynamics but dissociates them into separate latent states and prioritizes the learning of the former. To do so, we formulate a two-section RNN as the DPAD nonlinear dynamical model and develop a four-step optimization algorithm to train it. The first RNN section learns the behaviorally relevant latent states with priority, and the second section learns any remaining neural dynamics (Fig. 1a and Supplementary Fig. 1 ). Moreover, DPAD adjusts these optimization steps as needed to admit continuous-valued, categorical or intermittently sampled data ( Methods ). Furthermore, to capture nonlinearity in the neural–behavioral transformation and enable hypothesis testing regarding its origins, DPAD decomposes this transformation into the following four interpretable elements and allows each element to become linear or nonlinear (Fig. 1a,b ): the mapping from neural activity to the latent space (neural input), the latent state dynamics within this space (recursion) and the mappings of the state to neural activity and behavior (neural and behavior readouts). Finally, we formulate the DPAD model in predictor form such that the learned model can be directly used for inference, enabling causal and computationally efficient decoding for data, whether with or without a fixed-length trial structure ( Methods ).
Fig. 1. a, DPAD decomposes the neural–behavioral transformation into four interpretable mapping elements. It learns the mapping of neural activity (\(y_k\)) to latent states (\(x_k\)), termed neural input in the model; learns the dynamics or temporal structure of the latent states, termed recursion in the model; dissociates the behaviorally relevant latent states (\(x_k^{(1)}\)) that are relevant to a measured behavior (\(z_k\)) from other states (\(x_k^{(2)}\)); learns the mapping of the latent states to behavior and to neural activity, termed behavior and neural readouts in the model; and allows flexible linear or nonlinear mappings in any of its elements. DPAD additionally prioritizes the learning of behaviorally relevant neural dynamics to learn them accurately. b, Computation graph of the DPAD model consists of a two-section RNN whose input is neural activity at the current time step and whose outputs are the predicted behavior and neural activity in the next time step (Methods). This graph assumes that computations are Markovian, that is, with a high enough dimension, latent states can summarize the information from past neural data that is useful for predicting future neural–behavioral data. Each of the four mapping elements from a has a corresponding parameter in each section of the RNN model, indicated by the same colors and termed as introduced in a. c, We developed a four-step optimization method to learn all the model parameters from training neural–behavioral data (Supplementary Fig. 1a). Further, each model parameter can be specified via the ‘nonlinearity setting’ to be linear or nonlinear with various options to implement the nonlinearity (Supplementary Fig. 1b,c). After a model is learned, only past neural activity is used to decode behavior and predict neural activity using the computation graph in b.
d, DPAD also has the option of automatically selecting the ‘nonlinearity setting’ for the data by fitting candidate models and comparing them in terms of both behavior decoding and neural self-prediction accuracy (Methods). In this work, we chose among 90 candidate models with various nonlinearity settings (Methods). We refer to this automatic selection of nonlinearity as ‘DPAD with flexible nonlinearity’.
To show its broad utility, we demonstrate five distinct use-cases for DPAD across four diverse nonhuman primate (NHP) datasets consisting of both population spiking activity and local field potentials (LFPs). First, DPAD more accurately models the overall neural–behavioral data than alternative nonlinear and linear methods. This is due both to DPAD’s prioritized and dynamical modeling of behaviorally relevant neural dynamics and to its nonlinearity. Second, DPAD can automatically uncover nonlinear dynamical transformations of raw LFP that are more predictive of behavior than traditional LFP power band features and in some datasets can even outperform population spiking activity in terms of behavior prediction. Further, DPAD reveals that among the neural modalities, the degree of nonlinearity is greatest for the raw LFP. Third, DPAD enables nonlinear and dynamical neural dimensionality reduction while preserving behavior information, thus extracting lower-dimensional yet more behavior-predictive latent states from past neural activity. Fourth, DPAD enables hypothesis testing regarding the origin of nonlinearity in the neural–behavioral transformation. Consistently across our movement-related datasets, doing so revealed that summarizing the nonlinearities just in the behavior readout from the latent state is largely sufficient for predicting the neural–behavioral data (see Discussion ). Fifth, DPAD extends to categorical and intermittently observed behaviors, which is important for cognitive neuroscience 11 , 39 and neuropsychiatry 40 , 41 , 42 . Together, these results highlight DPAD’s broad utility as a dynamical modeling tool to investigate the nonlinear and dynamical transformation of neural activity to specific behaviors across various domains of neuroscience.
Overview of DPAD
Formulation.
We model neural activity and behavior jointly and nonlinearly (Methods) as

\[\begin{cases}{x}_{k+1}={A}^{{\prime}}({x}_{k})+K({y}_{k})\\ {y}_{k}={C}_{y}({x}_{k})+{e}_{k}\\ {z}_{k}={C}_{z}({x}_{k})+{{\epsilon}}_{k}\end{cases}\qquad (1)\]

where \(k\) is the time index, \({y}_{k}\in {{\mathbb{R}}}^{{n}_{y}}\) and \({z}_{k}\in {{\mathbb{R}}}^{{n}_{z}}\) denote the neural activity and behavior time series, respectively, \({x}_{k}\in {{\mathbb{R}}}^{{n}_{x}}\) is the latent state, and \({e}_{k}\) and \({{\epsilon }}_{k}\) denote neural and behavior dynamics that are unpredictable from past neural activity. Multi-input–multi-output functions \({A}^{{\prime}}\) (recursion), \(K\) (neural input), \({C}_{y}\) (neural readout) and \({C}_{z}\) (behavior readout) are parameters that fully specify the model and have interpretable descriptions (Methods, Supplementary Note 1 and Fig. 1a,b). The adjusted formulation for intermittently sampled and noncontinuous-valued (for example, categorical) data is provided in Methods. DPAD supports both linear and nonlinear modeling, which will be termed linear DPAD and nonlinear DPAD (or just DPAD), respectively.
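As a rough illustration of the predictor-form model described here, the following Python sketch implements a single time step. It assumes the recursion takes the form x_{k+1} = A′(x_k) + K(y_k), with the readouts C_y and C_z applied to x_k. The function name `dpad_step`, the dimensions and the random matrices are hypothetical and chosen only for illustration; this is not the authors' TensorFlow implementation.

```python
import numpy as np

def dpad_step(x_k, y_k, A, K, Cy, Cz):
    """One step of a predictor-form state-space model (illustrative sketch).

    A, K, Cy and Cz are callables, so each can be linear or nonlinear.
    Returns the next latent state and the predictions of y_k and z_k
    that are made from x_k (that is, from past neural activity only).
    """
    y_pred = Cy(x_k)          # neural readout
    z_pred = Cz(x_k)          # behavior readout
    x_next = A(x_k) + K(y_k)  # recursion plus neural input
    return x_next, y_pred, z_pred

# Hypothetical sizes and parameters, for illustration only.
rng = np.random.default_rng(0)
nx, ny, nz = 3, 5, 2
A_mat = 0.5 * np.eye(nx)
K_mat = 0.1 * rng.standard_normal((nx, ny))
Cy_mat = rng.standard_normal((ny, nx))
Cz_mat = rng.standard_normal((nz, nx))

# All four elements linear: a standard linear state-space model in predictor form.
linear = dict(A=lambda x: A_mat @ x, K=lambda y: K_mat @ y,
              Cy=lambda x: Cy_mat @ x, Cz=lambda x: Cz_mat @ x)
# Same model, but with the nonlinearity placed only in the behavior readout.
nonlinear_readout = dict(linear, Cz=lambda x: np.tanh(Cz_mat @ x))

x0 = np.zeros(nx)
y0 = rng.standard_normal(ny)
x1, y_hat0, z_hat0 = dpad_step(x0, y0, **linear)
x1_nl, _, z_hat0_nl = dpad_step(x0, y0, **nonlinear_readout)
```

Passing each element as a callable mirrors the idea that any of the four mappings can be switched between linear and nonlinear without touching the rest of the model.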
Dissociative and prioritized learning
We further expand the model in Eq. (1) in two sections, as depicted in Fig. 1b (Eq. (2) in Methods and Supplementary Note 2). The first and second sections describe the behaviorally relevant neural dynamics and the other neural dynamics with latent states \({x}_{k}^{(1)}\in {{\mathbb{R}}}^{{n}_{1}}\) and \({x}_{k}^{(2)}\in {{\mathbb{R}}}^{{n}_{x}-{n}_{1}}\), respectively. We specify the parameters of the two RNN sections with superscripts (for example, \({K}^{(1)}\) and \({K}^{(2)}\)) and learn them all sequentially via a four-step optimization (Methods, Supplementary Fig. 1a and Fig. 1b). The first two steps exclusively learn neural dynamics that are behaviorally relevant with the objective of behavior prediction, whereas the optional last two steps learn any remaining neural dynamics with the objective of residual neural prediction (Methods and Supplementary Fig. 1). We implement DPAD in TensorFlow and use an ADAM 43 optimizer (Methods).
Comparison baselines
As a baseline, we compare DPAD with standard nonlinear RNNs fitted to maximize neural prediction, unsupervised with respect to behavior. We refer to this baseline as nonlinear neural dynamical modeling (NDM) 6 or as linear NDM if all RNN parameters are linear. NDM is nondissociative and nonprioritized, so comparisons with NDM show the benefit of DPAD’s prioritized dissociation of behaviorally relevant neural dynamics. In terms of neural–behavioral prediction, we also compare DPAD with latent factor analysis via dynamical systems (LFADS) 16 and with two methods developed concurrently with DPAD 44 : targeted neural dynamical modeling (TNDM) 18 and consistent embeddings of high-dimensional recordings using auxiliary variables (CEBRA) 36 . However, as summarized in Extended Data Table 1 , these and other existing methods differ from DPAD in key goals and capabilities and do not enable some of DPAD’s use-cases (see Discussion ).
Decoding using past neural data
Given DPAD’s learned parameters, the latent states can be causally extracted from neural activity by iterating through the RNN in Eq. ( 1 ) ( Methods and Supplementary Note 1 ). Note that this decoding uses only neural activity and never sees the behavior data.
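As an illustrative sketch of this causal extraction, the snippet below iterates the recursion for the linear special case, assuming the predictor form in which the next state is a function of the current state plus a function of the current neural sample. The function names `extract_states` and `decode` and the matrices `A`, `K` and `Cz` are hypothetical stand-ins for learned parameters; this is not the DPAD implementation.

```python
import numpy as np

def extract_states(y, A, K, x0=None):
    """Causally iterate x_{k+1} = A x_k + K y_k (assumed linear predictor form).

    y has shape (T, ny). Returns X with shape (T, nx), where X[k] is the
    state available before observing y[k], so it depends only on y[0..k-1].
    """
    T = y.shape[0]
    nx = A.shape[0]
    X = np.zeros((T, nx))
    x = np.zeros(nx) if x0 is None else np.asarray(x0, dtype=float)
    for k in range(T):
        X[k] = x                 # one-step-ahead state for time k
        x = A @ x + K @ y[k]     # fold in y[k] to obtain the state for k + 1
    return X

def decode(X, Cz):
    """Linear behavior readout z_hat[k] = Cz x[k]; behavior data are never used."""
    return X @ Cz.T
```

Because each state is built only from earlier neural samples, perturbing `y[k]` leaves all decoded values up to and including time `k` unchanged, which is the sense in which the decoding is causal.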
Flexible control of nonlinearities
We allow each model parameter (for example, C z ) to be an arbitrary multilayer neural network (Supplementary Fig. 1c ), which can universally approximate any smooth nonlinear function or implement linear matrix multiplications ( Methods and Supplementary Fig. 1b ). Users can manually specify which parameters will be learned as nonlinear and with what architecture (Fig. 1c ; see application in use-case 4). Alternatively, DPAD can automatically determine the best nonlinearity setting for the data by conducting a search over nonlinearity options (Fig. 1d and Methods ), a process that we refer to as flexible nonlinearity. For a fair comparison, we also implement this flexible nonlinearity for NDM. To show the benefits of nonlinearity, we also compare with linear DPAD, where all parameters are set to be linear, in which case Eq. ( 1 ) formulates a standard linear state-space model in predictor form ( Methods ).
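To make the idea of a search over nonlinearity options concrete, the following sketch enumerates every linear/nonlinear assignment for the four parameters and picks the one with the best score. It is a simplification under stated assumptions: the actual search space also covers network architectures, and the names `PARAMS`, `nonlinearity_settings`, `select_setting` and `toy_score` are hypothetical. In practice the score would come from fitting each candidate model and measuring its accuracy on the training data.

```python
from itertools import product

# Hypothetical labels for the four model parameters (recursion, neural input,
# neural readout, behavior readout).
PARAMS = ("A", "K", "Cy", "Cz")

def nonlinearity_settings(params=PARAMS):
    """Yield every assignment of linear (False) / nonlinear (True) per parameter."""
    for flags in product((False, True), repeat=len(params)):
        yield dict(zip(params, flags))

def select_setting(settings, score):
    """Return the setting with the highest score (e.g., training decoding accuracy)."""
    return max(settings, key=score)

def toy_score(setting):
    """Made-up score: rewards a nonlinear Cz and mildly penalizes extra nonlinearity."""
    return (0.5 if setting["Cz"] else 0.0) - 0.01 * sum(setting.values())

best = select_setting(nonlinearity_settings(), toy_score)
```

With this toy score, `best` marks only `Cz` as nonlinear; restricting the enumeration to settings with a single nonlinear parameter gives the kind of model family used for hypothesis testing in use-case 4.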
Evaluation metrics
We evaluate how well the models can use the past neural activity to predict the next sample of behavior (termed ‘decoding’) or the next sample of neural activity itself (termed ‘neural self-prediction’ or simply ‘self-prediction’). Thus, decoding and self-prediction assess the one-step-ahead prediction accuracies and reflect the learning of behaviorally relevant and overall neural dynamics, respectively. Both performance measures are always computed with cross-validation ( Methods ).
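As a minimal sketch of the accuracy measure, the correlation coefficient (CC) between one-step-ahead predictions and the true signal can be computed per dimension as below. Averaging the CC across dimensions in `mean_cc` is an assumption made here for illustration, and both function names are hypothetical; in a cross-validated setting the inputs would be predictions and true values from held-out test data.

```python
import math

def pearson_cc(pred, true):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(true) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, true))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in true)
    return cov / math.sqrt(var_p * var_t)

def mean_cc(pred_dims, true_dims):
    """Average the CC over signal dimensions (assumed aggregation)."""
    ccs = [pearson_cc(p, t) for p, t in zip(pred_dims, true_dims)]
    return sum(ccs) / len(ccs)
```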
Our primary interest is to find models that simultaneously reach both accurate behavior decoding and accurate neural self-prediction. But in some applications, only one of these metrics may be of interest. Thus, we use the term ‘performance frontier’ to refer to the range of performances achievable by those models that compared to every other model are better in neural self-prediction and/or behavior decoding or are similar in terms of both metrics ( Methods ).
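One way to read this definition is as a Pareto frontier over the two metrics. The sketch below keeps every model that no other model beats on one metric without losing on the other. It compares mean accuracies directly and ignores statistical similarity, which is a simplifying assumption; the function name and the example numbers are hypothetical.

```python
def performance_frontier(models):
    """Return names of models not dominated in (self-prediction, decoding).

    models maps a name to a (self_prediction, decoding) pair. A model is
    dominated if another model is at least as good on both metrics and
    strictly better on at least one.
    """
    frontier = []
    for name, (sp, dec) in models.items():
        dominated = any(
            sp2 >= sp and dec2 >= dec and (sp2 > sp or dec2 > dec)
            for other, (sp2, dec2) in models.items()
            if other != name
        )
        if not dominated:
            frontier.append(name)
    return frontier

# Made-up accuracies for three hypothetical models.
example = {"model_a": (0.5, 0.6), "model_b": (0.6, 0.4), "model_c": (0.4, 0.3)}
on_frontier = performance_frontier(example)
```

Here `model_a` and `model_b` each win on one metric and so both stay on the frontier, while `model_c` is worse than `model_a` on both and is excluded.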
Diverse neural–behavioral datasets
We used DPAD to study the behaviorally relevant neural dynamics in four NHPs performing four different tasks (Fig. 2 and Methods ). In the first task, the animal made naturalistic three-dimensional (3D) reach, grasp and return movements to diverse locations while the joint angles in the arm, elbow, wrist and fingers were tracked as the behavior (Fig. 2a ) 6 , 45 . In the second task, the animal made saccadic eye movements to one of eight possible targets on a screen, with the two-dimensional (2D) eye position tracked as the behavior (Fig. 2d ) 6 , 46 . In the third task, the animal made sequential 2D reaches on a screen using a cursor controlled with a manipulandum while the 2D cursor position and velocity were tracked as the behavior (Fig. 2g ) 47 , 48 . In the fourth task, the animal made 2D reaches to random targets in a virtual-reality-presented grid via a cursor that mirrored the animal’s fingertip movements, for which the 2D position and velocity were tracked as the behavior (Fig. 2i ) 49 . In tasks 1 and 4, primary motor cortical activity was modeled. For tasks 2 and 3, prefrontal cortex and dorsal premotor cortical activities were modeled, respectively.
a , The 3D reach task, along with example true and decoded behavior dimensions, decoded from spiking activity using DPAD, with more example trajectories for all modalities shown in Supplementary Fig. 3 . b , Cross-validated decoding accuracy correlation coefficient (CC) achieved by linear and nonlinear DPAD. Results are shown for spiking activity, raw LFP activity and LFP band power activity ( Methods ). For nonlinear DPAD, the nonlinearities are selected automatically based on the training data to maximize behavior decoding accuracy (that is, flexible nonlinearity). The latent state dimension in each session and fold is chosen (among powers of 2 up to 128) as the smallest that reaches peak decoding in the training data among all state dimensions ( Methods ). Bars show the mean, whiskers show the s.e.m., and dots show all data points ( N = 35 session-folds). Asterisks (*) show significance level for a one-sided Wilcoxon signed-rank test (* P < 0.05, ** P < 0.005 and *** P < 0.0005); NS, not significant. c , The difference between the nonlinear and linear results from b shown with the same notations. d – f , Same as a – c for the second dataset with saccadic eye movements ( N = 35 session-folds). g , h , Same as a and b for the third dataset, which did not include LFP data, with sequential cursor reaches controlled via a 2D manipulandum ( N = 15 session-folds). Behavior consists of the 2D position and velocity of the cursor, denoted as ‘hand kinematics’ in the figure. i – k , Same as a – c for the fourth dataset, with random grid virtual reality cursor reaches controlled via fingertip movement ( N = 35 session-folds). For all DPAD variations, only the first two optimization steps were used in this figure (that is, n 1 = n x ) to only focus on learning behaviorally relevant neural dynamics.
In all datasets, we modeled the Gaussian smoothed spike counts as the main neural modality ( Methods ). In three datasets that had LFP, we also modeled the following two additional modalities: (1) raw LFP, downsampled to the sampling rate of behavior (that is, 50-ms time steps), which in the motor cortex is known as the local motor potential 50 , 51 , 52 and has been used to decode behavior 6 , 50 , 51 , 52 , 53 ; and (2) LFP power in standard frequency bands from delta (0.1–4 Hz) to high gamma (130–170 Hz (refs. 5 , 6 , 40 ); Methods ). Similar results held for all three modalities.
Numerical simulations validate DPAD
We first validate DPAD with linear simulations here (Extended Data Fig. 1 ) and then present nonlinear simulations under use-case 4 below (Extended Data Fig. 2 and Supplementary Fig. 2 ). We simulated general random linear models (not emulating any real data) in which only a subset of state dimensions contributed to generating behavior and thus were behaviorally relevant ( Methods ). We found that with a state dimension equal to that of the true model, DPAD achieved ideal cross-validated prediction (that is, similar to the true model) for both behavior and neural signals (Extended Data Fig. 1b,d ). Moreover, even given a minimal state dimension equal to the true behaviorally relevant state dimension, DPAD still achieved ideal prediction for behavior (Extended Data Fig. 1c ). Finally, across various regimens of training samples, linear DPAD performed similarly to the linear-algebraic-based PSID 6 from our prior work (Extended Data Fig. 1 ). Thus, hereafter, we use linear DPAD as our linear modeling benchmark.
Use-case 1: DPAD enables nonlinear neural–behavioral modeling across modalities
DPAD captures nonlinearity in behaviorally relevant dynamics
We modeled each neural modality (spiking, raw LFP or LFP power) along with behavior using linear and nonlinear DPAD and compared their cross-validated behavior decoding (Fig. 2b,e,h,j and Supplementary Fig. 3 ). Across all neural modalities in all datasets, nonlinear DPAD achieved significantly higher decoding accuracy than linear DPAD. This result suggests that there is nonlinearity in the dynamical neural–behavioral transformation, which DPAD successfully captures (Fig. 2b,e,h,j ).
DPAD better predicts the neural–behavioral data
Across all datasets and modalities, compared to nonlinear NDM or linear DPAD, nonlinear DPAD reached higher behavior decoding accuracy while also being as accurate or better in terms of neural self-prediction (Fig. 3 , Extended Data Fig. 3 and Supplementary Fig. 4 ). Indeed, compared to these methods, DPAD was always on the best performance frontier for predicting the neural–behavioral data (Fig. 3 and Extended Data Fig. 3 ). DPAD was also always on the best performance frontier when compared to long short-term memory (LSTM) networks and to CEBRA 36 , a method developed concurrently with DPAD 44 , both on our four datasets (Fig. 4a–h ) and on a fifth movement dataset 54 analyzed in the CEBRA paper (Fig. 4i,j ). These results suggest that DPAD provides a more accurate description of neural–behavioral data.
a , The 3D reach task. b , Cross-validated neural self-prediction accuracy (CC) achieved by each method shown on the horizontal axis versus the corresponding behavior decoding accuracy on the vertical axis for modeling spiking activity. Latent state dimension for each method in each session and fold is chosen (among powers of 2 up to 128) as the smallest that reaches peak neural self-prediction in training data or reaches peak decoding in training data, whichever is larger ( Methods ). The plus on the plot shows the mean self-prediction and decoding accuracy across sessions and folds ( N = 35 session-folds), and the horizontal and vertical whiskers show the s.e.m. for these two measures, respectively. Capital letter annotations denote the methods according to the legend to make the plots more accessible. Models whose self-prediction and decoding accuracy measures lead to values toward the top-rightmost corner of the plot lie on the best performance frontier (indicated by red arrows) as they have better performance in both measures and thus better explain the neural–behavioral data ( Methods ). c , d , Same as a and b for the second dataset with saccadic eye movements ( N = 35 session-folds). e , f , Same as a and b for the third dataset, with sequential cursor reaches controlled via a 2D manipulandum ( N = 15 session-folds). g , h , Same as a and b for the fourth dataset with random grid virtual reality cursor reaches controlled via fingertip position ( N = 35 session-folds). For all DPAD variations, the first 16 latent state dimensions are learned using the first two optimization steps, and the remaining dimensions are learned using the last two optimization steps (that is, n 1 = 16). For nonlinear DPAD/NDM, we fit models with different combinations of nonlinearities and then select a final model among these fitted models based on either decoding or self-prediction accuracy in the training data and report both sets of results (Supplementary Fig. 1 and Methods ). DPAD with nonlinearity selected based on neural self-prediction was better than all other methods overall ( b , d , f and h ).
a – h , Figure content is parallel to Fig. 3 (with pluses and whiskers defined in the same way) but instead of NDM shows CEBRA and LSTM networks as baselines ( Methods ). i , j , Here, we also add a fifth dataset 54 ( Methods ), where in each trial an NHP moves a cursor from a center point to one of eight peripheral targets ( i ). In this fifth dataset ( N = 5 folds), we use the exact CEBRA hyperparameters that were used for this dataset from the paper introducing CEBRA 36 . In the other four datasets ( N = 35 session-folds in b , d and h and N = 15 session-folds in f ), we also show CEBRA results for when hyperparameters are picked based on an extensive search ( Methods ). Two types of LSTM networks are shown, one fitted to decode behavior from neural activity and another fitted to predict the next time step of neural activity (self-prediction). We also show the results for DPAD when only using the first two optimization steps. Note that CEBRA-Behavior (denoted by D and F), LSTM for behavior decoding (denoted by H) and DPAD when only using the first two optimization steps (denoted by G) dedicate all their latent states to behavior-related objectives (for example, prediction or contrastive loss), whereas other methods dedicate some or all latent states to neural self-prediction. As in Fig. 3 , the final latent dimension for each method in each session and fold is chosen (among powers of 2 up to 128) as the smallest that reaches peak neural self-prediction in training data or reaches peak decoding in training data, whichever is larger ( Methods ). Across all datasets, DPAD outperforms baseline methods in terms of cross-validated neural–behavioral prediction and lies on the best performance frontier. For a summary of the fundamental differences in goals and capabilities of these methods, see Extended Data Table 1 .
Beyond one-step-ahead predictions, we next evaluated DPAD in terms of multistep-ahead prediction of neural–behavioral data, also known as forecasting. To do this, starting with one-step-ahead predictions (that is, m = 1), we pass m -step-ahead predictions of neural data using the learned models as the neural observation in the next time step to obtain ( m + 1)-step-ahead predictions ( Methods ). Nonlinear DPAD was consistently better than nonlinear NDM and linear dynamical systems (LDS) modeling in multistep-ahead forecasting of behavior (Extended Data Fig. 4 ). For neural self-prediction, we used a naive predictor as a conservative forecasting baseline, which reflects how easy it is to predict the future in a model-free way purely based on the smoothness of neural data. DPAD significantly outperformed this baseline in terms of one-step-ahead and multistep-ahead neural self-predictions (Supplementary Fig. 5 ).
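The recursive forecasting procedure can be sketched for the linear special case as follows, assuming the predictor-form recursion in which the next state is a linear function of the current state plus a linear function of the current neural sample. Each extra step replaces the unseen neural observation with the model's own neural prediction. The function name `forecast` and the matrices `A`, `K`, `Cy` and `Cz` are hypothetical placeholders for learned parameters.

```python
import numpy as np

def forecast(x_next, A, K, Cy, Cz, m):
    """Return m-step-ahead (neural, behavior) predictions.

    x_next is the one-step-ahead state x_{k+1|k}. For every further step the
    predicted neural sample Cy @ x is passed in as if it were the observation.
    """
    x = np.asarray(x_next, dtype=float)
    for _ in range(m - 1):
        y_pred = Cy @ x          # stand-in for the neural sample not yet observed
        x = A @ x + K @ y_pred   # advance the state by one more step
    return Cy @ x, Cz @ x
```

In this linear case the loop is equivalent to applying the matrix `A + K @ Cy` a total of `m - 1` times to the one-step-ahead state, so no new neural data enter the forecast after time `k`.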
Use-case 2: DPAD extracts behavior-predictive nonlinear transformations from raw LFP
We next used DPAD to compare the amount of nonlinearity in the neural–behavioral transformation across different neural modalities (Fig. 2 and Supplementary Fig. 3 ). To do so, we compared the gain in behavior decoding accuracy when going from linear to nonlinear DPAD modeling in each modality. In all datasets, raw LFP activity had the highest gain from nonlinearity in behavior decoding accuracy (Fig. 2c,f,k ). Notably, using nonlinear DPAD, raw LFP reached more accurate behavior decoding than traditional LFP band powers in all tasks (Fig. 2b,e,j ). In one dataset, raw LFP even significantly surpassed spiking activity in terms of behavior decoding accuracy (Fig. 2e ). Note that computing LFP powers involves a prespecified nonreversible nonlinear transformation of raw LFP, which may be discarding important behaviorally relevant information that DPAD can uncover directly from raw LFP. Interestingly, linear dynamical modeling did worse for raw LFP than LFP powers in most tasks (compare linear DPAD for raw LFP versus LFP powers), suggesting that nonlinearity, captured by DPAD, was required for uncovering the extra behaviorally relevant information in raw LFP.
We next examined the spatial pattern of behaviorally relevant information across recording channels. For different channels, we compared the neural self-prediction of DPAD’s low-dimensional behaviorally relevant latent states (Extended Data Fig. 5 ). We computed the coefficient of variation (defined as standard deviation divided by mean) of the self-prediction over recording channels and found that the spatial distribution of behaviorally relevant information was less variable in raw LFP than spiking activity ( P ≤ 0.00071, one-sided signed-rank test, N = 35 for all three datasets with LFP). This could suggest that raw LFPs reflect large-scale network-level behaviorally relevant computations, which are thus less variable within the same spatial brain area than spiking, which represents local, smaller-scale computations 55 .
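For reference, the coefficient of variation over channels can be computed as in the short sketch below. Whether the population or the sample standard deviation was used is not stated here, so the population form is an assumption, and the per-channel values are made-up numbers chosen only to illustrate a less variable versus a more variable spatial distribution.

```python
import statistics

def coefficient_of_variation(values):
    """Standard deviation divided by mean (population standard deviation assumed)."""
    return statistics.pstdev(values) / statistics.fmean(values)

# Hypothetical per-channel self-prediction accuracies for two modalities.
spiking_channels = [0.1, 0.5, 0.9]
raw_lfp_channels = [0.45, 0.5, 0.55]

cv_spiking = coefficient_of_variation(spiking_channels)
cv_raw_lfp = coefficient_of_variation(raw_lfp_channels)
```

A smaller value, as for the hypothetical raw LFP channels here, indicates that the quantity is spread more evenly across channels.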
Use-case 3: DPAD enables behavior-predictive nonlinear dynamical dimensionality reduction
We next found that DPAD extracted latent states that were lower dimensional yet more behavior predictive than both nonlinear NDM and linear DPAD (Fig. 5 ). Specifically, we inspected the dimension required for nonlinear DPAD to reach almost (within 5% of) peak behavior decoding accuracy in each dataset (Fig. 5b,g,l,o ). At this low latent state dimension, linear DPAD and nonlinear and linear NDM all achieved much lower behavior decoding accuracy than nonlinear DPAD across all neural modalities (Fig. 5c–e,h–j,m,p–r ). The lower decoding accuracy of nonlinear NDM suggests that the dominant dynamics in spiking and LFP modalities can be unrelated to the modeled behavior. Thus, behaviorally relevant dynamics can be missed or confounded unless they are prioritized during nonlinear learning, as is done by DPAD. Moreover, we visualized the 2D latent state trajectories learned by each method (Extended Data Fig. 6 ). Consistent with the above results, DPAD extracted latent states from neural activity that were clearly different for different behavior/movement conditions (Extended Data Fig. 6b,e,h,k ). In comparison, NDM extracted latent states that did not as clearly dissociate different conditions (Extended Data Fig. 6c,f,i,l ). These results highlight the capability of DPAD for nonlinear dynamical dimensionality reduction in neural data while preserving behaviorally relevant neural dynamics.
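The dimension selection rule used in this comparison can be sketched as picking the smallest latent state dimension whose accuracy is within a tolerance of the peak. Interpreting "within 5%" as relative to the peak value is an assumption, as are the function name and the example accuracies; the rule presumes positive accuracies.

```python
def smallest_dim_near_peak(acc_by_dim, tolerance=0.05):
    """Smallest dimension whose accuracy is at least (1 - tolerance) * peak.

    acc_by_dim maps a latent state dimension to an accuracy (e.g., training
    decoding CC). With tolerance=0 this returns the smallest dimension that
    reaches the peak itself.
    """
    peak = max(acc_by_dim.values())
    threshold = (1 - tolerance) * peak
    return min(d for d, acc in acc_by_dim.items() if acc >= threshold)

# Made-up accuracies over powers of 2 up to 128.
example_acc = {1: 0.30, 2: 0.45, 4: 0.58, 8: 0.60,
               16: 0.61, 32: 0.61, 64: 0.60, 128: 0.59}
low_dim = smallest_dim_near_peak(example_acc)          # within 5% of peak
peak_dim = smallest_dim_near_peak(example_acc, 0.0)    # reaches peak
```

In this example the peak is first reached at dimension 16, but dimension 4 already lies within 5% of it, which is the kind of low-dimensional operating point at which the methods are compared.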
a , The 3D reach task. b , Cross-validated decoding accuracy (CC) achieved by variations of linear/nonlinear DPAD/NDM for different latent state dimensions. For nonlinear DPAD/NDM, the nonlinearities are selected automatically based on the training data to maximize behavior decoding accuracy (flexible nonlinearity). Solid lines show the average across sessions and folds ( N = 35 session-folds), and the shaded areas show the s.e.m.; Low-dim., low-dimensional. c , Decoding accuracy of nonlinear DPAD versus linear DPAD and nonlinear/linear NDM at the latent state dimension for which DPAD reaches within 5% of its peak decoding accuracy in the training data across all latent state dimensions. Bars, whiskers, dots and asterisks are defined as in Fig. 2b ( N = 35 session-folds). d , Same as c for modeling of raw LFP ( N = 35 session-folds). e , Same as c for modeling of LFP band power activity ( N = 35 session-folds). f – j , Same as a – e for the second dataset with saccadic eye movements ( N = 35 session-folds). k – m , Same as a – c for the third dataset, which did not include LFP data, with sequential cursor reaches controlled via a 2D manipulandum ( N = 15 session-folds). n – r , Same as a – e for the fourth dataset, with random grid virtual reality cursor reaches controlled via fingertip position ( N = 35 session-folds). For all DPAD variations, only the first two optimization steps were used in this figure (that is, n 1 = n x ) to only focus on learning behaviorally relevant neural dynamics in the dimensionality reduction regimen.
Next, we found that at low dimensions, nonlinearity could improve the accuracy of both behavior decoding (Fig. 5b,g,l,o ) and neural self-prediction (Extended Data Fig. 7 ). However, as the state dimension was increased, linear methods reached similar neural self-prediction performance as nonlinear methods across modalities (Fig. 3 and Extended Data Fig. 3 ). This was in contrast to behavior decoding, which benefited from nonlinearity regardless of how high the dimension was (Figs. 2 and 3 ).
Use-case 4: DPAD localizes the nonlinearity in the neural–behavioral transformation
Numerical simulations validate DPAD’s localization
To demonstrate that DPAD can correctly find the origin of nonlinearity in the neural–behavioral transformation (Extended Data Fig. 2 and Supplementary Fig. 2 ), we simulated random models where only one of the parameters was set to a random nonlinear function ( Methods ). DPAD identifies a parameter as the origin if models with nonlinearity only in that parameter are on the best performance frontier when compared to alternative models, that is, models with nonlinearity in other parameters, models with flexible/full nonlinearity and fully linear models (Fig. 6a ). DPAD enables this assessment due to (1) its flexible control over nonlinearities to train alternative models and (2) its simultaneous neural–behavioral modeling and evaluation ( Methods ). In all simulations, DPAD identified that the model with the correct nonlinearity origin was on the best performance frontier compared to alternative nonlinear models (Extended Data Fig. 2 and Supplementary Fig. 2 ), thus correctly revealing the origin of nonlinearity.
a , The process of determining the origin of nonlinearity via hypothesis testing shown with an example simulation. Simulation results are taken from Extended Data Fig. 2b , and the origin is correctly identified as K . Pluses and whiskers are defined as in Fig. 3 ( N = 20 random models). b , The 3D reach task. c , DPAD’s hypothesis testing. Cross-validated neural self-prediction accuracy (CC) for each nonlinearity and the corresponding decoding accuracy. DPAD variations that have only one nonlinear parameter (for example, C z ) use a nonlinear neural network for that parameter and keep all other parameters linear. Linear and flexible nonlinear results are as in Fig. 3 . Latent state dimension in each session and fold is chosen (among powers of 2 up to 128) as the smallest that reaches peak neural self-prediction in training data or reaches peak decoding in training data, whichever is larger ( Methods ). Pluses and whiskers are defined as in Fig. 3 ( N = 35 session-folds). Annotated arrows indicate any individual nonlinearities that are on the best performance frontier compared to all other models. Results are shown for spiking activity here and for raw LFP and LFP power activity in Supplementary Fig. 6 . d , e , Same as b and c for the second dataset with saccadic eye movements ( N = 35 session-folds). f , g , Same as b and c for the third dataset, with sequential cursor reaches controlled via a 2D manipulandum ( N = 15 session-folds). h , i , Same as b and c for the fourth dataset, with random grid virtual reality cursor reaches controlled via fingertip position ( N = 35 session-folds). For all DPAD variations, the first 16 latent state dimensions are learned using the first two optimization steps, and the remaining dimensions are learned using the last two optimization steps (that is, n 1 = 16).
DPAD consistently localized nonlinearities in the behavior readout
Having validated the localization of nonlinearity in simulations, we used DPAD to find where in the model the nonlinearities could be isolated in our real datasets. We found that having the nonlinearity only in the behavior readout parameter C z was largely sufficient for achieving high behavior decoding and neural self-prediction accuracies across all our datasets and modalities (Fig. 6b–i and Supplementary Fig. 6 ). First, for spiking activity, models with nonlinearity only in the behavior readout parameter C z reached the best behavior decoding accuracy compared to models with other individual nonlinearities (Fig. 6c,e,i ) while reaching almost the same decoding accuracy as fully nonlinear models (Fig. 6c,e,g,i ). Second, these models with nonlinearity only in the behavior readout also reached a self-prediction accuracy that was unmatched by other types of individual nonlinearity (Fig. 6c,e,g,i ). Overall, this meant that models with nonlinearity only in the behavior readout parameter C z were always on the best performance frontier when compared to all other linear or nonlinear models (Fig. 6c,e,g,i ). Interestingly, this result also held for both LFP modalities (Supplementary Fig. 6 ).
Consistent with the above localization results, DPAD with flexible nonlinearity also very frequently selected models with nonlinearity in the behavior readout parameter (Supplementary Fig. 7 ). Critically, however, this observation on its own does not establish that nonlinearities can be isolated in the behavior readout parameter. This is because in the flexible nonlinearity approach, parameters may be selected as nonlinear as long as this nonlinearity does not hurt the prediction accuracies, which does not imply that such nonlinearities are necessary ( Methods ); this is why we need the hypothesis testing procedure above (Fig. 6a ). Of note, using an LSTM for the recursion parameter A ′ is one of the nonlinearity options that is automatically considered in DPAD (Extended Data Fig. 3 ), but we found that LSTM was rarely selected in our datasets as the recursion dynamics in the flexible search over nonlinearities (Supplementary Fig. 7 ). Finally, note that fitting models with a nonlinear behavior readout via a post hoc nonlinear refitting of linear DPAD models (1) cannot identify the origin of nonlinearity in general (for example, other brain regions or tasks) and (2) even in our datasets resulted in significantly worse decoding than the same models being fitted end-to-end as done by nonlinear DPAD ( P ≤ 0.0027, one-sided signed-rank test, N ≥ 15).
Together, these results highlight the application of DPAD in enabling investigations of nonlinear processing in neural computations underlying specific behaviors. DPAD’s machinery can not only fit fully nonlinear models but also provide evidence for the location in the model where the nonlinearity can be isolated ( Discussion ).
Use-case 5: DPAD extends to noncontinuous and intermittent data
DPAD extends to intermittently sampled behavior observations
DPAD also supports intermittently sampled behaviors ( Methods ) 56 , that is, when behavior is measured only during a subset of time steps. We first confirmed in numerical simulations with random models that DPAD correctly learns the model with intermittently sampled behavioral data (Supplementary Fig. 8 ). Next, in each of our neural datasets, we emulated intermittent sampling by randomly discarding up to 90% of behavior samples during learning. DPAD learned accurate nonlinear models even in this case (Extended Data Fig. 8 ). This capability is important, for example, in affective neuroscience or neuropsychiatry applications where the behavior consists of sparsely sampled momentary ecological assessments of mental states such as mood 40 . We next simulated a mood decoding application and found that with as few as one behavioral (for example, mood survey) sample per day, DPAD still outperformed NDM even when NDM had access to continuous behavior samples (Extended Data Fig. 9 ). These results suggest the potential utility of DPAD in such applications, although substantial future validation in data is needed 7 , 40 , 41 , 42 .
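The emulation of intermittent sampling can be illustrated with a small sketch that discards a random fraction of behavior samples and then evaluates a loss only where behavior was observed. This is one plausible way to handle missing samples, stated as an assumption rather than as the procedure in Methods; the function names are hypothetical, and a scalar behavior is used for brevity.

```python
import random

def drop_behavior_samples(z, drop_fraction, seed=0):
    """Return a copy of z with a random subset of samples replaced by None."""
    rng = random.Random(seed)
    n_drop = int(round(drop_fraction * len(z)))
    dropped = set(rng.sample(range(len(z)), n_drop))
    return [None if i in dropped else v for i, v in enumerate(z)]

def masked_mse(z_pred, z_obs):
    """Mean squared error over the time steps where behavior was observed."""
    pairs = [(p, o) for p, o in zip(z_pred, z_obs) if o is not None]
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)

# Emulate discarding 90% of a toy scalar behavior series of 100 samples.
z_full = [float(i) for i in range(100)]
z_sparse = drop_behavior_samples(z_full, 0.9)
```

Neural activity is still available at every time step in this setting, so the latent state can be updated continuously while the behavior loss contributes only at the remaining observed samples.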
DPAD extends to noncontinuous-valued observations
DPAD also extends to modeling of noncontinuous-valued (for example, categorical) behaviors ( Methods ). To demonstrate this, we modeled the transformation from neural activity to the momentary phase of the task in the 3D reach task: reach, hold, return or rest (Fig. 7 ). Compared to nonlinear NDM (which is dynamic) or nonlinear nondynamic methods such as support vector machines, DPAD more accurately predicted the task phase at each point in time (Fig. 7 ). This capability can extend the utility of DPAD to categorical behaviors such as decision choices in cognitive neuroscience 39 .
a , In the 3D reach dataset, we model spiking activity along with the epoch of the task as discrete behavioral data ( Methods and Fig. 2a ). The epochs/classes are (1) reaching toward the target, (2) holding the target, (3) returning to resting position and (4) resting until the next reach. b , DPAD’s predicted probability for each class is shown in a continuous segment of the test data. Most of the time, DPAD predicts the highest probability for the correct class. c , The cross-validated behavior classification performance, quantified as the area under curve (AUC) for the four-class classification, is shown for different methods at different latent state dimensions. Solid lines and shaded areas are defined as in Fig. 5b ( N = 35 session-folds). AUC of 1 and 0.5 indicate perfect and chance-level classification, respectively. We include three nondynamic/static classification methods that map neural activity for a given time step to class label at the same time step (Extended Data Table 1 ): (1) multilayer neural network, (2) nonlinear support vector machine (SVM) and (3) linear discriminant analysis (LDA). d , Cross-validated behavior classification performance (AUC) achieved by each method when choosing the state dimension in each session and fold as the smallest that reaches peak classification performance in the training data among all state dimensions with that method ( Methods ). Bars, whiskers, dots and asterisks are defined as in Fig. 2b ( N = 35 session-folds). e , Same as d when all methods use the same latent state dimension as DPAD (best nonlinearity for decoding) does in d ( N = 35 session-folds). c and e show DPAD’s benefit for dimensionality reduction. f , Cross-validated neural self-prediction accuracy achieved by each method versus the corresponding behavior classification performance. 
Here, the latent state dimension for each method in each session and fold is chosen (among powers of 2 up to 128) as the smallest that reaches peak neural self-prediction in training data or reaches peak decoding in training data, whichever is larger ( Methods ). Pluses and whiskers are defined as in Fig. 3 ( N = 35 session-folds).
Finally, we applied DPAD to nonsmoothed spike counts, where we compared the results with two noncausal sequential autoencoder methods, termed LFADS 16 and TNDM 18 (Supplementary Fig. 9 ), both of which have Poisson observations that model nonsmoothed spike counts 16 , 18 . TNDM 18 , which was developed after LFADS 16 and concurrently with our work 44 , 56 , adds behavioral terms to the objective function for a subset of latents but unlike DPAD does so with a mixed objective and thus does not completely dissociate or prioritize behaviorally relevant dynamics (Extended Data Table 1 and Supplementary Note 3 ). Compared to both LFADS and TNDM, DPAD remained on the best performance frontier for predicting the neural–behavioral data (Supplementary Fig. 9a ) and more accurately predicted behavior using low-dimensional latent states (Supplementary Fig. 9b ). Beyond this, TNDM and LFADS also have fundamental differences with DPAD and do not address some of DPAD’s use-cases ( Discussion and Extended Data Table 1 ).
Discussion
We developed DPAD for nonlinear dynamical modeling and investigation of neural dynamics underlying behavior. DPAD can dissociate the behaviorally relevant neural dynamics and prioritize their learning over other neural dynamics, enable hypothesis testing regarding the origin of nonlinearity in the neural–behavioral transformation and achieve causal decoding. DPAD enables prioritized dynamical dimensionality reduction by extracting lower-dimensional yet more behavior-predictive latent states from neural population activity and supports modeling noncontinuous-valued (for example, categorical) and intermittently sampled behavioral data. These attributes make DPAD suitable for diverse use-cases across neuroscience and neurotechnology, some of which we demonstrated here.
We found similar results for three neural modalities: spiking activity, LFP band powers and raw LFP. For all modalities, nonlinear DPAD more accurately learned the behaviorally relevant neural dynamics than linear DPAD and linear/nonlinear NDM as reflected in its better decoding while also reaching the best performance frontier when considering both behavior decoding and neural self-prediction. Notably, the raw LFP activity benefited the most from nonlinear modeling using DPAD and outperformed LFP powers in all tasks in terms of decoding. This suggests that automatic learning of nonlinear models from raw LFP using DPAD reveals behaviorally relevant information that may be discarded when extracting traditionally used features such as LFP band powers. Also, nonlinearity was necessary to recover the extra information in raw LFP, as, unlike DPAD modeling, linear dynamical modeling of raw LFP did not outperform that of LFP powers in most datasets. These results highlight another use-case of DPAD for automatic dynamic feature extraction from LFP data.
As another use-case, DPAD enabled an investigation of which element in the neural–behavioral transformation was nonlinear. Interestingly, consistently across our four movement-related datasets, DPAD models with nonlinearity only in the behavior readout performed similarly to fully nonlinear models, reaching the best performance frontier for predicting future behavior and neural data using past neural data. The consistency of this result across our datasets is interesting because, as demonstrated in simulations (Extended Data Fig. 2 , Supplementary Fig. 2 and Fig. 6a ), the detected origin of nonlinearity could have technically been in any one (or more) of the following four elements (Fig. 1a,b ): neural input, recurrent dynamics and neural or behavior readouts, all of which were correctly localized in simulations (Extended Data Fig. 2 and Supplementary Fig. 2 ). Thus, the consistent localization results on our neural datasets provide evidence that across these four tasks, neural dynamics in these recorded cortical areas may be largely describable with linear dynamics of sufficiently high dimension, with additional nonlinearities introduced somewhere between the neural state and behavior. This finding may be consistent with (1) introduction of nonlinear processing along the downstream neuromuscular pathway that goes from the recorded cortical area to the measured behavior or any of the convergent inputs along this pathway 57 , 58 , 59 or (2) cognition intervening nonlinearly between these latent neural states and behavior, for example, by implementing context-dependent computations 60 . This result illustrates how DPAD can provide new hypotheses and the machinery to test them in future experiments that would record from multiple additional brain regions (for example, both motor and cognitive regions) and use DPAD to model them together. 
Such analyses may narrow down or revise the origin of nonlinearity for the wider neural–behavioral measurement set; for example, the state dynamics may be found to be nonlinear once additional brain regions are added. Localization of nonlinearity could also guide the design of competitive deep learning architectures that are more flexible or easier to implement in neurotechnologies such as brain–computer interfaces 61 .
Interestingly, the behavior decoding aspect of the localization finding here is consistent with a prior study 22 that explored the mapping of the motor cortex to an electromyogram (EMG) during a one-dimensional movement task with varying forces and found that a fully linear model was worse than a nonlinear EMG readout in decoding the EMG 22 . However, as our simulations show (Extended Data Fig. 2b and Fig. 6a ), comparing a linear model to a model that has nonlinear behavior readout is not sufficient to conclude the origin of nonlinearity, and a stronger test is needed (see Fig. 6a for a counter example and details in Methods ). Further, this previous study 22 used a specific condition-dependent nonlinearity for behavior readout rather than a universal nonlinear function approximator that DPAD enables. Finally, to conclude localization, the model with that specific nonlinearity should perform similarly to fully nonlinear models; however, unlike our results, a fully nonlinear LSTM model in some cases appears to outperform models with nonlinear readout in this prior study (see Fig. 7a,b in ref. 22 versus Fig. 9c in ref. 22 ); it is unclear if this result is due to this prior study’s specific readout nonlinearity being suboptimal or to the nonlinear origin being different in its dataset 22 . DPAD can address such questions by (1) allowing for training and comparison of alternative models with different nonlinear origins and (2) enabling a general (versus specific) nonlinearity in model parameters.
When testing hypotheses about where in the model the nonlinearity can be isolated, it may be possible to explain the same data equally well with multiple types of nonlinearities (for example, with either a nonlinear neural input or a nonlinear readout). Such nonidentifiability is a common limitation for latent models. However, when such equivalence exists, we expect all equivalent nonlinear models to have similar performance and thus lie on the best performance frontier. But this was not the case in our datasets. Instead, we found that the nonlinear behavior readout was in most cases the only individual nonlinear parameter on the best performance frontier, providing evidence that no other individual nonlinear parameter was as suitable in our datasets. Alternatively, the best model describing the data may require two or more of the four parameters to be nonlinear. But in our datasets, models with nonlinearity only in the behavior readout were always on the best performance frontier and could not be considerably outperformed by models with more than one nonlinearity (Fig. 6). Nevertheless, we note that ultimately our analysis simply provides evidence for one location of nonlinearity resulting in a better fit to data with a parsimonious model, but it does not rule out other possibilities for explaining the data. For example, one could reformulate a nonlinear readout model by adding latent states and representing the readout nonlinearity as a recursion nonlinearity for the additional states, although such an equivalent but less parsimonious model may need more data to be learned as accurately. Finally, we also note that our conclusions were based on the datasets and family of nonlinear models (recursive RNNs) considered here, and thus we cannot rule out different conclusions in other scenarios and/or brain regions.
Nevertheless, by providing evidence for a nonlinearity configuration, DPAD can provide testable hypotheses for future experiments that record from more brain regions.
Sequential autoencoders, spearheaded by LFADS 16 , have been used to smooth single-trial neural activity 16 without considering relevance to behavior, which is a distinct goal as we showed in comparison to PSID in our prior work 6 . Notably, another sequential autoencoder, termed TNDM, has been developed concurrently with our work 44 , 56 that adds a behavior term to the optimization objective 18 . However, these approaches do not enable several of the use-cases of DPAD here. First, unlike DPAD’s four-step learning approach, TNDM and LFADS use a single learning step with a neural-only objective (LFADS) 16 or a mixed neural–behavioral objective (TNDM) 18 that does not fully prioritize the behaviorally relevant neural dynamics (Extended Data Table 1 and Supplementary Note 3 ). DPAD’s prioritization is important for accurate learning of behaviorally relevant neural dynamics and for preserving them in dimensionality reduction, as our results comparing DPAD to TNDM/LFADS suggest (Supplementary Fig. 9 ). Second, TNDM and LFADS 16 , 18 , like other prior works 16 , 18 , 20 , 23 , 24 , 26 , 61 , do not provide flexible nonlinearity or explore hypotheses regarding the origin of nonlinearities because they use fixed nonlinear network structures (use-case 4). Third, TNDM considers spiking activity and continuous behaviors 18 , whereas DPAD extends across diverse neural and behavioral modalities: spiking, raw LFP and LFP powers and continuous, categorical or intermittent behavioral modalities. Fourth, in contrast to these noncausal sequential autoencoders 16 , 18 and some other nonlinear methods 8 , 14 , DPAD can process the test data causally and without expensive computations such as iterative expectation maximization 8 , 14 or sampling and averaging 16 , 18 . This causal efficient processing is also important for real-time closed-loop brain–computer interfaces 62 , 63 . 
Of note, noncausal processing is also implemented in the DPAD code library as an option ( Methods ), although it is not shown in this work. Finally, unlike these prior methods 14 , 16 , 18 , DPAD does not require fixed-length trials or trial structure, making it suitable for modeling naturalistic behaviors 5 and neural dynamics with trial-to-trial variability in the alignment to task events 64 .
Several methods can in some ways prioritize behaviorally relevant information while extracting latent embeddings from neural data but are distinct from DPAD in terms of goals and capabilities. One group includes nondynamic/static methods that do not explicitly model temporal dynamics 1 . These methods build linear maps (for example, as in demixed principal component analysis (dPCA) 34 ) or nonlinear maps, such as convolutional maps in a concurrently 44 developed method with DPAD named CEBRA 36 , to extract latent embeddings that can be guided by behavior either as a trial condition 34 or indirectly as a contrastive loss 36 . These nondynamic mappings only use a single sample or a small fixed window around each sample of neural data to extract latent embeddings (Extended Data Table 1 ). By contrast, DPAD can recursively aggregate information from all past neural data by explicitly learning a model of temporal dynamics (recursion), which also enables forecasting unlike in static/nondynamic methods. These differences may be one reason why DPAD outperformed CEBRA in terms of neural–behavioral prediction (Fig. 4 ). Another approach is used by task aligned manifold estimation (TAME-GP) 9 , which uses a Gaussian process prior (as in Gaussian process factor analysis (GPFA) 14 ) to expand the window of neural activity used for extracting the embedding into a complete trial. Unlike DPAD, methods with a Gaussian process prior have limited support for nonlinearity, often do not have closed-forms for inference and thus necessitate numerical optimization even for inference 9 and often operate noncausally 9 . Finally, the above methods do not provide flexible nonlinearity or hypothesis testing to localize the nonlinearity.
Other prior works have used RNNs either causally 20 , 22 , 23 , 24 , 26 or noncausally 16 , 18 , for example, for causal decoding of behavior from neural activity 20 , 22 , 23 , 24 , 26 . These works 20 , 22 , 23 , 24 , 26 have similarities to the first step of DPAD’s four-step optimization (Supplementary Fig. 1a ) in that the RNNs in these works learn dynamical models by solely optimizing behavior prediction. However, these works do not learn the mapping from the RNN latent states to neural activity, which is done in DPAD’s second optimization step to enable neural self-prediction (Supplementary Fig. 1a ). In addition, unlike what the last two optimization steps in DPAD enable, these prior works do not model additional neural dynamics beyond those that decode behavior and thus do not dissociate the two types of neural dynamics (Extended Data Table 1 ). Finally, as noted earlier, these prior works 9 , 20 , 23 , 24 , 26 , 36 , 61 , similar to prior sequential autoencoders 16 , 18 , have fixed nonlinear network structures and thus cannot explore hypotheses regarding the origin of nonlinearities or flexibly learn the best nonlinear structure for the training data (Fig. 1c,d and Extended Data Table 1 ).
DPAD’s optimization objective functions are not convex, similar to most nonlinear deep learning methods. Thus, as usual with nonconvex optimizations, convergence to a global optimum is not guaranteed. Moreover, as with any method, the quality and neural–behavioral prediction accuracy of the learned models depend on dataset properties such as signal-to-noise ratio. We therefore compare alternative methods within each dataset; these comparisons suggest (for example, Fig. 4) that across the multiple datasets here, DPAD learns more accurate models of neural–behavioral data. However, models in other datasets/scenarios may not be as accurate.
Here, we focused on using DPAD to model the transformation of neural activity to behavior. DPAD can also be used to study the transformation between other signals. For example, when modeling data from multiple brain regions, one region can be taken as the primary signal ( y k ) and another as the secondary signal ( z k ) to dissociate their shared versus distinct dynamics. Alternatively, when modeling the brain response to electrical 7 , 41 , 42 or sensory 41 , 65 , 66 stimulation, one could take the primary signal ( y k ) to be the stimulation and the secondary signal ( z k ) to be neural activity to dissociate and predict neural dynamics that are driven by stimulation. Finally, one may apply DPAD to simultaneously recorded brain activity from two subjects as primary and secondary signals to find shared intersubject dynamics during social interactions.
Model formulation
Equation ( 1 ) simplifies the DPAD model by showing both of its RNN sections as one, but the general two-section form of the model is as follows:
This equation separates the latent states of Eq. ( 1 ) into the following two parts: \({x}_{k}^{\left(1\right)}\in {{\mathbb{R}}}^{{n}_{1}}\) denotes the latent states of the first RNN section that summarize the behaviorally relevant dynamics, and \({x}_{k}^{\left(2\right)}\in {{\mathbb{R}}}^{{n}_{2}}\) , with \({n}_{2}={n}_{x}-{n}_{1}\) , denotes those of the second RNN section that represent the other neural dynamics (Supplementary Fig. 1a ). Here, A ′(1) , A ′(2) , K (1) , K (2) , \({C}_{y}^{\,\left(1\right)}\) , \({C}_{y}^{\,\left(2\right)}\) , \({C}_{z}^{\,\left(1\right)}\) and \({C}_{z}^{\,\left(2\right)}\) are multi-input–multi-output functions that parameterize the model, which we learn using a four-step numerical optimization formulation expanded on in the next section (Supplementary Fig. 1a ). DPAD also supports learning the initial value of the latent states at time 0 (that is, \({x}_{0}^{\left(1\right)}\) and \({x}_{0}^{\left(2\right)}\) ) as a parameter, but in all analyses in this paper, the initial states are simply set to 0 given their minimal impact when modeling long data sequences. Each pair of superscripted parameters (for example, A ′(1) and A ′(2) ) in Eq. ( 2 ) is a dissociated version of the corresponding nonsuperscripted parameter in Eq. ( 1 ) (for example, A ′). The computation graph for Eq. ( 2 ) is provided in Fig. 1b (and Supplementary Fig. 1a ). In Eq. ( 2 ), the recursions for computing \({x}_{k}^{\left(1\right)}\) are not dependent on \({x}_{k}^{\left(2\right)}\) , thus allowing the former to be computed without the latter. By contrast, \({x}_{k}^{\left(2\right)}\) can depend on \({x}_{k}^{\left(1\right)}\) , and this dependence is modeled via K (2) (see Supplementary Note 2 ). 
Note that such dependence of \({x}_{k}^{\left(2\right)}\) on \({x}_{k}^{\left(1\right)}\) via K (2) does not introduce new dynamics to \({x}_{k}^{\left(2\right)}\) because it does not involve the recursion parameter A ′(2) , which describes the dynamics of \({x}_{k}^{\left(2\right)}\) . This two-section RNN formulation is mathematically motivated by equivalent representations of a dynamical system model in different bases and by the relation between the predictor and stochastic forms of dynamical systems (Supplementary Notes 1 and 2 ).
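A form of Eq. (2) that is consistent with the definitions above is sketched below. It is inferred from the surrounding text: the additive readouts follow from the later statement that a single \(C_z\) or \(C_y\) can replace the two superscripted readouts, and the arguments of \(K^{(2)}\) follow from the statement that the input to the second section includes both \(y_k\) and \(x_{k+1}^{(1)}\). The argument order of \(K^{(2)}\) should be read as an assumption.

```latex
\begin{aligned}
\begin{bmatrix} x_{k+1}^{(1)} \\ x_{k+1}^{(2)} \end{bmatrix}
&= \begin{bmatrix} {A'}^{(1)}\big(x_k^{(1)}\big) \\ {A'}^{(2)}\big(x_k^{(2)}\big) \end{bmatrix}
 + \begin{bmatrix} K^{(1)}\big(y_k\big) \\ K^{(2)}\big(x_{k+1}^{(1)},\, y_k\big) \end{bmatrix} \\
y_k &= C_y^{(1)}\big(x_k^{(1)}\big) + C_y^{(2)}\big(x_k^{(2)}\big) + e_k \\
z_k &= C_z^{(1)}\big(x_k^{(1)}\big) + C_z^{(2)}\big(x_k^{(2)}\big) + \epsilon_k
\end{aligned}
```

In this form, the first row of the state update never involves \(x_k^{(2)}\), which is what allows the behaviorally relevant states to be computed on their own.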
For the RNN formulated in Eq. ( 1 ) or ( 2 ), neural activity y k constitutes the input, and predictions of neural and behavioral signals are the outputs (Fig. 1b ) given by
Note that each \(x_k\) is estimated purely using all past \(y_k\) (that is, \(y_1, \ldots, y_{k-1}\)), so the predictions in Eq. (3) are one-step-ahead predictions of \(y_k\) and \(z_k\) using past neural observations (Supplementary Note 1). Once the model parameters are learned, the extraction of latent states \(x_k\) involves iteratively applying the first line from Eq. (2), and predicting behavior or neural activity involves applying Eq. (3) to the extracted \(x_k\). As such, by writing the nonlinear model in predictor form 67, 68 (Supplementary Note 1), we enable causal and computationally efficient prediction.
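The causal extraction and one-step-ahead prediction described above can be illustrated with a minimal Python sketch. It is not the DPAD library code: the function name `run_predictor_form` and all of its arguments are hypothetical, the learned maps are passed in as plain callables, and scalar states are used only to keep the example short.

```python
def run_predictor_form(y_seq, A1, K1, A2, K2, Cy1, Cy2, Cz1, Cz2,
                       x1_init=0.0, x2_init=0.0):
    """Causally compute one-step-ahead predictions of y_k and z_k.

    Every map (A1, K1, ...) is a hypothetical stand-in for a learned
    DPAD parameter. Initial states default to 0, as in the text.
    """
    x1, x2 = x1_init, x2_init
    y_hat, z_hat = [], []
    for y in y_seq:
        # Predictions for time k use states built only from y_1..y_{k-1}.
        y_hat.append(Cy1(x1) + Cy2(x2))
        z_hat.append(Cz1(x1) + Cz2(x2))
        # State update: section 1 never sees x2, whereas section 2 may
        # receive the newly updated section-1 state through K2.
        x1_next = A1(x1) + K1(y)
        x2_next = A2(x2) + K2(x1_next, y)
        x1, x2 = x1_next, x2_next
    return y_hat, z_hat


def example_maps():
    """Toy linear maps, chosen only for illustration."""
    return dict(
        A1=lambda x: 0.5 * x, K1=lambda y: 1.0 * y,
        A2=lambda x: 0.0 * x, K2=lambda x1_next, y: 0.0,
        Cy1=lambda x: x, Cy2=lambda x: x,
        Cz1=lambda x: 2.0 * x, Cz2=lambda x: 0.0 * x,
    )
```

Because each prediction is emitted before the corresponding observation is consumed, changing the last element of `y_seq` leaves every returned prediction unchanged, which is the sense in which the processing is causal.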
Learning: four-step numerical optimization approach
Unlike nondynamic models 1 , 34 , 35 , 36 , 69 , dynamical models explicitly model temporal evolution in time series data. Recent dynamical models have gone beyond linear or generalized linear dynamical models 2 , 3 , 4 , 5 , 6 , 7 , 70 , 71 , 72 , 73 , 74 , 75 , 76 , 77 , 78 , 79 , 80 , 81 to incorporate switching linear 10 , 11 , 12 , 13 , locally linear 37 or nonlinear 14 , 15 , 16 , 17 , 18 , 19 , 20 , 21 , 23 , 24 , 26 , 27 , 38 , 61 , 82 , 83 , 84 , 85 , 86 , 87 , 88 , 89 , 90 dynamics, often using deep learning methods 25 , 91 , 92 , 93 , 94 . But these recent nonlinear/switching works do not aim to localize nonlinearity or allow for flexible nonlinearity and do not enable fully prioritized dissociation of behaviorally relevant neural dynamics because they either do not consider behavior in their learning objective at all 14 , 16 , 37 , 38 , 61 , 95 , 96 or incorporate it with a mixed neural–behavioral objective 9 , 18 , 35 , 61 (Extended Data Table 1 ).
In DPAD, we develop a four-step learning method for training our two-section RNN in Eq. ( 1 ) and extracting the latent states that (1) enables dissociation and prioritized learning of the behaviorally relevant neural dynamics in the nonlinear model, (2) allows for flexible modeling and localization of nonlinearities, (3) extends to data with diverse distributions and (4) does all this while also achieving causal decoding and being applicable to data both with and without a trial structure. DPAD is for nonlinear modeling, and its multistep learning approach, in each step, uses numerical optimization tools that are rooted in deep learning. Thus, DPAD is mathematically distinct from our prior PSID work for linear models, which is an analytical and linear technique. PSID is based on analytical linear algebraic projections rooted in control theory 6 , which are thus not extendable to nonlinear modeling or to non-Gaussian, noncontinuous or intermittently sampled data. Thus, even when we restrict DPAD to linear modeling as a special case, it is still mathematically different from PSID 6 .
To dissociate and prioritize the behaviorally relevant neural dynamics, we devise a four-step optimization approach for learning the two-section RNN model parameters (Supplementary Fig. 1a ). This approach prioritizes the extraction and learning of the behaviorally relevant dynamics in the first two steps with states \({x}_{k}^{\left(1\right)}\in {{\mathbb{R}}}^{{n}_{1}}\) while also learning the rest of the neural dynamics in the last two steps with states \({x}_{k}^{\left(2\right)}\in {{\mathbb{R}}}^{{n}_{2}}\) and dissociating the two subtypes of dynamics. This prioritization is important for accurate learning of behaviorally relevant neural dynamics and is achieved because of the multistep learning approach; the earlier steps learn the behaviorally relevant dynamics first, that is, with priority, and then the subsequent steps learn the other neural dynamics later so that they do not mask or confound the behaviorally relevant dynamics. Importantly, each optimization step is independent of subsequent steps so all steps can be performed in order, with no need to iteratively repeat any step. We define the neural and behavioral prediction losses that are used in the optimization steps based on the negative log-likelihoods (NLLs) associated with the neural and behavior distributions, respectively. This approach benefits from the statistical foundation of maximum likelihood estimation and facilitates generalizability across behavioral distributions. We now expand on each of the four optimization steps for RNN training.
Optimization step 1
In the first two optimization steps (Supplementary Fig. 1a ), the objective is to learn the behaviorally relevant latent states \({x}_{k}^{\left(1\right)}\) and their associated parameters. In the first optimization step, we learn the parameters A ′(1) , \({C}_{z}^{\,\left(1\right)}\) and K (1) of the RNN
and estimate its latent state \({x}_{k}^{\left(1\right)}\) while minimizing the NLL of the behavior z k given by \({x}_{k}^{\left(1\right)}\) . For continuous-valued (Gaussian) behavioral data, we minimize the following sum of squared prediction error 69 , 97 given by
where the sum is over all available samples of behavior \(z_k\), and \({\Vert .\Vert }_{2}\) indicates the two-norm operator. This objective, which is typically used when fitting models to continuous-valued data 69, 97, is proportional to the Gaussian NLL if we assume isotropic Gaussian residuals (that is, \(\Sigma_{\epsilon}=\sigma_{\epsilon}I\)) 69, 97. If desired, a general nonisotropic residual covariance \(\Sigma_{\epsilon}\) can be empirically computed from model residuals after the above optimization is solved (see Learning noise statistics), although having \(\Sigma_{\epsilon}\) is mainly useful for simulating new data and is not needed when using the learned model for inference. Similarly, in the subsequent optimization steps detailed later, the same points hold regarding how the appropriate mean squared error used for continuous-valued data is proportional to the Gaussian NLL if we assume isotropic Gaussian residuals and how the residual covariance can be computed empirically after the optimization if desired.
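The link between the squared-error objective and the Gaussian NLL can be made explicit with a short derivation. It assumes zero-mean Gaussian residuals that are independent across the \(N\) time samples, an \(n_z\)-dimensional behavior, and, following the notation above, that \(\sigma_{\epsilon}\) denotes the common residual variance. The symbols \(N\) and \(r_k\) are introduced here for illustration only.

```latex
\begin{aligned}
-\log p\left(z_{1:N}\mid x_{1:N}^{(1)}\right)
&= \sum_{k=1}^{N}\left[\tfrac{1}{2}\,r_k^{\top}\Sigma_{\epsilon}^{-1}r_k
   + \tfrac{1}{2}\log\det\left(2\pi\Sigma_{\epsilon}\right)\right],
\qquad r_k = z_k - C_z^{(1)}\left(x_k^{(1)}\right) \\
&= \frac{1}{2\sigma_{\epsilon}}\sum_{k=1}^{N}\left\Vert r_k\right\Vert_2^{2}
   + \frac{N n_z}{2}\log\left(2\pi\sigma_{\epsilon}\right)
\quad\text{when } \Sigma_{\epsilon}=\sigma_{\epsilon}I .
\end{aligned}
```

For a fixed \(\sigma_{\epsilon}\), the second term does not depend on the model parameters, so minimizing the NLL over the parameters is equivalent to minimizing the sum of squared prediction errors.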
Optimization step 2
The second optimization step uses the extracted latent state \({x}_{k}^{\left(1\right)}\) from the RNN and fits the parameter \({C}_{y}^{\left(1\right)}\) in
while minimizing the NLL of the neural activity y k given by \({x}_{k}^{(1)}\) . For continuous-valued (Gaussian) neural activity y k , we minimize the following sum of squared prediction error 69 :
where the sum is over all available samples of y k . Optimization steps 1 and 2 conclude the prioritized extraction and modeling of behaviorally relevant latent states \({x}_{k}^{(1)}\) (Fig. 1b ) and the learning of the first section of the RNN model (Supplementary Fig. 1a ).
Optimization step 3
In optimization steps 3 and 4 (Supplementary Fig. 1a ), the objective is to learn any additional dynamics in neural activity that are not learned in the first two optimization steps, that is, \({x}_{k}^{\left(2\right)}\) and the associated parameters. To do so, in the third optimization step, we learn the parameters A ′(2) , \({C}_{y}^{\,\left(2\right)}\) and K (2) of the RNN
and estimate its latent state \({x}_{k}^{\left(2\right)}\) while minimizing the aggregate NLL of y k given both latent states, that is, by also taking into account the NLL obtained from step 2 via the \({C}_{y}^{\,\left(1\right)}\left({x}_{k}^{\left(1\right)}\right)\) term in Eq. ( 6 ). The notations \({y}_{k}^{{\prime} }\) and \({e}_{k}^{{\prime} }\) in the second line of Eq. ( 8 ) signify the fact that it is not y k that is predicted by the RNN of Eq. ( 8 ), rather it is the yet unpredicted parts of y k (that is, unpredicted after extracting \({x}_{k}^{(1)}\) ) that are being predicted. In the case of continuous-valued (Gaussian) neural activity y k , we minimize the following loss:
where the sum is over all available samples of y k . Note that in the continuous-valued (Gaussian) case, this loss is equivalent to minimizing the error in predicting the residual neural activity given by \({y}_{k}-{C}_{y}^{\,\left(1\right)}\left({x}_{k}^{\left(1\right)}\right)\) and is computed using the previously learned parameter \({C}_{y}^{\,\left(1\right)}\) and the previously extracted states \({x}_{k}^{\left(1\right)}\) in steps 1 and 2. Also, the input to the RNN in Eq. ( 8 ) includes both y k and the extracted \({x}_{k+1}^{\left(1\right)}\) from optimization step 1. The above shows how the optimization steps are appropriately linked together to compute the aggregate likelihoods.
Optimization step 4
If we assume that the second set of states \({x}_{k}^{\left(2\right)}\) do not contain any information about behavior, we could stop the modeling. However, this may not be the case if the dimension of the states extracted in the first optimization step (that is, n 1 ) is selected to be very small such that some behaviorally relevant neural dynamics are not learned in the first step. To be robust to such selections of n 1 , we can use another final numerical optimization to determine based on the data whether and how \({x}_{k}^{\left(2\right)}\) should affect behavior prediction. Thus, a fourth optimization step uses the extracted latent state in optimization steps 1 and 3 and fits C z in
while minimizing the negative log-likelihood of behavior given both latent states. In the case of continuous-valued (Gaussian) behavior z k , we minimize the following loss:
The parameter C z that is learned in this optimization step will replace both \({C}_{z}^{\,\left(1\right)}\) and \({C}_{z}^{\,\left(2\right)}\) in Eq. ( 2 ). Optionally, in a final optimization step, a similar nonlinear mapping from \({x}_{k}^{\left(1\right)}\) and \({x}_{k}^{\left(2\right)}\) can also be learned, this time to predict y k , which allows DPAD to support nonlinear interactions of \({x}_{k}^{\left(1\right)}\) and \({x}_{k}^{\left(2\right)}\) in predicting neural activity. In this case, the resulting learned C y parameter will replace both \({C}_{y}^{\,\left(1\right)}\) and \({C}_{y}^{\,\left(2\right)}\) in Eq. ( 2 ). This concludes the learning of both model sections (Supplementary Fig. 1a ) and all model parameters in Eq. ( 2 ).
In this work, when optimization steps 1 and 3 are both used to extract the latent states (that is, when 0 < n 1 < n x ), we do not perform the additional fourth optimization step in Eq. ( 10 ), and the prediction of behavior is done solely using the \({x}_{k}^{\left(1\right)}\) states extracted in the first optimization step. Note that DPAD can also cover NDM as a special case if we only use the third optimization step to extract the states (that is, n 1 = 0, in which case the first two steps are not needed). In this case, we use the fourth optimization step to learn C z , which is the mapping from the latent states to behavior. Also, in this case, we simply have a unified state x k as there is no dissociation in NDM, and the only goal is to extract states that predict neural activity accurately.
Additional generalizations of state dynamics
Finally, the first lines of Eqs. ( 4 ) and ( 8 ) can also be written more generally as
where instead of an additive relation between the two terms of the right-hand side, both terms are combined in nonlinear functions \({{A}^{{\prime} {\prime} }}^{\left(1\right)}\) and \({{A}^{{\prime} {\prime} }}^{\left(2\right)}\), which as a special case can still learn the additive relation in Eqs. (4) and (8). Whenever both the state recursion A and neural input K parameters (with the appropriate superscripts) are specified to be nonlinear, we use the more general architecture in Eqs. (12) and (13), and if A, K or both are linear, we use Eqs. (4) and (8).
As another option, both RNN sections can be made bidirectional, which enables noncausal prediction for DPAD by using future data in addition to past data, with the goal of improving prediction, especially in datasets with stereotypical trials. Although this option is not reported in this work, it is implemented and available for use in DPAD’s public code library.
Learning noise statistics
Once the learning is complete, we also compute the covariances of the neural and behavior residual time series \(e_k\) and \(\epsilon_k\) as \(\Sigma_e\) and \(\Sigma_{\epsilon}\), respectively. This allows the learned model in Eq. (1) to be usable for generating new simulated data. This application is not the focus of this work, but an explanation of it is provided in Numerical simulations.
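As an illustration of this step, the following Python sketch computes an empirical residual covariance from actual and predicted time series. The function name `residual_covariance` is hypothetical, and the choices to subtract the residual mean and to normalize by \(N-1\) are assumptions, not details taken from the text.

```python
def residual_covariance(actual, predicted):
    """Empirical covariance of residuals (actual - predicted).

    `actual` and `predicted` are lists of samples, each sample a list of
    floats with one entry per signal dimension. Mean subtraction and the
    (N - 1) normalization are assumptions made for this sketch.
    """
    n = len(actual)
    dim = len(actual[0])
    residuals = [[a - p for a, p in zip(row_a, row_p)]
                 for row_a, row_p in zip(actual, predicted)]
    means = [sum(r[d] for r in residuals) / n for d in range(dim)]
    centered = [[r[d] - means[d] for d in range(dim)] for r in residuals]
    # Entry (i, j) is the sample covariance between dimensions i and j.
    return [[sum(c[i] * c[j] for c in centered) / (n - 1)
             for j in range(dim)]
            for i in range(dim)]
```

The same function could be applied to neural residuals to estimate \(\Sigma_e\) and to behavior residuals to estimate \(\Sigma_{\epsilon}\).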
Regularization
The DPAD code implements L1-norm or L2-norm regularization for any set of parameters, with the option to automatically select the regularization weight using inner cross-validation. However, we did not use regularization in any of the analyses presented here.
Forecasting
DPAD can also predict neural–behavioral data more than one time step into the future. To obtain two-step-ahead prediction, we pass the one-step-ahead neural predictions of the model as neural observations into it. This allows us to perform one state update iteration, that is, line 1 of Eq. (2), with \(y_k\) being replaced with \({\hat{y}}_{k}\) from Eq. (3). Repeating this procedure m times gives the (m + 1)-step-ahead prediction of the latent state and neural–behavioral data.
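The forecasting recursion can be sketched in Python as follows. The function `forecast` and its callable arguments (`state_update`, `predict_y`, `predict_z`) are hypothetical names that stand in for line 1 of Eq. (2) and for Eq. (3). A single unified state is used for brevity.

```python
def forecast(x_k, m, state_update, predict_y, predict_z):
    """Return the (m + 1)-step-ahead state and predictions.

    x_k is the latent state estimated from y_1..y_{k-1}. With m = 0 this
    reduces to the usual one-step-ahead prediction.
    """
    x = x_k
    for _ in range(m):
        # Use the model's own neural prediction in place of the
        # unavailable observation, then advance the state by one step.
        y_hat = predict_y(x)
        x = state_update(x, y_hat)
    return x, predict_y(x), predict_z(x)
```

For example, with the toy maps `state_update = lambda x, y: 0.5 * x + y`, `predict_y = lambda x: x` and `predict_z = lambda x: 2 * x`, starting from a state of 1.0 and using `m = 1` advances the state once with the predicted observation before reading out the two-step-ahead predictions.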
Extending to intermittently measured behaviors
We also extend DPAD to modeling intermittently measured behavior time series (Extended Data Figs. 8 and 9 and Supplementary Fig. 8 ). To do so, when forming the behavior loss (Eqs. ( 5 ) and ( 11 )), we only compute the loss on samples where the behavior is measured and solve the optimization with this loss.
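A minimal Python sketch of such a masked loss is given below for the continuous-valued case. Representing unmeasured samples as `None` and the function name `masked_squared_error` are assumptions made for illustration; the DPAD library may encode missing samples differently.

```python
def masked_squared_error(z_true, z_pred):
    """Sum of squared errors over samples where behavior was measured.

    z_true: list of samples; a sample is a list of floats, or None when
            the behavior was not measured at that time step (assumed
            encoding for this sketch).
    z_pred: list of predicted samples, one per time step.
    Returns (loss, number_of_samples_used).
    """
    loss = 0.0
    used = 0
    for target, pred in zip(z_true, z_pred):
        if target is None:
            continue  # unmeasured sample: contributes nothing to the loss
        loss += sum((t - p) ** 2 for t, p in zip(target, pred))
        used += 1
    return loss, used
```

Predictions are still produced at every time step because the latent state is driven by neural activity; only the loss is restricted to the measured samples.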
Extending to noncontinuous-valued data observations
We can also extend DPAD to noncontinuous-valued (non-Gaussian) observations by devising modified loss functions and observation models. Here, we demonstrate this extension for categorical behavioral observations, for example, discrete choices or epochs/phases during a task (Fig. 7 ). A similar approach could be used in the future to model other non-Gaussian behaviors and non-Gaussian (for example, Poisson) neural modalities, as shown in a thesis 56 .
To model categorical behaviors, we devise a new behavior observation model for DPAD by making three changes. First, we change the behavior loss (Eqs. ( 5 ) and ( 11 )) to the NLL of a categorical distribution, which we implement using the dedicated class in the TensorFlow library (that is, tf.keras.losses.CategoricalCrossentropy). Second, we change the behavior readout parameter C z to have an output dimension of n z × n c instead of n z , where n c denotes the number of behavior categories or classes. Third, we apply Softmax normalization (Eq. ( 14 )) to the output of the behavior readout parameter C z to ensure that for each of the n z behavior dimensions, the predicted probabilities for all the n c classes add up to 1 so that they represent valid probability mass functions. Softmax normalization can be written as
where \({l}_{k}\in {{\mathbb{R}}}^{{n}_{z}\times {n}_{c}}\) is the output of C z at time k , and the superscript ( m , n ) denotes the element of l k associated with the behavior dimension m and the class/category number n . With these changes, we obtain a new RNN architecture with categorical behavioral outputs. We then learn this new RNN architecture with DPAD’s four-step prioritized optimization approach as before but now incorporating the modified NLL losses for categorical data. Together, with these changes, DPAD extends to modeling categorical behavioral measurements.
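The per-dimension Softmax normalization and the associated categorical NLL can be illustrated with a short Python sketch that uses only the standard library. It applies the standard softmax, \(\exp(l_k^{(m,n)})/\sum_{n'}\exp(l_k^{(m,n')})\), to each row of \(l_k\). The function names are hypothetical, and the subtraction of the row maximum is a common numerical-stability device added here, not something stated in the text.

```python
import math


def softmax_per_dimension(logits):
    """Normalize an n_z x n_c array of readout outputs row by row.

    Each row holds the n_c class scores for one behavior dimension; the
    returned row is a valid probability mass function.
    """
    probs = []
    for row in logits:
        shift = max(row)  # for numerical stability; result is unchanged
        exps = [math.exp(value - shift) for value in row]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs


def categorical_nll(probs, labels):
    """NLL of the observed class index for each behavior dimension."""
    return -sum(math.log(row[label]) for row, label in zip(probs, labels))
```

In a TensorFlow implementation, the same loss would typically be obtained from `tf.keras.losses.CategoricalCrossentropy`, as noted above.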
Behavior decoding and neural self-prediction metrics and performance frontier
Cross-validation
To evaluate the learning, we perform a cross-validation with five folds (unless otherwise noted). We cut the data from the recording session into five equal continuous segments, leave these segments out one by one as the test data and train the model only using the data in the remaining segments. Once the model is trained using the neural and behavior training data, we pass the neural test data to the model to get the latent states in the test data using the first line of Eq. ( 1 ) (or Eq. ( 2 ), equivalently). We then pass the extracted latent states to Eq. ( 3 ) to get the one-step-ahead prediction of the behavior and neural test data, which we refer to as behavior decoding and neural self-prediction, respectively. Note that only past neural data are used to get the behavior and neural predictions. Also, the behavior test data are never used in predictions. Given the predicted behavior and neural time series, we compute the CC between each dimension of these time series and the actual behavior and neural test time series. We then take the mean of CC across dimensions of behavior and neural data to get one final cross-validated CC value for behavior decoding and one final CC value for neural self-prediction in each cross-validation fold.
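The fold construction and the CC-based metric can be sketched in Python as follows. The helper names are hypothetical, and the handling of segment boundaries when the number of samples is not divisible by five is an assumption. CC is computed here as the Pearson correlation coefficient.

```python
import math


def contiguous_folds(n_samples, n_folds=5):
    """Yield (train_indices, test_indices); each test set is one
    contiguous segment of the recording."""
    bounds = [i * n_samples // n_folds for i in range(n_folds + 1)]
    for i in range(n_folds):
        test = list(range(bounds[i], bounds[i + 1]))
        train = (list(range(0, bounds[i]))
                 + list(range(bounds[i + 1], n_samples)))
        yield train, test


def pearson_cc(a, b):
    """Pearson correlation between two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def mean_cc(actual, predicted):
    """Mean CC across signal dimensions for one cross-validation fold.

    Inputs are lists of samples, each sample a list with one value per
    dimension.
    """
    n_dims = len(actual[0])
    ccs = [pearson_cc([s[d] for s in actual], [s[d] for s in predicted])
           for d in range(n_dims)]
    return sum(ccs) / n_dims
```

In this sketch, `mean_cc` would be called once on the behavior test data and once on the neural test data of each fold to obtain the decoding and self-prediction values, respectively.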
Selection of the latent state dimension
We often need to select a latent state dimension to report an overall behavior decoding and/or neural self-prediction accuracy for each model/method (for example, Figs. 2–7). By latent state dimension, we always refer to the total latent state dimension of the model, that is, \(n_x\). For DPAD, unless otherwise noted, we always used \(n_1 = 16\) to extract the first 16 latent state dimensions (or all latent state dimensions when \(n_x \le 16\)) using steps 1 and 2 and any remaining dimensions using steps 3 and 4. We chose \(n_1 = 16\) because dedicating more, even all, latent state dimensions to behavior prediction only minimally improved it across datasets and neural modalities. For all methods, to select a state dimension \(n_x\), in each cross-validation fold, we fit models with latent state dimensions 1, 2, 4, 8, 16, … and 128 (powers of 2 from 1 to 128) and select one of these models based on their decoding and neural self-prediction accuracies within the training data of that fold. We then report the decoding/self-prediction of this selected model computed in the test data of that fold. Our goal is often to select a model that simultaneously explains behavior and neural data well. For this goal, we pick the state dimension that reaches the peak neural self-prediction in the training data or the state dimension that reaches the peak behavior decoding in the training data, whichever is larger; we then report both the neural self-prediction and the corresponding behavior decoding accuracy of the same model with the selected state dimension in the test data (Figs. 3–4, 6 and 7f, Extended Data Figs. 3 and 4 and Supplementary Figs. 4–7 and 9). Alternatively, for all methods, when our goal is to find models that solely aim to optimize behavior prediction, we report the cross-validated prediction performances for the smallest state dimension that reaches peak behavior decoding in training data (Figs. 2, 5 and 7d, Extended Data Fig. 8 and Supplementary Fig. 3).
We emphasize that in all cases, the reported performances are always computed in the test data of the cross-validation fold, which is not used for any other purpose such as model fitting or selection of the state dimension.
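As an illustration, the two selection rules described above can be sketched in Python as follows. This is a minimal sketch and not the DPAD implementation: the function name and the input dictionaries (training-data accuracies keyed by candidate n x ) are hypothetical, and taking the smallest dimension at a tied peak in the joint case is an assumption.

```python
def select_state_dimension(train_decoding, train_self_pred, joint=True):
    """Pick a latent state dimension n_x from training-data accuracies of one fold.

    train_decoding / train_self_pred: dicts mapping each candidate n_x to the
    accuracy computed within the training data (hypothetical inputs).
    """
    dims = sorted(train_decoding)
    # Smallest state dimension that reaches the peak behavior decoding.
    best_dec = max(train_decoding.values())
    nx_dec = min(nx for nx in dims if train_decoding[nx] == best_dec)
    if not joint:
        return nx_dec  # goal: behavior prediction only
    # Dimension that reaches the peak neural self-prediction
    # (smallest one at the peak; tie handling is an assumption).
    best_sp = max(train_self_pred.values())
    nx_sp = min(nx for nx in dims if train_self_pred[nx] == best_sp)
    # Joint goal: whichever of the two dimensions is larger.
    return max(nx_dec, nx_sp)


# Candidate dimensions: powers of 2 from 1 to 128.
candidates = [2 ** i for i in range(8)]
```

The test-data performance of the model with the returned dimension is what would then be reported for that fold.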
Performance frontier
When comparing a group of alternative models, we use the term ‘performance frontier’ to describe the best performances reached by models that in every comparison with any alternative model are in some sense better than or at least comparable to the alternative model. More precisely, when comparing a group \({\mathcal{M}}\) of models, model \({\mathcal{A}}\in {\mathcal{M}}\) will be described as reaching the best performance frontier if, when compared to every other model \({\mathcal{B}}\in {\mathcal{M}}\) , \({\mathcal{A}}\) is significantly better than \({\mathcal{B}}\) in behavior decoding or in neural self-prediction or is comparable to \({\mathcal{B}}\) in both. Note that \({\mathcal{A}}\) may be better than some model \({{\mathcal{B}}}_{1}\in {\mathcal{M}}\) in decoding while being better than another model \({{\mathcal{B}}}_{2}\in {\mathcal{M}}\) in self-prediction; nevertheless \({\mathcal{A}}\) will be on the frontier as long as in every comparison one of the following conditions holds: (1) there is at least one measure for which \({\mathcal{A}}\) is more performant or (2) \({\mathcal{A}}\) is at least equally performant in both measures. To avoid exclusion of models from the best performance frontier due to very minimal performance differences, in this analysis, we only declare a difference in performance significant if in addition to resulting in P ≤ 0.05 in a one-sided signed-rank test there is also at least 1% relative difference in the mean performance measures.
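The frontier definition above can be made concrete with a short Python sketch. It is an illustration under stated assumptions rather than the analysis code used here: the function names and the layout of `perf` are hypothetical, the relative difference is taken relative to the mean of the weaker model, and the P value comes from a small exact signed-rank routine written for this sketch (a library routine such as `scipy.stats.wilcoxon` with a one-sided alternative could be substituted).

```python
from statistics import mean

MEASURES = ("decoding", "self_pred")


def signed_rank_p_greater(a, b):
    """Exact one-sided Wilcoxon signed-rank P value for the alternative a > b."""
    d = [x - y for x, y in zip(a, b) if x != y]  # zero differences are dropped
    if not d:
        return 1.0
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks2 = [0] * len(d)  # doubled ranks, so tied (average) ranks stay integers
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks2[order[k]] = (i + 1) + (j + 1)  # 2 x average rank of the tie group
        i = j + 1
    w_plus = sum(r for r, v in zip(ranks2, d) if v > 0)
    # Null distribution of W+: each rank is counted as positive with probability 1/2.
    counts = {0: 1}
    for r in ranks2:
        new = dict(counts)
        for s, c in counts.items():
            new[s + r] = new.get(s + r, 0) + c
        counts = new
    return sum(c for s, c in counts.items() if s >= w_plus) / 2 ** len(d)


def significantly_better(a, b, alpha=0.05, min_rel=0.01):
    """P <= 0.05 and at least 1% relative difference in the means."""
    rel = (mean(a) - mean(b)) / abs(mean(b))  # assumes mean(b) != 0
    return signed_rank_p_greater(a, b) <= alpha and rel >= min_rel


def on_frontier(name, perf):
    """perf: {model: {"decoding": [...], "self_pred": [...]}} with paired samples."""
    A = perf[name]
    for other, B in perf.items():
        if other == name:
            continue
        a_better = any(significantly_better(A[m], B[m]) for m in MEASURES)
        b_better = any(significantly_better(B[m], A[m]) for m in MEASURES)
        # A fails only if it is not better in any measure while B is better in one,
        # that is, A is neither better somewhere nor comparable in both.
        if not a_better and b_better:
            return False
    return True
```

With this rule, two models can both be on the frontier, for example when one is better in decoding and the other is better in self-prediction.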
DPAD with flexible nonlinearity: automatic determination of appropriate nonlinearity
Fine-grained control over nonlinearities.
Each parameter in the DPAD model represents an operation in the computation graph of DPAD (Fig. 1b and Supplementary Fig. 1a ). We solve the numerical optimizations involved in model learning in each step of our multistep learning via standard stochastic gradient descent 43 , which remains applicable for any modification of the computation graph that remains acyclic. Thus, the operation associated with each model parameter (for example, A ′, K , C y and C z ) can be replaced with any multilayer neural network with an arbitrary number of hidden units and layers (Supplementary Fig. 1c ), and the model remains trainable with the same approach. Having no hidden layers implements the special case of a linear mapping (Supplementary Fig. 1b ). Of course, given that the training data are finite, the typical trade-off between model capacity and generalization error remains 69 . Given that neural networks can approximate any continuous function (with a compact domain) 98 , replacing model parameters with neural networks should have the capacity to learn any nonlinear function in their place 99 , 100 , 101 . The resulting RNN in Eq. ( 1 ) can in turn approximate any state-space dynamics (under mild conditions) 102 . In this work, for nonlinear parameters, we use multilayer feed-forward networks with one or two hidden layers, each with 64 or 128 units. For all hidden layers, we always use a rectified linear unit (ReLU) nonlinear activation (Supplementary Fig. 1c ). Finally, when making a parameter (for example, C z ) nonlinear, we always do so for that parameter in both sections of the RNN (for example, both \({C}_{z}^{\,\left(1\right)}\) and \({C}_{z}^{\,\left(2\right)}\) ; see Supplementary Fig. 1a ) and using the same feed-forward network structure. Given that no existing RNN implementation allowed individual RNN elements to be independently set to arbitrary multilayer neural networks, we developed a custom TensorFlow RNN cell to implement the RNNs in DPAD (Eqs. ( 4 ) and ( 8 )). We used the Adam optimizer to implement gradient descent for all optimization steps 43 . We continued each optimization for up to 2,500 epochs but stopped earlier if the objective function did not improve in three consecutive epochs (convergence criterion).
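To illustrate how a single model parameter can be either a linear mapping or a multilayer feed-forward network, the following dependency-free Python sketch builds such a mapping and applies it. It is not the custom TensorFlow RNN cell used in DPAD: the function names, the Gaussian initialization and the zero biases are assumptions made only for this example.

```python
import random


def init_mlp(n_in, n_out, hidden=(), seed=0):
    """Create weights for a feed-forward mapping; hidden=() gives a purely linear map."""
    rng = random.Random(seed)
    sizes = [n_in, *hidden, n_out]
    layers = []
    for m, n in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.gauss(0.0, m ** -0.5) for _ in range(m)] for _ in range(n)]
        biases = [0.0] * n
        layers.append((weights, biases))
    return layers


def apply_mlp(layers, x):
    """Apply the mapping to a vector x, with ReLU on hidden layers only."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]  # rectified linear unit
    return x


# A linear C_z (no hidden layers) and a nonlinear C_z with two hidden layers of 64 units.
linear_cz = init_mlp(n_in=16, n_out=2)
nonlinear_cz = init_mlp(n_in=16, n_out=2, hidden=(64, 64))
```

Because both variants share the same interface, swapping one for the other changes the computation graph without changing how the rest of the model is evaluated.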
Automatic selection of nonlinearity settings
We devise a procedure for automatically determining the most suitable combination of nonlinearities for the data, which we refer to as DPAD with flexible nonlinearity. In this procedure, for each cross-validation fold in each recording session of each dataset, we try a series of nonlinearities within the training data and select one based on an inner cross-validation within the training data (Fig. 1d ). Specifically, we consider the following options for the nonlinearity. First, each of the four main parameters (that is, A ′, K , C y and C z ) can be linear or nonlinear, resulting in 16 cases (that is, \({2}^{4}\) ). In cases with nonlinearity, we consider four network structures for the parameters, that is, having one or two hidden layers and having 64 or 128 units in each hidden layer (Supplementary Fig. 1c ), resulting in 61 cases (that is, 15 × 4 + 1, where 1 is for the fully linear model) overall. Finally, specifically for the recursion parameter A ′, we also consider modeling it as an LSTM, with the other parameters still having the same nonlinearity options as before, resulting in another 29 cases for when this LSTM recursion is used (that is, 7 × 4 + 1, where 1 is for the case where the other three model parameters are all linear), bringing the total number of considered cases to 90. For each of these 90 considered linear or nonlinear architectures, we perform a twofold inner cross-validation within the training data to compute an estimate of the behavior decoding and neural self-prediction of each architecture using the training data. Note that although this process for automatic selection of nonlinearities is computationally expensive, it is parallelizable because each candidate model can be fitted independently on a different processor.
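The count of 90 candidate architectures can be checked by enumerating them. The Python sketch below does so; the dictionary layout and parameter labels are hypothetical and serve only to reproduce the counting described above.

```python
from itertools import product

PARAMS = ("A", "K", "Cy", "Cz")  # labels for A', K, C_y and C_z
STRUCTURES = [(layers, units) for layers in (1, 2) for units in (64, 128)]


def expand(nonlinear, lstm_recursion):
    """One case if no parameter uses a feed-forward network, else one per structure."""
    if not nonlinear:
        return [{"nonlinear": (), "structure": None, "lstm_A": lstm_recursion}]
    return [
        {"nonlinear": nonlinear, "structure": s, "lstm_A": lstm_recursion}
        for s in STRUCTURES
    ]


def candidate_architectures():
    cases = []
    # Feed-forward options: each of the four parameters is linear or nonlinear.
    for flags in product((False, True), repeat=4):
        nonlinear = tuple(p for p, f in zip(PARAMS, flags) if f)
        cases += expand(nonlinear, lstm_recursion=False)  # 15 x 4 + 1 = 61
    # LSTM recursion for A': only the other three parameters vary.
    for flags in product((False, True), repeat=3):
        nonlinear = tuple(p for p, f in zip(PARAMS[1:], flags) if f)
        cases += expand(nonlinear, lstm_recursion=True)  # 7 x 4 + 1 = 29
    return cases
```

Each entry in the returned list is independent of the others, which is what makes fitting the candidates in parallel straightforward.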
Once all candidate architectures are fitted and evaluated within the training data, we select one final architecture purely based on training data to be used for that cross-validation fold based on one of the following two criteria: (1) decoding focused: pick the architecture with the best neural self-prediction in training data among all those that reach within 1 s.e.m. of the best behavior decoding; or (2) self-prediction focused: pick the architecture with the best behavior decoding in training data among all those that reach within 1 s.e.m. of the best neural self-prediction. The first criterion prioritizes good behavior decoding in the selection, and the second criterion prioritizes good neural self-prediction. Note that these two criteria are used when selecting among different already-learned models with different nonlinearities and thus are independent of the four internal objective functions used in learning the parameters for a given model with the four-step optimization approach (Supplementary Fig. 1a ). For example, in the first optimization step of DPAD, model parameters are always learned to optimize behavior decoding (Eq. ( 5 )). But once the four-step optimization is concluded and different models (with different combinations of nonlinearities) are learned, we can then select among these already-learned models based on either neural self-prediction or behavior decoding. Thus, whenever neural self-prediction is also of interest, we report the results for flexible nonlinearity based on both model selection criteria (for example, Figs. 3 , 4 and 6 ).
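The two selection criteria can be sketched as follows. This is a hedged illustration rather than the DPAD code: the function names and the `scores` layout are hypothetical, and the s.e.m. is assumed to be computed across the inner cross-validation folds of the architecture that is best in the prioritized measure.

```python
from statistics import mean, stdev


def sem(values):
    """Standard error of the mean across inner folds (needs at least two values)."""
    return stdev(values) / len(values) ** 0.5


def select_architecture(scores, focus="decoding"):
    """scores: {arch: {"decoding": [per-inner-fold], "self_pred": [per-inner-fold]}}."""
    if focus == "decoding":
        primary, secondary = "decoding", "self_pred"
    else:
        primary, secondary = "self_pred", "decoding"
    # Best architecture in the prioritized measure and the 1 s.e.m. threshold below it.
    best = max(scores, key=lambda a: mean(scores[a][primary]))
    threshold = mean(scores[best][primary]) - sem(scores[best][primary])
    eligible = [a for a in scores if mean(scores[a][primary]) >= threshold]
    # Among eligible architectures, pick the best one in the other measure.
    return max(eligible, key=lambda a: mean(scores[a][secondary]))
```

Calling the function once with `focus="decoding"` and once with `focus="self_pred"` yields the two selections reported when neural self-prediction is also of interest.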
Localization of nonlinearities
DPAD enables an inspection of where nonlinearities can be localized to by providing two capabilities, without either of which the origin of nonlinearities may be incorrectly found. As the first capability, DPAD can train alternative models with different individual nonlinearities and then compare these alternative nonlinear models not only with a fully linear model but also with each other and with fully nonlinear models (that is, flexible nonlinearity). Indeed, our simulations showed that simply comparing a linear model to a model with nonlinearity in a given parameter may incorrectly identify the origin of nonlinearity (Extended Data Fig. 2b and Fig. 6a ). For example, in Fig. 6a , although the nonlinearity is just in the neural input parameter, a linear model does worse than a model with a nonlinear behavior readout parameter. Thus, just a comparison of the latter model to a linear model would incorrectly find the origin of nonlinearity to be the behavior readout. This issue is avoided in DPAD because it can also train a model with the neural input being nonlinear, thus finding it to be more predictive than models with any other individual nonlinearity and as predictive as a fully nonlinear model (Fig. 6a ). As the second capability, DPAD can compare alternative nonlinear models in terms of overall neural–behavioral prediction rather than either behavior decoding or neural prediction alone. Indeed, our simulations showed that comparing the models in terms of just behavior decoding (Extended Data Fig. 2d,f ) or just neural self-prediction (Extended Data Fig. 2d,h ) may lead to incorrect conclusions about the origin of nonlinearities; this is because a model with the incorrect origin may be equivalent in one of these metrics to the one with the correct origin. DPAD avoids this problem by jointly evaluating both neural–behavioral metrics. Here, when comparing models with nonlinearity in different individual parameters for localization purposes (for example, Fig. 6 ), we only consider one network architecture for the nonlinearity, that is, having one hidden layer with 64 units.
Numerical simulations
To validate DPAD in numerical simulations, we perform two sets of simulations. One set validates linear modeling to show the correctness of the four-step numerical optimization for learning. The other set validates nonlinear modeling. In the linear simulation, we randomly generate 100 linear models with various dimensionality and noise statistics, as described in our prior work 6 . Briefly, the neural and behavior dimensions are selected from 5 ≤ n y , n z ≤ 10 randomly with uniform probability. The state dimension is selected as n x = 16, of which n 1 = 4 latent state dimensions are selected to drive behavior. Eigenvalues of the state transition matrix are selected randomly as complex conjugate pairs with uniform probability within the unit disk. Each element in the behavior and neural readout matrices is generated as a random Gaussian variable. State and neural observation noise covariances are generated as random positive definite matrices and scaled randomly with a number between 0.003 and 0.3 or between 0.01 and 100, respectively, to obtain a wide range of relative noises across random models. A separate random linear state-space model with four latent state dimensions is generated to produce the behavior readout noise 𝜖 k , representing the behavior dynamics that are not encoded in the recorded neural activity. Finally, the behavior readout matrix is scaled to set the ratio of the signal standard deviation to noise standard deviation in each behavior dimension to a random number from 0.5 to 50. We perform model learning and evaluation with twofold cross-validation (Extended Data Fig. 1 ).
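Part of the random model generation for the linear simulations can be sketched in Python as below. The details follow only what is stated above; the function names are hypothetical, and sampling each conjugate pair by drawing one point uniformly from the upper half of the unit disk is an assumption about how "uniform probability within the unit disk" is realized.

```python
import cmath
import math
import random


def sample_eigenvalues(nx, rng):
    """Sample nx eigenvalues as complex conjugate pairs inside the unit disk."""
    assert nx % 2 == 0, "this sketch assumes an even state dimension"
    eigs = []
    for _ in range(nx // 2):
        radius = math.sqrt(rng.random())  # square root gives uniform area density
        angle = rng.uniform(0.0, math.pi)  # upper half disk; the conjugate covers the rest
        lam = cmath.rect(radius, angle)
        eigs += [lam, lam.conjugate()]
    return eigs


def sample_model_spec(rng):
    """Dimensions and eigenvalues for one random linear model."""
    return {
        "ny": rng.randint(5, 10),  # neural dimension, uniform on 5..10
        "nz": rng.randint(5, 10),  # behavior dimension, uniform on 5..10
        "nx": 16,
        "n1": 4,  # latent dimensions that drive behavior
        "eigenvalues": sample_eigenvalues(16, rng),
    }


specs = [sample_model_spec(random.Random(seed)) for seed in range(100)]
```

Keeping all eigenvalues inside the unit disk ensures that each sampled linear state transition is stable.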
In the nonlinear simulations that are used to validate both DPAD and the hypothesis testing procedure it enables to find the origin of nonlinearity, we start by generating 20 random linear models ( n y = n z = 1) either with n x = n z = n y (Extended Data Fig. 2 ) or n x = 2 latent states, only one of which drives behavior (Supplementary Fig. 2 ). We then introduce nonlinearity in one of the four model parameters (that is, A ′, K , C y or C z ) by replacing that parameter with a nonlinear trigonometric function, such that roughly one period of the trigonometric function is visited by the model (while keeping the rest of the parameters linear). To do this, we first scale each latent state in the initial random linear model to find a similarity transform for it where the latent state has a 95% confidence interval range of 2 π . We then add a sine function to the original parameter that is to be changed to nonlinear and scale the amplitude of the sine such that its output reaches roughly 0.25 of the range of the outputs from the original linear parameter. This was done to reduce the chance of generating unrealistic unstable nonlinear models that produce outputs with infinite energy, which is likely when A ′ is nonlinear. Changing one parameter to nonlinear can change the range of the statistics of the latent states in the model; thus, we generate some simulated data from the model and redo the scaling of the nonlinearity until ratio conditions are met.
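For a scalar parameter, the construction of the sine nonlinearity can be sketched as follows. This is an illustration only: the function name is hypothetical, and interpreting the 0.25 ratio as the peak-to-peak range of the sine term relative to the range of the linear outputs is an assumption.

```python
import math


def make_sine_parameter(gain, x_samples, ratio=0.25):
    """Add a scaled sine to a scalar linear parameter x -> gain * x.

    x_samples: latent state values already scaled so that their 95% range is
    about 2*pi, so that roughly one period of the sine is visited.
    """
    linear_out = [gain * x for x in x_samples]
    linear_range = max(linear_out) - min(linear_out)
    # The sine term spans 2 * amplitude peak to peak (assumed interpretation).
    amplitude = ratio * linear_range / 2.0

    def parameter(x):
        return gain * x + amplitude * math.sin(x)

    return parameter, amplitude
```

Because making a parameter nonlinear can change the statistics of the latent state, the scaling would be recomputed on data simulated from the modified model and repeated until the ratio condition holds.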
To generate data from any nonlinear model in Eq. ( 1 ), we first generate a neural noise time series e k based on its covariance ∑ e in the model and initialize the state as x 0 = 0. We then iteratively apply the second and first lines of Eq. ( 1 ) to get the simulated neural activity y k from line 2 and then the next state \({x}_{k+1}\) from line 1, respectively. Finally, once the state time series is produced, we generate a behavior noise time series 𝜖 k based on its covariance ∑ 𝜖 in the model and apply the third line of Eq. ( 1 ) to get the simulated behavior z k . Similar to linear simulations, we perform the modeling and evaluation of nonlinear simulations with twofold cross-validation (Extended Data Fig. 2 and Supplementary Fig. 2 ).
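The data generation steps above can be sketched for the scalar case ( n x = n y = n z = 1). Eq. ( 1 ) is not reproduced in this section, so the sketch assumes it has the form x k+1 = A ′( x k ) + K ( y k ), y k = C y ( x k ) + e k and z k = C z ( x k ) + 𝜖 k , matching the line order described above. The function name is hypothetical, and both noises are drawn as white Gaussian noise for simplicity.

```python
import random


def simulate(A, K, Cy, Cz, n_samples, sd_e, sd_eps, seed=0):
    """Simulate a scalar model; A, K, Cy and Cz are callables (linear or nonlinear)."""
    rng = random.Random(seed)
    e = [rng.gauss(0.0, sd_e) for _ in range(n_samples)]  # neural noise series
    x = 0.0  # initial state x_0 = 0
    xs, ys = [], []
    for k in range(n_samples):
        y = Cy(x) + e[k]  # second line of the assumed Eq. (1): neural activity
        xs.append(x)
        ys.append(y)
        x = A(x) + K(y)  # first line of the assumed Eq. (1): next state
    # Behavior is generated after the full state series is available.
    eps = [rng.gauss(0.0, sd_eps) for _ in range(n_samples)]
    zs = [Cz(xk) + eps[k] for k, xk in enumerate(xs)]  # third line: behavior
    return xs, ys, zs
```

Passing a nonlinear callable for exactly one of the four arguments reproduces the setting in which a single parameter is nonlinear and the rest are linear.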
Neural datasets and behavioral tasks
We evaluate DPAD in five datasets with different behavioral tasks, brain regions and neural recording modalities to show the generality of our conclusions. For each dataset, all animal procedures were performed in compliance with the National Research Council Guide for Care and Use of Laboratory Animals and were approved by the Institutional Animal Care and Use Committee at the respective institution, namely New York University (datasets 1 and 2) 6 , 45 , 46 , Northwestern University (datasets 3 and 5) 47 , 48 , 54 and University of California San Francisco (dataset 4) 21 , 49 .
Across all four main datasets (datasets 1 to 4), the spiking activity was binned with 10-ms nonoverlapping bins, smoothed with a Gaussian kernel with standard deviation of 50 ms (refs. 6 , 14 , 34 , 103 , 104 ) and downsampled to 50 ms to be used as the neural signal to be modeled. The behavior time series was also downsampled to a matching 50 ms before modeling. In the three datasets where LFP activity was also available, we also studied two types of features extracted from LFP. As the first LFP feature, we considered raw LFP activity itself, which was high-pass filtered above 0.5 Hz to remove the baseline, low-pass filtered below 10 Hz (that is, antialiasing) and downsampled to the behavior sampling rate of a 50-ms time step (that is, 20 Hz). Note that in the context of the motor cortex, low-pass-filtered raw LFP is also referred to as the local motor potential 50 , 51 , 52 , 105 , 106 and has been used to decode behavior 6 , 50 , 51 , 52 , 53 , 105 , 106 , 107 . As the second feature, we computed the LFP log-powers 5 , 6 , 7 , 40 , 77 , 79 , 106 , 108 , 109 in eight standard frequency bands (delta: 0.1–4 Hz, theta: 4–8 Hz, alpha: 8–12 Hz, low beta: 12–24 Hz, mid-beta: 24–34 Hz, high beta: 34–55 Hz, low gamma: 65–95 Hz and high gamma: 130–170 Hz) in sliding 300-ms windows at a time step of 50 ms using Welch’s method (using eight subwindows with 50% overlap) 6 . The median analyzed data length for each session across the datasets ranged from 4.6 to 9.9 min.
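The spike preprocessing for datasets 1 to 4 can be sketched in plain Python for a single unit. This is an illustrative sketch, not the pipeline used here: the function name is hypothetical, the Gaussian kernel is truncated at three standard deviations, edges are zero-padded and downsampling is done by keeping every fifth smoothed bin, all of which are assumptions.

```python
import math


def preprocess_spikes(spike_times_ms, duration_ms, bin_ms=10, sd_ms=50, step_ms=50):
    """Bin spikes at 10 ms, smooth with a 50-ms s.d. Gaussian, downsample to 50 ms."""
    n_bins = int(duration_ms // bin_ms)
    counts = [0] * n_bins
    for t in spike_times_ms:
        i = int(t // bin_ms)
        if 0 <= i < n_bins:
            counts[i] += 1
    # Normalized Gaussian kernel expressed in units of bins.
    sd_bins = sd_ms / bin_ms
    half = int(math.ceil(3 * sd_bins))
    kernel = [math.exp(-0.5 * (j / sd_bins) ** 2) for j in range(-half, half + 1)]
    total = sum(kernel)
    kernel = [w / total for w in kernel]
    # Direct convolution with zero padding at the edges.
    smoothed = []
    for i in range(n_bins):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < n_bins:
                acc += w * counts[idx]
        smoothed.append(acc)
    stride = step_ms // bin_ms
    return smoothed[::stride]  # one sample every 50 ms (20 Hz)
```

Applying the same 50-ms step to the behavior time series keeps the neural and behavior signals aligned sample by sample.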
First dataset: 3D reaches to random targets
In the first dataset, the animal (named J) performed reaches to a target randomly positioned in 3D space within the reach of the animal, grasped the target and returned its hand to resting position 6 , 45 . Kinematic data were acquired using the Cortex software package (version 5.3) to track retroreflective markers in 3D (Motion Analysis) 6 , 45 . Joint angles were solved from the 3D marker data using a Rhesus macaque musculoskeletal model via the SIMM toolkit (version 4.0, MusculoGraphics) 6 , 45 . Angles of 27 joints in the shoulder, elbow, wrist and fingers in the active hand (right hand) were taken as the behavior signal 6 , 45 . Neural activity was recorded with a 137-electrode microdrive (Gray Matter Research), of which 28 electrodes were in the contralateral primary motor cortex M1. The multiunit spiking activity in these M1 electrodes was used as the neural signal. For LFP analyses, LFP features were also extracted from the same M1 electrodes. We analyzed the data from seven recording sessions.
To visualize the low-dimensional latent state trajectories for each behavioral condition (Extended Data Fig. 6 ), we determined the periods of reach and return movements in the data (Fig. 7a ), resampled them to have similar number of time samples and averaged the latent states across those resampled trials. Given the redundancy in latent descriptions (that is, any scaling, rotation and so on on the latent states still gives an equivalent model), before averaging trials across cross-validation folds and sessions, we devised the following procedure to standardize the latent states for each fold in the case of 2D latent states (Extended Data Fig. 6 ). (1) We z score all state dimensions to have zero mean and unit variance. (2) We rotate the 2D latent states such that the average 2D state trajectory for the first condition (here, the reach epochs) starts from an angle of 0. (3) We estimate the direction of the rotation for the average 2D state trajectory of the first condition, and if it is not counterclockwise, we multiply the second state dimension by –1 to make it so. Note that in each step, the same mapping is applied to the latent states during the whole test data, regardless of condition, so this procedure does not alter the relative differences in the state trajectory across different conditions. The procedure also does not change the learned model and simply corresponds to a similarity transform that changes the basis of the model. This procedure only removes the redundancies for describing a 2D latent state-space model and standardizes the extracted latent states so that trials across different test sets can be averaged together.
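The three standardization steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name and the trial index format are hypothetical, trials of the first condition are assumed to be already resampled to equal length, and the direction of rotation is estimated from the sign of the summed cross products of consecutive points of the average trajectory.

```python
import math
from statistics import mean, pstdev


def standardize_2d_states(states, cond1_trials):
    """states: list of (x1, x2) over the test data; cond1_trials: equal-length
    lists of sample indices, one list per trial of the first condition."""
    # (1) z-score each state dimension.
    cols = list(zip(*states))
    mu = [mean(c) for c in cols]
    sd = [pstdev(c) for c in cols]
    z = [((a - mu[0]) / sd[0], (b - mu[1]) / sd[1]) for a, b in states]

    def average_trajectory(points):
        n = len(cond1_trials[0])
        return [
            (mean(points[t[i]][0] for t in cond1_trials),
             mean(points[t[i]][1] for t in cond1_trials))
            for i in range(n)
        ]

    # (2) rotate so the average first-condition trajectory starts at angle 0.
    x0, y0 = average_trajectory(z)[0]
    phi = -math.atan2(y0, x0)
    c, s = math.cos(phi), math.sin(phi)
    z = [(c * a - s * b, s * a + c * b) for a, b in z]

    # (3) flip the second dimension if the trajectory is not counterclockwise.
    traj = average_trajectory(z)
    swept = sum(a1 * b2 - a2 * b1 for (a1, b1), (a2, b2) in zip(traj, traj[1:]))
    if swept < 0:
        z = [(a, -b) for a, b in z]
    return z
```

Every step applies one fixed linear map to all samples, so relative differences between conditions are preserved.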
Second dataset: saccadic eye movements
In the second dataset, the animal (named A) performed saccadic eye movements to one of eight targets on a display 6 , 46 . The visual stimuli in the task with saccadic eye movements were controlled via custom LabVIEW (version 9.0, National Instruments) software executed on a real-time embedded system (NI PXI-8184, National Instruments) 46 . The 2D position of the eye was tracked and taken as the behavior signal. Neural activity was recorded with a 32-electrode microdrive (Gray Matter Research) covering the prefrontal cortex 6 , 46 . Single-unit activity from these electrodes, ranging from 34 to 43 units across different recording sessions, was used as the neural signal. For LFP analyses, LFP features were also extracted from the same 32 electrodes. We analyzed the data from the first 7 days of recordings. We only included data from successful trials where the animal performed the task correctly by making a saccadic eye movement to the specified target. To visualize the low-dimensional latent state trajectories for each behavioral condition (Extended Data Fig. 6 ), we grouped the trials based on their target position. Standardization across folds before averaging was done as in the first dataset.
Third dataset: sequential reaches with a 2D cursor controlled with a manipulandum
In the third dataset, which was collected and made publicly available by the laboratory of L. E. Miller 47 , 48 , the animal (named T) controlled a cursor on a 2D screen using a manipulandum and performed a sequential reach task 47 , 48 . The 2D cursor position and velocity were taken as the behavior signal. Neural activity was recorded using a 100-electrode microelectrode array (Blackrock Microsystems) in the dorsal premotor cortex 47 , 48 . Single-unit activity, recorded from 37 to 49 units across recording sessions, was used as the neural signal. This dataset did not include any LFP recordings, so LFP features could not be considered. We analyzed the data from all three recording sessions. To visualize the low-dimensional latent state trajectories for each behavioral condition (Extended Data Fig. 6 ), we grouped the trials into eight different conditions based on the angle of the direction of movement (that is, end position minus starting position) during the trial, with each condition covering movement directions within a 45° (that is, 360/8) range. Standardization across folds before averaging was performed as in the first dataset.
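Grouping trials into eight direction-based conditions can be sketched as below. The function name is hypothetical, and placing the first 45° bin at 0° to 45° (rather than centering bins on the cardinal directions) is an assumption, since the bin edges are not specified above.

```python
import math


def direction_condition(start, end, n_conditions=8):
    """Return the condition index (0 to n_conditions - 1) for one trial.

    start, end: (x, y) cursor positions at the start and end of the trial.
    """
    dx, dy = end[0] - start[0], end[1] - start[1]
    angle = math.degrees(math.atan2(dy, dx)) % 360.0  # movement direction in [0, 360]
    width = 360.0 / n_conditions  # 45 degrees for eight conditions
    return int(angle // width) % n_conditions  # modulo guards the 360-degree edge case
```

Trials that share an index would then be resampled and averaged to obtain one latent trajectory per condition.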
Fourth dataset: virtual reality random reaches with a 2D cursor controlled with the fingertip
In the fourth dataset, which was collected and made publicly available by the laboratory of P. N. Sabes 49 , the animal (named I) controlled a cursor based on the fingertip position on a 2D surface within a 3D virtual reality environment 21 , 49 . The 2D cursor position and velocity were taken as the behavior signal. Neural activity was recorded with a 96-electrode microelectrode array (Blackrock Microsystems) 21 , 49 covering M1. We selected a random subset of 32 of these electrodes, which had 77 to 99 single units across the recording sessions, as the neural signal. LFP features were also extracted from the same 32 electrodes. We analyzed the data for the first seven sessions for which the wideband activity was also available (sessions 20160622/01 to 20160921/01). Grouping into conditions for visualization of low-dimensional latent state trajectories (Extended Data Fig. 6 ) was done as in the third dataset. Standardization across folds before averaging was done as in the first dataset.
Fifth dataset: center-out cursor control reaching task
In the fifth dataset, which was collected and made publicly available by the laboratory of L. E. Miller 54 , the animal (named H) controlled a cursor on a 2D screen using a manipulandum and performed reaches from a center point to one of eight peripheral targets (Fig. 4i ). The 2D cursor position was taken as the behavior signal. Neural activity was recorded with a 96-electrode microelectrode array (Blackrock Microsystems) covering area 2 of the somatosensory cortex 54 . Preprocessing for this dataset was done as in ref. 36 . Specifically, the spiking activity was binned with 1-ms nonoverlapping bins and smoothed with a Gaussian kernel with a standard deviation of 40 ms (ref. 110 ), with the behavior also being sampled with the same 1-ms sampling rate. Trials were also aligned as in the same prior work 110 with data from –100 to 500 ms around movement onset of each trial being used for modeling 36 .
Additional details for baseline methods
For the fifth dataset, which was analyzed in ref. 36 (the work that introduces CEBRA), we used the exact same CEBRA hyperparameters as those reported in ref. 36 (Fig. 4i,j ). For each of the other four datasets (Fig. 4a–h ), when learning a CEBRA-Behavior or CEBRA-Time model for each session, fold and latent dimension, we also performed an extensive search over CEBRA hyperparameters and picked the best value with the same inner cross-validation approach as we use for the automatic selection of nonlinearities in DPAD. We considered 30 different sets of hyperparameters: 3 options for the ‘time-offset’ hyperparameter (1, 2 or 10) and 10 options for the ‘temperature’ hyperparameter (from 0.0001 to 0.01), which were designed to include all sets of hyperparameters reported for primate data in ref. 36 . We swept the CEBRA latent dimension over the same values as DPAD, that is, powers of 2 up to 128. In all cases, we used a k -nearest neighbors regression to map the CEBRA-extracted latent embeddings to behavior and neural data as done in ref. 36 because CEBRA itself does not learn a reconstruction model 36 (Extended Data Table 1 ).
It is important to note that CEBRA and DPAD have fundamentally different architectures and goals (Extended Data Table 1 ). CEBRA uses a small ten-sample window (when ‘model_architecture’ is ‘offset10-model’) around each datapoint to extract a latent embedding via a series of convolutions. By contrast, DPAD learns a dynamical model that recursively aggregates all past neural data to extract an embedding. Also, in contrast to CEBRA-Behavior, DPAD’s embedding includes and dissociates both behaviorally relevant neural dimensions and other neural dimensions to predict not only the behavior but also the neural data well. Finally, CEBRA does not automatically map its latent embeddings back to neural data or to behavior during learning but does so post hoc, whereas DPAD learns these mappings for all its latent states. Given these differences, several use-cases of DPAD are not targeted by CEBRA, including explicit dynamical modeling of neural–behavioral data (use-case 1), flexible nonlinearity, hypothesis testing regarding the origin of nonlinearity (use-case 4) and forecasting.
We used the Wilcoxon signed-rank test for all paired statistical tests.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
Three of the datasets used in this work are publicly available 47 , 48 , 49 , 54 . The other two datasets used to support the results are available upon reasonable request from the corresponding author. Source data are provided with this paper.
Code availability
The code for DPAD is available at https://github.com/ShanechiLab/DPAD .
Cunningham, J. P. & Yu, B. M. Dimensionality reduction for large-scale neural recordings. Nat. Neurosci. 17 , 1500–1509 (2014).
Macke, J. H. et al. Empirical models of spiking in neural populations. In Advances in Neural Information Processing Systems 24 (eds. Shawe-Taylor, J., Zemel, R. S., Bartlett, P. L., Pereira, F. & Weinberger, K. Q.) 1350–1358 (Curran Associates, 2011).
Kao, J. C. et al. Single-trial dynamics of motor cortex and their applications to brain–machine interfaces. Nat. Commun. 6 , 7759 (2015).
Bondanelli, G., Deneux, T., Bathellier, B. & Ostojic, S. Network dynamics underlying OFF responses in the auditory cortex. eLife 10 , e53151 (2021).
Abbaspourazad, H., Choudhury, M., Wong, Y. T., Pesaran, B. & Shanechi, M. M. Multiscale low-dimensional motor cortical state dynamics predict naturalistic reach-and-grasp behavior. Nat. Commun. 12 , 607 (2021).
Sani, O. G., Abbaspourazad, H., Wong, Y. T., Pesaran, B. & Shanechi, M. M. Modeling behaviorally relevant neural dynamics enabled by preferential subspace identification. Nat. Neurosci. 24 , 140–149 (2021).
Yang, Y. et al. Modelling and prediction of the dynamic responses of large-scale brain networks during direct electrical stimulation. Nat. Biomed. Eng. 5 , 324–345 (2021).
Durstewitz, D. A state space approach for piecewise-linear recurrent neural networks for identifying computational dynamics from neural measurements. PLoS Comput. Biol. 13 , e1005542 (2017).
Balzani, E., Noel, J.-P. G., Herrero-Vidal, P., Angelaki, D. E. & Savin, C. A probabilistic framework for task-aligned intra- and inter-area neural manifold estimation. In International Conference on Learning Representations https://openreview.net/pdf?id=kt-dcBQcSA (ICLR, 2023).
Petreska, B. et al. Dynamical segmentation of single trials from population neural data. In Advances in Neural Information Processing Systems 24 (eds. Shawe-Taylor, J., Zemel, R. S., Bartlett, P. L., Pereira, F. & Weinberger, K. Q.) 756–764 (Curran Associates, 2011).
Zoltowski, D., Pillow, J. & Linderman, S. A general recurrent state space framework for modeling neural dynamics during decision-making. In Proc. 37th International Conference on Machine Learning (eds. Daumé, H. & Singh, A.) 11680–11691 (PMLR, 2020).
Song, C. Y., Hsieh, H.-L., Pesaran, B. & Shanechi, M. M. Modeling and inference methods for switching regime-dependent dynamical systems with multiscale neural observations. J. Neural Eng. 19 , 066019 (2022).
Song, C. Y. & Shanechi, M. M. Unsupervised learning of stationary and switching dynamical system models from Poisson observations. J. Neural Eng. 20 , 066029 (2023).
Yu, B. M. et al. Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity. J. Neurophysiol. 102 , 614–635 (2009).
Wu, A., Roy, N. A., Keeley, S. & Pillow, J. W. Gaussian process based nonlinear latent structure discovery in multivariate spike train data. Adv. Neural Inf. Process. Syst. 30 , 3496–3505 (2017).
Pandarinath, C. et al. Inferring single-trial neural population dynamics using sequential auto-encoders. Nat. Methods 15 , 805–815 (2018).
Rutten, V., Bernacchia, A., Sahani, M. & Hennequin, G. Non-reversible Gaussian processes for identifying latent dynamical structure in neural data. Adv. Neural Inf. Process. Syst. 33 , 9622–9632 (2020).
Hurwitz, C. et al. Targeted neural dynamical modeling. In Proc. 35th International Conference on Neural Information Processing Systems (eds. Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P. S. & Wortman Vaughan, J.) 29379–29392 (Curran Associates, 2021).
Kim, T. D., Luo, T. Z., Pillow, J. W. & Brody, C. Inferring latent dynamics underlying neural population activity via neural differential equations. In Proc. 38th International Conference on Machine Learning (eds. Meila, M. & Zhang, T.) 5551–5561 (PMLR, 2021).
Sussillo, D., Stavisky, S. D., Kao, J. C., Ryu, S. I. & Shenoy, K. V. Making brain–machine interfaces robust to future neural variability. Nat. Commun. 7 , 13749 (2016).
Makin, J. G., O’Doherty, J. E., Cardoso, M. M. B. & Sabes, P. N. Superior arm-movement decoding from cortex with a new, unsupervised-learning algorithm. J. Neural Eng. 15 , 026010 (2018).
Naufel, S., Glaser, J. I., Kording, K. P., Perreault, E. J. & Miller, L. E. A muscle-activity-dependent gain between motor cortex and EMG. J. Neurophysiol. 121 , 61–73 (2019).
Glaser, J. I. et al. Machine learning for neural decoding. eNeuro 7 , ENEURO.0506-19.2020 (2020).
Kim, M.-K., Sohn, J.-W. & Kim, S.-P. Decoding kinematic information from primary motor cortex ensemble activities using a deep canonical correlation analysis. Front. Neurosci. 14 , 509364 (2020).
Vyas, S., Golub, M. D., Sussillo, D. & Shenoy, K. V. Computation through neural population dynamics. Annu. Rev. Neurosci. 43 , 249–275 (2020).
Willett, F. R., Avansino, D. T., Hochberg, L. R., Henderson, J. M. & Shenoy, K. V. High-performance brain-to-text communication via handwriting. Nature 593 , 249–254 (2021).
Shi, Y.-L., Steinmetz, N. A., Moore, T., Boahen, K. & Engel, T. A. Cortical state dynamics and selective attention define the spatial pattern of correlated variability in neocortex. Nat. Commun. 13 , 44 (2022).
Otazu, G. H., Tai, L.-H., Yang, Y. & Zador, A. M. Engaging in an auditory task suppresses responses in auditory cortex. Nat. Neurosci. 12 , 646–654 (2009).
Goris, R. L. T., Movshon, J. A. & Simoncelli, E. P. Partitioning neuronal variability. Nat. Neurosci. 17 , 858–865 (2014).
Sadtler, P. T. et al. Neural constraints on learning. Nature 512 , 423–426 (2014).
Allen, W. E. et al. Thirst regulates motivated behavior through modulation of brainwide neural population dynamics. Science 364 , eaav3932 (2019).
Engel, T. A. & Steinmetz, N. A. New perspectives on dimensionality and variability from large-scale cortical dynamics. Curr. Opin. Neurobiol. 58 , 181–190 (2019).
Stringer, C. et al. Spontaneous behaviors drive multidimensional, brainwide activity. Science 364 , eaav7893 (2019).
Kobak, D. et al. Demixed principal component analysis of neural population data. eLife 5 , e10989 (2016).
Zhou, D. & Wei, X.-X. Learning identifiable and interpretable latent models of high-dimensional neural activity using pi-VAE. In Advances in Neural Information Processing Systems 33 (eds. Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M. F. & Lin, H.) 7234–7247 (Curran Associates, 2020).
Schneider, S., Lee, J. H. & Mathis, M. W. Learnable latent embeddings for joint behavioural and neural analysis. Nature 617 , 360–368 (2023).
Hernandez, D. et al. Nonlinear evolution via spatially-dependent linear dynamics for electrophysiology and calcium data. NBDT https://nbdt.scholasticahq.com/article/13476-nonlinear-evolution-via-spatially-dependent-linear-dynamics-for-electrophysiology-and-calcium-data (2020).
Gao, Y., Archer, E. W., Paninski, L. & Cunningham, J. P. Linear dynamical neural population models through nonlinear embeddings. In Advances in Neural Information Processing Systems 29 (eds. Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, I. & Garnett, R.) 163–171 (Curran Associates, 2016).
Aoi, M. C., Mante, V. & Pillow, J. W. Prefrontal cortex exhibits multidimensional dynamic encoding during decision-making. Nat. Neurosci. 23 , 1410–1420 (2020).
Sani, O. G. et al. Mood variations decoded from multi-site intracranial human brain activity. Nat. Biotechnol. 36 , 954–961 (2018).
Shanechi, M. M. Brain–machine interfaces from motor to mood. Nat. Neurosci. 22 , 1554–1564 (2019).
Oganesian, L. L. & Shanechi, M. M. Brain–computer interfaces for neuropsychiatric disorders. Nat. Rev. Bioeng. 2 , 653–670 (2024).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://doi.org/10.48550/arXiv.1412.6980 (2017).
Sani, O. G., Pesaran, B. & Shanechi, M. M. Where is all the nonlinearity: flexible nonlinear modeling of behaviorally relevant neural dynamics using recurrent neural networks. Preprint at bioRxiv https://www.biorxiv.org/content/10.1101/2021.09.03.458628v1 (2021).
Wong, Y. T., Putrino, D., Weiss, A. & Pesaran, B. Utilizing movement synergies to improve decoding performance for a brain machine interface. In 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) 289–292 (IEEE, 2013).
Markowitz, D. A., Curtis, C. E. & Pesaran, B. Multiple component networks support working memory in prefrontal cortex. Proc. Natl. Acad. Sci. USA 112 , 11084–11089 (2015).
Perich, M. G., Lawlor, P. N., Kording, K. P. & Miller, L. E. Extracellular neural recordings from macaque primary and dorsal premotor motor cortex during a sequential reaching task. CRCNS.org https://doi.org/10.6080/K0FT8J72 (2018).
Lawlor, P. N., Perich, M. G., Miller, L. E. & Kording, K. P. Linear–nonlinear-time-warp-Poisson models of neural activity. J. Comput. Neurosci. 45 , 173–191 (2018).
O’Doherty, J. E., Cardoso, M. M. B., Makin, J. G. & Sabes, P. N. Nonhuman primate reaching with multichannel sensorimotor cortex electrophysiology. Zenodo https://doi.org/10.5281/zenodo.3854034 (2020).
Schalk, G. et al. Decoding two-dimensional movement trajectories using electrocorticographic signals in humans. J. Neural Eng. 4 , 264–275 (2007).
Flint, R. D., Ethier, C., Oby, E. R., Miller, L. E. & Slutzky, M. W. Local field potentials allow accurate decoding of muscle activity. J. Neurophysiol. 108 , 18–24 (2012).
Stavisky, S. D., Kao, J. C., Nuyujukian, P., Ryu, S. I. & Shenoy, K. V. A high performing brain–machine interface driven by low-frequency local field potentials alone and together with spikes. J. Neural Eng. 12 , 036009 (2015).
Bansal, A. K., Truccolo, W., Vargas-Irwin, C. E. & Donoghue, J. P. Decoding 3D reach and grasp from hybrid signals in motor and premotor cortices: spikes, multiunit activity, and local field potentials. J. Neurophysiol. 107 , 1337–1355 (2011).
Chowdhury, R. H., Glaser, J. I. & Miller, L. E. Area 2 of primary somatosensory cortex encodes kinematics of the whole arm. eLife 9 , e48198 (2020).
Pesaran, B. et al. Investigating large-scale brain dynamics using field potential recordings: analysis and interpretation. Nat. Neurosci. 21 , 903–919 (2018).
Sani, O. G. Modeling and Control of Behaviorally Relevant Brain States. PhD Thesis, University of Southern California (2020).
Büttner, U. & Büttner-Ennever, J. A. Present concepts of oculomotor organization. In Progress in Brain Research (ed. Büttner-Ennever, J. A.) 1–42 (Elsevier, 2006).
Lemon, R. N. Descending pathways in motor control. Annu. Rev. Neurosci. 31 , 195–218 (2008).
Ebbesen, C. L. & Brecht, M. Motor cortex—to act or not to act? Nat. Rev. Neurosci. 18 , 694–705 (2017).
Wise, S. P. & Murray, E. A. Arbitrary associations between antecedents and actions. Trends Neurosci. 23, 271–276 (2000).
Abbaspourazad, H., Erturk, E., Pesaran, B. & Shanechi, M. M. Dynamical flexible inference of nonlinear latent factors and structures in neural population activity. Nat. Biomed. Eng. 8, 85–108 (2024).
Shanechi, M. M. et al. Rapid control and feedback rates enhance neuroprosthetic control. Nat. Commun. 8 , 13825 (2017).
Nason, S. R. et al. A low-power band of neuronal spiking activity dominated by local single units improves the performance of brain–machine interfaces. Nat. Biomed. Eng. 4 , 973–983 (2020).
Williams, A. H. et al. Discovering precise temporal patterns in large-scale neural recordings through robust and interpretable time warping. Neuron 105 , 246–259 (2020).
Walker, E. Y. et al. Inception loops discover what excites neurons most using deep predictive models. Nat. Neurosci. 22 , 2060–2065 (2019).
Vahidi, P., Sani, O. G. & Shanechi, M. M. Modeling and dissociation of intrinsic and input-driven neural population dynamics underlying behavior. Proc. Natl. Acad. Sci. USA 121 , e2212887121 (2024).
Van Overschee, P. & De Moor, B. Subspace Identification for Linear Systems (Springer, 1996).
Katayama, T. Subspace Methods for System Identification (Springer Science & Business Media, 2006).
Friedman, J., Hastie, T. & Tibshirani, R. The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Springer, 2001).
Wu, W., Kulkarni, J. E., Hatsopoulos, N. G. & Paninski, L. Neural decoding of hand motion using a linear state-space model with hidden states. IEEE Trans. Neural Syst. Rehabil. Eng. 17 , 370–378 (2009).
Vargas-Irwin, C. E. et al. Decoding complete reach and grasp actions from local primary motor cortex populations. J. Neurosci. 30 , 9659–9669 (2010).
Buesing, L., Macke, J. H. & Sahani, M. Spectral learning of linear dynamics from generalised-linear observations with application to neural population data. In Advances in Neural Information Processing Systems 25 (eds. Pereira, F., Burges, C. J. C., Bottou, L. & Weinberger, K. Q.) 1682–1690 (Curran Associates, 2012).
Buesing, L., Macke, J. H. & Sahani, M. Learning stable, regularised latent models of neural population dynamics. Netw. Comput. Neural Syst. 23 , 24–47 (2012).
Semedo, J., Zandvakili, A., Kohn, A., Machens, C. K. & Yu, B. M. Extracting latent structure from multiple interacting neural populations. In Advances in Neural Information Processing Systems 27 (eds. Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N. D. & Weinberger, K. Q.) 2942–2950 (Curran Associates, 2014).
Gao, Y., Busing, L., Shenoy, K. V. & Cunningham, J. P. High-dimensional neural spike train analysis with generalized count linear dynamical systems. In Advances in Neural Information Processing Systems 28 (eds. Cortes, C., Lawrence, N., Lee, D., Sugiyama, M. & Garnett, R.) 2044–2052 (Curran Associates, 2015).
Aghagolzadeh, M. & Truccolo, W. Inference and decoding of motor cortex low-dimensional dynamics via latent state-space models. IEEE Trans. Neural Syst. Rehabil. Eng. 24 , 272–282 (2016).
Hsieh, H.-L., Wong, Y. T., Pesaran, B. & Shanechi, M. M. Multiscale modeling and decoding algorithms for spike-field activity. J. Neural Eng. 16 , 016018 (2018).
Abbaspourazad, H., Hsieh, H. & Shanechi, M. M. A multiscale dynamical modeling and identification framework for spike-field activity. IEEE Trans. Neural Syst. Rehabil. Eng. 27 , 1128–1138 (2019).
Yang, Y., Sani, O. G., Chang, E. F. & Shanechi, M. M. Dynamic network modeling and dimensionality reduction for human ECoG activity. J. Neural Eng. 16 , 056014 (2019).
Ahmadipour, P., Yang, Y., Chang, E. F. & Shanechi, M. M. Adaptive tracking of human ECoG network dynamics. J. Neural Eng. 18 , 016011 (2020).
Ahmadipour, P., Sani, O. G., Pesaran, B. & Shanechi, M. M. Multimodal subspace identification for modeling discrete-continuous spiking and field potential population activity. J. Neural Eng. 21 , 026001 (2024).
Zhao, Y. & Park, I. M. Variational latent Gaussian process for recovering single-trial dynamics from population spike trains. Neural Comput. 29 , 1293–1316 (2017).
Yu, B. M. et al. Extracting dynamical structure embedded in neural activity. In Advances in Neural Information Processing Systems 18 (eds. Weiss, Y., Schölkopf, B. & Platt, J.) 1545–1552 (MIT Press, 2006).
Xie, Z., Schwartz, O. & Prasad, A. Decoding of finger trajectory from ECoG using deep learning. J. Neural Eng. 15 , 036009 (2018).
Anumanchipalli, G. K., Chartier, J. & Chang, E. F. Speech synthesis from neural decoding of spoken sentences. Nature 568 , 493 (2019).
Makin, J. G., Moses, D. A. & Chang, E. F. Machine translation of cortical activity to text with an encoder–decoder framework. Nat. Neurosci. 23 , 575–582 (2020).
She, Q. & Wu, A. Neural dynamics discovery via Gaussian process recurrent neural networks. In Proceedings of The 35th Uncertainty in Artificial Intelligence Conference (eds. Adams, R. P. & Gogate, V.) 454–464 (PMLR, 2020).
Moses, D. A. et al. Neuroprosthesis for decoding speech in a paralyzed person with anarthria. N. Engl. J. Med. 385 , 217–227 (2021).
Schimel, M., Kao, T.-C., Jensen, K. T. & Hennequin, G. iLQR-VAE: control-based learning of input-driven dynamics with applications to neural data. In International Conference on Learning Representations (ICLR, 2022).
Zhao, Y., Nassar, J., Jordan, I., Bugallo, M. & Park, I. M. Streaming variational Monte Carlo. IEEE Trans. Pattern Anal. Mach. Intell. 45, 1150–1161 (2023).
Richards, B. A. et al. A deep learning framework for neuroscience. Nat. Neurosci. 22 , 1761–1770 (2019).
Livezey, J. A. & Glaser, J. I. Deep learning approaches for neural decoding across architectures and recording modalities. Brief. Bioinform. 22 , 1577–1591 (2021).
Saxe, A., Nelli, S. & Summerfield, C. If deep learning is the answer, what is the question? Nat. Rev. Neurosci. 22 , 55–67 (2021).
Yang, G. R. & Wang, X.-J. Artificial neural networks for neuroscientists: a primer. Neuron 107 , 1048–1070 (2020).
Keshtkaran, M. R. et al. A large-scale neural network training framework for generalized estimation of single-trial population dynamics. Nat. Methods 19 , 1572–1577 (2022).
Archer, E., Park, I. M., Buesing, L., Cunningham, J. & Paninski, L. Black box variational inference for state space models. Preprint at https://doi.org/10.48550/arXiv.1511.07367 (2015).
Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT Press, 2016).
Lu, Z. et al. The expressive power of neural networks: a view from the width. In Proc. 31st International Conference on Neural Information Processing Systems (eds. von Luxburg, U., Guyon, I., Bengio, S., Wallach, H. & Fergus R.) 6232–6240 (Curran Associates, 2017).
Hornik, K., Stinchcombe, M. & White, H. Multilayer feedforward networks are universal approximators. Neural Netw. 2 , 359–366 (1989).
Cybenko, G. Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst. 2 , 303–314 (1989).
Funahashi, K.-I. On the approximate realization of continuous mappings by neural networks. Neural Netw. 2 , 183–192 (1989).
Schäfer, A. M. & Zimmermann, H. G. Recurrent neural networks are universal approximators. In Artificial Neural Networks—ICANN 2006 (eds. Kollias, S. D., Stafylopatis, A., Duch, W. & Oja, E.) 632–640 (Springer, 2006).
Williams, A. H. et al. Unsupervised discovery of demixed, low-dimensional neural dynamics across multiple timescales through tensor component analysis. Neuron 98 , 1099–1115 (2018).
Gallego, J. A., Perich, M. G., Chowdhury, R. H., Solla, S. A. & Miller, L. E. Long-term stability of cortical population dynamics underlying consistent behavior. Nat. Neurosci. 23 , 260–270 (2020).
Flint, R. D., Wright, Z. A., Scheid, M. R. & Slutzky, M. W. Long term, stable brain machine interface performance using local field potentials and multiunit spikes. J. Neural Eng. 10 , 056005 (2013).
Bundy, D. T., Pahwa, M., Szrama, N. & Leuthardt, E. C. Decoding three-dimensional reaching movements using electrocorticographic signals in humans. J. Neural Eng. 13 , 026021 (2016).
Mehring, C. et al. Inference of hand movements from local field potentials in monkey motor cortex. Nat. Neurosci. 6 , 1253–1254 (2003).
Chestek, C. A. et al. Hand posture classification using electrocorticography signals in the gamma band over human sensorimotor brain areas. J. Neural Eng. 10 , 026002 (2013).
Hsieh, H.-L. & Shanechi, M. M. Optimizing the learning rate for adaptive estimation of neural encoding models. PLoS Comput. Biol. 14 , e1006168 (2018).
Pei, F. et al. Neural Latents Benchmark '21: Evaluating latent variable models of neural population activity. In Advances in Neural Information Processing Systems (NeurIPS), Track on Datasets and Benchmarks https://datasets-benchmarks-proceedings.neurips.cc/paper_files/paper/2021/file/979d472a84804b9f647bc185a877a8b5-Paper-round2.pdf (2021).
Acknowledgements
This work was supported, in part, by the following organizations and grants: the Office of Naval Research (ONR) Young Investigator Program under contract N00014-19-1-2128, National Institutes of Health (NIH) Director’s New Innovator Award DP2-MH126378, NIH R01MH123770, NIH BRAIN Initiative R61MH135407 and the Army Research Office (ARO) under contract W911NF-16-1-0368 as part of the collaboration between the US DOD, the UK MOD and the UK Engineering and Physical Sciences Research Council (EPSRC) under the Multidisciplinary University Research Initiative (MURI) (to M.M.S.) and a University of Southern California Annenberg Fellowship (to O.G.S.).
Author information
Authors and affiliations
Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, USA
Omid G. Sani & Maryam M. Shanechi
Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
Bijan Pesaran
Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, CA, USA
Maryam M. Shanechi
Neuroscience Graduate Program, University of Southern California, Los Angeles, CA, USA
Alfred E. Mann Department of Biomedical Engineering, University of Southern California, Los Angeles, CA, USA
Contributions
O.G.S. and M.M.S. conceived the study, developed the DPAD algorithm and wrote the manuscript, and O.G.S. performed all the analyses. B.P. designed and performed the experiments for two of the NHP datasets and provided feedback on the manuscript. M.M.S. supervised the work.
Corresponding author
Correspondence to Maryam M. Shanechi.
Ethics declarations
Competing interests
University of Southern California has a patent related to modeling and decoding of shared dynamics between signals in which M.M.S. and O.G.S. are inventors. The other author declares no competing interests.
Peer review
Peer review information
Nature Neuroscience thanks Il Memming Park and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 DPAD dissociates and prioritizes the behaviorally relevant neural dynamics while also learning the other neural dynamics in numerical simulations of linear models.
a, Example data generated from one of 100 random models (Methods). These random models do not emulate real data but, for terminological consistency, we still refer to the primary signal (that is, y_k in Eq. (1)) as the ‘neural activity’ and to the secondary signal (that is, z_k in Eq. (1)) as the ‘behavior’. b, Cross-validated behavior decoding accuracy (correlation coefficient, CC) for each method as a function of the number of training samples when we use a state dimension equal to the total state dimension of the true model. The performance measures for each random model are normalized by their ideal values that were achieved by the true model itself. Performance for the true model is shown in black. Solid lines and shaded areas are defined as in Fig. 5b (N = 100 random models). c, Same as b but when learned models have low-dimensional latent states with enough dimensions just for the behaviorally relevant latent states (that is, n_x = n_1). d,e, Same as b,c showing the cross-validated normalized neural self-prediction accuracy. Linear NDM, which learns the parameters using a numerical optimization, performs similarly to a linear algebraic subspace-based implementation of linear NDM (ref. 67), thus validating NDM’s numerical optimization implementation. Linear DPAD, just like PSID (ref. 6), achieves almost ideal behavior decoding even with low-dimensional latent states (c); this shows that DPAD correctly dissociates and prioritizes behaviorally relevant dynamics, as opposed to aiming to simply explain the most neural variance as non-prioritized methods such as NDM do. For this reason, with a low-dimensional state, non-prioritized NDM methods can explain neural activity well (e) but prioritized methods can explain behavior much better (c).
Nevertheless, using the second stage of PSID and the last two optimization steps in DPAD, these two prioritized techniques are still able to learn the overall neural dynamics accurately if the state dimension is high enough (d). Overall, the performance of linear DPAD and that of PSID (ref. 6) are similar for the special case of linear modeling.
Extended Data Fig. 2 DPAD successfully identifies the origin of nonlinearity and learns it in numerical simulations.
DPAD can perform hypothesis testing regarding the origin of nonlinearity by considering both behavior decoding (vertical axis) and neural self-prediction (horizontal axis). a, True value for nonlinear neural input parameter K in an example random model with nonlinearity only in K and the nonlinear value that DPAD learned for this parameter when only K in the learned model was set to be nonlinear. The true and learned mappings match and almost exactly overlap. b, Behavior decoding and neural self-prediction accuracy achieved by DPAD models with different locations of nonlinearities. These accuracies are for data generated from 20 random models that only had nonlinearity in the neural input parameter K. Performance measures for each random model are normalized by their ideal values that were achieved by the true model itself. Pluses and whiskers are defined as in Fig. 3 (N = 20 random models). c,d, Same as a,b for data simulated from models that only have nonlinearity in the recursion parameter A′. e,f, Same as a,b for data simulated from models that only have nonlinearity in the neural readout parameter C_y. g,h, Same as a,b for data simulated from models that only have nonlinearity in the behavior readout parameter C_z. In each case (b, d, f, h), the nonlinearity option that reaches closest to the upper-rightmost corner of the plot, that is, has both the best behavior decoding and the best neural self-prediction, is chosen as the model that specifies the origin of nonlinearity. Regardless of the true location of nonlinearity (b, d, f, h), the correct location (for example, K in b) always achieves the best overall performance compared with all other locations of nonlinearities. These results provide evidence that, by fitting and comparing DPAD models with different nonlinearities, we can correctly find the origin of nonlinearity in simulated data.
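The selection rule in this caption (pick the nonlinearity option closest to the upper-right corner of the decoding versus self-prediction plot) can be sketched as below. This is only an illustration: the caption does not state a distance measure, so Euclidean distance to the ideal normalized point (1, 1) is an assumption, the function name is hypothetical, and the scores are made-up numbers, not results from the paper.

```python
import math
from typing import Dict, Tuple


def pick_nonlinearity_origin(
    results: Dict[str, Tuple[float, float]],
    corner: Tuple[float, float] = (1.0, 1.0),
) -> str:
    """Return the option whose (self-prediction, decoding) point is nearest the corner.

    `results` maps a nonlinearity location to its normalized
    (neural self-prediction, behavior decoding) accuracies.
    Euclidean distance is an assumed choice of metric.
    """
    return min(results, key=lambda name: math.dist(results[name], corner))


# Made-up normalized accuracies for four candidate nonlinearity locations.
example_results = {
    "K": (0.98, 0.97),
    "A'": (0.90, 0.95),
    "C_y": (0.95, 0.85),
    "C_z": (0.88, 0.96),
}
chosen = pick_nonlinearity_origin(example_results)
```

With these illustrative numbers the option labeled K lies nearest the corner, so it would be reported as the origin of nonlinearity.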
Extended Data Fig. 3 Across spiking and LFP neural modalities, DPAD is on the best performance frontier for neural-behavioral prediction unlike LSTMs, which are fitted to explain neural data or behavioral data.
a , The 3D reach task. b , Cross-validated neural self-prediction accuracy achieved by each method versus the corresponding behavior decoding accuracy on the vertical axis. Latent state dimension for each method in each session and fold is chosen (among powers of 2 up to 128) as the smallest that reaches peak neural self-prediction in training data or reaches peak decoding in training data, whichever is larger ( Methods ). Pluses and whiskers are defined as in Fig. 3 ( N = 35 session-folds). Note that DPAD considers an LSTM as a special case ( Methods ). Nevertheless, results are also shown for LSTM networks fitted to decode behavior from neural activity (that is, RNN decoders in Extended Data Table 1 ) or to predict the next time step of neural activity (self-prediction). Also, note that LSTM for behavior decoding (denoted by H) and DPAD when only using the first two optimization steps (denoted by G) dedicate all their latent states to behavior prediction, whereas other methods dedicate some or all latent states to neural self-prediction. Compared with all methods including these LSTM networks, DPAD always reaches the best performance frontier for predicting the neural-behavioral data whereas LSTM does not; this is partly due to the four-step optimization algorithm in DPAD that allows for overall neural-behavioral description rather than one or the other, and that prioritizes the learning of the behaviorally relevant neural dynamics. c , Same as b for raw LFP activity ( N = 35 session-folds). d , Same as b for LFP band power activity ( N = 35 session-folds). e - h , Same as a - d for the second dataset, with saccadic eye movements ( N = 35 session-folds). i , j , Same as a and b for the third dataset, with sequential cursor reaches controlled via a 2D manipulandum ( N = 15 session-folds). k - n , Same as a - d for the fourth dataset, with random grid virtual reality cursor reaches controlled via fingertip position ( N = 35 session-folds). 
Results and conclusions are consistent across all datasets.
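The state-dimension rule stated in the caption above (the smallest candidate dimension that reaches peak neural self-prediction or peak decoding in training data, whichever is larger) can be sketched as follows. The exact criterion for "reaching the peak" is given in the Methods and is not reproduced here, so the `tol` parameter and both function names are assumptions, and the scores are made-up values for illustration only.

```python
from typing import Dict


def smallest_dim_at_peak(scores: Dict[int, float], tol: float = 0.0) -> int:
    """Smallest state dimension whose training score is within `tol` of the best score."""
    peak = max(scores.values())
    return min(dim for dim, score in scores.items() if score >= peak - tol)


def select_state_dim(
    self_pred: Dict[int, float], decoding: Dict[int, float], tol: float = 0.0
) -> int:
    """Apply the rule to both training metrics and keep the larger dimension."""
    return max(
        smallest_dim_at_peak(self_pred, tol),
        smallest_dim_at_peak(decoding, tol),
    )


# Made-up training accuracies for candidate dimensions that are powers of 2 up to 128.
self_pred_scores = {1: 0.30, 2: 0.42, 4: 0.55, 8: 0.61, 16: 0.64, 32: 0.64, 64: 0.63, 128: 0.62}
decoding_scores = {1: 0.50, 2: 0.66, 4: 0.71, 8: 0.71, 16: 0.70, 32: 0.70, 64: 0.69, 128: 0.69}

chosen_dim = select_state_dim(self_pred_scores, decoding_scores)
```

Taking the larger of the two dimensions means the chosen model is given enough latent states for whichever of the two measures needs more.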
Extended Data Fig. 4 DPAD can also be used for multi-step-ahead forecasting of behavior.
a , The 3D reach task. b , Cross-validated behavior decoding accuracy for various numbers of steps into the future. For m -step-ahead prediction, behavior at time step k is predicted using neural activity up to time step k − m . All models are taken from Fig. 3 , without any retraining or finetuning, with m -step-ahead forecasting done by repeatedly ( m −1 times) passing the neural predictions of the model as its neural observation in the next time step ( Methods ). Solid lines and shaded areas are defined as in Fig. 5b ( N = 35 session-folds). Across the number of steps ahead, the statistical significance of a one-sided pairwise comparison between nonlinear DPAD vs nonlinear NDM is shown with the orange top horizontal line with p-value indicated by asterisks next to the line as defined in Fig. 2b (N = 35 session-folds). Similar pairwise comparison between nonlinear DPAD vs linear dynamical system (LDS) modeling is shown with the purple top horizontal line. c - d , Same as a - b for the second dataset, with saccadic eye movements ( N = session-folds). e - f , Same as a - b for the third dataset, with sequential cursor reaches controlled via a 2D manipulandum ( N = 15 session-folds). g - h , Same as a - b for the fourth dataset, with random grid virtual reality cursor reaches controlled via fingertip position ( N = 35 session-folds).
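The m-step-ahead procedure described in this caption (feeding the model's own neural prediction back in as the next neural observation, m − 1 times, before reading out behavior) can be sketched as below. The sketch assumes a predictor-form recursion x_{k+1} = A′(x_k) + K(y_k) with readouts C_y and C_z, matching the parameter names used in these captions; the function names and the scalar toy maps are hypothetical and are not the DPAD implementation.

```python
from typing import Callable, List

Vector = List[float]
Map = Callable[[Vector], Vector]


def forecast_behavior(
    x_next: Vector,
    m: int,
    recursion: Map,         # A'
    neural_input: Map,      # K
    neural_readout: Map,    # C_y
    behavior_readout: Map,  # C_z
) -> Vector:
    """Predict behavior m steps ahead of the last real neural observation.

    `x_next` is the one-step-ahead latent state, computed from real neural
    activity only. For m = 1 its behavior readout is returned directly.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    x = list(x_next)
    for _ in range(m - 1):
        y_pred = neural_readout(x)  # predicted neural activity stands in for the observation
        x = [a + b for a, b in zip(recursion(x), neural_input(y_pred))]
    return behavior_readout(x)


# Toy scalar model (made-up coefficients, chosen so results are exact in floating point).
def toy_recursion(x: Vector) -> Vector:
    return [0.5 * x[0]]


def toy_neural_input(y: Vector) -> Vector:
    return [0.125 * y[0]]


def toy_neural_readout(x: Vector) -> Vector:
    return [2.0 * x[0]]


def toy_behavior_readout(x: Vector) -> Vector:
    return [3.0 * x[0]]


two_step = forecast_behavior(
    [1.0], 2, toy_recursion, toy_neural_input, toy_neural_readout, toy_behavior_readout
)
```

Because the loop only reuses the already-learned maps, no retraining is needed to forecast further ahead, which is consistent with the caption's note that the models are used without retraining or finetuning.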
Extended Data Fig. 5 Neural self-prediction accuracy of nonlinear DPAD across recording electrodes for low-dimensional behaviorally relevant latent states.
a, The 3D reach task. b, Average neural self-prediction correlation coefficient (CC) achieved by nonlinear DPAD for analyzed smoothed spiking activity is shown for each recording electrode (N = 35 session-folds; best nonlinearity for decoding). c, Same as b for modeling of raw LFP activity. d, Same as b for modeling of LFP band power activity. Here, prediction accuracy averaged across all 8 band powers (Methods) of a given recording electrode is shown for that electrode. e–h, Same as a–d for the second dataset, with saccadic eye movements (N = 35 session-folds). For datasets with single-unit activity (Methods), spiking self-prediction of each electrode is averaged across the units associated with that electrode. i,j, Same as a,b for the third dataset, with sequential cursor reaches controlled via a 2D manipulandum (N = 15 session-folds). White areas are due to electrodes that did not have a neuron associated with them in the data. k–n, Same as a–d for the fourth dataset, with random grid virtual reality cursor reaches controlled via fingertip position (N = 35 session-folds). For all results, the latent state dimension was 16, and all these dimensions were learned using the first optimization step (that is, n_1 = 16).
Extended Data Fig. 6 Nonlinear DPAD extracted distinct low dimensional latent states from neural activity for all datasets, which were more behaviorally relevant than those extracted using nonlinear NDM.
a, The 3D reach task. b, The latent state trajectory for 2D states extracted from spiking activity using nonlinear DPAD, averaged across all reach and return epochs across sessions and folds. Here only optimization steps 1–2 of DPAD are used to just extract 2D behaviorally relevant states. c, Same as b for 2D states extracted using nonlinear NDM (special case of using just DPAD optimization steps 3–4). d, Saccadic eye movement task. Trials are averaged depending on the eye movement direction. e, The latent state trajectory for 2D states extracted using DPAD (extracted using optimization steps 1–2), averaged across all trials of the same movement direction condition across sessions and folds. f, Same as d for 2D states extracted using nonlinear NDM. g–i, Same as d–f for the third dataset, with sequential cursor reaches controlled via a 2D manipulandum. j–l, Same as d–f for the fourth dataset, with random grid virtual reality cursor reaches controlled via fingertip position. Overall, in each dataset, latent states extracted by DPAD were clearly different for different behavior conditions in that dataset (b, e, h, k), whereas NDM’s extracted latent states did not as clearly dissociate different conditions (c, f, i, l). Of note, in the first dataset, DPAD revealed latent states with rotational dynamics that reversed direction during reach versus return epochs, which is consistent with the behavior roughly reversing direction. In contrast, NDM’s latent states showed rotational dynamics that did not reverse direction and thus were less congruent with behavior. In this first dataset, in our earlier work (ref. 6), we had compared PSID and a subspace-based linear NDM method and, similar to b and c here, had found that only PSID uncovers reverse-directional rotational patterns across reach and return movement conditions.
These results thus also complement our prior work (ref. 6) by showing that even nonlinear NDM models may not uncover the distinct reverse-directional dynamics in this dataset, thus highlighting the need for dissociative and prioritized learning even in nonlinear modeling, as enabled by DPAD.
Extended Data Fig. 7 Neural self-prediction across latent state dimensions.
a, The 3D reach task. b, Cross-validated neural self-prediction accuracy (CC) achieved by variations of nonlinear and linear DPAD/NDM, for different latent state dimensions. Solid lines and shaded areas are defined as in Fig. 5b (N = 35 session-folds). Across latent state dimensions, the statistical significance of a one-sided pairwise comparison between nonlinear DPAD/NDM (with best nonlinearity for self-prediction) vs linear DPAD/NDM is shown with a horizontal green/orange line with p-value indicated by asterisks next to the line as defined in Fig. 2b (N = 35 session-folds). c,d, Same as a,b for the second dataset, with saccadic eye movements (N = 35 session-folds). e,f, Same as a,b for the third dataset, with sequential cursor reaches controlled via a 2D manipulandum (N = 15 session-folds). g,h, Same as a,b for the fourth dataset, with random grid virtual reality cursor reaches controlled via fingertip position (N = 35 session-folds). For all DPAD variations, the first 16 latent state dimensions are learned using the first two optimization steps and the remaining dimensions are learned using the last two optimization steps (that is, n_1 = 16). As expected, at low state dimensions, DPAD’s latent states achieve higher behavior decoding (Fig. 5) but lower neural self-prediction than NDM because DPAD prioritizes the behaviorally relevant neural dynamics in these dimensions. However, by increasing the state dimension and utilizing optimization steps 3–4, DPAD can reach similar neural self-prediction to NDM while doing better in terms of behavior decoding (Fig. 3). Also, for low-dimensional latent states, nonlinear DPAD/NDM consistently result in significantly more accurate neural self-prediction than linear DPAD/NDM. For high enough state dimensions, linear DPAD/NDM eventually reach similar neural self-prediction accuracy to nonlinear DPAD/NDM.
Given that NDM solely aims to optimize neural self-prediction (irrespective of the relevance of neural dynamics to behavior), the latter result suggests that the overall neural dynamics can be approximated with linear dynamical models but only with high-dimensional latent states. Note that in contrast to neural self-prediction, behavior decoding of nonlinear DPAD is higher than linear DPAD even at high state dimensions (Fig. 3).
Extended Data Fig. 8 DPAD accurately learns the mapping from neural activity to behavior dynamics in all datasets even if behavioral samples are intermittently available in the training data.
Nonlinear DPAD can perform accurately and better than linear DPAD even when as little as 20% of training behavior samples are kept. a, The 3D reach task. b, Examples are shown from one of the joints in the original behavior time series (light gray) and intermittently subsampled versions of it (cyan), where a subset of the time samples of the behavior time series is randomly chosen to be kept for use in training. In each subsampling, all dimensions of the behavior data are sampled together at the same time steps; this means that at any given time step, either all behavior dimensions are kept or all are dropped to emulate the realistic case with intermittent measurements. c, Cross-validated behavior decoding accuracy (CC) achieved by linear DPAD and by nonlinear DPAD with nonlinearity in the behavior readout parameter C_z. For this nonlinear DPAD, we show the CC when trained with different percentages of behavior samples kept (that is, we emulate different rates of intermittent sampling). The state dimension in each session and fold is chosen (among powers of 2 up to 128) as the smallest that reaches peak decoding in training data. Bars, whiskers, dots, and asterisks are defined as in Fig. 2b (N = 35 session-folds). d,e, Same as a,c for the second dataset, with saccadic eye movements (N = 35 session-folds). f,g, Same as a,c for the third dataset, with sequential cursor reaches controlled via a 2D manipulandum (N = 15 session-folds). h,i, Same as a,c for the fourth dataset, with random grid virtual reality cursor reaches controlled via fingertip position (N = 35 session-folds). For all DPAD variations, the first 16 latent state dimensions are learned using the first two optimization steps and the remaining dimensions are learned using the last two optimization steps (that is, n_1 = 16).
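The intermittent subsampling described in panel b (randomly keeping a fraction of time steps, with all behavior dimensions kept or dropped together) can be sketched as follows. Marking dropped samples as NaN, the rounding of the number of kept samples, and the function name are assumptions made for illustration; the caption does not specify how missing samples are represented.

```python
import math
import random
from typing import List


def subsample_behavior(
    behavior: List[List[float]], keep_fraction: float, seed: int = 0
) -> List[List[float]]:
    """Keep a random subset of time steps; dropped steps become all-NaN rows.

    `behavior` is a list of time steps, each a list of behavior dimensions.
    Every dimension at a given time step is kept or dropped together.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    n_steps = len(behavior)
    n_keep = max(1, round(keep_fraction * n_steps))
    rng = random.Random(seed)
    kept_steps = set(rng.sample(range(n_steps), n_keep))
    return [
        list(row) if k in kept_steps else [math.nan] * len(row)
        for k, row in enumerate(behavior)
    ]


# Toy two-dimensional behavior with 10 time steps; keep 20% of them.
toy_behavior = [[float(k), float(-k)] for k in range(10)]
sparse_behavior = subsample_behavior(toy_behavior, keep_fraction=0.2, seed=1)
```

A training loop could then skip the behavior loss at time steps whose rows are NaN while still using the neural activity at every step; that handling is likewise an assumption of this sketch.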
Extended Data Fig. 9 Simulations suggest that DPAD may be applicable with sparse sampling of behavior, for example with behavior being a self-reported mood survey value collected once per day.
a, We simulated the application of decoding self-reported mood variations from neural signals (refs. 40, 41). Neural data is simulated based on linear models fitted to intracranial neural data recorded from epilepsy subjects. Each recorded region in each subject is simulated as a linear state-space model with a 3-dimensional latent state, with the same parameters as those fitted to neural recordings from that region. Simulated latent states from a subset of regions were linearly combined to generate a simulated mood signal (that is, a biomarker). As the simulated models were linear, we used the linear versions of DPAD and NDM (NDM used the subspace identification method, which we found performs similarly to numerical optimization for linear models in Extended Data Fig. 1). We generated the equivalent of 3 weeks of intracranial recordings, which is on the order of the duration of the real intracranial recordings. We then subsampled the simulated mood signal (behavior) to emulate intermittent behavioral measures such as mood surveys. b, Behavior decoding results in unseen simulated test data, across N = 87 simulated models, for different sampling rates of behavior in the training data. Box edges show the 25th and 75th percentiles, solid horizontal lines show the median, whiskers show the range of data, and dots show all data points (N = 87 simulated models). Asterisks are defined as in Fig. 2b. DPAD consistently outperformed NDM regardless of how sparse the behavior measures were, even when these measures were available just once per day (P < 0.0005, one-sided signed-rank, N = 87).
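As a rough illustration of the simulation setup in panel a, the sketch below generates data from one linear state-space model with a 3-dimensional latent state, forms a mood-like signal as a linear combination of the latent states, and keeps one mood sample per fixed period. Everything specific here is an assumption for illustration: the model form x[k+1] = A x[k] + w[k], y[k] = C_y x[k] + v[k], z[k] = C_z x[k], the diagonal A, the parameter values, and the function names. The parameters in the paper were fitted to real recordings and are not reproduced here.

```python
import random


def simulate_region(n_steps, a_diag, c_y, c_z, noise_std=0.1, seed=0):
    """Simulate one region as a linear state-space model.

    Assumed form (illustrative only):
        x[k+1] = A x[k] + w[k],  y[k] = C_y x[k] + v[k],  z[k] = C_z x[k]
    A is diagonal (given by `a_diag`) purely to keep the sketch short.
    Returns the neural series y (list of rows) and the mood-like signal z.
    """
    rng = random.Random(seed)
    x = [0.0] * len(a_diag)
    neural, mood = [], []
    for _ in range(n_steps):
        # Noisy neural observation from the current latent state.
        neural.append(
            [sum(c * xi for c, xi in zip(row, x)) + rng.gauss(0.0, noise_std)
             for row in c_y]
        )
        # Mood-like signal: a fixed linear combination of the latent states.
        mood.append(sum(c * xi for c, xi in zip(c_z, x)))
        # Advance the latent state with process noise.
        x = [a * xi + rng.gauss(0.0, noise_std) for a, xi in zip(a_diag, x)]
    return neural, mood


def sample_once_per_period(signal, period):
    """Keep one sample every `period` steps; mark the rest as missing (None)."""
    return [v if k % period == 0 else None for k, v in enumerate(signal)]


# Toy run: 48 time steps, with the mood signal observed once every 24 steps.
neural, mood = simulate_region(
    n_steps=48,
    a_diag=[0.9, 0.8, 0.7],
    c_y=[[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]],
    c_z=[0.5, 0.5, 0.0],
)
sparse_mood = sample_once_per_period(mood, period=24)
```

With a real sampling rate, `period` would be the number of neural time steps in one day, so that the behavior is available once per day while the neural signal is available at every step.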
Supplementary information
Supplementary Figs. 1–9 and Notes 1–4.
Reporting Summary
Source data figs. 2–7 and extended data figs. 3, 7 and 8.
Statistical source data.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .
About this article
Cite this article.
Sani, O.G., Pesaran, B. & Shanechi, M.M. Dissociative and prioritized modeling of behaviorally relevant neural dynamics using recurrent neural networks. Nat Neurosci (2024). https://doi.org/10.1038/s41593-024-01731-2
Received: 22 April 2023
Accepted: 17 July 2024
Published: 06 September 2024
DOI: https://doi.org/10.1038/s41593-024-01731-2
Thesis: Present the status quo, the viewpoint that is currently accepted and widely held. Antithesis: Articulate the problems with the thesis. (Hegel also called this phase "the negative.") Synthesis: Share a new viewpoint (a modified thesis) that resolves the problems. Hegel's method focused less on the search for absolute truth and more ...
In CISC 497, the rationales must be backed up with facts found during research on the topic. For the presentations, the thesis/antithesis/synthesis structure may be divided between speakers. For instance, one person could present the thesis, one the antithesis, and one the synthesis. For the papers, all three must be included in each paper and ...
Thesis, Antithesis, Synthesis | Encyclopedia MDPI
This revised idea sometimes sparks another opposing idea, another synthesis, and so on… If you can show this pattern at work in your literature review, and, above all, if you can suggest a new synthesis of two opposing views, or demolish one of the opposing views, then you are almost certainly on the right track. Next topic: Step 1: Choose ...
What is a literature review? Thesis, antithesis and synthesis; 1. Choose your topic; 2. Collect relevant material; 3. Read/Skim articles; 4. Group articles by themes
Thesis, Antithesis, Synthesis. Claudio Katz, Loyola University Chicago. This Book Review is brought to you for free and open access by the Faculty Publications at Loyola eCommons. It has been accepted for inclusion in Political Science: Faculty Publications and Other Works by an authorized administrator of Loyola eCommons.
9.6: Assignment: Writing the Antithesis Essay. Steven D. Krause, Eastern Michigan University. Based on the most current and most recently revised version of your working thesis, write a brief essay where you identify, explain, and answer the antithesis to your position. Keep in mind that the main goal of this essay is to think about an ...
Hegel's dialectic is a philosophical theory developed by German philosopher Georg Wilhelm Friedrich Hegel in the early 19th century. It is based on the concept of thesis, antithesis and synthesis, which are steps in the process of progress. The thesis is an idea or statement that is the starting point of an argument.
How to Use Antithesis in Your Writing: Definition and Examples of Antithesis as a Literary Device. Written by MasterClass. Last updated: Sep 29, 2021. The English language is full of literary devices that can enliven your writing. One tool used often in literature and politics is called antithesis.
9.5: Strategies for Answering Antithetical Arguments. It might not seem logical, but directly acknowledging and addressing positions that are different from the one you are holding in your research project can actually make your position stronger. When you take on the antithesis in your research project, it shows you have thought carefully ...
The first one questions the first premise of the working thesis about the "threat" of computer hackers in the first place. The second takes the opposite view of the second premise. Step 3: Ask "why" about possible antithetical arguments. Of course, these examples of creating oppositions with simple changes demand more explanation than ...
Antithesis refers to the refutation of the idea. Synthesis is the moulding of the idea and its refutations into a new idea. For instance, I can crudely write an example like this: Thesis - There is a God. Antithesis - There is a lot of bad in the world. Synthesis - There is a God but His ways are mysterious. See below: A couple of things to ...
Hegel's Dialectics - Stanford Encyclopedia of Philosophy
Both use the following model: Thesis, Antithesis, Synthesis. Dialectical thought assumes that everything in nature has its opposite (e.g., life/death). A dialectical view of history interprets the clash of antithetical historical forces (e.g., the proletariat versus the bourgeoisie) as anterior to achieving social progress. ...
Antithesis (pl.: antitheses; Greek for "setting opposite", from ἀντι- "against" and θέσις "placing") is used in writing or speech either as a proposition that contrasts with or reverses some previously mentioned proposition, or when two opposites are introduced together for contrasting effect. [1] [2] Antithesis can be defined as "a figure of speech involving a seeming contradiction of ...
Somewhere during the early part of his tenure at Crozer, King's intellectual curiosity caught fire, and he spent the balance of his time at the seminary exploring the work of some of the world ...
The triad thesis, antithesis, synthesis (German: These, Antithese, Synthese; originally: [1] Thesis, Antithesis, Synthesis) is often used to describe the thought of German philosopher Georg Wilhelm Friedrich Hegel. [2] Hegel never used the term himself. It originated with Johann Fichte. [1] The triad, also known as the dialectical method, is usually described in the following way:
If your topic or take on an issue is particularly controversial, you might have to work hard at convincing almost all of your readers about the validity of your argument. The process of considering opposing viewpoints is the goal of this exercise, the Antithesis essay. Think about this exercise as a way of exploring the variety of different and ...
To adopt a dialectical worldview means to strive to embrace the idea that seemingly opposite ideas can both be true and to accept change as a natural occurrence. Dialectics is like a teeter-totter: the two seats reflect opposite sides or truths that can exist at the same time. These opposite sides are called "thesis" and "antithesis."
For over fifty years, Hegel interpreters have rejected the former belief that Hegel used thesis-antithesis-synthesis dialectics. In this incisive analysis of Hegel's philosophy, Leonard F. Wheat shows that the modern interpretation is false. Wheat rigorously demonstrates that there are in fact thirty-eight well-concealed dialectics in Hegel's two most important works--twenty-eight in ...
To summarize: the Thesis position, then, can be characterized as follows: Many of these problems will be dealt with and solved by beginning the process of designing the trials. We must initiate phase III HIV vaccine efficacy trials as soon as possible, and follow the...
Understanding the dynamical transformation of neural activity to behavior requires new capabilities to nonlinearly model, dissociate and prioritize behaviorally relevant neural dynamics and test ...