30 · The Second Law of Thermodynamics, and Engines of Cognition

Concise Summary简洁概述

The Second Law of Thermodynamics -- far from being a mere physical curiosity -- turns out to be Bayesian in nature. Liouville's Theorem shows that phase space volume is conserved under physics, which means entropy can only shift between subsystems, never be destroyed globally. This has a startling corollary: to increase the mutual information between your mind and the world (i.e., to form accurate beliefs), you must physically interact with the world and expend thermodynamic work. Maxwell's Demon illustrates this precisely -- knowing the state of molecules is negentropy you can exploit, but acquiring that knowledge costs work. The upshot: "forming accurate beliefs requires evidence" is not just an epistemic slogan but a consequence of Liouville's Theorem. Blind faith, if it actually produced knowledge, would let you run a perpetual-motion machine of the second kind.

热力学第二定律——远不仅仅是一个物理趣谈——其本质原来是贝叶斯式的。刘维尔定理表明，物理学下的相空间体积守恒，这意味着熵只能在子系统之间转移，而永远不能在全局消除。这有一个令人惊讶的推论：要增加你的心智与世界之间的互信息（即形成准确信念），你必须与世界发生物理交互并耗费热力学功。麦克斯韦妖精确地阐明了这一点——知道分子状态是可以利用的负熵，但获取这种知识本身需要做功。结论是："形成准确信念需要证据"不只是一句认识论口号，而是刘维尔定理的推论。如果盲目信仰真的能产生知识，你就可以运行一台第二类永动机。

Infographic信息图

0

net change in phase space volume under physics (Liouville's Theorem)

物理学下相空间体积的净变化量（刘维尔定理）

H(M)+H(Y)−I(M;Y)

joint entropy of Maxwell's Demon + water system

麦克斯韦妖与水系统的联合熵

2nd kind

perpetual-motion type that free knowledge acquisition would enable

若知识可无代价获取将实现的永动机类型

⚙️

Phase space is conserved

相空间守恒

Liouville's Theorem proves that the volume of any phase space region evolves unchanged under physics -- you can redistribute entropy but never globally destroy it.

刘维尔定理证明，任何相空间区域的体积在物理演化下保持不变——你只能重新分配熵，而永远不能在全局消灭它。

🧊

The refrigerator analogy

冰箱类比

A refrigerator cools one subsystem by expanding the phase space of another; the joint volume stays constant. Entropy is moved, not destroyed.

冰箱通过扩大另一个子系统的相空间来冷却一个子系统；联合体积保持不变。熵被转移，而非消灭。

👻

Maxwell's Demon and knowledge as negentropy

麦克斯韦妖与知识即负熵

If you already know the exact state of all molecules, that mutual information is usable negentropy -- but acquiring it in the first place costs thermodynamic work.

如果你已经知道所有分子的精确状态，那种互信息就是可利用的负熵——但首先获得它本身就需要热力学做功。

🔬

Observation is physical work

观察即物理做功

Any rational mind that gains mutual information with its environment has done real thermodynamic work -- not just metaphorical mental effort.

任何与其环境建立互信息的理性心智都完成了真实的热力学做功——而不仅仅是隐喻意义上的脑力消耗。

🌡️

Cold rationality -- literally

字面意义上的冷静理性

Engines of cognition, like heat engines, must radiate waste heat when imperfect; "cold rationality" is true in a thermodynamic sense Hollywood never intended.

认知引擎与热机一样，不完美时必须散发废热；"冷静理性"在好莱坞从未想到的热力学意义上是真的。

The argument, step by step

论证的推进链条

First Law: energy is conserved -- but this alone does not prohibit converting heat into work in either direction.

第一定律：能量守恒——但这本身并不禁止双向转换热与功。

Second Law follows from Liouville's Theorem: phase space volume is conserved, so entropy can only be redistributed, never globally destroyed.

第二定律来自刘维尔定理：相空间体积守恒，因此熵只能被重新分配，而不能在全局消灭。

The Second Law is inherently Bayesian: it is a strict statement about your beliefs about a system, not a deterministic law about the system itself.

第二定律本质上是贝叶斯式的：它是关于你对系统的信念的严格陈述，而不是关于系统本身的决定论定律。

Maxwell's Demon shows that knowledge of a system's state is usable negentropy -- but acquiring that knowledge requires physical interaction and thermodynamic work.

麦克斯韦妖表明，对系统状态的了解是可利用的负熵——但获取这种了解需要物理交互和热力学做功。

Therefore: forming accurate beliefs requires observing the world -- this is a physical, thermodynamic necessity, not merely an epistemic norm.

因此：形成准确信念需要观察世界——这是物理上的、热力学上的必要条件，而不仅仅是认识论规范。

Corollary: any argument that claims to yield true knowledge of the unseen without observation must violate physics at some specific step.

推论：任何声称无需观察便能产生对未见之物的真实知识的论证，必然在某一具体步骤上违反了物理定律。

Detailed Summary详细概述

The First Law and Its Limits

Yudkowsky opens by distinguishing what the First Law (Conservation of Energy) does and does not prohibit. It rules out perpetual motion machines of the first type -- devices that create energy from nothing. By mathematical induction over individual particle interactions, no assembly of gears, however clever, can violate this. A similar argument applies to Conservation of Momentum and "reactionless drives."

But Conservation of Energy, by itself, says nothing to stop you from converting warm water back into ice cubes and electricity. The net energy change is zero in both directions. So why can't you run the machine in reverse?

Liouville's Theorem and the Real Second Law

The Second Law of Thermodynamics, Yudkowsky argues, is not a separate mysterious principle -- it is a corollary of Liouville's Theorem: in any closed system, phase space volume is conserved over time.

He illustrates with a toy model: system X has 8 states, Y has 4, joint system has 32. A subspace S of 4 states maps to 4 future states -- a refrigerator-like process where Y narrows its spread while X expands. The joint volume is conserved; 4 initial states map to 4 end states. No more, no less.

This is the Second Law: you can squeeze the phase space of one subsystem only if you widen another. You cannot reduce the total phase space volume. Which means entropy -- the log of that volume -- cannot decrease overall.

The Bayesian Nature of the Second Law

Here Yudkowsky makes his key philosophical move: the Second Law is essentially Bayesian. When we speak of entropy, we are speaking of our uncertainty about a system. A hot glass of water has higher entropy than a cold one not in some mystical sense, but because hotter molecules can be moving faster in more ways, so your uncertainty about any one molecule's velocity is greater -- and multiplied across molecules, exponentially so. "We take the logarithm of this exponential volume of uncertainty, and call that the entropy."

Conversely, if Saint Laplace revealed the exact positions and velocities of every molecule in your glass of water, its entropy would be zero by the information-theoretic definition and, ignoring quantum effects, in the thermodynamic sense too: you could extract electricity from it and leave behind an ice cube. Exploiting a correlation between your own physical state and the water's in this way is the principle of a Szilard engine.

Maxwell's Demon

Maxwell's Demon is the key illustration. The Demon sorts fast and slow molecules to create a temperature differential -- a perpetual free-energy machine? No: the Demon generates entropy by inspecting molecules and deciding which to let through. But if the Demon already knew the state of every molecule (mutual information already established), it could run without generating new entropy, and extract useful work.

The math is clean: if M (Demon) has 2 bits of entropy and Y (water) has 2 bits, but they share 2 bits of mutual information, then H(M,Y) = H(M) + H(Y) - I(M;Y) = 2 + 2 - 2 = 2 bits. The Demon transforms that mutual information ("coldness") into actual thermodynamic coldness of the water. Liouville's Theorem is not violated -- the joint phase space is conserved.
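The identity can be checked numerically. The sketch below is only an illustration: it assumes the Demon's memory M is a perfect copy of the water state Y and that the four matched pairs are equally likely, which is one simple way to get 2 bits for each term. The helper names `entropy` and `marginal` are made up for this example.

```python
from collections import Counter
from math import log2

def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return sum(-p * log2(p) for p in dist.values() if p > 0)

def marginal(joint, index):
    """Marginal distribution of one coordinate of a joint distribution over tuples."""
    out = Counter()
    for state, p in joint.items():
        out[state[index]] += p
    return dict(out)

# Assumed toy setup: M always equals Y, and each of the 4 pairs has probability 1/4.
joint = {(i, i): 0.25 for i in range(1, 5)}

h_m = entropy(marginal(joint, 0))   # uncertainty about the Demon alone
h_y = entropy(marginal(joint, 1))   # uncertainty about the water alone
h_joint = entropy(joint)            # uncertainty about the pair
mutual_info = h_m + h_y - h_joint   # rearranged from H(M,Y) = H(M) + H(Y) - I(M;Y)

print(h_m, h_y, mutual_info, h_joint)  # 2.0 2.0 2.0 2.0
```

Because M and Y are perfectly correlated in this setup, knowing either one removes all uncertainty about the other, so the pair carries only 2 bits rather than 4.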

The Payoff: Observation Is Thermodynamic Work

The conclusion is striking: to form accurate beliefs about something, you really do have to observe it. This is not merely an epistemic norm; it is a consequence of physics. Gaining mutual information with a system requires physical interaction and thermodynamic work. Without that work, you cannot gain knowledge; if you could, you could build a Maxwell's Demon that runs on blind faith and violate the Second Law -- which would violate Liouville's Theorem.

As E.T. Jaynes put it, "The old adage 'knowledge is power' is a very cogent truth, both in human relations and in thermodynamics." Yudkowsky adapts the line: "forming accurate beliefs requires a corresponding amount of evidence" is just as cogent in both domains. Engines of cognition are not so different from heat engines; an imperfect one must radiate waste heat. And any elaborate argument that purports to deliver knowledge of the unseen without observation must, at some specific step, violate the laws of physics.

第一定律及其局限

Yudkowsky 首先区分了第一定律（能量守恒）确实禁止的事和不禁止的事。它排除了第一类永动机——凭空创造能量的装置。通过对单个粒子相互作用的数学归纳，无论多聪明的齿轮组合都无法违背这一点。类似的论证也适用于动量守恒和"无反作用推进器"。

但能量守恒本身并不能阻止你将温水反向转化回冰块和电能。两个方向的净能量变化都是零。那么，为什么不能反向运行这台机器？

刘维尔定理与真正的第二定律

Yudkowsky 论证说，热力学第二定律并不是一个独立的神秘原则——它是刘维尔定理的推论：任何封闭系统中，相空间体积随时间守恒。

他用一个玩具模型来说明：系统 X 有 8 种状态，Y 有 4 种，联合系统有 32 种。4 个状态的子空间 S 映射到 4 个未来状态——一个类似冰箱的过程，其中 Y 缩小其分布范围而 X 扩展。联合体积守恒；4 个初始状态映射到 4 个终态。不多也不少。

这就是第二定律：只有扩大另一个子系统的相空间，才能压缩一个子系统的相空间。你不能减小总相空间体积。这意味着熵——该体积的对数——在整体上不能减小。

第二定律的贝叶斯本质

在这里，Yudkowsky 做出了他的关键哲学论断：第二定律本质上是贝叶斯式的。当我们谈论熵时，我们是在谈论我们对系统的不确定性。一杯热水的熵比冷水高，不是以某种神秘的方式，而是因为更热的分子可以以更多方式更快地运动，所以你对任一分子速度的不确定性更大——乘以所有分子，就是指数级的不确定性。"我们对这个指数级的不确定体积取对数，称之为熵。"

相反，如果拉普拉斯圣人揭示了你那杯水中每个分子的精确位置和速度，热力学熵将真正变为零——你实际上可以从中提取电能，留下一块冰块。这不是思想实验；这正是西拉德发动机所做的事。

麦克斯韦妖

麦克斯韦妖是关键的说明。妖精通过分选快慢分子来制造温差——一台永久的免费能量机器？不：妖精在检查分子、决定让哪些通过的过程中产生了熵。但如果妖精已经知道每个分子的状态（互信息已经建立），它就可以在不产生新熵的情况下运行，并提取有用功。

数学很清晰：如果 M（妖精）有 2 比特熵，Y（水）有 2 比特，但它们共享 2 比特互信息，则 H(M,Y) = H(M) + H(Y) − I(M;Y) = 2 + 2 − 2 = 2 比特。妖精将这种互信息（"冷度"）转化为水实际的热力学冷度。刘维尔定理没有被违反——联合相空间守恒。

结论：观察即热力学做功

结论令人震惊：要形成关于某事物的准确信念，你真的必须观察它。 这不仅仅是一个认识论规范；它是物理学的推论。与系统建立互信息需要物理交互和热力学做功。没有这种做功，你就无法获得知识；如果可以，你就可以建造一台以盲目信仰运行的麦克斯韦妖，违反第二定律——而这将违反刘维尔定理。

正如 E.T. 贾恩斯所说，"'知识就是力量'这句古老格言，无论在人际关系中还是在热力学中都是非常深刻的真理。"Yudkowsky 借用这句话指出："形成准确信念需要相应量的证据"在这两个领域同样成立。认知引擎与热机并没有太大不同；一台不完美的认知引擎必须散发废热。任何声称无需观察就能提供对未见之物的知识的精巧论证，必然在某个具体步骤上违反了物理定律。

FAQ常见问答

Why is the Second Law described as "Bayesian in nature"?为什么说第二定律"本质上是贝叶斯式的"？

The Second Law ultimately governs your uncertainty about a system, not a deterministic property of the system itself. Entropy is the log of the phase space volume consistent with what you know. Hot water has higher entropy because, knowing only its temperature, there are exponentially more ways its molecules could be moving; for cold water there are fewer. If you gained complete knowledge of every molecule's state, the thermodynamic entropy would literally be zero -- the Bayesian and thermodynamic notions are the same thing.

第二定律最终支配的是你对系统的不确定性，而不是系统本身的决定论属性。熵是与你所知相符的相空间体积的对数。热水的熵更高，是因为分子可能的排列方式呈指数级增多；冷水则更少。如果你获得了每个分子状态的完整知识，热力学熵字面上就会变为零——贝叶斯意义上的熵与热力学意义上的熵是同一件事。

What exactly is Liouville's Theorem, and why does it imply the Second Law?刘维尔定理究竟是什么，它为何蕴含第二定律？

Liouville's Theorem is a proven result in classical mechanics: if you take any region of phase space (all positions and momenta of all particles) and evolve it forward under the laws of physics, the volume of that region is conserved. It cannot shrink. This means no physical process can map many different states all to the same end-state -- which in turn means you can never reduce total entropy: every compression in one subsystem must be balanced by an expansion elsewhere.

刘维尔定理是经典力学中的一个已证明结果：如果你取相空间（所有粒子的所有位置和动量）中的任意区域，并按物理定律向前演化，该区域的体积守恒。它不能缩小。这意味着没有任何物理过程能将许多不同的初始状态全都映射到同一个终态——这反过来意味着你永远无法减小总熵：一个子系统的任何压缩都必须被其他地方的扩展所平衡。

How does Maxwell's Demon illustrate the connection between knowledge and thermodynamics?麦克斯韦妖如何说明知识与热力学之间的联系？

The Demon sorts molecules by their speed, seemingly reducing entropy for free. The catch: inspecting each molecule and deciding its fate generates entropy in the Demon's memory. But if the Demon already possessed complete mutual information about the gas (it knew every molecule's state before starting), it could exploit that stored knowledge without generating new entropy -- effectively converting knowledge (negentropy) into physical work. This is not science fiction; it is the operating principle of a Szilard engine.

妖精通过速度分选分子，看似在免费地减少熵。问题在于：检查每个分子并决定其命运会在妖精的记忆中产生熵。但如果妖精预先拥有关于气体的完整互信息（它在开始前就知道每个分子的状态），它就可以在不产生新熵的情况下利用那些存储的知识——实际上是将知识（负熵）转化为物理功。这不是科幻小说；这是西拉德发动机的工作原理。

Does this really mean that blind faith, if it worked, would violate physics?这是否真的意味着，如果盲目信仰有效，它将违反物理定律？

Precisely. If you could form accurate beliefs about the world without physically interacting with it -- without observation -- you would possess mutual information with the world that cost no thermodynamic work to acquire. You could then exploit that mutual information as a Maxwell's Demon does, extracting useful work from a uniform heat source. That would violate the Second Law, which would violate Liouville's Theorem. So any argument that purports to deliver knowledge of the unseen without observation must be violating physics at some specific step.

正是如此。如果你能在不与世界发生物理交互的情况下——不经观察——形成关于世界的准确信念，你就拥有了无需热力学做功便获取的互信息。然后你可以像麦克斯韦妖那样利用该互信息，从均匀热源中提取有用功。这将违反第二定律，而这将违反刘维尔定理。因此，任何声称无需观察便能提供对未见之物知识的论证，必然在某个具体步骤上违反了物理定律。

The essay says the Second Law is "probabilistic" -- does that mean Liouville's Theorem might be violated?文章说第二定律是"概率性的"——这是否意味着刘维尔定理可能被以小概率违反？

No. The probabilistic character is in how you describe your uncertainty, not in the theorem itself. Liouville's Theorem is a theorem -- it holds exactly. The probabilistic element enters because, starting from a large region of phase space you're uncertain about, you assign a tiny (but nonzero) probability to ending up in some specific tiny region. Hot water could spontaneously become ice and electricity -- but the probability is so small as to be negligible. The theorem is never violated; only the outcome is uncertain.

不。概率性特征体现在你如何描述你的不确定性，而不在于定理本身。刘维尔定理是一个定理——它精确成立。概率性因素之所以出现，是因为从你不确定的一大片相空间区域出发，你给落入某个特定微小区域赋予了一个极小（但非零）的概率。热水可以自发变成冰块和电能——但概率小到可以忽略不计。定理从未被违反；只是结果是不确定的。

What does Yudkowsky mean by "engines of cognition"?Yudkowsky 所说的"认知引擎"是什么意思？

A mind that forms accurate beliefs is, in a thermodynamic sense, an engine: it gains mutual information with the world only by interacting with it and doing thermodynamic work. To the extent that such an engine is not perfectly efficient, it must radiate waste heat, just like a car engine or a refrigerator. That is the sense in which "cold rationality" is literally true, and it is not the sense Hollywood scriptwriters had in mind.

一个形成准确信念的心智，在精确的热力学意义上就是一台引擎：它输入信息（做功），减少自身对世界的不确定性（冷却一个子系统），并必然在其他地方增加熵（散发废热）。心智越好，运行就越高效——但它永远不能完全高效，就像没有热机能达到完美效率一样。"冷静理性"字面上是真的：一个完全理性的推理者会尽可能保持其内部状态的低熵，最小化无效做功。

In-depth Analysis · Pros & Cons深入解读 · 优缺点

This essay is Yudkowsky at his most technically ambitious: he derives an epistemological principle ("you must observe to know") from a theorem in classical mechanics (Liouville's Theorem), via Shannon information theory and Maxwell's Demon. The argument is genuinely interdisciplinary and, in its core logic, correct.

这篇文章是 Yudkowsky 技术上最具野心的作品之一：他从经典力学中的一个定理（刘维尔定理）出发，经由香农信息论和麦克斯韦妖，推导出一条认识论原则（"你必须观察才能知道"）。这一论证是真正跨学科的，其核心逻辑是正确的。

✓Strengths亮点 / 优点

The thermodynamic grounding is real
热力学基础是真实的
Liouville's Theorem is a genuine result in classical mechanics; Szilard engines and Maxwell's Demon are real (and experimentally realized) physics. Yudkowsky is not being loose with the science.
刘维尔定理是经典力学中的真实结果；西拉德发动机和麦克斯韦妖是真实的（且已在实验中实现的）物理现象。Yudkowsky 对科学并没有随意处理。
Unifying move: entropy = uncertainty
统一性论断：熵 = 不确定性
By showing that thermodynamic entropy and information-theoretic entropy are the same thing (as Jaynes argued formally), the essay dissolves the apparent gap between physics and epistemology in one stroke.
通过表明热力学熵与信息论熵是同一件事（正如贾恩斯在形式上所论证的），文章一举消解了物理学与认识论之间表面上的鸿沟。
The toy model is pedagogically excellent
玩具模型在教学上极为出色
The X/Y 8-state/4-state example makes phase space volume conservation concrete and checkable, and the refrigerator analogy that follows from it is memorable.
X/Y 8态/4态的例子使相空间体积守恒变得具体可查，由此引出的冰箱类比令人难忘。
The payoff is genuinely surprising
结论真正出人意料
The conclusion -- that "blind faith as epistemology would permit a perpetual motion machine" -- is not mere rhetoric; it follows directly from the argument and gives the essay lasting punch.
结论——"盲目信仰作为认识论将允许一台永动机"——不只是修辞；它直接从论证中推导出来，赋予文章持久的震撼力。

▲Limits & Critiques局限 / 批评

Classical mechanics, not quantum
经典力学而非量子力学
Liouville's Theorem holds in classical phase space. In quantum mechanics, the relevant analogue involves density matrices and von Neumann entropy, which is more subtle; the essay acknowledges quantum effects only in passing, but the real world is quantum, not classical.
刘维尔定理在经典相空间中成立。在量子力学中，相关类比涉及密度矩阵和冯·诺依曼熵，更为微妙；文章只是顺带提及量子效应，但真实世界是量子的，不是经典的。
The Landauer/erasure subtlety is glossed over
兰道尔/擦除的微妙之处被一笔带过
Yudkowsky acknowledges in a parenthetical that whether the thermodynamic cost is in observation or in erasing memory to prepare for the next observation (Landauer's principle) is "just a matter of words." But this distinction is physically and conceptually important: the demon's memory is the bottleneck, not its sensing. The dismissal could mislead readers about where the real cost falls.
Yudkowsky 在括号中承认，热力学代价究竟在于观察还是在于擦除记忆以准备下一次观察（兰道尔原理），"只是措辞问题"。但这一区分在物理上和概念上都很重要：瓶颈在于妖精的记忆，而不是其感知。这种轻描淡写可能会误导读者，让他们误解真正的代价落在哪里。
The epistemic jump is wider than claimed
认识论跳跃比声称的更大
The physical argument shows that a perfectly efficient Maxwell's Demon needs prior mutual information. But the jump to "therefore all epistemically valid belief formation requires observation" tacitly assumes that human cognition is relevantly analogous to a Szilard engine -- an assumption that is plausible but not argued.
物理论证表明，一台完全高效的麦克斯韦妖需要预先的互信息。但跳到"因此所有认识论上有效的信念形成都需要观察"，则默认人类认知与西拉德发动机有相关的类比——这是一个合理但未经论证的假设。
Logical truths are waved aside
逻辑真理被挥手略过
The essay brackets "discovering logical truths" with a parenthetical deferral. But this is a significant exception: deductive reasoning can (in principle) yield new knowledge without external observation, and the relationship between logical inference and thermodynamic cost is non-trivial (reversible computation can be done at arbitrarily low energy cost). Omitting it limits the argument's generality.
文章用括号推迟了对"发现逻辑真理"的处理。但这是一个重要的例外：演绎推理（原则上）可以在不依赖外部观察的情况下产生新知识，而逻辑推理与热力学代价之间的关系是不简单的（可逆计算理论上可以以任意低的能量代价完成）。省略这一点限制了论证的普遍性。

Bottom line

总评

A rare piece that earns its ambition: the thermodynamic grounding of epistemology is not hand-waving but genuine physics, and the concluding reductio on blind faith is one of the sharpest arguments in the Sequences. The main weaknesses are the classical (not quantum) framing and the parenthetical dismissal of Landauer's principle -- both of which a physics-literate reader will notice. But for the core thesis -- that observation is physical work -- the argument holds.

一篇实现了自身野心的罕见文章：对认识论的热力学奠基不是空洞的比喻，而是真实的物理学，关于盲目信仰的最终归谬论证是整个系列中最犀利的论证之一。主要缺点是经典（而非量子）的框架，以及对兰道尔原理的括号式轻描淡写——物理学素养较高的读者都会注意到这两点。但对于核心论题——观察即物理做功——论证是成立的。

Original Text原文

Read on LessWrong ↗在 LessWrong 阅读原文 ↗

The first law of thermodynamics, better known as Conservation of Energy, says that you can't create energy from nothing: it prohibits perpetual motion machines of the first type, which run and run indefinitely without consuming fuel or any other resource. According to our modern view of physics, energy is conserved in each individual interaction of particles. By mathematical induction, we see that no matter how large an assemblage of particles may be, it cannot produce energy from nothing - not without violating what we presently believe to be the laws of physics.

This is why the US Patent Office will summarily reject your amazingly clever proposal for an assemblage of wheels and gears that cause one spring to wind up another as the first runs down, and so continue to do work forever, according to your calculations. There's a fully general proof that at least one wheel must violate (our standard model of) the laws of physics for this to happen. So unless you can explain how one wheel violates the laws of physics, the assembly of wheels can't do it either.

A similar argument applies to a "reactionless drive", a propulsion system that violates Conservation of Momentum. In standard physics, momentum is conserved for all individual particles and their interactions; by mathematical induction, momentum is conserved for physical systems whatever their size. If you can visualize two particles knocking into each other and always coming out with the same total momentum that they started with, then you can see how scaling it up from particles to a gigantic complicated collection of gears won't change anything. Even if there's a trillion quadrillion atoms involved, 0 + 0 + ... + 0 = 0.

But Conservation of Energy, as such, cannot prohibit converting heat into work. You can, in fact, build a sealed box that converts ice cubes and stored electricity into warm water. It isn't even difficult. Energy cannot be created or destroyed: The net change in energy, from transforming (ice cubes + electricity) to (warm water), must be 0. So it couldn't violate Conservation of Energy, as such, if you did it the other way around...

Perpetual motion machines of the second type, which convert warm water into electrical current and ice cubes, are prohibited by the Second Law of Thermodynamics.

The Second Law is a bit harder to understand, as it is essentially Bayesian in nature.

Yes, really.

The essential physical law underlying the Second Law of Thermodynamics is a theorem which can be proven within the standard model of physics: In the development over time of any closed system, phase space volume is conserved.

Let's say you're holding a ball high above the ground. We can describe this state of affairs as a point in a multidimensional space, at least one of whose dimensions is "height of ball above the ground". Then, when you drop the ball, it moves, and so does the dimensionless point in phase space that describes the entire system that includes you and the ball. "Phase space", in physics-speak, means that there are dimensions for the momentum of the particles, not just their position - i.e., a system of 2 particles would have 12 dimensions, 3 dimensions for each particle's position, and 3 dimensions for each particle's momentum.
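The dimension count is simple arithmetic, sketched here with a hypothetical helper `phase_space_dimensions`: each particle contributes one position coordinate and one momentum coordinate per spatial dimension.

```python
def phase_space_dimensions(n_particles: int, spatial_dims: int = 3) -> int:
    """Position and momentum each contribute `spatial_dims` coordinates per particle."""
    return 2 * spatial_dims * n_particles

print(phase_space_dimensions(2))  # 12, matching the two-particle example
```

The same formula gives 6 dimensions for a single particle; the whole system, however large, is still one point in that space.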

If you had a multidimensional space, each of whose dimensions described the position of a gear in a huge assemblage of gears, then as you turned the gears a single point would swoop and dart around in a rather high-dimensional phase space. Which is to say, just as you can view a great big complex machine as a single point in a very-high-dimensional space, so too, you can view the laws of physics describing the behavior of this machine over time, as describing the trajectory of its point through the phase space.

The Second Law of Thermodynamics is a consequence of a theorem which can be proven in the standard model of physics: If you take a volume of phase space, and develop it forward in time using standard physics, the total volume of the phase space is conserved.

For example:

Let there be two systems, X and Y: where X has 8 possible states, Y has 4 possible states, and the joint system (X,Y) has 32 possible states.

The development of the joint system over time can be described as a rule that maps initial points onto future points. For example, the system could start out in X7,Y2, then develop (under some set of physical laws) into the state X3,Y3 a minute later. Which is to say: if X started in 7, and Y started in 2, and we watched it for 1 minute, we would see X go to 3 and Y go to 3. Such are the laws of physics.

Next, let's carve out a subspace S of the joint system state. S will be the subspace bounded by X being in state 1 and Y being in states 1-4. So the total volume of S is 4 states.

And let's suppose that, under the laws of physics governing (X,Y) the states initially in S behave as follows:

X1,Y1 -> X2,Y1
X1,Y2 -> X4,Y1
X1,Y3 -> X6,Y1
X1,Y4 -> X8,Y1

That, in a nutshell, is how a refrigerator works.

The X subsystem began in a narrow region of state space - the single state 1, in fact - and Y began distributed over a wider region of space, states 1-4. By interacting with each other, Y went into a narrow region, and X ended up in a wide region; but the total phase space volume was conserved. 4 initial states mapped to 4 end states.
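As a quick check of the bookkeeping, the sketch below encodes the four transitions from the example as a Python dict and counts how many distinct values each subsystem takes before and after. The names `refrigerator` and `spread` are illustrative only.

```python
# The four transitions from the example, written as (x, y) -> (x, y).
refrigerator = {
    (1, 1): (2, 1),
    (1, 2): (4, 1),
    (1, 3): (6, 1),
    (1, 4): (8, 1),
}

def spread(states, index):
    """Number of distinct values one subsystem takes across a set of joint states."""
    return len({s[index] for s in states})

before, after = list(refrigerator.keys()), list(refrigerator.values())

print("X spread:", spread(before, 0), "->", spread(after, 0))    # 1 -> 4
print("Y spread:", spread(before, 1), "->", spread(after, 1))    # 4 -> 1
print("joint volume:", len(set(before)), "->", len(set(after)))  # 4 -> 4
```

Y narrows from four states to one while X widens from one to four, and the number of joint states stays at four.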

Clearly, so long as total phase space volume is conserved by physics over time, you can't squeeze Y harder than X expands, or vice versa - for every subsystem you squeeze into a narrower region of state space, some other subsystem has to expand into a wider region of state space.

Now let's say that we're uncertain about the joint system (X,Y), and our uncertainty is described by an equiprobable distribution over S. That is, we're pretty sure X is in state 1, but Y is equally likely to be in any of states 1-4. If we shut our eyes for a minute and then open them again, we will expect to see Y in state 1, but X might be in any of states 2-8. Actually, X can only be in some of states 2-8, but it would be too costly to think out exactly which states these might be, so we'll just say 2-8.

If you consider the Shannon entropy of our uncertainty about X and Y as individual systems, X began with 0 bits of entropy because it had a single definite state, and Y began with 2 bits of entropy because it was equally likely to be in any of 4 possible states. (There's no mutual information between X and Y.) A bit of physics occurred, and lo, the entropy of Y went to 0, but the entropy of X went to log2(7) ≈ 2.8 bits. So entropy was transferred from one system to another, and decreased within the Y subsystem; but due to the cost of bookkeeping, we didn't bother to track some information, and hence (from our perspective) the overall entropy increased.
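The numbers can be reproduced in a few lines. This sketch assumes equiprobable distributions throughout, so the entropy is just the base-2 log of the number of states; `uniform_entropy` is a name chosen for this illustration.

```python
from math import log2

def uniform_entropy(n_states: int) -> float:
    """Shannon entropy in bits of an equiprobable distribution over n_states."""
    return log2(n_states)

# Before: X is known to be in state 1, Y is uniform over states 1-4.
h_x_before, h_y_before = uniform_entropy(1), uniform_entropy(4)

# After, tracked exactly: Y is in state 1 (0 bits) and X is uniform over {2, 4, 6, 8}.
h_x_exact = uniform_entropy(4)

# After, with lazy bookkeeping: we only say "X is somewhere in states 2-8".
h_x_coarse = uniform_entropy(7)

print(h_x_before + h_y_before)  # 2.0 bits in total before
print(h_x_exact)                # 2.0 bits: entropy moved from Y to X, none created
print(round(h_x_coarse, 2))     # 2.81 bits: the apparent increase from not tracking details
```

The gap between 2.0 and roughly 2.8 bits is entirely a product of the coarser description, not of the physics.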

If there was a physical process that mapped past states onto future states like this:

X2,Y1 -> X2,Y1
X2,Y2 -> X2,Y1
X2,Y3 -> X2,Y1
X2,Y4 -> X2,Y1

Then you could have a physical process that would actually decrease entropy, because no matter where you started out, you would end up at the same place. The laws of physics, developing over time, would compress the phase space.

But there is a theorem, Liouville's Theorem, which can be proven true of our laws of physics, which says that this never happens: phase space is conserved.
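In a finite toy model, "phase space is conserved" amounts to saying that distinct initial states never merge. A minimal sketch of that test, using a hypothetical helper `conserves_volume` and the two maps from the examples:

```python
def conserves_volume(mapping: dict) -> bool:
    """True if distinct initial states always lead to distinct final states."""
    return len(set(mapping.values())) == len(mapping)

# The refrigerator-like process from the earlier example.
refrigerator = {(1, 1): (2, 1), (1, 2): (4, 1), (1, 3): (6, 1), (1, 4): (8, 1)}

# The hypothetical entropy-destroying process: every start state ends at (2, 1).
compressor = {(2, 1): (2, 1), (2, 2): (2, 1), (2, 3): (2, 1), (2, 4): (2, 1)}

print(conserves_volume(refrigerator))  # True
print(conserves_volume(compressor))    # False: four states collapse into one
```

This discrete check is only an analogy for the continuous theorem, but it captures why the second map is the kind of process that is ruled out.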

The Second Law of Thermodynamics is a corollary of Liouville's Theorem: no matter how clever your configuration of wheels and gears, you'll never be able to decrease entropy in one subsystem without increasing it somewhere else. When the phase space of one subsystem narrows, the phase space of another subsystem must widen, and the joint space keeps the same volume.

Except that what was initially a compact phase space, may develop squiggles and wiggles and convolutions; so that to draw a simple boundary around the whole mess, you must draw a much larger boundary than before - this is what gives the appearance of entropy increasing. (And in quantum systems, where different universes go different ways, entropy actually does increase in any local universe. But omit this complication for now.)

The Second Law of Thermodynamics is actually probabilistic in nature - if you ask about the probability of hot water spontaneously entering the "cold water and electricity" state, the probability does exist, it's just very small. This doesn't mean Liouville's Theorem is violated with small probability; a theorem's a theorem, after all. It means that if you're in a great big phase space volume at the start, but you don't know where, you may assess a tiny little probability of ending up in some particular phase space volume. So far as you know, with infinitesimal probability, this particular glass of hot water may be the kind that spontaneously transforms itself to electrical current and ice cubes. (Neglecting, as usual, quantum effects.)
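To get a feel for how small such a probability is, here is a rough sketch under loudly stated assumptions: a hypothetical system of n molecules, each either "fast" or "slow", with every one of the 2^n end states equally likely from your point of view. Neither the two-state molecules nor the helper `probability_of_region` come from the essay.

```python
from fractions import Fraction
from math import log10

def probability_of_region(target_volume: int, total_volume: int) -> Fraction:
    """Chance of landing in a target region when every end state is equally likely."""
    return Fraction(target_volume, total_volume)

# Hypothetical toy system: n molecules, each either "fast" or "slow".
for n in (10, 100, 1000):
    p = probability_of_region(1, 2 ** n)
    # Work with the exponent, since the probability itself underflows a float for large n.
    print(n, "molecules: log10(probability) =", round(-log10(p.denominator), 1))
```

The probability of any one particular end state shrinks exponentially with the number of molecules, yet it never reaches zero, which is the sense in which the law is probabilistic.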

So the Second Law really is inherently Bayesian. When it comes to any real thermodynamic system, it's a strictly lawful statement of your beliefs about the system, but only a probabilistic statement about the system itself.

"Hold on," you say. "That's not what I learned in physics class," you say. "In the lectures I heard, thermodynamics is about, you know, temperatures. Uncertainty is a subjective state of mind! The temperature of a glass of water is an objective property of the water! What does heat have to do with probability?"

Oh ye of little trust.

In one direction, the connection between heat and probability is relatively straightforward: If the only fact you know about a glass of water is its temperature, then you are much more uncertain about a hot glass of water than a cold glass of water.

Heat is the zipping around of lots of tiny molecules; the hotter they are, the faster they can go. Not all the molecules in hot water are travelling at the same speed - the "temperature" isn't a uniform speed of all the molecules, it's an average speed of the molecules, which in turn corresponds to a predictable statistical distribution of speeds - anyway, the point is that, the hotter the water, the faster the water molecules could be going, and hence, the more uncertain you are about the velocity (not just speed) of any individual molecule. When you multiply together your uncertainties about all the individual molecules, you will be exponentially more uncertain about the whole glass of water.

We take the logarithm of this exponential volume of uncertainty, and call that the entropy. So it all works out, you see.
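The step from "multiply the uncertainties" to "take the logarithm" can be made concrete with made-up numbers. The sketch assumes each molecule is independently in one of a fixed number of velocity states (4 when cold, 16 when hot); those figures and the helper `entropy_bits` are purely illustrative.

```python
from math import log2

def entropy_bits(states_per_molecule: int, n_molecules: int) -> float:
    """log2 of the joint volume states_per_molecule ** n_molecules, computed additively."""
    return n_molecules * log2(states_per_molecule)

# Hypothetical numbers: a "cold" molecule has 4 possible velocity states, a "hot" one 16.
n = 50
cold_volume, hot_volume = 4 ** n, 16 ** n

# The volumes differ by an exponentially large factor...
print(hot_volume // cold_volume == 4 ** n)  # True

# ...while the entropies, being logarithms, simply add up per molecule.
print(entropy_bits(4, n), entropy_bits(16, n))  # 100.0 200.0
```

Taking the log turns an unmanageable product over molecules into a sum, which is why entropy behaves like an extensive quantity.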

The connection in the other direction is less obvious. Suppose there was a glass of water, about which, initially, you knew only that its temperature was 72 degrees. Then, suddenly, Saint Laplace reveals to you the exact locations and velocities of all the atoms in the water. You now know perfectly the state of the water, so, by the information-theoretic definition of entropy, its entropy is zero. Does that make its thermodynamic entropy zero? Is the water colder, because we know more about it?

Ignoring quantumness for the moment, the answer is: Yes! Yes it is!

Maxwell once asked: Why can't we take a uniformly hot gas, and partition it into two volumes A and B, and let only fast-moving molecules pass from B to A, while only slow-moving molecules are allowed to pass from A to B? If you could build a gate like this, soon you would have hot gas on the A side, and cold gas on the B side. That would be a cheap way to refrigerate food, right?

The agent who inspects each gas molecule, and decides whether to let it through, is known as "Maxwell's Demon". And the reason you can't build an efficient refrigerator this way, is that Maxwell's Demon generates entropy in the process of inspecting the gas molecules and deciding which ones to let through.

But suppose you already knew where all the gas molecules were?

Then you actually could run Maxwell's Demon and extract useful work.

So (again ignoring quantum effects for the moment), if you know the states of all the molecules in a glass of hot water, it is cold in a genuinely thermodynamic sense: you can take electricity out of it and leave behind an ice cube.

This doesn't violate Liouville's Theorem, because if Y is the water, and you are Maxwell's Demon (denoted M), the physical process behaves as:

M1,Y1 -> M1,Y1
M2,Y2 -> M2,Y1
M3,Y3 -> M3,Y1
M4,Y4 -> M4,Y1

Because Maxwell's demon knows the exact state of Y, this is mutual information between M and Y. The mutual information decreases the joint entropy of (M,Y): H(M,Y) = H(M) + H(Y) - I(M;Y). M has 2 bits of entropy, Y has two bits of entropy, and their mutual information is 2 bits, so (M,Y) has a total of 2 + 2 - 2 = 2 bits of entropy. The physical process just transforms the "coldness" (negentropy) of the mutual information to make the actual water cold - afterward, M has 2 bits of entropy, Y has 0 bits of entropy, and the mutual information is 0. Nothing wrong with that!
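The before-and-after accounting can be traced by pushing a distribution through the Demon's map. The sketch assumes the four matched (M, Y) pairs start out equally likely; the helpers `entropy`, `marginal`, and `accounts` are names invented for this illustration.

```python
from collections import Counter
from math import log2

def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return sum(-p * log2(p) for p in dist.values() if p > 0)

def marginal(joint, index):
    """Marginal distribution of one coordinate of a joint distribution over tuples."""
    out = Counter()
    for state, p in joint.items():
        out[state[index]] += p
    return dict(out)

def accounts(joint):
    """Entropy of M, entropy of Y, their mutual information, and the joint entropy."""
    h_m, h_y, h_joint = entropy(marginal(joint, 0)), entropy(marginal(joint, 1)), entropy(joint)
    return {"H(M)": h_m, "H(Y)": h_y, "I(M;Y)": h_m + h_y - h_joint, "H(M,Y)": h_joint}

# The Demon's process: (m, y) -> (m, 1) for each matched pair.
demon_step = {(i, i): (i, 1) for i in range(1, 5)}

# Assumed starting belief: the four matched pairs are equally likely.
before = {(i, i): 0.25 for i in range(1, 5)}
after = Counter()
for state, p in before.items():
    after[demon_step[state]] += p
after = dict(after)

print(accounts(before))  # H(M)=2, H(Y)=2, I(M;Y)=2, H(M,Y)=2
print(accounts(after))   # H(M)=2, H(Y)=0, I(M;Y)=0, H(M,Y)=2
```

The joint entropy is 2 bits on both sides, so nothing has been compressed; the 2 bits of mutual information have simply been spent to bring the water's entropy to zero.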

And don't tell me that knowledge is "subjective". Knowledge has to be represented in a brain, and that makes it as physical as anything else. For M to physically represent an accurate picture of the state of Y, M's physical state must correlate with the state of Y. You can take thermodynamic advantage of that - it's called a Szilard engine.

Or as E.T. Jaynes put it, "The old adage 'knowledge is power' is a very cogent truth, both in human relations and in thermodynamics."

And conversely, one subsystem cannot increase in mutual information with another subsystem, without (a) interacting with it and (b) doing thermodynamic work.

Otherwise you could build a Maxwell's Demon and violate the Second Law of Thermodynamics - which in turn would violate Liouville's Theorem - which is prohibited in the standard model of physics.

Which is to say: **To form accurate beliefs about something, you really do have to observe it.** It's a very physical, very real process: any rational mind does "work" in the thermodynamic sense, not just the sense of mental effort.

(It is sometimes said that it is erasing bits in order to prepare for the next observation that takes the thermodynamic work - but that distinction is just a matter of words and perspective; the math is unambiguous.)

(Discovering logical "truths" is a complication which I will not, for now, consider - at least in part because I am still thinking through the exact formalism myself. In thermodynamics, knowledge of logical truths does not count as negentropy; as would be expected, since a reversible computer can compute logical truths at arbitrarily low cost. All this that I have said is true of the logically omniscient: any lesser mind will necessarily be less efficient.)

"Forming accurate beliefs requires a corresponding amount of evidence" is a very cogent truth both in human relations and in thermodynamics: if blind faith actually worked as a method of investigation, you could turn warm water into electricity and ice cubes. Just build a Maxwell's Demon that has blind faith in molecule velocities.

Engines of cognition are not so different from heat engines, though they manipulate entropy in a more subtle form than burning gasoline. For example, to the extent that an engine of cognition is not perfectly efficient, it must radiate waste heat, just like a car engine or refrigerator.

"Cold rationality" is true in a sense that Hollywood scriptwriters never dreamed (and false in the sense that they did dream).

So unless you can tell me which specific step in your argument violates the laws of physics by giving you true knowledge of the unseen, don't expect me to believe that a big, elaborate clever argument can do it either.

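The First Law of Thermodynamics, better known as Conservation of Energy, says that you can't create energy from nothing: it prohibits perpetual motion machines of the first type, which run and do work forever without consuming fuel or any other resource. According to our modern understanding of physics, energy is conserved in each individual interaction of particles. By mathematical induction, we see that no matter how large an assemblage of particles may be, it cannot produce energy from nothing - not without violating what we presently believe to be the laws of physics.

This is why the US Patent Office will summarily reject your amazingly clever proposal for an assemblage of wheels and gears in which, according to your calculations, one spring winds up another as the first runs down, and so continues to do work forever. There's a fully general proof that at least one gear must violate (our standard model of) the laws of physics for this to happen. So unless you can explain how one particular gear violates the laws of physics, the assembly of gears can't do it either.

A similar argument applies to a "reactionless drive", a propulsion system that violates Conservation of Momentum. In standard physics, momentum is conserved for all individual particles and their interactions; by mathematical induction, momentum is conserved for physical systems whatever their size. If you can visualize two particles colliding and always coming out with the same total momentum that they started with, then you can see that scaling up from particles to a gigantic, complicated collection of gears won't change anything. Even if there are a trillion quadrillion atoms involved, 0 + 0 + ... + 0 = 0.

But Conservation of Energy, as such, cannot prohibit converting heat into work. You can, in fact, build a sealed box that converts ice cubes and stored electricity into warm water. It isn't even difficult. Energy cannot be created or destroyed: the net change in energy, from transforming (ice cubes + electricity) into (warm water), must be 0. So it wouldn't violate Conservation of Energy, as such, if you did it the other way around...

Perpetual motion machines of the second type, which convert warm water into electrical current and ice cubes, are prohibited by the Second Law of Thermodynamics.

The Second Law is a bit harder to understand, as it is essentially Bayesian in nature.

Yes, really.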

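The essential physical law underlying the Second Law of Thermodynamics is a theorem which can be proven within the standard model of physics: in the development over time of any closed system, phase space volume is conserved.

Let's say you're holding a ball high above the ground. We can describe this state of affairs as a point in a multidimensional space, at least one of whose dimensions is "height of ball above the ground". Then, when you drop the ball, it moves, and so does the dimensionless point in phase space that describes the entire system that includes you and the ball. "Phase space", in physics-speak, means that there are dimensions for the momentum of the particles, not just their position - i.e., a system of 2 particles would have 12 dimensions, 3 dimensions for each particle's position and 3 dimensions for each particle's momentum.

If you had a multidimensional space, each of whose dimensions described the position of a gear in a huge assemblage of gears, then as you turned the gears a single point would swoop and dart around in a very high-dimensional phase space. Which is to say, just as you can view a great big complex machine as a single point in a very high-dimensional space, so too you can view the laws of physics describing the behavior of this machine over time as describing the trajectory of its point through the phase space.

The Second Law of Thermodynamics is a consequence of a theorem which can be proven in the standard model of physics: if you take a volume of phase space, and develop it forward in time using standard physics, the total volume of the phase space is conserved.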

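For example:

Let there be two systems, X and Y, where X has 8 possible states, Y has 4 possible states, and the joint system (X,Y) has 32 possible states.

The development of the joint system over time can be described as a rule that maps initial points onto future points. For example, the system could start out in X~7~Y~2~, then develop (under some set of physical laws) into the state X~3~Y~3~ a minute later. Which is to say: if X started in 7, and Y started in 2, and we watched it for 1 minute, we would see X go to 3 and Y go to 3. Such are the laws of physics.

Next, let's carve out a subspace S of the joint system's state space. S will be the subspace bounded by X being in state 1 and Y being in states 1-4. So the total volume of S is 4 states.

And let's suppose that, under the laws of physics governing (X,Y), the states initially in S behave as follows:

X~1~Y~1~ -> X~2~Y~1~

X~1~Y~2~ -> X~4~Y~1~

X~1~Y~3~ -> X~6~Y~1~

X~1~Y~4~ -> X~8~Y~1~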

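That, in a nutshell, is how a refrigerator works.

The X subsystem began in a narrow region of state space - the single state 1, in fact - and Y began distributed over a wider region, states 1-4. By interacting with each other, Y went into a narrow region, and X ended up in a wide region; but the total phase space volume was conserved. 4 initial states mapped to 4 end states.

Clearly, so long as total phase space volume is conserved by physics over time, you can't squeeze Y harder than X expands, or vice versa - for every subsystem you squeeze into a narrower region of state space, some other subsystem has to expand into a wider region of state space.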
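
The refrigerator example can be spelled out as a small lookup table. This is only a sketch of the toy system described above: the dictionary `REFRIGERATOR` and the helper `volumes` are hypothetical names, and "volume" here simply means the number of distinct states.

```python
# Toy dynamics on S: each (x, y) start state maps to its state one minute later.
REFRIGERATOR = {
    (1, 1): (2, 1),
    (1, 2): (4, 1),
    (1, 3): (6, 1),
    (1, 4): (8, 1),
}


def volumes(states):
    """Return (joint volume, volume occupied by X, volume occupied by Y)."""
    states = set(states)
    xs = {x for x, _ in states}
    ys = {y for _, y in states}
    return len(states), len(xs), len(ys)


print(volumes(REFRIGERATOR.keys()))    # start: X narrow, Y wide
print(volumes(REFRIGERATOR.values()))  # end: X wide, Y narrow
```

The joint volume is 4 both before and after; only the split between X and Y changes, which is the sense in which Y is squeezed exactly as hard as X expands.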

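Now let's say that we're uncertain about the joint system (X,Y), and our uncertainty is described by an equiprobable distribution over S. That is, we're pretty sure X is in state 1, but Y is equally likely to be in any of states 1-4. If we shut our eyes for a minute and then open them again, we will expect to see Y in state 1, but X might be in any of states 2-8. Actually, X can only be in some of states 2-8, but it would be too costly to work out exactly which ones, so we'll just say 2-8.

If you consider the Shannon entropy of our uncertainty about X and Y as individual systems, X began with 0 bits of entropy because it had a single definite state, and Y began with 2 bits of entropy because it was equally likely to be in any of 4 possible states. (There's no mutual information between X and Y.) A bit of physics occurred, and lo, the entropy of Y went to 0, but the entropy of X went to log~2~(7) ≈ 2.8 bits. So entropy was transferred from one system to another, and decreased within the Y subsystem; but due to the cost of bookkeeping, we didn't bother to track some information, and hence (from our perspective) the overall entropy increased.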
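
The entropy figures here are just logarithms of state counts, since every distribution in the example is uniform. A minimal sketch, with `uniform_entropy_bits` as a hypothetical helper name, compares the coarse bookkeeping ("X is somewhere in 2-8") against exact tracking of the refrigerator map (X ends in 2, 4, 6 or 8):

```python
from math import log2


def uniform_entropy_bits(states):
    """Entropy in bits of a uniform distribution over the distinct states."""
    return log2(len(set(states)))


y_before = uniform_entropy_bits([1, 2, 3, 4])        # Y spread over 4 states
x_before = uniform_entropy_bits([1])                 # X in one definite state
x_after_exact = uniform_entropy_bits([2, 4, 6, 8])   # tracking the map exactly
x_after_coarse = uniform_entropy_bits(range(2, 9))   # lazily saying "2-8"

print(x_before + y_before)   # total before
print(x_after_exact)         # total after, exact bookkeeping (Y contributes 0)
print(x_after_coarse)        # total after, coarse bookkeeping
```

With exact tracking the total stays at 2 bits; the apparent rise to about 2.8 bits comes entirely from the information we chose not to keep.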

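If there were a physical process that mapped past states onto future states like this:

X2,Y1 -> X2,Y1

X2,Y2 -> X2,Y1

X2,Y3 -> X2,Y1

X2,Y4 -> X2,Y1

then you could have a physical process that would actually decrease entropy, because no matter where you started out, you would end up at the same place. The laws of physics, developing over time, would compress the phase space.

But there is a theorem, Liouville's Theorem, which can be proven true of our laws of physics, which says that this never happens: phase space is conserved.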
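
In a finite toy model, "phase space is conserved" amounts to the dynamics being one-to-one: distinct start states must reach distinct end states. The sketch below checks this for the forbidden compressing map and for the refrigerator map from earlier; the names `COMPRESSOR`, `REFRIGERATOR` and `conserves_volume` are illustrative, and this discrete check is only an analogy for Liouville's Theorem, not a statement of it.

```python
# The forbidden process: every start state collapses onto (2, 1).
COMPRESSOR = {
    (2, 1): (2, 1),
    (2, 2): (2, 1),
    (2, 3): (2, 1),
    (2, 4): (2, 1),
}

# The refrigerator process: Y is squeezed, but X expands to compensate.
REFRIGERATOR = {
    (1, 1): (2, 1),
    (1, 2): (4, 1),
    (1, 3): (6, 1),
    (1, 4): (8, 1),
}


def conserves_volume(mapping):
    """True if distinct start states always lead to distinct end states."""
    return len(set(mapping.values())) == len(mapping)


print(conserves_volume(COMPRESSOR))    # 4 states squeezed into 1
print(conserves_volume(REFRIGERATOR))  # 4 states mapped to 4 states
```

The compressing map fails the check because four starting states share a single destination, which is exactly the behavior the theorem rules out.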

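The Second Law of Thermodynamics is a corollary of Liouville's Theorem: no matter how clever your configuration of wheels and gears, you'll never be able to decrease entropy in one subsystem without increasing it somewhere else. When the phase space of one subsystem narrows, the phase space of another subsystem must widen, and the joint space keeps the same volume.

Except that what was initially a compact phase space may develop squiggles and wiggles and convolutions, so that to draw a simple boundary around the whole mess, you must draw a much larger boundary than before - this is what gives the appearance of entropy increasing. (And in quantum systems, where different universes go different ways, entropy actually does increase in any local universe. But omit this complication for now.)

The Second Law of Thermodynamics is actually probabilistic in nature - if you ask about the probability of hot water spontaneously entering the "cold water and electricity" state, the probability does exist, it's just very small. This doesn't mean Liouville's Theorem is violated with small probability; a theorem's a theorem, after all. It means that if you're in a great big phase space volume at the start, but you don't know where, you may assess a tiny little probability of ending up in some particular phase space volume. So far as you know, with infinitesimal probability, this particular glass of hot water may be the kind that spontaneously transforms itself into electrical current and ice cubes. (Neglecting, as usual, quantum effects.)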

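So the Second Law really is inherently Bayesian. When it comes to any real thermodynamic system, it's a strictly lawful statement about your beliefs about the system, but only a probabilistic statement about the system itself.

"Hold on," you say. "That's not what I learned in physics class," you say. "In the lectures I heard, thermodynamics is about, you know, temperatures. Uncertainty is a subjective state of mind! The temperature of a glass of water is an objective property of the water! What does heat have to do with probability?"

Oh ye of little trust.

In one direction, the connection between heat and probability is relatively straightforward: if the only fact you know about a glass of water is its temperature, then you are much more uncertain about a hot glass of water than a cold glass of water.

Heat is the zipping around of lots of tiny molecules; the hotter they are, the faster they can go. Not all the molecules in hot water are travelling at the same speed - the "temperature" isn't a uniform speed of all the molecules, it reflects the average kinetic energy of the molecules, which in turn corresponds to a predictable statistical distribution of speeds - anyway, the point is that the hotter the water, the faster the water molecules could be going, and hence the more uncertain you are about the velocity (not just speed) of any individual molecule. When you multiply together your uncertainties about all the individual molecules, you will be exponentially more uncertain about the whole glass of water.

We take the logarithm of this exponential volume of uncertainty, and call that the entropy. So it all works out, you see.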
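
To see why the logarithm is the natural choice, consider a deliberately crude toy model in which each molecule independently has the same number of equally plausible states. Both that independence and the specific numbers below are assumptions made for illustration, and `joint_entropy_bits` is a hypothetical helper.

```python
from math import isclose, log2


def joint_entropy_bits(states_per_molecule, n_molecules):
    """Entropy of n independent molecules, each uniform over the same states."""
    # Uncertainties multiply: the joint volume is exponential in n_molecules.
    volume = states_per_molecule ** n_molecules
    # Taking the logarithm turns that product into a sum.
    return log2(volume)


cool = joint_entropy_bits(10, 50)   # 10 plausible states per molecule
warm = joint_entropy_bits(20, 50)   # twice as many plausible states each

print(isclose(cool, 50 * log2(10)))  # entropy grows linearly in molecule count
print(warm - cool)                   # doubling each range adds one bit per molecule
```

The joint volume grows exponentially with the number of molecules, while its logarithm grows only linearly, so entropy adds across molecules instead of multiplying.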

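The connection in the other direction is less obvious. Suppose there was a glass of water, about which, initially, you knew only that its temperature was 72 degrees Fahrenheit. Then, suddenly, Saint Laplace reveals to you the exact locations and velocities of all the atoms in the water. You now know perfectly the state of the water, so, by the information-theoretic definition of entropy, its entropy is zero. Does that make its thermodynamic entropy zero? Is the water colder, because we know more about it?

Ignoring quantum effects for the moment, the answer is: Yes! Yes it is!

Maxwell once asked: why can't we take a uniformly hot gas, partition it into two volumes A and B, and let only fast-moving molecules pass from B to A, while only slow-moving molecules are allowed to pass from A to B? If you could build a gate like this, soon you would have hot gas on the A side, and cold gas on the B side. That would be a cheap way to refrigerate food, right?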

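The agent who inspects each gas molecule, and decides whether to let it through, is known as "Maxwell's Demon". And the reason you can't build an efficient refrigerator this way is that Maxwell's Demon generates entropy in the process of inspecting the gas molecules and deciding which ones to let through.

But suppose you already knew where all the gas molecules were?

Then you actually could run Maxwell's Demon and extract useful work.

So (again ignoring quantum effects for the moment), if you know the states of all the molecules in a glass of hot water, it is cold in a genuinely thermodynamic sense: you can take electricity out of it and leave behind an ice cube.

This doesn't violate Liouville's Theorem, because if Y is the water, and you are Maxwell's Demon (denoted M), the physical process behaves as:

M1,Y1 -> M1,Y1

M2,Y2 -> M2,Y1

M3,Y3 -> M3,Y1

M4,Y4 -> M4,Y1

Because Maxwell's Demon knows the exact state of Y, there is mutual information between M and Y. The mutual information decreases the joint entropy of (M,Y): H(M,Y) = H(M) + H(Y) - I(M;Y). M has 2 bits of entropy, Y has 2 bits of entropy, and their mutual information is 2 bits, so (M,Y) has a total of 2 + 2 - 2 = 2 bits of entropy. The physical process just transforms the "coldness" (negentropy) of the mutual information to make the actual water cold - afterward, M has 2 bits of entropy, Y has 0 bits of entropy, and the mutual information is 0. Nothing wrong with that!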

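And don't tell me that knowledge is "subjective". Knowledge has to be represented in a brain, and that makes it as physical as anything else. For M to physically represent an accurate picture of the state of Y, M's physical state must correlate with the state of Y. You can take thermodynamic advantage of that - it's called a Szilard engine.

Or as E.T. Jaynes put it, "The old adage 'knowledge is power' is a very cogent truth, both in human relations and in thermodynamics."

And conversely, one subsystem cannot increase in mutual information with another subsystem, without (a) interacting with it and (b) doing thermodynamic work.

Otherwise you could build a Maxwell's Demon and violate the Second Law of Thermodynamics - which in turn would violate Liouville's Theorem - which is prohibited in the standard model of physics.

Which is to say: **To form accurate beliefs about something, you really do have to observe it.** It's a very physical, very real process: any rational mind does "work" in the thermodynamic sense, not just the sense of mental effort.

(It is sometimes said that it is erasing bits in order to prepare for the next observation that takes the thermodynamic work - but that distinction is just a matter of words and perspective; the math is unambiguous.)

(Discovering logical "truths" is a complication which I will not, for now, consider - at least in part because I am still thinking through the exact formalism myself. In thermodynamics, knowledge of logical truths does not count as negentropy; as would be expected, since a reversible computer can compute logical truths at arbitrarily low cost. All this that I have said is true of the logically omniscient: any lesser mind will necessarily be less efficient.)

"Forming accurate beliefs requires a corresponding amount of evidence" is a very cogent truth both in human relations and in thermodynamics: if blind faith actually worked as a method of investigation, you could turn warm water into electricity and ice cubes. Just build a Maxwell's Demon that has blind faith in molecule velocities.

Engines of cognition are not so different from heat engines, though they manipulate entropy in a more subtle form than burning gasoline. For example, to the extent that an engine of cognition is not perfectly efficient, it must radiate waste heat, just like a car engine or refrigerator.

"Cold rationality" is true in a sense that Hollywood scriptwriters never dreamed (and false in the sense that they did dream).

So unless you can tell me which specific step in your argument violates the laws of physics by giving you true knowledge of the unseen, don't expect me to believe that a big, elaborate clever argument can do it either.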
