28 · Argument Screens Off Authority — LessWrong 双语精读

Concise Summary

简洁概述

Yudkowsky contrasts two kinds of evidence: personal authority and explicit argument. At first they seem symmetrical — both can raise or lower your credence. But a careful thought-experiment reveals an asymmetry: once you have heard the best possible argument with all its inferential steps, knowing the speaker's credentials adds almost nothing. The reverse is not true — knowing an expert believes something still leaves you curious about why. Using the formalism of causal graphs and D-separation from Judea Pearl's work, Yudkowsky shows that the Argument node screens off the Truth node from the Expert Belief node. Authority is a proxy for argument; argument is the thing itself. In practice some residual role for authority remains, but it is much weaker and easily overwhelmed by a good case.

Yudkowsky 对比了两种证据：个人权威与显式论据。乍看两者对称——都能提高或降低你的可信度。但一个思想实验揭示了不对称性：一旦你听到了最佳论据（包含所有推理步骤），了解说话者的资质几乎毫无额外价值。反过来则不然——知道某位专家相信某事，你仍会好奇他为何这样相信。借助朱迪亚·珀尔工作中的因果图与 D-分离形式，Yudkowsky 说明「论据」节点屏蔽了「真相」节点与「专家信念」节点之间的路径。权威是论据的代理；论据才是本体。实践中权威仍有少量残余作用，但远比论据弱，极易被一个好的论证所淹没。

Infographic

信息图

2 scenarios

Barry/Charles (authority) vs. David/Ernie (argument)

巴里/查尔斯（权威）对比大卫/欧尼（论据）

P(T|A,E)=P(T|A)

argument screens off expert belief from truth

论据屏蔽专家信念对真相的作用

~0 extra

information authority adds once argument is fully known

论据完全已知后，权威所能增加的额外信息

⚖️

The Apparent Symmetry

表面的对称

Strong authority and strong argument both raise probability; weak versions lower it. They look like parallel kinds of evidence.

强权威和强论据都能提高概率；弱版本则降低概率。两者看起来像并行的证据类型。

🔍

The Hidden Asymmetry

隐藏的不对称

When arguments are equally good, credentials barely matter. But when credentials are equal, argument quality still dominates — argument screens off authority, not the reverse.

当论据同样好时，资质几乎无关紧要。但当资质相当时，论据质量仍然主导——论据屏蔽权威，而非相反。

🕸️

Causal Graphs and D-Separation

因果图与D-分离

Truth causes Arguments; Arguments cause Expert Belief. If you know the Argument node, it D-separates Truth from Expert Belief — no information flows through.

真相产生论据；论据产生专家信念。若你知道论据节点的值，它就 D-分离了真相与专家信念——信息不再流通。

🌧️

The Sprinkler Analogy

洒水器类比

Night causes Sprinkler, Sprinkler causes Slippery. Once you know the Sprinkler is on, whether it is Night becomes irrelevant to predicting Slipperiness — screened off.

夜晚导致洒水器开启，洒水器导致地面湿滑。一旦知道洒水器开着，是否是夜晚对湿滑的预测就变得无关——被屏蔽了。

📚

Residual Role of Authority

权威的残余作用

Good authorities know counterevidence you may not. Judging inferential steps may need thirty years of experience you cannot replicate. Authority remains real but small.

好的权威知道你可能不知道的反驳证据；判断推理步骤的力度可能需要三十年无法复制的经验。权威依然真实，但微弱。

The argument, step by step

论证的推进链条

Scenario 1: authority alone — Barry (expert) vs. Charles (delinquent) giving bare assertions.

场景一：纯权威——巴里（专家）对比查尔斯（少年犯）只做裸断言。

Scenario 2: argument alone — David (good case) vs. Ernie (weak case), credentials unknown.

场景二：纯论据——大卫（好论证）对比欧尼（弱论证），资质未知。

Cross-test: equalize arguments for Barry/Charles, equalize credentials for David/Ernie — asymmetry emerges.

交叉检验：让巴里/查尔斯提供同等论证，让大卫/欧尼共享同等资质——不对称性浮现。

Causal structure: Truth -> Arguments -> Expert Belief; knowing Argument node D-separates Truth from Expert.

因果结构：真相→论据→专家信念；知道论据节点使真相与专家 D-分离。

Sprinkler analogy: Night -> Sprinkler -> Slippery; knowing Sprinkler screens off Night.

洒水器类比：夜晚→洒水器→湿滑；知道洒水器状态屏蔽了夜晚信息。

Conclusion: argument eliminates reliance on authority; residual authority is small and ceteris paribus only.

结论：论据消除了对权威的依赖；权威的残余作用微弱，且仅在条件相同时成立。

Detailed Summary

详细概述

The Opening Contrast

Yudkowsky opens with two parallel scenarios. In the first, Arthur hears a counterintuitive geological claim flatly asserted by Barry (a famous geologist) and assigns it 90% probability, then hears an equally counterintuitive assertion from Charles (a fourteen-year-old juvenile delinquent) and assigns it 10%. In the second, David makes a counterintuitive physics claim backed by a detailed, referenced argument, and Arthur assigns it 90%; Ernie makes an equally counterintuitive claim backed by an unconvincing argument, and Arthur assigns it 10%. The scenarios look symmetric: strong evidence yields 90%, weak evidence yields 10%.

The Asymmetry Revealed

The essay then runs a cross-test. Suppose Barry and Charles are asked to make full technical cases and present equally good ones, while David and Ernie turn out to have roughly the same credentials. Credentials now matter little in the Barry vs. Charles comparison, but argument quality still dominates the David vs. Ernie comparison. Knowing the full argument makes the speaker's credentials nearly irrelevant, whereas knowing only the credentials still leaves you wanting to hear the argument.

Yudkowsky's key sentence: "A good technical argument is one that eliminates reliance on the personal authority of the speaker."

Probability Theory and Screening Off

Far from contradicting Bayesian probability, this asymmetry falls out naturally from causal graph theory. The essay introduces screening off via the sprinkler example:

Night → Sprinkler → Slippery sidewalk

Once you know the Sprinkler is on, knowing it is Night adds nothing to your estimate of Slipperiness. Formally: P(Slippery | Night, Sprinkler) = P(Slippery | Sprinkler). Sprinkler screens off Night from Slippery. A dice example shows why the direction of the arrows matters: in Die 1 → Sum ← Die 2, the two dice are independent on their own, but once the Sum is known, learning one die tells you the other.

The causal diagram for argument and authority is analogous:

Truth → Arguments → Expert Belief

If something is true, it tends to have arguments supporting it; experts observe those arguments and update. If you already know the full argument, the Expert Belief node is D-separated from Truth — P(truth | argument, expert) = P(truth | argument). The D-separation criterion from Judea Pearl's Probabilistic Reasoning in Intelligent Systems formalizes this. No new kind of probability is needed; authority and argument are not ontologically different evidence, any more than sprinklers are made of different stuff than sunlight.
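
The equality can be checked by hand from the chain factorization. The following is a short derivation sketch, assuming the joint distribution factorizes exactly as the graph Truth → Arguments → Expert Belief says and that P(A, E) > 0. The abbreviations T, A, and E for the three nodes are introduced here for brevity and are not the essay's notation.

```latex
\begin{align*}
P(T, A, E) &= P(T)\,P(A \mid T)\,P(E \mid A) \\[4pt]
P(T \mid A, E)
  &= \frac{P(T)\,P(A \mid T)\,P(E \mid A)}
          {\sum_{t} P(t)\,P(A \mid t)\,P(E \mid A)} \\[4pt]
  &= \frac{P(T)\,P(A \mid T)}{\sum_{t} P(t)\,P(A \mid t)}
   = P(T \mid A)
\end{align*}
```

The factor P(E | A) does not depend on T, so it cancels between numerator and denominator. That cancellation is the whole content of "screening off" in this three-node chain.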

Limits: Authority Is Never Fully Eliminable

Yudkowsky is careful not to overstate. Several factors preserve a residual role for authority:

Good authorities are more likely to know about counterevidence you have not heard.
Judging the strength of inferential steps may require intuitions built over thirty years of experience you cannot replicate merely by reading.

So there is "an ineradicable legitimacy" to slightly preferring E. T. Jaynes's word over a newcomer's on Bayesian probability. But this slight edge is only ceteris paribus — easily overwhelmed by a good explicit argument. The essay closes personally: Yudkowsky found an erratum in one of Jaynes's books. Algebra trumps authority.

开场的对比

Yudkowsky 以两个平行场景开篇。第一个场景：亚瑟听到巴里（著名地质学家）做出反直觉的地质断言，给出 90% 的可信度；听到查尔斯（有案底的少年犯）做出同样反直觉的断言，给出 10%。第二个场景：亚瑟听到大卫提出反直觉的物理学主张并给出详细的有参考文献的论证，给出 90%；听到欧尼提出同样反直觉的主张却只给出缺乏说服力的论证，给出 10%。两个场景看似对称：强证据得 90%，弱证据得 10%。

不对称性的揭示

文章随后进行交叉检验。假设要求巴里与查尔斯各自提出完整的技术论证，且两人的论证质量相同；同时大卫与欧尼的资质大致相当。此时资质对巴里与查尔斯的比较几乎无关，但论据质量仍然主导大卫与欧尼的比较。知道完整论据，说话者的资质几乎无关紧要。但仅知道资质，你仍会想听论据。

Yudkowsky 的关键句：「好的技术论证能消除对说话者个人权威的依赖。」

概率论与屏蔽效应

这种不对称性并非对贝叶斯概率论的矛盾，而是从因果图理论中自然涌现的。文章通过洒水器例子引入屏蔽概念：

夜晚 → 洒水器 → 地面湿滑

一旦你知道洒水器开着，知道是否是夜晚对湿滑估计毫无帮助。形式上：P(湿滑 | 夜晚, 洒水器) = P(湿滑 | 洒水器)。夜晚被洒水器屏蔽。骰子例子也说明了这点：骰子一和骰子二独立，但在已知总和的条件下，知道一个骰子就能推出另一个。因果箭头的方向至关重要。

论据与权威的因果图与此类比：

真相 → 论据 → 专家信念

若某事为真，它倾向于有支持性论据；专家观察论据并更新信念。若你已知完整论据，专家信念节点被 D-分离于真相之外——P(真相 | 论据, 专家) = P(真相 | 论据)。来自朱迪亚·珀尔《智能系统中的概率推理》的 D-分离准则将此形式化。无需新类型的概率；权威与论据不是本体论不同的证据，正如洒水器不是由与阳光不同的材料构成。

局限：权威永远无法完全消除

Yudkowsky 谨慎地避免过度宣称。若干因素保留了权威的残余作用：

好的权威更可能知道你尚未听说的反驳证据。
判断推理步骤的力度可能需要三十年经验积累的直觉，而这无法仅靠阅读复制。

因此，在贝叶斯概率问题上，略微倾向相信 E. T. 贾因斯而非新手，存在「不可消除的正当性」。但这种微弱优势仅在条件相同时成立——极易被好的显式论证所淹没。文章以私人注脚收尾：Yudkowsky 在贾因斯的一本书里发现了错误。代数胜过权威。

FAQ

常见问答

What does it mean for argument to "screen off" authority?

论据「屏蔽」权威是什么意思？

Screening off is a precise probabilistic concept from causal graph theory. It means that once you know the value of an intermediate node (Argument), information about a downstream node (Expert Belief) no longer updates your probability of the hypothesis (Truth). Formally, P(Truth | Argument, Expert) = P(Truth | Argument). The expert's belief is rendered redundant — not worthless in general, but worthless given the argument.

屏蔽是因果图理论中一个精确的概率概念。它意味着：一旦你知道中间节点（论据）的值，关于下游节点（专家信念）的信息就不再更新你对假设（真相）的概率。形式上：P(真相 | 论据, 专家) = P(真相 | 论据)。专家的信念变得多余——不是普遍无价值，而是在已知论据的条件下无价值。

Doesn't this imply we should always ignore experts?

这是否意味着我们应该永远忽视专家？

No — and Yudkowsky explicitly says so. Authority retains a small but real role. Experts are more likely to know counterevidence you have not encountered, and their intuitions about inferential steps may encode decades of experience. The point is that authority is asymmetrically weaker than argument, not that it is useless. In the absence of a strong argument, deferring to good authority is entirely rational.

不——Yudkowsky 明确这样说。权威保留了小但真实的作用。专家更可能知道你尚未遇到的反驳证据，他们对推理步骤的直觉可能编码了数十年的经验。关键在于：权威不对称地弱于论据，而非权威毫无价值。在缺乏强论证的情况下，尊重好的权威完全是理性的。

How does the sprinkler example relate to argument and authority?

洒水器例子与论据和权威有何关联？

The sprinkler chain (Night → Sprinkler → Slippery) is a concrete illustration of D-separation in a causal graph. It shows that once the middle variable of a chain is known, the variables on either side of it carry no further information about each other. The argument-and-authority chain (Truth → Argument → Expert Belief) has the same structure, so the same mathematical rules apply: given the Argument, Expert Belief tells you nothing more about Truth. Yudkowsky uses the mundane case to make the abstract case tangible.

洒水器链（夜晚→洒水器→湿滑）是因果图中 D-分离的具体说明。它展示了：当中间节点屏蔽另一个节点时，知道中间变量就使上游变量变得无关。论据与权威的链（真相→论据→专家信念）具有相同的因果结构，因此适用相同的数学规则。Yudkowsky 用日常案例让抽象内容变得具体可感。

Is this compatible with ordinary Bayesian probability theory?

这与普通贝叶斯概率论兼容吗？

Yes — that is part of Yudkowsky's point. He anticipates a reader thinking argument and authority must require two different kinds of probability. But the asymmetry is just a consequence of the probabilistic dependence structure between variables, expressible in standard Bayesian terms via causal graphs. No new epistemology is needed.

是的——这正是 Yudkowsky 论点的一部分。他预料到读者会认为论据与权威必须需要两种不同类型的概率。但这种不对称性只是变量间概率依赖结构的结果，可通过因果图以标准贝叶斯术语表达。无需新的认识论。

What is D-separation, and do I need to fully understand it to follow the essay?

D-分离是什么？我需要完全理解它才能读懂这篇文章吗？

D-separation is a criterion for reading conditional independences off causal graphs, developed by Judea Pearl. Yudkowsky recommends Pearl's Probabilistic Reasoning in Intelligent Systems and Causality for the full details. You do not need to master it to follow the essay's main point — the sprinkler and dice examples convey the intuition. Full technical fluency lets you verify rather than trust the claim.

D-分离是由朱迪亚·珀尔开发的、用于从因果图读取条件独立性的准则。Yudkowsky 推荐珀尔的《智能系统中的概率推理》与《因果论》以获取完整细节。你不需要掌握它就能理解文章主要论点——洒水器和骰子例子传达了直觉。完全的技术掌握让你能够验证而非信任这一主张。

Why does Yudkowsky end with finding an erratum in Jaynes's book?

Yudkowsky 为何以发现贾因斯著作中的勘误作结？

It is a rhetorical proof of concept. Yudkowsky has just argued that authority should yield to good arguments, and he closes by enacting this claim: E. T. Jaynes is one of the most respected Bayesian theorists, yet Yudkowsky found a mathematical error in his book. "Algebra trumps authority" is the thesis demonstrated, not merely stated.

这是一个修辞上的概念验证。Yudkowsky 刚刚论证了权威应当向好的论证让步，他用自己的案例结束来实践这一主张：E. T. 贾因斯是最受尊敬的贝叶斯理论家之一，但 Yudkowsky 在他的书中发现了数学错误。「代数胜过权威」——这是被付诸实践的论点，而非仅仅被陈述。

In-depth Analysis · Pros & Cons

深入解读 · 优缺点

This essay is one of the most technically precise in the Sequences, using causal graph formalism to dissolve what might seem like a deep puzzle about epistemology. It converts a qualitative intuition ("arguments feel different from credentials") into a quantitative structural claim derivable from standard probability theory.

这篇文章是序列中技术最为精确的文章之一，它使用因果图形式主义消解了一个看似深刻的认识论谜题。它将一种定性直觉（「论据感觉与资质不同」）转化为一种可从标准概率论推导出的定量结构性主张。

✓ Strengths

亮点 / 优点

Genuine technical grounding
真正的技术基础
Unlike many essays that invoke Bayesianism loosely, this one presents an actual formal structure — causal graphs, D-separation, Pearl's criterion — and shows exactly where the asymmetry comes from.
与许多宽泛援引贝叶斯主义的文章不同，这篇文章提出了真实的形式结构——因果图、D-分离、珀尔准则——并精确说明了不对称性的来源。
Concreteness of analogies
类比的具体性
The sprinkler/night and dice-roll/sum examples are genuinely illuminating, not decorative. The sprinkler chain instantiates the same mathematical structure as the argument/authority case, and the dice example shows the contrasting common-effect structure, which together make the abstraction tangible.
洒水器/夜晚与骰子/总和的例子是真正有启发性的，而非装饰性的。洒水器链实例化了与论据/权威情形相同的数学结构，骰子例子则展示了与之相对的共同结果结构，二者一起使抽象内容变得可触及。
Honest about residual authority
对权威残余作用诚实
Yudkowsky does not claim argument fully eliminates authority. He carves out a genuine small niche for credentials — counterevidence access, experience-based intuition — preventing the essay from becoming a manifesto against expertise.
Yudkowsky 没有声称论据完全消除权威。他为资质划出了真实而小的利基——反驳证据的获取、经验直觉——防止了文章变成一份反专业知识的宣言。
Rhetorically self-demonstrating
修辞上自我验证
Closing with the Jaynes erratum enacts the thesis rather than just restating it — a rare and satisfying move in argumentative writing.
以贾因斯勘误作结，是在实践论点而非仅仅重申它——在论辩性写作中是罕见且令人满足的手法。

▲ Limits & Critiques

局限 / 批评

The causal model is stipulated, not derived
因果模型是被规定的，而非被推导的
The key insight depends entirely on positing Truth → Argument → Expert Belief as the right causal direction. In adversarial or motivated-reasoning contexts, experts sometimes generate post-hoc arguments for preferred conclusions — reversing the arrow. The essay does not confront this.
核心洞见完全取决于将「真相→论据→专家信念」设定为正确的因果方向。在对抗性或动机性推理的情境中，专家有时会为偏好的结论事后生成论据——箭头方向逆转。文章没有正视这一点。
Assumes the listener can fully evaluate the argument
假设听者能完全评估论据
Yudkowsky explicitly notes this caveat ("otherwise they're just impressive noises"), but the essay then proceeds as if the ideal case were the general case. In practice, most listeners cannot fully evaluate technical arguments, making the residual role of authority much larger than the essay implies.
Yudkowsky 明确注意到了这一警告（「否则只是令人印象深刻的噪音」），但文章随后像把理想情形当作一般情形那样继续。实践中，大多数听者无法完全评估技术论据，使得权威的残余作用远大于文章所暗示的。
D-separation is presented as obvious, but its application is subtle
D-分离被呈现为显而易见，但其应用是微妙的
Yudkowsky calls path-blocking "pretty obvious in this case" and refers readers to Pearl for the full criterion. In fact, D-separation rules are notoriously tricky — especially in graphs with colliders. The apparent ease may mislead readers into thinking they can apply the concept without the technical depth.
Yudkowsky 称路径阻断在这种情况下「相当明显」，并将读者引向珀尔。实际上，D-分离规则出了名地棘手——尤其在有对撞节点的图中。表面上的简易可能误导读者认为他们无需技术深度就能应用这一概念。
"Best argument" may not equal "complete argument"
「最佳论据」未必等于「完整论据」
The screening-off result requires the argument to encode all inferential steps and all evidence considered. But what an expert presents as their "best argument" may omit implicitly relied-upon evidence, tacit domain knowledge, or steps they cannot articulate, meaning Expert Belief is not truly screened off even when a seemingly full argument is presented.
屏蔽结果要求论据编码所有推理步骤和所有被考量的证据。但专家呈现为「最佳论据」的内容可能省略了隐性依赖的证据、默会领域知识或无法言明的步骤——这意味着即使呈现了「完整」论据，专家信念节点也并未被真正屏蔽。
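
The last critique can be made concrete with a small model. The sketch below is a hypothetical variant of the essay's graph in which the expert is also influenced by tacit evidence that never appears in the stated argument (Truth → Tacit → Expert Belief, alongside Truth → Argument → Expert Belief). All probabilities and names are illustrative assumptions, not figures from the essay.

```python
from fractions import Fraction
from itertools import product

# Every number below is a hypothetical illustration, not taken from the essay.
P_TRUE = Fraction(1, 2)
P_STRONG_ARG = {True: Fraction(4, 5), False: Fraction(1, 5)}  # P(argument | truth)
P_TACIT = {True: Fraction(7, 10), False: Fraction(1, 10)}     # P(tacit evidence | truth)

def p_expert_believes(argument, tacit):
    """The expert is swayed by tacit evidence as well as the stated argument."""
    if tacit:
        return Fraction(9, 10)
    return Fraction(1, 2) if argument else Fraction(1, 10)

def pick(p, flag):
    """Probability that a binary variable with P(True) = p equals flag."""
    return p if flag else 1 - p

def joint(truth, argument, tacit, expert):
    """Joint probability under the assumed four-node graph."""
    return (pick(P_TRUE, truth)
            * pick(P_STRONG_ARG[truth], argument)
            * pick(P_TACIT[truth], tacit)
            * pick(p_expert_believes(argument, tacit), expert))

def p_truth(**given):
    """P(Truth | given); given may fix argument and/or expert. Tacit is never observed."""
    num = den = Fraction(0)
    for truth, argument, tacit, expert in product([True, False], repeat=4):
        values = {"argument": argument, "expert": expert}
        if any(values[k] != v for k, v in given.items()):
            continue
        p = joint(truth, argument, tacit, expert)
        den += p
        if truth:
            num += p
    return num / den

print(p_truth(argument=True))               # argument alone
print(p_truth(argument=True, expert=True))  # argument plus the expert's belief
```

In this toy model the second number is larger than the first: because the stated argument does not capture everything the expert relied on, the expert's belief still carries information about Truth after the argument is known.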

Bottom line

总评

One of the Sequences' most rigorous essays, offering a clean formal account of why explicit reasoning dominates credentialism. The causal-graph framing is both correct and genuinely useful. The main limitation is that its ideal-case result — listener can fully evaluate, argument is truly complete — rarely holds in practice, leaving more room for authority than the essay lets on.

这是序列中最严谨的文章之一，提供了为何显式推理优于资历主义的清晰形式解释。因果图框架既正确又真正有用。主要局限在于：其理想情形结果——听者能完全评估、论据真正完整——在实践中鲜少成立，这为权威留下了比文章所承认的更多空间。

Original Text

原文

Scenario 1: Barry is a famous geologist. Charles is a fourteen-year-old juvenile delinquent with a long arrest record and occasional psychotic episodes. Barry flatly asserts to Arthur some counterintuitive statement about rocks, and Arthur judges it 90% probable. Then Charles makes an equally counterintuitive flat assertion about rocks, and Arthur judges it 10% probable. Clearly, Arthur is taking the speaker’s authority into account in deciding whether to believe the speaker’s assertions.

Scenario 2: David makes a counterintuitive statement about physics and gives Arthur a detailed explanation of the arguments, including references. Ernie makes an equally counterintuitive statement, but gives an unconvincing argument involving several leaps of faith. Both David and Ernie assert that this is the best explanation they can possibly give (to anyone, not just Arthur). Arthur assigns 90% probability to David’s statement after hearing his explanation, but assigns a 10% probability to Ernie’s statement.

It might seem like these two scenarios are roughly symmetrical: both involve taking into account useful evidence, whether strong versus weak authority, or strong versus weak argument.

But now suppose that Arthur asks Barry and Charles to make full technical cases, with references; and that Barry and Charles present equally good cases, and Arthur looks up the references and they check out. Then Arthur asks David and Ernie for their credentials, and it turns out that David and Ernie have roughly the same credentials—maybe they’re both clowns, maybe they’re both physicists.

Assuming that Arthur is knowledgeable enough to understand all the technical arguments—otherwise they’re just impressive noises—it seems that Arthur should view David as having a great advantage in plausibility over Ernie, while Barry has at best a minor advantage over Charles.

Indeed, if the technical arguments are good enough, Barry’s advantage over Charles may not be worth tracking. A good technical argument is one that eliminates reliance on the personal authority of the speaker.

Similarly, if we really believe Ernie that the argument he gave is the best argument he could give, which includes all of the inferential steps that Ernie executed, and all of the support that Ernie took into account—citing any authorities that Ernie may have listened to himself—then we can pretty much ignore any information about Ernie’s credentials. Ernie can be a physicist or a clown, it shouldn’t matter. (Again, this assumes we have enough technical ability to process the argument. Otherwise, Ernie is simply uttering mystical syllables, and whether we “believe” these syllables depends a great deal on his authority.)

So it seems there’s an asymmetry between argument and authority. If we know authority we are still interested in hearing the arguments; but if we know the arguments fully, we have very little left to learn from authority.

Clearly (says the novice) authority and argument are fundamentally different kinds of evidence, a difference unaccountable in the boringly clean methods of Bayesian probability theory.^1^ For while the strength of the evidences—90% versus 10%—is just the same in both cases, they do not behave similarly when combined. How will we account for this?

Here’s half a technical demonstration of how to represent this difference in probability theory. (The rest you can take on my personal authority, or look up in the references.)

If P(H|E1) = 90% and P(H|E2) = 9%, what is the probability P(H|E1,E2)? If learning E1 is true leads us to assign 90% probability to H, and learning E2 is true leads us to assign 9% probability to H, then what probability should we assign to H if we learn both E1 and E2? This is simply not something you can calculate in probability theory from the information given. No, the missing information is not the prior probability of H. The events E1 and E2 may not be independent of each other.
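
To see why the two single-evidence probabilities do not pin down the combined one, here is a minimal sketch with two hypothetical count tables. Both give P(H|E1) = 90% and P(H|E2) = 9%, yet they disagree about P(H|E1,E2). World A mirrors the sprinkler story that follows (its daytime count is an assumption); World B is entirely made up for contrast, as are the names `WORLD_A`, `WORLD_B`, and `prob_h`.

```python
from fractions import Fraction

# Outcomes are tuples (h, e1, e2) of 0/1 flags; values are counts of cases.
# World A: E1 = sprinkler on, E2 = night. The daytime count of 100 is assumed.
WORLD_A = {
    (1, 1, 1): 9,    # night, sprinkler on, slippery
    (0, 1, 1): 1,    # night, sprinkler on, not slippery
    (0, 0, 1): 90,   # night, sprinkler off, not slippery
    (0, 0, 0): 100,  # day, sprinkler off, not slippery
}

# World B: a made-up table with the same two single-evidence conditionals.
WORLD_B = {
    (1, 1, 1): 1,
    (0, 1, 1): 1,
    (1, 1, 0): 8,
    (1, 0, 1): 8,
    (0, 0, 1): 90,
}

def prob_h(counts, **given):
    """Return P(H=1 | given), where given may fix e1 and/or e2."""
    index = {"e1": 1, "e2": 2}

    def consistent(outcome):
        return all(outcome[index[name]] == value for name, value in given.items())

    total = sum(n for outcome, n in counts.items() if consistent(outcome))
    hits = sum(n for outcome, n in counts.items()
               if consistent(outcome) and outcome[0] == 1)
    return Fraction(hits, total)

for name, world in (("A", WORLD_A), ("B", WORLD_B)):
    print(name,
          prob_h(world, e1=1),         # P(H | E1)
          prob_h(world, e2=1),         # P(H | E2)
          prob_h(world, e1=1, e2=1))   # P(H | E1, E2)
```

The two worlds agree on the first two columns and differ on the third, so the answer depends on how E1 and E2 relate to each other, not only on each one's separate bearing on H.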

Suppose that H is “My sidewalk is slippery,” E1 is “My sprinkler is running,” and E2 is “It’s night.” The sidewalk is slippery starting from one minute after the sprinkler starts, until just after the sprinkler finishes, and the sprinkler runs for ten minutes. So if we know the sprinkler is on, the probability is 90% that the sidewalk is slippery. The sprinkler is on during 10% of the nighttime, so if we know that it’s night, the probability of the sidewalk being slippery is 9%. If we know that it’s night and the sprinkler is on—that is, if we know both facts—the probability of the sidewalk being slippery is 90%.

We can represent this in a graphical model as follows:

Night → Sprinkler → Slippery

Whether or not it’s Night causes the Sprinkler to be on or off, and whether the Sprinkler is on causes the sidewalk to be Slippery or unSlippery.

The direction of the arrows is meaningful. Say we had:

Night → Sprinkler ← Slippery

This would mean that, if I didn’t know anything about the sprinkler, the probability of Nighttime and Slipperiness would be independent of each other. For example, suppose that I roll Die One and Die Two, and add up the showing numbers to get the Sum:

Die 1 → Sum ← Die 2

If you don’t tell me the sum of the two numbers, and you tell me the first die showed 6, this doesn’t tell me anything about the result of the second die, yet. But if you now also tell me the sum is 7, I know the second die showed 1.

Figuring out when various pieces of information are dependent or independent of each other, given various background knowledge, actually turns into a quite technical topic. The books to read are Judea Pearl’s Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference and Causality: Models, Reasoning, and Inference. (If you only have time to read one book, read the first one.)

If you know how to read causal graphs, then you look at the dice-roll graph and immediately see:

P(Die 1,Die 2) = P(Die 1) ✕ P(Die 2)

P(Die 1,Die 2|Sum) ≠ P(Die 1|Sum) ✕ P(Die 2|Sum) .
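
Both dice facts can be confirmed by brute-force counting. The sketch below assumes two fair six-sided dice and checks one representative case (Die 1 = 6, Die 2 = 1, Sum = 7); the helper name `prob` is illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of two fair six-sided dice (an assumption).
OUTCOMES = list(product(range(1, 7), repeat=2))

def prob(event, given=lambda o: True):
    """P(event | given), computed by counting equally likely outcomes."""
    pool = [o for o in OUTCOMES if given(o)]
    return Fraction(sum(1 for o in pool if event(o)), len(pool))

def die1_is_6(o):
    return o[0] == 6

def die2_is_1(o):
    return o[1] == 1

def both(o):
    return die1_is_6(o) and die2_is_1(o)

def sum_is_7(o):
    return o[0] + o[1] == 7

# Unconditionally, the joint probability equals the product of the marginals.
print(prob(both), prob(die1_is_6) * prob(die2_is_1))

# Given the Sum, the joint no longer factorizes: the dice have become dependent.
print(prob(both, sum_is_7), prob(die1_is_6, sum_is_7) * prob(die2_is_1, sum_is_7))

# Knowing Die 1 and the Sum determines Die 2.
print(prob(die2_is_1, lambda o: die1_is_6(o) and sum_is_7(o)))
```

The first line prints two equal fractions and the second prints two different ones, matching the equality and inequality above.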

If you look at the correct sidewalk diagram, you see facts like:

P(Slippery|Night) ≠ P(Slippery)

P(Slippery|Sprinkler) ≠ P(Slippery)

P(Slippery|Night,Sprinkler) = P(Slippery|Sprinkler) .

That is, the probability of the sidewalk being Slippery, given knowledge about the Sprinkler and the Night, is the same probability we would assign if we knew only about the Sprinkler. Knowledge of the Sprinkler has made knowledge of the Night irrelevant to inferences about Slipperiness.
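
The three sidewalk facts can be reproduced by building the joint distribution from the chain Night → Sprinkler → Slippery and summing. This is a sketch under stated assumptions: the 10% and 90% figures come from the essay, while the probability of night, the daytime sprinkler rate, and a sidewalk that is never slippery without the sprinkler are illustrative choices.

```python
from fractions import Fraction
from itertools import product

P_NIGHT = Fraction(1, 2)                        # assumed
P_SPRINKLER_ON = {True: Fraction(1, 10),        # essay: on during 10% of the night
                  False: Fraction(1, 50)}       # assumed daytime rate
P_SLIPPERY = {True: Fraction(9, 10),            # essay: 90% when the sprinkler is on
              False: Fraction(0)}               # assumed: dry unless the sprinkler runs

def pick(p, flag):
    """Probability that a binary variable with P(True) = p equals flag."""
    return p if flag else 1 - p

def joint(night, sprinkler, slippery):
    """Joint probability under the chain Night -> Sprinkler -> Slippery."""
    return (pick(P_NIGHT, night)
            * pick(P_SPRINKLER_ON[night], sprinkler)
            * pick(P_SLIPPERY[sprinkler], slippery))

def p_slippery(**given):
    """P(Slippery | given), where given may fix night and/or sprinkler."""
    num = den = Fraction(0)
    for night, sprinkler, slippery in product([True, False], repeat=3):
        values = {"night": night, "sprinkler": sprinkler}
        if any(values[k] != v for k, v in given.items()):
            continue
        p = joint(night, sprinkler, slippery)
        den += p
        if slippery:
            num += p
    return num / den

print(p_slippery())                              # P(Slippery)
print(p_slippery(night=True))                    # P(Slippery | Night)
print(p_slippery(sprinkler=True))                # P(Slippery | Sprinkler)
print(p_slippery(night=True, sprinkler=True))    # P(Slippery | Night, Sprinkler)
print(p_slippery(night=False, sprinkler=True))   # same value whether or not it is night
```

The last three lines print the same value: once the Sprinkler is known, flipping Night changes nothing.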

This is known as screening off, and the criterion that lets us read such conditional independences off causal graphs is known as D-separation.

For the case of argument and authority, the causal diagram looks like this:

Truth → Argument → Expert Belief

If something is true, then it therefore tends to have arguments in favor of it, and the experts therefore observe these evidences and change their opinions. (In theory!)

If we see that an expert believes something, we infer back to the existence of evidence-in-the-abstract (even though we don’t know what that evidence is exactly), and from the existence of this abstract evidence, we infer back to the truth of the proposition.

But if we know the value of the Argument node, this D-separates the node “Truth” from the node “Expert Belief” by blocking all paths between them, according to certain technical criteria for “path blocking” that seem pretty obvious in this case. So even without checking the exact probability distribution, we can read off from the graph that:

P(truth|argument,expert) = P(truth|argument) .
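
The same check works for the argument graph. The sketch below treats Truth, Argument, and Expert Belief as binary variables in the chain Truth → Argument → Expert Belief; every probability in it is a hypothetical placeholder chosen for illustration, since the essay gives no numbers for this graph.

```python
from fractions import Fraction
from itertools import product

# Hypothetical numbers; the essay specifies only the graph structure.
P_TRUE = Fraction(3, 10)
P_STRONG_ARG = {True: Fraction(4, 5), False: Fraction(1, 5)}        # P(strong argument | truth)
P_EXPERT_AGREES = {True: Fraction(9, 10), False: Fraction(3, 20)}   # P(expert believes | argument)

def pick(p, flag):
    """Probability that a binary variable with P(True) = p equals flag."""
    return p if flag else 1 - p

def joint(truth, argument, expert):
    """Joint probability under the chain Truth -> Argument -> Expert Belief."""
    return (pick(P_TRUE, truth)
            * pick(P_STRONG_ARG[truth], argument)
            * pick(P_EXPERT_AGREES[argument], expert))

def p_truth(**given):
    """P(Truth | given), where given may fix argument and/or expert."""
    num = den = Fraction(0)
    for truth, argument, expert in product([True, False], repeat=3):
        values = {"argument": argument, "expert": expert}
        if any(values[k] != v for k, v in given.items()):
            continue
        p = joint(truth, argument, expert)
        den += p
        if truth:
            num += p
    return num / den

print(p_truth())                               # prior
print(p_truth(expert=True))                    # authority alone is real evidence
print(p_truth(argument=True))                  # argument alone
print(p_truth(argument=True, expert=True))     # expert agrees: no change
print(p_truth(argument=True, expert=False))    # expert disagrees: still no change
```

With no argument in hand, learning that the expert believes the claim moves the probability above the prior. Once the argument is known, the expert's belief or disbelief leaves the probability where the argument put it.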

This does not represent a contradiction of ordinary probability theory. It’s just a more compact way of expressing certain probabilistic facts. You could read the same equalities and inequalities off an unadorned probability distribution—but it would be harder to see it by eyeballing. Authority and argument don’t need two different kinds of probability, any more than sprinklers are made out of ontologically different stuff than sunlight.

In practice you can never completely eliminate reliance on authority. Good authorities are more likely to know about any counterevidence that exists and should be taken into account; a lesser authority is less likely to know this, which makes their arguments less reliable. This is not a factor you can eliminate merely by hearing the evidence they did take into account.

It’s also very hard to reduce arguments to pure math; and otherwise, judging the strength of an inferential step may rely on intuitions you can’t duplicate without the same thirty years of experience.

There is an ineradicable legitimacy to assigning slightly higher probability to what E. T. Jaynes tells you about Bayesian probability, than you assign to Eliezer Yudkowsky making the exact same statement. Fifty additional years of experience should not count for literally zero influence.

But this slight strength of authority is only ceteris paribus, and can easily be overwhelmed by stronger arguments. I have a minor erratum in one of Jaynes’s books—because algebra trumps authority.

^1^ See “What Is Evidence?” in Map and Territory.

场景一：巴里是著名的地质学家。查尔斯是一个有长期逮捕记录和偶发精神病发作的十四岁少年犯。巴里向亚瑟断然断言了一些关于岩石的反直觉陈述，亚瑟判断其为 90% 可能正确。然后查尔斯做出了同样反直觉的关于岩石的断言，亚瑟判断其为 10% 可能正确。很明显，亚瑟在决定是否相信说话者的断言时，考虑了说话者的权威。

场景二：大卫对物理学做出了一个反直觉的陈述，并给了亚瑟一个详细的论证解释，包括参考文献。欧尼做出了同样反直觉的陈述，但给出了一个包含若干信仰跨越的不令人信服的论证。大卫和欧尼都声称这是他们能给出的最佳解释（对任何人，不仅仅是亚瑟）。亚瑟在听了大卫的解释后，给大卫的陈述赋予 90% 的概率，但给欧尼的陈述赋予 10% 的概率。

这两个场景看起来大致对称：两者都涉及考虑有用的证据，无论是强与弱的权威，还是强与弱的论证。

但现在假设亚瑟要求巴里和查尔斯提出完整的技术案例，附带参考文献；巴里和查尔斯呈现了同等好的案例，亚瑟查阅了参考文献，均属实。然后亚瑟询问大卫和欧尼的资质，结果发现大卫和欧尼的资质大致相同——也许两人都是小丑，也许两人都是物理学家。

假设亚瑟足够有知识来理解所有技术论证——否则这些论证只是令人印象深刻的噪音——亚瑟应该认为大卫比欧尼有很大的可信度优势，而巴里相对于查尔斯最多只有微小的优势。

事实上，如果技术论证足够好，巴里相对于查尔斯的优势可能不值得追踪。好的技术论证是那种消除对说话者个人权威依赖的论证。

同样，如果我们真的相信欧尼说的——他给出的论证是他能够给出的最佳论证，包括欧尼执行的所有推理步骤，以及欧尼所考虑的所有支持——引用欧尼自己可能听说过的任何权威——那么我们几乎可以忽略任何关于欧尼资质的信息。欧尼可以是物理学家或小丑，这都无所谓。（同样，这假设我们有足够的技术能力来处理这个论证。否则，欧尼只是在发出神秘的音节，而我们是否「相信」这些音节在很大程度上取决于他的权威。）

所以论证和权威之间似乎存在不对称性。如果我们知道权威，我们仍然有兴趣听论证；但如果我们完全了解论证，我们从权威那里几乎没有什么可学的了。

显然（新手说）权威和论证是从根本上不同的两种证据，这种差异无法用贝叶斯概率论那种无聊简洁的方法来解释。^1^ 因为虽然两种证据的强度——90% 对 10%——是完全相同的，但它们在组合时表现不同。我们如何解释这一点？

以下是如何在概率论中表示这种差异的一半技术演示。（其余部分你可以依赖我个人的权威，或者查阅参考文献。）

如果 P(H|E1) = 90% 且 P(H|E2) = 9%，那么概率 P(H|E1,E2) 是多少？如果得知 E1 为真使我们将 H 的概率赋为 90%，得知 E2 为真使我们将 H 的概率赋为 9%，那么如果我们同时得知 E1 和 E2，我们应该将 H 的概率赋为多少？这不是你能从给定信息在概率论中计算出来的事情。不，缺少的信息不是 H 的先验概率。事件 E1 和 E2 可能不是相互独立的。

假设 H 是「我的人行道是滑的」，E1 是「我的洒水器正在运行」，E2 是「现在是夜晚」。从洒水器启动一分钟后到洒水器结束后不久，人行道是滑的，洒水器运行十分钟。因此，如果我们知道洒水器在运行，人行道滑的概率是 90%。洒水器在 10% 的夜间时间里运行，所以如果我们知道是夜晚，人行道滑的概率是 9%。如果我们知道现在是夜晚且洒水器在运行——也就是说，如果我们同时知道这两个事实——人行道滑的概率是 90%。

我们可以用图形模型如下表示：

夜晚 → 洒水器 → 湿滑

是否是夜晚会导致洒水器开启或关闭，而洒水器是否开启会导致人行道滑或不滑。

箭头的方向是有意义的。假设我们有：

夜晚 → 洒水器 ← 湿滑

这将意味着，如果我不知道洒水器的任何信息，夜晚和湿滑的概率将彼此独立。例如，假设我掷第一个骰子和第二个骰子，并将显示的数字相加得到总和：

骰子1 → 总和 ← 骰子2

如果你不告诉我两个数字的总和，而你告诉我第一个骰子显示 6，这还不能告诉我第二个骰子的结果。但如果你现在还告诉我总和是 7，我就知道第二个骰子显示 1。

弄清楚在各种背景知识条件下，各种信息片段彼此是依赖还是独立的，实际上变成了一个相当技术性的话题。要读的书是朱迪亚·珀尔的《智能系统中的概率推理：似然推断网络》和《因果论：模型、推理与推断》。（如果你只有时间读一本书，读第一本。）

如果你知道如何阅读因果图，那么你看着骰子滚动图，立即看到：

P(骰子1, 骰子2) = P(骰子1) × P(骰子2)

P(骰子1, 骰子2|总和) ≠ P(骰子1|总和) × P(骰子2|总和)。

如果你看正确的人行道图，你会看到这样的事实：

P(湿滑|夜晚) ≠ P(湿滑)

P(湿滑|洒水器) ≠ P(湿滑)

P(湿滑|夜晚, 洒水器) = P(湿滑|洒水器)。

也就是说，在已知洒水器和夜晚信息的情况下，人行道湿滑的概率，与我们只知道洒水器信息时赋予的概率相同。对洒水器的了解使得对夜晚的了解与推断湿滑无关。

这被称为屏蔽，允许我们从因果图中读取此类条件独立性的标准被称为 D-分离。

对于论证和权威的情况，因果图如下所示：

真相 → 论证 → 专家信念

如果某事是真的，那么它因此倾向于有支持它的论证，专家因此观察这些证据并改变他们的意见。（理论上！）

如果我们看到一位专家相信某事，我们推断出抽象意义上的证据的存在（即使我们不知道那个证据究竟是什么），并从这个抽象证据的存在，推断回命题的真实性。

但如果我们知道「论证」节点的值，这就根据某些技术性的「路径阻断」标准（在这种情况下看起来相当明显），D-分离了「真相」节点与「专家信念」节点，阻断了它们之间的所有路径。因此，即使不检查精确的概率分布，我们也可以从图中读出：

P(真相|论证, 专家) = P(真相|论证)。

这并不代表普通概率论的矛盾。它只是表达某些概率事实的更紧凑方式。你可以从未经修饰的概率分布中读取相同的等式和不等式——但通过直觉观察会更难看出。权威和论证不需要两种不同类型的概率，就像洒水器不是由与阳光本体论不同的材料构成一样。

在实践中，你永远无法完全消除对权威的依赖。好的权威更可能了解任何存在的、应当被考虑的反驳证据；较差的权威不太可能了解这些，这使得他们的论证不那么可靠。这不是仅仅通过听取他们确实考虑过的证据就能消除的因素。

将论证简化为纯粹的数学也非常困难；否则，判断推理步骤的强度可能依赖于没有同样三十年经验就无法复制的直觉。

在贝叶斯概率方面，给 E. T. 贾因斯告诉你的事情赋予略高的概率，而不是给 Eliezer Yudkowsky 做出完全相同陈述时赋予，存在不可消除的正当性。五十年的额外经验不应该被字面意义上地计为零影响。

但这种微弱的权威优势仅在其他条件相同时成立，很容易被更强的论证所压倒。我在贾因斯的一本书里发现了一个小勘误——因为代数胜过权威。

^1^参见《地图与疆域》中的「什么是证据？」。
