云厂商核心账本架构解析
Internal Settlement Architecture of Cloud Service Providers: Engineering Constraints and Theoretical Realities
云服务商内部结算架构:工程约束与理论现实
I. The Conclusion First (Setting the Anchor)
一、直接结论(先给你定锚)
EN:
This is an industry fact that is public yet rarely explained in full. I state the conclusion first, then explain it layer by layer from engineering and theoretical constraints rather than from the narratives cloud vendors present externally.
CN:
这是一个业内“公开但很少被完整讲清楚”的事实。我直接给出结论,然后从工程与理论约束逐层解释,而不是云厂商的对外叙述。
EN:
Cloud vendors' own "money ledger / core settlement / final resource-billing ledger" is not, and cannot be, carried directly by a "purely distributed database."
CN:
云厂商自己的“钱账 / 核心结算 / 资源计费最终账本”,并不是、也不能是一个“纯分布式数据库”在直接承担。
EN:
This remains true even though these vendors:
Actively sell distributed database products to their customers;
Encourage customers to design systems that rely on eventual consistency and partition tolerance;
Emphasize scale and elasticity in their published papers.
CN:
即使它们:
向客户出售分布式数据库
鼓励客户用最终一致、分区容忍
在论文里强调规模与弹性
EN:
However, every internal system where "not a single cent may be wrong" always contains a component with:
Strong Consistency: Ensuring all nodes see the same data at the same time.
Linearizability: Guaranteeing that operations appear to occur instantaneously at some point between their invocation and response.
A Definite Total-Order Commit Point: One authoritative place where every write receives its position in a single global sequence.
Clear Boundaries of Responsibility: Unambiguous definition of liability.
CN:
但它们内部真正“不能错 1 分钱”的系统,一定存在一个:
强一致
可线性化
明确的全序提交点
明确的责任边界
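The property list above can be made concrete with a minimal sketch, assuming a single in-process writer stands in for the "definite master": every accepted write receives the next sequence number under one lock, so there is exactly one total order and one commit point. The class and method names (`SingleWriterLedger`, `commit`) are hypothetical and not taken from any vendor's system.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    seq: int            # position in the single, global order
    account: str
    amount_cents: int


class SingleWriterLedger:
    """Hypothetical single-master ledger: one lock, one sequence, one order."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._entries: list[Entry] = []

    def commit(self, account: str, amount_cents: int) -> Entry:
        # The lock is the commit point: the sequence number is assigned and the
        # entry appended atomically, so no two writes share a position.
        with self._lock:
            entry = Entry(len(self._entries) + 1, account, amount_cents)
            self._entries.append(entry)
            return entry

    def entries(self) -> list[Entry]:
        with self._lock:
            return list(self._entries)


ledger = SingleWriterLedger()
threads = [
    threading.Thread(target=ledger.commit, args=(f"acct-{i % 3}", 100))
    for i in range(50)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

However the 50 threads race, the sequence numbers come out as 1..50 with no gaps or duplicates. Each `commit` takes effect at the moment it holds the lock, which is what makes it linearizable; that lock is exactly the single authoritative commit point a multi-active design gives up.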
II. Why "Pure Distributed Databases" Are Unsuitable for Internal Settlement
二、为什么“纯分布式数据库”不适合内部结算
EN:
In this context, the term "purely distributed" refers to systems exhibiting the following characteristics:
Multi-Active / Active-Active Architectures: Allowing writes to be accepted at multiple locations simultaneously.
Eventual Consistency / Configurable Consistency: Where data may be temporarily inconsistent across nodes.
Continuous Write Availability During Network Partitions: The system keeps accepting writes even when nodes cannot communicate with each other.
Absence of a Single Authoritative Commit Point: Lacking a central authority to determine the final state.
CN:
这里的“纯分布式”指的是:
多活
最终一致 / 可配置一致
网络分区下继续写
没有单一权威提交点
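To see why the absence of a single commit point matters for money, here is a toy simulation, assuming two active-active replicas that each check the balance locally and accept writes during a partition, then converge by replaying each other's debits once it heals (a simple operation-based merge). All names are hypothetical; real multi-active databases use more elaborate conflict resolution, but any scheme that lets both sides accept writes without coordinating faces the same problem.

```python
class Replica:
    """Hypothetical active-active replica holding a local copy of one balance."""

    def __init__(self, name: str, balance_cents: int) -> None:
        self.name = name
        self.balance_cents = balance_cents
        self.local_ops: list[int] = []  # debits this replica accepted itself

    def debit(self, amount_cents: int) -> bool:
        # The overdraft check only sees this replica's view of the balance.
        if self.balance_cents >= amount_cents:
            self.balance_cents -= amount_cents
            self.local_ops.append(amount_cents)
            return True
        return False

    def apply_remote(self, ops: list[int]) -> None:
        # After the partition heals, replay the other side's accepted debits.
        for amount in ops:
            self.balance_cents -= amount


a = Replica("region-a", 100_00)
b = Replica("region-b", 100_00)

# During the partition each side independently approves an 80.00 debit.
ok_a = a.debit(80_00)
ok_b = b.debit(80_00)

# The partition heals: exchange operations and converge.
ops_a, ops_b = list(a.local_ops), list(b.local_ops)
a.apply_remote(ops_b)
b.apply_remote(ops_a)
```

Both replicas converge, so eventual consistency holds, yet they converge to -60.00 on an account that must never go negative. Convergence is not correctness: preventing this requires something to decide, before either debit is accepted, which one wins.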
1. The Constraints of Settlement Systems Differ Fundamentally from User Business Logic
1️⃣ 结算系统的约束与用户业务完全不同
EN:
Internal settlement systems are governed by four non-negotiable hard constraints:
CN:
内部结算具备四个不可妥协的硬约束:
EN:
👉 These four constraints are theoretically in conflict with the principles of "high-availability distributed writes."
CN:
👉 这四点与“高可用分布式写入”在理论上是冲突的。
2. The CAP Theorem Is Not a Philosophical Question but a Legal One
2️⃣ CAP 不是哲学问题,而是法律问题
EN:
In typical user business scenarios:
"Transient inconsistency" is generally acceptable.
"Correction at a later time" is generally acceptable.
CN:
在用户业务里:
“短暂不一致”可以接受
“稍后修正”可以接受
EN:
However, within the internal operations of cloud vendors:
The ledger is not merely a representation of business state; it constitutes a legal fact.
The following anomalies are strictly prohibited:
Double Submission: Processing the same transaction twice.
Double Rollback: Reverting the same transaction more than once.
Ambiguous Attribution: Uncertainty regarding ownership or liability.
CN:
在云厂商内部:
账本不是业务状态,是法律事实
不允许:
双重提交
双重回滚
模糊归属
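One common way to rule out double submission and double rollback is to key every posting on a caller-supplied, globally unique transaction ID and have the ledger refuse to apply the same ID twice. The sketch below assumes a single-threaded, in-memory store; `IdempotentLedger`, `post`, and `reverse` are hypothetical names, and a production ledger would persist the set of seen IDs in the same transaction as the balance change.

```python
class IdempotentLedger:
    """Hypothetical single-threaded ledger that rejects duplicate postings and reversals."""

    def __init__(self) -> None:
        self.balances: dict[str, int] = {}
        self.posted: dict[str, tuple[str, int]] = {}  # txn_id -> (account, cents)
        self.reversed: set[str] = set()

    def post(self, txn_id: str, account: str, amount_cents: int) -> bool:
        if txn_id in self.posted:
            return False  # double submission: ignored, the first posting stands
        self.posted[txn_id] = (account, amount_cents)
        self.balances[account] = self.balances.get(account, 0) + amount_cents
        return True

    def reverse(self, txn_id: str) -> bool:
        if txn_id not in self.posted or txn_id in self.reversed:
            return False  # unknown transaction or double rollback: refused
        account, amount_cents = self.posted[txn_id]
        self.balances[account] -= amount_cents
        self.reversed.add(txn_id)
        return True


ledger = IdempotentLedger()
first = ledger.post("txn-001", "acct-42", 1_500)
retry = ledger.post("txn-001", "acct-42", 1_500)  # e.g. a client retry
undo = ledger.reverse("txn-001")
undo_again = ledger.reverse("txn-001")
```

The retry and the second reversal are both refused, so the account ends at exactly zero. Idempotency keys address the first two prohibitions; ambiguous attribution still needs an explicit owner for every decision.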
EN:
Once financial auditing is involved, the architectural choice within the CAP theorem is not AP (Availability + Partition Tolerance), but rather Strong CP (Consistency + Partition Tolerance) combined with a Definite Arbiter.
CN:
一旦涉及财务审计,CAP 的选择不是 AP,而是 强 CP + 明确裁决者。
III. The Actual Architecture Adopted by Cloud Vendors (Critical Section)
三、云厂商真实采用的架构(非常关键)
⚠️ Note
⚠️ 注意
EN:
What follows is the structure engineers broadly agree on in practice, not the marketing architecture diagram of any particular cloud provider.
CN:
下面是工程事实层面的共识结构,不是某一家云的营销架构图。
1. Frontend Layer: Highly Distributed, Eventually Consistent, Tolerant of Failure
1️⃣ 前端:高度分布式、最终一致、可失败
EN:
This layer comprises components distributed across:
Various Regions
Various Availability Zones (AZs)
Various Services
Various Agents
CN:
各区域
各 AZ
各服务
各代理
EN:
These components are responsible for:
Collecting usage metrics
Estimating incurred costs
Caching billing events
Temporarily displaying billing information
CN:
它们负责:
采集用量
估算费用
缓存计费事件
临时展示账单
EN:
At this layer, the system architecture permits:
Backfilling data lost in transit
Delays
Retries
De-duplication
CN:
这里可以:
丢包后补
延迟
重试
去重
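Retries at this layer are safe only because downstream layers de-duplicate. A minimal sketch, assuming a hypothetical agent that fixes a stable event ID before the first send and reuses it on every retry; the flaky transport below delivers the event but "loses" the acknowledgement once, which is exactly how at-least-once delivery produces duplicates.

```python
import uuid


class FlakyTransport:
    """Hypothetical transport: delivers every message, but drops the first ack."""

    def __init__(self) -> None:
        self.delivered: list[dict] = []
        self._acks_to_drop = 1

    def send(self, event: dict) -> bool:
        self.delivered.append(event)  # the receiver did get it...
        if self._acks_to_drop > 0:
            self._acks_to_drop -= 1
            return False              # ...but the sender never hears back
        return True


def emit_usage(transport: FlakyTransport, resource: str, units: int,
               max_attempts: int = 3) -> str:
    # The ID is fixed before the first attempt, so every retry carries it.
    event = {"id": str(uuid.uuid4()), "resource": resource, "units": units}
    for _ in range(max_attempts):
        if transport.send(event):
            return event["id"]
    raise RuntimeError("delivery not confirmed; caller must buffer and retry later")


transport = FlakyTransport()
event_id = emit_usage(transport, "vm-123", 60)
```

The receiver now holds two copies of one usage record with the same ID; collapsing them is the middle layer's job. Backoff and jitter between attempts are omitted for brevity.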
2. Middleware Layer: Event Aggregation + De-duplication + Validation
2️⃣ 中间层:事件归并 + 去重 + 校验
EN:
All incoming billing events are transformed into an immutable event stream.
Each event is characterized by:
A Unique Identifier (ID)
A Timestamp
A Source
A Signature / Checksum
The system allows for events to arrive out of order.
CN:
所有计费事件变成 不可变事件流
每条事件有:
唯一 ID
时间戳
来源
签名 / 校验
允许乱序到达
EN:
However:
This layer is not yet the "Final Ledger."
It serves merely as the "Input Material for the Ledger."
CN:
但:
这里仍然不是“最终账本”
只是“账本输入材料”
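The middle layer's job can be sketched as: verify each event's signature, drop duplicates by ID, and sort whatever arrived out of order into a deterministic sequence before handing it on. This assumes an HMAC-SHA256 signature over canonical JSON with a shared key; the key, field names, and `prepare_batch` are hypothetical, and real systems may use asymmetric signatures instead.

```python
import hashlib
import hmac
import json

SECRET = b"hypothetical-shared-key"


def sign(payload: dict) -> str:
    body = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()


def make_event(event_id: str, ts: int, source: str, units: int) -> dict:
    payload = {"id": event_id, "ts": ts, "source": source, "units": units}
    return {**payload, "sig": sign(payload)}


def prepare_batch(events: list[dict]) -> list[dict]:
    """Return verified, de-duplicated events in a deterministic order."""
    seen: set[str] = set()
    accepted: list[dict] = []
    for event in events:
        payload = {k: v for k, v in event.items() if k != "sig"}
        if not hmac.compare_digest(event["sig"], sign(payload)):
            continue  # tampered or corrupted; a real system would quarantine it
        if event["id"] in seen:
            continue  # duplicate from an upstream retry
        seen.add(event["id"])
        accepted.append(event)
    # Out-of-order arrival is fine: sort by (timestamp, id) for a stable order.
    return sorted(accepted, key=lambda e: (e["ts"], e["id"]))


e1 = make_event("ev-1", 1_000, "az-1", 5)
e2 = make_event("ev-2", 900, "az-2", 7)
forged = {**make_event("ev-3", 950, "az-1", 1), "units": 1_000}
batch = prepare_batch([e1, e2, e1, forged])
```

The forged event fails verification, the repeated `ev-1` is dropped, and the survivors come out in timestamp order. The result is clean and ordered, but it is still only input material: nothing here has been committed.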
3. Core Layer: The Strong Consistency "Final Ledger System"
3️⃣ 核心层:强一致的“最终账本系统”
EN:
This is the critical juncture of the architecture.
CN:
这是关键点。
Characteristics are Almost Identical Across Vendors:
特征几乎一致:
EN:
Single Master / Definite Master: A clear, singular authority for writes.
Total Order Writes: All transactions are processed in a strictly defined sequence.
Strict Transactions: Adherence to ACID properties without compromise.
Strong Audit Logs: Comprehensive and immutable recording of all actions.
Replayable: The ability to reconstruct the state from logs deterministically.
CN:
单主 / 明确主
全序写入
严格事务
强审计日志
可重放
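As one illustration of "strict transactions + total-order writes + audit journal" on a relational engine, the sketch below uses Python's built-in `sqlite3`. `BEGIN IMMEDIATE` takes SQLite's write lock up front so writers are serialized, the journal row and both balance changes commit or roll back together, and constraints reject duplicate transaction IDs and overdrafts. SQLite is only a stand-in for the internal databases listed below; the schema and `transfer` function are hypothetical.

```python
import sqlite3

# isolation_level=None: autocommit mode, so transactions are opened explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
CREATE TABLE account (
    id TEXT PRIMARY KEY,
    balance_cents INTEGER NOT NULL CHECK (balance_cents >= 0)  -- no overdrafts
);
CREATE TABLE journal (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,  -- commit order of postings
    txn_id TEXT NOT NULL UNIQUE,            -- rejects double submission
    payer TEXT NOT NULL,
    payee TEXT NOT NULL,
    amount_cents INTEGER NOT NULL CHECK (amount_cents > 0)
);
INSERT INTO account VALUES ('alice', 10000), ('bob', 0);
""")

WITHDRAW_SQL = "UPDATE account SET balance_cents = balance_cents - ? WHERE id = ?"
DEPOSIT_SQL = "UPDATE account SET balance_cents = balance_cents + ? WHERE id = ?"


def transfer(txn_id: str, payer: str, payee: str, amount_cents: int) -> bool:
    """Post one transfer atomically; return False if a constraint rejects it."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock: writers are serialized
    try:
        conn.execute(
            "INSERT INTO journal (txn_id, payer, payee, amount_cents) VALUES (?, ?, ?, ?)",
            (txn_id, payer, payee, amount_cents),
        )
        for sql, account in ((WITHDRAW_SQL, payer), (DEPOSIT_SQL, payee)):
            if conn.execute(sql, (amount_cents, account)).rowcount != 1:
                raise sqlite3.IntegrityError(f"unknown account {account!r}")
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")  # journal row and balances revert together
        return False
    except BaseException:
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")
    return True


ok = transfer("t1", "alice", "bob", 6000)
duplicate = transfer("t1", "alice", "bob", 6000)  # same txn_id: UNIQUE violation
overdraft = transfer("t2", "alice", "bob", 6000)  # alice would drop to -2000
```

Only the first transfer survives; the rejected ones leave no partial journal rows or balance changes behind.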
EN:
The implementation methods may vary, but typically include:
High-reliability Relational Database Management Systems (internal systems)
Customized Transaction Processing Systems
Mainframe-based architectures
Or the cloud vendor's own "extremely conservative" internal databases (distinct from the commercial products sold externally).
CN:
实现方式可能是:
高可靠关系数据库(内部系统)
定制事务系统
或大型主机体系
或云厂商自己“极其保守”的内部数据库(不是对外卖的那种)
EN:
👉 This specific step will never operate in an AP (Availability first) mode, nor will it allow multi-active writes.
CN:
👉 这一步,绝不会是 AP 模式,也不会允许多活写。
IV. Why Is "Just Using Raft / Paxos" Insufficient?
四、为什么不能“用 Raft / Paxos 就好了”?
EN:
This is a common intuitive mistake among engineers.
CN:
这是很多工程师的直觉误区。
What Do Raft / Paxos Actually Guarantee?
Raft / Paxos 能保证什么?
EN:
Assuming the algorithm is implemented correctly;
Assuming no more than a minority of nodes fail;
They guarantee that the replicated log is consistent.
CN:
在“假设正确实现”的前提下
在“可容忍少数节点失效”的前提下
保证日志一致
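The "minority of nodes" condition is plain arithmetic: a cluster of n voters commits only with a majority of floor(n/2) + 1 votes, so it tolerates floor((n - 1)/2) failures, and any side of a partition holding fewer than a majority cannot commit at all. A small sketch (the function names are illustrative):

```python
def majority(n: int) -> int:
    """Smallest number of votes that forms a majority of n voters."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Voters that can fail while a majority remains reachable."""
    return (n - 1) // 2


def can_commit(reachable: int, n: int) -> bool:
    """A side of a partition can commit only if it holds a majority."""
    return reachable >= majority(n)


for n in (3, 5, 7):
    print(n, majority(n), tolerated_failures(n))
```

When link degradation leaves no side with a majority, the log stays consistent by committing nothing; the algorithm itself says nothing about how long a ledger can afford to stall.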
Why This Is Not Enough for Internal Cloud Settlement:
但在云厂商内部结算中,还不够:
❌ Issue 1: Network Partitioning in Reality ≠ Theoretical Partitioning
❌ 问题 1:网络分区 ≠ 理论分区
EN:
Cloud vendors have to deal with:
Large-scale fiber optic cable failures
Degradation of intercontinental communication links
Anomalies at the Control Plane level
These events trigger complex failure states:
Extended periods where the leader node is unreachable
Quorum jitter (instability in establishing a majority)
CN:
云厂商要处理的是:
大规模光纤故障
跨洲链路退化
控制平面级别异常
这类事件会触发:
长时间 leader 不可达
仲裁抖动
EN:
The ledger cannot simply "wait for the network to recover before deciding who is right or wrong."
CN:
账本不能“等网络恢复再决定谁对谁错”。
❌ Issue 2: Consistency ≠ Auditability
❌ 问题 2:一致 ≠ 可审计
EN:
Distributed log consistency does not equate to:
Legally enforceable traceability of accounts
The capability to recalculate balances based on a specific point in time
The ability to prove that "this is the final version"
CN:
分布式日志一致,不等于:
法律意义上的账目可追溯
可按时间点重算
可证明“这是最终版本”
EN:
What internal settlement requires is:
"Even if the entire system goes down, I must be able to derive the one correct result from the logs and the rules alone."
CN:
内部结算需要的是:
“即使全系统都挂了,我也能用日志和规则,推导出唯一正确结果。”
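A standard technique for logs that must be provably untampered is hash chaining: each entry stores the hash of its predecessor, so altering or removing any historical entry breaks every later link. The sketch below assumes SHA-256 over canonical JSON and shows both verification and recomputing a balance as of a given timestamp purely from the log; the entry format and function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64


def entry_hash(prev_hash: str, body: dict) -> str:
    data = prev_hash + json.dumps(body, sort_keys=True)
    return hashlib.sha256(data.encode()).hexdigest()


def append(log: list[dict], ts: int, account: str, amount_cents: int) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    body = {"ts": ts, "account": account, "amount_cents": amount_cents}
    log.append({**body, "prev": prev, "hash": entry_hash(prev, body)})


def verify(log: list[dict]) -> bool:
    prev = GENESIS
    for e in log:
        body = {"ts": e["ts"], "account": e["account"], "amount_cents": e["amount_cents"]}
        if e["prev"] != prev or e["hash"] != entry_hash(prev, body):
            return False
        prev = e["hash"]
    return True


def balance_as_of(log: list[dict], account: str, ts: int) -> int:
    # Deterministic replay: the same log always yields the same answer.
    return sum(e["amount_cents"] for e in log
               if e["account"] == account and e["ts"] <= ts)


log: list[dict] = []
append(log, 100, "acct-7", 5_000)
append(log, 200, "acct-7", -1_200)
append(log, 300, "acct-7", -800)
```

A chain on its own only proves internal consistency; in practice the latest hash is also handed somewhere the operator cannot rewrite alone (for example, to auditors), so that regenerating the whole chain would be detected.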
❌ Issue 3: Unclear Liability Boundaries Are a Disaster
❌ 问题 3:责任边界不清晰是灾难
EN:
In AP (Availability/Partition Tolerance) or Multi-Active systems:
Who is responsible for the final adjudication of the state?
Who bears the responsibility for incorrect account entries?
What is the definitive basis for a rollback operation?
CN:
在 AP / 多活系统中:
谁负责最终裁决?
谁对错误账目负责?
回滚依据是什么?
EN:
Cloud vendors cannot expose the "gray areas of the CAP theorem" to their Legal and Finance departments.
CN:
云厂商不能把“CAP 的灰色地带”暴露给法务与财务。
V. Why Do They Sell "Distributed Databases" But Not Use Them Internally?
五、为什么对外卖“分布式数据库”,自己不用?
EN:
This is a very practical point that is rarely stated outright:
CN:
这是非常现实、但很少直说的一点:
EN:
A failure in a customer's business is classified as an "SLA (Service Level Agreement) Event";
A failure in the cloud vendor's own ledger is classified as a "Financial Accident / Legal Accident."
CN:
客户业务的失败,叫“SLA 事件”;
云厂商自己账本的失败,叫“财务事故 / 法律事故”。
EN:
The risk levels associated with these two scenarios are completely different.
CN:
风险等级完全不同。
EN:
Therefore:
External Facing: Focus on configurable consistency, eventual consistency, and cross-region capabilities.
Internal Facing: Focus on conservatism, determinism, and a single source of truth.
CN:
所以:
对外:可配置一致性、最终一致、跨区域
对内:保守、确定、单一事实源
VI. A Crucial Cognitive Turning Point
六、一个非常重要的认知转折点
EN:
"Large Scale" does not equate to "Suitable for Distributed Writes."
CN:
“规模大”不等于“适合分布式写入”。
EN:
Search Indices: Suitable
Log Analysis: Suitable
Image Storage: Suitable
User Content: Suitable
CN:
搜索索引:适合
日志分析:适合
图片存储:适合
用户内容:适合
EN:
But:
Money
Inventory
Clearing / Settlement
Core Counters
CN:
但:
钱
库存
清算
核心计数器
EN:
👉 The essence of these problems is not throughput, but "Adjudication."
CN:
👉 这些问题的本质不是吞吐,而是“裁决”。
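For inventory, "adjudication" usually reduces to one atomic check-and-decrement at a single authoritative place. A minimal sketch with `sqlite3`, assuming a hypothetical `stock` table: the `qty >= ?` guard and the decrement happen in one statement, so the database, not the callers, decides which request gets the last unit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (sku TEXT PRIMARY KEY, qty INTEGER NOT NULL)")
conn.execute("INSERT INTO stock VALUES ('sku-001', 1)")
conn.commit()


def reserve(sku: str, units: int) -> bool:
    # Check and decrement in one statement: no window between read and write.
    cur = conn.execute(
        "UPDATE stock SET qty = qty - ? WHERE sku = ? AND qty >= ?",
        (units, sku, units),
    )
    conn.commit()
    return cur.rowcount == 1


first = reserve("sku-001", 1)
second = reserve("sku-001", 1)
```

Throughput can be raised around this point (batching, sharding by SKU), but for any single SKU there must still be one place where "who gets the last unit" is decided.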
VII. One-Sentence Summary (Please Remember This)
七、一句话总结(请记住这句话)
EN:
Cloud vendors do not "fail to understand distributed systems"; they understand very clearly that:
Some problems must have a place where "the final decision is made."
CN:
云厂商不是“不懂分布式”,而是非常清楚:
有些问题,必须有一个“最后说了算”的地方。
EN:
Therefore:
They delegate "Scale" to distributed systems.
They reserve "Truth" for the strong consistency ledger.
CN:
所以:
他们把“规模”留给分布式
把“真相”留给强一致账本
EN:
This is not a sign of backwardness, but the hallmark of mature engineering.
CN:
这不是落后,而是成熟工程的标志。
Postscript: Future Analytical Directions
后续讨论选项
EN:
If you like, I can next go deeper into one of the following directions:
CN:
如果你愿意,下一步我可以继续深入其中一个方向:
EN:
1. Why "Money / Inventory / Counters" theoretically cannot scale horizontally without limit.
2. Why core banking ledgers remain "Centralized + Extremely Reliable" to this day.
3. Why many "Global Multi-Active Ledgers" ultimately fall back to a single adjudication point.
CN:
1️⃣ 为什么“钱 / 库存 / 计数器”在理论上不可无限横向扩展
2️⃣ 为什么银行核心账本至今仍然是“集中式 + 极端可靠”
3️⃣ 为什么很多“全球多活账本”最终都会回退到单裁决点
EN:
Pick one, and I can continue down to "research-paper level + engineering level" detail.
CN:
你选一个,我可以继续讲到“论文级 + 工程级”细节。