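feat: complete Chinese translation of maths-cs-ai-compendium (Maths, Computer Science and AI Compendium)

Translated from the English original, maths-cs-ai-compendium. All 20 chapters are complete.

Ch. 01 Vectors | Ch. 02 Matrices | Ch. 03 Calculus
Ch. 04 Statistics | Ch. 05 Probability | Ch. 06 Machine Learning
Ch. 07 Computational Linguistics | Ch. 08 Computer Vision | Ch. 09 Audio and Speech
Ch. 10 Multimodal Learning | Ch. 11 Autonomous Systems | Ch. 12 Graph Neural Networks
Ch. 13 Computing and Operating Systems | Ch. 14 Data Structures and Algorithms
Ch. 15 Production Software Engineering | Ch. 16 SIMD and GPU Programming
Ch. 17 AI Inference | Ch. 18 ML System Design
Ch. 19 Applied AI | Ch. 20 Frontier AI

Translation notes:
- All math formulas $...$ / $$...$$, code blocks and image references are preserved in full
- mkdocs.yml configures Chinese navigation + language: zh
- README.md has been translated into Chinese (it doubles as docs/index.md)
- The docs/ directory contains symlinks pointing to each chapter's files
- About 29,000 lines of Chinese content, excluding the .cache/ build cache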
2026-05-03 10:23:20 +08:00
commit 2536c937e3
400 changed files with 49040 additions and 0 deletions
@@ -0,0 +1,322 @@
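+# Testing and Quality Assurance
+
+*Testing is how you make sure your code works, not only now but after every change. This file covers the testing pyramid, unit testing with pytest, mocking, testing ML-specific code, CI/CD pipelines, linting, formatting and code review: the practices that catch bugs before they reach production.*
+
+- Machine learning code is notorious for lacking tests. "It trains, so it works" is the prevailing attitude. This leads to silent bugs: a data loader that shuffles data incorrectly, a loss function with a sign error, a preprocessing step that drops 5% of the data. None of these crash your program. They quietly make your model worse, and you then lose weeks debugging a metric that "should be higher".
+
+- Testing is not overhead. It is the fastest way to move quickly without breaking things.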
+
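+## The Testing Pyramid
+
+- Tests are organised in layers, from fast and narrow to slow and broad:
+
+    - **Unit tests** (bottom): test individual functions and classes in isolation. Fast (milliseconds) and numerous (hundreds to thousands). "Does `normalise_image` produce values in the range [0, 1]?"
+
+    - **Integration tests** (middle): test that components work together. Slower (seconds). "Does the data loader produce batches in the format the model expects?"
+
+    - **End-to-end tests** (top): test the full pipeline from input to output. Slowest (minutes). "Does `python train.py --config test.yaml` finish without errors and produce a valid checkpoint?"
+
+- The pyramid shape means: write many unit tests, fewer integration tests, and only a handful of end-to-end tests. Unit tests catch most bugs and the whole suite runs in seconds. End-to-end tests catch integration problems, but they are slow and brittle.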
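
To make the bottom two layers concrete, here is a minimal sketch that uses only the standard library. `normalise` and `make_batches` are hypothetical stand-ins for real project code, not functions from this compendium. The unit tests check each function on its own, and the integration test checks that the two compose.

```python
# Hypothetical stand-ins for project code (assumptions for illustration only).
def normalise(values, mean, std):
    """Shift and scale a list of numbers."""
    return [(v - mean) / std for v in values]


def make_batches(items, batch_size):
    """Split a list into consecutive batches; the last one may be shorter."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Unit tests: one function each, no other components involved.
def test_normalise_centres_the_mean():
    assert normalise([128], mean=128, std=128) == [0.0]


def test_make_batches_keeps_every_item():
    batches = make_batches([1, 2, 3, 4, 5], batch_size=2)
    assert batches == [[1, 2], [3, 4], [5]]


# Integration test: the two components working together.
def test_pipeline_produces_bounded_batches():
    pixels = [0, 64, 128, 192, 255]
    batches = make_batches(normalise(pixels, mean=128, std=128), batch_size=2)
    assert sum(len(b) for b in batches) == len(pixels)
    assert all(-1.0 <= v <= 1.0 for b in batches for v in b)
```

An end-to-end test would sit above these and run the real training entry point, which is why a suite usually has only a few of them.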
+
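+## Unit Testing with pytest
+
+- **pytest** is the de facto standard Python testing framework. Tests are functions whose names start with `test_`, placed in files whose names start with `test_`: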
+
+```python
+# tests/test_utils.py
+import numpy as np
+
+# Import path is an assumption; adjust it to wherever normalise_image lives.
+from my_project.utils import normalise_image
+
+def test_normalise_image():
+    image = np.array([0, 128, 255], dtype=np.uint8)
+    result = normalise_image(image, mean=128, std=128)
+    assert result.min() >= -1.0
+    assert result.max() <= 1.0
+    assert abs(result[1]) < 1e-6  # 128 normalised with mean=128 should be ~0
+
+def test_normalise_empty():
+    image = np.array([], dtype=np.uint8)
+    result = normalise_image(image, mean=128, std=128)
+    assert len(result) == 0
+```
+
+```bash
+pytest tests/                     # run all tests
+pytest tests/test_utils.py        # run one file
+pytest -v                         # verbose output
+pytest -x                         # stop at the first failure
+pytest -k "normalise"             # run tests whose names match the expression
+pytest --tb=short                 # shorter tracebacks
+```
+
+### Fixtures
+
+- **Fixtures** provide reusable setup for tests. Instead of repeating setup code in every test, define it once:
+
+```python
+import pytest
+import torch
+
+# SmallModel is assumed to be defined in the project under test.
+
+@pytest.fixture
+def sample_dataset():
+    """Create a small dataset for testing."""
+    return {
+        "inputs": torch.randn(10, 3, 32, 32),
+        "labels": torch.randint(0, 10, (10,))
+    }
+
+@pytest.fixture
+def trained_model():
+    """Load a small pretrained model."""
+    model = SmallModel()
+    model.load_state_dict(torch.load("tests/fixtures/small_model.pt"))
+    model.eval()  # inference mode: no dropout, batch norm uses running statistics
+    return model
+
+def test_model_output_shape(trained_model, sample_dataset):
+    output = trained_model(sample_dataset["inputs"])
+    assert output.shape == (10, 10)  # batch_size x num_classes
+```
+
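+- Fixtures can have a **scope**: `scope="function"` (the default, recreated for every test), `scope="module"` (once per file) or `scope="session"` (once per test run). pytest also supports `"class"` and `"package"`. For expensive setup such as loading a model, use `scope="session"`.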
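
A sketch of a session-scoped fixture, assuming pytest is installed. `build_lookup_table` is a hypothetical stand-in for an expensive setup step such as loading a model; it is not part of the compendium's code.

```python
import pytest


def build_lookup_table():
    """Hypothetical stand-in for an expensive setup step (e.g. loading a model)."""
    return {n: n * n for n in range(1000)}


@pytest.fixture(scope="session")
def lookup_table():
    # With scope="session", pytest runs this once and shares the result
    # with every test that requests it during the run.
    return build_lookup_table()


def test_square_of_three(lookup_table):
    assert lookup_table[3] == 9


def test_table_size(lookup_table):
    assert len(lookup_table) == 1000
```

Because the object is shared, tests should treat a session-scoped fixture as read-only: a test that mutates it can change the outcome of later tests.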
+
+### Parametrised Tests
+
+- Test the same function with several inputs without duplicating code:
+
+```python
+import pytest
+
+@pytest.mark.parametrize("values,expected", [
+    ([1, 2, 3], 6),
+    ([], 0),
+    ([-1, 1], 0),
+    ([1000000, 1000000], 2000000),
+])
+def test_sum(values, expected):  # "values" avoids shadowing the built-in input()
+    assert sum(values) == expected
+```
+
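+## Mocking and Patching
+
+- **Mocks** replace real dependencies with fakes during a test. This lets you test a function in isolation, without needing a database, an API or a GPU.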
+
+```python
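+from unittest.mock import patch, MagicMock
+
+from my_project.training.trainer import Trainer  # assumed location of Trainer
+
+def test_training_logs_metrics():
+    mock_logger = MagicMock()
+
+    # Replace the wandb module used by the trainer so nothing leaves the machine
+    with patch("my_project.training.trainer.wandb"):
+        trainer = Trainer(logger=mock_logger)
+        trainer.train_one_epoch()
+
+        # Verify the trainer logged metrics
+        mock_logger.log.assert_called()
+        # Verify the last call passed a loss value
+        # (assumes keyword arguments, e.g. logger.log(loss=...))
+        call_args = mock_logger.log.call_args
+        assert "loss" in call_args.kwargs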
+```
+
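+- **When to mock**: external services (APIs, databases, cloud storage), expensive operations (GPU computation, large file I/O) and non-deterministic behaviour (random number generators, timestamps).
+
+- **When not to mock**: your own code. If you mock everything, your tests verify that the mocks behave as expected, not that your code works. Mock at the boundaries and test your logic directly.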
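
As a sketch of mocking at the boundary, the example below replaces only the non-deterministic clock and leaves the logic under test untouched. `checkpoint_name` is a hypothetical helper, not part of the compendium's code; `unittest.mock.patch` and `time.time` come from the standard library.

```python
import time
from unittest.mock import patch


def checkpoint_name(prefix):
    """Hypothetical helper: build a file name from the current Unix time."""
    return f"{prefix}_{int(time.time())}.pt"


def test_checkpoint_name_uses_current_time():
    # Patch only the boundary (the clock); the string formatting is real code.
    with patch("time.time", return_value=1700000000.9):
        assert checkpoint_name("model") == "model_1700000000.pt"
```

Had the test patched `checkpoint_name` itself, it would have passed no matter what the function did, which is the failure mode described above.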
+
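+## Testing ML Code
+
+- ML code poses its own testing challenges: outputs are probabilistic, training is slow, and "correct" is fuzzy.
+
+### Deterministic Seeds
+
+- Set random seeds everywhere so that tests are reproducible: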
+
+```python
+import random
+import numpy as np
+import torch
+
+def set_seed(seed=42):
+    random.seed(seed)
+    np.random.seed(seed)
+    torch.manual_seed(seed)
+    if torch.cuda.is_available():
+        torch.cuda.manual_seed_all(seed)
+    torch.backends.cudnn.deterministic = True
+    torch.backends.cudnn.benchmark = False
+```
+
+### Numerical Tolerance
+
+- Floating-point comparisons need a tolerance (Chapter 13, IEEE 754):
+
+```python
+# Bad: exact comparison fails because of floating-point rounding
+assert model_output == 0.5
+
+# Good: approximate comparison
+import numpy as np
+assert np.isclose(model_output, 0.5, atol=1e-5)
+
+# For tensors
+import torch
+assert torch.allclose(output, expected, atol=1e-4)
+```
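
Why exact comparison fails can be seen without any ML code. The sketch below uses only the standard library's `math.isclose`; the tolerance values are illustrative choices, not recommendations from the source.

```python
import math

total = 0.1 + 0.2

# Neither 0.1 nor 0.2 has an exact binary representation, so the sum
# is not bit-for-bit equal to 0.3.
assert total != 0.3

# A relative tolerance handles values away from zero.
assert math.isclose(total, 0.3, rel_tol=1e-9)

# Near zero a relative tolerance is too strict, because it scales with
# the magnitude of the inputs; an absolute tolerance is needed there.
assert not math.isclose(1e-12, 0.0, rel_tol=1e-9)
assert math.isclose(1e-12, 0.0, abs_tol=1e-9)
```

`np.isclose` and `torch.allclose` accept both an `rtol` and an `atol` argument, so the same distinction applies to them.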
+
+### What to Test in ML
+
+- **Shape tests**: verify that outputs have the expected dimensions.
+
+```python
+def test_model_output_shape():
+    model = MyModel(d_model=256, n_classes=10)
+    x = torch.randn(8, 32, 256)  # batch=8, seq=32, dim=256
+    output = model(x)
+    assert output.shape == (8, 10)
+```
+
+- **Gradient flow**: verify that every trainable parameter receives a non-zero gradient.
+
+```python
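+import torch
+import torch.nn.functional as F
+
+def test_gradients_flow():
+    model = MyModel()
+    x = torch.randn(4, 3, 32, 32)
+    y = torch.randint(0, 10, (4,))
+
+    output = model(x)
+    loss = F.cross_entropy(output, y)
+    loss.backward()
+
+    for name, param in model.named_parameters():
+        if not param.requires_grad:
+            continue  # frozen parameters are not expected to have gradients
+        assert param.grad is not None, f"no gradient for {name}"
+        assert param.grad.abs().sum() > 0, f"gradient for {name} is all zeros"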
+```
+
+- **Overfit a single batch**: the model should be able to memorise one batch. If it cannot, something is fundamentally wrong.
+
+```python
+# MyModel and get_single_batch are assumed to come from the project under test.
+def test_overfit_one_batch():
+    model = MyModel()
+    optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
+    x, y = get_single_batch()
+
+    for _ in range(100):
+        optimiser.zero_grad()
+        loss = F.cross_entropy(model(x), y)
+        loss.backward()
+        optimiser.step()
+
+    assert loss.item() < 0.01, f"could not overfit a single batch: loss={loss.item()}"
+```
+
+- **Data validation**: verify that data loading produces valid outputs.
+
+```python
+def test_dataset_basics():
+    dataset = MyDataset("tests/fixtures/small_data.csv")
+    assert len(dataset) > 0
+    x, y = dataset[0]
+    assert x.shape == (3, 224, 224)
+    assert 0 <= y < 10
+    assert not torch.isnan(x).any()
+    assert not torch.isinf(x).any()
+```
+
+- **Determinism**: same input + same seed → same output.
+
+```python
+def test_determinism(model, input_data):  # both assumed to be fixtures defined elsewhere
+    set_seed(42)
+    output1 = model(input_data)
+    set_seed(42)
+    output2 = model(input_data)
+    assert torch.allclose(output1, output2)
+```
+
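+## CI/CD Pipelines
+
+- **Continuous integration (CI)**: run the tests automatically on every commit or PR. When the checks are required for merging, a failing test blocks the PR, which keeps broken code out of `main`.
+
+- **GitHub Actions** example (`.github/workflows/ci.yml`):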
+
+```yaml
+name: CI
+on: [push, pull_request]
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+      - run: pip install -e ".[dev]"
+      - run: ruff check src/
+      - run: mypy src/
+      - run: pytest tests/ -v --tb=short
+```
+
+- **Pre-commit hooks**: run checks locally before each commit, catching problems before they reach CI:
+
+```yaml
+# .pre-commit-config.yaml
+repos:
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    rev: v0.3.0
+    hooks:
+      - id: ruff
+        args: [--fix]
+      - id: ruff-format
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v4.5.0
+    hooks:
+      - id: trailing-whitespace
+      - id: end-of-file-fixer
+      - id: check-yaml
+```
+
+```bash
+pip install pre-commit
+pre-commit install    # hooks now run on every git commit
+```
+
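+## Linting and Formatting
+
+- **Linting** catches bugs and style problems without running the code. **Formatting** enforces a consistent style automatically.
+
+- **Ruff**: a fast Python linter and formatter (one tool that replaces flake8, isort and black):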
+
+```bash
+ruff check src/          # lint
+ruff check --fix src/    # lint and auto-fix
+ruff format src/         # format
+```
+
+- **mypy**: a static type checker for Python. It catches type errors before runtime:
+
+```bash
+mypy src/
+# src/model.py:42: error: Argument 1 to "forward" has incompatible type "int"; expected "Tensor"
+```
+
+- Type hints make code self-documenting and let tools catch errors:
+
+```python
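+import torch
+from torch import nn
+from torch.utils.data import DataLoader
+
+def train(
+    model: nn.Module,
+    dataloader: DataLoader,
+    optimiser: torch.optim.Optimizer,
+    num_epochs: int = 10,
+) -> float:
+    """Train the model and return the final loss."""
+    ...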
+```
+
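+## Code Review Best Practices
+
+- **For authors**:
+    - Review your own diff before requesting a review. You will catch the obvious problems.
+    - Keep PRs small and focused. One PR, one concern.
+    - Write a clear description: what, why, and how it was tested.
+    - Reply to every comment (even if only with "done").
+
+- **For reviewers**:
+    - Be kind. Critique the code, not the person. "This could be clearer" rather than "This is confusing."
+    - Distinguish blocking issues (bugs, security) from suggestions (style, naming). Use labels: "nit:", "suggestion:", "blocking:".
+    - Ask questions instead of giving orders. "What happens if this list is empty?" is more helpful than "Handle the empty case."
+    - Approve promptly. A PR that waits for days blocks the author and encourages large, batched PRs, which are harder to review.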