<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 750 300" width="750" height="300">
  <defs>
    <marker id="arrow-mwm" viewBox="0 0 10 7" refX="10" refY="3.5" markerWidth="8" markerHeight="6" orient="auto-start-reverse">
      <path d="M0,0 L10,3.5 L0,7z" fill="#666"/>
    </marker>
  </defs>

  <!-- Title -->
  <text x="375" y="22" text-anchor="middle" font-family="Arial, sans-serif" font-size="14" font-weight="bold" fill="#333">Multimodal World Model: Predicting Future States</text>

  <!-- ===== LEFT: Current State ===== -->
  <text x="100" y="50" text-anchor="middle" font-family="Arial, sans-serif" font-size="11" font-weight="bold" fill="#666">Current State (t)</text>

  <!-- Image frame -->
  <rect x="30" y="62" width="80" height="55" rx="6" fill="#3498db" fill-opacity="0.12" stroke="#3498db" stroke-width="1.5"/>
  <rect x="42" y="72" width="56" height="35" rx="3" fill="none" stroke="#3498db" stroke-width="0.8"/>
  <!-- Simple scene inside frame -->
  <line x1="42" y1="100" x2="98" y2="100" stroke="#3498db" stroke-width="0.6"/>
  <circle cx="60" cy="86" r="5" fill="#3498db" fill-opacity="0.2" stroke="#3498db" stroke-width="0.5"/>
  <rect x="78" y="88" width="12" height="14" rx="1" fill="#3498db" fill-opacity="0.15" stroke="#3498db" stroke-width="0.5"/>
  <text x="70" y="125" text-anchor="middle" font-family="Arial, sans-serif" font-size="9" fill="#3498db">Image Frame</text>

  <!-- Text description -->
  <rect x="125" y="62" width="80" height="55" rx="6" fill="#e74c3c" fill-opacity="0.12" stroke="#e74c3c" stroke-width="1.5"/>
  <line x1="137" y1="75" x2="193" y2="75" stroke="#e74c3c" stroke-width="0.8"/>
  <line x1="137" y1="83" x2="185" y2="83" stroke="#e74c3c" stroke-width="0.8"/>
  <line x1="137" y1="91" x2="190" y2="91" stroke="#e74c3c" stroke-width="0.8"/>
  <line x1="137" y1="99" x2="170" y2="99" stroke="#e74c3c" stroke-width="0.8"/>
  <text x="165" y="125" text-anchor="middle" font-family="Arial, sans-serif" font-size="9" fill="#e74c3c">Text Description</text>

  <!-- Audio waveform -->
  <rect x="30" y="138" width="80" height="40" rx="6" fill="#27ae60" fill-opacity="0.12" stroke="#27ae60" stroke-width="1.5"/>
  <path d="M42,158 Q48,148 54,158 Q58,165 62,155 Q66,148 70,158 Q74,165 78,155 Q82,150 86,158 Q90,163 94,158 Q98,153 102,158" fill="none" stroke="#27ae60" stroke-width="1.2"/>
  <text x="70" y="190" text-anchor="middle" font-family="Arial, sans-serif" font-size="9" fill="#27ae60">Audio Waveform</text>

  <!-- Arrows: current state → world model -->
  <line x1="150" y1="130" x2="268" y2="115" stroke="#666" stroke-width="1.2" marker-end="url(#arrow-mwm)"/>
  <line x1="110" y1="158" x2="268" y2="130" stroke="#666" stroke-width="1.2" marker-end="url(#arrow-mwm)"/>

  <!-- ===== Action input from below ===== -->
  <rect x="300" y="200" width="110" height="34" rx="8" fill="#f39c12" fill-opacity="0.12" stroke="#f39c12" stroke-width="1.5"/>
  <text x="355" y="215" text-anchor="middle" font-family="Arial, sans-serif" font-size="10" font-weight="bold" fill="#f39c12">Action</text>
  <text x="355" y="228" text-anchor="middle" font-family="Arial, sans-serif" font-size="8" fill="#f39c12">(e.g., "move left")</text>

  <!-- Arrow: action → world model -->
  <line x1="355" y1="200" x2="355" y2="170" stroke="#f39c12" stroke-width="1.5" marker-end="url(#arrow-mwm)"/>

  <!-- ===== MIDDLE: World Model ===== -->
  <rect x="270" y="60" width="170" height="108" rx="10" fill="#9b59b6" fill-opacity="0.12" stroke="#9b59b6" stroke-width="2"/>
  <text x="355" y="90" text-anchor="middle" font-family="Arial, sans-serif" font-size="13" font-weight="bold" fill="#9b59b6">World Model</text>

  <!-- Internal blocks -->
  <rect x="286" y="100" width="60" height="22" rx="4" fill="#9b59b6" fill-opacity="0.08" stroke="#9b59b6" stroke-width="0.8"/>
  <text x="316" y="114" text-anchor="middle" font-family="Arial, sans-serif" font-size="7" fill="#9b59b6">Encoder</text>

  <rect x="355" y="100" width="70" height="22" rx="4" fill="#9b59b6" fill-opacity="0.08" stroke="#9b59b6" stroke-width="0.8"/>
  <text x="390" y="114" text-anchor="middle" font-family="Arial, sans-serif" font-size="7" fill="#9b59b6">Predictor</text>

  <rect x="290" y="130" width="140" height="22" rx="4" fill="#9b59b6" fill-opacity="0.08" stroke="#9b59b6" stroke-width="0.8"/>
  <text x="360" y="144" text-anchor="middle" font-family="Arial, sans-serif" font-size="7" fill="#9b59b6">Latent Dynamics + Decoder</text>

  <!-- Arrows: world model → predicted outputs -->
  <line x1="440" y1="100" x2="510" y2="75" stroke="#666" stroke-width="1.2" marker-end="url(#arrow-mwm)"/>
  <line x1="440" y1="115" x2="510" y2="115" stroke="#666" stroke-width="1.2" marker-end="url(#arrow-mwm)"/>
  <line x1="440" y1="142" x2="510" y2="152" stroke="#666" stroke-width="1.2" marker-end="url(#arrow-mwm)"/>

  <!-- ===== RIGHT: Predicted Future States ===== -->
  <text x="615" y="50" text-anchor="middle" font-family="Arial, sans-serif" font-size="11" font-weight="bold" fill="#666">Predicted States</text>

  <!-- Predicted future frame -->
  <rect x="512" y="58" width="80" height="45" rx="6" fill="#3498db" fill-opacity="0.08" stroke="#3498db" stroke-width="1.5" stroke-dasharray="4,2"/>
  <rect x="524" y="65" width="56" height="28" rx="3" fill="none" stroke="#3498db" stroke-width="0.6" stroke-dasharray="3,2"/>
  <circle cx="542" cy="78" r="5" fill="#3498db" fill-opacity="0.1" stroke="#3498db" stroke-width="0.4"/>
  <rect x="558" y="78" width="12" height="10" rx="1" fill="#3498db" fill-opacity="0.08" stroke="#3498db" stroke-width="0.4"/>
  <text x="552" y="112" text-anchor="middle" font-family="Arial, sans-serif" font-size="8" fill="#3498db">Predicted Frame</text>

  <!-- Predicted text -->
  <rect x="610" y="58" width="80" height="45" rx="6" fill="#e74c3c" fill-opacity="0.08" stroke="#e74c3c" stroke-width="1.5" stroke-dasharray="4,2"/>
  <line x1="622" y1="71" x2="678" y2="71" stroke="#e74c3c" stroke-width="0.6" stroke-dasharray="3,2"/>
  <line x1="622" y1="79" x2="670" y2="79" stroke="#e74c3c" stroke-width="0.6" stroke-dasharray="3,2"/>
  <line x1="622" y1="87" x2="660" y2="87" stroke="#e74c3c" stroke-width="0.6" stroke-dasharray="3,2"/>
  <text x="650" y="112" text-anchor="middle" font-family="Arial, sans-serif" font-size="8" fill="#e74c3c">Predicted Text</text>

  <!-- Predicted audio -->
  <rect x="512" y="126" width="80" height="40" rx="6" fill="#27ae60" fill-opacity="0.08" stroke="#27ae60" stroke-width="1.5" stroke-dasharray="4,2"/>
  <path d="M524,146 Q530,138 536,146 Q540,152 544,143 Q548,138 552,146 Q556,152 560,143 Q564,138 568,146 Q572,150 576,146 Q580,142 584,146" fill="none" stroke="#27ae60" stroke-width="0.8" stroke-dasharray="2,2"/>
  <text x="552" y="178" text-anchor="middle" font-family="Arial, sans-serif" font-size="8" fill="#27ae60">Predicted Audio</text>

  <!-- Additional predicted frames (fading) for t+2 -->
  <rect x="630" y="130" width="60" height="34" rx="5" fill="#9b59b6" fill-opacity="0.06" stroke="#9b59b6" stroke-width="1" stroke-dasharray="3,3"/>
  <text x="660" y="150" text-anchor="middle" font-family="Arial, sans-serif" font-size="7" fill="#9b59b6">t+2 state</text>
  <text x="660" y="160" text-anchor="middle" font-family="Arial, sans-serif" font-size="7" fill="#999">(recursive)</text>

  <!-- Arrow from t+1 predictions to t+2 -->
  <line x1="592" y1="148" x2="628" y2="148" stroke="#999" stroke-width="0.8" stroke-dasharray="3,2" marker-end="url(#arrow-mwm)"/>

  <!-- ===== BOTTOM: Timeline ===== -->
  <line x1="70" y1="260" x2="700" y2="260" stroke="#999" stroke-width="1.5" marker-end="url(#arrow-mwm)"/>

  <!-- Time markers -->
  <line x1="100" y1="255" x2="100" y2="265" stroke="#999" stroke-width="1.5"/>
  <text x="100" y="280" text-anchor="middle" font-family="Arial, sans-serif" font-size="10" font-weight="bold" fill="#666">t</text>
  <text x="100" y="293" text-anchor="middle" font-family="Arial, sans-serif" font-size="8" fill="#999">current</text>

  <line x1="400" y1="255" x2="400" y2="265" stroke="#999" stroke-width="1.5"/>
  <text x="400" y="280" text-anchor="middle" font-family="Arial, sans-serif" font-size="10" font-weight="bold" fill="#666">t + 1</text>
  <text x="400" y="293" text-anchor="middle" font-family="Arial, sans-serif" font-size="8" fill="#999">predicted</text>

  <line x1="600" y1="255" x2="600" y2="265" stroke="#999" stroke-width="1.5"/>
  <text x="600" y="280" text-anchor="middle" font-family="Arial, sans-serif" font-size="10" font-weight="bold" fill="#666">t + 2</text>
  <text x="600" y="293" text-anchor="middle" font-family="Arial, sans-serif" font-size="8" fill="#999">predicted</text>

  <text x="680" y="280" text-anchor="middle" font-family="Arial, sans-serif" font-size="10" fill="#999">...</text>
</svg>