LLMs: Prover: "DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition", Translation and Interpretation

處女座的程序猿, published 2025-06-02 in Shanghai

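Overview: By combining the reasoning ability of large language models with the rigor of formal verification systems, DeepSeek-Prover-V2 proposes a new approach to formal theorem proving. Using subgoal decomposition, recursive solving, curriculum learning, and reinforcement learning, the approach improves the model's performance on a range of mathematical benchmarks, narrows the gap between informal reasoning and formal proof, and lays groundwork for future research on automated theorem proving.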

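>> Background and pain points

● Effective reasoning: Large language models (LLMs) have made significant progress in mathematical problem solving, largely thanks to inference-time scaling, in particular natural-language chain-of-thought (CoT) reasoning.

● The challenge of formal theorems: Although natural-language reasoning succeeds on competition-level math problems, applying it to formal theorem proving remains fundamentally challenging.

● Weaknesses of informal reasoning versus the nature of formal systems: The natural-language reasoning of LLMs is inherently informal. It relies on heuristics, approximations, and data-driven guessing patterns that often lack the rigorous structure formal verification systems require. Formal verification systems (such as Lean, Isabelle, and Coq) rest on strict logical foundations: every proof step must be explicitly constructed and formally verified, with no ambiguity, implicit assumptions, or omitted details allowed.

● The gap between the two: Bridging informal, high-level reasoning and the syntactic rigor of formal verification systems is a longstanding challenge in neural theorem proving.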

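>> Solution

● Proposes DeepSeek-Prover-V2, an open-source large language model for formal theorem proving in Lean 4.

● Collects initialization data through a recursive theorem-proving pipeline powered by DeepSeek-V3. The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals. The proofs of the resolved subgoals are synthesized into a chain-of-thought process and combined with DeepSeek-V3's step-by-step reasoning to create an initial cold start for reinforcement learning.

● Integrates informal and formal mathematical reasoning into a single unified model.

● Builds a simple recursive theorem-proving pipeline that uses DeepSeek-V3 as a unified tool for both subgoal decomposition and formalization.

● Prompts DeepSeek-V3 to decompose theorems into high-level proof sketches while formalizing those proof steps as a sequence of subgoals in Lean 4. A smaller 7B model handles the proof search for each subgoal, which reduces the computational burden.

● Introduces a curriculum learning framework that uses the decomposed subgoals to generate conjectural theorems, gradually increasing the difficulty of the training tasks to better guide the model's learning. The complete step-by-step formal proof is paired with the corresponding chain-of-thought from DeepSeek-V3 to create cold-start reasoning data.

● Applies a reinforcement learning stage to further strengthen the connection between informal mathematical reasoning and formal proof construction.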
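
The decompose-then-solve loop in the list above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `decompose` stands in for the large general-purpose model that proposes subgoals, `prove` stands in for the small prover, and both names, the `Subgoal` class, the `max_depth` limit, and the toy data are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Subgoal:
    statement: str   # formal statement of the lemma (hypothetical plain-text form)
    proof: str       # proof text returned by the prover


def prove_recursively(
    theorem: str,
    decompose: Callable[[str], List[str]],
    prove: Callable[[str, List[str]], Optional[str]],
    max_depth: int = 2,
) -> Optional[List[Subgoal]]:
    """Return the solved subgoals in order, or None if any subgoal stays open."""
    statements = decompose(theorem)
    if not statements:
        return None  # nothing to work with: treat the theorem as unsolved
    solved: List[Subgoal] = []
    for statement in statements:
        # Earlier subgoals are offered to the prover as premises.
        premises = [s.statement for s in solved]
        proof = prove(statement, premises)
        if proof is None and max_depth > 0:
            # The prover failed directly, so try decomposing this subgoal again.
            # Simplification: outer premises are not passed into the nested call.
            nested = prove_recursively(statement, decompose, prove, max_depth - 1)
            if nested is not None:
                proof = "\n".join(s.proof for s in nested)
        if proof is None:
            return None
        solved.append(Subgoal(statement, proof))
    return solved


# Toy stand-ins for the two models (purely illustrative).
def toy_decompose(theorem: str) -> List[str]:
    table = {"T": ["A", "B"], "B": ["B1", "B2"]}
    return table.get(theorem, [])


def toy_prove(statement: str, premises: List[str]) -> Optional[str]:
    if statement == "B":
        return None  # pretend the small prover cannot solve B directly
    return f"proof of {statement} using {premises}"


result = prove_recursively("T", toy_decompose, toy_prove)
```

With the toy stand-ins, subgoal `B` is not solved directly, so it is decomposed once more into `B1` and `B2`, and the assembled result still lists `A` and `B` as the top-level steps. In a real system the `prove` call would be checked by the Lean 4 compiler rather than trusted.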

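>> Core steps

● Subgoal decomposition: DeepSeek-V3 decomposes a complex theorem into smaller, manageable subgoals (lemmas).

● Formal sketch: The decomposed subgoals are converted into formal Lean 4 statements, with the proof details omitted and marked by the "sorry" placeholder.

● Recursive solving: A smaller 7B prover model solves each subgoal recursively, using the preceding subgoals as premises.

● Assembling the complete proof: The subgoal proofs are combined into a complete formal proof of the original problem.

● Cold-start data generation: The complete formal proof is combined with DeepSeek-V3's chain-of-thought process to create high-quality cold-start training data.

● Curriculum learning: The subgoals are used to generate training tasks of increasing difficulty, gradually guiding the prover model toward more challenging problems.

● Reinforcement learning: Binary correct/incorrect feedback serves as the reward signal, together with a consistency reward that keeps the structure of the generated proof aligned with the chain-of-thought decomposition.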
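
To make the "sorry" placeholder step concrete, here is a small Lean 4 sketch. The theorem is a toy example chosen for this note, not one taken from the paper, and it assumes Mathlib is available. The first version states the subgoals as `have` steps and leaves them open; the second shows the same skeleton after each subgoal has been proved.

```lean
import Mathlib

-- Sketch stage: subgoals are stated as `have` steps and left open with `sorry`.
theorem sum_sq_nonneg_sketch (a b : ℝ) : 0 ≤ a ^ 2 + b ^ 2 := by
  have h₁ : 0 ≤ a ^ 2 := by sorry
  have h₂ : 0 ≤ b ^ 2 := by sorry
  exact add_nonneg h₁ h₂

-- After subgoal proving: each `sorry` is replaced by a checked proof.
theorem sum_sq_nonneg (a b : ℝ) : 0 ≤ a ^ 2 + b ^ 2 := by
  have h₁ : 0 ≤ a ^ 2 := sq_nonneg a
  have h₂ : 0 ≤ b ^ 2 := sq_nonneg b
  exact add_nonneg h₁ h₂
```

Lean accepts the first version with a warning that the declaration uses `sorry`, which is what lets the outer proof structure be checked before the individual subgoals are solved.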

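>> Advantages

● DeepSeek-Prover-V2-671B achieves state-of-the-art performance in neural theorem proving: an 88.9% pass ratio on MiniF2F-test, 49 of the 658 PutnamBench problems solved, and 6 of the 15 AIME problems in ProverBench solved.

● Narrows the gap between formal and informal mathematical reasoning in large language models.

● Integrating a general-purpose LLM with a lightweight, specialized 7B prover achieves a 90.2% success rate on miniF2F-valid.

● The CoT reasoning mode shows a significant performance advantage over the non-CoT mode in formal mathematical reasoning.

● The 7B model performs well on PutnamBench in non-CoT generation mode, solving 13 problems that the 671B version did not solve.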

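>> Conclusions and takeaways

● Synthesizing cold-start reasoning data can effectively improve formal theorem-proving ability.

● A recursive theorem-proving framework that combines subgoal decomposition with formalization is a promising approach.

● Curriculum learning and reinforcement learning can further strengthen a model's formal theorem-proving ability.

● High-capacity models may internalize and externalize intermediate reasoning even without explicit CoT prompting.

● Future work should focus on scaling this paradigm to an AlphaProof-like system, with the goal of solving IMO-level mathematical problems that represent the frontier of automated theorem proving.

● Further exploration is suggested into how the informal reasoning ability of large language models can guide the construction of formal proofs.

● Research is suggested into more effective reward functions that encourage models to generate well-structured, readable formal proofs.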

Link	Paper: [2504.21801] DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition
Date	April 30, 2025
Author	DeepSeek-AI

Abstract

We introduce DeepSeek-Prover-V2, an open-source large language model designed for formal theorem proving in Lean 4, with initialization data collected through a recursive theorem proving pipeline powered by DeepSeek-V3. The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals. The proofs of resolved subgoals are synthesized into a chain-of-thought process, combined with DeepSeek-V3's step-by-step reasoning, to create an initial cold start for reinforcement learning. This process enables us to integrate both informal and formal mathematical reasoning into a unified model. The resulting model, DeepSeek-Prover-V2-671B, achieves state-of-the-art performance in neural theorem proving, reaching 88.9% pass ratio on the MiniF2F-test and solving 49 out of 658 problems from PutnamBench. In addition to standard benchmarks, we introduce ProverBench, a collection of 325 formalized problems, to enrich our evaluation, including 15 selected problems from the recent AIME competitions (years 24-25). Further evaluation on these 15 AIME problems shows that the model successfully solves 6 of them. In comparison, DeepSeek-V3 solves 8 of these problems using majority voting, highlighting that the gap between formal and informal mathematical reasoning in large language models is substantially narrowing.

Figure 1: Benchmark performance of DeepSeek-Prover-V2. On the AIME benchmark, DeepSeek-V3 is evaluated using the standard find-answer task for natural-language reasoning, while prover models generate Lean code to construct formal proofs for a given correct answer.

1. Introduction

The emergence of reasoning capabilities in large language models (LLMs) has revolutionized numerous areas of artificial intelligence, particularly in the domain of mathematical problem solving (DeepSeek-AI, 2025). These advancements are largely enabled by the paradigm of inference-time scaling, most notably through natural language chain-of-thought reasoning (Jaech et al., 2024). Rather than relying solely on a single forward pass to arrive at an answer, LLMs can reflect on intermediate reasoning steps, improving both accuracy and interpretability. Despite the success of natural language reasoning in solving competition-level mathematical problems, its application to formal theorem proving remains fundamentally challenging. LLMs perform natural language reasoning in an inherently informal manner, relying on heuristics, approximations, and data-driven guessing patterns that often lack the rigorous structure required by formal verification systems. In contrast, proof assistants such as Lean (Moura and Ullrich, 2021), Isabelle (Paulson, 1994), and Coq (Barras et al., 1999) operate on strict logical foundations, where every proof step must be explicitly constructed and formally verified. These systems permit no ambiguity, implicit assumptions, or omission of details. Bridging the gap between informal, high-level reasoning and the syntactic rigor of formal verification systems remains a longstanding research challenge in neural theorem proving (Yang et al., 2024).

To harness the strengths of informal mathematical reasoning in support of formal theorem proving, a classical approach is to hierarchically decompose formal proofs based on the guidance of natural-language proof sketches. Jiang et al. (2023) proposed a framework, called Draft, Sketch, and Prove (DSP), that leverages a large language model to generate proof sketches in natural language, which are subsequently translated into formal proof steps. This informal-to-formal theorem proving paradigm closely mirrors the concept of subgoals in hierarchical reinforcement learning (Barto and Mahadevan, 2003; Nachum et al., 2018; Eppe et al., 2022), where complex tasks are broken down into a hierarchy of simpler subtasks that can be solved independently to progressively achieve the overarching objective. In formal theorem proving, a subgoal is typically an intermediate proposition or lemma that contributes to the proof of a larger theorem (Zhao et al., 2023, 2024). This hierarchical decomposition aligns with human problem-solving strategies and supports modularity, reusability, and more efficient proof search (Wang et al., 2024b; Zheng et al., 2024). Recent studies have extended this paradigm by employing multi-tiered hierarchies for structured proof generation (Wang et al., 2024a), and by leveraging reinforcement learning techniques to optimize the decomposition of complex theorems into manageable subgoals (Dong et al., 2024).

In this paper, we develop a reasoning model for subgoal decomposition, leveraging a suite of synthetic cold-start data and large-scale reinforcement learning to enhance its performance. To construct the cold-start dataset, we develop a simple yet effective pipeline for recursive theorem proving, utilizing DeepSeek-V3 (DeepSeek-AI, 2024) as a unified tool for both subgoal decomposition and formalization. We prompt DeepSeek-V3 to decompose theorems into high-level proof sketches while simultaneously formalizing these proof steps in Lean 4, resulting in a sequence of subgoals. Since the subgoal decomposition is powered by a large general-purpose model, we use a smaller 7B model to handle the proof search for each subgoal, thereby reducing the associated computational burden. Additionally, we introduce a curriculum learning framework that leverages the decomposed subgoals to generate conjectural theorems, progressively increasing the difficulty of training tasks to better guide the model’s learning process. Once the decomposed steps of a challenging problem are resolved, we pair the complete step-by-step formal proof with the corresponding chain-of-thought from DeepSeek-V3 to create cold-start reasoning data. Based on the cold start, a subsequent reinforcement learning stage is applied to further strengthen the connection between informal mathematical reasoning and formal proof construction. Our experiments show that reinforcement learning starting from the cold start of informal reasoning in task decomposition significantly enhances the model’s capabilities in formal theorem proving. The resulting DeepSeek-Prover-V2-671B model establishes a new state-of-the-art in neural theorem proving across multiple benchmarks. On MiniF2F-test, it achieves 82.4% accuracy with Pass@32, improving to 88.9% with Pass@8192. The model shows strong generalization capabilities to college-level theorem proving, solving 37.1% of ProofNet-test problems with Pass@1024 and tackling 49 out of 658 challenging PutnamBench problems. Additionally, we contribute ProverBench, a benchmark dataset containing 325 formalized problems to advance neural theorem proving research, including 15 from the prestigious AIME competitions (years 24-25). DeepSeek-Prover-V2-671B successfully solves 6 of these 15 challenging AIME problems, further demonstrating its sophisticated mathematical reasoning capabilities.
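
The Pass@32 and Pass@8192 figures above can be read with a small sketch. The excerpt does not define the metric, so this assumes the common reading: a problem counts as solved under Pass@K if at least one of K sampled proofs is accepted by the verifier. The function name, the data layout, and the toy results are all assumptions made for illustration.

```python
from typing import Dict, List


def pass_at_k(verified: Dict[str, List[bool]], k: int) -> float:
    """Fraction of problems with at least one verified proof among the first k samples."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not verified:
        raise ValueError("no problems given")
    solved = sum(1 for attempts in verified.values() if any(attempts[:k]))
    return solved / len(verified)


# Toy verification outcomes: one list of per-sample results for each problem.
toy_results = {
    "problem_1": [False, True, False, False],
    "problem_2": [False, False, False, True],
    "problem_3": [False, False, False, False],
    "problem_4": [True, False, False, False],
}

rates = {k: pass_at_k(toy_results, k) for k in (1, 2, 4)}
```

Under this reading the rate can only stay flat or rise as K grows, which matches the reported move from 82.4% at Pass@32 to 88.9% at Pass@8192.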

Conclusion

In this work, we propose a comprehensive pipeline for synthesizing cold-start reasoning data to advance formal theorem proving. Our data construction process is grounded in a recursive theorem-proving framework, wherein DeepSeek-V3 serves as a unified model for both subgoal decomposition and lemma formalization within the Lean 4 proof assistant. Our approach combines high-level proof sketches with formal steps, creating a sequence of manageable subgoals that can be efficiently solved using a smaller 7B model, significantly reducing computational requirements. The curriculum learning framework we developed uses these decomposed subgoals to generate increasingly difficult training tasks, creating a more effective learning progression. By pairing complete formal proofs with DeepSeek-V3’s chain-of-thought reasoning, we established valuable cold-start reasoning data that bridges informal mathematical thinking with formal proof structures. The subsequent reinforcement learning stage substantially enhanced this connection, leading to significant improvements in formal theorem proving capabilities. The resulting model, DeepSeek-Prover-V2-671B, consistently outperforms all baselines across a range of benchmarks, spanning both high-school competition problems and undergraduate-level mathematics. Our future work will focus on scaling this paradigm to an AlphaProof-like system with the ultimate aim of tackling IMO-level mathematical problems that represent the frontier of automated theorem proving challenges.
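
The summary earlier in this post describes the reinforcement learning signal as binary correct/incorrect feedback plus a consistency reward that keeps the proof structure aligned with the chain-of-thought decomposition. One way such a signal could look is sketched below. This is an assumption-heavy illustration: the `verified` flag stands in for the Lean compiler's verdict, and treating consistency as "every `have` name from the sketch reappears in the final proof" is a guess at one possible check, not the paper's stated formula.

```python
import re
from typing import List


def extract_have_names(lean_text: str) -> List[str]:
    """Collect the names introduced by `have <name> :` steps."""
    return re.findall(r"\bhave\s+(\w+)\s*:", lean_text)


def reward(proof: str, sketch: str, verified: bool, use_consistency: bool = True) -> float:
    """Binary reward, optionally zeroed when a sketched lemma is missing from the proof."""
    if not verified:
        return 0.0
    if use_consistency:
        proved = set(extract_have_names(proof))
        if any(name not in proved for name in extract_have_names(sketch)):
            return 0.0
    return 1.0


# Toy inputs (hypothetical): a sketch with two lemmas and two candidate proofs.
sketch = "have h1 : 0 ≤ a ^ 2 := by sorry\nhave h2 : 0 ≤ b ^ 2 := by sorry"
structured = (
    "have h1 : 0 ≤ a ^ 2 := sq_nonneg a\n"
    "have h2 : 0 ≤ b ^ 2 := sq_nonneg b\n"
    "exact add_nonneg h1 h2"
)
shortcut = "nlinarith [sq_nonneg a, sq_nonneg b]"
```

Here `structured` keeps both sketched lemmas and earns the full reward when verified, while `shortcut` is rewarded only if the consistency check is switched off, even though it may also be a valid proof.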
