
Capability tags

🛠

AI tool

Python Bytecode Decompiler

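Q: How do I install pychd and get started?

Visit the pychd GitHub repository and follow the README. Its quick start installs the tool with `uv tool install pychd`; you then run `pychd decompile path/to/module.pyc`. The hybrid modes also need a logged-in Codex CLI (`codex login`), while `--rules-only` makes no LLM calls at all.

Q: Is pychd free? What is the license?

pychd is free and released under the MIT license, so anyone may use, modify, and distribute it.

Q: Who is pychd for?

pychd is aimed mainly at users with some technical background, such as developers, data analysts, and AI engineers.

Q: How active is the community, and how well is the project maintained?

At the time of listing, pychd had 45 stars and 7 forks on GitHub, so it is still a small project with a modest community.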

Python-based · free and open source · runs locally, so your data stays under your control

English name: pychd

⭐ 45 Stars 🍴 7 Forks 💻 Python 📄 MIT 🏷 AI score 8.0


bytecode · decompiler · python · codex


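✦ AI Skill Hub pick

After curation and evaluation by AI Skill Hub, Python Bytecode Decompiler is rated "Highly recommended". It scores well on feature completeness, community activity, and ease of use, with an AI score of 8.0, and suits users with some technical background.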

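📚 In-depth analysis

Python Bytecode Decompiler is an open-source, Python-based tool with 45 stars on GitHub, covering the bytecode, decompiler, python, and codex topics. The main advantage of open source is full transparency: you can audit every line of code for security and extend or customize it to fit your needs.

**Why use an open-source tool rather than a commercial SaaS?**
For individual developers and privacy-conscious users, a locally deployed open-source tool keeps data on your own machine, outside any third-party provider's data policy. Open-source tools also usually have no usage caps or monthly fees: install once and use indefinitely, so for heavy use the total cost of ownership (TCO) is far lower than a subscription product. Note that pychd's --hybrid and --hybrid-rewrite modes send bytecode disassembly to the Codex backend; --rules-only keeps everything on your machine.

**Installation and environment setup**
Python Bytecode Decompiler needs a Python runtime. Manage Python versions with pyenv (or uv) instead of the system interpreter to avoid polluting the global environment. Beginners should first create a virtual environment (python -m venv venv && source venv/bin/activate) and then install dependencies; if anything goes wrong, you can delete the environment and start over without affecting the system.

**Community and maintenance**
GitHub Issues and Discussions are the fastest places to get help. Before asking, check the closed issues, where most common questions already have answers. When reporting a bug, include the output of pip list, the full error traceback, and a minimal reproducible example; this makes it much easier for the maintainers to respond. AI Skill Hub will keep tracking Python Bytecode Decompiler releases and report important feature changes.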

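📋 Tool overview

Python Bytecode Decompiler is an open-source tool written in Python, focused on decompiling Python bytecode. It is developed in the open on GitHub, so its code is transparent and auditable, and it can run locally to protect data privacy, whether for personal use or as part of a larger workflow.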

GitHub Stars	⭐ 45
Language	Python
Supported platforms	Windows / macOS / Linux
Maintenance status	Lightweight project, updated as needed
License	MIT
AI overall score	8.0
Tool type	AI tool
Forks	7

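📖 Documentation

The following content was compiled automatically by AI Skill Hub from the project information. For the full original documentation, see "Original source" at the bottom of the page.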

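📌 Key features

Free and open source; runs locally, so your data stays under your control
Open-source GitHub community with ongoing updates
Documentation and usage examples for getting started
Configurable to fit different environments
Can be integrated into an existing stack or extended

🎯 Main use cases

Run locally to protect data privacy and meet compliance requirements
Integrate into existing systems to extend your tooling
Use as an open-source building block for commercial derivative work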

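The install commands below were generated automatically from the project's language and type; the official README takes precedence.

Install commands

# Option 1: uv, as in the README's quick start
uv tool install pychd

# Option 2: pip in a virtual environment (recommended for production)
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install pychd

# Option 3: install from source (latest features)
git clone https://github.com/diohabara/pychd
cd pychd
pip install -e .

# Verify the install
pychd --help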

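📋 Installation steps

Open the GitHub repository page
Install the dependencies as described in the README
Complete any initial configuration for your environment
Start from the official examples or documentation
If you run into problems, search GitHub Issues for answers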

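The usage examples below cover the most common scenarios; the commands are taken from the project README.

Common commands

# Command-line help
pychd --help

# Decompile one .pyc (hybrid mode; uses your existing codex login session)
pychd decompile path/to/module.pyc --hybrid-rewrite --backend codex

# Fully offline and deterministic: rule pass only, no LLM calls
pychd decompile path/to/module.pyc --rules-only

# Decompile a whole package tree, mirroring its structure into recovered/
pychd decompile path/to/package/ -o recovered/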

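Per the README, pychd needs no configuration file or API key of its own: the hybrid modes reuse your existing codex login session, and the model is selected in the Codex CLI configuration (or by passing -c model=...).

Configuration example

# ~/.codex/config.toml (Codex CLI configuration used by --backend codex)
model = "gpt-5.5"

# Hybrid run: rule pass + one Codex call per module
pychd decompile path/to/module.pyc --hybrid-rewrite --backend codex

# Offline run: no LLM calls, so no Codex configuration is needed
pychd decompile path/to/module.pyc --rules-only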

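📑 README deep dive · documentation completeness 81/100 · includes workflow diagram

The content below was parsed directly from the GitHub README, preserving code blocks, tables, and list structure.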

PyChD

A Python .pyc decompiler that reads CPython 3.0 – 3.14 bytecode and recovers the original .py. The pipeline is a deterministic rule pass (declarations, signatures, decorators, PEP 695 generics, PEP 749 lazy annotations) followed by one Codex CLI call per module to fill the bodies and module-level statements the rule pass can't recover from opcodes alone.

Recovery rate by corpus — Sig / Decl / Strict / BN / BS

What's hard about each version

The bytecode specification is not stable across Python versions. Below is a tour of the biggest source of pain for each release.

3.6 — wordcode

Every instruction became exactly two bytes: 1 opcode + 1 argument. Before 3.6 some opcodes took multi-byte arguments. Decompilers from the 3.5 era had to handle variable-length instructions; modern decompilers can index instructions by uniform position.
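On CPython 3.6 and later you can see the fixed two-byte layout directly: `co_code` always has an even length, and walking it in (opcode, argument) pairs is enough to list the raw instructions. This is a minimal sketch. It does not merge `EXTENDED_ARG` prefixes, and on 3.11+ it also shows the inline `CACHE` entries that `dis` normally hides.

```python
import dis


def raw_wordcode(code):
    """Yield (offset, opname, arg) by reading co_code two bytes at a time."""
    raw = code.co_code
    for offset in range(0, len(raw), 2):
        op, arg = raw[offset], raw[offset + 1]  # 1 opcode byte + 1 argument byte
        yield offset, dis.opname[op], arg


code = compile("x = 1 + y", "<demo>", "exec")
for offset, name, arg in raw_wordcode(code):
    print(f"{offset:3} {name:<20} {arg}")
```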

3.7 — keyword arguments carry names as a tuple const

f(x=1) used to emit LOAD_CONST 1 and a magic CALL_FUNCTION_KW whose argument said "the top 1 thing is a keyword". From 3.7 the names of the keywords are pushed as a tuple constant:

LOAD_NAME f
LOAD_CONST 1
LOAD_CONST ('x',)    ← names tuple
CALL_FUNCTION_KW 1

Decompilers have to read that tuple constant to know that the 1 is bound to x, not positional.
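On the CPython versions covered here, you can confirm that the keyword names travel as a tuple constant: compiling a call with keywords leaves that tuple in `co_consts`, even though the call opcodes around it change from release to release. A small sketch; `keyword_name_tuples` is a hypothetical helper that keeps only non-empty tuples of strings.

```python
import dis


def keyword_name_tuples(src: str):
    """Return the tuple-of-str constants, which hold the keyword names."""
    code = compile(src, "<demo>", "eval")
    return [c for c in code.co_consts
            if isinstance(c, tuple) and c and all(isinstance(s, str) for s in c)]


print(keyword_name_tuples("f(x=1)"))          # [('x',)]
print(keyword_name_tuples("f(1, y=2, z=3)"))  # [('y', 'z')]
dis.dis(compile("f(x=1)", "<demo>", "eval"))  # shows the names tuple being loaded
```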

3.10 — `match` statements (PEP 634)

match x:
    case 0: ...
    case _: ...

becomes a chain of MATCH_CLASS / MATCH_KEYS / MATCH_MAPPING opcodes. Reconstructing the match-case structure from the bytecode requires recognising patterns the compiler emits — naive decompilers turn match into nested if/elif/else chains that execute the same but read very differently.
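To see which pattern opcodes a match statement produces, you can disassemble one that contains a mapping pattern and a class pattern. This sketch requires CPython 3.10+, and `match_opcodes` is a hypothetical helper. The wildcard `case _` emits no pattern opcode of its own.

```python
import dis
import sys

SRC = """
match x:
    case {"k": v}:
        pass
    case int():
        pass
    case _:
        pass
"""


def match_opcodes(src: str) -> set:
    """Collect the MATCH_* opcodes in the module-level bytecode."""
    code = compile(src, "<demo>", "exec")  # compiled only, never executed
    return {ins.opname for ins in dis.get_instructions(code)
            if ins.opname.startswith("MATCH_")}


if sys.version_info >= (3, 10):
    print(sorted(match_opcodes(SRC)))
```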

3.11 — PEP 657 zero-cost exceptions

The biggest spec change in years. Try/except no longer uses SETUP_FINALLY blocks. Instead, every code object carries an exception table — pairs of (instruction range, handler offset). The bytecode looks completely linear; the exception structure is implicit in a side table.

Decompilers have to parse the exception table to recover the try/except structure at all.
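You can confirm this quickly on CPython 3.11+. A function with try/except has no setup opcode in its instruction stream, but its code object carries a non-empty `co_exceptiontable`, and `dis.dis` prints the decoded table at the end of the listing. This is a minimal sketch.

```python
import dis
import sys


def risky():
    try:
        return 1 / 0
    except ZeroDivisionError:
        return None


opnames = {ins.opname for ins in dis.get_instructions(risky)}
if sys.version_info >= (3, 11):
    # The try/except structure lives in a side table, not in setup opcodes.
    print("SETUP_FINALLY" in opnames)             # False
    print(len(risky.__code__.co_exceptiontable))  # non-zero
    dis.dis(risky)  # the listing ends with an "ExceptionTable:" section
```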

3.12 — PEP 709 comprehension inlining

This silently broke every decompiler. In 3.11:

x = [i * 2 for i in range(10)]

emits a separate <listcomp> code object that the outer module calls. In 3.12 the body of the comprehension is inlined directly into the enclosing scope — there's no <listcomp> code object to recurse into anymore. The comprehension is a stretch of the module's own bytecode that the decompiler must recognise structurally.
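You can observe the change by listing the code objects nested in a module's constants. In this minimal sketch, versions up to 3.11 hold a separate `<listcomp>` code object. From 3.12 the list is empty because the loop runs inline in the module's own bytecode.

```python
import sys
import types

code = compile("x = [i * 2 for i in range(10)]", "<demo>", "exec")
nested = [c.co_name for c in code.co_consts if isinstance(c, types.CodeType)]
print(sys.version_info[:2], nested)
# ['<listcomp>'] up to 3.11; [] from 3.12, where the comprehension is inlined
```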

3.12 — `CALL_INTRINSIC_1`

Several special-purpose opcodes (notably the legacy IMPORT_STAR) collapse into CALL_INTRINSIC_1, whose integer argument selects the intrinsic to run.
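To watch the collapse happen, you can compile a star import and look for either opcode. In this small sketch, CPython 3.12 and later (where CALL_INTRINSIC_1 is available) should report CALL_INTRINSIC_1, and older versions should report IMPORT_STAR.

```python
import dis
import sys

code = compile("from os import *", "<demo>", "exec")  # compiled, not executed
ops = {ins.opname for ins in dis.get_instructions(code)}
for ins in dis.get_instructions(code):
    if ins.opname in ("IMPORT_STAR", "CALL_INTRINSIC_1"):
        # On 3.12+ argrepr names the intrinsic (for example INTRINSIC_IMPORT_STAR)
        print(sys.version_info[:2], ins.opname, ins.arg, ins.argrepr)
```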

Quick start

```bash
# The decompiler itself.
uv tool install pychd
pychd decompile path/to/module.pyc --hybrid-rewrite --backend codex

# Optional: the contamination-free benchmarking harness used by this
# repo. Install both to drop the same fuzz → obfuscate → decompile
# pipeline into your own decompiler's CI.
uv tool install pychd-pyfuzz pychd-pyobf   # uv users
pip install pychd-pyfuzz pychd-pyobf       # pip users
pychd-pyfuzz emit --target 3.14 --seed 0   # one random valid Python module
pychd-pyobf rewrite IN.pyc OUT.pyc         # anonymise a .pyc in place
```

--hybrid-rewrite is the default at the CLI. It uses your existing codex login session; set `model = "gpt-5.5"` in `~/.codex/config.toml` (or pass `-c model=...`) to control which model is used. No extra API key is needed.

If you want a fully offline, deterministic, audit-friendly run with no LLM calls and no contamination risk, use --rules-only; the Rule-only row of the headline benchmark table reports that path.

Adopting the same harness in your own decompiler

pychd-pyfuzz and pychd-pyobf are independent PyPI distributions (see §Headline for what they do). pip install pychd-pyfuzz pychd-pyobf and you can run the same fuzz → obfuscate → decompile audit against any Python decompiler. Expected shape of an honest result:

- Rule-only strict_match should be within a few points of the raw-corpus number: the rule pass is bytecode-driven and identifier-agnostic, so anonymisation should not move it.
- Hybrid-rewrite strict_match will drop on -obf corpora by an amount equal to the LLM's contamination advantage on that corpus. A drop of more than 30 pt is strong evidence that the upstream hybrid score is contamination-driven; this repo's worst case is 13 pt (stdlib), with most contaminated corpora landing under 10 pt.
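The decision rule for the hybrid result can be written as a small calculation. This sketch uses hypothetical helper names and made-up numbers. The 30-point threshold is the one stated above.

```python
def contamination_gap(raw_strict: float, obf_strict: float) -> float:
    """Drop in strict_match, in percentage points, from raw to -obf corpus."""
    return raw_strict - obf_strict


def verdict(gap_pts: float, threshold: float = 30.0) -> str:
    """Apply the > 30 pt rule of thumb to a hybrid-rewrite gap."""
    if gap_pts > threshold:
        return "strong evidence of contamination"
    return "no strong evidence of contamination"


# Hypothetical numbers for illustration only, not benchmark results
gap = contamination_gap(raw_strict=90.0, obf_strict=55.0)
print(gap, verdict(gap))  # 35.0 strong evidence of contamination
```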

Decompile an entire project tree (mirrors structure into output dir):

uv run pychd decompile path/to/package/ -o recovered/

How it works — compiler-pipeline perspective

Step 1: Python compiles your source to bytecode

The CPython compiler takes your foo.py and emits foo.pyc — a binary file containing a code object for the module plus a nested code object for every function and class. Each code object holds:

- the bytecode instructions (one-byte opcode + one-byte argument, since 3.6 "wordcode"),
- a co_consts tuple of constants used in those instructions,
- a co_names tuple of identifier names,
- a co_varnames tuple of local variable names,
- argument counts (co_argcount, co_kwonlyargcount, etc.),
- flag bits (co_flags: is it a coroutine? a generator? does it use *args?).

You can poke at this on any Python install:

>>> import dis
>>> def f(a, b=1): return a + b
>>> dis.dis(f)
  1           RESUME                   0
              LOAD_FAST                0 (a)
              LOAD_FAST                1 (b)
              BINARY_OP                0 (+)
              RETURN_VALUE
>>> f.__code__.co_argcount, f.__code__.co_varnames
(2, ('a', 'b'))

What pychd builds internally for `from os.path import join`:

ir.FromImport(module="os.path", level=0, names=[("join", None)])
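The README does not show how that node is derived, but the bytecode shape suggests a plausible sketch. At module level, `from X import a` pushes the relative-import level and a fromlist tuple and runs IMPORT_NAME. It then runs an IMPORT_FROM plus a store for each imported name. The `FromImport` dataclass below is a hypothetical stand-in that mirrors the fields shown above; it is not pychd's real class. The walker also ignores edge cases such as imports inside functions.

```python
from __future__ import annotations

import dis
from dataclasses import dataclass, field


@dataclass
class FromImport:
    """Hypothetical stand-in for pychd.ir.FromImport (field names from the README)."""
    module: str
    level: int
    names: list[tuple[str, str | None]] = field(default_factory=list)


def recover_from_imports(src: str) -> list[FromImport]:
    """Rebuild `from X import a, b as c` statements from module-level bytecode."""
    ins = list(dis.get_instructions(compile(src, "<demo>", "exec")))
    found = []
    for i, op in enumerate(ins):
        if op.opname != "IMPORT_NAME":
            continue
        # The two instructions before IMPORT_NAME push the level and the fromlist.
        level, fromlist = ins[i - 2].argval, ins[i - 1].argval
        if not fromlist or fromlist == ("*",):
            continue  # plain `import x`, or a star import
        node = FromImport(module=op.argval, level=level)
        j = i + 1
        # Each imported name is an IMPORT_FROM followed by the store of its binding.
        while j + 1 < len(ins) and ins[j].opname == "IMPORT_FROM":
            name, bound = ins[j].argval, ins[j + 1].argval
            node.names.append((name, None if bound == name else bound))
            j += 2
        found.append(node)
    return found


print(recover_from_imports("from os.path import join"))
```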

then compile with Python 3.8 and run pychd + decompyle3.


Broader head-to-head — 23-module stdlib + PyPI subset

Below is the broader comparison against a 23-module mix of stdlib + curated-PyPI modules. The PyPI subset overlaps published corpora (six, packaging, certifi, idna, charset_normalizer) that the Codex backend almost certainly saw at training time, so all the caveats from §LLM contamination disclosure apply here too.

Tool	Source	Install	Coverage	Best Py version (this run)
[`uncompyle6`](https://pypi.org/project/uncompyle6/)	PyPI	`uv sync`	2.4 – 3.8	3.8
[`decompyle3`](https://github.com/rocky/python-decompile3)	PyPI	`uv sync`	3.7 / 3.8 only	3.8
[`pycdc`](https://github.com/zrax/pycdc)	git source build	`just decompilers-build`	1.0 – 3.10	3.10
[`PyLingual`](https://github.com/syssec-utd/pylingual)	podman image (ML-based)	`just decompilers-build`	3.6 – 3.13	3.13

**Each external tool is evaluated on its own highest-supported Python version**, not forced down to a shared 3.8 baseline. uncompyle6 and decompyle3 are scored on 3.8 (their newest supported release), pycdc on 3.10, and PyLingual on 3.13. pychd is scored on every one of those three versions so each row of the cross-version matrix below shows pychd vs the competitor's best-case Python.

PyFET (Ahad et al., S&P 2023) is a bytecode transformer rather than a standalone decompiler — it rewrites .pyc files so they become readable by uncompyle6/decompyle3. Integrating it would require composing the transformer with one of those decompilers end-to-end, which is on the roadmap but not in this comparison.


Worked example: `Lib/_colorize.py`

The two CPython stdlib modules that fail rule-only signature_match (_colorize.py, _pylong.py) contain if False: / if 0: guards. In _colorize.py the guarded block spans lines 8-12.
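The bytecode shows why the guarded block cannot be recovered: CPython drops statements under a constant-false `if` at compile time, so the .pyc never contains them. This is a minimal sketch, and `body_opnames` is a hypothetical helper.

```python
import dis
import types

GUARDED = """
def f():
    if False:
        import typing
    return 1
"""

UNGUARDED = """
def f():
    import typing
    return 1
"""


def body_opnames(src: str) -> list:
    """Opnames of the first function defined in `src`."""
    module = compile(src, "<demo>", "exec")
    func = next(c for c in module.co_consts if isinstance(c, types.CodeType))
    return [ins.opname for ins in dis.get_instructions(func)]


print("IMPORT_NAME" in body_opnames(GUARDED))    # False: the import is gone
print("IMPORT_NAME" in body_opnames(UNGUARDED))  # True
```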

More CLI examples


Example 1: a re-export module (full rule recovery, 0 LLM calls)

Original source (a typical __init__.py):

"""Public surface for the foo package."""

from .core import Bar, Baz
from .util import parse, as_dict
from .errors import FooError

__all__ = ["Bar", "Baz", "FooError", "as_dict", "parse"]

After pychd decompile --rules-only:

"""Public surface for the foo package."""

from .core import Bar, Baz
from .util import parse, as_dict
from .errors import FooError

__all__ = ['Bar', 'Baz', 'FooError', 'as_dict', 'parse']

Identical modulo single vs double quotes in __all__. Zero LLM cost, recovered in 0.9 ms.
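The quote difference is expected, because quote style never reaches the .pyc: both spellings produce the same AST and the same bytecode. As an illustration, `ast.unparse` (Python 3.9+) renders string constants with single quotes, which matches the recovered output. The README does not say which printer pychd uses.

```python
import ast

double = 'x = ["Bar", "Baz"]'
single = "x = ['Bar', 'Baz']"

# Same AST and same bytecode: the quote style is lost at parse time
assert ast.dump(ast.parse(double)) == ast.dump(ast.parse(single))
assert compile(double, "<a>", "exec").co_code == compile(single, "<b>", "exec").co_code

# ast.unparse renders string constants with repr(), i.e. single quotes
print(ast.unparse(ast.parse(double)))  # x = ['Bar', 'Baz']
```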

Example 2: a dataclass module (full hybrid-rewrite recovery)

Original:

from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class AgentMessage:
    type: str
    uuid: str
    agent_id: str
    message: Any = None

    @classmethod
    def from_json(cls, value):
        return cls(
            type=value["type"],
            uuid=value["uuid"],
            agent_id=value["agentId"],
            message=value.get("message"),
        )

After pychd decompile --hybrid-rewrite --backend codex (one LLM call per module; rule pass first, LLM corrects bodies + module-level recovery):

from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class AgentMessage:
    type: str
    uuid: str
    agent_id: str
    message: Any = None

    @classmethod
    def from_json(cls, value):
        return cls(
            type=value["type"],
            uuid=value["uuid"],
            agent_id=value["agentId"],
            message=value.get("message"),
        )

Byte-for-byte recovery on this shape: bytecode_exact round-trips under the producing 3.14 interpreter. The class declaration, every annotation, the @classmethod method decorator, the outer @dataclass(frozen=True) decorator with its keyword argument, and every method signature come straight from the rule pass; the LLM fills the body from the signature and disassembly it receives.
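You can sketch a bytecode_exact-style check by compiling both sources on the same interpreter and comparing their code objects while ignoring the line-number tables. This is an illustrative approximation, not pychd's comparator. `bytecode_exact` and `_fingerprint` are hypothetical helpers that recursively compare instructions, names, local variables, and typed constants.

```python
import types


def _fingerprint(code: types.CodeType) -> tuple:
    """Identity of a code object on this interpreter, ignoring line numbers."""
    consts = tuple(
        _fingerprint(c) if isinstance(c, types.CodeType) else (type(c), c)
        for c in code.co_consts
    )
    return (code.co_name, code.co_code, code.co_names, code.co_varnames, consts)


def bytecode_exact(original: str, recovered: str) -> bool:
    """True if both sources compile to the same bytecode on this interpreter."""
    a = compile(original, "<original>", "exec")
    b = compile(recovered, "<recovered>", "exec")
    return _fingerprint(a) == _fingerprint(b)


print(bytecode_exact("def f(x):\n    return x + 1\n",
                     "def f(x):\n\n    return x + 1\n"))  # True
```

Line numbers live in a separate table, so a recovery that differs from the original only in blank lines still passes this check.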

For the deterministic-only path:

<details><summary>Same input, <code>--rules-only</code> (no LLM)</summary>

from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class AgentMessage:
    type: str
    uuid: str
    agent_id: str
    message: Any = None

    @classmethod
    def from_json(cls, value):
        return cls(type=value['type'], uuid=value['uuid'], agent_id=value['agentId'], message=value.get('message'))

The trivial-body matcher even lifts this single-statement method into a real return cls(...), so the rules-only output here is already behaviorally equivalent; the LLM is only needed for multi-statement bodies and complex module-level constructs.

</details>

Example 3: a generic class (PEP 695, full hybrid-rewrite recovery)

Original:

class Stack[T]:
    def __init__(self):
        self.items: list[T] = []
    def push(self, x: T) -> None:
        self.items.append(x)

After pychd decompile --hybrid-rewrite --backend codex:

class Stack[T]:
    def __init__(self):
        self.items: list[T] = []

    def push(self, x: T) -> None:
        self.items.append(x)

Identical modulo whitespace. The PEP 695 type parameter [T] survives the rule pass — pychd recognises the synthetic <generic parameters of Stack> wrapper code object that the CPython compiler emits and unpacks it. Class-body and module-level annotations are recovered from the PEP 749 __annotate__ closure; parameter annotations (x: T) live in a separate per-method closure and the LLM rebuilds them from the disassembly during the rewrite step.
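You can find the wrapper by walking the nested code objects of a compiled generic class. This sketch needs CPython 3.12+ for the PEP 695 syntax, and `nested_code_names` is a hypothetical helper.

```python
import sys
import types

SRC = """
class Stack[T]:
    def push(self, x: T) -> None:
        self.items.append(x)
"""


def nested_code_names(code: types.CodeType):
    """Yield the co_name of every code object nested inside `code`."""
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield const.co_name
            yield from nested_code_names(const)


if sys.version_info >= (3, 12):  # PEP 695 syntax needs 3.12+
    names = list(nested_code_names(compile(SRC, "<demo>", "exec")))
    print(names)  # includes '<generic parameters of Stack>', 'Stack', 'push'
```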

Example 4: a HumanEval problem (full bytecode round-trip)

Original (HumanEval_0.py):

from typing import List


def has_close_elements(numbers: List[float], threshold: float) -> bool:
    """ Check if in given list of numbers, are any two numbers closer to each other than
    given threshold.
    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
    False
    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
    True
    """
    for idx, elem in enumerate(numbers):
        for idx2, elem2 in enumerate(numbers):
            if idx != idx2:
                distance = abs(elem - elem2)
                if distance < threshold:
                    return True

    return False

After pychd decompile --hybrid-rewrite --backend codex:

from typing import List


def has_close_elements(numbers: List[float], threshold: float) -> bool:
    """ Check if in given list of numbers, are any two numbers closer to each other than
    given threshold.
    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
    False
    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
    True
    """
    for idx, elem in enumerate(numbers):
        for idx2, elem2 in enumerate(numbers):
            if idx != idx2:
                distance = abs(elem - elem2)
                if distance < threshold:
                    return True
    return False

bytecode_exact, bytecode_normalized, behavioral_smoke, and functional_correctness (the HumanEval check(candidate) oracle) all pass — the recovered module compiles to byte-identical bytecode and passes every assertion. Only difference from the original is a single blank line before the trailing return False, which the AST comparator normalises away.


Step 4 (optional, `--hybrid` mode): the LLM fills function bodies

For every UnknownBlock left in the tree, pychd sends a function-body-sized prompt to the configured LLM:

You are a Python decompiler.
The following Python 3.14 bytecode is the body of:
    def from_json(cls, value)
Reconstruct the original Python source for *just the body*…

LOAD_FAST_BORROW cls
LOAD_FAST_BORROW value
LOAD_CONST 'type'
BINARY_SUBSCR
…

The LLM never sees the rest of the module; the rule pass already nailed the signatures, imports, and names. This keeps prompts small, costs low, and identifier hallucination rare. One LLM call per body, so on modules with many small functions the cost stays modest.
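You can assemble a per-body prompt of that shape from a function object with `inspect.signature` and `dis`. This is a hypothetical sketch, not pychd's prompt builder. It disassembles a live function with the running interpreter instead of reading a .pyc, and its wording simply mirrors the excerpt above.

```python
import dis
import inspect
import sys


def build_body_prompt(func) -> str:
    """Hypothetical per-body prompt: signature header plus body disassembly."""
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    header = f"def {func.__name__}{inspect.signature(func)}"
    return (
        "You are a Python decompiler.\n"
        f"The following Python {version} bytecode is the body of:\n"
        f"    {header}\n"
        "Reconstruct the original Python source for *just the body*.\n\n"
        + dis.Bytecode(func).dis()
    )


def from_json(cls, value):
    return cls(type=value["type"], uuid=value["uuid"])


print(build_body_prompt(from_json))
```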

Step 5 (optional, `--hybrid-rewrite` mode): the LLM rewrites the whole module

The per-body path in Step 4 fixes bodies but leaves any module-level recovery mistakes (an inlined dict comprehension that collapsed to X = {}, a for-loop side effect that wasn't preserved) unchanged. --hybrid-rewrite adds a final whole-module rewrite call:

You are a Python decompiler. Reconstruct the original Python 3.14
source for an entire module from its disassembled bytecode.

You are given two inputs:
1. The complete disassembled bytecode (authoritative).
2. A partial rule-based recovery (declarations reliable; bodies +
   some module-level statements may be wrong).

Bytecode disassembly:


Partial recovery:


Output ONLY valid Python 3.14 source code. Preserve every
class/function/import name from the partial recovery. Fix
module-level statements the rule pass got wrong by reading the
bytecode. The output must pass `ast.parse` and `py_compile`.

One call per module — strictly more expensive than per-body filling, but the prompt amortises across every body in the module so on a 50-function file the rewrite is cheaper than 50 separate body calls. The output is sanity-checked with ast.parse and the rule-only output is used as a fallback if the rewrite fails to parse.
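The parse-or-fall-back step can be sketched as follows. `choose_output` is a hypothetical name. The built-in `compile()` stands in for `py_compile`, which runs the same syntax check on a file.

```python
import ast


def choose_output(rewrite: str, rules_only: str) -> str:
    """Keep the LLM rewrite only if it parses and compiles; else fall back."""
    try:
        ast.parse(rewrite)
        compile(rewrite, "<rewrite>", "exec")  # roughly what py_compile checks
    except (SyntaxError, ValueError):
        return rules_only
    return rewrite


print(choose_output("def broken(:\n", "def broken():\n    pass\n"))
```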

This is the mode the headline benchmark numbers are reported under, and the one the README's worked examples show.

Aggregate over all 2,794 modules

Mode	`parses`	`signature_match`	`declaration_match`	`strict_match`	`BS`
Rule-only (no LLM, deterministic)	100 %	99.7 %	99.7 %	43.1 %	19.3 %
Hybrid-rewrite (rule pass + 1 Codex call/module)	100 %	99.7 %	99.7 %	86.5 %	43.2 %

Pass@1 on HumanEval: rule-only 2.4 % → hybrid-rewrite 97.6 %, but every HumanEval prompt is in the backend model's training data, so this is mostly an LLM-solves-HumanEval-from-memory signal rather than a decompilation signal.

Per-corpus recovery rate (rule-only vs hybrid-rewrite)

Per-tool comparison at each decompiler's preferred Python version

The take-away for anyone reading benchmark numbers for an LLM-assisted decompiler: separate the rule-only baseline from the LLM lift, and measure on a corpus the backend model cannot have seen. This repo is the first I know of to ship both halves of that — pychd-pyfuzz + pychd-pyobf are independent PyPI packages so other Python decompiler authors can drop the same harness into their CI. See §LLM contamination disclosure for the worked example (_colorize.py) and §Comparison with prior Python decompilers for the 23-module stdlib + PyPI head-to-head against uncompyle6 / decompyle3 / pycdc / PyLingual.

Pipeline at a glance

pychd routes every .pyc through two passes:

- Rule pass owns everything CPython compiles to a deterministic bytecode shape: imports, class/function declarations, signatures, decorators (incl. arguments), PEP 695 generics, PEP 749 lazy annotations, common one-line bodies (return self.x, return cls(...), constructor self.x = x, etc.). Output is reproducible offline and audit-friendly. Bodies it can't recover remain as pass.
- Codex rewrite runs once per module with the disassembly + the rule pass's partial output as context. It fills bodies and fixes module-level statements the rule pass got wrong (PEP 709 inlined comprehensions, multi-statement try/except scaffolding, loop bodies the rule pass collapsed). Bytes go in, source comes out; the LLM never sees the original source.

(Aggregate numbers across all 2,794 modules are in the headline table at the top of the README. Per-axis ceilings are below.)

flowchart LR
    pyc["foo.pyc"] -- detect magic --> ver["Python version"]
    ver -- 3.14 --> nat["native rule pass<br/>(deterministic, no LLM)"]
    ver -- "3.0–3.13" --> cv["cross-version rule pass<br/>(xdis, no LLM)"]
    nat --> ir["pychd.ir<br/>(typed IR)"]
    cv --> ir
    ir -. partial recovery .-> llm["Codex rewrite<br/>(1 call / module)"]
    ir & llm --> rec["recovered .py"]
    style nat fill:#d4ffd4
    style cv fill:#d4e6ff
    style rec fill:#fff4d4

Why bodies-as-pass happens in rule-only: a function body that compiles to non-trivial control flow (multiple statements, loops, branches, match) is many-to-one in bytecode — the same opcode sequence can come from several different source expressions. Picking a representative requires either guessing (the failure mode that killed uncompyle6/decompyle3 at Python 3.8) or asking an oracle. pychd chooses the oracle, so the rule pass deliberately leaves an UnknownBlock for the rewrite step to fill.

Hybrid-rewrite — rule pass + one LLM rewrite per module (fixes body fills and module-level recovery). Recommended when you

Detailed recovery walkthrough — what happens to a real module

This section shows the four-stage recovery pipeline against a single example module — what each stage adds — so you can see why both the rule pass and the LLM are needed and what they contribute respectively.

The example: a slimmed-down dataclass module with three things the rule pass handles trivially (imports, decorators, signatures), one thing the trivial-body matcher lifts (a single-statement from_json classmethod), one thing only the LLM body fill can recover (a multi-statement __post_init__), and one thing only the `hybrid-rewrite` module-level fix-up can clean (a module-level dict-comprehension that the rule pass renders as `X = {}`).

Original agent.py:

from dataclasses import dataclass, field
from typing import Any

_ALIAS = {old: new for old, new in [('uid', 'uuid'), ('msg', 'message')]}

@dataclass(frozen=True)
class AgentMessage:
    type: str
    uuid: str
    agent_id: str
    message: Any = None
    tags: list[str] = field(default_factory=list)

    def __post_init__(self):
        if not self.type:
            raise ValueError("type must be non-empty")
        object.__setattr__(self, "type", self.type.lower())

    @classmethod
    def from_json(cls, value):
        return cls(
            type=value["type"],
            uuid=value["uuid"],
            agent_id=value["agentId"],
            message=value.get("message"),
        )

Step D: hybrid-rewrite corrects module-level mis-recoveries

pychd decompile --hybrid-rewrite --backend codex adds a final whole-module rewrite step: the LLM gets the disassembly of the entire module plus the rule pass' partial output, and emits the corrected full source. This catches:

- Module-level comprehensions the rule pass collapsed to X = {}, X = [] or X = ...
- For-loop bodies whose loop variable leaked into top-level declarations (now suppressed by the rule pass' FOR_ITER skip, but the rewrite repairs older recoveries cleanly).
- Multi-line dict literals whose MAP_ADD accumulator pattern was mis-read.
- Module-level if __name__ == "__main__": guards.
- Multi-statement try/except scaffolding.

Diff vs Step C:

-_ALIAS = {}
+_ALIAS = {old: new for old, new in [('uid', 'uuid'), ('msg', 'message')]}
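The opcodes help explain the `_ALIAS = {}` mis-recovery. In CPython 3.12+ the comprehension is inlined (PEP 709), so the module's own bytecode contains the BUILD_MAP / MAP_ADD accumulator loop. A reader that keeps only the empty-map build and the final STORE_NAME would plausibly produce `X = {}`. The sketch below only inspects opcodes, and `all_opnames` is a hypothetical helper, not part of pychd.

```python
import dis
import sys
import types

SRC = "_ALIAS = {old: new for old, new in [('uid', 'uuid'), ('msg', 'message')]}"


def all_opnames(code: types.CodeType) -> list:
    """Opnames of a code object and, recursively, every nested code object."""
    names = [ins.opname for ins in dis.get_instructions(code)]
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names += all_opnames(const)
    return names


module = compile(SRC, "<demo>", "exec")
module_ops = [ins.opname for ins in dis.get_instructions(module)]
print(sys.version_info[:2], "MAP_ADD" in module_ops)  # True on 3.12+ (inlined)
print({"BUILD_MAP", "MAP_ADD"} <= set(all_opnames(module)))  # True everywhere
```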

Cost: one LLM call per module instead of one per body, so on modules with many small bodies (stdlib-full, pypi-top20) the rewrite is actually cheaper than per-body hybrid. The trade-off is prompt size — the rewrite sends the full module disassembly, so very large modules push closer to the model's context window. On the benchmark corpora this is rarely an issue (the largest single file fits comfortably).

This is the mode the headline numbers in Benchmarks are reported under.

Rule-only vs hybrid-rewrite ceiling

What each axis can / cannot recover from bytecode alone, aggregated over all 2,794 modules:

Axis	Rule-only	Hybrid-rewrite	What the rule pass cannot reach without an oracle
`parses`	100 %	100 %	—
`signature_match`	99.7 %	99.7 %	Residual is `if False:` / `if 0:` guards (`_colorize.py`, `_pylong.py`) whose contents the constant folder erases — no decompiler can recover them. Hybrid does not move the needle here. See [§LLM contamination disclosure](#llm-contamination-disclosure).
`declaration_match`	99.7 %	99.7 %	Same.
`strict_match`	43.1 %	86.5 %	CPython normalises docstrings via `inspect.cleandoc`, folds constants, and re-emits expressions in canonical form. The rewrite re-derives the canonical form from disassembly.
`BS` (behavioral_smoke)	19.3 %	43.2 %	A `pass`-bodied recovery imports but exposes no callable behaviour beyond signatures. Anonymised corpora drop hard here (see contamination differential).
`BN` (bytecode_normalized)	—	48.6 %	Tolerates lnotab + specialised-opcode noise but body recovery still required.
`FC` (Pass@1, HumanEval only)	2.4 %	97.6 %	The recovered module must behave like the original. HumanEval is published; the Pass@1 lift is largely memorisation rather than decompilation.

Comparison with prior Python decompilers

Four publicly available decompilers compete with pychd on Python 3.x bytecode. Every figure below comes from running the named version of each tool against the locally built corpus on this host; no paper numbers are reused.

The headline comparison axis is strict_match (stripped-AST equality). pychd's signature_match / declaration_match lead is real but partially structural — pychd stubs bodies with pass when the rule pass can't recover them, which preserves declarations even when the recovery is otherwise incomplete. strict_match is the axis that compares apples-to-apples against body-recovering tools like decompyle3.

Head-to-head on `synthetic` — Python 3.8

The eight synthetic modules were compiled with Python 3.8 and handed to every 3.8-capable tool we have. Read this with the §LLM contamination disclosure in mind: these modules were drafted with LLM assistance during this project's development, so a high pychd score here is not evidence of contamination-free generalisation. We keep the table because it still measures whether the bytecode-driven pipeline produces syntactically valid, AST-matching source from a Python 3.8 .pyc, which decompyle3 fails to do on 2 of the 8 modules even with the source pattern available in its training data.

Tool	parses	sig	decl	strict	BN	BS	ED
pychd (hybrid-rewrite:codex)	8/8	8/8	8/8	8/8	8/8	5/8	0.968
`decompyle3` 3.9.3	6/8	6/8	6/8	3/8	0/8	0/8	0.551
`uncompyle6` 3.9.3	not run on this corpus yet	—	—	—	—	—	—

Source: assets/_synthetic_comparison.json (commit-tracked). Reproduce:

uv run python tools/build_corpora.py --only synthetic

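🇨🇳 Chinese documentation mirror (AI translation, 2026-05-30)

The English README sections were machine-translated into Chinese summaries for quick reading. The full original text is in the "📑 README deep dive" section above.

📌 Introduction

PyChD is a decompiler for Python developers that reads CPython 3.0 to 3.14 bytecode and recovers the original .py source. Its pipeline runs a deterministic rule pass that recovers declarations, signatures, and similar structure, followed by one Codex LLM call per module to fill in the bodies, turning binary code objects into readable source.

⚡ Features

PyChD copes with the fact that the bytecode specification changes between Python versions. For example, it handles the wordcode format introduced in Python 3.6 (every instruction is a fixed 2 bytes), so instructions can be indexed and decoded reliably across CPython versions.

🛠 Installation

Install PyChD with uv (`uv tool install pychd`, as in the README's quick start). To decompile an entire project tree and mirror its directory structure into an output directory, run `uv run pychd decompile path/to/package/ -o recovered/`. Running through uv keeps dependencies isolated.

🚀 Usage

PyChD supports several modes. For the most complete recovery, use `--hybrid-rewrite` with `--backend codex`. When rule-based recovery is enough, `--rules-only` makes no LLM calls and is very efficient for cases such as re-export modules.

⚙️ Configuration

In the optional `--hybrid` mode, PyChD sends each function body left as an UnknownBlock to the configured LLM. With `--hybrid-rewrite`, it also performs a final whole-module rewrite in which the LLM corrects module-level recovery errors, such as inlined dict comprehensions or lost loop side effects.

🔌 API notes

With `--backend codex`, PyChD authenticates through your existing `codex login` session, so no API key needs to be configured.

🔄 Workflow / modules

PyChD's core workflow has two stages. First, the rule pass handles everything deterministic in the bytecode: imports, class/function declarations, signatures, decorators, and newer features such as PEP 695 generics and PEP 749 lazy annotations. Then the hybrid-rewrite stage uses an LLM to fill in the logic the rule pass cannot fully recover and to rewrite the module as a whole.

🎯 aiskill88 AI review: grade A (2026-05-26)

A high-quality Python bytecode decompiler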

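📚 Practical guide (common questions)

Who it's for

Reverse-engineering and code-analysis users who need to recover source from .pyc files
Authors of Python decompilers who want a contamination-aware benchmark harness

Best practices

Use --rules-only when you need an offline, deterministic, audit-friendly run; switch to --hybrid-rewrite when you need full function bodies

Common mistakes

Looking for an API key to configure: the Codex backend uses your existing codex login session
Python dependency conflicts: isolate the environment with venv or uv

Deployment

CLI: install with uv tool install pychd (or pip) and run it from the command line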


👥 Target audience

AI enthusiasts · researchers and students · developers and engineers · tech founders


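⚖️ Pros and cons

✅ Pros

+MIT license, free for commercial use
+Fully open source, no license fees
+Runs locally, so your data stays under your control (fully offline with --rules-only)
+Developer community support: issues can be searched and raised

⚠️ Cons

−Installation and initial setup may require some technical background
−Feature completeness is usually below that of mature commercial products
−Support relies mainly on the open-source community, so response times vary

⚠️ Notes on use

AI Skill Hub is a third-party content aggregator. This page is compiled from public information and does not legally endorse the tool's features or quality.

Validate the tool thoroughly in a sandbox or test environment before deploying it to production, and carry out the necessary security assessment.

📄 License notes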


❓ FAQ

What is pychd?

pychd is an AI-assisted tool written in Python. Open-source AI tool: Hybrid rule-based + Codex-LLM Python bytecode decompiler. Wins every axis vs unc… ⭐45 · Python. Its main use cases are reverse engineering and code analysis.


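💡 AI Skill Hub review

AI Skill Hub review: Python Bytecode Decompiler's core features are complete and of excellent quality. For AI enthusiasts, it is worth adding to a personal toolkit. Try it in a non-production environment first, then roll it out gradually.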


🌐 Original information

Original name	`pychd`
Original description	Open-source AI tool: Hybrid rule-based + Codex-LLM Python bytecode decompiler. Wins every axis vs unc… ⭐45 · Python
Topics	`bytecode` `decompiler` `python` `codex`
GitHub	https://github.com/diohabara/pychd
License	MIT
Language	Python

🔗 Original source

🐙 GitHub repository https://github.com/diohabara/pychd

Indexed: 2026-05-26 · Updated: 2026-05-30 · License: MIT · AI Skill Hub makes no legal guarantee about the accuracy of third-party content.
