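Following a curated evaluation by AI Skill Hub, Python Bytecode Decompiler (pychd) is rated "Strongly Recommended". The tool scores well on feature completeness, community activity, and ease of use, with an AI score of 8.0, and is best suited to users with some technical background.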
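Python Bytecode Decompiler (pychd) is an open-source tool written in Python, centred on the topics bytecode, decompiler, and python. As a GitHub open-source project it has active community support and ongoing releases, its code is fully transparent and auditable, and it supports local deployment to protect data privacy. Whether for personal use or integration into an enterprise workflow, it aims to provide a stable and reliable solution.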
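# Option 1: install with pip (recommended)
pip install pychd
# Option 2: install into a virtual environment (recommended for production)
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install pychd
# Option 3: install from source (to get the latest features)
git clone https://github.com/diohabara/pychd
cd pychd
pip install -e .
# Verify the installation
python -c "import pychd; print('installed successfully')"
# Command-line usage
pychd --help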
# Basic usage (the README examples go through the decompile subcommand)
pychd decompile path/to/package/ -o recovered/
# Calling from Python code
import pychd
# Example
result = pychd.process("input")
print(result)
# Example pychd configuration file (config.yml)
app:
  name: "pychd"
  debug: false
  log_level: "INFO"
# Specify the configuration file at run time
pychd --config config.yml
# Or configure through environment variables
export PYCHD_API_KEY="your-key"
export PYCHD_OUTPUT_DIR="./output"
A Python .pyc decompiler that reads CPython 3.0 – 3.14 bytecode and recovers the original .py. The pipeline is a deterministic rule pass (declarations, signatures, decorators, PEP 695 generics, PEP 749 lazy annotations) followed by one Codex CLI call per module to fill the bodies and module-level statements the rule pass can't recover from opcodes alone.
Headline result — hybrid-rewrite mode, 1,228 modules, 514K LoC:
| Metric | Score |
|---|---|
| **Strict AST match** | **1144/1228 (93.2%)** |
| **Pass@1 (HumanEval re-executability)** | **160/164 (97.6%)** |
| Behavioral smoke (import + public API) | 836/1228 (68.1%) |
| Bytecode-normalised round-trip | 600/1228 (48.9%) |
| Parses as valid Python 3.14 | 1228/1228 (100.0%) |
| Edit similarity (mean) | 0.753 |
Head-to-head on contamination-resistant code (8 modules written 2026-05-26, compiled with Python 3.8, no prior tool or LLM has seen the source):
| Tool | strict_match | parses | BN |
|---|---|---|---|
| **pychd (hybrid-rewrite, Codex)** | **8/8** | 8/8 | 8/8 |
| decompyle3 (3.9.3) | 3/8 | 6/8 | 0/8 |
See §Contamination resistance for the methodology and §Comparison with prior Python decompilers for the broader 23-module stdlib + PyPI head-to-head.
A note on LLM contamination. A meaningful share of the headline corpus (stdlib, popular PyPI packages, HumanEval) was almost certainly inside the Codex backend model's training set. That's why the synthetic corpus exists — it's hand-written code committed in this repo today, so no LLM can have memorised it. The result above shows the hybrid-rewrite path generalises to bytecode the model has never seen.
The bytecode specification is not stable across Python versions. Below is a tour of the biggest source of pain for each release.
Every instruction became exactly two bytes: 1 opcode + 1 argument. Before 3.6 some opcodes took multi-byte arguments. Decompilers from the 3.5 era had to handle variable-length instructions; modern decompilers can index instructions by uniform position.
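As a rough illustration of that uniform layout, the sketch below walks a code object's raw `co_code` two bytes at a time. The helper name `iter_wordcode` is made up for this example, and it deliberately ignores `EXTENDED_ARG` prefixes and the inline cache entries that newer interpreters interleave with instructions, both of which a real decoder would have to handle.

```python
import dis


def iter_wordcode(code):
    """Yield (offset, opname, arg) for each 2-byte unit in code.co_code."""
    raw = code.co_code
    for offset in range(0, len(raw), 2):
        # First byte is the opcode, second byte is its (possibly unused) argument.
        yield offset, dis.opname[raw[offset]], raw[offset + 1]


def add(a, b):
    return a + b


units = list(iter_wordcode(add.__code__))
for offset, name, arg in units:
    print(f"{offset:>3} {name:<20} {arg}")
```

Because every unit has the same width, the offset of instruction `n` is simply `2 * n`, which is what lets a decompiler index instructions by position.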
f(x=1) used to emit LOAD_CONST 1 and a magic CALL_FUNCTION_KW whose argument said "the top 1 thing is a keyword". From 3.7 the names of the keywords are pushed as a tuple constant:
LOAD_NAME f
LOAD_CONST 1
LOAD_CONST ('x',) ← names tuple
CALL_FUNCTION_KW 1
Decompilers have to read that tuple constant to know that the 1 is bound to x, not positional.
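A small sketch of where that tuple lives: the snippet compiles a call with one keyword argument and looks for tuples of strings among the code object's constants. Which opcode consumes the tuple differs between interpreter versions, so the example only inspects `co_consts` and makes no claim about the surrounding instructions.

```python
# Compile a call with a single keyword argument.
code = compile("f(x=1)", "<demo>", "eval")

# Keyword names are stored as a tuple of strings among the constants.
kw_name_tuples = [
    const
    for const in code.co_consts
    if isinstance(const, tuple) and all(isinstance(name, str) for name in const)
]
print(kw_name_tuples)
```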
match statements (PEP 634)
match x:
    case 0: ...
    case _: ...
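becomes a chain of pattern tests: literal patterns like the ones above compile to ordinary comparisons, while class, mapping, and sequence patterns use the MATCH_CLASS / MATCH_KEYS / MATCH_MAPPING / MATCH_SEQUENCE opcodes. Reconstructing the match-case structure from the bytecode requires recognising the patterns the compiler emits; naive decompilers turn match into nested if/elif/else chains that execute the same but read very differently.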
The biggest spec change in years. Try/except no longer uses SETUP_FINALLY blocks. Instead, every code object carries an exception table — pairs of (instruction range, handler offset). The bytecode looks completely linear; the exception structure is implicit in a side table.
Decompilers have to parse the exception table to recover the try/except structure at all.
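A minimal way to observe the side table is the `co_exceptiontable` attribute that code objects carry from Python 3.11 onward. The sketch below only checks whether the table is empty; decoding its entries into ranges and handler offsets is not shown, and the helper name `exception_table` is just for illustration.

```python
def plain(x):
    return x + 1


def guarded(x):
    try:
        return 1 / x
    except ZeroDivisionError:
        return 0


def exception_table(func):
    """Return the raw exception table bytes, or b"" on interpreters before 3.11."""
    return getattr(func.__code__, "co_exceptiontable", b"")


# A function without try/except has an empty table; one with a handler does not.
print(len(exception_table(plain)), len(exception_table(guarded)))
```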
This silently broke every decompiler. In 3.11:
x = [i * 2 for i in range(10)]
emits a separate <listcomp> code object that the outer module calls. In 3.12 the body of the comprehension is inlined directly into the enclosing scope — there's no <listcomp> code object to recurse into anymore. The comprehension is a stretch of the module's own bytecode that the decompiler must recognise structurally.
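The difference is easy to observe from a code object's constants. This sketch lists the nested code objects of a function containing a list comprehension; the helper `nested_code_names` is hypothetical and looks only at direct children.

```python
import types


def nested_code_names(code):
    """Names of the code objects stored directly in code.co_consts."""
    return [c.co_name for c in code.co_consts if isinstance(c, types.CodeType)]


def doubled():
    return [i * 2 for i in range(10)]


# On 3.11 and earlier this contains "<listcomp>"; from 3.12 it does not.
names = nested_code_names(doubled.__code__)
print(names)
```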
CALL_INTRINSIC_1
Several special-purpose opcodes (notably the legacy IMPORT_STAR) collapse into CALL_INTRINSIC_1 with an integer argument.
uv run pychd decompile path/to/package/ -o recovered/
The CPython compiler takes your foo.py and emits foo.pyc — a binary file containing a code object for the module plus a nested code object for every function and class. Each code object holds:
- the bytecode instructions (one byte opcode + one byte argument, since 3.6 "wordcode"),
- a co_consts tuple of constants used in those instructions,
- a co_names tuple of identifier names,
- a co_varnames tuple of local variable names,
- argument counts (co_argcount, co_kwonlyargcount, etc.),
- flag bits (co_flags: is it a coroutine? a generator? does it use *args?).
You can poke at this on any Python install:
>>> import dis
>>> def f(a, b=1): return a + b
...
>>> dis.dis(f)
  1           RESUME                   0
              LOAD_FAST                0 (a)
              LOAD_FAST                1 (b)
              BINARY_OP                0 (+)
              RETURN_VALUE
>>> f.__code__.co_argcount, f.__code__.co_varnames
(2, ('a', 'b'))
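The same fields can be read from a `.pyc` on disk. The sketch below compiles a throwaway module, skips the 16-byte header that Python 3.7 and later write (magic number, flags, and two further 4-byte fields), and unmarshals the module code object. The helper name `load_pyc` is an assumption for this example and is not part of pychd.

```python
import importlib.util
import marshal
import py_compile
import tempfile
import types
from pathlib import Path


def load_pyc(path):
    """Return the module code object stored in a .pyc (3.7+ header layout)."""
    data = Path(path).read_bytes()
    if data[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("pyc was produced by a different interpreter version")
    return marshal.loads(data[16:])  # skip the 16-byte header


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "foo.py"
    src.write_text("def f(a, b=1):\n    return a + b\n")
    pyc = py_compile.compile(str(src), cfile=str(Path(tmp) / "foo.pyc"), doraise=True)
    module_code = load_pyc(pyc)

# Each function shows up as a nested code object among the module's constants.
nested = [c for c in module_code.co_consts if isinstance(c, types.CodeType)]
print(module_code.co_name, [c.co_name for c in nested])
```

Note that `marshal` can only load data written by a compatible interpreter, which is one reason a cross-version decompiler cannot rely on it alone.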
ir.FromImport(module="os.path", level=0, names=[("join", None)])
```bash
uv tool install pychd
pychd decompile path/to/module.pyc --hybrid-rewrite --backend codex
```
Original source (a typical __init__.py):
"""Public surface for the foo package."""
from .core import Bar, Baz
from .util import parse, as_dict
from .errors import FooError
__all__ = ["Bar", "Baz", "FooError", "as_dict", "parse"]
After pychd decompile --rules-only:
"""Public surface for the foo package."""
from .core import Bar, Baz
from .util import parse, as_dict
from .errors import FooError
__all__ = ['Bar', 'Baz', 'FooError', 'as_dict', 'parse']
Identical modulo single vs double quotes in __all__. Zero LLM cost, recovered in 0.9 ms.
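Quote style disappears once the source is parsed, which is why a comparison at the AST level treats the two versions as equal. A minimal sketch of such a check follows; the function name `same_ast` is hypothetical and pychd's own comparator may normalise more than this.

```python
import ast


def same_ast(left: str, right: str) -> bool:
    """True when both sources parse to the same tree, ignoring positions."""
    return ast.dump(ast.parse(left)) == ast.dump(ast.parse(right))


original = '__all__ = ["Bar", "Baz"]\n'
recovered = "__all__ = ['Bar', 'Baz']\n"
print(same_ast(original, recovered))
```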
Original:
from dataclasses import dataclass
from typing import Any
@dataclass(frozen=True)
class AgentMessage:
    type: str
    uuid: str
    agent_id: str
    message: Any = None
    @classmethod
    def from_json(cls, value):
        return cls(
            type=value["type"],
            uuid=value["uuid"],
            agent_id=value["agentId"],
            message=value.get("message"),
        )
After pychd decompile --hybrid-rewrite --backend codex (one LLM call per module; rule pass first, LLM corrects bodies + module-level recovery):
from dataclasses import dataclass
from typing import Any
@dataclass(frozen=True)
class AgentMessage:
    type: str
    uuid: str
    agent_id: str
    message: Any = None
    @classmethod
    def from_json(cls, value):
        return cls(
            type=value["type"],
            uuid=value["uuid"],
            agent_id=value["agentId"],
            message=value.get("message"),
        )
Byte-for-byte recovery on this shape — bytecode_exact round-trips under the producing 3.14 interpreter. The class declaration, every annotation, the @classmethod method decorator, the outer @dataclass(frozen=True) decorator with its keyword argument, and every method signature come straight from the rule pass; the body is filled by the LLM with the (signature + disassembly) it receives.
For the deterministic-only path:
<details><summary>Same input, <code>--rules-only</code> (no LLM)</summary>
from dataclasses import dataclass
from typing import Any
@dataclass(frozen=True)
class AgentMessage:
    type: str
    uuid: str
    agent_id: str
    message: Any = None
    @classmethod
    def from_json(cls, value):
        return cls(type=value['type'], uuid=value['uuid'], agent_id=value['agentId'], message=value.get('message'))
The trivial-body matcher even lifts this single-statement method into a real return cls(...), so the rules-only output here is already behaviorally equivalent; the LLM is only needed for multi-statement bodies and complex module-level constructs.
</details>
Original:
class Stack[T]:
    def __init__(self):
        self.items: list[T] = []
    def push(self, x: T) -> None:
        self.items.append(x)
After pychd decompile --hybrid-rewrite --backend codex:
class Stack[T]:
    def __init__(self):
        self.items: list[T] = []
    def push(self, x: T) -> None:
        self.items.append(x)
Identical modulo whitespace. The PEP 695 type parameter [T] survives the rule pass — pychd recognises the synthetic <generic parameters of Stack> wrapper code object that the CPython compiler emits and unpacks it. Class-body and module-level annotations are recovered from the PEP 749 __annotate__ closure; parameter annotations (x: T) live in a separate per-method closure and the LLM rebuilds them from the disassembly during the rewrite step.
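The wrapper code object mentioned above can be seen directly on a PEP 695 capable interpreter. The sketch below compiles a generic class from a string and lists the nested code objects of the module; it is guarded so that it simply yields an empty list on Python versions before 3.12.

```python
import sys
import types

SOURCE = "class Stack[T]:\n    pass\n"

names = []
if sys.version_info >= (3, 12):  # PEP 695 syntax needs 3.12 or newer
    module_code = compile(SOURCE, "<demo>", "exec")
    # The generic class is wrapped in a synthetic scope for its type parameters.
    names = [
        c.co_name for c in module_code.co_consts if isinstance(c, types.CodeType)
    ]
print(names)
```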
Original (HumanEval_0.py):
from typing import List
def has_close_elements(numbers: List[float], threshold: float) -> bool:
    """ Check if in given list of numbers, are any two numbers closer to each other than
    given threshold.
    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
    False
    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
    True
    """
    for idx, elem in enumerate(numbers):
        for idx2, elem2 in enumerate(numbers):
            if idx != idx2:
                distance = abs(elem - elem2)
                if distance < threshold:
                    return True

    return False
After pychd decompile --hybrid-rewrite --backend codex:
from typing import List
def has_close_elements(numbers: List[float], threshold: float) -> bool:
    """ Check if in given list of numbers, are any two numbers closer to each other than
    given threshold.
    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
    False
    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
    True
    """
    for idx, elem in enumerate(numbers):
        for idx2, elem2 in enumerate(numbers):
            if idx != idx2:
                distance = abs(elem - elem2)
                if distance < threshold:
                    return True
    return False
bytecode_exact, bytecode_normalized, behavioral_smoke, and functional_correctness (the HumanEval check(candidate) oracle) all pass — the recovered module compiles to byte-identical bytecode and passes every assertion. Only difference from the original is a single blank line before the trailing return False, which the AST comparator normalises away.
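A blank line changes line numbers but not the instructions themselves, which is why it cannot affect a bytecode-level comparison of the instruction streams. The following sketch compares two sources that differ only by a blank line; `code_shape` is a hypothetical helper that ignores line tables and filenames, and it is not pychd's actual metric.

```python
import types


def code_shape(code):
    """Instructions, names and constants, recursing into nested code objects."""
    consts = tuple(
        code_shape(c) if isinstance(c, types.CodeType) else c
        for c in code.co_consts
    )
    return (code.co_code, code.co_names, code.co_varnames, consts)


ORIGINAL = (
    "def f(xs):\n"
    "    for x in xs:\n"
    "        if x:\n"
    "            return True\n"
    "\n"
    "    return False\n"
)
RECOVERED = ORIGINAL.replace("\n\n", "\n")  # same code without the blank line

same = code_shape(compile(ORIGINAL, "<m>", "exec")) == code_shape(
    compile(RECOVERED, "<m>", "exec")
)
print(same)
```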
For every UnknownBlock left in the tree, pychd sends a function-body-sized prompt to the configured LLM:
You are a Python decompiler.
The following Python 3.14 bytecode is the body of:
def from_json(cls, value)
Reconstruct the original Python source for *just the body*…
LOAD_FAST_BORROW cls
LOAD_FAST_BORROW value
LOAD_CONST 'type'
BINARY_SUBSCR
…
The LLM never sees the rest of the module; the rule pass already nailed the signatures, imports, and names. This keeps prompts small, costs low, and identifier hallucination rare. One LLM call per body, so on modules with many small functions the cost stays modest.
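To make the shape of such a prompt concrete, here is a rough sketch that assembles one from a live function using only the standard library. The function `build_body_prompt` and its exact wording are assumptions for illustration (the wording is abbreviated from the excerpt above); pychd works from `.pyc` code objects rather than live functions, and its real prompt builder is not shown in this document.

```python
import dis
import inspect
import sys


def build_body_prompt(func):
    """Assemble a per-body prompt from a signature and its disassembly."""
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    header = f"def {func.__name__}{inspect.signature(func)}"
    # One line per instruction: opcode name plus its human-readable argument.
    listing = "\n".join(
        f"{ins.opname} {ins.argrepr}".rstrip()
        for ins in dis.get_instructions(func)
    )
    return (
        "You are a Python decompiler.\n"
        f"The following Python {version} bytecode is the body of:\n"
        f"{header}\n"
        "Reconstruct the original Python source for *just the body*.\n"
        f"{listing}\n"
    )


def from_json(cls, value):
    return cls(type=value["type"])


prompt = build_body_prompt(from_json)
print(prompt)
```

Only the signature and the instruction listing go into the prompt, which mirrors the point that the model never needs the rest of the module.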
The per-body path in Step 4 fixes bodies but leaves any module-level recovery mistakes (an inlined dict comprehension that collapsed to X = {}, a for-loop side effect that wasn't preserved) unchanged. --hybrid-rewrite adds a final whole-module rewrite call:
```
You are a Python decompiler. Reconstruct the original Python 3.14
source for an entire module from its disassembled bytecode.
You are given two inputs:
1. The complete disassembled bytecode (authoritative).
2. A partial rule-based recovery (declarations reliable; bodies +
some module-level statements may be wrong).
Bytecode disassembly: <full module disassembly>
Partial recovery: <rule pass output>
Output ONLY valid Python 3.14 source code. Preserve every
class/function/import name from the partial recovery. Fix
module-level statements the rule pass got wrong by reading the
bytecode. The output must pass `ast.parse` and `py_compile`.
```
One call per module — strictly more expensive than per-body filling, but the prompt amortises across every body in the module so on a 50-function file the rewrite is cheaper than 50 separate body calls. The output is sanity-checked with ast.parse and the rule-only output is used as a fallback if the rewrite fails to parse.
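The fallback rule described here is simple enough to sketch in a few lines. The function name `choose_output` is hypothetical; the only behaviour taken from the text is "parse the rewrite, and fall back to the rule-only output if parsing fails".

```python
import ast


def choose_output(rewrite: str, rules_only: str) -> str:
    """Prefer the LLM rewrite, but only if it parses as Python."""
    try:
        ast.parse(rewrite)
    except (SyntaxError, ValueError):
        # Unparseable rewrite: keep the deterministic recovery instead.
        return rules_only
    return rewrite


print(choose_output("x = 1\n", "pass\n"))
print(choose_output("def f(:\n", "pass\n"))
```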
This is the mode the headline benchmark numbers are reported under, and the one the README's worked examples show.
uv run pychd decompile path/to/module.pyc --hybrid-rewrite --backend codex
pychd routes every .pyc through two passes:
- Rule pass owns everything CPython compiles to a deterministic bytecode shape: imports, class/function declarations, signatures, decorators (incl. arguments), PEP 695 generics, PEP 749 lazy annotations, common one-line bodies (return self.x, return cls(...), constructor self.x = x, etc.). Output is reproducible offline and audit-friendly. Bodies it can't recover remain as pass.
- Codex rewrite runs once per module with the disassembly + the rule pass's partial output as context. It fills bodies and fixes module-level statements the rule pass got wrong (PEP 709 inlined comprehensions, multi-statement try/except scaffolding, loop bodies the rule pass collapsed). Bytes go in, source comes out; the LLM never sees the original source.
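As an illustration of why signatures are deterministic, the sketch below rebuilds a `pass`-bodied stub purely from code object metadata. The helper `stub_from_code` is an assumption for this example; it omits defaults, annotations, and the positional-only marker, which a real rule pass would need to recover from elsewhere.

```python
import inspect


def stub_from_code(code):
    """Render `def name(args): pass` from a code object's metadata."""
    names = code.co_varnames
    n_pos, n_kwonly = code.co_argcount, code.co_kwonlyargcount
    params = list(names[:n_pos])
    kwonly = list(names[n_pos:n_pos + n_kwonly])
    index = n_pos + n_kwonly  # *args and **kwargs names follow the keyword-only ones
    if code.co_flags & inspect.CO_VARARGS:
        params.append("*" + names[index])
        index += 1
    elif kwonly:
        params.append("*")  # a bare star separates keyword-only names
    params.extend(kwonly)
    if code.co_flags & inspect.CO_VARKEYWORDS:
        params.append("**" + names[index])
    return f"def {code.co_name}({', '.join(params)}):\n    pass\n"


def sample(a, b, *rest, key, **extra):
    return a


print(stub_from_code(sample.__code__))
```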
| Mode | strict | BX | BN | BS | FC (HumanEval) | ED |
|---|---|---|---|---|---|---|
| **Hybrid-rewrite** (one Codex call per module — **default**) | **93.2%** (1144/1228) | 16.8% | 48.9% | 68.1% | **97.6%** (160/164) | 0.753 |
| Rules-only (deterministic, no LLM, offline, free) | 36.0% (438/1217) | 0.9% | 7.2% | 42.1% | 2.4% (4/164) | 0.445 |
Why bodies-as-pass happens in rule-only: a function body that compiles to non-trivial control flow (multiple statements, loops, branches, match) is many-to-one in bytecode — the same opcode sequence can come from several different source expressions. Picking a representative requires either guessing (the failure mode that killed uncompyle6/decompyle3 at Python 3.8) or asking an oracle. pychd chooses the oracle, so the rule pass deliberately leaves an UnknownBlock for the rewrite step to fill.
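A small example of the many-to-one problem: the two functions below are written differently but always behave the same, and on many CPython versions they are likely to compile to the same instruction bytes as well (that part is not guaranteed, so the sketch prints the comparison rather than asserting it). A decompiler looking only at the opcodes has no way to tell which spelling the author used.

```python
def with_and(a, b):
    if a and b:
        return 1
    return 0


def with_nested_if(a, b):
    if a:
        if b:
            return 1
    return 0


# Whether the raw instruction bytes coincide depends on the interpreter version.
print(with_and.__code__.co_code == with_nested_if.__code__.co_code)
```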
This section shows the four-stage recovery pipeline against a single example module — what each stage adds — so you can see why both the rule pass and the LLM are needed and what they contribute respectively.
The example: a slimmed-down dataclass module with three things the rule pass handles trivially (imports, decorators, signatures), one thing the trivial-body matcher lifts (a single-statement from_json classmethod), one thing only the LLM body fill can recover (a multi-statement __post_init__), and one thing only the `hybrid-rewrite` module-level fix-up can clean (a module-level dict-comprehension that the rule pass renders as `X = {}`).
Original agent.py:
from dataclasses import dataclass, field
from typing import Any
_ALIAS = {old: new for old, new in [('uid', 'uuid'), ('msg', 'message')]}
@dataclass(frozen=True)
class AgentMessage:
    type: str
    uuid: str
    agent_id: str
    message: Any = None
    tags: list[str] = field(default_factory=list)
    def __post_init__(self):
        if not self.type:
            raise ValueError("type must be non-empty")
        object.__setattr__(self, "type", self.type.lower())
    @classmethod
    def from_json(cls, value):
        return cls(
            type=value["type"],
            uuid=value["uuid"],
            agent_id=value["agentId"],
            message=value.get("message"),
        )
pychd decompile --hybrid-rewrite --backend codex adds a final whole-module rewrite step: the LLM gets the disassembly of the entire module plus the rule pass' partial output, and emits the corrected full source. This catches:
- Module-level comprehensions the rule pass collapsed to X = {} / X = [] / X = ....
- For-loop bodies whose loop variable leaked into top-level declarations (now suppressed by the rule pass' FOR_ITER skip, but the rewrite repairs older recoveries cleanly).
- Multi-line dict literals whose MAP_ADD accumulator pattern was mis-read.
- Module-level if __name__ == "__main__": guards.
- Multi-statement try/except scaffolding.
Diff vs Step C:
-_ALIAS = {}
+_ALIAS = {old: new for old, new in [('uid', 'uuid'), ('msg', 'message')]}
Cost: one LLM call per module instead of one per body, so on modules with many small bodies (stdlib-full, pypi-top20) the rewrite is actually cheaper than per-body hybrid. The trade-off is prompt size — the rewrite sends the full module disassembly, so very large modules push closer to the model's context window. On the benchmark corpora this is rarely an issue (the largest single file fits comfortably).
This is the mode the headline numbers in Benchmarks are reported under.
| Axis | Rule-only achieved | Hybrid-rewrite achieved | What the rule pass cannot reach without an oracle |
|---|---|---|---|
| parses | 100% | 100% | n/a |
| signature_match | 99.8% | 100% | 0.2% rule-only residual: CPython constant-folded if False: blocks. |
| declaration_match | 99.6% | 99.8% | Same. |
| strict_match | 36.0% | **93.2%** | CPython normalises docstrings via inspect.cleandoc, folds constants, and re-emits expressions in canonical form. The rewrite re-derives the canonical form from disassembly. |
| BS (behavioral_smoke) | 42.1% | **68.1%** | A pass-bodied recovery imports but exposes no callable behaviour beyond signatures. |
| BX (bytecode_exact) | 0.9% | 16.8% | Identical Python source compiles to different co_consts ordering across runs. |
| BN (bytecode_normalized) | 7.2% | **48.9%** | Tolerates lnotab + specialised-opcode noise but body recovery still required. |
| FC (Pass@1) | 2.4% | **97.6%** on HumanEval | The recovered module must *behave* like the original. |
Four publicly-available decompilers compete with pychd on Python 3.x bytecode. Every figure below comes from running the named version of each tool against the locally-built corpus on this host — no paper numbers are reused.
The headline comparison axis is strict_match (stripped-AST equality). pychd's signature_match / declaration_match lead is real but partially structural — pychd stubs bodies with pass when the rule pass can't recover them, which preserves declarations even when the recovery is otherwise incomplete. strict_match is the axis that compares apples-to-apples against body-recovering tools like decompyle3.
The eight synthetic modules (see §Contamination resistance) compiled with Python 3.8 and decompiled by each available 3.8-capable tool. None of these modules existed before 2026-05-26, so any tool that scores high here cannot be memorising.
| Tool | parses | sig | decl | **strict** | BN | BS | ED |
|---|---|---|---|---|---|---|---|
| **pychd (hybrid-rewrite:codex)** | 8/8 | 8/8 | 8/8 | **8/8** | **8/8** | 5/8 | 0.968 |
| decompyle3 3.9.3 | 6/8 | 6/8 | 6/8 | 3/8 | 0/8 | 0/8 | 0.551 |
| uncompyle6 3.9.3 | not run on this corpus yet | n/a | n/a | n/a | n/a | n/a | n/a |
Source: assets/_synthetic_comparison.json (commit-tracked). Reproduce:
```bash
uv run python tools/build_corpora.py --only synthetic
```
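A high-quality Python bytecode decompilation tool.
AI Skill Hub is a third-party content aggregation platform. The information on this page is compiled from public data and does not constitute any legal endorsement of the tool's functionality or quality.
We recommend validating the tool thoroughly in a sandbox or test environment before deploying it to production, and carrying out any necessary security assessment.
✅ MIT License: one of the most permissive open-source licenses. Commercial use, modification, and distribution are all allowed, provided the copyright notice is retained.
AI Skill Hub review: the core functionality of Python Bytecode Decompiler (pychd) is complete and of excellent quality. For AI technology enthusiasts it is worth adding to a personal toolbox. We suggest trying it in a non-production environment first and rolling it out gradually.
| Field | Value |
|---|---|
| Original name | pychd |
| Topics | bytecode, decompiler, python, codex |
| GitHub | https://github.com/diohabara/pychd |
| License | MIT |
| Language | Python |
Listed: 2026-05-26 · Updated: 2026-05-26 · License: MIT · AI Skill Hub provides no legal endorsement of the accuracy of third-party content.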