kordoc
Copyright 2026 chrisryugj
Licensed under the MIT License

This project includes code derived from, or depends on, the following
third-party works:

================================================================================

OpenDataLoader PDF (opendataloader-pdf)
https://github.com/opendataloader-project/opendataloader-pdf

Copyright 2025-2026 Hancom, Inc.
Licensed under the Apache License, Version 2.0

Used in:
- src/pdf/line-detector.ts — TableBorderBuilder algorithm (line-based table
  detection via PDF graphics commands, grid construction from line intersections)
- src/pdf/cluster-detector.ts — ClusterTableConsumer algorithm (cluster-based
  borderless table detection via text alignment patterns)

Modifications:
- TypeScript rewrite from Java/Kotlin
- pdfjs-dist v4/v5 compatibility layer
- Korean PDF-specific optimizations (character spacing, cell text merging)

You may obtain a copy of the Apache License at:
http://www.apache.org/licenses/LICENSE-2.0

================================================================================

hml-equation-parser
https://github.com/OpenBapul/hml-equation-parser

Copyright 2018 Open Bapul
Licensed under the Apache License, Version 2.0

Used in:
- src/hwpx/equation.ts — hmlToLatex() (HULK-style HWPX equation script →
  LaTeX) derived from hulkEqParser.py and hulkReplaceMethod.py

Modifications:
- TypeScript rewrite from Python 3
- Inlined convertMap.json as TS constants
- Graceful error handling so failed conversion drops the equation rather
  than leaking the HWP alt-text ("수식입니다") through the document

See THIRD_PARTY/hml-equation-parser.LICENSE for the full license text.

You may obtain a copy of the Apache License at:
http://www.apache.org/licenses/LICENSE-2.0

================================================================================

cfb (Compound File Binary)
https://github.com/SheetJS/js-cfb

Copyright 2013-present SheetJS LLC
Licensed under the Apache License, Version 2.0

Used as a dependency for HWP 5.x OLE2 container parsing.
No modifications — used as a bundled npm package.

================================================================================

pdfjs-dist (PDF.js)
https://github.com/mozilla/pdf.js

Copyright Mozilla Foundation
Licensed under the Apache License, Version 2.0

Used as an optional peer dependency for PDF text extraction.
No modifications — used as an npm package.

================================================================================

JSZip
https://github.com/Stuk/jszip

Copyright 2009-2016 Stuart Knightley, David Duponchel, Franz Buchinger, Antonio Afonso
Dual-licensed under the MIT License OR GPL-3.0-or-later
This project uses JSZip under the MIT License.

Used for ZIP-based format parsing (HWPX, XLSX, DOCX).
No modifications — used as an npm package.

================================================================================

rhwp
https://github.com/edwardkim/rhwp

Copyright edwardkim
Licensed under the MIT License

Used in:
- src/hwp5/cfb-lenient.ts — LenientCfbReader algorithm (FAT/directory/mini-stream
  parsing for corrupted CFB containers)
- src/hwp5/crypto.ts — Distribution document decryption algorithm (AES key
  extraction from MSVC LCG-scrambled payload)

Modifications:
- TypeScript rewrite from Rust
- Added sector size validation, chain length limits, and cycle detection
- Pure JS AES-128 ECB implementation (no native dependency)

================================================================================

Pix2Text — Mathematical Formula Detection & Recognition
https://github.com/breezedeus/Pix2Text

Copyright 2023-2024 breezedeus
MFD (Mathematical Formula Detection): HuggingFace `breezedeus/pix2text-mfd`.
The model weights are distributed under the MIT License, but the model was
trained with Ultralytics YOLOv8, which Ultralytics licenses under AGPL-3.0.
Users distributing derivative works built on MFD outputs should review
Ultralytics' licensing terms.
MFR (Mathematical Formula Recognition): HuggingFace `breezedeus/pix2text-mfr`,
MIT License.

Used in:
- src/pdf/formula/detector.ts — YOLOv8 letterbox preprocessing, NMS post-processing
- src/pdf/formula/recognizer.ts — DeiT encoder + TrOCR decoder greedy decoding
- src/pdf/formula/postprocess.ts — LaTeX output cleanup (derived from Pix2Text
  `latex_ocr.py` post_process)

Models are downloaded at runtime from HuggingFace with SHA-256 verification.
No model weights are redistributed with this package. Algorithm implementations
are original TypeScript ports referencing the Pix2Text algorithm descriptions.

You may obtain a copy of the MIT License at:
https://opensource.org/licenses/MIT
