# anonde bench Python deps.
#
# These cover the engines the CI matrix runs (gliner-py sidecar +
# corpus loaders + scoring scripts). The local-only matrix also
# uses presidio-analyzer + spacy + en_core_web_lg for the Presidio
# baseline cell — those are intentionally NOT here so CI doesn't
# pay the ~700 MB spaCy-model cost on every run. Install them
# locally with:
#
#   pip install presidio-analyzer spacy
#   python -m spacy download en_core_web_lg

# Hugging Face Datasets: used by every corpus loader that streams gold
# data from HF (conll2003_en/de, germeval_14, wnut_17, wikiann_de,
# pharmaconer_es, ai4privacy_en). Without it, those loaders all exit
# with status 2 and "missing dep: pip install datasets", and the matrix
# cells fail at the data-fetch step. Not pinned: the loaders read the
# auto-converted Parquet refs, which are stable across recent versions.
datasets

# GLiNER Python package — used by bench/runners/gliner_sidecar.py.
# Pinned: bench numbers shift with gliner version.
gliner==0.2.26

# transformers: pulled in transitively by gliner. gliner==0.2.26 requires
# transformers<5.2.0,>=4.51.3, so we state that range explicitly to keep
# the shared venv's resolution deterministic across CI runs.
#
# NOTE: this transformers is NOT usable for the openai-pf cell. The
# openai/privacy-filter model is a transformers-5.6+ native architecture
# (model_type "openai_privacy_filter"); transformers<5.2 raises
# KeyError: 'openai_privacy_filter' on load. gliner has no release that
# allows transformers>=5.6, so the two engines cannot share one venv.
# openai_pf.py therefore runs from its own isolated venv built from
# bench/requirements-openai-pf.txt — see CELL_openai_pf in bench/Makefile.
transformers<5.2.0,>=4.51.3

# pyyaml — corpus loaders + scoring scripts read label_map.yaml.
pyyaml

# torch is a transitive dep of gliner; do not pin a version here, so that
# gliner's constraint picks a compatible one. But DO pull the CPU-only
# wheel: the default PyPI torch drags in the full CUDA stack (nvidia-cu*
# libs and triton, several GB), which the bench never uses (gliner-py
# runs CPU-only) and which, together with the openai-pf venv's torch,
# overflows the ~14 GB GitHub runner disk. The extra index below supplies
# a CPU-only build of the same torch version.
--extra-index-url https://download.pytorch.org/whl/cpu
torch
