Making Pipelex setup and dry-run work without a network — by adding a primed, last-resort cache for the Gateway remote config, with honest provenance tracking and no silent fallbacks.
Pipelex's Gateway needs a remote config — the catalogue of which models exist and on which backends. On every setup() that touches model specs, Pipelex fetched that config over HTTP. If the network was down, setup crashed — even for commands that never make an inference call.
That hurt in two real places: agent CLI commands like validate, inputs, and run --dry-run all pass needs_model_specs=True, and sandboxed environments (Codex local sandbox) have no outbound network at all. None of these run a model — they just need to know the catalogue. Yet a missing network broke them.
pipelex.py:235 unconditionally called RemoteConfigFetcher.fetch_remote_config() whenever the Gateway was enabled and model specs were needed. Any network failure raised RemoteConfigFetchError straight out of setup. The only escape hatch was a Codex-Cloud-specific short-circuit — local sandboxes and plain offline use were not covered.
Gateway disabled → no remote fetch is ever attempted. Setup completes with zero network.
Gateway enabled but remote temporarily unreachable → fall back to a cache primed at init time.
Unknown Gateway models fail loudly against fresh or cached specs. Stale data is always labelled.
Non-objective: we did not make inference itself work offline. A real model call still needs the network. This PR is strictly about setup and dry-run.
The whole thing hinges on one idea: the fetcher returns a result that knows where it came from. Everything downstream branches on that provenance.
Fetch succeeds: source = FRESH.
Fetch fails, cache present at ~/.pipelex/cache/remote_config.json: source = CACHED.
Fetch fails, no usable cache: RemoteConfigUnavailableError.
source flows downstream: GatewayUnknownModelError (message branches on source) · RemoteConfigStaleWarning · disable telemetry · surface a warnings field on the agent-CLI JSON envelope.
If the remote responds with JSON that fails validation, we raise RemoteConfigValidationError and stop, with no cache fallback. We control both ends of that URL (it's versioned in pipelex-back-office), so a schema-rejecting payload is a real server bug, not an operational state. Failing loudly is correct. The cache is for network failures only.
remote_config_cache.py: RemoteConfigCache.store() / .load(). Stores the raw JSON dict from the response, not a re-serialised Pydantic dump, so the cache survives minor schema drift. Atomic writes (temp file + os.replace). Schema-versioned: a CACHE_SCHEMA_VERSION bump rejects old caches cleanly.
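
The atomic-write and schema-version ideas can be sketched in a few lines. This is a minimal illustration, not the actual RemoteConfigCache: the envelope keys (`schema_version`, `cached_at`, `payload`), the version value, and the free-function signatures are all assumptions made for the example.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Optional

CACHE_SCHEMA_VERSION = 1  # hypothetical value; bumping it invalidates older caches


def store(cache_path: Path, payload: dict[str, Any]) -> None:
    """Write the raw payload atomically: temp file in the same directory, then os.replace."""
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    envelope = {
        "schema_version": CACHE_SCHEMA_VERSION,
        "cached_at": datetime.now(timezone.utc).isoformat(),
        "payload": payload,  # raw JSON dict, not a re-serialised model dump
    }
    fd, tmp_name = tempfile.mkstemp(dir=cache_path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp_file:
            json.dump(envelope, tmp_file)
        # Readers see either the old file or the new one, never a partial write.
        os.replace(tmp_name, cache_path)
    except BaseException:
        # Remove the temp file if anything failed before the replace.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def load(cache_path: Path) -> Optional[dict[str, Any]]:
    """Return the cached envelope, or None if missing, unreadable, or from another schema version."""
    try:
        envelope = json.loads(cache_path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return None
    if not isinstance(envelope, dict) or envelope.get("schema_version") != CACHE_SCHEMA_VERSION:
        return None
    return envelope
```

Creating the temp file in the cache's own directory keeps both paths on one filesystem, which is what lets `os.replace` swap the file in a single step.
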
RemoteConfigSource: FRESH | CACHED. Lives in its own tiny types.py so cogt/ code can import it without dragging in httpx + tenacity. Exposes .is_cached so callers never write == CACHED.
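
As a rough sketch of how a caller might branch on that property, the example below builds an unknown-model message whose hint depends on the source. The enum's string values, the `build_unknown_model_error` helper, and the message wording are hypothetical; only the names RemoteConfigSource, `.is_cached`, and GatewayUnknownModelError come from this PR.

```python
from enum import Enum
from typing import Optional


class RemoteConfigSource(Enum):
    FRESH = "fresh"    # string values are assumptions
    CACHED = "cached"

    @property
    def is_cached(self) -> bool:
        return self is RemoteConfigSource.CACHED


class GatewayUnknownModelError(Exception):
    """Illustrative stand-in for the real exception class."""


def build_unknown_model_error(
    model_handle: str,
    source: RemoteConfigSource,
    cached_at: Optional[str] = None,
) -> GatewayUnknownModelError:
    """Hypothetical helper: same error type, different hint depending on provenance."""
    message = f"Gateway model '{model_handle}' is not in the remote config."
    if source.is_cached:
        snapshot = f" (snapshot: {cached_at})" if cached_at else ""
        message += (
            f" The specs came from a cached config{snapshot} and may be stale."
            " Run `pipelex init` while online."
        )
    else:
        message += " The specs were fetched fresh, so check the model handle."
    return GatewayUnknownModelError(message)
```
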
RemoteConfigUnavailableError: fetch failed and no usable cache. Message names the cache path + remediation.
GatewayUnknownModelError: a bundle references a Gateway model absent from the specs. Message branches on source (a CACHED source hints "run pipelex init while online").
RemoteConfigStaleWarning: emitted when setup completes off a cached config.
RemoteConfigFetchError: kept as the internal retry-layer exception. Removing it would have broken doctor, the CLI factory, and agent hints.
The behavioural heart of the PR. The fetcher stopped returning a bare RemoteConfig and started returning a RemoteConfigResult that carries provenance:

    -    def fetch_remote_config(cls) -> RemoteConfig:
    +    def fetch_remote_config(cls, require_fresh: bool = False) -> RemoteConfigResult:
             url = PipelexDetails.remote_config_url()
             try:
                 payload, config = cls._fetch_fresh(url)
    +            RemoteConfigCache.store(payload)  # opportunistic refresh
    +            return RemoteConfigResult(config=config, source=RemoteConfigSource.FRESH)
             except RemoteConfigFetchError as fetch_error:  # network / HTTP only
    +            if require_fresh:  # doc generators refuse stale data
    +                raise cls._build_unavailable_error(fetch_error, cache_refused=True) from fetch_error
    +            cached = RemoteConfigCache.load()
    +            if cached is None:
    +                raise cls._build_unavailable_error(fetch_error) from fetch_error
    +            return RemoteConfigResult(
    +                config=cached.to_remote_config(),
    +                source=RemoteConfigSource.CACHED,
    +                cached_at=cached.cached_at,
    +            )

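
To exercise those branches without a network, the same control flow can be reproduced with injected stand-ins. This is a sketch under assumptions: `fetch_with_fallback` and its callable parameters are hypothetical, the config is a plain dict rather than a RemoteConfig, and the exception classes are local stubs that only share names with the real ones.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Optional


class RemoteConfigSource(Enum):
    FRESH = "fresh"
    CACHED = "cached"


class RemoteConfigFetchError(Exception):
    """Stub: network / HTTP failure."""


class RemoteConfigValidationError(Exception):
    """Stub: the remote answered, but the payload failed validation."""


class RemoteConfigUnavailableError(Exception):
    """Stub: fetch failed and no usable cache (or the cache was refused)."""


@dataclass
class RemoteConfigResult:
    config: dict[str, Any]
    source: RemoteConfigSource


def fetch_with_fallback(
    fetch_fresh: Callable[[], dict[str, Any]],
    load_cache: Callable[[], Optional[dict[str, Any]]],
    store_cache: Callable[[dict[str, Any]], None],
    require_fresh: bool = False,
) -> RemoteConfigResult:
    try:
        payload = fetch_fresh()
    except RemoteConfigFetchError as fetch_error:
        # Only network / HTTP failures reach this branch. A validation error
        # is a different class, so it propagates without touching the cache.
        if require_fresh:
            raise RemoteConfigUnavailableError("fresh config required; cache refused") from fetch_error
        cached = load_cache()
        if cached is None:
            raise RemoteConfigUnavailableError("fetch failed and no cache is available") from fetch_error
        return RemoteConfigResult(config=cached, source=RemoteConfigSource.CACHED)
    store_cache(payload)  # opportunistic refresh on every successful fetch
    return RemoteConfigResult(config=payload, source=RemoteConfigSource.FRESH)
```

Passing the fetch and cache functions in as arguments is only a convenience for the sketch; it lets each branch be driven by a one-line stub.
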
And setup() consumes the provenance — warning, and tightening telemetry, when the data is stale:

    -    remote_config = RemoteConfigFetcher.fetch_remote_config()
    +    remote_config_result = RemoteConfigFetcher.fetch_remote_config()
    +    remote_config = remote_config_result.config
    +    gateway_config_source = remote_config_result.source
         ...
    +    if gateway_config_source.is_cached:
    +        warnings.warn(f"Pipelex Gateway is running off a cached remote config "
    +                      f"(snapshot: {cached_at_iso}). Run `pipelex init` while online.",
    +                      RemoteConfigStaleWarning, stacklevel=2)
         # stale specs imply stale model identities, so don't phone home in that state
    -    is_pipelex_telemetry_enabled = is_pipelex_service_enabled and needs_inference
    +    is_pipelex_telemetry_enabled = (is_pipelex_service_enabled and needs_inference
    +                                    and not gateway_source_is_cached)

One deliberate design call: the warning is emitted in setup(), not in the fetcher. That keeps the fetcher a pure data-returning function, so test fixtures that swap in a cached fetcher don't have to replay warnings.
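
A small sketch of the caller-side behaviour: the warning is raised where the provenance is consumed, and a caller can record it with the standard `warnings` machinery. The `resolve_telemetry` helper and the snapshot timestamp are hypothetical, and RemoteConfigStaleWarning is a local stand-in assumed here to subclass UserWarning.

```python
import warnings


class RemoteConfigStaleWarning(UserWarning):
    """Illustrative stand-in for the real warning category."""


def resolve_telemetry(
    is_service_enabled: bool,
    needs_inference: bool,
    source_is_cached: bool,
    cached_at_iso: str = "unknown",
) -> bool:
    """Hypothetical helper: warn on cached specs and switch telemetry off in that state."""
    if source_is_cached:
        warnings.warn(
            f"Gateway is running off a cached remote config (snapshot: {cached_at_iso}).",
            RemoteConfigStaleWarning,
            stacklevel=2,
        )
    return is_service_enabled and needs_inference and not source_is_cached


# Record warnings instead of printing them, as a caller that wants to
# report them in structured output might do.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    telemetry_on = resolve_telemetry(
        True, True, source_is_cached=True, cached_at_iso="2025-01-01T00:00:00+00:00"
    )

print(telemetry_on)  # False
print([w.category.__name__ for w in caught])  # ['RemoteConfigStaleWarning']
```
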
| Scenario | Gateway | Network | Cache | Outcome |
|---|---|---|---|---|
| BYOK offline | off | down | — | ● setup + validate + dry-run all OK — fetch never attempted |
| Gateway dry-run, fresh | on | up | — | ● fresh fetch, cache written |
| Gateway dry-run, cached | on | down | present | ● cache fallback + stale warning, dry-run OK |
| Gateway dry-run, cold | on | down | absent | ● RemoteConfigUnavailableError with remediation |
| Unknown model referenced | on | any | any | ● GatewayUnknownModelError, message branches on source |
| Doc generator offline | on | down | present | ● require_fresh=True refuses cache — no stale docs committed |
gateway_models_generator.py and preprocess_test_models_cmd.py regenerate committed reference docs and test fixtures. If they silently used the cache, they'd bake stale data into the repo. They pass require_fresh=True — any fallback becomes an immediate error instead.
The cache is only useful if it exists before the network goes down. So when pipelex init accepts the Gateway terms, it does one fetch and persists the result. If that fetch fails because the machine is offline, init logs a yellow warning and continues: the cache stays empty, but the user has been told. The agent-CLI init mirrors this and surfaces cache_primed / cache_priming_error on its JSON envelope.
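
The priming step might look roughly like the following. Only the `cache_primed` and `cache_priming_error` field names come from this PR; the `prime_cache` function, its injected callables, and the choice of `OSError` as the stand-in network failure are assumptions for illustration.

```python
from typing import Any, Callable


def prime_cache(
    fetch_fresh: Callable[[], dict[str, Any]],
    store_cache: Callable[[dict[str, Any]], None],
) -> dict[str, Any]:
    """Hypothetical helper: try one fetch at init time and report the outcome, never raise."""
    try:
        payload = fetch_fresh()
        store_cache(payload)
    except OSError as exc:  # stand-in for the real fetch error type
        # Init continues; the caller is responsible for telling the user.
        return {"cache_primed": False, "cache_priming_error": str(exc)}
    return {"cache_primed": True, "cache_priming_error": None}


def offline_fetch() -> dict[str, Any]:
    raise OSError("network unreachable")


saved: list[dict[str, Any]] = []
online = prime_cache(lambda: {"models": ["m"]}, saved.append)
offline = prime_cache(offline_fetch, saved.append)

print(online)   # {'cache_primed': True, 'cache_priming_error': None}
print(offline)  # {'cache_primed': False, 'cache_priming_error': 'network unreachable'}
```
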
Built TDD across 7 phases — baseline lock-down → cache module → fetcher fallback → provenance plumbing → init priming → real-.mthds-bundle E2E → verification. Each phase is its own commit with a checkpoint block. make agent-check and make agent-test green throughout.
pipelex doctor. Pure additive.
mthds-plugins/wip/codex-sandbox-escalation.md: --dry-run no longer needs escalation once Pipelex has been initialised online once.
GatewayUnknownModelError: the E2E currently asserts the unknown handle is visible without pinning error_type (it fires at the pipe-operator layer). A deck-override test would pin the deck-level membership check end-to-end.