structuring_method = preliminary_text via build-time elaboration + new PipeStructure operator
Brings back the "text-then-object" capability removed in 16b775b8, but reshapes it as a build-time
rewrite into PipeSequence[PipeLLM(text), PipeStructure] instead of a runtime branch inside
PipeLLM. Adds a first-class PipeStructure operator that users can also call directly.
- structuring_method = "preliminary_text" is now a directive. PipelexInterpreter rewrites the bundle through BundleElaborator before any pipe runs. The runtime never sees the directive.
- A PipeLLM with the directive becomes a PipeSequence wrapping a synthetic PipeLLM (draft text) and a synthetic PipeStructure (structuring). The user-facing pipe code is preserved, so main_pipe, callers, and the run API are unaffected.
- PipeStructure takes one Text-compatible input, runs one LLM call against the canned structuring_prompt template, and produces any structured concept with full multiplicity support (Foo, Foo[], Foo[N]). It is usable directly in MTHDS too: handy after PDF extraction, search results, or upstream prose.
- PipeLLM simplified. The field, validator, runtime branch, and execution-data line for structuring_method are gone. Less surface, fewer code paths, no NotImplementedError trap.
- Synthetic-pipe metadata lives in a PipelexBundleBlueprint.elaboration_metadata dict (excluded from serialization), not as a new field on every blueprint or runtime pipe.
Before this PR, structuring_method = "preliminary_text" existed as a field on the runtime
PipeLLM, but it raised NotImplementedError at run time — left over after the original
text-then-object implementation was removed in 16b775b8. The directive was effectively dead.
We have two kinds of users who want this:
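- Users who want text-then-object as an option on an existing PipeLLM.
- Users who already have text and don't need a PipeLLM at all; they want a dedicated structuring step.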
Solving both with a runtime branch on PipeLLM bloats the operator and hides a sequence behind one
pipe. Instead, we introduce PipeStructure as the underlying primitive, then make
structuring_method = "preliminary_text" a build-time shorthand that expands into it. The runtime
layer (and Temporal) never sees the directive — it sees only ordinary PipeSequence +
PipeLLM + PipeStructure.
Every code path that loads a bundle from a .mthds file goes through the elaborator.
The fast-path returns the input bundle unchanged (identity-preserving) when no
preliminary_text directive is present, so the common case pays no overhead.
preliminary_text rewrites into
[pipe.review_restaurant]
type = "PipeLLM"
description = "..."
inputs = { transcript = "Text" }
output = "RestaurantReview"
structuring_method = "preliminary_text"
prompt = """Write a thorough review …
@transcript
"""
# original code reused for the wrapping sequence
[pipe.review_restaurant]
type = "PipeSequence"
inputs = { transcript = "Text" }
output = "RestaurantReview"
steps = [
  { pipe = "review_restaurant__draft_text", result = "draft_text" },
  { pipe = "review_restaurant__structure", result = "review_restaurant" },
]
[pipe.review_restaurant__draft_text]
type = "PipeLLM"
inputs = { transcript = "Text" }
output = "Text" # always single Text
prompt = "...verbatim..."
[pipe.review_restaurant__structure]
type = "PipeStructure"
inputs = { draft_text = "Text" }
output = "RestaurantReview" # multiplicity preserved
model = original.model_to_structure
The original pipe code is reused for the wrapping sequence, so anything that referenced
review_restaurant — other pipes, main_pipe, the run API — keeps working unchanged.
Step 1 always emits a single Text, even when the original output is Foo[]
or Foo[3]. Step 2 (PipeStructure) is the one that fans out: one preliminary text →
N structured objects. This matches the deleted make_text_then_object_list behavior verbatim.
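The rewrite and its fast path can be sketched on plain dicts. This is an illustration only, not the BundleElaborator code: elaborate_pipes and the dict shapes are hypothetical, and the model forwarding to the structure step is left out.

```python
from typing import Any

DIRECTIVE = "preliminary_text"


def elaborate_pipes(pipes: dict[str, dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Expand each pipe carrying the directive into a sequence plus two synthetic pipes."""
    if not any(bp.get("structuring_method") == DIRECTIVE for bp in pipes.values()):
        return pipes  # fast path: the very same object comes back

    result: dict[str, dict[str, Any]] = {}
    for code, bp in pipes.items():
        if bp.get("structuring_method") != DIRECTIVE:
            result[code] = bp
            continue
        draft_code = f"{code}__draft_text"
        structure_code = f"{code}__structure"
        # The original code is reused for the wrapping sequence.
        result[code] = {
            "type": "PipeSequence",
            "inputs": bp["inputs"],
            "output": bp["output"],
            "steps": [
                {"pipe": draft_code, "result": "draft_text"},
                {"pipe": structure_code, "result": code},
            ],
        }
        # Step 1 always emits a single Text, whatever the original multiplicity.
        result[draft_code] = {
            "type": "PipeLLM",
            "inputs": bp["inputs"],
            "output": "Text",
            "prompt": bp["prompt"],
        }
        # Step 2 keeps the original output, multiplicity included.
        result[structure_code] = {
            "type": "PipeStructure",
            "inputs": {"draft_text": "Text"},
            "output": bp["output"],
        }
    return result


authored = {
    "review_restaurant": {
        "type": "PipeLLM",
        "inputs": {"transcript": "Text"},
        "output": "RestaurantReview[]",
        "structuring_method": "preliminary_text",
        "prompt": "Write a thorough review\n@transcript\n",
    }
}
elaborated = elaborate_pipes(authored)
plain = {"summarize": {"type": "PipeLLM", "inputs": {"t": "Text"}, "output": "Text", "prompt": "..."}}
```

In this sketch, a bundle without the directive is returned as the same object, which is what "identity-preserving" means for the fast path.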
pipelex/core/interpreter/bundle_elaborator.py
A single class with one public classmethod, elaborate(bundle). The dispatch is intentionally narrow:
there is exactly one elaboration kind today. The structure is set up so that adding a second kind is mechanical, but
there is no premature plugin registry.
@classmethod
def elaborate(cls, bundle: PipelexBundleBlueprint) -> PipelexBundleBlueprint:
    if not bundle.pipe or not any(_is_preliminary_text_pipe(bp) for bp in bundle.pipe.values()):
        return bundle  # fast-path: identity-preserving short-circuit
    existing_codes: set[str] = set(bundle.pipe.keys())
    new_pipe_dict: dict[str, PipeBlueprintUnion] = {}
    elaboration_metadata: dict[str, ElaborationMetadata] = {}
    for pipe_code, pipe_blueprint in bundle.pipe.items():
        if _is_preliminary_text_pipe(pipe_blueprint):
            cls._elaborate_preliminary_text(
                pipe_code=pipe_code,
                pipe_blueprint=pipe_blueprint,
                new_pipe_dict=new_pipe_dict,
                elaboration_metadata=elaboration_metadata,
                existing_codes=existing_codes,
            )
        else:
            new_pipe_dict[pipe_code] = pipe_blueprint
    # Defense in depth: synthesized pipes must never themselves carry the directive.
    for synthetic_code, synthetic_blueprint in new_pipe_dict.items():
        if synthetic_code in elaboration_metadata and _is_preliminary_text_pipe(synthetic_blueprint):
            raise BundleElaboratorError(...)
    elaborated = bundle.model_copy(update={
        "pipe": new_pipe_dict,
        "elaboration_metadata": elaboration_metadata,
    })
    # Re-run bundle-level validators against the synthetic pipes.
    try:
        PipelexBundleBlueprint.model_validate(elaborated.model_dump(by_alias=True))
    except ValidationError as exc:
        raise BundleElaboratorError(...) from exc
    return elaborated
A module-level TypeGuard narrows the union type at the iteration site so the dispatch helper
receives a properly-typed PipeLLMBlueprint:
def _is_preliminary_text_pipe(pipe_blueprint: PipeBlueprintUnion) -> TypeGuard[PipeLLMBlueprint]:
    if not isinstance(pipe_blueprint, PipeLLMBlueprint):
        return False
    method = pipe_blueprint.structuring_method
    return method is not None and method.is_preliminary_text
@@ make_pipelex_bundle_blueprint @@
 try:
     pipelex_bundle_blueprint = PipelexBundleBlueprint.model_validate(blueprint_dict)
     pipelex_bundle_blueprint.source = str(bundle_path) if bundle_path else None
-    return pipelex_bundle_blueprint
 except ValidationError as exc:
     ...
+
+try:
+    return BundleElaborator.elaborate(bundle=pipelex_bundle_blueprint)
+except BundleElaboratorError as exc:
+    raise PipelexInterpreterError(message=str(exc)) from exc
PipeStructure operator
Mirrors the shape of PipeLLM for object generation but stripped of everything that doesn't apply:
no user-controlled prompt template, no image/document inputs, no system prompt, exactly one Text input.
| Field | Type | Notes |
|---|---|---|
| type | "PipeStructure" | Discriminator |
| inputs | dict | Exactly one entry. Concept must be Text or refine Text. No multiplicity (use PipeBatch). |
| output | string | Any structured concept, with optional multiplicity (Foo, Foo[], Foo[N]). Cannot be Text. |
| model | LLMModelChoice \| None | Falls back to llm_choice_overrides.for_object → llm_choice_defaults.for_object. |
_live_run_operator_pipe
text_str = working_memory.get_stuff_as_str(name=self.text_input_name)
multiplicity_resolution = output_multiplicity_to_apply(
    base_multiplicity=self.output_multiplicity,
    override_multiplicity=pipe_run_params.output_multiplicity,
)
is_multiple_output = multiplicity_resolution.is_multiple_outputs_enabled
fixed_nb_output = multiplicity_resolution.specific_output_count
llm_choice_for_object = (
    self.llm_choice
    or model_deck.llm_choice_overrides.for_object
    or model_deck.llm_choice_defaults.for_object
)
llm_setting_for_object = model_deck.get_llm_setting(llm_choice=llm_choice_for_object)
structuring_template = llm_config.get_template(template_name="structuring_prompt")
rendered_user_prompt = await render_template(
    template=structuring_template,
    category=TemplateCategory.LLM_PROMPT,
    context={"text": text_str},
)
if llm_config.is_structure_prompt_enabled:
    rendered_user_prompt += await get_output_structure_prompt(
        concept_ref=self.output.concept.concept_ref
    )
llm_prompt = LLMPrompt(user_text=rendered_user_prompt)
content_class = get_class_registry().get_required_subclass(
    name=self.output.concept.structure_class_name, base_class=StuffContent,
)
if is_multiple_output:
    generated_objects = await content_generator.make_object_list(
        job_metadata=job_metadata, object_class=content_class,
        llm_prompt_for_object_list=llm_prompt,
        llm_setting_for_object_list=llm_setting_for_object,
        nb_items=fixed_nb_output,
    )
    the_content = ListContent(items=generated_objects)
else:
    the_content = await content_generator.make_object(...)
structuring_prompt = """
Read the following text carefully and produce the requested structured output from it.
---
{{ text }}
"""
The operator deliberately has no user-controlled prompt template — the only variable is text,
fed automatically from the declared input. Customization for preliminary_text is captured as a
follow-up; if a user really needs custom prompts they author the two pipes by hand.
PipeLLM
The runtime PipeLLM no longer carries structuring_method at all. The field, validator,
the NotImplementedError trap, the execution_data_dict entry, and the factory's forwarding
of the field are all removed.
-    structuring_method: StructuringMethod | None = None
-    @model_validator(mode="after")
-    def validate_output_concept_consistency(self) -> Self:
-        if self.structuring_method is not None and self.output.concept.structure_class_name == NativeConceptCode.TEXT:
-            msg = (
-                f"Output concept '{self.output.concept.code}' is considered a Text concept, "
-                f"so it cannot be structured. Maybe you forgot to add '{NativeConceptCode.TEXT}' to the class registry?"
-            )
-            raise ValueError(msg)
-        return self
# inside _live_run_operator_pipe:
-        if self.structuring_method is not None:
-            match self.structuring_method:
-                case StructuringMethod.PRELIMINARY_TEXT:
-                    msg = (
-                        f"PipeLLM '{self.code}': structuring_method='preliminary_text' is not currently supported. "
-                        "The text-then-object mechanism was removed; a new implementation is planned."
-                    )
-                    raise NotImplementedError(msg)
-                case StructuringMethod.DIRECT:
-                    pass
# inside execution_data_dict:
-        if self.structuring_method is not None:
-            execution_data_dict["structuring_method"] = self.structuring_method
structuring_method remains part of the language surface. PipeLLMBlueprint gets a
model_validator(mode="after") that mirrors the elaborator's pre-check, so the user gets the error
during model_validate (parse time) instead of at elaboration time. The elaborator's check stays
as defense-in-depth (only reachable via model_construct).
class StructuringMethod(StrEnum):
    DIRECT = "direct"
    PRELIMINARY_TEXT = "preliminary_text"

    @property
    def is_preliminary_text(self) -> bool:  # avoid `==` against enum (project rule)
        match self:
            case StructuringMethod.PRELIMINARY_TEXT:
                return True
            case StructuringMethod.DIRECT:
                return False

class PipeLLMBlueprint(PipeBlueprint):
    ...
    structuring_method: StructuringMethod | None = None

    @model_validator(mode="after")
    def validate_preliminary_text_output(self) -> Self:
        if self.structuring_method is None or not self.structuring_method.is_preliminary_text:
            return self
        output_parse_result = parse_concept_with_multiplicity(self.output)
        if QualifiedRef.parse(output_parse_result.concept_ref_or_code).local_code == NativeConceptCode.TEXT:
            raise ValueError(
                f"PipeLLM with `structuring_method = preliminary_text` cannot have output `{self.output}`. "
                "The output must be a structured concept, not Text."
            )
        return self
StructuringMethod moved from
pipelex.pipe_operators.llm.pipe_llm to
pipelex.pipe_operators.llm.pipe_llm_blueprint (it now lives next to its only consumer).
The CHANGELOG entry calls this out.
PipeLLMSpec still exposes the directive
AI agents authoring via specs can still opt in. PipeLLMSpec gains a plain
structuring_method: StructuringMethod | None = None field (intentionally not
SkipJsonSchema, so it shows up in the JSON schema we hand to agents) and forwards it in
to_blueprint().
elaboration_metadata side-table
Synthetic-pipe metadata lives on the bundle, not on every pipe blueprint and not on the runtime
PipeAbstract. The user-facing per-pipe schema stays unpolluted.
class StepRole(StrEnum):
    DRAFT_TEXT = "draft_text"
    STRUCTURE = "structure"

class ElaborationMetadata(BaseModel):
    parent_pipe_code: str
    step_role: StepRole

class PipelexBundleBlueprint(BaseModel):
    ...
    # Process-local. Survives model_copy. Dropped by any model_dump→model_validate
    # round-trip (exclude=True keeps MTHDS / TOML / JSON exports clean).
    elaboration_metadata: dict[str, ElaborationMetadata] | None = Field(default=None, exclude=True)

    def get_elaboration_for(self, pipe_code: str) -> ElaborationMetadata | None:
        if not self.elaboration_metadata:
            return None
        return self.elaboration_metadata.get(pipe_code)
- After BundleElaborator.elaborate(...), the metadata is reachable via the bundle reference.
- Downstream consumers receive the model_copy-built bundle (with metadata intact).
The dependency loader in LibraryManager._load_single_dependency walks the side-table to keep
synthetic helpers attached to exported parents. Without this, exporting review_restaurant from a
package would leave the wrapping PipeSequence referencing unresolved
review_restaurant__draft_text / __structure codes:
all_exported = resolved_dep.exported_pipe_codes | main_pipes
synthetic_helpers: set[str] = set()
for blueprint in dep_blueprints:
    if not blueprint.elaboration_metadata:
        continue
    for synthetic_code, meta in blueprint.elaboration_metadata.items():
        if meta.parent_pipe_code in all_exported:
            synthetic_helpers.add(synthetic_code)
all_exported |= synthetic_helpers
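The same walk, reduced to a standalone sketch: with_synthetic_helpers is a hypothetical helper, and the side-table is simplified to a plain mapping from synthetic code to parent code.

```python
def with_synthetic_helpers(exported: set[str], metadata: dict[str, str]) -> set[str]:
    """Return the exported codes plus every synthetic helper whose parent is exported.

    `metadata` maps synthetic pipe code -> parent pipe code (simplified, hypothetical shape).
    """
    helpers = {synthetic for synthetic, parent in metadata.items() if parent in exported}
    return exported | helpers


metadata = {
    "review_restaurant__draft_text": "review_restaurant",
    "review_restaurant__structure": "review_restaurant",
    "internal_pipe__draft_text": "internal_pipe",
}
result = with_synthetic_helpers({"review_restaurant"}, metadata)
```

Helpers of a parent that is not exported (internal_pipe here) stay private, so the export surface grows only as far as the exported sequences need.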
Three layers guard against the only authoring mistake the elaborator can hit — combining
preliminary_text with a Text output:
| Layer | Where | What it catches |
|---|---|---|
| 1. Construction | PipeLLMBlueprint.validate_preliminary_text_output | String-level: rejects "Text", "native.Text", "Text[]", "Text[N]". Fires during model_validate, before the elaborator runs. |
| 2. Defense-in-depth | BundleElaborator._elaborate_preliminary_text | Same string-level check, only reachable if a caller bypassed validation via model_construct. Test suite exercises it. |
| 3. Library-time | PipeStructure.validate_output_with_library | Concept-level: catches a domain concept that refines = "Text"; those slip past string-level guards because they don't read as Text in the source. |
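A rough sketch of what the string-level check in layers 1 and 2 has to decide. The regex and is_text_output are illustrative assumptions; the real code goes through parse_concept_with_multiplicity and QualifiedRef.

```python
import re

# Optional domain prefix, concept code, optional multiplicity suffix: Foo, Foo[], Foo[3].
_OUTPUT_PATTERN = re.compile(r"^(?P<ref>[A-Za-z_][\w.]*)(?:\[(?P<count>\d*)\])?$")


def is_text_output(output: str) -> bool:
    """Return True when the output spec names the native Text concept, with or without multiplicity."""
    match = _OUTPUT_PATTERN.match(output.strip())
    if match is None:
        raise ValueError(f"Unrecognized output spec: {output!r}")
    # Keep only the local code, so "native.Text" is treated like "Text".
    local_code = match.group("ref").rsplit(".", 1)[-1]
    return local_code == "Text"
```

A concept such as TextSummary passes this check even if it refines Text, which is why the concept-level layer 3 exists.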
The elaborator additionally guards against:
- Synthetic codes that fail is_pipe_code_valid (snake_case + length).
- Synthetic pipes that carry structuring_method = preliminary_text themselves.
- Bundle-level validation failures: it re-runs PipelexBundleBlueprint.model_validate on the elaborated form and wraps any ValidationError in BundleElaboratorError with the originating-pipe context.
Test counts: PipeStructure-focused subset is 92 passed. Touched-source areas overall:
1506 passed, 1 xfailed. make agent-check and make docs-check clean.
- tests/unit/pipelex/core/bundles/test_elaboration_metadata.py
- tests/unit/pipelex/core/interpreter/test_bundle_elaborator.py
- tests/unit/pipelex/core/interpreter/test_interpreter_preliminary_text.py
- tests/unit/pipelex/pipe_operators/pipe_structure/test_pipe_structure_blueprint.py
- tests/unit/pipelex/pipe_operators/pipe_structure/test_pipe_structure_factory.py
- tests/unit/pipelex/pipe_operators/pipe_structure/test_pipe_structure_validate_inputs.py
- tests/unit/pipelex/pipe_operators/pipe_structure/test_pipe_structure_validate_outputs.py
- tests/unit/pipelex/pipe_operators/pipe_structure/test_pipe_structure_kajson.py
- tests/unit/pipelex/builder/pipe/pipe_operator/pipe_structure/test_pipe_structure_spec.py
- tests/unit/pipelex/libraries/dependencies/test_dependency_preliminary_text_export.py
- tests/integration/pipelex/pipes/operator/pipe_structure/test_pipe_structure.py
- tests/integration/pipelex/pipes/operator/pipe_structure/test_pipe_structure_in_sequence.py
- tests/integration/pipelex/pipes/operator/pipe_structure/test_pipe_structure_in_batch.py
- tests/integration/pipelex/pipes/operator/pipe_structure/test_preliminary_text_e2e.py
- tests/integration/pipelex/pipes/operator/pipe_structure/test_preliminary_text_inline_e2e.py

- End-to-end preliminary_text with real LLM calls, parametrized over all three multiplicity forms (Foo, Foo[], Foo[2]), against both Python-class concepts (RestaurantReview, 8 fields) and inline-structure concepts (HikingTripReport, 12 fields with lists).
- PipeStructure inside a hand-authored PipeSequence (no elaboration sugar): confirms the operator is usable on its own.
- PipeStructure inside a PipeBatch: three free-form review texts, per-item shape assertions.
- Coverage for PipeStructureBlueprint, for a PipeLLMBlueprint carrying the directive, and for every blueprint in an elaborated bundle.

Authoring type = "PipeStructure" directly in a .mthds
file currently fails plxt schema validation: the bundled schema in
vscode-pipelex/crates/taplo-common/schemas/mthds_schema.json predates this PR. The
preliminary_text path is unaffected (the synthesized PipeStructure lives in-memory
only). Cross-repo follow-up: ship a pipelex-tools release with the regenerated schema. Captured
in TODOS.md.
Listed in TODOS.md; none of these block this PR.
- mthds-ui graph viewer integration.
- PipeStructure image-input support.
- Prompt customization for preliminary_text.
- pipelex-dev elaborate-bundle <path> debugging CLI.
- Revisit StructuringMethod.DIRECT (functionally identical to None today; kept for symmetry).
- elaboration_metadata across serialization boundaries: drop exclude=True when a second cross-boundary consumer materializes (graph viewer over a serialized bundle, Temporal payload, persistent library cache). Today the only consumer is the in-process dependency loader.
Brief generated from the diff between bb9bdb32 and HEAD on
feature/Text-then-object. See TODOS.md for the full phase-by-phase plan, decisions
taken, and audit notes; docs/under-the-hood/build-time-elaboration.md for the user-facing
mechanism doc; docs/building-methods/pipes/pipe-operators/PipeStructure.md for the operator
reference.