PR #125 · AIML-758 · Slice 4C of 4

Prove S4C Local Parity Path

The first shared tool moved into core — now prove it still returns identical output through the new abstraction layer, and harden the auth pipeline against credential leaks.

Branch: AIML-758-s4c-prove-first-tool-local-parity Base: AIML-758-s4b-move-first-shared-tool (PR #124) Files: 6 changed Lines: +259 −2
1

Context

This PR is the final child of Slice 4 — the slice that extracts a shared API client so the same tool code can run in both the local stdio server and the future hosted remote server. Slice 4 is divided into three beads, each building on the last:

The core question S4C answers: does list_vulnerability_types produce identical output when invoked through SdkApiClient instead of the old reflection-injected SDK factories? If yes, the abstraction layer is safe, and the remaining 11 shared tools can be migrated with confidence in Slice 8. If no, we catch it here before any downstream work depends on the pattern.

Slice 4 Progression: From Abstraction to Proof S4A · PR #123 Client Boundary ContrastApiClient interface AuthenticationStrategy hook SdkApiClient local bridge DONE S4B · PR #124 First Tool Move list_vulnerability_types → core ToolContext forwarding Core boundary tests DONE S4C · THIS PR Local Parity Proof Golden JSON byte-equivalence Auth failure sanitization .setCause(e) stack trace removal Local-only regression guard Slice 5 (AIML-759) — Next Hosted server consumes contrast-mcp-core get_user_info adapts to shared BaseTool auth pipeline
Why this matters Without S4C, the module split in S4B is a structural change with no behavioral proof. The golden JSON test catches serialization drift, field-order changes, and accidental org ID leaks. The auth failure tests catch credential leaks in error responses — a gap discovered during PR #123 review. Together, they provide the confidence gate for migrating the remaining 11 tools in Slice 8.
2

Approach

S4C uses three complementary proof strategies — each catches a different class of regression that the others miss. No classes were moved in this PR (that was S4B). The scope here is two one-line production fixes and four new/expanded test files, distributed across both modules. The diagram below maps every changed file to its module and shows which test proves which production behavior.

PR #125 Change Map 2 production fixes · 4 test files · 1 gate script :contrast-mcp-core Shared between local & hosted servers PRODUCTION (src/main) SingleTool.java .setCause(e) → .addKeyValue(EXCEPTION_TYPE, ...)   +1 −1 PaginatedTool.java .setCause(e) → .addKeyValue(EXCEPTION_TYPE, ...)   +1 −1 TESTS (src/test) BaseToolAuthenticationStrategyTest.java +5 tests: auth failure (Single + Paginated), paginated happy path, paginated no-op, source-code regression   +108 Modified (production fix) New / expanded test :contrast-mcp-stdio-app Local stdio server only TESTS (src/test) ListVulnerabilityTypesLocalParityTest.java Golden JSON byte-equivalence through SdkApiClient   +84 NEW GetSastResultsToolTest.java Source-code guard: get_scan_results stays local-only   +16 hack/verify-s4c-local-parity.sh Temporary gate: runs targeted tests, scans for secrets   +49 NEW proves fix cross-module proof The parity test lives in stdio-app but exercises core’s ListVulnerabilityTypesTool through SdkApiClient Summary: 2 lines of production code changed · 257 lines of test code added Every production change has a mutation-sensitive test that fails if the change is reverted

1. Byte-equivalent golden JSON. The ListVulnerabilityTypesLocalParityTest serializes the tool's response to JSON bytes and compares against a hardcoded golden constant. This catches any change to field order, field names, null handling, or serialization behavior. If the SingleToolResponse record adds a field tomorrow, this test breaks — exactly what we want.

2. Auth failure sanitization. The BaseToolAuthenticationStrategyTest now covers the scenario where AuthenticationStrategy.authenticate() throws an exception. An attacker probing the hosted server with an expired token would trigger this path. The test proves the error response contains only a generic reference ID, not the exception message (which could contain token fragments or internal state).

3. Local-only regression guard. get_scan_results is the deprecated local-only tool that must never enter contrast-mcp-core. A source-code assertion in GetSastResultsToolTest verifies it still extends LocalSdkSingleTool and has no ContrastApiClient dependency. This prevents accidental migration during the Slice 8 bulk move.

This PR (S4C)
Golden JSON byte-equivalent parity test
Auth failure sanitization tests (Single + Paginated)
Stack trace removal (.setCause → exception type)
Local-only regression guard for get_scan_results
Temporary gate script (verify-s4c-local-parity.sh)
Not in This PR
Migrating remaining 11 shared tools (Slice 8)
BearerTokenApiClient for hosted server (Slice 5)
Rewriting all reflection-based test injection (Slice 8)
Removing LocalSdk* bridges (Slice 8)
3

The Security Fix: Stack Trace Removal

This is the only production code change in the entire PR — and it's a one-line fix applied to two files. Both PaginatedTool and SingleTool had a .setCause(e) call in their catch-all exception handler. That call attaches the full Java stack trace to the SLF4J log event, which means the stack trace — including internal class paths, method names, and potentially sensitive data from exception messages — flows into whatever log sink is configured.

Imagine an attacker sending a malformed request that triggers a NullPointerException deep in the SDK. With .setCause(e), the full stack trace (containing SDK class paths, connection pool internals, and possibly credential fragments in certain exception types) would be attached to the log entry. In the hosted server, log events could conceivably be exposed through monitoring dashboards. The fix replaces the stack trace with just the exception's simple class name — enough for debugging, without the information disclosure risk.

contrast-mcp-core/.../tool/base/SingleTool.java MODIFIED
 111    } catch (Exception e) {
 112      log.atError()
 113          .addKeyValue(LoggingKeys.REQUEST_ID, requestId)
-114          .setCause(e)
+114          .addKeyValue(LoggingKeys.EXCEPTION_TYPE, e.getClass().getSimpleName())
 115          .setMessage("Request failed unexpectedly")
 116          .log();
 117      return SingleToolResponse.error("An internal error occurred (ref: " + requestId + ")");
    
The tool response was already safe — it only returns the generic "An internal error occurred (ref: ...)" message. The issue was the log entry attached the full stack trace via .setCause(e). Now it logs only the exception class name (e.g., NullPointerException, IOException) as a structured key-value pair, which is sufficient for debugging without information disclosure.
contrast-mcp-core/.../tool/base/PaginatedTool.java MODIFIED
 128    } catch (Exception e) {
 129      log.atError()
 130          .addKeyValue(LoggingKeys.REQUEST_ID, requestId)
-131          .setCause(e)
+131          .addKeyValue(LoggingKeys.EXCEPTION_TYPE, e.getClass().getSimpleName())
 132          .setMessage("Request failed unexpectedly")
 133          .log();
 134      return PaginatedToolResponse.error(
    
Identical fix in the paginated pipeline. Both SingleTool and PaginatedTool share the same catch-all pattern — both needed the same treatment. The source-code regression test in Chapter 4 ensures .setCause(e) doesn't creep back into either file.
Security consideration This change was surfaced during PR #123 review. The auth strategy failure path (AuthenticationStrategy.authenticate() throwing) would pass through this same catch-all. In the hosted server, an expired OAuth token could trigger an IllegalStateException whose message contains the raw token value. With .setCause(e), that token would appear in logs. Now it doesn't.
4

Auth Pipeline Tests

BaseToolAuthenticationStrategyTest started in S4A with two tests for SingleTool: no-op when unconfigured, and strategy-before-doExecute when configured. S4C adds five new tests that cover the gaps identified during PR #123 review. The tests fall into three groups: auth failure handling, paginated pipeline coverage, and a source-code regression guard.

executePipeline() Auth Strategy Flow — Paths Under Test executePipeline() strategy configured? no NOOP → doExecute Tests 1 & 4 (from S4A) yes authenticate(ctx) throws? no doExecute close() Tests 2 & 5 yes Sanitized error (NEW) Tests 3 & 6 (this PR)

The diagram shows the seven test paths. Tests 1 and 2 (SingleTool no-op and happy path) existed from S4A. Tests 4 and 5 are the paginated equivalents added here. Tests 3 and 6 (red path) are the auth failure tests — the critical new coverage that proves exceptions from authenticate() produce sanitized error responses.

contrast-mcp-core/.../tool/base/BaseToolAuthenticationStrategyTest.java MODIFIED
+67  @Test
+68  void executePipeline_should_return_error_when_authentication_strategy_throws() {
+69    var events = new ArrayList<String>();
+70    var tool = new RecordingSingleTool(events);
+71    var context = new ToolContext(java.util.Map.of("requestId", "req-123"));
+72    tool.setAuthenticationStrategy(
+73        toolContext -> {
+74          events.add("authenticate");
+75          throw new IllegalStateException("raw-token-value must not reach tool output");
+76        });
+77
+78    var result = tool.executePipeline(TestParams::valid, context);
+79
+80    assertThat(result.isSuccess()).isFalse();
+81    assertThat(result.found()).isFalse();
+82    assertThat(result.data()).isNull();
+83    assertThat(result.errors()).hasSize(1);
+84    assertThat(result.errors().get(0)).matches("An internal error occurred \\(ref: [0-9a-f]{8}\\)");
+85    assertThat(String.join(" ", result.errors())).doesNotContain("raw-token-value");
+86    assertThat(events).containsExactly("authenticate");
+87  }
    
The exception message deliberately contains "raw-token-value" to simulate a real credential leak. Line 85 proves the error response doesn't contain it. Line 84 proves the response matches the generic reference-ID pattern. Line 86 proves doExecute was never reached — the exception short-circuited the pipeline at the authentication step.

The paginated pipeline has an identical set of three tests (no-op, happy path, auth failure) using a RecordingPaginatedTool inner class. The pattern is the same — the only difference is that executePipeline takes (page, pageSize, paramsSupplier, context) instead of (paramsSupplier, context), and the result type is PaginatedToolResponse with .items() instead of .data().

BaseToolAuthenticationStrategyTest.java — paginated auth failure MODIFIED
+121  @Test
+122  void paginatedExecutePipeline_should_return_error_when_authentication_strategy_throws() {
+123    var events = new ArrayList<String>();
+124    var tool = new RecordingPaginatedTool(events);
+125    var context = new ToolContext(java.util.Map.of("requestId", "req-123"));
+126    tool.setAuthenticationStrategy(
+127        toolContext -> {
+128          events.add("authenticate");
+129          throw new IllegalStateException("raw-token-value must not reach tool output");
+130        });
+131
+132    var result = tool.executePipeline(1, 2, TestParams::valid, context);
+133
+134    assertThat(result.isSuccess()).isFalse();
+135    assertThat(result.items()).isEmpty();
+136    assertThat(result.errors()).hasSize(1);
+137    assertThat(result.errors().get(0)).matches("An internal error occurred \\(ref: [0-9a-f]{8}\\)");
+138    assertThat(String.join(" ", result.errors())).doesNotContain("raw-token-value");
+139    assertThat(events).containsExactly("authenticate");
+140  }
    
Mirror of the SingleTool test above, but for PaginatedTool.executePipeline(page, pageSize, ...). Same assertions — sanitized error, no credential leak, doExecute never called. Both tool base classes share the same catch-all pattern, so both need the same test.
BaseToolAuthenticationStrategyTest.java — source-code regression MODIFIED
+30  private static final List<Path> PIPELINE_SOURCES =
+31      List.of(
+32          Path.of("src/main/java/.../tool/base/SingleTool.java"),
+33          Path.of("src/main/java/.../tool/base/PaginatedTool.java"));
+
+142  @Test
+143  void executePipeline_unexpected_error_logs_should_not_attach_stack_traces() throws Exception {
+144    for (var sourcePath : PIPELINE_SOURCES) {
+145      var source = Files.readString(sourcePath, StandardCharsets.UTF_8);
+146
+147      assertThat(source).doesNotContain(".setCause(e)");
+148    }
+149  }
    
This is a mutation-sensitive source-code assertion. It reads the actual Java source files and verifies .setCause(e) is absent. If someone reintroduces it (perhaps resolving a merge conflict carelessly, or adding a new catch block by copying old code), this test fails immediately. It's the enforcement arm of the production fix in Chapter 3.
BaseToolAuthenticationStrategyTest.java — test helper MODIFIED
+177  private static final class RecordingPaginatedTool extends PaginatedTool<TestParams, String> {
+178
+179    private final List<String> events;
+180
+181    private RecordingPaginatedTool(List<String> events) {
+182      this.events = events;
+183    }
+184
+185    @Override
+186    protected ExecutionResult<String> doExecute(
+187        PaginationParams pagination, TestParams params, WarningCollector collector) {
+188      events.add("doExecute");
+189      return ExecutionResult.of(List.of("ok"), 1);
+190    }
+191  }
    
Mirrors the existing RecordingSingleTool pattern established in S4A. Both inner classes record lifecycle events into a shared list so assertions can verify call order (authenticate → doExecute → close) without mocking framework complexity.
5

Golden JSON Parity Test

This is the centerpiece of S4C — a byte-for-byte comparison proving that list_vulnerability_types produces identical JSON output after being moved to contrast-mcp-core. Before S4B, the tool used reflection-injected sdkFactory and sdkExtensionFactory fields. After S4B, it receives a ContrastApiClient through constructor injection. This test proves the output is the same regardless of how the tool gets its data.

Here's how it works: the test constructs a real SdkApiClient wrapping mocked SDK factories, injects it into ListVulnerabilityTypesTool, calls listVulnerabilityTypes(null) with no ToolContext (simulating local stdio mode), serializes the response to JSON bytes, and compares byte-for-byte against a hardcoded golden constant.

contrast-mcp-stdio-app/.../vulnerability/ListVulnerabilityTypesLocalParityTest.java NEW
+37@ExtendWith(MockitoExtension.class)
+38class ListVulnerabilityTypesLocalParityTest {
+39
+40  private static final String TEST_ORG_ID = "local-org-id-should-not-serialize";
+41  private static final byte[] S4C_LOCAL_STDIO_GOLDEN_JSON =
+42      ("{\"data\":[\"cmd-injection\",\"sql-injection\",\"xss-reflected\"],"
+43       + "\"errors\":[],\"warnings\":[],\"found\":true,\"success\":true}")
+44          .getBytes(StandardCharsets.UTF_8);
    
The golden constant captures the exact JSON that MCP clients receive. Note the data array is sorted alphabetically (cmd-injection, sql-injection, xss-reflected) — this matches the .sorted() call in ListVulnerabilityTypesTool.doExecute(). The org ID is a distinctive sentinel ("local-org-id-should-not-serialize") so the final assertion can verify it doesn't leak into the output.
ListVulnerabilityTypesLocalParityTest.java — setUp and test body NEW
+53  @BeforeEach
+54  void setUp() {
+55    var sdkApiClient = new SdkApiClient(sdkFactory, sdkExtensionFactory);
+56    tool = new ListVulnerabilityTypesTool(sdkApiClient);
+57    when(sdkFactory.getSDK()).thenReturn(sdk);
+58    when(sdkFactory.getOrgId()).thenReturn(TEST_ORG_ID);
+59  }
+60
+61  @Test
+62  void listVulnerabilityTypes_should_match_s4c_local_stdio_golden_json() throws Exception {
+63    Rules.Rule rule1 = mock();
+64    Rules.Rule rule2 = mock();
+65    Rules.Rule rule3 = mock();
+66    when(rule1.getName()).thenReturn("xss-reflected");
+67    when(rule2.getName()).thenReturn("sql-injection");
+68    when(rule3.getName()).thenReturn("cmd-injection");
+69
+70    Rules rules = mock();
+71    when(rules.getRules()).thenReturn(List.of(rule1, rule2, rule3));
+72    when(sdk.getRules(TEST_ORG_ID)).thenReturn(rules);
+73
+74    var response = tool.listVulnerabilityTypes(null);
+75    var actualJson = objectMapper.writeValueAsBytes(response);
+76
+77    assertThat(actualJson)
+78        .as("S4C byte-equivalent golden JSON; update constant if "
+79            + "SingleToolResponse field order changes")
+80        .isEqualTo(S4C_LOCAL_STDIO_GOLDEN_JSON);
+81    assertThat(new String(actualJson, StandardCharsets.UTF_8))
+82        .doesNotContain(TEST_ORG_ID);
+83  }
    
Notice the wiring on line 55: a real SdkApiClient wrapping mocked factories, not a mocked ContrastApiClient. This is deliberate — the test exercises the full local call chain: ListVulnerabilityTypesToolContrastApiClient.getRules()SdkApiClient.getRules()sdkFactory.getSDK().getRules(orgId). A mocked ContrastApiClient would bypass SdkApiClient entirely and miss any bugs in the local bridge. Line 72 proves the org ID flows through correctly from the factory to the SDK call. Lines 80–82 prove the output matches the golden constant and the org ID doesn't leak into the serialized response.
Note The rules are deliberately given in non-alphabetical order (xss-reflected, sql-injection, cmd-injection) but the golden JSON has them sorted (cmd-injection, sql-injection, xss-reflected). This proves the .sorted() call in the tool's doExecute() method is actually working. If someone removed the sort, the golden comparison would fail.
6

Local-Only Regression Guard

get_scan_results is the one tool that must never leave the stdio app. It's deprecated, can return unbounded SARIF data, and uses the local Contrast Scan SDK directly. During Slice 8, when the remaining 11 shared tools migrate to core, there's a real risk someone moves this one too. This source-code assertion makes that mistake immediately visible.

contrast-mcp-stdio-app/.../tool/sast/GetSastResultsToolTest.java MODIFIED
+45  private static final Path TOOL_SOURCE =
+46      Path.of("src/main/java/.../tool/sast/GetSastResultsTool.java");
+
+204  @Test
+205  void getScanResults_should_remain_local_only_and_absent_from_contrast_api_client()
+206      throws IOException {
+207    var source = Files.readString(TOOL_SOURCE, StandardCharsets.UTF_8);
+208
+209    assertThat(source).contains("extends LocalSdkSingleTool");
+210    assertThat(source).contains("getContrastSDK()");
+211    assertThat(source).doesNotContain("ContrastApiClient");
+212  }
    
Three assertions that together form a migration guard. Line 209: it extends LocalSdkSingleTool (the temporary stdio-only bridge, not the shared SingleTool). Line 210: it calls getContrastSDK() directly (the local SDK factory method). Line 211: it has no ContrastApiClient dependency. If a developer migrates GetSastResultsTool to use ContrastApiClient as part of Slice 8's bulk move, this test breaks and forces a deliberate conversation about whether the deprecated tool should really be shared.
Future work Both LocalSdkSingleTool and LocalSdkPaginatedTool are temporary bridges. They exist so the 11 un-migrated tools can still access the SDK factories via @Autowired fields while BaseTool in core no longer has those fields. Once Slice 8 migrates all shared tools to ContrastApiClient, the only tool left on LocalSdkSingleTool will be get_scan_results.
7

Slice Gate Script

Every slice bead produces a temporary gate script in hack/ that runs targeted tests and scans the output for credential markers. This follows the pattern established by verify-s4a-client-boundary.sh and verify-s4b-first-tool-boundary.sh from the sibling beads.

Temporary Slice Gate Script Standard Pattern verify-s4c-local-parity.sh (49 lines) Follows: verify-s4a-client-boundary.sh, verify-s4b-first-tool-boundary.sh
All three S4 gate scripts share the same structure: set -euo pipefail, run targeted Gradle tests, capture output to a temp file, scan for forbidden credential markers (bearer, api_key, service_key, sentinel org IDs), and emit structured diagnostic logging with timing. The only difference is which test classes are targeted. This script runs:

BaseToolAuthenticationStrategyTest, ListVulnerabilityTypesToolTest, ListVulnerabilityTypesLocalParityTest, and GetSastResultsToolTest.
hack/verify-s4c-local-parity.sh NEW
+1#!/usr/bin/env bash
+2# Temporary S4C slice gate — remove once first-tool local parity is covered by permanent CI gates.
+3set -euo pipefail
+  ...
+31if ! (
+32  cd "${ROOT_DIR}"
+33  ./gradlew --no-daemon \
+34    :contrast-mcp-core:test \
+35    --tests '*BaseToolAuthenticationStrategyTest' \
+36    --tests '*ListVulnerabilityTypesToolTest' \
+37    :contrast-mcp-stdio-app:test \
+38    --tests '*ListVulnerabilityTypesLocalParityTest' \
+39    --tests '*GetSastResultsToolTest'
+40) >"${OUTPUT_FILE}" 2>&1; then
+  ...
+45if grep -Eiq 'bearer[[:space:]]+[A-Za-z0-9._-]+|api[_-]?key...|raw-token-value|local-org-id-should-not-serialize' \
+46    "${OUTPUT_FILE}"; then
+47  finish "failed" "targeted checks passed but diagnostic output contained a forbidden secret marker"
+48  exit 1
+49fi
      
8

What’s Next

S4C completes Slice 4. With the client boundary proven, the first tool migrated, and local parity confirmed, the next step is Slice 5: the hosted server in aiml-services consumes contrast-mcp-core and adapts get_user_info to use the shared BaseTool auth pipeline. That's the first time the shared code runs in both servers simultaneously.

TicketScopeDescription
AIML-757 Slice 3 Gradle migration, core module, publication pipeline
AIML-758 Slice 4A ContrastApiClient interface, AuthenticationStrategy hook, SdkApiClient bridge
AIML-758 Slice 4B Move list_vulnerability_types into contrast-mcp-core
AIML-758 Slice 4CThis PR Local parity proof, auth failure sanitization, .setCause(e) removal
AIML-759 Slice 5 Hosted server consumes core, get_user_info adapts to shared auth pipeline
AIML-762 Slice 8 Migrate remaining 11 shared tools; remove LocalSdk* bridges
FilePurposeLinesNovelty
SingleTool.java Replace .setCause(e) with exception type key-value +1 −1 Novel
PaginatedTool.java Same .setCause(e) fix for paginated pipeline +1 −1 Novel
BaseToolAuthenticationStrategyTest.java Auth failure, paginated coverage, source-code regression +108 Novel
ListVulnerabilityTypesLocalParityTest.java Golden JSON byte-equivalence through SdkApiClient +84 Novel
GetSastResultsToolTest.java Local-only regression guard for get_scan_results +16 Novel
hack/verify-s4c-local-parity.sh Temporary slice gate with credential marker scanning +49 Pattern