Context
This PR is the final child of Slice 4 — the slice that extracts a shared API client so the same tool code can run in both the local stdio server and the future hosted remote server. Slice 4 is divided into three beads, each building on the last:
- S4A (PR #123) — Introduced
ContrastApiClient, theAuthenticationStrategyhook inBaseTool, and the localSdkApiClientbridge. - S4B (PR #124) — Moved exactly one low-risk shared tool (
list_vulnerability_types) intocontrast-mcp-corebehindContrastApiClient. - S4C (this PR) — Proves the move didn't break anything: byte-equivalent golden JSON, auth failure sanitization, and a local-only regression guard for
get_scan_results.
The core question S4C answers: does list_vulnerability_types produce identical output when invoked through SdkApiClient instead of the old reflection-injected SDK factories?
If yes, the abstraction layer is safe, and the remaining 11 shared tools can be migrated with confidence in Slice 8.
If no, we catch it here before any downstream work depends on the pattern.
Approach
S4C uses three complementary proof strategies — each catches a different class of regression that the others miss. No classes were moved in this PR (that was S4B). The scope here is two one-line production fixes and four new/expanded test files, distributed across both modules. The diagram below maps every changed file to its module and shows which test proves which production behavior.
1. Byte-equivalent golden JSON.
The ListVulnerabilityTypesLocalParityTest serializes the tool's response to JSON bytes and compares against a hardcoded golden constant.
This catches any change to field order, field names, null handling, or serialization behavior.
If the SingleToolResponse record adds a field tomorrow, this test breaks — exactly what we want.
2. Auth failure sanitization.
The BaseToolAuthenticationStrategyTest now covers the scenario where AuthenticationStrategy.authenticate() throws an exception.
An attacker probing the hosted server with an expired token would trigger this path.
The test proves the error response contains only a generic reference ID, not the exception message (which could contain token fragments or internal state).
3. Local-only regression guard.
get_scan_results is the deprecated local-only tool that must never enter contrast-mcp-core.
A source-code assertion in GetSastResultsToolTest verifies it still extends LocalSdkSingleTool and has no ContrastApiClient dependency.
This prevents accidental migration during the Slice 8 bulk move.
The Security Fix: Stack Trace Removal
This is the only production code change in the entire PR — and it's a one-line fix applied to two files.
Both PaginatedTool and SingleTool had a .setCause(e) call in their catch-all exception handler.
That call attaches the full Java stack trace to the SLF4J log event, which means the stack trace — including internal class paths, method names, and potentially sensitive data from exception messages — flows into whatever log sink is configured.
Imagine an attacker sending a malformed request that triggers a NullPointerException deep in the SDK.
With .setCause(e), the full stack trace (containing SDK class paths, connection pool internals, and possibly credential fragments in certain exception types) would be attached to the log entry.
In the hosted server, log events could conceivably be exposed through monitoring dashboards.
The fix replaces the stack trace with just the exception's simple class name — enough for debugging, without the information disclosure risk.
111 } catch (Exception e) { 112 log.atError() 113 .addKeyValue(LoggingKeys.REQUEST_ID, requestId) -114 .setCause(e) +114 .addKeyValue(LoggingKeys.EXCEPTION_TYPE, e.getClass().getSimpleName()) 115 .setMessage("Request failed unexpectedly") 116 .log(); 117 return SingleToolResponse.error("An internal error occurred (ref: " + requestId + ")");
"An internal error occurred (ref: ...)" message. The issue was the log entry attached the full stack trace via .setCause(e). Now it logs only the exception class name (e.g., NullPointerException, IOException) as a structured key-value pair, which is sufficient for debugging without information disclosure.
128 } catch (Exception e) { 129 log.atError() 130 .addKeyValue(LoggingKeys.REQUEST_ID, requestId) -131 .setCause(e) +131 .addKeyValue(LoggingKeys.EXCEPTION_TYPE, e.getClass().getSimpleName()) 132 .setMessage("Request failed unexpectedly") 133 .log(); 134 return PaginatedToolResponse.error(
SingleTool and PaginatedTool share the same catch-all pattern — both needed the same treatment. The source-code regression test in Chapter 4 ensures .setCause(e) doesn't creep back into either file.
AuthenticationStrategy.authenticate() throwing) would pass through this same catch-all. In the hosted server, an expired OAuth token could trigger an IllegalStateException whose message contains the raw token value. With .setCause(e), that token would appear in logs. Now it doesn't.
Auth Pipeline Tests
BaseToolAuthenticationStrategyTest started in S4A with two tests for SingleTool: no-op when unconfigured, and strategy-before-doExecute when configured.
S4C adds five new tests that cover the gaps identified during PR #123 review.
The tests fall into three groups: auth failure handling, paginated pipeline coverage, and a source-code regression guard.
The diagram shows the seven test paths. Tests 1 and 2 (SingleTool no-op and happy path) existed from S4A. Tests 4 and 5 are the paginated equivalents added here. Tests 3 and 6 (red path) are the auth failure tests — the critical new coverage that proves exceptions from authenticate() produce sanitized error responses.
+67 @Test +68 void executePipeline_should_return_error_when_authentication_strategy_throws() { +69 var events = new ArrayList<String>(); +70 var tool = new RecordingSingleTool(events); +71 var context = new ToolContext(java.util.Map.of("requestId", "req-123")); +72 tool.setAuthenticationStrategy( +73 toolContext -> { +74 events.add("authenticate"); +75 throw new IllegalStateException("raw-token-value must not reach tool output"); +76 }); +77 +78 var result = tool.executePipeline(TestParams::valid, context); +79 +80 assertThat(result.isSuccess()).isFalse(); +81 assertThat(result.found()).isFalse(); +82 assertThat(result.data()).isNull(); +83 assertThat(result.errors()).hasSize(1); +84 assertThat(result.errors().get(0)).matches("An internal error occurred \\(ref: [0-9a-f]{8}\\)"); +85 assertThat(String.join(" ", result.errors())).doesNotContain("raw-token-value"); +86 assertThat(events).containsExactly("authenticate"); +87 }
"raw-token-value" to simulate a real credential leak. Line 85 proves the error response doesn't contain it. Line 84 proves the response matches the generic reference-ID pattern. Line 86 proves doExecute was never reached — the exception short-circuited the pipeline at the authentication step.
The paginated pipeline has an identical set of three tests (no-op, happy path, auth failure) using a RecordingPaginatedTool inner class. The pattern is the same — the only difference is that executePipeline takes (page, pageSize, paramsSupplier, context) instead of (paramsSupplier, context), and the result type is PaginatedToolResponse with .items() instead of .data().
+121 @Test +122 void paginatedExecutePipeline_should_return_error_when_authentication_strategy_throws() { +123 var events = new ArrayList<String>(); +124 var tool = new RecordingPaginatedTool(events); +125 var context = new ToolContext(java.util.Map.of("requestId", "req-123")); +126 tool.setAuthenticationStrategy( +127 toolContext -> { +128 events.add("authenticate"); +129 throw new IllegalStateException("raw-token-value must not reach tool output"); +130 }); +131 +132 var result = tool.executePipeline(1, 2, TestParams::valid, context); +133 +134 assertThat(result.isSuccess()).isFalse(); +135 assertThat(result.items()).isEmpty(); +136 assertThat(result.errors()).hasSize(1); +137 assertThat(result.errors().get(0)).matches("An internal error occurred \\(ref: [0-9a-f]{8}\\)"); +138 assertThat(String.join(" ", result.errors())).doesNotContain("raw-token-value"); +139 assertThat(events).containsExactly("authenticate"); +140 }
PaginatedTool.executePipeline(page, pageSize, ...). Same assertions — sanitized error, no credential leak, doExecute never called. Both tool base classes share the same catch-all pattern, so both need the same test.
+30 private static final List<Path> PIPELINE_SOURCES = +31 List.of( +32 Path.of("src/main/java/.../tool/base/SingleTool.java"), +33 Path.of("src/main/java/.../tool/base/PaginatedTool.java")); + +142 @Test +143 void executePipeline_unexpected_error_logs_should_not_attach_stack_traces() throws Exception { +144 for (var sourcePath : PIPELINE_SOURCES) { +145 var source = Files.readString(sourcePath, StandardCharsets.UTF_8); +146 +147 assertThat(source).doesNotContain(".setCause(e)"); +148 } +149 }
.setCause(e) is absent. If someone reintroduces it (perhaps resolving a merge conflict carelessly, or adding a new catch block by copying old code), this test fails immediately. It's the enforcement arm of the production fix in Chapter 3.
+177 private static final class RecordingPaginatedTool extends PaginatedTool<TestParams, String> { +178 +179 private final List<String> events; +180 +181 private RecordingPaginatedTool(List<String> events) { +182 this.events = events; +183 } +184 +185 @Override +186 protected ExecutionResult<String> doExecute( +187 PaginationParams pagination, TestParams params, WarningCollector collector) { +188 events.add("doExecute"); +189 return ExecutionResult.of(List.of("ok"), 1); +190 } +191 }
RecordingSingleTool pattern established in S4A. Both inner classes record lifecycle events into a shared list so assertions can verify call order (authenticate → doExecute → close) without mocking framework complexity.
Golden JSON Parity Test
This is the centerpiece of S4C — a byte-for-byte comparison proving that list_vulnerability_types produces identical JSON output after being moved to contrast-mcp-core.
Before S4B, the tool used reflection-injected sdkFactory and sdkExtensionFactory fields.
After S4B, it receives a ContrastApiClient through constructor injection.
This test proves the output is the same regardless of how the tool gets its data.
Here's how it works: the test constructs a real SdkApiClient wrapping mocked SDK factories, injects it into ListVulnerabilityTypesTool, calls listVulnerabilityTypes(null) with no ToolContext (simulating local stdio mode), serializes the response to JSON bytes, and compares byte-for-byte against a hardcoded golden constant.
+37@ExtendWith(MockitoExtension.class) +38class ListVulnerabilityTypesLocalParityTest { +39 +40 private static final String TEST_ORG_ID = "local-org-id-should-not-serialize"; +41 private static final byte[] S4C_LOCAL_STDIO_GOLDEN_JSON = +42 ("{\"data\":[\"cmd-injection\",\"sql-injection\",\"xss-reflected\"]," +43 + "\"errors\":[],\"warnings\":[],\"found\":true,\"success\":true}") +44 .getBytes(StandardCharsets.UTF_8);
cmd-injection, sql-injection, xss-reflected) — this matches the .sorted() call in ListVulnerabilityTypesTool.doExecute(). The org ID is a distinctive sentinel ("local-org-id-should-not-serialize") so the final assertion can verify it doesn't leak into the output.
+53 @BeforeEach +54 void setUp() { +55 var sdkApiClient = new SdkApiClient(sdkFactory, sdkExtensionFactory); +56 tool = new ListVulnerabilityTypesTool(sdkApiClient); +57 when(sdkFactory.getSDK()).thenReturn(sdk); +58 when(sdkFactory.getOrgId()).thenReturn(TEST_ORG_ID); +59 } +60 +61 @Test +62 void listVulnerabilityTypes_should_match_s4c_local_stdio_golden_json() throws Exception { +63 Rules.Rule rule1 = mock(); +64 Rules.Rule rule2 = mock(); +65 Rules.Rule rule3 = mock(); +66 when(rule1.getName()).thenReturn("xss-reflected"); +67 when(rule2.getName()).thenReturn("sql-injection"); +68 when(rule3.getName()).thenReturn("cmd-injection"); +69 +70 Rules rules = mock(); +71 when(rules.getRules()).thenReturn(List.of(rule1, rule2, rule3)); +72 when(sdk.getRules(TEST_ORG_ID)).thenReturn(rules); +73 +74 var response = tool.listVulnerabilityTypes(null); +75 var actualJson = objectMapper.writeValueAsBytes(response); +76 +77 assertThat(actualJson) +78 .as("S4C byte-equivalent golden JSON; update constant if " +79 + "SingleToolResponse field order changes") +80 .isEqualTo(S4C_LOCAL_STDIO_GOLDEN_JSON); +81 assertThat(new String(actualJson, StandardCharsets.UTF_8)) +82 .doesNotContain(TEST_ORG_ID); +83 }
SdkApiClient wrapping mocked factories, not a mocked ContrastApiClient. This is deliberate — the test exercises the full local call chain: ListVulnerabilityTypesTool → ContrastApiClient.getRules() → SdkApiClient.getRules() → sdkFactory.getSDK().getRules(orgId). A mocked ContrastApiClient would bypass SdkApiClient entirely and miss any bugs in the local bridge. Line 72 proves the org ID flows through correctly from the factory to the SDK call. Lines 80–82 prove the output matches the golden constant and the org ID doesn't leak into the serialized response.
xss-reflected, sql-injection, cmd-injection) but the golden JSON has them sorted (cmd-injection, sql-injection, xss-reflected). This proves the .sorted() call in the tool's doExecute() method is actually working. If someone removed the sort, the golden comparison would fail.
Local-Only Regression Guard
get_scan_results is the one tool that must never leave the stdio app.
It's deprecated, can return unbounded SARIF data, and uses the local Contrast Scan SDK directly.
During Slice 8, when the remaining 11 shared tools migrate to core, there's a real risk someone moves this one too.
This source-code assertion makes that mistake immediately visible.
+45 private static final Path TOOL_SOURCE = +46 Path.of("src/main/java/.../tool/sast/GetSastResultsTool.java"); + +204 @Test +205 void getScanResults_should_remain_local_only_and_absent_from_contrast_api_client() +206 throws IOException { +207 var source = Files.readString(TOOL_SOURCE, StandardCharsets.UTF_8); +208 +209 assertThat(source).contains("extends LocalSdkSingleTool"); +210 assertThat(source).contains("getContrastSDK()"); +211 assertThat(source).doesNotContain("ContrastApiClient"); +212 }
LocalSdkSingleTool (the temporary stdio-only bridge, not the shared SingleTool). Line 210: it calls getContrastSDK() directly (the local SDK factory method). Line 211: it has no ContrastApiClient dependency. If a developer migrates GetSastResultsTool to use ContrastApiClient as part of Slice 8's bulk move, this test breaks and forces a deliberate conversation about whether the deprecated tool should really be shared.
LocalSdkSingleTool and LocalSdkPaginatedTool are temporary bridges. They exist so the 11 un-migrated tools can still access the SDK factories via @Autowired fields while BaseTool in core no longer has those fields. Once Slice 8 migrates all shared tools to ContrastApiClient, the only tool left on LocalSdkSingleTool will be get_scan_results.
Slice Gate Script
Every slice bead produces a temporary gate script in hack/ that runs targeted tests and scans the output for credential markers.
This follows the pattern established by verify-s4a-client-boundary.sh and verify-s4b-first-tool-boundary.sh from the sibling beads.
Temporary Slice Gate Script Standard Pattern verify-s4c-local-parity.sh (49 lines) Follows: verify-s4a-client-boundary.sh, verify-s4b-first-tool-boundary.sh
-euo pipefail, run targeted Gradle tests, capture output to a temp file, scan for forbidden credential markers (bearer, api_key, service_key, sentinel org IDs), and emit structured diagnostic logging with timing. The only difference is which test classes are targeted. This script runs:
BaseToolAuthenticationStrategyTest, ListVulnerabilityTypesToolTest, ListVulnerabilityTypesLocalParityTest, and GetSastResultsToolTest.
+1#!/usr/bin/env bash +2# Temporary S4C slice gate — remove once first-tool local parity is covered by permanent CI gates. +3set -euo pipefail + ... +31if ! ( +32 cd "${ROOT_DIR}" +33 ./gradlew --no-daemon \ +34 :contrast-mcp-core:test \ +35 --tests '*BaseToolAuthenticationStrategyTest' \ +36 --tests '*ListVulnerabilityTypesToolTest' \ +37 :contrast-mcp-stdio-app:test \ +38 --tests '*ListVulnerabilityTypesLocalParityTest' \ +39 --tests '*GetSastResultsToolTest' +40) >"${OUTPUT_FILE}" 2>&1; then + ... +45if grep -Eiq 'bearer[[:space:]]+[A-Za-z0-9._-]+|api[_-]?key...|raw-token-value|local-org-id-should-not-serialize' \ +46 "${OUTPUT_FILE}"; then +47 finish "failed" "targeted checks passed but diagnostic output contained a forbidden secret marker" +48 exit 1 +49fi
What’s Next
S4C completes Slice 4.
With the client boundary proven, the first tool migrated, and local parity confirmed, the next step is Slice 5: the hosted server in aiml-services consumes contrast-mcp-core and adapts get_user_info to use the shared BaseTool auth pipeline.
That's the first time the shared code runs in both servers simultaneously.
| Ticket | Scope | Description |
|---|---|---|
| AIML-757 | Slice 3 | Gradle migration, core module, publication pipeline |
| AIML-758 | Slice 4A | ContrastApiClient interface, AuthenticationStrategy hook, SdkApiClient bridge |
| AIML-758 | Slice 4B | Move list_vulnerability_types into contrast-mcp-core |
| AIML-758 | Slice 4CThis PR | Local parity proof, auth failure sanitization, .setCause(e) removal |
| AIML-759 | Slice 5 | Hosted server consumes core, get_user_info adapts to shared auth pipeline |
| AIML-762 | Slice 8 | Migrate remaining 11 shared tools; remove LocalSdk* bridges |
| File | Purpose | Lines | Novelty |
|---|---|---|---|
| SingleTool.java | Replace .setCause(e) with exception type key-value | +1 −1 | Novel |
| PaginatedTool.java | Same .setCause(e) fix for paginated pipeline | +1 −1 | Novel |
| BaseToolAuthenticationStrategyTest.java | Auth failure, paginated coverage, source-code regression | +108 | Novel |
| ListVulnerabilityTypesLocalParityTest.java | Golden JSON byte-equivalence through SdkApiClient | +84 | Novel |
| GetSastResultsToolTest.java | Local-only regression guard for get_scan_results | +16 | Novel |
| hack/verify-s4c-local-parity.sh | Temporary slice gate with credential marker scanning | +49 | Pattern |