---
layout: tap
site_name: tap
tap_name: dedupe
description: "Remove duplicate rows by field value"
intent: read
columns: []
args:
  - name: field
    type: string
    description: "Column name to deduplicate on"
args_json: |
  {"field":{"type":"string","description":"Column name to deduplicate on"}}
health_json: |
  {"min_rows":1,"non_empty":[]}
example_args: ""
source_url: https://github.com/LeonTing1010/tap-skills/blob/main/showcase/tap/dedupe.plan.json
license: MIT
---

What it does

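Removes duplicate rows that share the same value in a chosen field. It takes one argument, field (string): the name of the column to deduplicate on.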
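To make the behaviour concrete, here is a minimal Python sketch of deduplicating rows on one field. It is an illustration only, not Taprun's implementation: the function name dedupe_rows is hypothetical, and two details are assumptions the source does not state, namely that the first row seen for each value is the one kept, and that rows missing the field are treated as sharing one empty value.

```python
from typing import Any, Dict, Iterable, List


def dedupe_rows(rows: Iterable[Dict[str, Any]], field: str) -> List[Dict[str, Any]]:
    """Drop rows whose `field` value was already seen, keeping the first one.

    Hypothetical helper: the keep-first policy is an assumption, not
    documented behaviour. Field values are assumed to be hashable.
    """
    seen = set()
    unique = []
    for row in rows:
        key = row.get(field)  # rows missing the field all share the key None (assumption)
        if key in seen:
            continue  # a row with this value was already kept
        seen.add(key)
        unique.append(row)
    return unique


rows = [
    {"id": 1, "city": "Oslo"},
    {"id": 2, "city": "Rome"},
    {"id": 3, "city": "Oslo"},
]
print(dedupe_rows(rows, "city"))
# prints [{'id': 1, 'city': 'Oslo'}, {'id': 2, 'city': 'Rome'}]
```

Input order is preserved, so the result does not depend on sorting the rows first.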

Install Taprun once

Taprun ships as a single MCP server exposing a catalog of compiled taps. One-time setup on macOS / Linux:

brew install LeonTing1010/tap/taprun
tap mcp connect

Or add this to your claude_desktop_config.json (the same server entry works in Claude Code, Cursor, Cline, Windsurf, or any other MCP host):

{
  "mcpServers": {
    "tap": {
      "command": "tap",
      "args": ["mcp", "start"]
    }
  }
}
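If the config file already lists other MCP servers, the entry has to be merged in rather than pasted over them. The Python sketch below shows one way to do that on an already-parsed config. The helper name add_tap_server and the "other" server entry are hypothetical, and reading and writing the file is left out because its location depends on the host.

```python
import copy
import json
from typing import Any, Dict

# Server entry copied from the configuration shown above.
TAP_SERVER = {"command": "tap", "args": ["mcp", "start"]}


def add_tap_server(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `config` with the "tap" server added under mcpServers.

    Hypothetical helper: existing servers are left untouched, and an
    existing "tap" entry is overwritten.
    """
    merged = copy.deepcopy(config)
    servers = merged.setdefault("mcpServers", {})
    servers["tap"] = copy.deepcopy(TAP_SERVER)
    return merged


# A made-up existing config with one unrelated server, for illustration only.
existing = {"mcpServers": {"other": {"command": "other-server", "args": []}}}
print(json.dumps(add_tap_server(existing), indent=2))
```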

Call tap/dedupe

Terminal, once installed:

tap run tap/dedupe

From the MCP host, the call replays the exact same compiled plan deterministically and uses zero LLM tokens:

tap.run({ site: "tap", name: "dedupe" })

Why compile it once

This plan was forged once: the AI read tap, picked stable structural addresses (JSON-LD, ARIA, RSS, or declared API endpoints, in that priority order), and saved them to a .plan.json. Every replay since then has used zero LLM tokens. When tap ships a site change that breaks the extraction, tap verify surfaces it before your data goes stale, not after your pipeline has silently written garbage for a week.
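This page's front matter declares a health_json of {"min_rows":1,"non_empty":[]}. As a rough illustration of the kind of check that could catch a broken extraction, the Python sketch below evaluates rows against such a declaration. It is not how tap verify is implemented: the function name check_health is hypothetical, and the reading of min_rows as a minimum row count and non_empty as a list of columns that must be filled in every row is an assumption.

```python
import json
from typing import Any, Dict, List

# health_json as declared in this page's front matter.
HEALTH = json.loads('{"min_rows":1,"non_empty":[]}')


def check_health(rows: List[Dict[str, Any]], health: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the rows pass.

    Assumed semantics (not confirmed by the source): `min_rows` is a minimum
    row count, and `non_empty` names columns that must be non-blank in every row.
    """
    problems = []
    min_rows = health.get("min_rows", 0)
    if len(rows) < min_rows:
        problems.append(f"expected at least {min_rows} row(s), got {len(rows)}")
    for column in health.get("non_empty", []):
        for index, row in enumerate(rows):
            if row.get(column) in (None, ""):
                problems.append(f"row {index}: column {column!r} is empty")
    return problems


print(check_health([], HEALTH))
# prints ['expected at least 1 row(s), got 0']
```

Under these assumptions, an extraction that suddenly returns no rows fails the check immediately instead of passing empty data downstream.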

Related taps

| tap | description |
| --- | --- |
| tap/table | Format rows as a human-readable table (outputs one row per line) |
| tap/filter | Filter rows where field matches condition (gt, lt, eq, contains) |
| tap/sort | Sort rows by field (numeric or alphabetic, asc or desc) |
| tap/pick | Select specific columns from rows (projection) |
| tap/limit | Take first N rows (head) or skip M then take N (offset+limit) |