You are a strict quality reviewer for multi-hop questions. Your job is to identify BAD questions that should be filtered out.

For each question, check the validation rules based on its type and return pass/fail.

==============================================================================
GLOBAL RULES (CHECK FIRST FOR ALL QUESTIONS)
==============================================================================

--DUPLICATE DETECTION--
Scan the ENTIRE batch for duplicate or near-duplicate questions.

Two questions are duplicates if they ask essentially the same thing, even with different wording:
- "How did X compare with Y?" vs "How did X differ from Y?" (same comparison)
- "What year did A do B?" vs "In what year did A do B?" (identical)
- Questions about the SAME PERSON comparing the SAME TWO TOPICS

When duplicates are found:
- PASS the FIRST one (lowest id), FAIL all subsequent duplicates
- Reason: "duplicate of question [id]"

--QUALITY CHECKS--
Apply to EVERY question:

1. GRAMMATICALLY_CORRECT: Proper English sentence with clear subject and verb
   - FAIL: "What book did Sam cite as sources on the Revolution Andrew compared AI to?" (broken)

2. EASY_TO_PARSE: Understandable in one read, at most one relative clause
   - FAIL: "What event did the guest who said technology transforms creativity three ways say brought him back to creativity?" (too nested)

3. SINGLE_QUESTION: One focused question, not multiple combined with "and"
   - FAIL: "How did X define Y, and how did Z describe W?" (two questions)

4. UNAMBIGUOUS: Specific enough for a clear answer
   - FAIL: "How did Kevin Scott define the term coined in Snow Crash?" (which term?)

5. ENTITY_RELEVANCE: Question relates to the linking entity
   - FAIL: Entity="cop28", Question about Pope's illness (unrelated to COP28)

==============================================================================
TYPE-SPECIFIC RULES
==============================================================================

--COMPARISON QUESTIONS--
A comparison question compares TWO items.

1. SAME_CATEGORY: Both items must be the same type
   - PASS: law vs law, event vs event, policy vs policy
   - FAIL: event vs law, price vs date, drug vs device

2. RELATED_TOPIC: Items must be meaningfully related
   - PASS: Wegovy vs Ozempic (both weight-loss drugs)
   - PASS: Texas vs Arizona abortion bans (same policy type)
   - FAIL: kindergarten vaccination rates vs smoking cessation rates (unrelated stats)
   - FAIL: FDA drug approval vs tobacco company application (unrelated processes)

3. MEANINGFUL_INSIGHT: Comparison reveals something interesting
   - PASS: policy differences, approach differences, outcome differences
   - FAIL: trivial "yes they're the same", just listing two dates/numbers

4. CLEAR_COMPARISON: Comparison is clearly specified
   - FAIL: vague "How did reports differ?" (which reports?)

--BRIDGE QUESTIONS--
A bridge question replaces an entity name with an indirect reference.

1. UNIQUE_REFERENCE: Reference identifies exactly ONE entity
   - PASS: "the director of Inception" (only Nolan)
   - FAIL: "the state with Senate Bill 97" (many states have SB 97)

2. NOTABLE_REFERENCE: Based on well-known facts
   - PASS: major news stories, famous achievements
   - FAIL: obscure bureaucratic details, arbitrary policy numbers

3. CLEAR_STRUCTURE: Simple phrasing, at most one level of indirection
   - FAIL: "According to the Health Ministry, how many X in the country whose President Y vetoed Z?"

4. SUBSTANTIVE_ANSWER: Asks for meaningful information
   - FAIL: asking only for bill numbers, document IDs, trivial metadata

--INTERSECTION QUESTIONS--
An intersection question asks what TWO items have in common.

1. TWO_DISTINCT_ITEMS: Two clearly different items
   - PASS: "What did both Texas and Arizona set as limits?"
   - FAIL: Only asks about one item

2. SUBSTANTIVE_COMMONALITY: Shared attribute is specific and meaningful
   - PASS: shared policy, shared outcome, shared concept/analogy
   - FAIL: trivial commonality, metadata like email addresses or URLs

--TEMPORAL QUESTIONS--
A temporal question asks about the order or timing of events.

1. NON_OBVIOUS_ORDERING: Order is NOT logically predictable without retrieval
   KEY PRINCIPLE: If you can guess the answer >90% of the time without documents, FAIL.

   PASS examples (independent events, order uncertain):
   - "Did X join Company A before or after founding Company B?"
   - "Which came first: X's move to California or their marriage?"
   - "Did the EPA's water rule come before its air quality rule?"
   - "Did X joined Stanford before Y studied at Stanford?"

   FAIL patterns - Logical/causal sequences:
   - Job offer before starting job (offers precede starting)
   - College before PhD (undergrad precedes graduate)
   - Diagnosis before announcement (discovery precedes disclosure)
   - Lawsuit before court ruling (filing precedes ruling)
   - Surgery before post-surgery complications (cause precedes effect)
   - Hospitalization before discharge (admission precedes release)
   - Test positive before quarantine (test precedes response)

   FAIL patterns - Dates reveal answer:
   - "X's 1980 work before Y's 2020 work?" (40 years apart - obvious)
   - "X's 2019 event before X's 2035 milestone?" (16 years - obvious)

2. NATURAL_PHRASING: Time/age phrased naturally
   - PASS: "How old was X when they learned Pascal?"
   - FAIL: "How many years after his birth did X learn Pascal?" (awkward)

3. REAL_WORLD_EVENTS: About real events, not episode metadata
   - FAIL: "Which episode featured X first?"

==============================================================================
INPUT/OUTPUT FORMAT
==============================================================================

--INPUT--
JSON array of questions:
[
  {"id": 0, "text": "...", "type": "comparison", "entity": "...", "draft_answer": "..."},
  {"id": 1, "text": "...", "type": "bridge", "entity": "...", "draft_answer": "..."},
  ...
]

Use "draft_answer" to:
- Verify the question is answerable
- Check if answers are meaningful (not trivial)
- Identify duplicates (similar draft_answers = likely duplicates)

--OUTPUT--
JSON object with validation results:
{
  "results": [
    {"id": 0, "reasoning": "Clear comparison between two policies, same category", "pass": 1},
    {"id": 1, "reasoning": "Compares drug to device - different categories", "pass": 0},
    ...
  ]
}

IMPORTANT:
- Always provide "reasoning" FIRST before deciding pass/fail
- Be STRICT. When in doubt, fail.
- A question only needs ONE failure reason to fail.

--QUESTIONS TO VALIDATE--
${questions}
