Build a web crawler in Python as a proper package.

Requirements:
- CLI that accepts: starting URL, max depth (default 2), max pages (default 50), output format
- Respect robots.txt rules
- Configurable crawl delay (default 1 second between requests)
- Extract from each page: title, meta description, all headings (h1-h6), internal/external links, images with alt text (a parser sketch follows this list)
- Detect and report broken links (4xx, 5xx responses)
- Stay within the same domain by default (flag to allow external)
- Handle edge cases: redirects, timeouts, circular links, malformed URLs
- Use async I/O (aiohttp + asyncio) for concurrent crawling with a configurable concurrency limit (a crawler sketch follows this list)
- Output formats: JSON (structured report), CSV (flat table), HTML (visual report with summary stats)
- Summary statistics: total pages crawled, broken links found, average response time, most linked pages (a reporter sketch follows this list)
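
A minimal sketch of how the crawler engine could satisfy these requirements, assuming aiohttp plus the standard-library `urllib.robotparser`; the names `Crawler`, `CrawlerConfig`, and `extract_links` are illustrative, not a required API. It shows the concurrency limit via a semaphore, the per-request delay, the robots.txt and same-domain checks, and broken-link detection from response status codes.

```python
# Illustrative sketch only -- class and module names are assumptions, not a required API.
import asyncio
import re
import urllib.robotparser
from dataclasses import dataclass
from urllib.parse import urljoin, urlparse

import aiohttp

HREF_RE = re.compile(r'href="([^"#]+)"')


def extract_links(base_url: str, html: str) -> list[str]:
    # Stand-in for the real parser module, which would use a proper HTML parser
    # and also collect titles, headings, and image alt text.
    return [urljoin(base_url, href) for href in HREF_RE.findall(html)]


@dataclass
class CrawlerConfig:
    start_url: str
    max_depth: int = 2
    max_pages: int = 50
    delay: float = 1.0        # seconds between requests
    concurrency: int = 5      # maximum simultaneous requests


class Crawler:
    def __init__(self, config: CrawlerConfig) -> None:
        self.config = config
        self.seen: set[str] = set()              # guards against circular links
        self.broken: list[tuple[str, int]] = []  # (url, status) for 4xx/5xx or failures
        self.semaphore = asyncio.Semaphore(config.concurrency)
        self.robots = urllib.robotparser.RobotFileParser()

    def allowed(self, url: str) -> bool:
        """Apply the same-domain default and the site's robots.txt rules."""
        same_domain = urlparse(url).netloc == urlparse(self.config.start_url).netloc
        return same_domain and self.robots.can_fetch("*", url)

    async def fetch(self, session: aiohttp.ClientSession, url: str, depth: int) -> None:
        if depth > self.config.max_depth or len(self.seen) >= self.config.max_pages:
            return
        if url in self.seen or not self.allowed(url):
            return
        self.seen.add(url)
        async with self.semaphore:
            try:
                async with session.get(url, timeout=aiohttp.ClientTimeout(total=10)) as resp:
                    if resp.status >= 400:
                        self.broken.append((url, resp.status))
                        return
                    html = await resp.text()
            except (aiohttp.ClientError, asyncio.TimeoutError):
                self.broken.append((url, 0))     # unreachable or timed out
                return
            await asyncio.sleep(self.config.delay)
        links = extract_links(url, html)
        await asyncio.gather(*(self.fetch(session, link, depth + 1) for link in links))

    async def run(self) -> None:
        self.robots.set_url(urljoin(self.config.start_url, "/robots.txt"))
        self.robots.read()   # one blocking read at startup keeps the sketch simple
        async with aiohttp.ClientSession() as session:
            await self.fetch(session, self.config.start_url, depth=0)
```

The depth and page limits plus the seen-set guard against circular links; a flag for external domains would simply relax `allowed`.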
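A sketch of the parser side; BeautifulSoup is an assumption here (the requirements do not name an HTML parser), and `PageData`/`parse_page` are hypothetical names. It pulls the title, meta description, headings, internal/external links, and image alt text from one page.

```python
# Illustrative parser sketch; BeautifulSoup (bs4) is an assumption, not a stated requirement.
from dataclasses import dataclass, field
from urllib.parse import urljoin, urlparse

from bs4 import BeautifulSoup


@dataclass
class PageData:
    url: str
    title: str = ""
    meta_description: str = ""
    headings: dict[str, list[str]] = field(default_factory=dict)
    internal_links: list[str] = field(default_factory=list)
    external_links: list[str] = field(default_factory=list)
    images: list[dict[str, str]] = field(default_factory=list)


def parse_page(url: str, html: str) -> PageData:
    soup = BeautifulSoup(html, "html.parser")
    page = PageData(url=url)

    if soup.title and soup.title.string:
        page.title = soup.title.string.strip()

    meta = soup.find("meta", attrs={"name": "description"})
    if meta and meta.get("content"):
        page.meta_description = meta["content"].strip()

    for level in ("h1", "h2", "h3", "h4", "h5", "h6"):
        texts = [h.get_text(strip=True) for h in soup.find_all(level)]
        if texts:
            page.headings[level] = texts

    base_host = urlparse(url).netloc
    for a in soup.find_all("a", href=True):
        link = urljoin(url, a["href"])
        if urlparse(link).netloc == base_host:
            page.internal_links.append(link)
        else:
            page.external_links.append(link)

    for img in soup.find_all("img"):
        page.images.append({"src": urljoin(url, img.get("src", "")),
                            "alt": img.get("alt", "")})

    return page
```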
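A sketch of the summary-statistics part of the reporter; the `summarize` and `write_json_report` helpers and their field names are hypothetical. The same summary dictionary could feed the CSV and HTML formats.

```python
# Hypothetical reporter helpers; only the JSON output path is sketched here.
import json
from collections import Counter
from typing import Any


def summarize(pages: list[dict[str, Any]],
              broken_links: list[tuple[str, int]],
              response_times: list[float]) -> dict[str, Any]:
    """Build the summary-statistics block shared by every output format."""
    link_counts: Counter[str] = Counter()
    for page in pages:
        link_counts.update(page.get("internal_links", []))
    return {
        "total_pages_crawled": len(pages),
        "broken_links_found": len(broken_links),
        "average_response_time_ms": (
            round(sum(response_times) / len(response_times) * 1000, 1)
            if response_times else 0.0
        ),
        "most_linked_pages": [url for url, _ in link_counts.most_common(10)],
    }


def write_json_report(path: str, pages: list[dict[str, Any]],
                      broken_links: list[tuple[str, int]],
                      response_times: list[float]) -> None:
    report = {
        "summary": summarize(pages, broken_links, response_times),
        "pages": pages,
        "broken_links": [{"url": u, "status": s} for u, s in broken_links],
    }
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(report, fh, indent=2)
```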

Technical requirements:
- Proper Python package with pyproject.toml
- CLI using click or argparse (a CLI sketch follows this list)
- Clean module separation: crawler engine, parser, reporter, CLI
- Comprehensive unit tests using pytest (mock HTTP responses; a test sketch follows this list)
- Type hints throughout
- Include a README with setup and usage instructions
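
One possible shape for the CLI entry point using argparse (click would do equally well); the `webcrawler` program name and the option names simply echo the requirements above and are not prescribed.

```python
# Hypothetical CLI module; the program name and option names are illustrative.
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="webcrawler",
        description="Crawl a site and report titles, links, and broken pages.",
    )
    parser.add_argument("url", help="starting URL")
    parser.add_argument("--max-depth", type=int, default=2)
    parser.add_argument("--max-pages", type=int, default=50)
    parser.add_argument("--delay", type=float, default=1.0,
                        help="seconds to wait between requests")
    parser.add_argument("--concurrency", type=int, default=5)
    parser.add_argument("--allow-external", action="store_true",
                        help="also follow links outside the starting domain")
    parser.add_argument("--format", choices=("json", "csv", "html"), default="json")
    parser.add_argument("--output", default="report.json")
    return parser


def main() -> None:
    args = build_parser().parse_args()
    # The real entry point would build a CrawlerConfig from args, run the crawler
    # with asyncio.run(...), and hand the results to the reporter module.
    print(vars(args))


if __name__ == "__main__":
    main()
```

Exposing `main` as a console script (for example under `[project.scripts]` in pyproject.toml) gives an installable command.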
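A sketch of the pytest side with mocked HTTP responses; pytest-asyncio and aioresponses are assumptions (any mocking approach satisfies the requirement), and the `webcrawler.*` imports refer to the hypothetical modules sketched earlier.

```python
# Test sketch; pytest-asyncio and aioresponses are assumed to be installed.
import aiohttp
import pytest
from aioresponses import aioresponses

from webcrawler.crawler import Crawler, CrawlerConfig  # hypothetical modules
from webcrawler.parser import parse_page


def test_parse_page_extracts_title_and_links() -> None:
    html = """
    <html><head><title>Home</title>
    <meta name="description" content="A test page"></head>
    <body><h1>Welcome</h1>
    <a href="/about">About</a>
    <a href="https://other.example.org/">Elsewhere</a>
    <img src="/logo.png" alt="Logo"></body></html>
    """
    page = parse_page("https://example.com/", html)
    assert page.title == "Home"
    assert page.meta_description == "A test page"
    assert page.headings["h1"] == ["Welcome"]
    assert "https://example.com/about" in page.internal_links
    assert "https://other.example.org/" in page.external_links
    assert page.images[0]["alt"] == "Logo"


@pytest.mark.asyncio
async def test_crawler_records_broken_links() -> None:
    config = CrawlerConfig(start_url="https://example.com/", delay=0)
    crawler = Crawler(config)
    crawler.robots.parse([])  # empty robots.txt: nothing is disallowed
    with aioresponses() as mocked:
        mocked.get("https://example.com/", status=500)
        async with aiohttp.ClientSession() as session:
            await crawler.fetch(session, "https://example.com/", depth=0)
    assert ("https://example.com/", 500) in crawler.broken
```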


--- CODE QUALITY VALIDATION ---
After completing the project, validate and improve your code quality using roam-code:

1. Run `roam init` to index the codebase
2. Run `roam health` — aim for a score above 80
3. Run `roam dead` — remove any dead/unused code found
4. Run `roam complexity` — refactor any functions with cognitive complexity > 15
5. Run `roam cycles` — eliminate any circular dependencies
6. Run `roam gate` — ensure all quality gates pass
7. Run `roam coupling` — reduce high coupling where possible

Iterate until roam reports clean results. Do not stop until the health score is above 80.
