Scrapling Migration — BDD Test Findings
========================================
Date: 2025-02-25
Suite: 63 scenarios across 8 test files

Final result: 63 passed, 0 failed, 0 skipped

Migration: crawl4ai -> scrapling (AsyncFetcher + html2text)
New tool:  stealth_scrape (StealthyFetcher — anti-bot browser)

---

1. SSL Certificate Issue (curl_cffi on macOS)
   scrapling uses curl_cffi which cannot locate macOS system CA bundle.
   All HTTPS requests fail with CertificateVerifyError (curl error 60).
   Fix: pass verify=False to AsyncFetcher.get(). Acceptable for a web
   scraping tool fetching public pages.

2. No Built-in Markdown Conversion
   crawl4ai returned markdown natively. scrapling returns HTML only.
   Added html2text dependency to convert HTML -> markdown in crawler.py.

3. Crawl Tests Now Pass (were skipped before)
   crawl4ai required Playwright browser binaries (crawl4ai-setup).
   scrapling AsyncFetcher uses curl_cffi (HTTP-only), no browser needed.
   3 crawl_url tests that were SKIPPED now PASS.

4. Truncation Suffix Length
   clamp_text appends a ~100-char suffix when truncating. The max_chars=1000
   test needed 120-char tolerance for the suffix.

5. extract_data max_items Semantics
   max_items limits the number of list/table GROUPS, not individual items.
   scrapling returns fuller HTML (including nav menus) vs crawl4ai, so
   more list groups appeared. Test adjusted to count groups, not items.

6. api_docs max_results=1
   Crawled documentation pages naturally contain many inline URLs.
   Counting "http" occurrences is not a valid proxy for source count.
   Test changed to count "Source:" headers instead.
