---
layout: tap
site_name: facebook
tap_name: keyword-search
description: "Search Facebook posts by keyword using the user's logged-in session. Unscrambles author names that Facebook obfuscates via Flexbox order reordering. Returns author, body text, engagement counts, and a stable content hash when no native post permalink is exposed."
intent: read
layer: 4
layer_source: "DOM [role=article] + Flexbox-order unscramble"
columns:
  - post_id
  - author_name
  - author_url
  - text
  - posted_at
  - url
  - like_count
  - comment_count
  - share_count
  - permalink
  - lang
args:
  - name: keyword
    type: string
    required: true
    description: Search keyword
  - name: limit
    type: int
    default: 20
    description: Max posts to return
args_json: |
  {
    "keyword": {"type":"string","required":true,"description":"Search keyword"},
    "limit": {"type":"int","default":20,"description":"Max posts to return"}
  }
health_json: |
  {"min_rows":1,"non_empty":["post_id","author_name"]}
example_args: '{"keyword":"AI automation","limit":5}'
doctor_verdict: healthy
doctor_checked: "2026-04-22"
license: MIT
---

Why this tap exists

Facebook's /search/posts/?q=… page is one of the more hostile scraping targets on the open web. It requires login, ships no llms.txt or RSS, rotates internal GraphQL doc_ids weekly, and — most interestingly — applies Flexbox-order DOM scrambling to author display names.

When a naive scraper reads document.querySelectorAll('[role="article"]')[0].textContent, the first block of characters looks like random noise (oSodnprmmlffgfi1c3mSg…) followed by readable body text. Many give up here and declare the site un-scrapable. They are wrong.

The gibberish is the author name, split character by character across many <span>s, each assigned a non-zero CSS order. The browser's flexbox layout re-sorts them visually, but textContent returns them in DOM order, which is randomized. Post body, engagement counts, and aria-labels are not scrambled.

This tap loads the search page through the user's Chrome session and walks each [role=article]. A container counts as scrambled when every child's text is at most 2 characters long and at least one child has a non-zero computed order; for those containers the tap sorts the children by computed order and re-concatenates their text. Everything else is plain extraction.
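The detection rule and the re-sort described above can be sketched in a few lines. This is a minimal illustration, not the tap's actual source: the function names `unscramble` and `readParts` are hypothetical, and the `{ text, order }` shape is an assumption made so the core logic can be exercised without a browser.

```javascript
// Hypothetical helper: rebuilds text in visual (flex `order`) sequence.
// `parts` is an array of { text, order } objects listed in DOM order.
function unscramble(parts) {
  // Scrambled = every fragment is at most 2 characters long
  // and at least one fragment has a non-zero order.
  const looksScrambled =
    parts.length > 0 &&
    parts.every((p) => p.text.length <= 2) &&
    parts.some((p) => p.order !== 0);

  if (!looksScrambled) {
    // Plain container: DOM order is already the reading order.
    return parts.map((p) => p.text).join("");
  }

  // Flex items are laid out by ascending `order`; ties keep DOM order,
  // so the original index is used as the tie-breaker.
  return parts
    .map((p, i) => ({ text: p.text, order: p.order, i }))
    .sort((a, b) => a.order - b.order || a.i - b.i)
    .map((p) => p.text)
    .join("");
}

// Hypothetical browser-only adapter: reads each child's computed `order`.
// It is only defined here, never called, so the file also loads outside a browser.
function readParts(container) {
  return Array.from(container.children).map((el) => ({
    text: el.textContent || "",
    order: parseInt(getComputedStyle(el).order, 10) || 0,
  }));
}
```

In a page context the two would be combined as `unscramble(readParts(nameContainer))`, where `nameContainer` stands for whichever element holds the author name spans. Keeping the sort separate from the DOM read makes the rule easy to test against recorded fixtures when Facebook changes its markup.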

Sample output

post_id      author_name   author_url                              text                                                 like_count  lang
fb_74ig3q    Snowie.Ai     https://www.facebook.com/SnowieAi       In 24 months every serious website will talk…        500         en

Known gaps. Facebook search cards do not expose a public post permalink href — the visible links are profile URLs plus encrypted __cft__ tracking params. When no native post ID is found, this tap emits a content-hash id (fb_…) that is stable across runs for the same author+body combination, so downstream deduplication still works.
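One way to derive such an id is to hash the author and body together and encode the result compactly. The sketch below is an assumption for illustration only: the source does not say which hash the tap uses, so a 32-bit FNV-1a variant over UTF-16 code units with base-36 encoding stands in here, and `contentHashId` is a hypothetical name.

```javascript
// Hypothetical helper: deterministic fallback id from author + body.
// The hash choice (FNV-1a style, 32-bit) is an assumption, not the tap's documented method.
function contentHashId(author, body) {
  // A NUL separator keeps the author/body boundary part of the hashed input.
  const input = `${author}\u0000${body}`;
  let hash = 0x811c9dc5; // FNV offset basis (32-bit)
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i); // mixes in one UTF-16 code unit at a time
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned 32-bit
  }
  return "fb_" + hash.toString(36);
}
```

Because the function depends only on its two string inputs, the same author and body always produce the same id, which is the property downstream deduplication relies on. Any edit to the post body yields a different id, so edited posts would appear as new rows under this scheme.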

posted_at, comment_count, and share_count are best-effort: Facebook omits or obfuscates these on some search result layouts. The tap returns empty strings rather than failing the row — use tap verify to check health drift per field if you need one of them to be non-empty.
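If a downstream job depends on one of the best-effort fields, a stricter check than the default health contract can be applied to the returned rows. The sketch below mirrors the shape of `health_json` from the front matter (`min_rows`, `non_empty`); the `checkHealth` function is hypothetical and is not how tap verify is implemented.

```javascript
// Hypothetical helper: checks rows against a { min_rows, non_empty } contract.
function checkHealth(rows, health) {
  const problems = [];

  if (rows.length < health.min_rows) {
    problems.push(`expected at least ${health.min_rows} row(s), got ${rows.length}`);
  }

  rows.forEach((row, i) => {
    for (const field of health.non_empty) {
      const value = row[field];
      // Missing, null, and whitespace-only values all count as empty.
      if (value === undefined || value === null || String(value).trim() === "") {
        problems.push(`row ${i}: ${field} is empty`);
      }
    }
  });

  return { ok: problems.length === 0, problems };
}

// Default contract from the front matter, plus a stricter variant
// for a consumer that needs posted_at to be present.
const defaultHealth = { min_rows: 1, non_empty: ["post_id", "author_name"] };
const strictHealth = { min_rows: 1, non_empty: ["post_id", "author_name", "posted_at"] };
```

A row with an empty `posted_at` passes `defaultHealth` but is reported under `strictHealth`, which matches the behavior described above: the tap keeps the row, and the consumer decides whether the gap matters.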

Runtime

Chrome bridge only. Facebook blocks anonymous and non-browser clients at the search endpoint.

tap runtime chrome
tap facebook/keyword-search keyword="AI automation" limit=5

Case study

Full write-up — hypothesis, wrong diagnosis, the 5-minute diagnostic, the fix — in Facebook Scrambles Author Names, Not Post Bodies.