Facebook Groups — Scrape Posts
Input: Facebook group URL + sort order + desired count → Output: post list with full metadata (JSON).
Language
All process output to user (progress updates, process notifications) follows the user's language.
Objective
Given a Facebook group URL, scrape N posts sorted by the specified order and return structured metadata for each post.
Prerequisites
- Target group page is already open in the browser: https://www.facebook.com/groups/{group_slug_or_id}
- Already logged into Facebook (user avatar, Messenger icon, and notification bell visible in the top-right corner)
Pre-execution Checks
1. Tool Readiness
If browser-act has been confirmed available in the current session → skip this step.
Invoke browser-act via Skill tool to load usage. If installation or configuration issues arise, follow its guidance to resolve then retry.
2. Login Verification
If Facebook login status has been confirmed in the current session → skip this step.
Otherwise, navigate to https://www.facebook.com/ and verify login status programmatically:
browser-act navigate 'https://www.facebook.com/'
browser-act wait stable --timeout 15000
browser-act eval "JSON.stringify({user_id: document.cookie.match(/c_user=(\d+)/)?.[1] || '0', USER_ID: (()=>{try{return require('CurrentUserInitialData').USER_ID;}catch(e){return '0';}})()})"
Verdict:
- user_id is a numeric string other than "0" (e.g., "61560817072276"), or USER_ID !== "0" → logged in, continue
- user_id === "0" and USER_ID === "0" (the fallback values the snippet emits when no ID is found) → not logged in; assist the user: run browser-act browser open {browser_id} https://www.facebook.com/login --headed to open a headed window so the user can sign in manually (Stealth normal mode persists cookies, so one login is reusable)
Facebook may clear c_user mid-session: if GraphQL errors such as field_exception or missing_required_variable_value occur during execution, re-run this login check before assuming the script is broken.
User refuses or cannot log in → terminate execution. Facebook enforces strict restrictions on unauthenticated group access (login modal blocks pagination, feed returns partial data + field_exception); login is a hard prerequisite.
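The verdict rule can be sketched in Python as follows. This is a minimal illustration, not part of the Skill: it assumes the eval command prints the JSON string unchanged, and the helper name is_logged_in is hypothetical.

```python
import json


def is_logged_in(eval_output: str) -> bool:
    """Apply the login verdict to the JSON string produced by the eval snippet.

    Assumption: eval_output is the raw JSON text, e.g.
    '{"user_id": "61560817072276", "USER_ID": "0"}'.
    """
    data = json.loads(eval_output)
    # "0" is the fallback the snippet emits when no ID could be read.
    cookie_id = str(data.get("user_id") or "0")
    module_id = str(data.get("USER_ID") or "0")
    cookie_ok = cookie_id.isdigit() and cookie_id != "0"
    module_ok = module_id != "0"
    # Either source reporting a real ID counts as logged in.
    return cookie_ok or module_ok
```

A caller would continue when the function returns True and otherwise fall back to the headed login flow described above.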
Capability Components
This Skill's operational boundary = what the user can manually do in their browser. It only reads data already displayed to the authenticated user and never bypasses authentication or access controls, which is equivalent to copy-pasting on the user's behalf.
JS code is encapsulated in Python files under the scripts/ directory, invoked via eval "$(python scripts/xxx.py {params})". $(...) is bash syntax; use the bash tool for execution.
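To illustrate the pattern of a Python file that emits JavaScript for eval, here is a small sketch. It is not the actual contents of any script in scripts/. The template, the build_js helper, and the returned fields are hypothetical; the only point is that parameters are embedded as a JSON literal so quoting stays valid.

```python
import json

# Hypothetical JS template; the real scripts are assumed to be far larger.
JS_TEMPLATE = """(async () => {
  const params = %s;
  return JSON.stringify({ ok: true, sort: params.sort, count: params.count });
})()"""


def build_js(sort: str, count: int) -> str:
    """Return a JS expression with the parameters embedded as a JSON literal."""
    # json.dumps escapes quotes and non-ASCII characters, so the output
    # can be dropped into the template without manual quoting.
    return JS_TEMPLATE % json.dumps({"sort": sort, "count": count})


if __name__ == "__main__":
    # The shell captures this output through $(...) and hands it to eval.
    print(build_js("CHRONOLOGICAL", 20))
```

Because the script only prints text, the bash command substitution passes the generated JavaScript to the eval command as a single argument.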
API: Scrape group posts (with auto-pagination)
Navigate to the target group page first, then invoke the scrape script (it auto-resolves the numeric group ID from the current page):
browser-act navigate 'https://www.facebook.com/groups/{group_slug_or_id}'
browser-act wait stable --timeout 20000
browser-act eval "$(python scripts/scrape-posts.py --sort CHRONOLOGICAL --count 20)"
Parameters:
- --sort: Sort order, default CHRONOLOGICAL. See "Enum Parameters" below
- --count: Desired number of posts, default 20. Script auto-paginates until count is met or feed is exhausted
- --max-pages: Pagination safety cap, default 100
- --doc-id: GraphQL persisted query doc_id for GroupsCometFeedRegularStoriesPaginationQuery, default 26577462205242925. Update via this flag if Facebook rotates the version (see "Known Limitations")
Output example:
{
"ok": true,
"group_id": "2580640642080467",
"group_name": "Programmer Humor",
"sort": "CHRONOLOGICAL",
"total": 20,
"posts": [
{
"post_id": "4052937798184070",
"cache_id": "6790541484885792441",
"id": "UzpfSTEwMDA4ODY4MzIx...",
"permalink_url": "https://www.facebook.com/groups/programmerhumor/posts/4052937798184070/",
"creation_time": 1772941518,
"message": "Those were the days my friend ...",
"author": {
"id": "100088683215191",
"name": "Jeff Bramlett",
"profile_picture": null,
"url": "https://www.facebook.com/JeffieB56"
},
"group": {
"id": "2580640642080467",
"name": "Programmer Humor",
"url": "https://www.facebook.com/groups/programmerhumor/"
},
"reactions": {
"total": 1,
"total_formatted": "1",
"breakdown": [
{ "name": "Haha", "reaction_id": "115940658764963", "count": 1 }
]
},
"share_count": 0,
"share_count_formatted": "0",
"comment_count": 0,
"media": [
{
"__typename": "Photo",
"id": "938454962453936",
"photo_image": "https://scontent-...fbcdn.net/v/t39...jpg"
}
]
}
],
"diagnostics": {
"pages": [
{ "pageIdx": 0, "httpStatus": 200, "edgeCount": 4, "err": null, "hasNext": true }
]
}
}
Video posts include additional fields in media: playable_url (mp4 direct link), playable_url_hd, and thumbnail.
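A consumer of this output might flatten it as in the sketch below. It assumes creation_time is a Unix timestamp in seconds (the sample value is consistent with that, but the source does not state it), and the summarize helper and its row layout are hypothetical.

```python
import json
from datetime import datetime, timezone


def summarize(result_json: str) -> list[dict]:
    """Flatten the scrape result into one small row per post."""
    result = json.loads(result_json)
    if not result.get("ok"):
        raise ValueError("scrape result reported ok != true")
    rows = []
    for post in result["posts"]:
        media = post.get("media") or []
        # Assumption: creation_time is Unix seconds.
        created = datetime.fromtimestamp(post["creation_time"], tz=timezone.utc)
        rows.append({
            "post_id": post["post_id"],
            "created_utc": created.isoformat(),
            "author": (post.get("author") or {}).get("name"),
            "reactions": (post.get("reactions") or {}).get("total", 0),
            # Only video items carry playable_url, so photos are skipped here.
            "video_urls": [m["playable_url"] for m in media if m.get("playable_url")],
        })
    return rows
```

Optional fields are read defensively with .get so that a post with no author or reaction data still yields a row.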
Enum Parameters
[AI] --sort sort order — Facebook accepts the following three values:
- TOP_POSTS: most relevant (Facebook's default sort on the web)
- CHRONOLOGICAL: newest first (reverse chronological by post time)
- RECENT_ACTIVITY: most recently active (reverse chronological by latest comment/reaction time)
Values are fixed and validated by argparse choices; no runtime query needed.
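The argparse validation mentioned above could look roughly like this. The flag names and defaults come from the Parameters list; the build_parser helper and the description string are assumptions about how the real script is organized.

```python
import argparse

# The three sort values Facebook accepts, per the list above.
SORT_CHOICES = ("TOP_POSTS", "CHRONOLOGICAL", "RECENT_ACTIVITY")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="Emit JS that scrapes group posts")
    # choices= makes argparse reject any other value before JS is generated.
    parser.add_argument("--sort", choices=SORT_CHOICES, default="CHRONOLOGICAL")
    parser.add_argument("--count", type=int, default=20)
    parser.add_argument("--max-pages", type=int, default=100)
    parser.add_argument("--doc-id", default="26577462205242925")
    return parser
```

With choices in place, an unsupported value such as --sort NEWEST makes argparse print a usage error and exit with status 2, so no request is ever sent with an invalid sort.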
Pagination
API Pagination: handled automatically by the script.
- Pagination parameter: cursor (embedded in GraphQL variables)
- Type: opaque cursor (server-side state, base64-encoded)
- Initial value: null (first request)
- Next page value: data.node.group_feed.page_info.end_cursor
- Each response returns 3 edges (FB streaming mode ignores client-provided count)
- Termination: has_next_page === false, or --count / --max-pages limit reached
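The cursor loop described above can be sketched generically. The real script runs as JavaScript in the page; this Python version only illustrates the control flow. The fetch_page callable, the page dictionary shape, and the fake_feed helper are hypothetical stand-ins for the GraphQL request.

```python
from typing import Callable, Optional

# Hypothetical page shape:
# {"edges": [...], "end_cursor": "...", "has_next_page": bool}
Page = dict


def paginate(fetch_page: Callable[[Optional[str]], Page],
             count: int, max_pages: int) -> list:
    """Follow end_cursor until count is met, the feed ends, or max_pages is hit."""
    posts: list = []
    cursor: Optional[str] = None  # the first request sends a null cursor
    for _ in range(max_pages):
        page = fetch_page(cursor)
        posts.extend(page["edges"])
        if len(posts) >= count or not page["has_next_page"]:
            break
        cursor = page["end_cursor"]
    return posts[:count]


def fake_feed(total: int, page_size: int = 3) -> Callable[[Optional[str]], Page]:
    """Stand-in feed that serves page_size items per call, for local testing."""
    items = [f"post-{i}" for i in range(total)]

    def fetch(cursor: Optional[str]) -> Page:
        start = int(cursor) if cursor else 0
        end = min(start + page_size, total)
        return {"edges": items[start:end], "end_cursor": str(end),
                "has_next_page": end < total}

    return fetch
```

Because the server decides the page size, the loop overshoots and then trims to count rather than asking for an exact number of edges.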
Success Criteria
- ok === true and total >= 1
- posts[*].post_id non-null rate = 100% (non-post units such as Section Headers are filtered out by the script)
- posts[*].permalink_url and posts[*].creation_time non-null rate = 100%
- When using CHRONOLOGICAL sort, creation_time is strictly monotonically decreasing
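These criteria can be checked mechanically against the parsed output. The sketch below is one possible validator; the check_success name and the wording of the problem messages are hypothetical.

```python
def check_success(result: dict) -> list[str]:
    """Return a list of violated criteria; an empty list means success."""
    problems: list[str] = []
    posts = result.get("posts") or []
    if result.get("ok") is not True:
        problems.append("ok is not true")
    if result.get("total", 0) < 1:
        problems.append("total is below 1")
    # Required fields must be non-null on every post.
    for i, post in enumerate(posts):
        for field in ("post_id", "permalink_url", "creation_time"):
            if post.get(field) is None:
                problems.append(f"posts[{i}].{field} is null")
    if result.get("sort") == "CHRONOLOGICAL":
        times = [p.get("creation_time") for p in posts]
        # Strictly decreasing: each timestamp must exceed the next one.
        if None not in times and any(a <= b for a, b in zip(times, times[1:])):
            problems.append("creation_time is not strictly decreasing")
    return problems
```

Returning every violation, rather than stopping at the first, makes it easier to tell a single bad post from a broadly broken response.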
Known Limitations
- Public groups only: private groups require membership; returns empty or permission error when not a member
- No comment body: comment_count returns the total count, but the group feed GraphQL does not include top_comments content or authors. Facebook places comment data in a separate CommentsRenderer query triggered only when the user clicks "Comments", so fetching comment bodies requires additional per-post_id GraphQL requests (out of scope)
- doc_id rotates with Facebook frontend versions: when the default 26577462205242925 expires (PersistedQueryNotFound or HTTP 404), retrieve a fresh one:
  1. Open any group page while logged in
  2. Scroll down to trigger a new batch of posts
  3. Run browser-act network requests --filter api/graphql --method POST
  4. Check the X-FB-Friendly-Name header on each request; find GroupsCometFeedRegularStoriesPaginationQuery
  5. Extract doc_id from that request's POST body and pass it via --doc-id
- group_name can be null: parsed from page HTML via heuristic regex; prefer posts[*].group.name (more reliable)
- Localized count fields: the format of reactions.total_formatted and share_count_formatted depends on Facebook's UI language (e.g., a non-English Facebook UI may return locale-specific number abbreviations instead of "12K")
- Rapid requests trigger temporary throttling: paginating too fast or calling multiple groups concurrently may return empty responses or temporary bans. Serialize group requests with a 2–5 s sleep between each
- GraphQL field_exception / partial edges + errors: almost always caused by the session cookie being cleared. Check c_user cookie and