Capsulate — Email Intelligence Skill
Transform your inbox into structured, queryable dashboards using AI. No backend required — everything runs locally.
What This Does
- You describe what you want to track from your emails
- Claude generates an extraction schema and Gmail search query
- Your emails are searched, filtered, and deduplicated
- Subagents extract structured data from batches of emails in parallel, saving after each round
- A self-contained HTML dashboard is generated and opened in your browser
Prerequisites Check
Before starting, verify the user has the required tools. Run these silently:
which gws
gws auth status
If gws is not installed, tell the user:
Install the Google Workspace CLI:
npm install -g @googleworkspace/cli
Then authenticate:
gws auth login -s gmail
Then run this skill again.
If gws is installed but the auth check fails for any reason, show the user the actual error output and tell them:
Run gws auth login -s gmail to connect your Gmail account, then try again.
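The prerequisite check above could be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the function name check_gws and its return labels are hypothetical, and it assumes that gws auth status signals a failed check with a non-zero exit code.

```python
import shutil
import subprocess


def check_gws(which=shutil.which, run=subprocess.run):
    """Return a (state, detail) pair: state is "missing", "unauthenticated", or "ok"."""
    if which("gws") is None:
        return "missing", "gws is not on PATH"
    # Capture output so the check stays silent unless something fails.
    result = run(["gws", "auth", "status"], capture_output=True, text=True)
    if result.returncode != 0:
        # Assumption: a non-zero exit code means the auth check failed.
        # Pass the real error text through instead of guessing the cause.
        return "unauthenticated", (result.stderr or result.stdout).strip()
    return "ok", ""
```

The which and run parameters exist only so the check can be exercised without the CLI installed; in normal use the defaults apply.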
Re-running an Existing Canvas
If the user runs /capsulate and canvases already exist at ~/.capsulate/, list them and ask:
You have existing canvases:
- subscription-trials (47 items, last updated Mar 28)
- package-tracking (12 items, last updated Mar 25)
What would you like to do?
- Refresh an existing canvas (fetch only new emails)
- Open an existing canvas dashboard
- Update schema for an existing canvas (re-extract all items with new fields)
- Create a new canvas
Option: Open
Run open ~/.capsulate/<CANVAS_NAME>.html and exit.
Option: Refresh
Skip Step 1. Load the saved canvas JSON to get the existing query, schema, known IDs, and last_updated timestamp. Proceed to Step 2a in incremental mode.
Option: Update schema
Ask the user what fields to add, remove, or change. Show the current schema and the proposed new schema for confirmation. Then re-run Steps 2a–4 against all emails (full mode, not incremental) and merge the results into the saved canvas by _id: overwrite records that were re-extracted, and keep saved records that do not appear in the new result set.
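One way to picture the merge by _id is the sketch below. It assumes each record is a dict carrying an _id key; the function name merge_items is hypothetical.

```python
def merge_items(existing, extracted):
    """Merge re-extracted records into the saved ones, keyed by _id."""
    merged = {item["_id"]: item for item in existing}
    for item in extracted:
        # A re-extracted record overwrites the saved one with the same _id.
        merged[item["_id"]] = item
    # Saved records missing from the new result set are kept untouched.
    return list(merged.values())
```

Because dicts preserve insertion order, saved records keep their original positions and records that are new to the canvas are appended at the end.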
Option: Create
Proceed from Step 1 as normal.
Step 1 — Define the Canvas
Ask the user two questions:
What do you want to track from your emails? Examples: "subscription trials and renewals", "package deliveries", "invoices and receipts", "job applications", "event invitations"
How far back should I look? Options: last 30 days, last 3 months, last 6 months, last year, or all time
Convert their answer to an after:YYYY/MM/DD date string based on today's date. For "all time", apply no date filter. Store this as the canvas's time range.
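The date conversion could look something like this. The day counts per option (for example, 90 days for "last 3 months") are approximations chosen for this sketch, not values defined by the skill, and the names RANGE_DAYS and after_filter are hypothetical.

```python
from datetime import date, timedelta

# Assumed day counts; calendar-accurate month arithmetic is out of scope here.
RANGE_DAYS = {
    "last 30 days": 30,
    "last 3 months": 90,
    "last 6 months": 180,
    "last year": 365,
    "all time": None,
}


def after_filter(choice, today=None):
    """Return a Gmail after: clause, or "" when no date filter applies."""
    days = RANGE_DAYS[choice]
    if days is None:
        return ""
    start = (today or date.today()) - timedelta(days=days)
    return f"after:{start:%Y/%m/%d}"
```

For example, after_filter("last 30 days", date(2025, 3, 31)) yields after:2025/03/01, while "all time" yields an empty string so the query carries no date clause.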
Based on their answers, generate three things and confirm with the user before proceeding:
1a. Canvas Name
A short slug, e.g. subscription-trials, package-tracking, invoices
Sanitize the slug: lowercase, replace spaces and special characters with hyphens, strip anything that is not a-z, 0-9, or -. Example: "Q1 invoices / March" → q1-invoices-march.
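A minimal sketch of the slug rules follows. Collapsing runs of disallowed characters into a single hyphen and trimming leading or trailing hyphens are assumptions inferred from the example above; the function name sanitize_slug is hypothetical.

```python
import re


def sanitize_slug(name):
    """Lowercase the name and reduce it to a-z, 0-9, and single hyphens."""
    slug = name.lower()
    # Any run of characters outside a-z / 0-9 becomes one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", slug)
    # Assumption: hyphens at either end carry no meaning, so drop them.
    return slug.strip("-")
```

With this sketch, sanitize_slug("Q1 invoices / March") gives q1-invoices-march, matching the example.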
1b. Gmail Search Query
A Gmail search string that will find the most relevant emails. Be specific to avoid noise. Examples:
- Subscriptions: subject:(trial OR subscription OR renewal OR "free trial") -category:promotions
- Packages: subject:(shipped OR delivered OR tracking OR "out for delivery") -category:promotions
- Invoices: subject:(invoice OR receipt OR "order confirmation" OR payment) -category:promotions
- Job apps: subject:(application OR interview OR "thank you for applying" OR offer) -category:promotions
1c. Extraction Schema
A JSON object defining what to extract from each email. Keep fields concise and typed. Example for subscriptions:
{
"company": "string — name of the company or service",
"plan": "string — plan or tier name if mentioned",
"trial_end_date": "date (YYYY-MM-DD) — when trial ends, null if not found",
"amount": "string — price or amount, include currency symbol",
"status": "enum: trial | active | cancelled | expired | unknown",
"action_required": "boolean — does this email require the user to take action",
"notes": "string — any other relevant detail in one sentence, null if nothing notable"
}
Show the canvas name, query, time range, and schema to the user and ask: "Does this look right? I'll search your Gmail and extract data based on this."
Step 2a — Search Gmail
Once confirmed, build the query and search Gmail for matching emails.
Promotions hint: If the canvas type is unlikely to involve promotional emails (e.g. package tracking, invoices, job applications), append -category:promotions to exclude Gmail's Promotions tab and reduce noise. Skip this for canvases explicitly about deals, offers, or subscriptions where promotional emails are relevant.
Incremental mode (refreshing an existing canvas): use the canvas's last_updated timestamp as the date filter to only fetch new emails.
gws gmail +triage --query "<GENERATED_QUERY> after:YYYY/MM/DD" --format json --max 500
Full mode (new canvas or schema update): apply the time range chosen in Step 1. If the user chose "all time", omit the after: filter.
gws gmail +triage --query "<GENERATED_QUERY> after:YYYY/MM/DD" --format json --max 500
This returns a JSON object with a messages array, where each item has id, from, subject, and date.
- If 0 results (incremental): tell the user "No new emails since last refresh." Re-open the dashboard and exit.
- If 0 results (full): tell the user and suggest either a broader query or a wider time range. Offer to try again.
- If the result contains 500 items, warn the user that the result was likely capped and more matching emails may exist within the selected time range, then offer to re-run with --max 1000.
Do not proceed to extraction yet. Always run Step 2b next to filter and deduplicate the results before spawning any subagents.
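The query assembly in this step could be sketched as below. The helper names build_query and triage_command are hypothetical; the argument list simply mirrors the gws invocation shown above and introduces no additional flags.

```python
def build_query(base_query, after_clause="", exclude_promotions=False):
    """Combine the generated query with the optional promotions and date filters."""
    parts = [base_query.strip()]
    # Avoid appending the promotions filter twice if the query already has it.
    if exclude_promotions and "-category:promotions" not in base_query:
        parts.append("-category:promotions")
    # An empty after_clause covers the "all time" case: no date filter.
    if after_clause:
        parts.append(after_clause)
    return " ".join(parts)


def triage_command(query, max_results=500):
    """Build the argument list for the gws triage call used in this step."""
    return ["gws", "gmail", "+triage", "--query", query,
            "--format", "json", "--max", str(max_results)]
```

Passing the command as an argument list rather than a single shell string avoids quoting problems when the query itself contains double quotes, such as "out for delivery".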
Step 2b — Filter & Deduplicate
Before spawning any subagents, reduce the message list in two passes:
Deduplicate: If refreshing, load the existing canvas JSON and collect all known _id values. Remove any messages already in the saved canvas. Tell the user: "Found X new emails to process (Y already extracted, skipping)."
Pre-filter irrelevant emails: Scan the triage metadata (subject and from — no body fetch needed) and discard clearly irrelevant messages. Apply in order:
- Drop messages where subject matches generic noise: Re:, Fwd:, [Automated], Out of Office, Delivery Status Notification, Undelivered Mail
- Drop messages from bulk-mail senders (noreply@, mailer-daemon@, bounce@, postmaster@, no-reply@) unless the sender domain matches a brand relevant to the canvas (e.g. a shipping carrier for package tracking)
- Apply a canvas-type relevance heuristic to the subject — if the subject clearly cannot contain data matching the schema (e.g. "Weekly digest" for a package-tracking canvas), drop it
Tell the user: "Pre-filtered X of Y emails as irrelevant (Z remaining to extract)."
Extract the id field from each remaining message to build the final list of IDs to process.
Do not proceed to Step 3 until both deduplication and pre-filtering are complete and the final ID list is ready.
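The two mechanical passes of this step might be sketched as follows. Several details are assumptions made for illustration: Re: and Fwd: are treated as subject prefixes, the other noise markers as case-insensitive substrings, and relevant brands are supplied as a hypothetical relevant_domains tuple. The canvas-type relevance heuristic is a judgment call and is deliberately left out.

```python
PREFIX_NOISE = ("re:", "fwd:")
PHRASE_NOISE = ("[automated]", "out of office",
                "delivery status notification", "undelivered mail")
BULK_SENDERS = ("noreply@", "mailer-daemon@", "bounce@", "postmaster@", "no-reply@")


def dedupe(messages, known_ids):
    """Drop messages whose id is already stored in the canvas."""
    return [m for m in messages if m["id"] not in known_ids]


def is_noise(message, relevant_domains=()):
    """Apply the subject and sender rules to one triage entry."""
    subject = message["subject"].strip().lower()
    sender = message["from"].lower()
    if subject.startswith(PREFIX_NOISE):
        return True
    if any(phrase in subject for phrase in PHRASE_NOISE):
        return True
    if any(prefix in sender for prefix in BULK_SENDERS):
        # Bulk senders survive only when their domain is relevant to the canvas.
        return not any(domain in sender for domain in relevant_domains)
    return False


def prefilter(messages, known_ids, relevant_domains=()):
    """Return the final list of IDs to hand to the extraction step."""
    fresh = dedupe(messages, known_ids)
    return [m["id"] for m in fresh if not is_noise(m, relevant_domains)]
```

Running deduplication first keeps the "X new, Y already extracted" count separate from the "pre-filtered X of Y" count, which matches the two user-facing messages in this step.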
Step 3 — Extract Data (Parallel Subagents)
Ensure the data directory exists before the first round:
mkdir -p ~/.capsulate
Group the email IDs into batches of 5 emails per subagent, running 4 subagents concurrently (20 emails per round). Pause 2 seconds between rounds to respect rate limits.
When building each batch, include the subject and from metadata from the triage results alongside each ID — the subagent needs these to decide whether to skip without fetching the body.
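The batching arithmetic can be illustrated with a short sketch. The function names chunk and plan_rounds are hypothetical, and the actual spawning of subagents is not shown.

```python
def chunk(items, size):
    """Split a list into consecutive slices of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def plan_rounds(messages, batch_size=5, concurrency=4):
    """Group triage entries into rounds of `concurrency` batches of `batch_size`."""
    # Carry subject and from with each ID so a subagent can skip without a body fetch.
    entries = [{"id": m["id"], "subject": m["subject"], "from": m["from"]}
               for m in messages]
    return chunk(chunk(entries, batch_size), concurrency)
```

For 23 emails this produces two rounds: the first holds four batches of 5 (20 emails) and the second holds one batch of 3. A driver loop would dispatch one round at a time and sleep for 2 seconds between rounds.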
After each round completes:
- Collect the subagent results, flatten into a list, discard _skip: true items (count them), discard JSON par