Occupancy Insights
Turn raw presence data — from any source — into a clean, decision-ready report.
When you start
Before doing any analysis, confirm a few things in one short message to the user. Don't ask them one at a time; ask in a single batch and proceed with reasonable defaults if they don't answer.
| Confirm | Why it matters | Default if unanswered |
|---|---|---|
| Time zone of the timestamps | Day-of-week & hour-of-day buckets depend on it | Assume the local time zone of the user |
| What each row represents | Sample interval changes how you aggregate | Infer from the median delta between rows |
| Source type (sensor, Wi-Fi, badge, camera, reservations, manual) | Each source has its own caveats | Infer from column names; if still unclear, say so in the report's data notes |
| Capacity, if known (per space, or one number) | Required to report Utilization; without it report counts only | Skip Utilization |
| The window the user cares about | Many datasets contain stale or partial periods | Use the full available range |
| Operating hours, if any | Affects how peaks and averages should be framed | Report 24/7 and call it out |
Step 1 — Detect the schema
Read the first few rows. Identify these roles by column content, not by exact column name:
| Role | Typical signals |
|---|---|
| Timestamp | ISO datetimes, Unix epochs, or date + time columns |
| Count | Integers ≥ 0 named like count, occupancy, headcount, people, devices, swipes, entries, value |
| Space identifier (optional) | Strings or IDs that repeat across rows: space, room, floor, building, zone, location |
| Capacity (optional) | A constant per space, often called capacity, max, seats, limit |
| Source (optional) | A column declaring which sensor type produced the row |
If a row's count is suspiciously above capacity, do not silently clip it. Flag it in the report's data-notes section so the user can investigate (likely a sensor double-count or a Wi-Fi-as-headcount issue — see references/multi-source-data.md).
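The flag-don't-clip rule above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the row keys (`timestamp`, `count`, `capacity`), the function name, and the note wording are all hypothetical.

```python
from typing import Optional


def flag_over_capacity(rows: list, default_capacity: Optional[int] = None) -> list:
    """Return one data note per row whose count exceeds capacity.

    The input rows are never modified, so nothing is clipped silently.
    """
    notes = []
    for index, row in enumerate(rows):
        # Per-row capacity wins; fall back to a single shared number if given.
        capacity = row.get("capacity", default_capacity)
        if capacity is None:
            continue  # no capacity known, nothing to compare against
        if row["count"] > capacity:
            notes.append({
                "row": index,
                "timestamp": row["timestamp"],
                "count": row["count"],
                "capacity": capacity,
                "note": "count exceeds capacity; left unclipped for review",
            })
    return notes


# Hypothetical sample rows for illustration only.
rows = [
    {"timestamp": "2024-03-04T09:00", "count": 42, "capacity": 50},
    {"timestamp": "2024-03-04T10:00", "count": 63, "capacity": 50},
]
notes = flag_over_capacity(rows)
```

The returned notes are meant to feed the report's data-notes section; the original counts stay in the dataset untouched.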
Detect the data granularity
After identifying the timestamp column, classify the data as interval or daily by computing the median time delta between consecutive rows for a single space:
| Median delta | Granularity | Headline metrics to use |
|---|---|---|
| < 24 hours (e.g. 1 min, 5 min, 15 min, 1 hour) | Interval | Average daily peak, Typical daily peak (P90 of daily maxes), Single highest peak — these aggregate intervals up to days |
| ≈ 24 hours | Daily | Average daily count, Typical daily count (P90 of values), Single highest day — peak == count, so the "peak" framing is meaningless |
| > 24 hours (e.g. weekly summaries) | Coarser than daily | Report what's there honestly. Call out that day-of-week patterns and heatmaps cannot be produced. |
This is the single most common framing mistake: leading with "average daily peak" on a CSV that already has one row per day. If granularity is daily, the metric names in the output template must change to "Average daily count" / "Typical daily count" / "Single highest day". Apply the substitution everywhere — section headers, table cells, prose.
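One way to sketch the granularity check, assuming the timestamps for a single space are already parsed into `datetime` objects. The one-hour tolerance used to decide what counts as "≈ 24 hours" is an assumption, as are the function and dictionary names.

```python
from datetime import datetime, timedelta
from statistics import median

# Headline metric names per granularity, taken from the table above.
HEADLINE_METRICS = {
    "interval": ["Average daily peak", "Typical daily peak", "Single highest peak"],
    "daily": ["Average daily count", "Typical daily count", "Single highest day"],
    "coarser": [],  # report what is there; no day-of-week patterns or heatmaps
}


def classify_granularity(timestamps, tolerance=timedelta(hours=1)):
    """Classify one space's rows as 'interval', 'daily', or 'coarser'."""
    ordered = sorted(timestamps)
    if len(ordered) < 2:
        raise ValueError("need at least two timestamps to compute a delta")
    # Median of consecutive gaps, in seconds, is robust to a few missing rows.
    deltas = [(later - earlier).total_seconds()
              for earlier, later in zip(ordered, ordered[1:])]
    median_delta = timedelta(seconds=median(deltas))
    day = timedelta(hours=24)
    if abs(median_delta - day) <= tolerance:  # assumed meaning of "≈ 24 hours"
        return "daily"
    return "interval" if median_delta < day else "coarser"


start = datetime(2024, 3, 4)
quarter_hourly = [start + timedelta(minutes=15 * i) for i in range(20)]
one_per_day = [start + timedelta(days=i) for i in range(20)]
granularity = classify_granularity(one_per_day)
headline = HEADLINE_METRICS[granularity]
```

Using the median rather than the mean keeps a single long gap (for example, a weekend outage) from pushing interval data into the wrong bucket.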
Step 2 — Compute the base metrics
Always produce these. Numbers without context are noise — every metric gets a unit and a frame of reference.
- Total observations and date range covered
- Average count across the window (mean across all rows)
- Daily-peak metrics (interval data only) — for each day, take that day's max; report the average and the P90 of those daily maxes
- Typical day (daily data only) — the P90 of the daily values
- Single highest — value, date, and (for interval data) time
- Utilization versus capacity, if capacity is provided. Always show as a percent and as raw counts side-by-side. Never report Utilization without showing the raw count too.
If the user asks "how busy was X last month", lead with the appropriate-granularity peak/typical metric and (if capacity exists) Utilization. Don't lead with average count alone — it understates how busy the space feels.
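For interval data, the base metrics could be computed along these lines. This is a sketch under stated assumptions: samples are `(datetime, count)` pairs for one space, P90 uses linear interpolation (the text does not name a percentile method), and Utilization is shown against the typical daily peak. All function names are hypothetical.

```python
from datetime import datetime
from statistics import mean
from typing import Optional


def percentile(values, pct):
    """Linear-interpolation percentile (an assumed method)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty list is undefined")
    rank = (len(ordered) - 1) * pct / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def utilization_label(count, capacity):
    """Raw count and percent side by side, never the percent alone."""
    return f"{count:.0f} of {capacity} ({count / capacity:.0%})"


def interval_base_metrics(samples, capacity: Optional[int] = None):
    daily_max = {}
    for ts, count in samples:
        day = ts.date()
        daily_max[day] = max(daily_max.get(day, count), count)
    peaks = list(daily_max.values())
    top_ts, top_count = max(samples, key=lambda sample: sample[1])
    metrics = {
        "observations": len(samples),
        "date_range": (min(daily_max), max(daily_max)),
        "average_count": mean(count for _, count in samples),
        "average_daily_peak": mean(peaks),
        "typical_daily_peak_p90": percentile(peaks, 90),
        "single_highest": (top_count, top_ts),
    }
    if capacity:
        metrics["typical_peak_utilization"] = utilization_label(
            metrics["typical_daily_peak_p90"], capacity)
    return metrics


# Hypothetical three-day sample.
samples = [
    (datetime(2024, 3, 4, 9), 10), (datetime(2024, 3, 4, 14), 30),
    (datetime(2024, 3, 5, 9), 20), (datetime(2024, 3, 5, 14), 40),
    (datetime(2024, 3, 6, 9), 5), (datetime(2024, 3, 6, 14), 50),
]
metrics = interval_base_metrics(samples, capacity=50)
```

In this sample the average count is about 26 while the average daily peak is 40, which illustrates why leading with the average alone understates how busy the space feels.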
Step 3 — Find the patterns
Do these in order; stop when you have enough signal to answer the user's question.
- Day-of-week pattern — group by weekday, report each day's average count and (for interval data) average peak. Call out:
  - Weekday-vs-weekend gap (if the gap is > 5×, the space is clearly an office/weekday-pattern space)
  - The single busiest weekday and the single quietest weekday
- Hour-of-day pattern (interval data only) — group by hour, report the busiest 2–3 hours and the quietest 2–3 hours.
- Day-of-week × hour-of-day heatmap (interval data only) — see references/analysis-recipes.md for the recipe (color scale, ordering, annotations).
- Trend over the window — only if the window is at least 90 days. Otherwise skip; a trend on shorter windows is misleading because day-of-week and weekly seasonality dominate. Use scripts/compute_trend.py for the math; it returns a slope, R², percent change, and a classification. Never report a trend on under 90 days of data, even if asked.
- Anomalies — flag days that look unusual against their same-day-of-week baseline. Use scripts/detect_anomalies.py. Suggest plausible causes (holiday, closure, event, sensor outage), but never assert a cause — the user knows their building, you don't.
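The day-of-week grouping and the 90-day trend gate can be sketched as below. This does not reproduce scripts/compute_trend.py or scripts/detect_anomalies.py, whose interfaces are not shown here. The input shape (one value per calendar day), the function names, and counting the window inclusively are assumptions.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

WEEKDAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def weekday_averages(daily_counts):
    """Average value per weekday, in Mon..Sun order."""
    buckets = defaultdict(list)
    for day, count in daily_counts.items():
        buckets[WEEKDAY_NAMES[day.weekday()]].append(count)
    return {name: mean(buckets[name]) for name in WEEKDAY_NAMES if name in buckets}


def weekday_weekend_ratio(averages):
    """Mean weekday level divided by mean weekend level, or None if undefined."""
    weekdays = [v for name, v in averages.items() if name not in ("Sat", "Sun")]
    weekends = [v for name, v in averages.items() if name in ("Sat", "Sun")]
    if not weekdays or not weekends or mean(weekends) == 0:
        return None
    return mean(weekdays) / mean(weekends)


def trend_allowed(first_day, last_day, minimum_days=90):
    """True only when the window covers at least `minimum_days` calendar days."""
    return (last_day - first_day).days + 1 >= minimum_days


# Hypothetical two weeks starting Monday 2024-03-04: busy weekdays, quiet weekends.
monday = date(2024, 3, 4)
daily_counts = {
    monday + timedelta(days=i): (100 if (monday + timedelta(days=i)).weekday() < 5 else 10)
    for i in range(14)
}
averages = weekday_averages(daily_counts)
ratio = weekday_weekend_ratio(averages)
busiest = max(averages, key=averages.get)
can_report_trend = trend_allowed(min(daily_counts), max(daily_counts))
```

Here the ratio exceeds 5×, so the weekday-pattern callout would apply, while the 14-day window fails the trend gate and the trend step is skipped.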
Step 4 — Caveat the data source
Different presence sources count different things. Apply the right framing — see references/multi-source-data.md. Top-line rules:
- Wi-Fi device counts ≠ headcount. Never silently apply a divisor. Either ask the user for their calibration ratio or report device counts as-is and label them "devices" in every chart, table, and prose mention.
- Badge swipes are entry events, not occupancy. They tell you how many people came in, not how many are present right now. Convert to occupancy only if you have matched exits.
- Sensor / camera headcounts are the closest to true occupancy. Treat as authoritative for instantaneous count.
- Reservation data is intent, not presence. Two of three booked rooms typically sit empty. Report bookings as a ceiling on occupancy, not a measurement.
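To illustrate the badge rule, here is a minimal sketch of converting matched entry and exit events into a running occupancy. The event shape, a `(datetime, "in" | "out")` pair, is hypothetical; real badge exports vary, and the conversion only makes sense when exits are actually recorded.

```python
from datetime import datetime


def occupancy_from_badge_events(events):
    """Running occupancy after each event, in time order.

    Values are not clamped at zero: a negative value means exits without
    matching entries, which belongs in the data notes rather than being hidden.
    """
    occupancy = 0
    series = []
    for ts, direction in sorted(events):
        if direction == "in":
            occupancy += 1
        elif direction == "out":
            occupancy -= 1
        else:
            raise ValueError(f"unknown direction: {direction!r}")
        series.append((ts, occupancy))
    return series


# Hypothetical events for illustration only.
events = [
    (datetime(2024, 3, 4, 9, 30), "out"),
    (datetime(2024, 3, 4, 9, 0), "in"),
    (datetime(2024, 3, 4, 9, 5), "in"),
]
series = occupancy_from_badge_events(events)
occupancy_values = [value for _, value in series]
```

With entry-only data the same loop would only ever count up, which is why swipes alone describe arrivals rather than how many people are present.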
Step 5 — Pick the output format
The default output for any "report" request is a self-contained HTML file. Do not ask, do not offer a markdown alternative — render HTML and save it to disk.
Treat the user's wording as the trigger. If the prompt contains any of these words or phrases, produce an HTML file:
- report, insights report, analysis report
- document, deliverable, write-up, memo, brief, summary
- send, share, save, export, PDF, email
- a named output ("HQ March wrap-up", "Q1 review for the CEO")
Inline markdown is reserved for narrow follow-ups inside an existing conversation, not for first-shot requests. The mapping from user intent to output is:
| User intent | Output |
|---|---|
| Quick question already in flight ("how busy was Tuesday?") | Inline markdown in chat |
| Single chart, no surrounding analysis ("show me a heatmap") | Inline markdown plus the chart |
| Anything containing the words above, OR the user's first message about the data | Self-contained HTML file |
If the request is genuinely ambiguous (no trigger words, but more than a one-liner question), default to HTML rather than asking. Asking adds a turn the user didn't want; HTML is always salvageable as a file the user can ignore.
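The routing rule can be approximated with a simple keyword check. This is a rough sketch, not a complete implementation: it matches only the trigger words listed above, it does not detect named outputs such as "HQ March wrap-up", and it leaves the ambiguous-request judgment to the caller. The function name and the `first_message_about_data` flag are hypothetical.

```python
import re

# Trigger words from the list above; "report" also covers "insights report"
# and "analysis report".
HTML_TRIGGERS = [
    "report", "document", "deliverable", "write-up", "memo", "brief",
    "summary", "send", "share", "save", "export", "pdf", "email",
]
_TRIGGER_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in HTML_TRIGGERS) + r")\b",
    re.IGNORECASE,
)


def choose_output_format(prompt: str, first_message_about_data: bool) -> str:
    """Return 'html' or 'markdown' for a request."""
    if first_message_about_data:
        return "html"  # first-shot requests always get the HTML file
    if _TRIGGER_PATTERN.search(prompt):
        return "html"
    return "markdown"  # narrow follow-up inside an existing conversation


follow_up = choose_output_format("how busy was Tuesday?", first_message_about_data=False)
export_request = choose_output_format("Can you export this as a PDF?", first_message_about_data=False)
first_shot = choose_output_format("show me a heatmap", first_message_about_data=True)
```

Whole-word matching keeps words like "briefly" or "shared" from triggering a file by accident; whether that strictness is desirable is a judgment call.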
For HTML reports, follow references/html-reports.md. Use assets/report-template.html as the skeleton — it includes the Occ