Fabric Delta Lake & Spark Performance remediate
Systematic remediate toolkit for diagnosing and resolving performance issues across Microsoft Fabric Spark compute, Delta Lake tables, and Lakehouse workloads.
When to Use This Skill
- Spark notebooks or jobs running slower than expected
- Delta table queries degrading over time (small file problem)
- Capacity throttling errors (HTTP 430 / TooManyRequestsForCapacity)
- Session startup taking minutes instead of seconds
- Out-of-memory errors on executors or drivers
- Streaming ingestion falling behind or producing small files
- VOrder or resource profile misconfiguration
- Deciding between read-heavy vs. write-heavy workload profiles
- Planning table maintenance (OPTIMIZE, VACUUM, Z-Order)
- Tuning autoscaling, dynamic executor allocation, or autotune
Prerequisites
- Access to a Microsoft Fabric workspace with Data Engineering enabled
- Contributor or higher role on the workspace
- Fabric capacity (F2+) or Trial capacity
- PowerShell 7+ for diagnostic scripts
- PySpark for notebook-based diagnostics
Triage: Identify the Symptom Category
Start by matching the observed symptom to one of these categories, then follow the linked reference for the detailed workflow.
| Symptom | Category | Reference |
|---|---|---|
| Notebook takes minutes to start | Session Startup | session-startup.md |
| Queries slow, getting worse over time | Delta Table Health | delta-table-health.md |
| HTTP 430 or "TooManyRequestsForCapacity" | Capacity Throttling | capacity-throttling.md |
| OOM errors, executor lost, task failed | Memory & Compute | memory-compute.md |
| Streaming lag, checkpoint delays | Streaming Performance | streaming-perf.md |
| Power BI reports slow against Lakehouse | Read Optimization | read-optimization.md |
| Spark job slow but no errors | General Tuning | general-tuning.md |
Quick Diagnostic Checklist
Run through these checks before diving into detailed remediate.
1. Check Capacity and Concurrency
Verify your Fabric capacity SKU and current utilization. Each SKU has a queue limit for concurrent Spark jobs:
| SKU | Spark VCores | Queue Limit |
|---|---|---|
| F2 | 4 | 4 |
| F8 | 16 | 8 |
| F16 | 32 | 16 |
| F32 | 64 | 32 |
| F64 | 128 | 64 |
| F128 | 256 | 128 |
| Trial | 128 | No queueing |
If hitting queue limits, cancel idle jobs via the Monitoring Hub, scale the SKU, or stagger job schedules.
2. Check Resource Profile
New Fabric workspaces default to the writeHeavy profile. Verify the active profile
matches your workload:
| Profile | Best For | VOrder Default |
|---|---|---|
| writeHeavy | ETL, ingestion, batch writes | Disabled |
| readHeavyForSpark | Interactive Spark queries, EDA | Enabled |
| readHeavyForPBI | Power BI Direct Lake, dashboards | Enabled |
Mismatch is one of the most common performance issues. Read-heavy workloads on the writeHeavy profile miss VOrder optimization entirely.
3. Check Delta Table Health
Run the diagnostic script or execute in a notebook:
# Quick Delta table health check
from delta.tables import DeltaTable
dt = DeltaTable.forName(spark, "your_table_name")
history = dt.history()
history.select("version", "operation", "operationMetrics").show(20, truncate=False)
Look for: high version counts without OPTIMIZE, many small ADD operations, and missing VACUUM runs.
4. Check Environment Configuration
Verify Spark compute settings in the attached environment:
# Print active Spark configuration
for item in spark.sparkContext.getConf().getAll():
print(f"{item[0]} = {item[1]}")
Key properties to verify:
spark.sql.parquet.vorder.default— true for read workloadsspark.microsoft.delta.optimizeWrite.enabled— true to reduce small filesspark.ms.autotune.enabled— true to enable ML-based query tuningspark.sql.adaptive.enabled— true for adaptive query execution
5. Check Session Startup Time
Fabric Spark session startup depends on pool configuration:
| Scenario | Expected Startup |
|---|---|
| Default settings, no custom libraries | 5–10 seconds |
| Default + library dependencies | 5–10 sec + 30 sec–5 min |
| High regional traffic | 2–5 minutes |
| Private Links or Managed VNets | 2–5 minutes |
| High traffic + libraries + networking | 2–5 min + 30 sec–5 min |
If consistently slow, check whether Private Links or Managed VNets are enabled (these bypass Starter Pools).
Key Configuration Properties
VOrder (Read Optimization)
VOrder applies a special sort order to Parquet files at write time that dramatically improves read performance for Power BI Direct Lake and SQL analytics endpoint queries.
Enable: spark.sql.parquet.vorder.default = true
Trade-off: Write operations take slightly longer. Enable only when read performance matters more than write throughput.
Native Execution Engine
Vectorized C++ engine that runs Spark SQL and DataFrame operations significantly faster than the default JVM engine.
Enable: spark.fabric.nativeExecution.enabled = true
Compatible with Fabric Runtime 1.3+. Not all operations supported; the engine falls back to JVM for unsupported operations transparently.
Autotune
ML-based automatic Spark configuration tuning. Builds a model per query pattern and optimizes settings iteratively over 20–25 runs.
Enable: spark.ms.autotune.enabled = true
Requirements: Fabric Runtime 1.1 or 1.2 only. Incompatible with high concurrency mode and private endpoints. Targets queries running 15+ seconds.
Optimized Write
Merges or splits partitions before writing to maximize file sizes and reduce small files.
Enable: spark.microsoft.delta.optimizeWrite.enabled = true
Trade-off: Adds a full shuffle before writes. Beneficial for append-heavy workloads; may degrade performance if data is already well-partitioned.
Available Scripts
| Script | Purpose |
|---|---|
| Diagnose-DeltaTableHealth.ps1 | Assess Delta table health via Fabric REST API |
| Get-SparkCapacityStatus.ps1 | Check capacity utilization and queue status |
| Invoke-TableMaintenance.ps1 | Run OPTIMIZE + VACUUM via REST API |
Available Templates
| Template | Purpose |
|---|---|
| perf-diagnostic-notebook.py | PySpark notebook for comprehensive diagnostics |
remediate
| Problem | Likely Cause