Microsoft Fabric Data Factory Performance Remediation
Systematic approach to diagnosing and resolving performance issues in Microsoft Fabric Data Factory pipelines, copy activities, and dataflows.
When to Use This Skill
- Pipeline execution takes longer than expected
- Copy activities are slow or appear stuck
- Activities show "Not Started" status for extended periods
- Capacity throttling errors (HTTP 430, TooManyRequestsForCapacity)
- Throughput is lower than expected for copy operations
- Dataflow Gen2 refresh is slow or timing out
- Pipeline monitoring shows performance degradation over time
- Need to optimize parallelism, DIU, or partitioning settings
Prerequisites
- Access to a Microsoft Fabric workspace with the Contributor role or higher
- Familiarity with the Fabric Monitoring Hub
- Understanding of Fabric capacity SKUs and their limits
- PowerShell 7+ for running diagnostic scripts
Diagnostic Workflow
Step 1: Identify the Bottleneck Category
Determine which category your issue falls into:
| Category | Symptoms | Start Here |
|---|---|---|
| Copy Activity Slow | Low throughput, long transfer duration | copy-activity-tuning.md |
| Pipeline Stuck | Activity shows In Progress with no movement | pipeline-stuck-resolution.md |
| Capacity Throttling | HTTP 430 errors, jobs queued | capacity-throttling-guide.md |
| Dataflow Slow | Dataflow Gen2 refresh takes too long | dataflow-optimization.md |
| Spark Job Queue | Jobs stuck in "Not Started" status | capacity-throttling-guide.md |
Step 2: Collect Diagnostics
Run the diagnostic script to gather baseline metrics:
./scripts/Get-FabricPipelineDiagnostics.ps1 -WorkspaceId "<guid>" -PipelineName "MyPipeline"
Or collect the same metrics manually from the Monitoring Hub:
- Open Fabric portal and navigate to Monitoring Hub
- Filter by pipeline name and time range
- Select the run details (glasses icon) for the slow run
- Capture the Duration Breakdown for copy activities
- Note the queue time, transfer time, and pre/post-copy script duration
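Once the Duration Breakdown figures are written down, a small script can point at the stage that dominates the run. The following is a minimal sketch, not part of the diagnostic script: the stage names and the numbers are hypothetical values entered by hand, and the function only ranks whatever durations it is given.

```python
def dominant_stage(breakdown_seconds):
    """Return (stage, share_of_total) for the stage that consumed the most time."""
    total = sum(breakdown_seconds.values())
    if total <= 0:
        raise ValueError("breakdown must contain a positive total duration")
    stage = max(breakdown_seconds, key=breakdown_seconds.get)
    return stage, breakdown_seconds[stage] / total


# Hypothetical figures copied by hand from a run's Duration Breakdown (seconds).
example = {
    "queue": 240,
    "pre_copy_script": 15,
    "transfer": 95,
    "post_copy_script": 10,
}

stage, share = dominant_stage(example)
print(f"{stage} accounts for {share:.0%} of the run")
```

A run dominated by queue time usually points toward the capacity guidance, while a run dominated by transfer time points toward copy activity tuning.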
Step 3: Apply Targeted Fixes
Based on the bottleneck category, apply the appropriate optimization from the reference guides.
Quick Fixes for Common Issues
Copy Activity Running Slowly
- Set Intelligent Throughput Optimization to Maximum (or a custom value from 4 to 256)
- Configure Degree of Copy Parallelism based on source type
- Enable Partition Option for SQL sources (Dynamic Range or Physical)
- Pre-calculate partition upper/lower bounds to avoid overhead
- Enable Staging when sink is Fabric Warehouse
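To make the partition-bounds advice concrete, the sketch below shows how a known lower and upper bound divide into contiguous ranges. It is an illustration for reasoning about partition sizes, not the service's own splitting logic. It assumes an integer partition column whose minimum and maximum were obtained ahead of time, and `partition_ranges` is a hypothetical helper name.

```python
def partition_ranges(lower, upper, partitions):
    """Split the inclusive integer range [lower, upper] into contiguous chunks."""
    if partitions < 1 or upper < lower:
        raise ValueError("need partitions >= 1 and upper >= lower")
    total = upper - lower + 1
    partitions = min(partitions, total)  # never create empty partitions
    base, extra = divmod(total, partitions)
    ranges = []
    start = lower
    for i in range(partitions):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first chunks
        ranges.append((start, start + size - 1))
        start += size
    return ranges


# Assumed bounds, e.g. the MIN and MAX of an integer key column looked up once.
for low, high in partition_ranges(1, 10, 3):
    print(low, high)
```

Knowing the bounds up front lets the lower and upper values be entered directly in the source settings instead of being discovered at run time.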
Pipeline Activity Stuck
- Cancel the stuck activity and retry
- Check source/sink connectivity and credentials
- Verify Fabric capacity is not in throttled state
- Check whether the activity payload exceeds the 896 KB limit
- Check for connection timeout or network interruption
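The payload check can be approximated offline before a run. This sketch measures the UTF-8 size of an activity definition serialized as compact JSON and compares it with the 896 KB figure above. It assumes 1 KB = 1024 bytes and that compact JSON is a fair proxy for what the service measures, neither of which is confirmed here; the function names and the sample config are hypothetical.

```python
import json

# 896 KB limit from this guide, assuming 1 KB = 1024 bytes.
PAYLOAD_LIMIT_BYTES = 896 * 1024


def payload_size_bytes(activity_config):
    """Approximate payload size: compact JSON encoded as UTF-8."""
    compact = json.dumps(activity_config, separators=(",", ":"))
    return len(compact.encode("utf-8"))


def exceeds_payload_limit(activity_config, limit=PAYLOAD_LIMIT_BYTES):
    """True when the approximate payload size is above the limit."""
    return payload_size_bytes(activity_config) > limit


# Hypothetical activity definition with a small parameter value.
small = {"name": "CopyOrders", "parameters": {"tableList": ["orders", "customers"]}}
print(payload_size_bytes(small), exceeds_payload_limit(small))
```

If the check fails, the usual remedy is to pass a reference (for example a file path or table name) instead of embedding large values in parameters.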
Capacity Throttling (HTTP 430)
- Check current Spark concurrency against SKU limits
- Cancel unnecessary active Spark jobs via Monitoring Hub
- Consider upgrading to a larger capacity SKU
- Distribute pipeline trigger times to avoid burst load
- Use job queueing for non-interactive Spark workloads
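Distributing trigger times can be planned with a few lines of code. The sketch below assigns evenly spaced start times within a load window; the pipeline names, window start, and spacing are hypothetical, and the resulting times would still need to be entered in each pipeline's schedule by hand or through whatever tooling is in use.

```python
from datetime import datetime, timedelta


def stagger_start_times(pipelines, window_start, spacing_minutes):
    """Assign each pipeline a start time offset by a fixed spacing."""
    return {
        name: window_start + timedelta(minutes=index * spacing_minutes)
        for index, name in enumerate(pipelines)
    }


# Hypothetical nightly load: three pipelines, ten minutes apart from 02:00.
schedule = stagger_start_times(
    ["LoadSales", "LoadInventory", "LoadFinance"],
    datetime(2024, 1, 1, 2, 0),
    spacing_minutes=10,
)
for name, start in schedule.items():
    print(name, start.strftime("%H:%M"))
```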
Dataflow Gen2 Performance
- Reduce data volume with query folding and filters
- Avoid unnecessary data type conversions
- Minimize the number of transformation steps
- Use staging for large datasets
- Check for connector-specific throttling
Capacity SKU Quick Reference
| SKU | Max Spark Cores | Queue Limit | Equivalent Power BI |
|---|---|---|---|
| F2 | Limited | 4 | - |
| F4 | Limited | 4 | - |
| F8 | Limited | 8 | - |
| F16 | Limited | 16 | - |
| F32 | Limited | 32 | - |
| F64 | Standard | 64 | P1 |
| F128 | Standard | 128 | P2 |
| F256 | Standard | 256 | P3 |
| F512 | Standard | 512 | P4 |
| F1024 | Large | 1024 | - |
| F2048 | Large | 2048 | - |
| Trial | P1 equiv | N/A (no queue) | P1 |
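The queue limits in the table can be turned into a quick headroom check. This sketch copies the Queue Limit column into a dictionary and reports how many more jobs could be queued. The queued-job count is assumed to be read manually from the Monitoring Hub, and `queue_headroom` is a hypothetical helper, not a Fabric API.

```python
# Queue limits copied from the table above; Trial is omitted because it has no queue.
QUEUE_LIMITS = {
    "F2": 4, "F4": 4, "F8": 8, "F16": 16, "F32": 32, "F64": 64,
    "F128": 128, "F256": 256, "F512": 512, "F1024": 1024, "F2048": 2048,
}


def queue_headroom(sku, queued_jobs):
    """Return how many more jobs fit in the queue (negative means over the limit)."""
    if sku not in QUEUE_LIMITS:
        raise ValueError(f"no queue limit listed for SKU {sku!r}")
    return QUEUE_LIMITS[sku] - queued_jobs


# Hypothetical reading: 60 queued jobs on an F64 capacity.
print(queue_headroom("F64", 60))
```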
Copy Activity Performance Settings Reference
| Setting | Property | Range | Recommendation |
|---|---|---|---|
| Intelligent Throughput Optimization | dataIntegrationUnits | Auto, Standard (64), Balanced (128), Maximum (256), Custom (4-256) | Start with Auto, increase for large datasets |
| Degree of Copy Parallelism | parallelCopies | 1-256 | Auto for most; limit to 32 for Fabric Warehouse sink |
| Partition Option | Source settings | None, Physical, Dynamic Range | Use Dynamic Range for large SQL tables |
| Enable Staging | enableStaging | true/false | Required for Fabric Warehouse sink |
| Source Retry Count | sourceRetryCount | Integer | Set 2-3 for transient failures |
| Fault Tolerance | enableSkipIncompatibleRow | true/false | Enable for non-critical loads |
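The ranges and recommendations in the table can be checked mechanically before a pipeline is deployed. The following sketch is one possible validation, assuming the settings are available as a flat dictionary keyed by the property names from the table; that layout and the `check_copy_settings` name are assumptions for illustration, not the pipeline definition schema.

```python
def _is_int(value):
    """True for plain integers, excluding booleans."""
    return type(value) is int


def check_copy_settings(settings, sink_is_warehouse=False):
    """Return a list of warnings based on the ranges in the reference table."""
    warnings = []

    diu = settings.get("dataIntegrationUnits", "Auto")
    if diu != "Auto" and not (_is_int(diu) and 4 <= diu <= 256):
        warnings.append("dataIntegrationUnits should be Auto or between 4 and 256")

    copies = settings.get("parallelCopies", "Auto")
    if copies != "Auto" and not (_is_int(copies) and 1 <= copies <= 256):
        warnings.append("parallelCopies should be Auto or between 1 and 256")

    if sink_is_warehouse:
        if _is_int(copies) and copies > 32:
            warnings.append("limit parallelCopies to 32 for a Fabric Warehouse sink")
        if settings.get("enableStaging") is not True:
            warnings.append("enableStaging is required for a Fabric Warehouse sink")

    return warnings


# Hypothetical settings for a copy into a Fabric Warehouse.
candidate = {"dataIntegrationUnits": 512, "parallelCopies": 64, "enableStaging": False}
for message in check_copy_settings(candidate, sink_is_warehouse=True):
    print(message)
```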
Error Code Quick Reference
| Error | Meaning | Action |
|---|---|---|
| HTTP 430 | Capacity compute limit reached | Reduce concurrent jobs or upgrade SKU |
| Payload too large | Activity config exceeds 896 KB | Reduce parameter sizes |
| TooManyRequestsForCapacity | Spark compute or API rate limit | Cancel active jobs or wait |
| Connection timeout | Source/sink unreachable | Check network, credentials, firewall |
| Deflate64 unsupported | Compression format not supported | Re-compress with deflate algorithm |
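For the Deflate64 row, it can help to confirm which archive members use that method before re-compressing. The sketch below reads the ZIP central directory with Python's standard `zipfile` module and lists members whose compression method is 9, the identifier the ZIP specification assigns to Deflate64. It only inspects metadata and does not decompress anything; the in-memory archive is a stand-in for a real source file.

```python
import io
import zipfile

DEFLATE64 = 9  # compression method identifier for Deflate64 in the ZIP specification


def deflate64_members(zip_source):
    """Return the names of archive members compressed with Deflate64."""
    with zipfile.ZipFile(zip_source) as archive:
        return [
            info.filename
            for info in archive.infolist()
            if info.compress_type == DEFLATE64
        ]


# Stand-in archive built in memory with standard deflate; a real check would pass a file path.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("orders.csv", "id,amount\n1,10\n" * 50)

print(deflate64_members(io.BytesIO(buffer.getvalue())))
```

Any member the function reports would need to be re-compressed with standard deflate before the copy activity can read it.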
Monitoring Setup
Enable workspace monitoring for ongoing performance analysis:
- Go to Workspace Settings > Monitoring
- Add a Monitoring Eventhouse and enable Log workspace activity
- Query the ItemJobEventLogs table with KQL for pipeline-level insights
Example KQL query that counts pipeline job events by status, a starting point for spotting failure trends:
ItemJobEventLogs
| where ItemKind == "Pipeline"
| summarize count() by JobStatus
See workspace-monitoring-setup.md for detailed configuration.