The Problem

Currently, when a Semantic Model refresh in Fabric/Power BI becomes unresponsive (often due to "zombie" queries on the source system, gateway bottlenecks, or locks on external data sources), the "Cancel" command is merely a request, not a command. Per Microsoft Support, the service waits for a "safe point" or an atomic commit before stopping. In many real-world scenarios, however, the refresh enters a "stuck" state that never reaches a safe point, forcing developers and capacity admins to wait for the 5-hour hard timeout.

The Impact

- Capacity Waste: A "stuck" refresh continues to consume CUs (Capacity Units) and memory for hours, needlessly inflating costs or throttling other critical workloads.
- Development Downtime: Developers are blocked from re-triggering a corrected refresh until the "stuck" one clears.
- Operational Risk: If a refresh is hogging a gateway or source connection, it can cause a "logjam" effect for every other scheduled item in the tenant.

The Current "Workarounds" Are Not Viable

Microsoft Support often suggests restarting the On-premises Data Gateway or reassigning the Workspace Capacity to kill a stuck process. These are not feasible in a production environment because:

- Collateral Damage: Restarting a gateway or shifting capacity kills every other active refresh and connection in that environment.
- Service Interruption: It causes a cascade of failures for critical reports that are otherwise running perfectly, creating widespread business disruption just to stop one "zombie" process.
- Administrative Overhead: It requires high-level permissions and creates manual cleanup work for admins across the entire tenant.
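For context, the closest thing available today is the REST endpoint for cancelling an enhanced refresh ("Datasets - Cancel Refresh In Group"), and it has exactly the limitation described above: it *requests* cancellation, and the service still waits for its safe point. A minimal sketch of that call path (the IDs are placeholders, and an actual call would need an Azure AD bearer token):

```python
# Sketch of today's best-effort cancel path via the Power BI REST API.
# Note: this endpoint only applies to refreshes started via the enhanced
# refresh API, and it does not force-terminate anything -- the service
# still waits for a "safe point" before the refresh actually stops.

def cancel_refresh_url(group_id: str, dataset_id: str, refresh_id: str) -> str:
    """Build the DELETE endpoint that asks the service to cancel a refresh."""
    return (
        "https://api.powerbi.com/v1.0/myorg"
        f"/groups/{group_id}/datasets/{dataset_id}/refreshes/{refresh_id}"
    )

# Placeholder IDs; a real call would be an HTTP DELETE against this URL
# with an "Authorization: Bearer <token>" header.
url = cancel_refresh_url("ws-123", "ds-456", "ref-789")
print(url)
```

The point of the sketch is what is missing: there is no parameter anywhere in this API surface that says "do not wait, discard the draft now."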
The Proposed Solution: The "Force-Kill / Discard Shadow" Switch

Since Power BI/Fabric uses a copy-on-write approach, where a new version of the model is constructed in the background while the original remains available for queries, there is a logical path for a forced interruption:

1. Discard Shadow State: Introduce a "Force Kill" option that immediately terminates the Service's tracking of that specific Refresh ID.
2. Immediate Resource Release: Rather than waiting for the external query to return or for the "commit" phase, the Service should immediately stop billing CUs for the process and mark the background "shadow" model for deletion.
3. Source Termination: Send a "Cancel" signal to the Data Gateway or the SQL/Data Lake source to attempt to kill the session at the source, but do not make the Service wait for confirmation from the source before releasing the Fabric Capacity.

The Argument: Data Integrity Is Not at Risk

The standard justification for the delay is "maintaining data integrity." However, because the refresh builds a new version of the model in the background while the original remains live for users:

- Force-killing the refresh simply discards the background shadow copy.
- The original model remains untouched, so no data corruption can occur.
- There is no logical reason to wait 5 hours to "safely" discard a model that has already failed or is stuck.

Why this is feasible: Because the original dataset remains untouched until the very final "commit" phase, killing a refresh during the processing phase should carry zero risk of data corruption to the live model. We are simply asking for the ability to "toss out the draft" immediately rather than waiting 5 hours for the draft to realize it's stuck.

Conclusion

In the era of Fabric's "Pay-as-you-go" and capacity-based pricing, being forced to pay for 5 hours of a stuck refresh is a significant pain point.
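The copy-on-write argument above can be made concrete with a small simulation. This is an illustrative model of the *semantics* being proposed, not actual Fabric internals; all names (`live_version`, `shadow`, `billing_cus`, `force_kill`) are hypothetical:

```python
# Simulation of the copy-on-write semantics the proposal relies on:
# the live model version only ever changes at the final commit, so
# discarding the in-progress "shadow" copy can never corrupt it.

from dataclasses import dataclass
from typing import Optional

@dataclass
class SemanticModel:
    live_version: int = 1            # the version users are querying
    shadow: Optional[int] = None     # in-progress refresh output, if any
    billing_cus: bool = False        # is this refresh consuming capacity?

    def start_refresh(self) -> None:
        # Refresh builds a new version in the background (copy-on-write).
        self.shadow = self.live_version + 1
        self.billing_cus = True

    def commit(self) -> None:
        # Only at the final commit does the live version change.
        if self.shadow is not None:
            self.live_version = self.shadow
        self.shadow = None
        self.billing_cus = False

    def force_kill(self) -> None:
        # Proposed behavior: toss out the draft and stop billing immediately.
        # The live version is never touched here.
        self.shadow = None
        self.billing_cus = False

model = SemanticModel()
model.start_refresh()
model.force_kill()
print(model.live_version, model.shadow, model.billing_cus)  # -> 1 None False
```

Whatever the service's real internals look like, the invariant the sketch demonstrates is the whole argument: `force_kill` only ever writes to the shadow state, so the queryable model is provably unaffected.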
We need a "Kill Switch" that prioritizes capacity governance over "waiting for a safe point" when the developer already knows the refresh is a failure.

#Semantic Model, #Capacity Management, #Developer Experience, #Refresh