Enable rollback on install failures during update#40524
Conversation
There was a problem hiding this comment.
Pull request overview
This PR hardens WSL against unrecoverable failures when an MSI major upgrade fails mid-install by (1) moving old-product removal into the MSI transaction for rollback safety and (2) adding a dedicated, localized error path when required packaged VHD files (e.g., system.vhd, modules.vhd) are missing so users get an actionable message instead of a generic HCS failure.
Changes:
- Adjust WiX
MajorUpgradescheduling soRemoveExistingProductsruns inside the MSI transaction (rollback restores the previous install on failure). - Introduce
WSL_E_SYSTEM_DISTRO_MISSINGand a localized message for missing packaged files. - Add runtime existence checks in VM startup paths and wire the new HRESULT into common error-string handling.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| msipackage/package.wix.in | Schedules MajorUpgrade removal to occur inside the MSI transaction for rollback protection. |
| src/windows/service/inc/wslservice.idl | Adds new HRESULT WSL_E_SYSTEM_DISTRO_MISSING (0x33). |
| src/windows/service/exe/WslCoreVm.cpp | Replaces debug-only asserts with production checks that throw a user-facing localized error when packaged VHDs are missing. |
| src/windows/service/exe/HcsVirtualMachine.cpp | Adds packaged-file existence validation before attaching boot VHDs (currently without setting a user-facing message). |
| src/windows/common/wslutil.cpp | Adds the new HRESULT to common error mappings and returns a localized fallback string (currently hard-coded to system.vhd). |
| localization/strings/en-US/Resources.resw | Adds MessageSystemDistroMissing localized string resource. |
|
Thanks for investigating, would it be possible to try to root cause the issue instead of a band-aid? A slightly better error message isn’t going to help users that get into this state. |
2ad3626 to
05c4925
Compare
Move MajorUpgrade Schedule to afterInstallInitialize so RemoveExistingProducts runs inside the MSI transaction. On upgrade failure, the old product is restored instead of leaving files permanently deleted. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
05c4925 to
c1f0d2c
Compare
|
Apologies for the earlier noise — I've cleaned this up. The PR is now a single-line MSI fix (the root cause), no runtime checks. The previous defense-in-depth changes have been removed to keep this focused on what actually prevents the data loss. |
ptrivedi
left a comment
There was a problem hiding this comment.
Seems to be a righteous fix targeted towards preventing data loss. It would be good to keep an eye out at figuring out why some of these installs fail as we look at more issues.
OneBlue
left a comment
There was a problem hiding this comment.
Reading through https://docs.firegiant.com/wix3/xsd/wix/majorupgrade/, this appears to be low risk and should be a direct improvement in case of an upgrade failure.
Change LGTM, although if it all possible I would recommend adding a test failure that artificially fails the upgrade to validate the rollback (potentially doing something similar to what the MsixUpgradeFails() test case does)
bb59b51 to
25ff710
Compare
25ff710 to
09ceb15
Compare
09ceb15 to
6ec96bb
Compare
6ec96bb to
3ae9004
Compare
3ae9004 to
465ae40
Compare
| // Restore ProductCode and reinstall cleanly for subsequent tests. | ||
| restoreProductCode.reset(); | ||
| InstallMsi(); | ||
| ValidatePackageInstalledProperly(); |
465ae40 to
ada5779
Compare
ada5779 to
0851863
Compare
| auto wslExePath = m_installedPath / WSL_BINARY_NAME; | ||
| VERIFY_IS_TRUE(std::filesystem::exists(wslExePath)); | ||
|
|
| // Verify rollback restored the old product's files. | ||
| VERIFY_IS_TRUE(std::filesystem::exists(wslExePath)); | ||
|
|
0851863 to
8889dc3
Compare
8889dc3 to
47bb518
Compare
| TEST_METHOD(MsiUpgradeRollbackRestoresFiles) | ||
| { | ||
| // Remove the MSI package — mirrors the setup in MsixUpgradeFails. | ||
| UninstallMsi(); | ||
| VERIFY_IS_FALSE(IsMsiPackageInstalled()); |
47bb518 to
6dd9d9e
Compare
Add MsiUpgradeRollbackRestoresFiles test that validates the Schedule="afterInstallInitialize" fix by: 1. Installing an older WSL version (2.0.2) 2. Locking wsl.exe to force the upgrade to fail 3. Verifying rollback restores files and MSI registration 4. Reinstalling current version for subsequent tests Follows the same pattern as MsixUpgradeFails() but tests the MSI-to-MSI upgrade path with rollback verification. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
6dd9d9e to
2f06d8d
Compare
| TEST_METHOD(MsiUpgradeRollbackRestoresFiles) | ||
| { | ||
| // Verify that direct MSI install via msiexec works after uninstall. | ||
| // This validates the test infrastructure before testing failure scenarios. | ||
| UninstallMsi(); | ||
| VERIFY_IS_FALSE(IsMsiPackageInstalled()); |
| std::chrono::minutes(2), | ||
| []() { return wil::ResultFromCaughtException() == E_ABORT; }); | ||
|
|
||
| VERIFY_ARE_EQUAL(0L, exitCode); |
| std::wstring commandLine; | ||
| THROW_IF_FAILED(wil::GetSystemDirectoryW(commandLine)); | ||
| commandLine += std::format(L"\\msiexec.exe /qn /norestart /i \"{}\" /L*V \"{}\"", m_msiPath, logPath); | ||
|
|
||
| LogInfo("Calling msiexec: %ls", commandLine.c_str()); |
Fix: all installed files lost on failed MSI upgrade
Fixes #40488
Problem
MajorUpgradewith noScheduleattribute (our previous state) uses the WiX defaultafterInstallValidate, which removes the old product outside the MSI transaction. If the new install fails afterward, all old files (~30 files, ~1.1GB including system.vhd) are gone with no rollback.Even Worse: once files are lost, reinstalling does not recover them. Running
msiexec /iagain reports success but does nothing -- MSI thinks the product is already installed and skips all file operations. Recovery requiresmsiexec /fa(repair),REINSTALL=ALL, or full uninstall + reinstall -- none of which a typical user would know to try.Fix
Add
Schedule="afterInstallInitialize"-- this moves old product removal inside the transaction. On failure, MSI restores all files from.rbfbackups automatically. The unrecoverable state is never reached.Tradeoff: ~700MB extra temporary disk during upgrade (freed on commit). No other downsides identified -- all custom actions are guarded by
(not UPGRADINGPRODUCTCODE).Not in scope
Locked-file reboot-pending deletes (kernel-mode lock holders that
MSIRMSHUTDOWNcan't kill). That's a separate issue.Test results
afterInstallInitialize)How the fix works (observed via filesystem monitoring):
During upgrade, old files are atomically renamed to
.rbfbackups (not deleted). On failure, MSI restores them from the.rbffiles automatically.