This is a post about where to look to find the cause when Autopilot fails.
By “Autopilot”, I am referring to the whole process of deploying, enrolling and setting up a Windows device. The process really contains several distinct parts:
- The Out of Box Experience (OOBE) like selecting language, region and keyboard
- Enrolment in Intune and joining the Azure AD domain (or hybrid)
- Implementing all the policies and apps assigned to the user or device by Intune
- The Enrollment Status Page (ESP) which monitors and controls the process nearly from the beginning to the end.
But I am using the term “Autopilot” to refer to all these, for convenience.
We can distinguish two types of failure. One, when setting the process up and testing it to see if it works. Another, during deployments when everything is supposed to be working. This post is about the second. For the first, you can generally follow the guides for setting up Autopilot and ESP, and search the documentation if it is not working. For the second, you need a good understanding of how the process works, what happens when it goes wrong, and where to look to find the cause.
Here is the best end-to-end diagram of the process: Windows Autopilot deployment process. And here is the page that best describes what happens in each phase of EPS: Enrollment Status Page tracking information. It is worth studying these in detail.
Getting the logs for Autopilot is straightforward. From a command prompt, run:
“mdmdiagnosticstool.exe -area DeviceEnrollment;DeviceProvisioning;Autopilot;Tpm –cab C:\Temp\Autopilot.cab”.
You will need to run elevated to get the TPM diagnostics. You will also need to make sure that whoever runs the command is able to save in the location specified. If you are asking a standard user to run the diagnostics, you can use Settings > Accounts > Access work or school > Export your management log files.
Michael Niehaus has written scripts to provide a quick interpretation of the diagnostics logs: Get-AutopilotDiagnostics. Running this script against the cab file is the first place to start.
There are a few points to note about the diagnostics:
- You need to run it as soon as possible after the problem occurs. Several of the event logs contained in the diagnostics collection roll over, so events will be lost if you only run it later.
- If you are reproducing a reproducible fault, you can add other tools like netsh, by breaking in with Shift+Fn10 before the fault occurs.
- The diagnostic logging only captures events relating to Autopilot and enrolment. If you want a wider selection of logs, you may want to run One Data Collector at the same time. It only takes a few minutes to run.
- Running Get-AutopilotDiagnostics with the –Online parameter will fetch the application name to match the ID, saving you a lot of time in trawling logs to find the name of a failed app ID.
If the failure occurred before the ESP started, then this is Autopilot proper. The place to look is in the event log: Applications and Services logs > Microsoft > Windows > ModernDeployment-Diagnostics-Provider > Autopilot.
When the ESP starts, we can find the place that it fails in the registry. This is obtained from: HKLM\Software\Microsoft\Provisioning\AutopilotSettings. Each sub-category has a status of: notStarted; inProgress, succeeded; or failed. It would be great if these could be surfaced somewhere more accessible. As it is, they can be found in the diagnostics log file: MdmDiagReport_RegistryDump.reg.
For example, see this obscure fault: Error code 0x80180014 when re-enrolling using self-deployment or pre-provisioning mode. The documentation is not correct. Normally, when you do a “Fresh Start” or a “Wipe”, the object in Intune is soft-deleted. But, occasionally, this back-end process fails. As a result, when you do a reset, it will fail in Device Preparation, at the “Registering your device for mobile management” stage i.e. enrolling in Intune. The status is recorded at: HKLM\Software\Microsoft\Provisioning\AutopilotSettings\DevicePreparationStatus.Category.
This happens because the Intune object has not been deleted as it should be. The solution is to find the object and delete it manually. Knowing the stage it failed at enables you to investigate why this happened.
It can be useful to know exactly when the fault occurred. This helps us to correlate different logs. Bizarrely, the time is shown in the event log: Applications and Services logs > Microsoft > Windows > Shell-Core > Operational. Use the “Find” action to search for the word “failed”. This will show the CloudExperienceHost Web App Event with “Subcategory ID = DevicePreparation.MdmEnrollmentSubcategory; state = failed.” When you know the time, you can search other logs to see if anything distinctive happened at exactly that time.
As I said at the beginning, the purpose of this post is to describe how to investigate unexpected faults. Once you know where and when in the process the fault occurred, you can follow the trail to diagnose it. If you know the type of fault, you may well need to reproduce it with additional logging to find the cause.