Autopilot Failure

A few weeks ago, Autopilot stopped working. Autopilot is the service that builds a Windows desktop from scratch when it first boots up, a bit like MDT. If the device hardware ID is registered in the Autopilot service then, when it starts up, it contacts Autopilot and runs the Out of Box Experience (OOBE) according to the settings in the profile.

On 14 Feb 2022, routine setup of new devices stopped working. We did not at first know why, so we spent a couple of days testing and trying different things. Then we raised a ticket with Microsoft Premier Support. The symptoms were that Store Apps were failing to install (they said “blocked by AppLocker”) and Win32 apps were failing with random errors.

Microsoft Premier Support clearly failed to understand the basics of the problem. They ran through known issues, and asked us to check basic things, and to exclude things that were failing. But Autopilot was working before 14 Feb. Why would applications that were working before 14 Feb start not working?

We spent two weeks in pointless dialog. Then on 25 Feb Autopilot started working again. We made no changes. It just started to work, as it had before 14 Feb. Premier Support offered no analysis and no explanation.

If you were in the middle of a rollout, that could be hundreds or thousands of devices that would fail. How is it even possible that Microsoft had no clue that the service had failed, no clue about why, and no clue about how to fix it?

Some time later, from a different source, we received an explanation. The issue was caused by an error in the code, where devices would sometimes randomly hit a web exception when trying to fetch the content or content info of some Win32 apps from the service. Microsoft created a hotfix and, as a result, error 0x81036502 no longer occurs.

But Premier Support did not know about the fault, did not know about the hotfix, and did not even connect our problem with the fault. This is a failure on a grand scale.

Fault with Company Portal

This is a story about the complete failure of Microsoft Premier Support to diagnose and resolve a fault in the Company Portal.

It is difficult to put into words how complete the failure is. But it includes a failure to define the problem; a failure to capture it or reproduce it; and a failure to provide any diagnosis of the cause.

The Company Portal is the Modern App, or Store App, that enables a user to install applications that have been published and made available to them from Intune. It is an essential part of the Mobile Device Management (MDM) platform. Without the Company Portal, a user can only have the applications that are “Required”. So, after Autopilot, Company Portal will often be the first place a user goes, to obtain the rest of the applications that they need to work with. An example is specialist finance applications. These might be made available to a community of finance users, but each person with install the ones they need individually.

The problem we have had for several months is that the Company Portal will suddenly disappear from a user’s desktop. It is gone. The user can search for “Company Portal” and it is not there. Where has it gone? No idea. How do you get it back? Well, you obviously can’t use the Company Portal to get it!

The facts of the problem are simple and clear, though you would not believe it from the number of times we have been asked to explain and provide logs:

  • After Autopilot completes, Company Portal is present and working.
  • Some short time later, it has disappeared from the user’s Start menu.
  • If you run Get-AppXPackage as the user, the Company Portal is not listed. However, if you log on as an admin, and run Get-AppXPackage –AllUsers, then the portal is shown as installed and available but only for the admin user.
  • The AppX event log does not show any obvious relevant events.
  • It seems to happen in episodes. And it seems to happen sometimes and not others.

We have been asked repeatedly to provide logs: Autopilot logs and One Data Collector logs. But, obviously, if you gather logs before it has disappeared, then there is nothing to see. If you gather logs after it has disappeared, then there is also nothing to see.

After a while, we asked Microsoft Premier Support to try to reproduce the fault themselves instead of continuously asking us for logs. Amazingly, they are unable to do this. Microsoft Premier Support does not have access to virtual machine, or physical machines, that can be used to reproduce faults in Intune and Autopilot. Just let that sink in. Premier Support is unable to attempt to reproduce a fault in Autopilot. It depends on the customer to reproduce it.

We had a long discussion with Premier Support about Offline versus Online apps. The Microsoft documentation for Autopilot recommends in several places that you should use the Offline version of Company Portal. This is counter-intuitive. Offline apps are designed, intended, to be used offline. The scenario given in Microsoft documentation is a kiosk or shared device that is not connected to the internet. The Offline app is installed by DISM in a task sequence, and is used offline. Company Portal, by definition, is of no use offline. It is used to install applications from Intune. If the device were offline, it would not connect to Intune. So why install the Offline version?

We eventually established, at least we think, that an Offline app is in some way cached by Intune; whereas an Online app is obtained directly from a Microsoft Store repository. This seems relevant to the case of the disappearing portal, but we never discovered more about the true difference.

In an early occurrence, we found an AppX event to say that the Company Portal was not installed because of a missing dependency. The missing dependency was the Microsoft Services Store Engagement app. This is the app that enables users to provide feedback. But this app is (apparently) an embedded part of Windows 10 and cannot be missing. We heard no more about this.

The Company Portal stopped disappearing for a while, and we deduced that the fault was in some way related to version updates. It occurred frequently when the version changed from 111.1.107.0 to 11.1.146.0. It has started to occur frequently again now the version is 11.1.177.0. Of course, we have no idea how it is related to the update. We don’t even really know how an update of an Offline app happens.

Finally Microsoft Premier Support has asked us to gather a SysInternals Procmon log, together with a UXTrace log. I have done a lot of troubleshooting with Procmon. It generates huge log files, of every file and registry operation, as well as some TCP operations. To use Procmon effectively, you need a way to stop it when the fault occurs. Microsoft Premier Support simply asked us to run it and stop it when the fault occurred. There are several problems with this. The first is that the user needs to run UXTrace and Procmon with elevated rights. In our environment (as in almost any production environment) the user does not have admin rights and cannot elevate. The second is that Procmon creates huge logs. You can’t just run it for an unspecified length of time, then stop it and save the log. Microsoft Premier Support were clearly unable to understand the problem of gathering the logs, let alone provide a solution. This is dismal. I would expect a competent second-line engineer to have the skills to work out a strategy for collecting logs. It is part of the basic skill of troubleshooting.

So, three months on, Microsoft Premier Support has no clue, and no practical problem-solving approach.

The thing we have found is that Premier Support seems to have no access to the people who develop and run Intune, Autopilot and Company Portal. They are just as much in the dark as you or I.

Microsoft Graph and PowerShell

This is a post about using PowerShell and Microsoft Graph to access data in Azure AD, Intune and Office 365. The GUI management of these Microsoft 365 technologies is constantly evolving, but there will always be things that can’t be done that way. Microsoft Graph approaches the problem from the other direction. It provides an endpoint and API to access the entire dataset. You can then write your own scripts or applications, using the object model of the whole of the Microsoft 365 suite of products.

Continue reading

WDAC FilePath Rules and Drivers

The new File Path rules in Windows Defender Application Control (WDAC) allow EXE and DLL files in the path, but not SYS, or MSI or script files. This is curious and, as far as I know, undocumented. And it means that we cannot simply allow all files in C:\Windows. If we do that, the system will not boot because the drivers will still be blocked. We will need to use another method to add drivers to a WDAC policy.

Continue reading

MDAC or WDAC

The Application Control feature in Windows 10 was originally called Device Guard Code Integrity. This was brought under the Defender umbrella of security technologies as Windows Defender Application Control (WDAC). Microsoft earlier this year announced that Windows Defender would become cross-platform (with a version of Defender antivirus for macOS) and be renamed Microsoft Defender.

In my blog posts I originally called it Microsoft Defender Application Control (MDAC). You can see in the screenshot below that all the Defender technologies for Windows 10 Endpoint Protection, in Intune, are now Microsoft Defender.

Intune Endpoint Protection Policies

However, Microsoft now seems to have standardised on WDAC, so I have reverted to that (2021).