Microsoft Intune Enterprise App Management Explained

Working in end-user computing, one of the things I consider constantly is the extent to which you need additional products to do the job efficiently. There’s always a trade-off between the additional cost on the one hand, and the increased functionality or ease of use on the other. Sometimes functionality is absorbed into the core Microsoft product set, and sometimes it remains outside.

Last year, Microsoft introduced the Enterprise App Management (EAM) feature for Intune. This extends app deployment for Windows with a catalog of pre-packaged apps. Up to this point, even the smallest and simplest app needed to be wrapped in a custom intunewin file and configured with deployment parameters. With EAM, you just select an app and assign it.

This article looks at how far the EAM feature goes in meeting enterprise app deployment requirements. Does it replace third-party solutions, or even your app packaging engineers? Let’s take a look.

Enterprise App Management

EAM is a premium add-on to Intune. It provides a catalog with pre-packaged configurations, ready to deploy an app simply by selecting it. It does many of the things an engineer would otherwise need to do:

  • Fetch a current version of the media from the vendor
  • Apply a command line to install and uninstall the app silently
  • Decide on the context: system or user
  • Handle return codes (e.g. soft reboot)
  • Create a detection rule, to determine if the installation was successful, or if the app is already installed.
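
As a rough illustration of what a catalog entry has to carry, the list above can be modelled as a small data structure. This is a sketch only, not how Intune stores catalog apps: the class name, field names, the `setup.exe` commands and the `is_detected` helper are all hypothetical. The return codes are the defaults Intune pre-fills for a Win32 app.

```python
from dataclasses import dataclass, field

# Default return codes that Intune pre-fills for a Win32 app.
DEFAULT_RETURN_CODES = {
    0: "success",
    1707: "success",
    3010: "soft reboot",
    1641: "hard reboot",
    1618: "retry",
}


@dataclass
class CatalogApp:
    """Hypothetical model of the settings a pre-packaged app provides."""
    name: str
    install_command: str
    uninstall_command: str
    install_context: str = "system"  # "system" or "user"
    return_codes: dict = field(default_factory=lambda: dict(DEFAULT_RETURN_CODES))

    def interpret(self, exit_code):
        """Map an installer exit code to a deployment result."""
        return self.return_codes.get(exit_code, "failed")


def parse_version(text):
    """Turn '24.9.10.28' into (24, 9, 10, 28) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def is_detected(installed_version, packaged_version):
    """Detection rule: the app is present at the packaged version or later."""
    if installed_version is None:
        return False
    return parse_version(installed_version) >= parse_version(packaged_version)


# Hypothetical entry; the command lines are placeholders, not real catalog values.
app = CatalogApp(
    name="Example App",
    install_command="setup.exe /S",
    uninstall_command="uninstall.exe /S",
)
```

An engineer packaging the app by hand has to decide each of these values; the catalog supplies them ready-made.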

Common and standard apps

I would guess there are at least 8 to 10 apps in any environment where a standard configuration is enough:

  • Browsers, like Chrome and Firefox
  • Videoconference clients, like Zoom and Webex
  • Virtual desktop clients, like Citrix, VMware or AWS
  • The Microsoft Visual C++ Redistributable stack.

There is also often a tail of apps that are either free or enterprise-licensed, like 7-Zip, Notepad++ and Visual Studio Code.

Customisation in EAM

You can apply your own customisations to catalog apps. In this case the catalog at least gives you a start. Most obviously, you can edit the install command. You can also, of course, create a dependency or a custom requirement (to test whether to install or not).

Example

A good example of this is the Citrix Workspace app. Citrix XenServer and NetScaler create a large and complex technical environment, and the Workspace app is the means of access to it. A Citrix user might have: a) different identities, b) different environments to connect to, and c) different functional requirements, like copy/paste or file transfer.

The Intune EAM catalog app gives this install command:

"CitrixWorkspaceApp24.9.10.28.exe" /silent /noreboot /AutoUpdateCheck=disabled

It also gives the uninstall command, to save figuring it out:

"%ProgramFiles(X86)%\Citrix\Citrix Workspace 2409\bootstrapperhelper.exe" /uninstall /cleanup /silent

That performs a basic install and uninstall, obviously. But what is this client going to connect to, and how? For that, you need to understand the whole client configuration in the Citrix Workspace App command-line parameters, which is what a packaging engineer does. So you might, instead, have an install command of:

"CitrixWorkspaceApp24.9.10.28.exe" /silent /noreboot /AutoUpdateCheck=auto /includeSSON

You might also appreciate that the Store the Workspace app connects to can be configured in policy, instead of the installation, which enables you to change it without re-installing. So there is an architecture behind the deployment that needs to be developed and understood before you can deploy the app, even with the EAM catalog.
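
To make the difference between the two command lines concrete, here is a minimal sketch that assembles them from a couple of options. The function and parameter names are hypothetical. The switches are only the ones quoted above; it does not attempt to cover the full set of Citrix Workspace app parameters.

```python
def build_install_command(installer, auto_update="disabled", include_sson=False):
    """Assemble a silent install command line from the chosen options."""
    parts = [
        f'"{installer}"',
        "/silent",
        "/noreboot",
        f"/AutoUpdateCheck={auto_update}",
    ]
    if include_sson:
        parts.append("/includeSSON")  # only added when explicitly requested
    return " ".join(parts)


installer = "CitrixWorkspaceApp24.9.10.28.exe"

# The command as supplied by the catalog app.
catalog_default = build_install_command(installer)

# The customised command a packaging engineer might choose instead.
customised = build_install_command(installer, auto_update="auto", include_sson=True)
```

The code is trivial. The real work is deciding which options your environment needs, and that decision still has to be made by someone who understands the client.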

Limitations

Although you can edit the catalog app, at present you cannot use an MST (transform file), or a script with a catalog app, because there is no facility to upload additional files. However, Microsoft told me in January this year that: “We have a solution in the pipeline for this today. Not much more to say about it for now.” So it looks as though this will be possible in future.

My point in this trivial example is that, even with a catalog app and a standard configuration, you may be able to achieve what you want for a subset of your apps.

Beyond the Catalog

These are fairly simple app deployments. There are two cases where the catalog cannot help you:

  1. Where you need more extensive customisation
  2. Where the app is not in the catalog.

Extensive customisation

Once beyond the collection of simple apps, app deployment expands into a world of great complexity. There have been a few like this in every migration programme I have worked on. Sometimes these are legacy apps. Sometimes they are not legacy, but complex nevertheless.

The open source PSAppDeployToolkit (PSADT) covers several of the more common customisations. Essentially, this adds wrap-around functions to the execution of the command line:

  • Pre or post install scripts
  • Copy or delete files
  • Modify the registry
  • Generate verbose logs

You could write these actions into your own script, but it would get tedious after a while, and you would probably write common functions, just like PSADT.

Then there are installations that need to be interactive. Suppose the app is an update or upgrade, and needs to close the current app? Or suppose it needs to close all Office apps while it installs? PSADT uses an interactive dialogue to do this, including to pause or defer the installation if now is not a convenient time.

Normally, in Intune, this dialogue would not be seen because there is no interaction between the system context and the user session. The installation would just time out, waiting for user input. But the ServiceUI.exe utility from the Microsoft Deployment Toolkit (MDT) provides this capability. So PSADT creates a dialogue to prompt the user for input, and ServiceUI surfaces the dialogue to the session of the logged-on user.

If you needed any of these options, you would wrap the command line in the PSADT template, then package it as an intunewin file and upload it. The install command becomes the command to run the PSADT package, instead of the app command line. PSADT will exit with a return code that Intune will interpret as a result.

In this model, PSADT controls the app install command, and Intune controls the PSADT execution.
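
The wrapper idea can be sketched in a few lines. This is an illustration of the pattern only, written in Python rather than PowerShell, and it is not PSADT's API: `run_wrapped`, the step functions and the stand-in installer are all hypothetical. The stand-in "installer" is just a Python process that exits with a chosen code.

```python
import subprocess
import sys


def run_wrapped(install_command, pre_steps=(), post_steps=(), log=print):
    """Run pre-steps, then the installer, then post-steps on success.

    Returns the installer's exit code, which the management tool interprets.
    """
    for step in pre_steps:
        log(f"pre-install: {step.__name__}")
        step()
    log(f"running: {install_command}")
    exit_code = subprocess.run(install_command).returncode
    log(f"installer exit code: {exit_code}")
    if exit_code in (0, 3010):  # success, or success with a soft reboot pending
        for step in post_steps:
            log(f"post-install: {step.__name__}")
            step()
    return exit_code


calls = []


def close_running_app():
    """Hypothetical pre-install step."""
    calls.append("close_running_app")


def write_registry_marker():
    """Hypothetical post-install step."""
    calls.append("write_registry_marker")


# Stand-in for a real installer: a process that simply exits with code 0.
fake_installer = [sys.executable, "-c", "raise SystemExit(0)"]
result = run_wrapped(
    fake_installer,
    pre_steps=[close_running_app],
    post_steps=[write_registry_marker],
    log=lambda message: None,  # silence logging for the demonstration
)
```

The point of the design is the division of labour: the wrapper owns the installer's command line and the surrounding steps, and the caller only sees one exit code.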

The Intune Enterprise App Management feature is not going to be able to do this without a very large enhancement. I have no idea whether Microsoft has this in mind, or is content to leave it as a catalog of fairly simple apps with fairly simple command line installs.

Not in the catalog

Inevitably, there will also be legacy apps or uncommon apps that are not in the EAM catalog. You can see that the Microsoft catalog currently contains mostly free apps or free clients to larger paid-for systems. They say they will add more, and I am sure they will. But, in the meantime, you still need someone to package them. Some apps are unlikely ever to be in a catalog. They may be proprietary, or have a very specialised customer base, or have customisations that are unique to the organisation, or just be very old.

The future of app deployment

So now I go back to my basic question: does it replace third-party solutions, or even your app packaging engineers?

The beginning of app deployment

Right from the beginning, app deployment in Windows had command lines for unattended install. These existed in InstallShield packages, in Windows Installer MSIs and other installers like NSIS (Nullsoft), Inno and WiX Toolset. As a consequence, there was, and is, an ecosystem of open and closed libraries of install commands. As a packager you have three options:

  1. Read the vendor documentation
  2. Deduce the commands from the installer type, or from the /? help
  3. Use a search engine or even AI to find the commands.

Catalogs

It is a small step from there to provide a catalog of pre-packaged apps with standard commands. Altiris did this many years ago for common apps like Adobe Reader, Flash, Java Runtime. These were especially useful in the days before automatic updates, because you really did need to keep updating Flash and Java runtime every time a vulnerability was discovered. Now many vendors offer pre-packaged apps as part of an endpoint management solution. It is really quite easy to offer an app with a standard command line install and uninstall.

But, as you do this, you run into exactly the problems I described above: more complex customisations, and apps not in the catalog.

Two vendors now offer PSADT as part of their app catalog packages: PatchMyPC and Apptimized. That means they can provide an extended range of customisations. PatchMyPC have taken stewardship of PSADT, and played a role in the major upgrade to v4.0 of the toolkit. Both companies offer a large catalog of pre-packaged apps, and a packaging service for custom packages.

It is not my aim to compare Intune Enterprise App Management in detail with other products and vendors like PatchMyPC and Apptimized. I just want to provide some context. In summary, there is a scale of complexity:

  1. Common apps with simple command line installs
  2. More complex but still standard customisations, for example with PSADT
  3. Complex customisations, and legacy or uncommon apps.

Intune Enterprise App Management currently hits Level 1. This gives a rough feature parity with other endpoint management solutions.

PSADT provides the wrapper for a standard set of more complex customisations at Level 2. But you still need to be a fairly experienced packager to use it effectively. Anyone can wrap a command line in a nice PSADT dialogue. The skill is to know what needs to be customised and how best to do it: registry, policy, file, script? PatchMyPC and Apptimized have products that do this as a service. Either they have the package already prepared, or can create it as a custom app.

However, in any large organisation, there will still be a requirement for more complex customisations. There will also be legacy or uncommon apps. You could just hand all the material over to a packaging factory and tell them what you need. But a) finding out what you need is most of the challenge, and b) then you are using a specialist, just not employing them.

So Enterprise App Management is a great addition to Intune. It will save time with the simpler and more common apps. The apps in the catalog can still be customised a little, and there will probably be more customisation in time. But it does not have the range of customisation options available in PSADT, or through services like PatchMyPC and Apptimized. Even then, you will probably need a specialist application packager for a few of the most difficult apps.

Autopilot Timings

This post gives test timings for different configurations in Autopilot. It follows on from a previous one about modern deployment in 2025. The aim is to see how the new Autopilot Device Preparation (Autopilot v2.0) compares with the classic Autopilot (Autopilot v1.0).

Autopilot v2.0 has significant differences in architecture, which Microsoft says will provide a faster and more reliable setup. This is important for the end-user experience. I have run different configurations in a test environment to see how they compare, and these are the results.

Methodology

First is a description of the test methodology. The tests were done as follows:

  • Using two identical VMs in Hyper-V
  • Each VM has 4 GB of vMem, and 12 vCPUs
  • Running on an HP Z2 workstation G9
  • With about 650 Mbps download and 65 Mbps upload speed
  • A clean ISO of Windows 11 24H2
  • One VM registered as an Autopilot device, the other not
  • In Intune, security baselines and a policy each for Microsoft 365 Apps, Edge and Device Inventory: same policies for all.

For apps:

  1. The built-in Microsoft 365 Apps (actually delivered by the Office CSP)
  2. Company Portal
  3. Custom win32 app for Adobe Acrobat Reader DC
  4. Microsoft Visual C++ 2008 Redistributable
  5. Microsoft Visual C++ 2015-2022 Redistributable
  6. 7-Zip
  7. Google Chrome
  8. Microsoft Visual Studio Code.

The first three apps on the list were used as the partial list of blocking or reference apps. The next five came from the new Enterprise App Catalog.

The Enterprise App Catalog apps were used as an easy way to pad out the deployment. Curiously, however, they cannot be added to a list of apps in either version of Autopilot.

All apps were assigned to both a dynamic device group for Autopilot-registered devices, and a static device group for Device Preparation.

Pre-provisioning cannot be done with a Hyper-V VM, so this was not tested. As there is no pre-provisioning in Autopilot Device Preparation, it would not be possible to make a comparison.

Configurations

  1. OOBE
    • Account not assigned a Device Preparation profile, and device not registered as an Autopilot device
    • Represents the least possible time to deploy a device.
  2. Autopilot with 3 blocking apps
    • Device is registered as an Autopilot device
    • Profile is configured with 3 blocking apps.
  3. Autopilot with all apps blocking*
    • Device is registered as an Autopilot device
    • Profile is configured with all apps blocking.
  4. Autopilot Device Preparation with 3 reference apps
    • User is a member of group assigned the device preparation profile
    • Profile is configured with 3 reference apps.

* In this configuration, I had the same intermittent error described here: Autopilot error with Microsoft 365 Apps.

Timing

Timings given are the average of three runs, given in minutes and seconds.

Configuration                                      | Time  | Time minus OOBE | Intune Time | Comment
OOBE                                               | 03:05 | 00:00           | 00:00       |
Autopilot with 3 blocking apps                     | 10:15 | 07:10           | 07:38       | No more apps installed on completion
Autopilot with all apps blocking                   | 12:30 | 09:25           | 10:19       | All apps installed
Autopilot Device Preparation with 3 reference apps | 13:07 | 10:02           | 09:27       | Apps continue to install after completion

OOBE represents the time taken for activities that are common across all configurations: authentication, MFA prompts, language and keyboard settings, Windows updates and reboots, Windows Hello for Business. The main difference is that, in Device Preparation, the privacy settings are not suppressed. This adds possibly 5-10 seconds. You can set a policy to “Disable Privacy Experience”, but I did not.

The time minus OOBE represents the amount of variable time, depending on the work. I would expect a production deployment to take a multiple of this, having more apps and a slower network. For example, doubling a variable time of 09:05 gives 18:10; adding 03:05 for OOBE gives 21:15, which would be good for a user-driven deployment in my experience. Tripling it gives 27:15, or 30:20 with OOBE, which would not be unusual.
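
The arithmetic here is simple, but minutes-and-seconds values are easy to get wrong by hand. The following sketch reproduces the "Time minus OOBE" figures and the projected totals; the function names are my own, and the multiples of 2 and 3 are the rough estimates from the paragraph above, not measurements.

```python
def to_seconds(mmss):
    """Convert 'MM:SS' to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def to_mmss(total_seconds):
    """Convert a number of seconds back to 'MM:SS'."""
    return f"{total_seconds // 60:02d}:{total_seconds % 60:02d}"


def minus_oobe(elapsed, oobe="03:05"):
    """Variable time: total elapsed time less the common OOBE time."""
    return to_mmss(to_seconds(elapsed) - to_seconds(oobe))


def projected_total(variable, multiple, oobe="03:05"):
    """Scale the variable time by a multiple, then add the fixed OOBE time."""
    return to_mmss(to_seconds(variable) * multiple + to_seconds(oobe))


print(minus_oobe("10:15"))          # 07:10
print(minus_oobe("12:30"))          # 09:25
print(minus_oobe("13:07"))          # 10:02
print(projected_total("09:05", 2))  # 21:15
print(projected_total("09:05", 3))  # 30:20
```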

The Intune time is the time reported in Intune by the enrollment monitoring service. From observation, this is the time in the “Setting up for work or school” dialogue.

Interpretation

In this test, the new Autopilot Device Preparation is not faster than classic Autopilot.

Device Preparation with 3 reference apps is significantly slower than Autopilot with 3 blocking apps. Device Preparation with 3 reference apps is still slower than Autopilot with all apps blocking (i.e. installing 8 apps), although less so.

This is surprising. Enrolment Time Grouping, in Autopilot Device Preparation, is supposed to make it faster because Intune only needs to enumerate the apps assigned to the one group. In classic Autopilot, in the ESP, you can see a significant amount of time spent in “Apps (Identifying)”. But this change does not seem to result in a shorter elapsed time for Device Preparation.

I was not able to test Device Preparation with all apps (actually a maximum of 10) referenced. But I can estimate it would be 2-3 minutes slower, because that is the amount of extra time taken in classic Autopilot.

In effect, the reference apps feature of Device Preparation enables you to close the gap on how much slower it is than classic Autopilot.

Curiously, the times reported in Intune give a shorter time for Device Preparation with 3 reference apps than for classic Autopilot with all apps blocking, whereas the total elapsed time was longer. At the moment, I do not know if this is because they measure different things, or some other reason outside this calculated time.

Conclusions

The new Autopilot Device Preparation (Autopilot v2.0) service is not faster than classic Autopilot (Autopilot v1.0), despite the changes in architecture.

However, the new Reference Apps feature will make deployment faster in situations where you have more default applications, or large ones, or both. I can see this being quite a common scenario.

With classic Autopilot, if you select to block on only a few apps, Intune does not carry straight on at the end of Autopilot to install the remaining apps. Instead, it halts and waits for a new sync cycle. This limits the usefulness of blocking. With Autopilot Device Preparation, if you select only a few reference apps, Intune does carry straight on afterwards with the remaining apps. I think this makes it a practical choice with an acceptable experience for the user. The more apps you have, the more useful it is.

Speed is not the only factor for deployment, of course. Reliability is also important. But this post is only about the timings.

Modern Deployment 2025

Bulk deployment of end-user computing (EUC) devices is a fact of corporate life, and has been for at least 20 years. The vendors and products change, but the task remains essentially the same: how to distribute devices to staff with the desired applications and configurations.

This blog is about deploying Windows devices, and for the managers of the process rather than the technicians. Windows deployment is a mainstream topic with some excellent technical commentary from Michael Niehaus, Peter van der Woude, Rudy Ooms and others. There is rather less about the pros and cons of different methods.

  • Autopilot v1.0 provides a cloud service for Windows deployments, to replace on-premises re-imaging. But it can be unreliable, and is a slower experience for the end user unless you prepare (pre-provision) the device in advance.
  • Autopilot v2.0 (called Autopilot Device Preparation) is significantly simplified and so should be more reliable. Currently, it lacks a pre-provisioning mode, which restricts it to the slow experience. But this is mitigated by a new feature that allows you to select which apps are installed as a priority before the user reaches a working desktop. The more standard apps you have, the more of an advantage this is.
  • It may be unfashionable, in the age of cloud services, but an on-premises re-imaging service combined with Autopilot v2.0 will probably provide the most efficient result overall.

Deployment

The aim of deployment has been remarkably consistent. I can’t see that it has changed at all: take a device from the manufacturer, and convert it to one that is ready for work as efficiently as possible.

Image -> Re-image -> Identity -> Applications -> Configurations
  1. Image
    • The OS image deployed to the original equipment by the manufacturer (OEM)
    • Because manufacturers generally compete in consumer as well as business sectors, the OEM image tends to contain a variety of other applications: freeware, trialware and vendor utilities.
  2. Re-image
    • A custom image deployed in place of the OEM one
    • Either simply to remove all the vendor-added software
    • Or to do that as well as add corporate software, in order to speed up the overall process of delivering a fully ready device
    • Done by a wide variety of imaging tools: historically, Ghost; then Altiris, Windows Deployment Services (WDS), Microsoft Deployment Toolkit (MDT), FOG and many others.
  3. Identity
    • Enrolment in an identity management system, so that the device is recognised as a corporate device, and the user logging on is recognised as a corporate user
    • Either on-premises Active Directory, or cloud Entra ID, or a hybrid of both.
  4. Applications
    • The installation of corporate applications by an agent on the device
    • The agent can go on to patch and replace applications during the life of the image
  5. Configurations
    • The configuration of different settings on the device
    • Everything from BitLocker to certificate authorities to Kerberos to application configurations (if the application is designed to store settings in the registry)
    • Ultimately these are done by settings in the registry, or by running a script or executable
    • The available settings are defined either in XML-formatted templates (ADMX) or Configuration Service Providers (CSPs)
    • Different device management tools generally provide an interface to set these configurations.

So the aim is to get from 1 to 5 as efficiently as possible. What are the obstacles?

Step 2 is an expensive step. The OEM device has to be unboxed, re-imaged, and re-boxed for delivery. If you re-image with a “thin” image (no applications), then there has to be time to install all the applications later. If you re-image with a “fat” image (all the applications) then by the time it gets to the end user there is a good chance that some of them need to be updated. If you re-image well in advance, the device will need Windows updates and possibly even a full feature update e.g. from Windows 11 23H2 to 24H2.

Step 3 is a complicated dance that has to be carefully controlled. The process has to ensure that only the devices you want to enrol can enrol (e.g. not personal devices); and that all the devices you want to enrol are enrolled (i.e. not bypassed).

Steps 4 and 5 are really about timings. You don’t want to deliver a device to a member of staff until it is ready to use; but you also don’t want them sitting idle watching a progress bar.

Up until perhaps 2018-2020, this process was performed with on-premises infrastructure, SCCM being perhaps the most common but with many alternatives.

Autopilot

Windows Autopilot, introduced in 2017, changed this model in a really quite radical way. What it did was to make every single new Windows desktop OS in the entire world check in first with the Autopilot cloud service to see whether it is a corporate device. It is worth having a look at the Autopilot flowchart. If we simplify it a little, we have two flows:

  1. Start, and connect to the Internet; get the unique hardware ID of the device; then check with the cloud Autopilot registration service whether the device is registered; if it is, get an Autopilot profile for setting up the device.
  2. Follow the Autopilot profile to enrol the device in Entra ID and in Intune; then use Intune apps and device configuration profiles to set up the device.

Autopilot also has a secondary flow, to set the device up in a technician phase first; and then, if the device is assigned to a user and not used as a kiosk-type device, perform a second phase depending on the user account. This technician phase is equivalent to the re-imaging phase in Step 2 above.

Autopilot changes the way devices are deployed because, using the first workflow (“user-driven mode”), you can send the OEM-supplied device direct to the end-user. The device will always check first whether it is registered in Autopilot, and then set itself up accordingly. Or, using the secondary workflow, you can part set it up, then deliver it to the end user to complete. Being a cloud service, it also appealed to organisations that wanted to reduce their on-premises services.

There were two main problems with this. The first is that the process has been (in my experience) unreliable. The second is that, unless you insert the additional step of pre-provisioning the device, the setup is inherently a slow experience for the end user.

For unreliable, see my previous blog: Autopilot and Intune faults. Just to be absolutely clear, these are not configuration errors. They are backend faults in the service causing unpredictable and unsolvable failures in the status monitored by the Enrollment Status Page (ESP). It is hard to put a number on this, but I would say it was perhaps 2-5% of all deployments. That might not sound a lot, but let’s take two scenarios:

  • The device is sent to a member of staff working from home; it fails to build correctly; they are stuck at “setup failed”. What happens now? They are at home. The support staff can’t see the screen.
  • You are helping a group of people migrate to their replacement device, in a large migration. One or two fail at Account Setup. The staff can’t leave, because the device doesn’t work. Do you try a reset and start again? Do you give them another device and start again? Do you try to fix it?
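
To put the 2-5% estimate in proportion, a quick calculation helps. This sketch assumes that failures are independent of each other, and the session size of 20 people is a hypothetical figure chosen for illustration; neither comes from measured data.

```python
def expected_failures(devices, low_rate=0.02, high_rate=0.05):
    """Expected number of failed deployments at each end of the estimated range."""
    return round(devices * low_rate), round(devices * high_rate)


def chance_of_any_failure(group_size, failure_rate):
    """Probability that at least one device in a group fails (independent failures)."""
    return 1 - (1 - failure_rate) ** group_size


# Across a hypothetical 1,000-device programme.
low, high = expected_failures(1000)

# In a hypothetical migration session of 20 people.
session_low = chance_of_any_failure(20, 0.02)
session_high = chance_of_any_failure(20, 0.05)

print(f"Expected failures in 1,000 deployments: {low} to {high}")
print(f"Chance of at least one failure in a session of 20: "
      f"{session_low:.0%} to {session_high:.0%}")
```

On these assumptions, a failure somewhere in the room is a routine event in a migration session, not an exceptional one.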

For slow, a user-driven deployment might take perhaps 20-30 minutes on an excellent network, depending mainly on the number of applications to install and uninstall. If you are on a poor network, say at home, then it might be a lot longer. For the end user, this is time spent simply waiting. If they go away to do something else, they will not come back and find it done, because it will be waiting at a logon prompt before starting the Account Setup phase.

In contrast, I would say that an on-premises deployment should be 99.9% successful and fast. The device is almost fully built by a technician, before being handed over. I really cannot remember any significant level of faults, once the deployment is fully tested and piloted, so the user phase is short and reliable. Of course, it requires double handling, as in Step 2. But the double-handling is the same in the case of pre-provisioning.

Autopilot Device Preparation

Autopilot v2.0 was introduced this year, 2024. It is a radical rethink of the original version. There are four main changes in the architecture. The question is: will they make the process more reliable, and faster?

  1. There is no pre-registration of the device hardware ID
  2. There is, as yet, no pre-provisioning or unassigned-user mode (called “self-deploying”)
  3. The ESP is, by default, replaced by an extension of the Out of Box Experience (OOBE)
  4. A list of priority apps and scripts is used instead of a list of blocking apps.

No pre-registration

Instead of using a global registration service, it works as follows:

  • The end-user account is added to a security group
  • When the account signs in on any unmanaged device, that device is automatically enrolled in Entra ID and Intune
  • Intune assigns a Device Preparation profile to members of the user group
  • The profile adds the device to a device group
  • That device group is used to install applications and configure the device.
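
The contrast between the two models can be sketched as two small functions. This is a simplified illustration, not how the services are implemented: the function names, the hardware hash, the accounts and the device names are all hypothetical.

```python
def classic_autopilot_profile(hardware_id, registered_devices):
    """Autopilot v1.0: the device's hardware ID decides whether a profile applies."""
    return registered_devices.get(hardware_id)  # None if the device is not registered


def device_preparation(user, device, profile_user_group, device_group):
    """Autopilot v2.0: the user's group membership decides.

    If the user is in the group assigned the profile, the device is added
    to the device group that apps and configurations are targeted at.
    """
    if user not in profile_user_group:
        return False
    device_group.add(device)
    return True


# Hypothetical data for illustration.
registered = {"HASH-0001": "Corporate user-driven profile"}
users_with_profile = {"alice@example.com"}
prepared_devices = set()

v1_result = classic_autopilot_profile("HASH-0001", registered)
v2_result = device_preparation(
    "alice@example.com", "LAPTOP-01", users_with_profile, prepared_devices
)
```

In the first function the lookup key is the device; in the second it is the user. Nothing in the second function checks which device is being used.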

This is a big change in architecture. The hardware ID used for device registration is a bit like the boot configuration used to prevent Windows licence fraud. It is explained in this blog: Autopilot hardware hash. Registration ensures that only a registered device can enrol, and that all registered devices are enrolled.

Registration was not difficult, for a large organisation. Either the vendor registered all new devices, or a script could be run to extract them from all existing devices. It is another step, but not a big one. I think removal of this step is more of an advantage for small organisations, where extracting the hash could be quite difficult.

If the step was not needed, why was it there, and what are the consequences of removing it? It seems to be a balance of risk. Autopilot v2.0 now allows an option of providing a simpler corporate device identifier. This could be used to prevent enrolment of unauthorised devices. But, for Windows, the identifier still has to be pre-loaded as a CSV of the device manufacturer, model and serial number. It is just slightly easier, since it does not require access to the device to obtain it.
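
As a sketch of how such a list could be prepared from an asset inventory, the following writes one row per device. The column order follows the description above (manufacturer, model, serial number). The absence of a header row is my assumption, as are the function name and the sample devices; check the current Intune documentation for the exact file format before uploading anything.

```python
import csv
import io


def corporate_identifier_csv(devices):
    """Build CSV text with one 'manufacturer,model,serial' row per device."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for device in devices:
        # Assumed layout: no header row, three columns in this order.
        writer.writerow([device["manufacturer"], device["model"], device["serial"]])
    return buffer.getvalue()


# Hypothetical inventory records.
inventory = [
    {"manufacturer": "ExampleCorp", "model": "Model 100", "serial": "SN0001"},
    {"manufacturer": "ExampleCorp", "model": "Model 200", "serial": "SN0002"},
]

csv_text = corporate_identifier_csv(inventory)
```

All three values are normally available from a purchase order or asset register, which is why this does not require access to the device itself.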

No pre-provisioning

Autopilot v2.0 currently only supports the primary, user-driven, workflow. Microsoft says in the FAQs that: “The pre-provisioning mode and self-deploying mode scenarios will be supported in the future, but aren’t part of the initial release.”

Pre-provisioning is a fundamentally different workflow from user-driven.

  • At this stage, we usually don’t know what user will receive the device. It is a bulk preparation process, similar to re-imaging
  • Because there is no user account, and so no authentication or authorization, it uses a different process to validate the identity of the device
  • This process requires the “attestation”, or proving the identity of, the Trusted Platform Module (TPM), the unique hardware component incorporated in the device by the vendor. This proves that the device is the one registered in Autopilot, and not an imposter.

Since there is no pre-registration in Autopilot v2.0, it will not be possible to attest that the device is the one registered. We will have to wait and see how Microsoft solves this. But, without it, we lose the ability to cut the end-user wait time in half. An alternative is to re-image the device with standard applications installed, before handing over the device for a deployment with Autopilot v2.0.

No Enrollment Status Page

The ESP controls the flow of Autopilot v1.0. A failure at any step causes Autopilot to end with an error. Depending on the profile, the user can then either reset the device and start again, or continue anyway to a part-completed desktop.

As I have described elsewhere, the failures are sometimes caused by faults in configuration, but often by unknowable back end failures in the cloud services. Microsoft even recommended at one stage to not use the ESP to track progress.

In Autopilot v2.0, the ESP is optionally replaced by a simple progress indicator during the OOBE dialogue. I think this is easier for a user to understand. The percentage progress, however, is not the actual progress through the work. It is the percentage of the total time allowed before the process times out, default 60 minutes.
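
A small sketch shows why the percentage can mislead. It assumes the indicator behaves as described above, i.e. it is simply the share of the timeout that has elapsed; the function name is hypothetical.

```python
def displayed_progress(elapsed_minutes, timeout_minutes=60):
    """Percentage shown: share of the timeout used, not share of the work done."""
    return min(100, round(100 * elapsed_minutes / timeout_minutes))


print(displayed_progress(15))  # 25
print(displayed_progress(30))  # 50
```

So a deployment that completes in 15 minutes would finish while the indicator shows about 25%, not 100%.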

Autopilot v1.0 ESP

Autopilot v2.0 OOBE

The ESP itself is not the cause of failures. However, other failures in the process cause it to terminate Autopilot, even if those failures are not fatal to the deployment.

List of reference apps

The change with the biggest impact on the end-user experience is the list of “reference apps”. It is an ambiguous term. It means the apps to install during deployment. All other apps are installed after the setup is complete.

Autopilot v1.0 has the concept of “blocking apps” in the ESP. These hold the deployment until they are installed, and raise an error if they fail. The choice is none, selected apps, or all. If you configure All, then the deployment will take longer than if you configure Selected (although, if you pre-provision, this time does not matter). However, if you configure Selected, other apps are not installed. Instead, they will wait perhaps an hour for the next sync cycle. This may or may not be acceptable. In my experience it is not.

Autopilot v2.0 replaces this with “apps you want to reference with this deployment”. These apps are installed during the deployment. Unlike v1.0, all other apps continue to install without a pause, but without holding the setup. This gives a better user experience than v1.0, because it might only be another 5-10 minutes before all the required apps are installed. With this design, I think it is reasonable to finish setup with a subset of apps, for example Office 365, Company Portal and a VPN or zero-trust network client, and perhaps any configuring “apps” (scripted configurations deployed as custom win32 apps). There is always more to do in the minutes immediately after setup is complete, and while the other apps are installing.
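
The difference between the two behaviours can be expressed as a simple timeline. This is a sketch with hypothetical numbers: 10 minutes for the priority apps and 8 minutes for the rest are illustrative figures, and the 60-minute wait stands for the "perhaps an hour" before the next sync cycle in v1.0.

```python
def time_to_all_apps(priority_minutes, other_minutes, wait_for_sync_minutes):
    """Return (minutes until desktop is ready, minutes until all apps are installed)."""
    desktop_ready = priority_minutes
    all_installed = priority_minutes + wait_for_sync_minutes + other_minutes
    return desktop_ready, all_installed


# Autopilot v1.0 with selected blocking apps: the rest wait for the next sync.
v1_ready, v1_all = time_to_all_apps(10, 8, wait_for_sync_minutes=60)

# Autopilot v2.0 with reference apps: the rest carry straight on.
v2_ready, v2_all = time_to_all_apps(10, 8, wait_for_sync_minutes=0)

print(f"v1.0: desktop at {v1_ready} min, all apps at {v1_all} min")
print(f"v2.0: desktop at {v2_ready} min, all apps at {v2_all} min")
```

The desktop arrives at the same point in both cases. What changes is how long the user then works on a partly built device.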

You might say this is making the best of a bad job, because with no pre-provisioning the only alternative is to hold the setup until all the apps are installed. But I think it is actually a realistic alternative to pre-provisioning. It really depends on whether you can package everything up to deliver an acceptable desktop in an acceptable amount of time.

The list of reference apps also cuts out a minute or more spent enumerating apps in Autopilot v1.0. Instead, Autopilot v2.0 only installs apps from the single device group specified in the profile. Microsoft calls this Enrollment Time Grouping.

Summary

I am optimistic that Autopilot Device Preparation (Autopilot v2.0) will be more reliable than v1.0, because the process has been simplified: in particular with no ESP and so fewer reasons for failure.

It is not faster. You might expect it to be, and Microsoft claims it to be (because of Enrollment Time Grouping). But my tests do not bear this out. It takes a given amount of time to download and install apps, and this does not change. It is possible there is less time spent enumerating the apps to install, but this does not translate into a shorter time overall.

However, the new ability to specify the must-have apps, and for installation of other apps to continue outside of the setup window, gives an opportunity to cut the waiting time for an end user.

The lack of a pre-provisioning mode means that you cannot take the theoretically fastest route (barring failures) of preparing the device with all apps before giving it to the end user. It might be unfashionable, but this means there is a rationale for re-imaging the device with a conventional on-premises technology before shipping it to the end user to complete with Autopilot v2.0.

AD Remediation: Tiered Administration

This is one of a series of posts about fixing problems that have accumulated in an old instance of Active Directory (AD). In this case, it is about introducing tiered administration into a Windows and Active Directory environment.

I have been in two minds about this post. Organisations have been moving away from AD and on-premises Windows servers, towards Entra ID (formerly Azure Active Directory) and cloud-based services, for a long time. The idea of tiered administration of AD came in around 2014. If organisations were going to introduce it, they should have done it by now. But some organisations may not have. The larger, more complex and older the directory, the more difficult it is to do. I worked on this recently for a large organisation, and I was unable to find a good description of the approach online, so I thought it might be useful after all to share this. Please leave a comment if you have any suggestions or questions from your own experience.

This is not a post with how-to screenshots. There are plenty of those elsewhere. It is a description of what needs to be done in practice, and some of the obstacles, together with how to overcome them. I also hope to pick a way through some of the Microsoft documentation on this. There is no single guide, that I know of, for how to do it.

It is also not a post on the general topic of hardening Windows or AD, or securing privileged accounts. There are plenty of those. It is specifically about tiered administration only.

Background

First, here is a bit of background. We need this to understand what tiered administration in AD is trying to achieve.

Tiered Administration is one of those “good things”, like Least Privilege and Separation of Duties. The National Cyber Security Centre (NCSC) describes it here: Secure System Administration. The idea is quite simple. Different accounts should be used to administer different layers of services differentiated by their criticality. For example, you should use a different account to administer the finance system than to run diagnostics (with local admin rights) on an end-user laptop. If the account you use for the laptop is compromised, it will not affect the finance system.

For Windows administration, the idea really took shape when Mimikatz blew a large hole in Windows security. In about 2011, Benjamin Delpy published open source code to obtain credentials from a running Windows device. Using Mimikatz, any administrator could obtain the credentials of any other account logged on to the device, and use them to leapfrog onto any other device where that account had access, and so on. This meant that an attacker could travel from any compromised device, including just a regular workstation, across and up to potentially any other device, including a domain controller. From there, they could simply destroy the entire environment.

This was a fundamental risk to the Windows operating system, and Microsoft responded with a slew of technologies and guidance to mitigate it. In 2012, the Microsoft Trustworthy Computing initiative published Mitigating Pass-the-Hash (PtH) Attacks and Other Credential Theft Techniques, followed by a Version 2 in 2014. In Windows Server 2012 R2, released in 2013, they introduced several technologies to mitigate the risk, including the Protected Users security group, Authentication Policies and Authentication Policy Silos, and Restricted Admin mode. To be fair, these built on a history of strengthening Windows security, for example with User Account Control (UAC) in Windows Vista and Server 2008.

Tiered administration is in Section Three of Version 2 of the Mitigation document referenced above: specifically in the section “Protect against known and unknown threats”. The technical implementation is described in Mitigation 1: Restrict and protect high-privileged domain accounts.

There is no technical fix for credentials theft in an on-premises Windows environment. It is not a bug or a loophole. It is intrinsic to Windows AD authentication with Kerberos and NTLM. Mitigation of the risk requires a range of large and small technical changes, as well as significant operational changes. Tiered administration is both, and it is only part of a plan to tighten up security. If you think you can do it with a few technical changes, and quickly, you are badly mistaken.

Documentation

It would not be useful to list all the things you need to do to protect privileged accounts in AD, but this is some of the key Microsoft documentation on legacy tiered administration. I use the documentation not just to read about a topic, but to provide an audit trail for compliance:

  1. Mitigation for pass-the-hash (referenced above)
  2. Best practices for Securing Active Directory. This is an excellent and extremely important document. Although it does not describe tiered administration specifically, you need to include all of the recommendations in your implementation: in particular, Appendices D, E, F and G. This document also describes in detail the Group Policy Objects (GPOs) to restrict logon across tiers, but it applies them only to the built-in and default domain groups, and not to your custom groups of tiered accounts.
  3. Unfortunately, I don’t think you will find a comprehensive Microsoft document on implementing tiered administration in AD. The guidance has been updated for modern authentication and cloud services, in the Enterprise Access Model. The legacy model referred to is the one described in the Mitigation document of 2014.
  4. Legacy privileged access guidance. This document covers the implementation of a Privileged Access Workstation (PAW). It is not a reference for tiered administration, but it does describe the GPOs that restrict administrators from logging on to lower tier hosts. It is important to recognise that the purpose of this document is to describe the implementation of a PAW, not tiering as a whole, and it uses only a simplified model of tiering.
  5. Administrative tools and logon types. This explains the different logon types and their vulnerability to credentials theft. These are the logons that will be denied by User Rights Assignment settings in the GPOs.

In the Microsoft legacy model, a tier represents a level of privilege in the domain. A Tier 0 account is one with the highest level of privileges over the whole domain. A Tier 1 account has high privileges over important business services and data. A Tier 2 account has high privileges over individual (e.g. end-user) services and data.

These documents are useful if you want an audit trail to show you have implemented the protections rigorously. As a CISO, for example, you might want to check that all the controls are implemented, or, if not, that the risk is identified and accepted.

You will find a lot of detailed and up-to-date (mostly) documentation on individual technical topics, especially for Tier 0 and PAW. This is not one of them. This aims to give a more rounded picture of both the technical and operational practicalities of implementing tiered administration in AD.

Logon restrictions

The basic control in tiered administration for Windows is to prevent an account in one tier from logging on to any Windows computer that is administered by an account in a lower tier. The purpose is to avoid the risk of exposing the credentials of the more privileged account.

These are the technical steps I have followed to implement the logon restrictions. The Microsoft legacy model uses three tiers, but there is nothing magic about that. It is just the number of tiers in their documentation. The reason, I think, is the traditional split between first, second and third line support; or end-user, server and domain engineers.

Here I have used User Rights Assignment settings in GPOs. You can also use Authentication Policies and Authentication Policy Silos. Those are discussed later in this post.

  1. Create three GPOs, one for each tier of computers: Domain Controllers and other Tier 0 servers; member servers; end-user workstations.
  2. List the groups you will use for your tiered administration accounts, one for each tier.
  3. List parallel groups for service accounts. This is because service accounts will separately be denied interactive logon rights to their own tier. This is not, strictly, part of tiering and so not covered further here.
  4. Create a spreadsheet to document the logon rights to be denied. Use three worksheets, one for each tier.
  5. In the first column, list the five logon rights to be denied. You can find this list in several of the documents I have referenced above. They are:
    • Deny access to this computer from the network
    • Deny log on as a batch job
    • Deny log on as a service
    • Deny log on locally
    • Deny log on through Remote Desktop Services.
  6. Across the top, create column headings for each of the accounts and groups to be restricted. These are:
    • Each of the built-in and default privileged accounts and groups listed in the Best Practices for Securing Active Directory guide, Appendices D to G. These are domain and local Administrator, domain Administrators, Domain Admins, and Enterprise Admins.
    • Your custom groups of tiered accounts: Tiers 0, 1 and 2.
  7. Follow Appendices D to G to document the logon restrictions for those accounts and groups. For example, in Appendix D, the built-in domain Administrator account has four logon restrictions.
  8. For your custom tiered administration accounts, implement all five logon restrictions according to tier, i.e. Tier 0 accounts are denied on the Tier 1 and Tier 2 worksheets; Tier 1 accounts are denied on the Tier 2 worksheet only.
  9. Finally (!) create the GPOs with the settings in the spreadsheet. Link them to the OUs with domain controllers and other Tier 0 servers; member servers; and workstations. Since this would be a “big bang” implementation, you might first apply the GPOs only to a sub-set of the computers.
  10. Test. The Microsoft Best Practices guide gives a screenshot-level description of validating the controls, which is useful when preparing a test plan.
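The rule in Step 8 can be sketched in a few lines of Python. This is only an illustration of how the custom-group columns of the spreadsheet are filled in: the group names are hypothetical, and the sketch does not cover the built-in accounts and groups from Appendices D to G, which have their own restrictions.

```python
# The five logon rights to be denied, as listed in Step 5.
LOGON_RIGHTS = [
    "Deny access to this computer from the network",
    "Deny log on as a batch job",
    "Deny log on as a service",
    "Deny log on locally",
    "Deny log on through Remote Desktop Services",
]

# Hypothetical names for the custom groups; tier 0 is the most privileged.
TIER_GROUPS = {0: "Tier0-Admins", 1: "Tier1-Admins", 2: "Tier2-Admins"}


def denied_groups(worksheet_tier):
    """Custom groups denied on the worksheet for one tier of computers:
    every group from a more privileged (lower-numbered) tier."""
    return [TIER_GROUPS[t] for t in sorted(TIER_GROUPS) if t < worksheet_tier]


def build_worksheet(worksheet_tier):
    """Map each of the five deny rights to the custom groups it should list."""
    groups = denied_groups(worksheet_tier)
    return {right: list(groups) for right in LOGON_RIGHTS}


for tier in sorted(TIER_GROUPS):
    listed = denied_groups(tier) or ["no custom groups"]
    print(f"Tier {tier} worksheet denies: {', '.join(listed)}")
```

Nothing is denied upwards: the Tier 0 worksheet lists no custom groups, and the Tier 2 worksheet lists both the Tier 0 and Tier 1 groups against all five rights.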

I have found different versions of these GPOs in different blogs, especially for the custom groups in Step 8 above. So, which is definitive? There are a few points to note:

  • For the custom groups of administrators, the five logon restrictions are the same five as those given for Domain Admins in the Best Practices guide
  • They are also the same as those given for “Domain admins (tier 0)” and “Server administrators (tier 1)” in the original v.2 Pass-the-Hash document, referenced above, although the guidance is not as precise.
  • The Domain Admins group is the one added automatically to the local Administrators group when a computer joins the domain. It is logical to follow the same template for other administrators.
  • You do not need to deny logons upwards, to implement tiered administration e.g. deny logon for Tier 2 accounts on member servers or domain controllers. Lower tier accounts are not put at risk by logging on to a device administered by a higher tier.

You may also notice that the logon restrictions include Remote Desktop Services. This is because the normal remote desktop protocol (RDP) passes credentials to the target computer, where they could be captured. Restricted Admin mode of RDP does not pass the credentials. Instead, it authenticates the account on the source computer. So, if you enforce Restricted Admin, you do not need to deny log on over Remote Desktop Services.

There are a few obstacles to this, not insuperable:

  • Restricted Admin needs to be enabled on the target but, separately, required on the source. This means that, to enforce it by GPO, you need to know what the source will be.
  • It does not delegate credentials onwards. So, if you connect to a remote server, and then in the session connect to a file share or another server, you are not authenticated.

This is just the technical part of implementing logon restrictions in a tiered administration model for AD. It is a lot of detail, but it is not difficult.

Delegation

The next step is that you must match this with controls of delegation in the directory. Why does that matter? Because if someone has control of the objects in the directory, they can change what restrictions are applied. They might be able to change the GPO, or move a computer between OUs, or reset the credentials of an account in a higher tier. I have found no Microsoft documentation relating to delegation with tiered accounts. For tidying up existing delegations, see my separate post on AD Remediation: Delegation.

The first step is to ensure that all administrative accounts and groups go into a separate OU for admin resources only, where the normal delegations do not apply. This also means you must not have delegations in the root of the domain (e.g. Full Control of all Computer Objects), unless you also have Denies or Block inheritance, which you should avoid.

In a separate OU, the only default permissions will be for domain administrators. Then, you can pick your way slowly to allowing some very limited delegations of control over these accounts and groups. One thing to remember is that accounts in the custom Tier 0 group of administrators do not need also to be domain administrators. You can put an account in that group, and apply logon restrictions, without the account actually being a highly privileged account in terms of control of AD. It just means that the credentials are less likely to be compromised by logging on to lower tier computers.

This is a very confusing point. The allocation of tiered accounts is not primarily about who you trust. You should grant privileges (based on the Least Privilege idea) according to the skills and experience of the individual. But, in terms of threats, you should assume that any account can be compromised. The point of tiered administration is not to control who does what. It is to prevent the escalation from an easily compromised computer (like a workstation used to browse the internet) to a highly critical one (like a domain controller). So, you might allow a service provider administrator to add accounts to admin groups, or reset their administrators’ passwords, but only using a Tier 0 account, and one that is not a domain administrator. Likewise you could have Tier 1 accounts that do not administer servers, but have delegated control over Tier 2 accounts.

You need to be very careful that accounts of one tier do not go into groups that have control over objects in a higher tier. There is no automated way to control this. Accounts in a higher tier can control objects in a lower tier, but not vice versa.
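Nothing in AD enforces this, but you can at least check it periodically against an export of group membership. The following is a minimal sketch in Python, assuming you have already worked out which tier each account belongs to and which tier of objects each delegation group controls. All the account and group names are hypothetical.

```python
# Hypothetical accounts and the tier each belongs to (0 is the most privileged).
ACCOUNT_TIER = {"adm0-alice": 0, "adm1-bob": 1, "adm2-carol": 2}

# Hypothetical delegation groups and the tier of the objects each one controls.
GROUP_CONTROLS_TIER = {
    "Deleg-ResetTier0Passwords": 0,
    "Deleg-ManageMemberServers": 1,
    "Deleg-ManageWorkstations": 2,
}

# Hypothetical membership, for example taken from a directory export.
GROUP_MEMBERS = {
    "Deleg-ResetTier0Passwords": ["adm0-alice", "adm1-bob"],
    "Deleg-ManageMemberServers": ["adm1-bob"],
    "Deleg-ManageWorkstations": ["adm0-alice", "adm2-carol"],
}


def upward_control_violations(account_tier, group_controls_tier, group_members):
    """Return (account, group) pairs where an account controls objects in a
    more privileged tier than its own."""
    violations = []
    for group, members in group_members.items():
        controlled = group_controls_tier[group]
        for account in members:
            if account_tier[account] > controlled:
                violations.append((account, group))
    return violations


print(upward_control_violations(ACCOUNT_TIER, GROUP_CONTROLS_TIER, GROUP_MEMBERS))
```

In this example the Tier 1 account in the Tier 0 password reset group is reported. The Tier 0 account in the workstations group is not, because control downwards is allowed.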

Permissions, including delegated permissions in AD, are not inherently tiered according to logon restrictions. For example, clearly, you may have permissions for a file share that allow a wide range of admin accounts to add, change and delete files. My approach is to create separate sub-OUs for tiered and non-tiered groups of administrator accounts. That way, it is clear to administrators whether a group should have admins of only one tier or not.

Migration

To migrate, you will need to give every administrator one or more tiered accounts. These are the accounts that are in the tiered groups used in the User Rights Assignment GPOs. These are assigned according to the roles people perform, obviously.

The accounts need to be in the right delegation groups, depending on the admin role. For example, a Tier 1 account might be in the delegation group to create and delete computer objects in the member servers OU. A Tier 2 account might be in the delegation group to create and delete computer objects in the workstations OU.

For all other group membership, you will need to a) take the groups that the existing account is a member of, then b) work out which ones each tiered account needs to be part of. This might be a knotty operational problem. If your groups are well-organised already, then it might be easy. However, if your groups are chaotic (see my other post on AD Remediation: Obsolete Objects) then it will be more difficult.

To do this, you need to classify the groups according to the criticality of the data to which they give control. This is the enterprise access model in full. You have to consider, not what you want the person to access, but what any account of that tier might access, if compromised. The credentials in one tier are vulnerable to being captured by any account in that tier. If it would be an unacceptable risk for all accounts in a tier to access a resource, then no account in that tier should have access.

Although you are blocking logon down-tier by accounts you trust, the objective is to prevent control flowing up-tier by accounts that are compromised. Administrative tiers correspond to the relative value of the organisation’s data and systems. End-user data and systems are controlled by all admins. Business data and systems are controlled by Tier 0 and Tier 1 admins. Critical data and systems are controlled only by Tier 0 admins. So, if you do not want a Tier 2 account to control a type of data or system, they should not be in any groups that allow them to do it. Even if you trust the administrator personally, they should use a higher tier of account to do it.
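As a sketch of that classification step, the Python below takes the groups an existing account is in and keeps only those a given tier of account should hold. The three classes follow the paragraph above. The group names and the class assigned to each are hypothetical, and in practice assigning the classes is the hard, manual part.

```python
# Least privileged tier allowed to control each class of data or system.
# Tier 0 is the most privileged, so a lower number means more privilege.
LEAST_PRIVILEGED_TIER = {"critical": 0, "business": 1, "end-user": 2}

# Hypothetical groups and the class of data or system each one controls.
GROUP_CLASS = {
    "DC-Backup-Admins": "critical",
    "Finance-Server-Admins": "business",
    "Helpdesk-Laptop-Admins": "end-user",
}


def tier_may_control(account_tier, resource_class):
    """True if an account of this tier may control this class of resource."""
    return account_tier <= LEAST_PRIVILEGED_TIER[resource_class]


def groups_allowed_for_tier(existing_groups, account_tier):
    """From an existing account's groups, keep those suitable for the tier."""
    return [g for g in existing_groups
            if tier_may_control(account_tier, GROUP_CLASS[g])]


existing = ["DC-Backup-Admins", "Finance-Server-Admins", "Helpdesk-Laptop-Admins"]
for tier in (0, 1, 2):
    print(f"Tier {tier} account:", groups_allowed_for_tier(existing, tier))
```

This only says what a tier may hold. Whether a particular person's account should be in a group is still a Least Privilege decision.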

You will also need to create or modify GPOs to make the new tiered admin groups a member of the appropriate local Administrators group on servers or workstations. Logically this can be a subset of the admin group. Not all Tier 1 admins need to be able to log on to all member servers, or even to any member server. It is the same with Tier 2.

All service accounts must be assigned to log on to one tier and one tier only. For some services this might be a significant change, and it might require splitting services into two or even three instances. For example, if a service has administrative rights on domain controllers (which should be few if any), the service account cannot also have logon rights on member servers; and likewise for member servers and workstations. Examples of potential cross-tier services are anti-malware, auditing and logging, device management and inventory services.
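A quick way to find the services that need splitting is to list, for each service account, the tiers of the computers it logs on to. This is a minimal sketch in Python with hypothetical account names and data. How you collect the data (an inventory, or a review of local group membership and service configuration) is outside the sketch.

```python
# Hypothetical inventory: the tiers of computer each service account logs on to.
SERVICE_ACCOUNT_HOST_TIERS = {
    "svc-antimalware": {0, 1, 2},
    "svc-dc-audit": {0},
    "svc-inventory": {1, 2},
    "svc-print": {1},
}


def cross_tier_service_accounts(host_tiers):
    """Return the service accounts that log on to more than one tier."""
    return {name: sorted(tiers)
            for name, tiers in host_tiers.items()
            if len(tiers) > 1}


print(cross_tier_service_accounts(SERVICE_ACCOUNT_HOST_TIERS))
```

Each account reported is a candidate for splitting into one instance, or at least one account, per tier.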

The opportunity should be taken to understand exactly what rights a service account needs. It is quite common to make a service account a member of the local Administrators group when it doesn’t need to be. If this has not been done in the past, it will be a lot of work to retrofit, but necessary. Also, of course, a regular service account should be changed to a Managed or Group Managed Service Account if possible.

Other important considerations

This section covers a few other aspects of tiered administration in an on-premises Windows environment.

Authentication Policies and Authentication Policy Silos

Authentication Policies and Authentication Policy Silos were introduced in Windows Server 2012 R2. They provide one of the mitigations for the pass-the-hash and pass-the-ticket vulnerabilities, by applying limited conditions to a Kerberos authentication.

You could use these in some cases, in addition to User Rights Assignment. The reason I have used GPOs in this post is because:

  • Authentication policies cannot apply to the built-in domain Administrator account.
  • Authentication policies are applied to accounts, not groups. They cannot be applied to the built-in and default groups in a domain, for example to the Domain Admins group.
  • So, to meet the recommendations in Appendices D to G (referenced above), we still need to use GPOs.
  • If you have the GPOs, it is an easy step to add the custom tiered admin and service account groups.

Trusted devices

To protect credentials, every administrative logon needs to be on a trusted device, at every step. The NCSC describes this very well in Secure system administration. This includes the original device, as well as any intermediary.

This is quite difficult and expensive to do. For example, if you have a third party service provider, will you provide each member of their staff with a dedicated laptop? Will your admin staff carry around two or three laptops? Or you may provide a hardened jump server: but what device will they use to connect to that? It is quite beyond the scope of this post to go into the different ways of achieving secure access, but it is important to accept that tiering is not complete without it.

Default security groups

AD has a long list of default security groups, some of which have elevated privileges in the domain. You should, obviously, be careful about which accounts go in these groups. But there is a small class of groups that are “service administrators”, because they have a degree of control over domain controllers and therefore the whole domain. They don’t have full control, but they do have elevated control. They are:

  • Account Operators (recommended to be empty)
  • Backup Operators
  • Server Operators.

In my opinion, the members of these groups should only be Tier 0 accounts, because they have a degree of control over the whole domain. But these Tier 0 accounts do not need to be a member of Administrators or Domain Admins. It does mean that the holder of the account also needs a Tier 0 PAW. You might also include these groups in your tiering GPOs, so that any account in them would be unable to log on to a lower tier.

Modern authentication

The problem that on-premises tiering of Windows administration is trying to solve is changed fundamentally by moving to cloud-based services. With authentication by Entra ID, we can use two or more factors (MFA), access based on conditions (Conditional Access), secure hardware to protect credentials (the Trusted Platform Module), and time-limited access (with Privileged Identity Management).

We all know this. The relevance here is that, if you bear in mind the complexity and uncertainty of implementing tiered administration on-premises, it may be more cost effective to move a large part of the problem to cloud-based services. If all your end-user devices use Windows Hello for Business, and Intune for device management, then you do not need a Tier 2 for on-premises administration at all. If you replace on-premises servers with cloud services then you also dispense with a lot of Tier 1. Even if you have a core of on-premises services that cannot be replaced, the problem is much reduced. It is far easier to manage a small number of administrators running a small number of on-premises servers than a large number.

Additionally, there is the observation that tiering can prevent a future breach, but not resolve an existing unknown one. Implementing tiering when you migrate to a new environment, with separate accounts for each environment, and clean devices created in the new environment, can do that.

Default Computers container

Computers, by default, are placed in the default Computers container when they join the domain. This container cannot have GPOs linked to it. This creates a risk that a computer in the container will be administered by accounts in different tiers. Your automated computer build processes should move computers automatically to the correct OU but, in any event, computers must not be allowed to remain here.
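If you export the distinguished names of your computer objects, it is easy to report any that are still in the default container. This is a minimal sketch in Python. The domain and computer names are hypothetical, and it assumes the default location has not been redirected elsewhere.

```python
def in_default_computers_container(computer_dn, domain_dn):
    """True if the object sits under CN=Computers at the root of the domain.
    An OU that happens to be named Computers does not match."""
    suffix = f",CN=Computers,{domain_dn}".lower()
    return computer_dn.lower().endswith(suffix)


# Hypothetical distinguished names, for example from a directory export.
DOMAIN_DN = "DC=example,DC=com"
COMPUTERS = [
    "CN=PC001,CN=Computers,DC=example,DC=com",
    "CN=SRV01,OU=Member Servers,DC=example,DC=com",
    "CN=PC002,OU=Computers,DC=example,DC=com",
]

stragglers = [dn for dn in COMPUTERS
              if in_default_computers_container(dn, DOMAIN_DN)]
print(stragglers)
```

A report like this is only a safety net. The build process should still place each computer in the correct OU in the first place.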

Conclusion

This is a large and important topic for on-premises Windows security, not easy to cover in one post. I think what I have described is a way to implement tiered administration for AD in practice, in a way that is compliant with Microsoft best practices and NCSC recommendations. Please make any suggestions or ask any questions in the comments below.

Windows Hello for Business and MFA

As an end-user computing specialist, I spend most of my time on security-related matters. Good cyber security is the most difficult part of the design to get right, with a balance between security and ease of use. It is quite easy to implement the standard security controls. What is more difficult is to deal with all the exceptions and operational difficulties in a secure way.

One small example of this is the configuration of Windows Hello for Business (WHB). WHB is an excellent authentication method but, like anything, it has potential flaws too.

Before WHB

Before WHB, a member of staff could typically log on to any corporate device. It had to be a corporate device, because only that would recognise the domain account. But it could be any corporate device. In fact, roaming profiles were designed to enable anyone to log on to any device.

There are two problems with this. First, because it relies only on a simple password, the password needs to be reasonably long and complex. This increases the risk that the user will write the password down. Where do they do this? They know they should not put it on a post-it note stuck to the computer. So they write it down in a notebook kept with the computer. If the computer is stolen with the notebook, the thief has access to the computer as that person.

The second problem is that, if someone gets hold of a password (for example by phishing), they only need to get hold of a device, any device, to gain access. There is no protection other than knowledge of the password combined with access to any device. An insider might easily obtain a password, and have access to another device to use it. Indeed, people might even voluntarily disclose their password, or arrange to have a password changed, so that another person can use it on another device (e.g. maternity leave).

With WHB

WHB counters these problems. It uses a one-time event to create an association between a specific user and a specific device. The one-time event uses a second authentication method to verify the identity of the user. When the identity is confirmed, a unique PIN is created, valid only for that device. The association is bound up in the Trusted Platform Module (TPM), a hardware component on the motherboard of the computer. When the PIN is supplied, it validates the association between user and device and unlocks the credentials to be used for access to network resources, for example the email account. The email service (e.g. Exchange) knows absolutely nothing about the PIN. It doesn’t even know there is a PIN. What it knows (through Conditional Access) is that the user supplied valid credentials from a managed device protected by a TPM.

We all have experience of something similar, when we create a PIN for a mobile phone. And, just like a phone, facial recognition or fingerprint can be used with WHB as a proxy for the PIN. The difference is that, with the personal phone, there was no separate verification of the identity at the outset. The person with the PIN is just the person who set up the phone.

Two flaws

There are two flaws with this authentication method. The first is in the one-time event; the second is in the way WHB is configured.

For the first, you need to know that the person setting up WHB is who they say they are. That might be quite obvious if they come into an office to set it up. But if you send out devices to be set up at home, you don’t have an assurance that the device gets to the right person. There has to have been a secure association created in the first place, between the user and the method they use to verify their identity.

The way I think of the verification of identity, or multi-factor authentication (MFA), is that it is like showing your photo ID to pick up a building pass. You need to come into the building, where people can see you, and you need to supply a proof of identity. Then you pick up the pass, and the pass in future lets you into the building. But that depends on having a valid proof of identity in the first place. The second method (building pass) is piggy-backing on the first method (photo ID).

When setting up WHB for the first time, staff typically use the Microsoft Authenticator app on their mobile phone. But setting up the Authenticator app does not prove your identity. It only proves that you know the password. So there is a circular logic if you set up the Authenticator app at the same time as setting up WHB. The steps in this circular logic are:

  1. User starts to set up WHB on a device, by supplying a password
  2. If the account does not already have a second factor method associated with it, then the user is prompted to set it up
  3. User downloads Microsoft Authenticator app on phone
  4. User receives prompt on phone to validate their identity
  5. User sets up PIN associated with that identity.

At no time did the user prove their identity other than by supplying the password of the account. WHB does not know who owns the phone. In the future, any prompt for MFA will prove that it is the same person who set up the MFA; but not who that person really is. So the second factor (Microsoft Authenticator app on a mobile phone) must be set up in a secure way that validates the identity of the person setting it up.

This is actually quite difficult to do. When an account is first created, it does not have a second authentication factor associated with it, only a password. A vulnerability exists until the second is set up securely and verifiably by the owner of the account.

The physical way to do this is to set up the second factor for the account as a one-time event similar to obtaining a building pass. The member of staff comes into the office. Someone validates their identity and enables the registration of the phone as a second factor. Any pre-existing registration is deleted. Then the member of staff receives the device and sets up WHB. The logical way to do this is with a Conditional Access policy. The policy can require specific conditions to allow the user to register security information. For example, it can require this to be done from the corporate LAN. Now the steps in this logic are:

  1. User enters the building, where their identity is verified
  2. User proceeds, as before, to set up device with WHB, but this time the second factor is a phone locked to a verified identity.
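As an illustration of the logical approach, here is a sketch in Python of a Conditional Access policy that blocks the registration of security information from anywhere other than a trusted location. It assumes the corporate LAN has already been defined as a trusted named location. The field names follow my understanding of the Microsoft Graph conditionalAccessPolicy resource, so check them against the current documentation before use. The display name is made up, and the sketch only builds and prints the policy body; it does not call any API.

```python
import json

# Sketch of a policy body. Field names are my reading of the Microsoft Graph
# conditionalAccessPolicy resource and should be verified before use.
policy = {
    "displayName": "Block security info registration outside trusted locations",
    # Report-only to begin with, so the effect can be reviewed before enforcing.
    "state": "enabledForReportingButNotEnforced",
    "conditions": {
        "users": {"includeUsers": ["All"]},
        # Target the user action of registering security information.
        "applications": {"includeUserActions": ["urn:user:registersecurityinfo"]},
        # Apply everywhere except trusted locations (assumed: the corporate LAN).
        "locations": {
            "includeLocations": ["All"],
            "excludeLocations": ["AllTrusted"],
        },
    },
    "grantControls": {"operator": "OR", "builtInControls": ["block"]},
}

print(json.dumps(policy, indent=2))
```

The design choice is to block everywhere and exclude the trusted location, rather than to allow from one place, because that is how location conditions are normally expressed in Conditional Access.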

The second flaw is that the configuration of WHB enables it. It does not enforce it. The user still has the option to sign in with a password. This means that anyone can sign in with only a password and gain full access to the device and the data of the user of that account. This is the problem WHB is designed to solve. How did that happen? The user will be nagged to set up WHB, but they don’t have to.

The way to prevent this is to configure Conditional Access policies to require multi-factor authentication for every access, even on managed devices. You might say that is absurd. Surely the possession of a managed device is the second factor. You have the password, and you have the device. But the critical point is that the WHB PIN (not password) is what proves ownership of the device. When using the PIN, the user does not need to respond to an MFA prompt when they log on. Supplying the PIN counts as performing MFA, because it was created with MFA. The MFA is valid (by default) for 90 days and, every time you supply the PIN, you revalidate and extend the MFA.

This is just one example of what I mean about striking the right balance between security and ease of use. It is easy to enable WHB, but it takes a few extra steps to make sure it is associated with a verified identity.

AppLocker or WDAC?

This is a short piece on the question of whether to use AppLocker or Windows Defender Application Control (WDAC) for application control on a Windows desktop. As technicians, we can sometimes get too interested in what technology is best, or what is newest. But the more important matter is what best meets the requirement.

WDAC is the newer technology, and a significant advance on AppLocker. You can read about the differences here: Overview. So, in a Microsoft environment (Windows 10/11 desktop, 365 Apps, Intune, SharePoint etc.) we should assume we would use WDAC unless there are reasons not to. What could those reasons be?

Cyber security is important, of course. But it needs to be a part of a productive work environment. The most secure desktop is one that cannot be used. And it needs to be part of a holistic approach. For example, if we do not allow a user to have local administrator privileges on a device, the exposure to malware is much lower than if we do. If we require MFA to log on to a device, the risk of a malicious user is much lower than if we do not.

In my view, application control should be transparent to the user. Software that is legitimate should just run. Software that is illegitimate should not run, with a message about the reason. If a new piece of software is introduced, it should either just run, or not run. There should not be a long delay while IT staff rejig the rules to allow it to run. An example would be a piece of finance software. Let’s say we are coming up for year-end, and the finance team have an update to one of the applications they use. They should be able to install it, and it should run. It should not take a month to develop and test application control rules.

AppLocker is much easier and less risky to update than WDAC. AppLocker XML files are simple text files that you can edit manually. WDAC XML files are also text files, but it is not practical to edit them manually. AppLocker uses the Subject Name of a certificate to identify a signed file. It is the same subject name regardless of the certificate used to sign. WDAC uses the thumbprint. The same name might be used in multiple different certificates with different thumbprints. A mistake in an AppLocker policy might cause some processes not to run. A mistake in a WDAC policy might cause Windows not to boot. If it cannot boot, the only solution is to re-image the device. Imagine doing that for 30,000 or 50,000 devices!

I think the right approach is to use WDAC, but with a process in place to make it relatively quick and safe to update. What is this approach?

  1. Use file path rules so that most administratively installed applications are allowed anyway
  2. Use “snippets” to extend the existing policies (snippets are policies created from a single application, and merged with the main policy)
  3. Use Supplemental policies for discrete areas of the business e.g. finance, or Assistive Technology, applications
  4. Use the WDAC Wizard for creating the base policy and applying updates
  5. Maintain a strict workflow for testing and deploying a policy update.

Let’s say you have a new application and it is blocked by current WDAC policy. There are several ways you could update the policy:

  • Scan the whole device and create a new policy. But this creates a significant risk of introducing new faults.
  • Read the event log or the Microsoft Defender audit of AppControl events to create rules for what was blocked. But this will only catch the first file that was blocked, not subsequent files that would have been blocked if that file had been allowed.
  • Scan the application itself, to create a policy that allows just that one application, then add this to the existing policy.

My preferred workflow is this:

  • Understand where the application saves all files including temp files and installation files
  • Copy all of them to a temp folder
  • Look to see whether the exe and dll files are signed or not. If they are, you will be able to use a Publisher rule. If they are not, see if you can install to a different location. For example, quite a few applications will allow a per-user or a per-machine install. Always use a per-machine install if you can, into a folder requiring admin rights. If you cannot, then you are going to have to use a hash, although this means any updated version of the file will no longer match the rule and will be blocked.
  • Scan that temp folder to create a snippet
  • Merge the snippet into the base, or create a supplemental policy
  • Apply to a selection of test devices and make sure they still boot!
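The choice of rule type in the workflow above can be written down as a small decision function. This is only a sketch of the reasoning in this post, not part of any WDAC tooling; the function name and its two inputs are hypothetical.

```python
def choose_rule_type(is_signed: bool, can_install_per_machine: bool) -> str:
    """Pick a rule type following the order of preference described above.

    Hypothetical helper: the inputs are facts you establish by inspecting
    the application's files and its installer options.
    """
    if is_signed:
        # Signed exe and dll files can be allowed with a Publisher rule.
        return "publisher"
    if can_install_per_machine:
        # Unsigned, but installable into a folder that requires admin rights,
        # so a file path rule can cover it.
        return "path"
    # Last resort: a hash rule, which stops matching when the file is updated.
    return "hash"


signed_app = choose_rule_type(is_signed=True, can_install_per_machine=False)
unsigned_admin_app = choose_rule_type(is_signed=False, can_install_per_machine=True)
unsigned_user_app = choose_rule_type(is_signed=False, can_install_per_machine=False)
```

The order matters: a hash rule is only chosen when neither of the more durable options is available.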

You need to keep a strict version control of policy versions and snippets. To achieve this, you should update the policy ID. Policies have several identifiers. The file name itself is irrelevant. When you import it into Windows, it will be generated with a name that is the policy GUID. The “Name” and “Id” (visible in the policy) are also just labels. The “BasePolicyID” and “PolicyID” are the two GUIDs that Windows uses to identify the policy. When you merge two policies, or merge a policy and a snippet, these GUIDs are not changed. You will see in the Event Log that Windows considers it to be the same policy. So, to keep track of which policy version is actually applied, you really want to update the GUID. You can do this in PowerShell with Set-CIPolicyIdInfo.
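To keep track of which policy file is which, it can help to read the identifiers straight out of the XML. The following is a minimal sketch, assuming the policy uses the usual urn:schemas-microsoft-com:sipolicy namespace and stores PolicyID, BasePolicyID and VersionEx as child elements. The sample policy and its GUIDs are made up for illustration, and a real policy contains many more elements.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for WDAC policy XML files.
NS = {"si": "urn:schemas-microsoft-com:sipolicy"}

# Cut-down, made-up policy used only for illustration.
SAMPLE_POLICY = """<SiPolicy xmlns="urn:schemas-microsoft-com:sipolicy" PolicyType="Base Policy">
  <VersionEx>10.0.3.0</VersionEx>
  <PolicyID>{11111111-2222-3333-4444-555555555555}</PolicyID>
  <BasePolicyID>{11111111-2222-3333-4444-555555555555}</BasePolicyID>
</SiPolicy>"""


def read_policy_identity(xml_text: str) -> dict:
    """Return the identifiers Windows uses to tell policies apart."""
    root = ET.fromstring(xml_text)
    return {
        "PolicyID": root.findtext("si:PolicyID", namespaces=NS),
        "BasePolicyID": root.findtext("si:BasePolicyID", namespaces=NS),
        "VersionEx": root.findtext("si:VersionEx", namespaces=NS),
    }


identity = read_policy_identity(SAMPLE_POLICY)
# In this sample the two GUIDs match, as they would for a base policy.
ids_match = identity["PolicyID"] == identity["BasePolicyID"]
```

Recording these values for every policy and snippet you build gives you something concrete to compare with what the Event Log reports on a device.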

If you follow this approach, WDAC will work like a charm!

Autopilot and Intune Faults

“When sorrows come, they come not single spies, but in battalions.” We are deploying thousands of devices with Autopilot and Intune, and the service faults come in battalions.

We have been tracking these faults for a while. There are two types:

  1. Microsoft identifies a fault with a service announcement
  2. We raise a ticket, but no cause is found for the fault and there is no service announcement.

In mid-May, account setup failed to complete on pre-provisioned devices. The setup just hung. No cause found.

There was a service incident at the same time (now rolled over in the logs). Users unable to use Autopilot. Different problem, but possibly related.

Application failed to unzip after downloading. No cause found.

Application failed to download from Intune, with “endpoint failed to respond.” No cause found.

Late June, Autopilot failed at the beginning, before entering ESP. Error is 80072ee2. DNS query failed for “enterpriseregistration.microsoft.com”. Network timeout trying to register the device at DRS. No cause found.

From 21 June to 7 July, incident IT396955 “Users’ devices may have incorrectly appeared as non-compliant after Autopilot pre-provisioning in Microsoft Intune”. We don’t allow non-compliant devices to connect, so this caused a complete failure. Root cause: “A recent fix for an unrelated issue.” Although the incident dates from 21 June, it was only identified as an incident on 4 July.

On 21 July, incident IT402961 “Users and admins may have been unable to access the Microsoft Intune service or see limited functionality.” Root cause: “a network gateway outage.”

The facts show that the Autopilot service, with Intune, is fundamentally unreliable. If it were Intune alone, users would experience a failure of policy updates, or application deployments. But, during Autopilot, the result is a failed deployment.

At present, I recommend not using Autopilot to deploy devices, for the next year or so. It is too unreliable. My guess is that an internal service agreement has the wrong incentives.

Intune, WDAC and Managed Installer

WDAC has an option (Option 13) to allow apps installed by a Managed Installer. This sounds great! Everything you install using your preferred installer would be allowed, without going to the trouble of creating rules. But there’s a snag. There is no Configuration Service Provider (CSP) to deliver this policy in Intune.

The Managed Installer option actually uses the same method to allow executables to run as the Intelligent Security Graph option (Option 14). When a file is authorised by one of these methods, an extended attribute is written to the file. You can see this attribute with the fsutil utility. The method is documented here: Automatically allow apps deployed by a managed installer with Windows Defender Application Control.

The documentation on Managed Installer is a little confusing. The main documentation shows a policy that allows the Intune Management Extension, as well as the SCCM extension.

<FilePublisherRule Id="55932f09-04b8-44ec-8e2d-3fc736500c56" Name="MICROSOFT.MANAGEMENT.SERVICES.INTUNEWINDOWSAGENT.EXE version 1.39.200.2 or greater in MICROSOFT® INTUNE™ from O=MICROSOFT CORPORATION, L=REDMOND, S=WASHINGTON, C=US" Description="" UserOrGroupSid="S-1-1-0" Action="Allow">
  <Conditions>
    <FilePublisherCondition PublisherName="O=MICROSOFT CORPORATION, L=REDMOND, S=WASHINGTON, C=US" ProductName="*" BinaryName="MICROSOFT.MANAGEMENT.SERVICES.INTUNEWINDOWSAGENT.EXE">
      <BinaryVersionRange LowSection="1.39.200.2" HighSection="*" />
    </FilePublisherCondition>
  </Conditions>
</FilePublisherRule>

So, looking at that, we would obviously be able to allow Intune apps in Intune, right? But we cannot. The reason is that the documentation describes implementing this policy in a GPO. In Intune we cannot use GPOs and, instead, we use Configuration Service Providers (CSPs). The Managed Installer option is implemented as an AppLocker policy, and the AppLocker CSP does not contain a section for the Managed Installer rule collection type.

Although we cannot implement this as an Intune policy (because there is no CSP), we could theoretically implement it another way. With a registry key, for example, even if there were no CSP to configure the registry key, we could simply add, change or delete it in script. With AppLocker policies, we can use PowerShell to create a policy from an XML file, using Set-AppLockerPolicy. So the solution is to deliver a custom AppLocker policy with PowerShell, to enable the Intune agent as a Managed Installer in WDAC.
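As an illustration of what such a custom policy file might contain, here is a minimal sketch that wraps the publisher rule shown earlier in an AppLocker policy with a ManagedInstaller rule collection and returns it as XML text. The wrapper structure (AppLockerPolicy, RuleCollection Type="ManagedInstaller") and the EnforcementMode value are assumptions based on the general AppLocker XML layout; a working policy is likely to need more than this, so treat it as a starting point to test and not a finished policy.

```python
import xml.etree.ElementTree as ET

PUBLISHER = "O=MICROSOFT CORPORATION, L=REDMOND, S=WASHINGTON, C=US"
BINARY = "MICROSOFT.MANAGEMENT.SERVICES.INTUNEWINDOWSAGENT.EXE"
RULE_NAME = (
    BINARY + " version 1.39.200.2 or greater in MICROSOFT® INTUNE™ from " + PUBLISHER
)


def build_managed_installer_policy(enforcement_mode: str = "AuditOnly") -> str:
    """Build AppLocker policy XML with one ManagedInstaller publisher rule.

    enforcement_mode is an assumption; choose the value your testing supports.
    """
    policy = ET.Element("AppLockerPolicy", Version="1")
    collection = ET.SubElement(
        policy, "RuleCollection",
        Type="ManagedInstaller", EnforcementMode=enforcement_mode,
    )
    rule = ET.SubElement(
        collection, "FilePublisherRule",
        Id="55932f09-04b8-44ec-8e2d-3fc736500c56",
        Name=RULE_NAME, Description="",
        UserOrGroupSid="S-1-1-0", Action="Allow",
    )
    conditions = ET.SubElement(rule, "Conditions")
    condition = ET.SubElement(
        conditions, "FilePublisherCondition",
        PublisherName=PUBLISHER, ProductName="*", BinaryName=BINARY,
    )
    ET.SubElement(condition, "BinaryVersionRange", LowSection="1.39.200.2", HighSection="*")
    return ET.tostring(policy, encoding="unicode")


policy_xml = build_managed_installer_policy()
```

The resulting text would be saved to a file and supplied to Set-AppLockerPolicy through its XmlPolicy parameter, from a script delivered by Intune.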

There are significant drawbacks:

  1. The effort and constraints in managing the policies manually through PowerShell. For example, there is no Remove cmdlet for a policy in PowerShell.
  2. Managed Installer tags the installed files, but not automatic updates. To allow the updates, you would either have to reinstall, or apply rules to allow the updated files, which would defeat the purpose.

Autopilot Faults and Logs

This is a post about where to look to find the cause when Autopilot fails.

By “Autopilot”, I am referring to the whole process of deploying, enrolling and setting up a Windows device. The process really contains several distinct parts:

  • The Out of Box Experience (OOBE) like selecting language, region and keyboard
  • Enrolment in Intune and joining the Azure AD domain (or hybrid)
  • Implementing all the policies and apps assigned to the user or device by Intune
  • The Enrollment Status Page (ESP) which monitors and controls the process nearly from the beginning to the end.

But I am using the term “Autopilot” to refer to all these, for convenience.

We can distinguish two types of failure. One, when setting the process up and testing it to see if it works. Another, during deployments when everything is supposed to be working. This post is about the second. For the first, you can generally follow the guides for setting up Autopilot and ESP, and search the documentation if it is not working. For the second, you need a good understanding of how the process works, what happens when it goes wrong, and where to look to find the cause.

Here is the best end-to-end diagram of the process: Windows Autopilot deployment process. And here is the page that best describes what happens in each phase of ESP: Enrollment Status Page tracking information. It is worth studying these in detail.

Getting the logs for Autopilot is straightforward. From a command prompt, run:

“mdmdiagnosticstool.exe -area DeviceEnrollment;DeviceProvisioning;Autopilot;Tpm -cab C:\Temp\Autopilot.cab”

You will need to run elevated to get the TPM diagnostics. You will also need to make sure that whoever runs the command is able to save in the location specified. If you are asking a standard user to run the diagnostics, you can use Settings > Accounts > Access work or school > Export your management log files.

Michael Niehaus has written scripts to provide a quick interpretation of the diagnostics logs: Get-AutopilotDiagnostics. Running this script against the cab file is the first place to start.

There are a few points to note about the diagnostics:

  • You need to run it as soon as possible after the problem occurs. Several of the event logs contained in the diagnostics collection roll over, so events will be lost if you only run it later.
  • If the fault is reproducible, you can add other tools like netsh, by breaking in with Shift+F10 (Shift+Fn+F10 on some laptops) before the fault occurs.
  • The diagnostic logging only captures events relating to Autopilot and enrolment. If you want a wider selection of logs, you may want to run One Data Collector at the same time. It only takes a few minutes to run.
  • Running Get-AutopilotDiagnostics with the -Online parameter will fetch the application name to match the ID, saving you a lot of time in trawling logs to find the name of a failed app ID.

If the failure occurred before the ESP started, then this is Autopilot proper. The place to look is in the event log: Applications and Services logs > Microsoft > Windows > ModernDeployment-Diagnostics-Provider > Autopilot.

When the ESP starts, we can find the place that it fails in the registry. This is obtained from: HKLM\Software\Microsoft\Provisioning\AutopilotSettings. Each sub-category has a status of notStarted, inProgress, succeeded or failed. It would be great if these could be surfaced somewhere more accessible. As it is, they can be found in the diagnostics log file: MdmDiagReport_RegistryDump.reg.

For example, see this obscure fault: Error code 0x80180014 when re-enrolling using self-deployment or pre-provisioning mode. The documentation is not correct. Normally, when you do a “Fresh Start” or a “Wipe”, the object in Intune is soft-deleted. But, occasionally, this back-end process fails. As a result, when you do a reset, it will fail in Device Preparation, at the “Registering your device for mobile management” stage i.e. enrolling in Intune. The status is recorded at: HKLM\Software\Microsoft\Provisioning\AutopilotSettings\DevicePreparationStatus.Category.

This happens because the Intune object has not been deleted as it should be. The solution is to find the object and delete it manually. Knowing the stage it failed at enables you to investigate why this happened.
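Finding the failed stage in the registry dump can be scripted. The sketch below scans the text of a .reg export and lists the values under the AutopilotSettings key whose data contains the word “failed”. The sample text and its value names are invented for illustration, and the function does not handle multi-line values. For a real file, check the encoding before reading it, as .reg exports are often UTF-16.

```python
# Key as it appears in a .reg export (HKLM written out in full).
TARGET_KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Provisioning\AutopilotSettings"

# Invented sample; real dumps hold many more keys and richer value data.
SAMPLE_DUMP = r'''Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Provisioning\AutopilotSettings]
"ExampleDevicePreparation.Status"="failed"
"ExampleDeviceSetup.Status"="notStarted"

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\SomethingElse]
"Unrelated"="failed"
'''


def find_failed_values(reg_text: str, target_key: str = TARGET_KEY) -> list:
    """Return names of values under target_key whose data mentions 'failed'."""
    failed = []
    in_target = False
    for raw_line in reg_text.splitlines():
        line = raw_line.strip()
        if line.startswith("[") and line.endswith("]"):
            # A new key section starts; check whether it is the one we want.
            in_target = line[1:-1].lower().startswith(target_key.lower())
            continue
        if in_target and line.startswith('"') and "failed" in line.lower():
            # The value name is the text between the first pair of quotes.
            failed.append(line.split('"')[1])
    return failed


failed_values = find_failed_values(SAMPLE_DUMP)
```

Values under other keys are ignored, so an unrelated “failed” elsewhere in the dump does not create noise.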

It can be useful to know exactly when the fault occurred. This helps us to correlate different logs. Bizarrely, the time is shown in the event log: Applications and Services logs > Microsoft > Windows > Shell-Core > Operational. Use the “Find” action to search for the word “failed”. This will show the CloudExperienceHost Web App Event with “Subcategory ID = DevicePreparation.MdmEnrollmentSubcategory; state = failed.” When you know the time, you can search other logs to see if anything distinctive happened at exactly that time.
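Once you have the time of the failure, correlating other logs is just a matter of filtering by a time window. Here is a minimal sketch, assuming you have already exported events from the other logs into a list of (timestamp, log name, message) tuples. The events shown are invented examples, not real log output.

```python
from datetime import datetime, timedelta


def events_near(events, failure_time, window_seconds=30):
    """Return events whose timestamp is within window_seconds of failure_time."""
    window = timedelta(seconds=window_seconds)
    return [event for event in events if abs(event[0] - failure_time) <= window]


# Invented sample data: (timestamp, log name, message).
sample_events = [
    (datetime(2023, 7, 4, 10, 14, 50), "ExampleLogA", "request started"),
    (datetime(2023, 7, 4, 10, 15, 5), "ExampleLogB", "connection reset"),
    (datetime(2023, 7, 4, 10, 45, 0), "ExampleLogA", "unrelated later event"),
]
failure_time = datetime(2023, 7, 4, 10, 15, 0)

nearby = events_near(sample_events, failure_time)
```

A window of 30 seconds is an arbitrary default; widen it if the logs you are comparing are written with a delay.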

As I said at the beginning, the purpose of this post is to describe how to investigate unexpected faults. Once you know where and when in the process the fault occurred, you can follow the trail to diagnose it. If you know the type of fault, you may well need to reproduce it with additional logging to find the cause.

Autopilot Faults and the Network

The list of network requirements for Intune and Autopilot is extensive. This post is about finding out if the client cannot connect to one or more of the required endpoints during Autopilot.

Microsoft publishes the list of required endpoints for Autopilot and Intune. There is no point in repeating the information here. The important points to note are:

  • The documents also contain references to secondary services. For example, they require access to Windows Activation, Azure AD (for authentication), NTP time service and others.
  • In an enterprise, HTTP and HTTPS traffic will be intercepted by a proxy. Proxy servers may inspect the traffic, and require authentication. All this traffic needs to bypass the proxy; or bypass inspection and authentication. This is not easy to do, and it is easy to make mistakes.
  • The lists often require URLs and wildcards, for example *.microsoftaik.azure.net for TPM attestation.

As a result, you may need to troubleshoot things that should be working, but are not. You need to be able to find out what is blocked, or not connecting.

The simplest way, which everyone will be familiar with, is to use Wireshark. You can install Wireshark at the beginning of Autopilot, and use Alt+Tab to get back to it after the problem has occurred (see Autopilot and Shift+Fn10).

You may think you are looking for a dropped connection. But the place to look for that is on the firewall. In reality the packet capture will require a lot more work to analyse than that. Almost all the traffic is encrypted HTTPS. You cannot see the contents of the transactions. What you will see (roughly) is:

  • A DNS query to resolve one of the required endpoints
  • A DNS response resolving the query to an IP address corresponding to a traffic manager, load balancer or CDN
  • TCP SYN
  • TCP SYN-ACK
  • TCP ACK
  • Client Hello
  • Server Hello, Certificate, Server Key Exchange, Server Hello Done
  • Client Key Exchange, Change Cipher Spec, Encrypted Handshake Message
  • Application Data

Azure Traffic Manager is a DNS service that distributes client traffic to the closest available Azure endpoint. If the client makes a DNS query for (for example) fe2cr.update.microsoft.com, it will obtain the IP address of an endpoint corresponding to fe2cr.update.msft.com.trafficmanager.net.

This means you can’t expect to see traffic to and from the endpoints contained in the lists of endpoints. A query for account.live.com will result in traffic to and from l-0013.l-msedge.net.

If you use a display filter of “dns”, you will see all the queries made. You can check on the proxy that all those names bypass inspection and authentication. If they do not, you may expect to see:

  • No application data after a handshake
  • A TCP FIN followed by another attempt to connect with a new TCP SYN.
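Checking the queried names against the proxy bypass list by eye is tedious, so it may be worth scripting. This is a minimal sketch, assuming you have exported the DNS names from the capture and have the bypass list as wildcard patterns. The pattern list and the example.microsoftaik.azure.net name are illustrative only, not the real published requirements.

```python
from fnmatch import fnmatchcase


def uncovered_names(queried_names, bypass_patterns):
    """Return queried DNS names that match none of the bypass patterns."""
    patterns = [pattern.lower() for pattern in bypass_patterns]
    uncovered = []
    for name in queried_names:
        # Normalise case and drop any trailing dot from a fully qualified name.
        host = name.lower().rstrip(".")
        if not any(fnmatchcase(host, pattern) for pattern in patterns):
            uncovered.append(host)
    return uncovered


# Illustrative inputs only.
queried = [
    "fe2cr.update.microsoft.com",
    "account.live.com",
    "example.microsoftaik.azure.net",
]
bypass = ["*.microsoftaik.azure.net", "*.update.microsoft.com"]

missing = uncovered_names(queried, bypass)
```

Compare the names as queried, not the traffic manager or CDN names they resolve to, because the queried name is what the proxy rules are written against.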

As the Autopilot traffic is nearly all HTTPS, you could also use Fiddler to capture the exchange between client and server. When you install Fiddler, you choose to accept the Fiddler certificate. This means that the traffic can be proxied and decrypted by Fiddler. As a result, you should be able to see the content of the transactions.

I recently had a case where Autopilot failed at the start of the Account Setup phase. As it happens, I could see in the Microsoft-Windows-Shell-Core/Operational event log that an “AAD user token request failed”, but not why it failed. I wanted to do a packet capture during the time this request failed. But you cannot break into Autopilot during the account setup phase and, even if you could, it would be too late to start a capture. Ideally you want to start a capture at a stopped point in the process, so that you have time to set it up before continuing.

This was a case where the Network Shell utility (netsh.exe) can help. Netsh can record a trace file (*.etl) using different providers. Most importantly, it is not limited to packet capture. It records the activity of each provider specified in the command. Providers are bundled up in “scenarios” for different types of network problem. The “InternetClient” scenario, for example, contains the following providers that we may be interested in:

  • Web-Http
  • WinInet
  • WebIO
  • WinHttp
  • DNS-Client
  • TCPIP.

A unique advantage of Netsh is the “Persistent” parameter. This enables the trace to continue through a restart. By making it persistent, we can start a trace in the Device Setup phase, continue through a restart into the Account Setup phase, and then stop the trace when the fault has occurred.

A difficulty to overcome is that the trace needs to be started and stopped in the same security context. In Device Setup, the console user is DefaultUser0 with admin rights. But this account does not persist into Account Setup.

To get around this, we need to use PsExec. This is the SysInternals utility that enables me (among many other things) to run a command in the system context. If I start the trace in the system context, then I can do the same to stop it. So the full plan of attack is:

  • Shift+F10 (Shift+Fn+F10 on some laptops).
  • In the console, download PsExec or copy it from a USB stick.
  • Run: “psexec.exe -s -i cmd.exe”. This brings up a new console running in the System context.
  • In the new console, run: “netsh trace start capture=yes report=yes persistent=yes traceFile=[path to trace.etl] scenario=InternetClient”. Depending on the circumstances, you can use maxSize to limit the size of the trace file. By default, the logging is circular.
  • Continue through to when the fault occurs.
  • In this case, because Autopilot was failing in Account Setup, I needed to use the “Continue” option in the Enrollment Status Page (ESP) profile, so that I could finish and get back to a user session, to be able to run PsExec again and stop the trace with “Netsh trace stop”.
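If you run this kind of trace often, it can be handy to build the netsh command line from a few parameters so that you do not mistype it in a console during Autopilot. The sketch below only assembles the arguments described in the steps above; it does not run anything. The function name and the example trace path are hypothetical.

```python
def netsh_trace_start_args(trace_file, scenario="InternetClient", max_size_mb=None):
    """Assemble the arguments for a persistent netsh trace, as described above."""
    args = [
        "netsh", "trace", "start",
        "capture=yes",      # include packet capture
        "report=yes",
        "persistent=yes",   # continue through a restart
        f"traceFile={trace_file}",
        f"scenario={scenario}",
    ]
    if max_size_mb is not None:
        # Optional limit on the size of the trace file.
        args.append(f"maxSize={max_size_mb}")
    return args


# Example path is hypothetical; use a location the System context can write to.
start_command = " ".join(netsh_trace_start_args(r"C:\Temp\trace.etl", max_size_mb=512))
stop_command = "netsh trace stop"
```

Both commands need to be run from the System context console opened with PsExec, so that the trace is started and stopped in the same security context.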

A significant disadvantage of Netsh is that the best way to read the ETL file is with Microsoft Message Analyzer. But this is now obsolete and no longer available to download. There are utilities to convert the ETL to pcap, so it can be read in Wireshark, but that loses the benefit of the network provider logging, like DNS-Client and WinInet. I keep a copy of Microsoft Message Analyzer just for this.

So, this post has given three tools and methods to investigate a network fault causing Autopilot to fail. You will need to adapt the methods to the particular circumstances of the case.