Modern Deployment 2025

Bulk deployment of end-user computing (EUC) devices is a fact of corporate life, and has been for at least 20 years. The vendors and products change, but the task remains essentially the same: how to distribute devices to staff with the desired applications and configurations.

This blog is about deploying Windows devices, and for the managers of the process rather than the technicians. Windows deployment is a mainstream topic with some excellent technical commentary from Michael Niehaus, Peter van der Woude, Rudy Ooms and others. There is rather less about the pros and cons of different methods.

Autopilot v1.0 provides a cloud service for Windows deployments, to replace on-premises re-imaging. But it can be unreliable, and is a slower experience for the end user unless you prepare (pre-provision) the device in advance.
Autopilot v2.0 (called Autopilot Device Preparation) is significantly simplified and so should be more reliable. Currently, it lacks a pre-provisioning mode, which restricts it to the slow experience. But this is mitigated by a new feature that allows you to select which apps are installed as a priority before the user reaches a working desktop. The more standard apps you have, the more of an advantage this is.
It may be unfashionable, in the age of cloud services, but an on-premises re-imaging service combined with Autopilot v2.0 will probably provide the most efficient result overall.

Deployment

The aim of deployment has been remarkably consistent. I can’t see that it has changed at all: take a device from the manufacturer, and convert it to one that is ready for work as efficiently as possible.

Image –>Re-image –>Identity –>Applications –>Configurations
  1. Image
    • The OS image deployed to the device by the original equipment manufacturer (OEM)
    • Because manufacturers generally compete in consumer as well as business sectors, the OEM image tends to contain a variety of other applications: freeware, trialware and vendor utilities.
  2. Re-image
    • A custom image deployed in place of the OEM one
    • Either simply to remove all the vendor-added software
    • Or to do that as well as add corporate software, in order to speed up the overall process of delivering a fully ready device
    • Done by a wide variety of imaging tools: historically, Ghost; then Altiris, Windows Deployment Services (WDS), Microsoft Deployment Toolkit (MDT), FOG and many others.
  3. Identity
    • Enrolment in an identity management system, so that the device is recognised as a corporate device, and the user logging on is recognised as a corporate user
    • Either on-premises Active Directory, or cloud Entra ID, or a hybrid of both.
  4. Applications
    • The installation of corporate applications by an agent on the device
    • The agent can go on to patch and replace applications during the life of the image
  5. Configurations
    • The configuration of different settings on the device
    • Everything from BitLocker to certificate authorities to Kerberos to application configurations (if the application is designed to store settings in the registry)
    • Ultimately these are done by settings in the registry, or by running a script or executable
    • The available settings are defined either in XML-formatted templates (ADMX) or Configuration Service Providers (CSPs)
    • Different device management tools generally provide an interface to set these configurations.

So the aim is to get from 1 to 5 as efficiently as possible. What are the obstacles?

Step 2 is an expensive step. The OEM device has to be unboxed, re-imaged, and re-boxed for delivery. If you re-image with a “thin” image (no applications), then there has to be time to install all the applications later. If you re-image with a “fat” image (all the applications) then by the time it gets to the end user there is a good chance that some of them need to be updated. If you re-image well in advance, the device will need Windows updates and possibly even a full feature update e.g. from Windows 11 23H2 to 24H2.

Step 3 is a complicated dance that has to be carefully controlled. The process has to ensure that only the devices you want to enrol can enrol (e.g. not personal devices); and that all the devices you want to enrol are enrolled (i.e. not bypassed).

Steps 4 and 5 are really about timings. You don’t want to deliver a device to a member of staff until it is ready to use; but you also don’t want them sitting idle watching a progress bar.

Up until perhaps 2018-2020, this process was performed with on-premises infrastructure, SCCM being perhaps the most common but with many alternatives.

Autopilot

Windows Autopilot, introduced in 2017, changed this model in a really quite radical way. What it did was to make every single new Windows desktop OS in the entire world check in first with the Autopilot cloud service to see whether it is a corporate device. It is worth having a look at the Autopilot flowchart. If we simplify it a little, we have two flows:

  1. Start, and connect to the Internet; get the unique hardware ID of the device; then check with the cloud Autopilot registration service whether the device is registered; if it is, get an Autopilot profile for setting up the device.
  2. Follow the Autopilot profile to enrol the device in Entra ID and in Intune; then use Intune apps and device configuration profiles to set up the device.

Autopilot also has a secondary flow, to set the device up in a technician phase first; and then, if the device is assigned to a user and not used as a kiosk-type device, perform a second phase depending on the user account. This technician phase is equivalent to the re-imaging phase in Step 2 above.

Autopilot changes the way devices are deployed because, using the first workflow (“user-driven mode”), you can send the OEM-supplied device direct to the end-user. The device will always check first whether it is registered in Autopilot, and then set itself up accordingly. Or, using the secondary workflow, you can part set it up, then deliver it to the end user to complete. Being a cloud service, it also appealed to organisations that wanted to reduce their on-premises services.

There were two main problems with this. The first is that the process has been (in my experience) unreliable. The second is that, unless you insert the additional step of pre-provisioning the device, the setup is inherently a slow experience for the end user.

For unreliable, see my previous blog: Autopilot and Intune faults. Just to be absolutely clear, these are not configuration errors. They are backend faults in the service causing unpredictable and unsolvable failures in the status monitored by the Enrollment Status Page (ESP). It is hard to put a number on this, but I would say it was perhaps 2-5% of all deployments. That might not sound like a lot, but let's take two scenarios:

  • The device is sent to a member of staff working from home; it fails to build correctly; they are stuck at “setup failed”. What happens now? They are at home. The support staff can’t see the screen.
  • You are helping a group of people migrate to their replacement device, in a large migration. One or two fail at Account Setup. The staff can’t leave, because the device doesn’t work. Do you try a reset and start again? Do you give them another device and start again? Do you try to fix it?

For slow, a user-driven deployment might take perhaps 20-30 minutes on an excellent network, depending mainly on the number of applications to install and uninstall. If you are on a poor network, say at home, then it might be a lot longer. For the end user, this is time spent simply waiting. If they go away to do something else, they will not come back and find it done, because it will be waiting at a logon prompt before starting the Account Setup phase.

In contrast, I would say that an on-premises deployment should be 99.9% successful and fast. The device is almost fully built by a technician, before being handed over. I really cannot remember any significant level of faults, once the deployment is fully tested and piloted, so the user phase is short and reliable. Of course, it requires double handling, as in Step 2. But the double-handling is the same in the case of pre-provisioning.

Autopilot Device Preparation

Autopilot v2.0 was introduced this year, 2024. It is a radical rethink of the original version. There are four main changes in the architecture. The question is: will they make the process more reliable, and faster?

  1. There is no pre-registration of the device hardware ID
  2. There is, as yet, no pre-provisioning or unassigned-user mode (called “self-deploying”)
  3. The ESP is, by default, replaced by an extension of the Out of Box Experience (OOBE)
  4. A list of priority apps and scripts is used instead of a list of blocking apps.

No pre-registration

Instead of using a global registration service, it works as follows:

  • The end-user account is added to a security group
  • When the account signs in on any unmanaged device, that device is automatically enrolled in Entra ID and Intune
  • Intune assigns a Device Preparation profile to members of the user group
  • The profile adds the device to a device group
  • That device group is used to install applications and configure the device.

This is a big change in architecture. The hardware ID used for device registration is a bit like the boot configuration used to prevent Windows licence fraud. It is explained in this blog: Autopilot hardware hash. Registration ensures that only a registered device can enrol, and that all registered devices are enrolled.

Registration was not difficult, for a large organisation. Either the vendor registered all new devices, or a script could be run to extract them from all existing devices. It is another step, but not a big one. I think removal of this step is more of an advantage for small organisations, where extracting the hash could be quite difficult.
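
For reference, the usual way to extract the hash from existing devices is the community Get-WindowsAutopilotInfo script from the PowerShell Gallery. A minimal sketch, run per device with local admin rights (the output path is an example):

  # Sketch: collect the Autopilot hardware hash from an existing device.
  # You may need to allow script execution for this session first, e.g.
  # Set-ExecutionPolicy RemoteSigned -Scope Process
  Install-Script -Name Get-WindowsAutopilotInfo -Force

  # Append this device's serial number and hardware hash to a CSV that can
  # later be imported for Autopilot registration.
  Get-WindowsAutopilotInfo -OutputFile C:\Temp\AutopilotDevices.csv -Append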

If the step was not needed, why was it there, and what are the consequences of removing it? It seems to be a balance of risk. Autopilot v2.0 now allows an option of providing a simpler corporate device identifier. This could be used to prevent enrolment of unauthorised devices. But, for Windows, the identifier still has to be pre-loaded as a CSV of the device manufacturer, model and serial number. It is just slightly easier, since it does not require access to the device to obtain it.
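
As an illustration, the three values for the corporate identifier CSV can be gathered from the device with standard CIM classes. This is a sketch only; the exact CSV layout Intune expects should be checked against the current documentation:

  # Sketch: gather manufacturer, model and serial number for the corporate
  # device identifier CSV (assumed format: one comma-separated line per device).
  $cs   = Get-CimInstance -ClassName Win32_ComputerSystem
  $bios = Get-CimInstance -ClassName Win32_BIOS

  "{0},{1},{2}" -f $cs.Manufacturer.Trim(), $cs.Model.Trim(), $bios.SerialNumber.Trim() |
      Out-File -FilePath C:\Temp\CorporateIdentifiers.csv -Encoding utf8 -Append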

No pre-provisioning

Autopilot v2.0 currently only supports the primary, user-driven, workflow. Microsoft says in the FAQs that: “The pre-provisioning mode and self-deploying mode scenarios will be supported in the future, but aren’t part of the initial release.”

Pre-provisioning is a fundamentally different workflow from user-driven.

  • At this stage, we usually don’t know what user will receive the device. It is a bulk preparation process, similar to re-imaging
  • Because there is no user account, and so no authentication or authorization, it uses a different process to validate the identity of the device
    • This process requires “attestation” of the Trusted Platform Module (TPM), that is, proving the identity of the unique hardware component incorporated in the device by the vendor. This proves that the device is the one registered in Autopilot, and not an imposter.

Since there is no pre-registration in Autopilot v2.0, it will not be possible to attest that the device is the one registered. We will have to wait and see how Microsoft solves this. But, without it, we lose the ability to cut the end-user wait time in half. An alternative is to re-image the device with standard applications installed, before handing over the device for a deployment with Autopilot v2.0.

No Enrollment Status Page

The ESP controls the flow of Autopilot v1.0. A failure at any step causes Autopilot to end with an error. Depending on the profile, the user can then either reset the device and start again, or continue anyway to a part-completed desktop.

As I have described elsewhere, the failures are sometimes caused by faults in configuration, but often by unknowable back end failures in the cloud services. Microsoft even recommended at one stage to not use the ESP to track progress.

In Autopilot v2.0, the ESP is optionally replaced by a simple progress indicator during the OOBE dialogue. I think this is easier for a user to understand. The percentage progress, however, is not the actual progress through the work. It is the percentage of the total time allowed before the process times out, default 60 minutes.

Autopilot v1.0 ESP

Autopilot v2.0 OOBE

The ESP itself is not the cause of failures. However, other failures in the process cause it to terminate Autopilot, even if those failures are not fatal to the deployment.

List of reference apps

The change with the biggest impact on the end-user experience is the list of “reference apps”. It is an ambiguous term. It means the apps to install during deployment. All other apps are installed after the setup is complete.

Autopilot v1.0 has the concept of “blocking apps” in the ESP. These hold the deployment until they are installed, and raise an error if they fail. The choice is none, selected apps, or all. If you configure All, then the deployment will take longer than if you configure Selected (although, if you pre-provision, this time does not matter). However, if you configure Selected, other apps are not installed. Instead, they will wait perhaps an hour for the next sync cycle. This may or may not be acceptable. In my experience it is not.

Autopilot v2.0 replaces this with “apps you want to reference with this deployment”. These apps are installed during the deployment. Unlike v1.0, all other apps continue to install immediately afterwards, without holding the setup. This gives a better user experience than v1.0, because it might only be another 5-10 minutes before all the required apps are installed. With this design, I think it is reasonable to finish setup with a subset of apps, for example Office 365, Company Portal and a VPN or zero-trust network client, and perhaps any configuring “apps” (scripted configurations deployed as custom win32 apps). There is always more to do in the minutes immediately after setup is complete, and while the other apps are installing.

You might say this is making the best of a bad job, because with no pre-provisioning the only alternative is to hold the setup until all the apps are installed. But I think it is actually a realistic alternative to pre-provisioning. It really depends on whether you can package everything up to deliver an acceptable desktop in an acceptable amount of time.

The list of reference apps also cuts out a minute or more spent enumerating apps in Autopilot v1.0. Instead, Autopilot v2.0 only installs apps from the single device group specified in the profile. Microsoft calls this Enrollment Time Grouping.

Summary

I am optimistic that Autopilot Device Preparation (Autopilot v2.0) will be more reliable than v1.0, because the process has been simplified: in particular with no ESP and so fewer reasons for failure.

It is not faster. You might expect it to be, and Microsoft claims it to be (because of Enrollment Time Grouping). But my tests do not bear this out. It takes a given amount of time to download and install apps, and this does not change. It is possible there is less time spent enumerating the apps to install, but this does not translate into a shorter time overall.

However, the new ability to specify the must-have apps, and for installation of other apps to continue outside of the setup window, gives an opportunity to cut the waiting time for an end user.

The lack of a pre-provisioning mode means that you cannot take the theoretically fastest route (barring failures) of preparing the device with all apps before giving it to the end user. It might be unfashionable, but this means there is a rationale for re-imaging the device with a conventional on-premises technology before shipping it to the end user to complete with Autopilot v2.0.

AD Remediation: Tiered Administration

This is one of a series of posts about fixing problems that have accumulated in an old instance of Active Directory (AD). In this case, it is about introducing tiered administration into a Windows and Active Directory environment.

I have been in two minds about this post. Organisations have been moving away from AD and on-premises Windows servers, towards Entra ID (formerly Azure Active Directory) and cloud-based services, for a long time. The idea of tiered administration of AD came in around 2014. If organisations were going to introduce it, they should have done it by now. But some organisations may not have. The larger, more complex and older the directory, the more difficult it is to do. I worked on this recently for a large organisation, and I was unable to find a good description of the approach online, so I thought it might be useful after all to share this. Please leave a comment if you have any suggestions or questions from your own experience.

This is not a post with how-to screenshots. There are plenty of those elsewhere. It is a description of what needs to be done in practice, and some of the obstacles, together with how to overcome them. I also hope to pick a way through some of the Microsoft documentation on this. There is no single guide that I know of on how to do it.

It is also not a post on the general topic of hardening Windows or AD, or securing privileged accounts. There are plenty of those. It is specifically about tiered administration only.

Background

First, here is a bit of background. We need this to understand what tiered administration in AD is trying to achieve.

Tiered Administration is one of those “good things”, like Least Privilege and Separation of Duties. The National Cyber Security Centre (NCSC) describes it here: Secure System Administration. The idea is quite simple. Different accounts should be used to administer different layers of services differentiated by their criticality. For example, you should use a different account to administer the finance system than to run diagnostics (with local admin rights) on an end-user laptop. If the account you use for the laptop is compromised, it will not affect the finance system.

For Windows administration, the idea really took shape when Mimikatz blew a large hole in Windows security. In about 2011, Benjamin Delpy published open source code to obtain credentials from a running Windows device. Using Mimikatz, any administrator could obtain the credentials of any other account logged on to the device, and use them to leapfrog onto any other device where that account had access, and so on. This meant that an attack could travel from any compromised device, including just a regular workstation, across and up to potentially any other device, including a domain controller. From there, an attacker could simply destroy the entire environment.

This was a fundamental risk to the Windows operating system, and Microsoft responded with a slew of technologies and guidance to mitigate it. In 2012, the Microsoft Trustworthy Computing initiative published Mitigating Pass-the-Hash (PtH) Attacks and Other Credential Theft Techniques, followed by a Version 2 in 2014. In Windows Server 2012 R2, released in 2013, Microsoft introduced several technologies to mitigate the risk, including the Protected Users security group, Authentication Policies and Authentication Policy Silos, and Restricted Admin mode. To be fair, these built on a history of strengthening Windows security, for example with User Account Control (UAC) in Windows Vista and Server 2008.

Tiered administration is in Section Three of Version 2 of the Mitigation document referenced above: specifically in the section “Protect against known and unknown threats”. The technical implementation is described in Mitigation 1: Restrict and protect high-privileged domain accounts.

There is no technical fix for credentials theft in an on-premises Windows environment. It is not a bug or a loophole. It is intrinsic to Windows AD authentication with Kerberos and NTLM. Mitigation of the risk requires a range of large and small technical changes, as well as significant operational changes. Tiered administration is both, and it is only part of a plan to tighten up security. If you think you can do it with a few technical changes, and quickly, you are badly mistaken.

Documentation

It would not be useful to list all the things you need to do to protect privileged accounts in AD, but this is some of the key Microsoft documentation on legacy tiered administration. I use the documentation not just to read about a topic, but to provide an audit trail for compliance:

  1. Mitigation for pass-the-hash (referenced above)
  2. Best practices for Securing Active Directory. This is an excellent and extremely important document. Although it does not describe tiered administration specifically, you need to include all of the recommendations in your implementation: in particular, Appendices D, E, F and G. This document also describes in detail the Group Policy Objects (GPOs) to restrict logon across tiers, but it applies them only to the built-in and default domain groups, and not to your custom groups of tiered accounts.
  3. Unfortunately, I don’t think you will find a comprehensive Microsoft document on implementing tiered administration in AD. The guidance has been updated for modern authentication and cloud services, in the Enterprise Access Model. The legacy model referred to is the one described in the Mitigation document of 2014.
  4. Legacy privileged access guidance. This document covers the implementation of a Privileged Access Workstation (PAW). It is not a reference for tiered administration, but it does describe the GPOs that restrict administrators from logging on to lower tier hosts. It is important to recognise that the purpose of this document is to describe the implementation of a PAW, not tiering as a whole, and it uses only a simplified model of tiering.
  5. Administrative tools and logon types. This explains the different logon types and their vulnerability to credentials theft. These are the logons that will be denied by User Rights Assignment settings in the GPOs.

In the Microsoft legacy model, a tier represents a level of privilege in the domain. A Tier 0 account is one with the highest level of privileges over the whole domain. A Tier 1 account has high privileges over important business services and data. A Tier 2 account has high privileges over individual (e.g. end-user) services and data.

These documents are useful if you want an audit trail to show you have implemented the protections rigorously. As a CISO, for example, you might want to check that all the controls are implemented, or, if not, that the risk is identified and accepted.

You will find a lot of detailed and (mostly) up-to-date documentation on individual technical topics, especially for Tier 0 and PAW. This post is not one of them. It aims to give a more rounded picture of both the technical and operational practicalities of implementing tiered administration in AD.

Logon restrictions

The basic control in tiered administration for Windows is to prevent an account in one tier from logging on to any Windows computer that is administered by an account in a lower tier. The purpose is to avoid the risk of exposing the credentials of the more privileged account.

These are the technical steps I have followed to implement the logon restrictions. The Microsoft legacy model uses three tiers, but there is nothing magic about that. It is just the number of tiers in their documentation. The reason, I think, is the traditional split between first, second and third line support; or end-user, server and domain engineers.

Here I have used User Rights Assignment settings in GPOs. You can also use Authentication Policies and Authentication Policy Silos. Those are discussed later in this post.

  1. Create three GPOs, one for each tier of computers: Domain Controllers and other Tier 0 servers; member servers; end-user workstations.
  2. List the groups you will use for your tiered administration accounts, one for each tier.
  3. List parallel groups for service accounts. This is because service accounts will separately be denied interactive logon rights to their own tier. This is not, strictly, part of tiering and so not covered further here.
  4. Create a spreadsheet to document the logon rights to be denied. Use three worksheets, one for each tier.
  5. In the first column, list the five logon rights to be denied. You can find this list in several of the documents I have referenced above. They are:
    • Deny access to this computer from the network
    • Deny log on as a batch job
    • Deny log on as a service
    • Deny log on locally
    • Deny log on through Remote Desktop Services.
  6. Across the top, create column headings for each of the accounts and groups to be restricted. These are:
    • Each of the built-in and default privileged accounts and groups listed in the Best Practices for Securing Active Directory guide, Appendices D to G. These are domain and local Administrator, domain Administrators, Domain Admins, and Enterprise Admins.
    • Your custom groups of tiered accounts: Tiers 0, 1 and 2.
  7. Follow Appendices D to G to document the logon restrictions for those accounts and groups. For example, in Appendix D, the built-in domain Administrator account has four logon restrictions.
  8. For your custom tiered administration accounts, implement all five logon restrictions according to tier, i.e. Tier 0 accounts are denied on the Tier 1 and Tier 2 worksheets; Tier 1 accounts are denied on the Tier 2 worksheet only.
  9. Finally (!) create the GPOs with the settings in the spreadsheet. Link them to the OUs with domain controllers and other Tier 0 servers; member servers; and workstations (a scripted sketch follows this list). Since this would be a “big bang” implementation, you might first apply the GPOs only to a subset of the computers.
  10. Test. The Microsoft Best Practices guide gives a screenshot-level description of validating the controls, which is useful when preparing a test plan.
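
For Steps 1 and 9, the creation and linking of the GPOs can be scripted with the GroupPolicy module. A minimal sketch, assuming example GPO names and OU paths that you would replace with your own; note there is no native cmdlet for User Rights Assignment settings, so the Deny* rights themselves still have to be added in the Group Policy editor or imported from a reference GPO backup:

  # Sketch: create and link the three tier GPOs (names and OU DNs are examples).
  Import-Module GroupPolicy

  $tiers = @(
      @{ Name = 'Tier0-LogonRestrictions'; OU = 'OU=Tier0 Servers,OU=Firm,DC=contoso,DC=com' },
      @{ Name = 'Tier1-LogonRestrictions'; OU = 'OU=Member Servers,OU=Firm,DC=contoso,DC=com' },
      @{ Name = 'Tier2-LogonRestrictions'; OU = 'OU=Workstations,OU=Firm,DC=contoso,DC=com' }
  )

  foreach ($t in $tiers) {
      # Create the (empty) GPO, then link it to the tier's OU.
      $gpo = New-GPO -Name $t.Name -Comment 'Deny logon rights for higher-tier accounts'
      New-GPLink -Guid $gpo.Id -Target $t.OU -LinkEnabled Yes | Out-Null
  }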

I have found different versions of these GPOs in different blogs, especially for the custom groups in Step 9 above. So, which is definitive? There are a few points to note:

  • For the custom groups of administrators, the five logon restrictions are the same five as those given for Domain Admins in the Best Practices guide
    • They are also the same as those given for “Domain admins (tier 0)” and “Server administrators (tier 1)” in the original v2 Pass-the-Hash document, referenced above, although the guidance is not as precise.
  • The Domain Admins group is the one added automatically to the local Administrators group when a computer joins the domain. It is logical to follow the same template for other administrators.
  • You do not need to deny logons upwards, to implement tiered administration e.g. deny logon for Tier 2 accounts on member servers or domain controllers. Lower tier accounts are not put at risk by logging on to a device administered by a higher tier.

You may also notice that the logon restrictions include Remote Desktop Services. This is because the normal remote desktop protocol (RDP) passes credentials to the target computer, where they could be captured. Restricted Admin mode of RDP does not pass the credentials. Instead, it authenticates the account on the source computer. So, if you enforce Restricted Admin, you do not need to deny log on over Remote Desktop Services.
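
To illustrate, here is a minimal sketch of enabling Restricted Admin mode on a target server and connecting to it from the source. Enforcing it from the source is a separate Credentials Delegation policy, which is not shown here:

  # Sketch: enable Restricted Admin mode on a target server (run on the target).
  # DisableRestrictedAdmin = 0 allows Restricted Admin RDP connections.
  Set-ItemProperty -Path 'HKLM:\System\CurrentControlSet\Control\Lsa' `
      -Name 'DisableRestrictedAdmin' -Value 0 -Type DWord

  # From the source device, connect without sending reusable credentials:
  # mstsc.exe /v:server01.contoso.com /restrictedAdmin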

There are a few obstacles to this, not insuperable:

  • Restricted Admin needs to be enabled on the target but, separately, required on the source. This means that, to enforce it by GPO, you need to know what the source will be.
  • It does not delegate credentials onwards. So, if you connect to a remote server, and then in the session connect to a file share or another server, you are not authenticated.

This is just the technical part of implementing logon restrictions in a tiered administration model for AD. It is a lot of detail, but it is not difficult.

Delegation

The next step is that you must match this with controls of delegation in the directory. Why does that matter? Because if someone has control of the objects in the directory, they can change what restrictions are applied. They might be able to change the GPO, or move a computer between OUs, or reset the credentials of an account in a higher tier. I have found no Microsoft documentation relating to delegation with tiered accounts. For tidying up existing delegations, see my separate post on AD Remediation: Delegation.

The first step is to ensure that all administrative accounts and groups go into a separate OU for admin resources only, where the normal delegations do not apply. This also means you must not have delegations in the root of the domain (e.g. Full Control of all Computer Objects), unless you also have Denies or Block inheritance, which you should avoid.

In a separate OU, the only default permissions will be for domain administrators. Then, you can pick your way slowly to allowing some very limited delegations of control over these accounts and groups. One thing to remember is that accounts in the custom Tier 0 group of administrators do not need also to be domain administrators. You can put an account in that group, and apply logon restrictions, without the account actually being a highly privileged account in terms of control of AD. It just means that the credentials are less likely to be compromised by logging on to lower tier computers.

This is a very confusing point. The allocation of tiered accounts is not primarily about who you trust. You should grant privileges (based on the Least Privilege idea) according to the skills and experience of the individual. But, in terms of threats, you should assume that any account can be compromised. The point of tiered administration is not to control who does what. It is to prevent the escalation from an easily compromised computer (like a workstation used to browse the internet) to a highly critical one (like a domain controller). So, you might allow a service provider administrator to add accounts to admin groups, or reset their administrators’ passwords, but only using a Tier 0 account, and one that is not a domain administrator. Likewise you could have Tier 1 accounts that do not administer servers, but have delegated control over Tier 2 accounts.

You need to be very careful that accounts of one tier do not go into groups that have control over objects in a higher tier. There is no automated way to control this. Accounts in a higher tier can control objects in a lower tier, but not vice versa.
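
There is no automated control, but you can at least audit for it. A minimal sketch, assuming placeholder group names ('Tier2-Admins', 'Delegation-Tier0-AccountMgmt') that you would replace with your own:

  # Sketch: flag Tier 2 admin accounts that have crept into a group which
  # controls higher-tier objects. Both groups are assumed to be non-empty.
  Import-Module ActiveDirectory

  $tier2Admins     = Get-ADGroupMember -Identity 'Tier2-Admins' -Recursive
  $tier0Delegation = Get-ADGroupMember -Identity 'Delegation-Tier0-AccountMgmt' -Recursive

  # Any account appearing in both lists is a tiering violation worth investigating.
  Compare-Object -ReferenceObject $tier0Delegation -DifferenceObject $tier2Admins `
      -Property SamAccountName -IncludeEqual -ExcludeDifferent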

Permissions, including delegated permissions in AD, are not inherently tiered according to logon restrictions. For example, clearly, you may have permissions for a file share that allow a wide range of admin accounts to add, change and delete files. My approach is to create separate sub-OUs for tiered and non-tiered groups of administrator accounts. That way, it is clear to administrators whether a group should have admins of only one tier or not.

Migration

To migrate, you will need to give every administrator one or more tiered accounts. These are the accounts that are in the tiered groups used in the User Rights Assignment GPOs. These are assigned according to the roles people perform, obviously.

The accounts need to be in the right delegation groups, depending on the admin role. For example, a Tier 1 account might be in the delegation group to create and delete computer objects in the member servers OU. A Tier 2 account might be in the delegation group to create and delete computer objects in the workstations OU.

For all other group membership, you will need to a) take the groups that the existing account is a member of, then b) work out which ones each tiered account needs to be part of. This might be a knotty operational problem. If your groups are well-organised already, then it might be easy. However, if your groups are chaotic (see my other post on AD Remediation: Obsolete Objects) then it will be more difficult.
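
As a starting point for (a), something like the following sketch can export the memberships of an existing admin account ('jsmith-admin' and the output path are placeholders):

  # Sketch: list the group memberships of an existing admin account, as input
  # for deciding which groups each new tiered account needs.
  Import-Module ActiveDirectory

  Get-ADPrincipalGroupMembership -Identity 'jsmith-admin' |
      Select-Object Name, GroupCategory, GroupScope |
      Sort-Object Name |
      Export-Csv -Path C:\Temp\jsmith-admin-groups.csv -NoTypeInformation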

To do this, you need to classify the groups according to the criticality of the data to which they give control. This is the enterprise access model in full. You have to consider, not what you want the person to access, but what any account of that tier might access, if compromised. The credentials in one tier are vulnerable to being captured by any account in that tier. If it would be an unacceptable risk for all accounts in a tier to access a resource, then no account in that tier should have access.

Although you are blocking logon down-tier by accounts you trust, the objective is to prevent control flowing up-tier by accounts that are compromised. Administrative tiers correspond to the relative value of the organisation’s data and systems. End-user data and systems are controlled by all admins. Business data and systems are controlled by Tier 0 and Tier 1 admins. Critical data and systems are controlled only by Tier 0 admins. So, if you do not want a Tier 2 account to control a type of data or system, they should not be in any groups that allow them to do it. Even if you trust the administrator personally, they should use a higher tier of account to do it.

You will also need to create or modify GPOs to make the new tiered admin groups a member of the appropriate local Administrators group on servers or workstations. Logically this can be a subset of the admin group. Not all Tier 1 admins need to be able to log on to all member servers, or even to any member server. It is the same with Tier 2.

All service accounts must be assigned to log on to one tier and one tier only. For some services this might be a significant change, and it might require splitting services into two or even three instances. For example, if a service has administrative rights on domain controllers (which should be few if any), the service account cannot also have logon rights on member servers; and likewise for member servers and workstations. Examples of potential cross-tier services are anti-malware, auditing and logging, device management and inventory services.

The opportunity should be taken to understand exactly what rights a service account needs. It is quite common to make a service account a member of the local Administrators group when it doesn't need to be. If this has not been done in the past, it will be a lot of work to retrofit, but necessary. Also, of course, a regular service account should be changed to a Managed or Group Managed Service Account if possible.
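
For illustration, a minimal sketch of creating a Group Managed Service Account; the account name, DNS host name and computer group are examples only, and the domain KDS root key must already exist:

  # Sketch: replace a regular service account with a Group Managed Service Account.
  # Assumes the KDS root key already exists in the domain
  # (Add-KdsRootKey -EffectiveImmediately, then wait for replication).
  Import-Module ActiveDirectory

  New-ADServiceAccount -Name 'svc-inventory' `
      -DNSHostName 'svc-inventory.contoso.com' `
      -PrincipalsAllowedToRetrieveManagedPassword 'InventoryServers' `
      -Enabled $true

  # On each permitted host, install the account before configuring the service:
  # Install-ADServiceAccount -Identity 'svc-inventory'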

Other important considerations

This section covers a few other aspects of tiered administration in an on-premises Windows environment.

Authentication Policies and Authentication Policy Silos

Authentication Policies and Authentication Policy Silos were introduced in Windows Server 2012 R2. They provide one of the mitigations for the pass-the-hash and pass-the-ticket vulnerabilities, by applying limited conditions to a Kerberos authentication.

You could use these in some cases, in addition to User Rights Assignment. I have used GPOs in this post because:

  • Authentication policies cannot apply to the built-in domain Administrator account.
  • Authentication policies are applied to accounts, not groups. They cannot be applied to the built-in and default groups in a domain, for example to the Domain Admins group.
  • So, to meet the recommendations in Appendices D to G (referenced above), we still need to use GPOs.
  • If you have the GPOs, it is an easy step to add the custom tiered admin and service account groups.
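
If you do also use Authentication Policies and Silos for selected Tier 0 accounts, a minimal sketch looks like this. The names, the TGT lifetime and the account are examples only; in practice you would also set access control conditions on the policy:

  # Sketch: an Authentication Policy and Silo for a small set of Tier 0 accounts.
  Import-Module ActiveDirectory

  New-ADAuthenticationPolicy -Name 'Tier0-AuthPolicy' `
      -UserTGTLifetimeMins 240 -Enforce

  New-ADAuthenticationPolicySilo -Name 'Tier0-Silo' `
      -UserAuthenticationPolicy 'Tier0-AuthPolicy' `
      -ComputerAuthenticationPolicy 'Tier0-AuthPolicy' `
      -ServiceAuthenticationPolicy 'Tier0-AuthPolicy' `
      -Enforce

  # Grant the account access to the silo, then assign the silo to the account.
  Grant-ADAuthenticationPolicySiloAccess -Identity 'Tier0-Silo' -Account (Get-ADUser 'jsmith-t0')
  Set-ADAccountAuthenticationPolicySilo -Identity 'jsmith-t0' -AuthenticationPolicySilo 'Tier0-Silo'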

Trusted devices

To protect credentials, every administrative logon needs to be on a trusted device, at every step. The NCSC describes this very well in Secure system administration. This includes the original device, as well as any intermediary.

This is quite difficult and expensive to do. For example, if you have a third party service provider, will you provide each member of their staff with a dedicated laptop? Will your admin staff carry around two or three laptops? Or you may provide a hardened jump server: but what device will they use to connect to that? It is quite beyond the scope of this post to go into the different ways of achieving secure access, but it is important to accept that tiering is not complete without it.

Default security groups

AD has a long list of default security groups, some of which have elevated privileges in the domain. You should, obviously, be careful about which accounts go in these groups. But there is a small class of groups that are “service administrators”, because they have a degree of control over domain controllers and therefore the whole domain. They don't have full control, but they do have elevated control. They are:

  • Account Operators (recommended to be empty)
  • Backup Operators
  • Server Operators.

In my opinion, the members of these groups should only be Tier 0 accounts, because they have a degree of control over the whole domain. But these Tier 0 accounts do not need to be members of Administrators or Domain Admins. It does mean that the holder of the account also needs a Tier 0 PAW. You might also include these groups in your tiering GPOs, so that any account in them would be unable to log on to a lower tier.

Modern authentication

The problem that on-premises tiering of Windows administration is trying to solve is changed fundamentally by moving to cloud-based services. With authentication by Entra ID, we can use two or more factors (MFA), access based on conditions (Conditional Access), secure hardware to protect credentials (the Trusted Platform Module), and time-limited access (with Privileged Identity Management).

We all know this. The relevance here is that, if you bear in mind the complexity and uncertainty of implementing tiered administration on-premises, it may be more cost effective to move a large part of the problem to cloud-based services. If all your end-user devices use Windows Hello for Business, and Intune for device management, then you do not need a Tier 2 for on-premises administration at all. If you replace on-premises servers with cloud services then you also dispense with a lot of Tier 1. Even if you have a core of on-premises services that cannot be replaced, the problem is much reduced. It is far easier to manage a small number of administrators running a small number of on-premises servers than a large number.

Additionally, there is the observation that tiering can prevent a future breach, but not resolve an existing unknown one. Implementing tiering when you migrate to a new environment, with separate accounts for each environment, and clean devices created in the new environment, can do that.

Default Computers container

Computers, by default, are placed in the default Computers container when they join the domain. This container cannot have GPOs linked to it. This creates a risk that a computer in the container will be administered by accounts in different tiers. Your automated computer build processes should move computers automatically to the correct OU but, in any event, computers must not be allowed to remain here.
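
A quick sketch of how you might audit the container and redirect future domain joins (the staging OU is an example):

  # Sketch: find machine accounts sitting in the default Computers container.
  Import-Module ActiveDirectory

  $domainDN = (Get-ADDomain).DistinguishedName
  Get-ADComputer -Filter * -SearchBase "CN=Computers,$domainDN" |
      Select-Object Name, DistinguishedName

  # Redirect future domain joins to an OU where your baseline GPOs apply:
  # redircmp "OU=Staging,OU=Firm,$domainDN"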

Conclusion

This is a large and important topic for on-premises Windows security, not easy to cover in one post. I think what I have described is a way to implement tiered administration for AD in practice, in a way that is compliant with Microsoft best practices and NCSC recommendations. Please make any suggestions or ask any questions in the comments below.

AD Remediation: OUs

This is one of a series of posts about fixing problems that have accumulated in an old instance of Active Directory (AD). In this case, it is about re-organising Organizational Units (OUs) in the directory.

OUs are containers for objects in the directory, in the same way that folders are containers for files. Over the years your directory may have accumulated many OUs; typically these will represent each era of management, with different structures, naming conventions, objects, delegations and GPOs. You may also have many objects left in the old OUs. You may decide it is time to tidy the whole thing up: create a fresh, new, structure and remove all the old ones.

Identifying all the objects in old OUs is easy enough. Then you can either move them to a new structure, if they are still current; or remove them if they are obsolete. That process is described in AD Remediation: Obsolete Objects. While you are doing the clean-up, here is a script to find the number of remaining objects in each OU, including its child OUs – obviously you cannot delete an OU that has no objects in it directly but has child OUs that do contain objects: Count-ObjectsByOU.ps1.
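
For reference, a minimal sketch of the same idea (not the script linked above) looks like this:

  # Sketch: count the objects under each OU (subtree, excluding the OUs
  # themselves), so that genuinely empty branches stand out.
  Import-Module ActiveDirectory

  Get-ADOrganizationalUnit -Filter * | ForEach-Object {
      $objects = @(Get-ADObject -SearchBase $_.DistinguishedName -SearchScope Subtree `
          -Filter { ObjectClass -ne 'organizationalUnit' })
      [pscustomobject]@{ OU = $_.DistinguishedName; Objects = $objects.Count }
  } | Sort-Object Objects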

Scripting for discovery is an interesting task. It is full of endless complexities in the PowerShell object model for AD. For example, “Enabled” is a property on the objects returned by Get-ADUser and Get-ADComputer, but not on those returned by Get-ADObject, even if the object is a user or computer. Instead, Get-ADObject returns a UserAccountControl property, which is a set of flags indicating the status of the account, including: enabled/disabled; does not expire; cannot change password; locked out; and others. The user object in the AD schema does not have a single attribute for Enabled or Disabled. Get-ADUser interprets the UserAccountControl attribute to expose it as a series of separate properties. It is helpful to refer to the schema of object classes and attributes when trying to understand what is in the directory.
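
For example, a small sketch of deriving Enabled from the userAccountControl flags when using Get-ADObject (the ACCOUNTDISABLE flag is bit 0x2):

  # Sketch: report Enabled/Disabled from the userAccountControl bit flags.
  # Note: objectClass 'user' also matches computer objects, which inherit
  # from the user class in the schema.
  Get-ADObject -Filter { ObjectClass -eq 'user' } -Properties userAccountControl |
      Select-Object Name,
          @{ Name = 'Enabled'; Expression = { -not ($_.userAccountControl -band 0x2) } }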

You really only need to create a new structure (rather than re-use the current structure) if you are making a significant change to delegation or GPOs. OUs are often created when introducing a new service provider, or a new version of the desktop. That is because these result in a significant change of delegation or GPOs. If you are making small adjustments, you can probably do it in place.

If you know what delegation you want to implement, and what policy configurations you want to apply, then you already have almost everything you need for a new OU structure. The function of OUs is to place objects hierarchically, and the purpose of the hierarchy is to apply permissions. Permissions are inherited, so the OU hierarchy represents the set of permissions applied to an object. Permissions for delegation and for GPOs work slightly differently, but they are both permissions. An account applies a GPO if it has the GPOApply permission on it, inherited from anywhere above it in the hierarchy.

AD has a tree structure, based on LDAP and X.500. Each object in an X.500 directory tree has a unique Distinguished Name (DN) derived from its Relative Distinguished Name (RDN) and the RDNs of every object above it in the hierarchy. Because the object has a unique DN, it can exist in only one place in the directory at a time, and so inherit only one set of permissions.

If you form your objects into exclusive sets, each with different delegation or different GPOs that you want to apply, and where each set can be contained in only one other set, then you will have a rudimentary OU structure for objects. For example, if you have a set of physical desktops and another of virtual desktops, with different GPOs, then a single Windows computer can only be in one or the other, but both sets can be in a set of workstations. If you have a set of finance users, and another of users in Spain, and they are not mutually exclusive, then you cannot have them as separate OUs. One must be a child of the other.

You can apply the same delegation, or link the same GPO, to different OUs if necessary. But the aim should be to have as few duplications as possible. Duplicate delegations risk drifting apart over time. A GPO with more than one link might be changed to meet the needs of one OU without even realising it affects another.

You need to think conceptually about what new sets you might have in future, and allow the structure to accommodate them. For example, you may not have kiosk desktops now, but you may want to have a structure that allows them in future. For your current desktop generation, it is not “all desktops”, but “all desktops of this generation”. If you design a new generation of desktop, with new GPOs, it will need a new OU with a new name. The OU effectively represents an environment, and you may have more than one over time. Of course, you may even have left the on-premises AD behind by that time.

For completeness, you probably should also think about potential changes in the structure of the organisation. OU structure does not follow organisation structure. It doesn't matter, for example, whether staff are in one department or another, if their accounts and groups are administered by the same people and configured by the same GPOs. OU structure is for administration of the directory, not for users in the directory. Any large-scale changes in organisation structure might result in new domains or new forests, but not new OUs in an existing domain. However, you should document your organisational assumptions and have the Enterprise Architect agree them.

GPOs can also apply to non-exclusive sets, by using security filtering. An account can be in one group, or both, or none, provided it is in an OU within the scope of the GPO. This can also be used to avoid sets that are very small. If you have a few GPOs that configure, say, Finance apps, you could choose to place those desktops in a separate OU, or you could use a security filter. There’s no real cost to using security filtering. You have to place the computer (or user, depending) into the group; but you would otherwise have to place the computer (or user) into the OU. You can use WMI as a dynamic filter, but these can be costly to process. That probably doesn’t matter on a server OS, but might matter on a desktop OS. Similarly, item level targeting for Group Policy Preferences can be useful, but is costly if it requires a query over the network to evaluate, and can only be used for those policies that are Preferences.

This is all part of good GPO design, but I mention it here because it can affect how you design the OU structure. For example, should you have a separate OU for each server OS, with separate security baseline policies linked to each OU; or can you use a WMI filter for OS version as a filter on the GPO instead? In the case of a server, boot time typically doesn't matter, within reason, so you might decide to go with WMI.

Both delegations and GPOs allow deny permissions. You can deny an access right on an OU, or even a child object. You can set an Apply Deny for a security group on a GPO. You can also block inheritance of permissions entirely. But both should be used sparingly, because they raise the complexity of maintaining the integrity of the structure.

There is also a matter of readability and searchability. It helps if engineers can see and understand the structure easily, so that new objects get created in the right place. If you have created OUs based on exclusive sets of objects, the structure should be fairly clear and obvious already. A case where you may choose to separate objects for readability is AD groups: security groups; mail-enabled security groups; and distribution groups (or lists). It is easy for these to become disorganised with duplicates, near duplicates and faulty naming conventions. Seeing the lists separately makes them slightly easier to administer.

I hesitate to mention this, because I think it should play a very small part if the structure is already logical, and if your administration is already well-managed. In the case of AD groups for example, if you have a separate team of Exchange engineers, then you may already have a separate delegation and so a separate OU.

Finally, my preference is to place all your new OUs in a single top-level OU, with a generic name. This top-level OU is then the root for custom delegations and GPOs. The name should be generic (like “Firm”, or “Top”, or “Org”) to allow for a change of business name. This avoids splatting your OUs across the root of the directory. I would also place all administrative resources (admin accounts, admin groups, admin workstations) in a separate top-level OU, so that the administration of regular objects is entirely separate from the administration of admin objects.

Once you have the exclusive sets of OUs, you can make a spreadsheet with a row for each, showing:

  • Level in the hierarchy
  • Name
  • Description (to be used as the Description attribute)
  • Path (the DN of the parent OU)

It is then a simple matter to use the spreadsheet to create the OUs with New-ADOrganizationalUnit. The level column in the spreadsheet is useful, because you can then ensure you create the parent OU before the child: create all level 1 OUs, then all level 2, and so on. Next step is migration!
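
A minimal sketch, assuming the spreadsheet is exported as a CSV with Level, Name, Description and Path columns (the file path is an example):

  # Sketch: create the OU structure from the spreadsheet, parents before children.
  Import-Module ActiveDirectory

  Import-Csv -Path C:\Temp\OU-Structure.csv |
      Sort-Object { [int]$_.Level } |
      ForEach-Object {
          New-ADOrganizationalUnit -Name $_.Name `
              -Path $_.Path `
              -Description $_.Description `
              -ProtectedFromAccidentalDeletion $true
      }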

AD Remediation: GPOs

This is one of a series of posts about fixing problems that have accumulated in an old instance of Active Directory (AD). In this case, it is about Group Policy Objects (GPOs).

GPOs are as old as AD. They were introduced as a partner technology back in the year 2000. Group Policies are configurations that apply to a Windows computer, and GPOs are objects that contain a collection of policies. When a computer account or user account authenticates to the domain, it obtains the GPOs that apply to it and sets the policies contained in the GPOs.

Over the years, you may have accumulated hundreds of GPOs. You can see how many you have with this cmdlet: (Get-GPO -All).count. In an ideal world, someone would have tidied up continuously, but often, in my experience, that is not part of anyone’s role. Tidying up retrospectively can be an enormous task.

Why is it difficult? Surely you just need to look at each GPO and decide if it is still needed or not. But GPOs don’t work like that. As you might expect, there is a great deal of flexibility and complexity in how configurations are applied: precedence; inheritance; blocking; enforcement; ordering; merging; loopback processing; item-level targeting. To tidy up the GPOs, you first need to unravel all the complexity in how they have been created and applied over many years.

Why do it at all? In the end, a computer applies policies based on an algorithm to determine which ones should apply. You can see the winning configurations in a Resultant Set of Policy (RSoP), either in the GUI or with PowerShell Get-GPResultantSetOfPolicy -Computer [name of a domain computer to test] -User [name of a user to test] -ReportType [html or xml] -Path [path to the report]. So, arguably, if the RSoP is what you want, it doesn’t matter how it is achieved. Certainly, from a security point of view, you would audit the end result and not how it is achieved.

The main reason to tidy up GPOs is an operational one. A large number of accumulated policies is hard to understand. It is hard to make small changes without error or unintended consequences. If it takes too long to make changes, it could be because the existing GPOs are too complicated to understand.

Who is this a problem for? The content of GPOs belongs to individual service owners, not to the directory service owner. The directory is just the vehicle for delivering configurations, just as a network is the vehicle for delivering email. So you could ask the service owners to tidy up their policies. But it is the lack of ownership that has caused the problem in the first place.

If you start to tidy up policies, but are not the owner of the configuration (i.e. the service owner), it is important to recognise that the objective has to be to maintain the same RSoP. If you start to change the RSoP, then you are engaged in a service design, which is a quite separate matter.

This brings us back to the idea that you can avoid much of this by migrating to cloud-only services. If your devices are managed by Intune, and your user accounts are in Entra ID (whether hybrid or not), then all the GPOs applying to them in AD are redundant. You may still have GPOs, for the on-premises services, but far fewer and far easier to administer.

If you do decide to go ahead, here are my steps and methods to do it:

  1. Find and unlink all the redundant GPOs, being those with: no Apply permissions; applying only to unknown SIDs (representing a deleted account or group); GPO disabled; configuration disabled; link disabled; no link; obsolete WMI filter (for example, an OS version that you know is no longer used).
  2. Unlinking a GPO allows you to restore it quickly if you need to. You can make a backup and delete it when it has been unlinked for a while. You can back up and delete any GPOs that are already unlinked. This is a progressive action. In your directory clean up, as you disable unused accounts, and delete empty groups, and delete the resultant empty OUs, you will have more redundant GPOs.
  3. Fix the GPOs that sit outside the OUs where your computer and user accounts are. This will avoid the need for blocking inheritance.
  4. Find the RSoP for each group of accounts. Rationalise the GPOs in the RSoP. By “group of accounts”, I mean each large cluster of user and computer accounts. The biggest one, of course, will be a standard user on a standard desktop. Another might be for virtual desktops. As you get to smaller and smaller clusters (e.g. administrators on file servers), it can be easier just to examine the GPOs manually.
  5. Deal with each of the policies that is filtered on a subset of accounts. Some of them may be needed, for example to configure an application. Some may be obsolete policies developed for testing and never used.

In Step 1, I use scripts based on PowerShell Get-GPO and Get-GPOReport. Get-GPO only returns a set of meta data about the GPO itself, not the settings in the GPO. Get-GPOReport returns the configurations of the GPO as XML, which can be parsed to find what you are looking for. Get-GPPermission gets the permissions on a GPO, which you can filter to find who the GPO is applied to, with the GPOApply permission. Get-GPInheritance gets the GPOs that are linked to a specified OU, together with the order in which they are linked. You can see examples of my discovery scripts here: obsolete GPOs, Apply permissions, and GPOs by OU.
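
As one hedged example of the kind of discovery involved (a sketch, not a copy of the scripts linked above), this lists GPOs that nothing can apply. Exactly how unresolved SIDs surface in the Trustee object is worth verifying against your own output:

  # Sketch: list GPOs where no resolvable trustee holds the GpoApply permission,
  # i.e. nothing can apply them (no Apply entries, or only unresolved SIDs).
  Import-Module GroupPolicy

  Get-GPO -All | ForEach-Object {
      $apply    = Get-GPPermission -Guid $_.Id -All |
          Where-Object { $_.Permission -eq 'GpoApply' }
      $resolved = $apply | Where-Object { $_.Trustee.Name }
      if (-not $resolved) {
          [pscustomobject]@{ Name = $_.DisplayName; Id = $_.Id; ApplyEntries = @($apply).Count }
      }
  }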

In Step 2, you can script a backup of the GPO before unlinking or deleting it, with Backup-GPO -Guid [GUID of the GPO] -Path [path to the backup directory]. I always use the GUID for these actions, in case the object has been deleted and replaced with another of the same name.

In Step 3, the problems are distinct and separate:

  • The Default Domain and Default Domain Controllers GPOs should contain only the settings that are in the defaults created with the domain. You can customise each of the settings, but should not add other settings. These GPOs are not the place to add settings that you want to apply to all computers, or all users, or all domain controllers: those should be in separate GPOs. There is an obscure reference to this in the documentation for dcgpofix, which is a utility to recreate the default GPOs.
  • GPOs in the root of the domain are a legitimate way to configure settings for all computer accounts or all user accounts. GPOs here will apply to accounts in the default Computers and Users containers. Because they are containers and not OUs, you cannot add GPOs to these directly. But they do inherit from the root.
  • But, if you don’t need to apply GPOs to these default containers, and if you find you are blocking inheritance to avoid GPOs in the root, then the solution is to unlink them from the root and apply them only where they are not already blocked.

In Step 4, the RSoP will show you the “Winning GPO” for each setting. If you take each winning setting, and only those, and put them in a new set of GPOs, you will be able to replace all the existing GPOs in the RSoP. If you make a copy of the existing GPOs, you can edit these to keep only the winning settings. If you want to re-organise the settings into a more logical collection of GPOs, you can create new ones and move the settings into them.

You can cross-check the winning policies by using the Microsoft Policy Analyzer, part of the Microsoft Security Compliance Toolkit. Policy Analyzer will not show you the winning policy. But it will show every separate policy in the GPOs in an Excel spreadsheet, together with every duplicate and conflict. If you load Policy Analyzer with every GPO that applies to all your target accounts, and if you know the winning policy from the RSoP, then you can identify all of the duplicates and conflicts that should be removed.

In Step 5, you will have a long tail of GPOs that apply to only a subset of computer or user accounts, based on filtering of the GPO Apply permission. These may exist to grant an exception to some accounts, or to configure an application. Mostly, you will want to keep them.

But you will need to be careful with them. The settings may conflict with other policies, or with the RSoP for the same accounts. In this case, they will rely on ordering. Ordering is a subtle property. It is not a property of the GPO itself. It is a property of the link. It can be obtained by Get-GPInheritance for a set of GPOs either linked directly to an OU, or inherited by it.

Just because a GPO has a higher precedence (lower link order) does not mean it needs or uses the order to take effect. The order only matters if there is a conflict. You could use Policy Analyzer to detect the conflict. But, if you use naming and granularity to specify the purpose of each GPO, it should be easy to identify where you have a potential conflict.

My preference is to break out policies that have exceptions as separate GPOs: both the rule and the exception. For example, if you have a rule that most people cannot write to USB, and an exception that allows some people to write, then you can have one GPO for the rule. This rule can be applied to Authenticated Users, ensuring it is always the default in the absence of an exception. You can then use a Deny Apply for the group of people who are exempt from this rule; and, optionally, a second GPO to allow write. You don’t need this second GPO if write access is the default setting in Windows, but creating it means the setting cannot be accidentally changed. By applying this GPO to the same group as the Deny Apply for the main rule, you guarantee an account must be either one or the other, and never “Not configured”. Then you don’t rely on ordering, which can easily be changed unintentionally.
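
A sketch of the Apply side of that pattern with Set-GPPermission (the GPO and group names are hypothetical); note that Set-GPPermission cannot create a Deny ACE, so the Deny Apply itself has to be set in the GPMC Delegation tab (Advanced) or by editing the ACL directly:

# The blocking rule applies to everyone by default
Set-GPPermission -Name 'USB - Block Write' -TargetName 'Authenticated Users' `
    -TargetType Group -PermissionLevel GpoApply

# The exception GPO applies only to the exempt group
Set-GPPermission -Name 'USB - Allow Write' -TargetName 'Authenticated Users' `
    -TargetType Group -PermissionLevel None -Replace
Set-GPPermission -Name 'USB - Allow Write' -TargetName 'USB Write Exempt' `
    -TargetType Group -PermissionLevel GpoApply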

In Step 5, too, you can deal with GPOs that are applied only to what look like test accounts; for example, a few users or computers by name, or a security group that looks like a test group. If you use the script Get-GPOApply to show every Trustee for every GPO, you can filter on the permissions that look doubtful.

You can see that, even with scripts and tools, if you have many redundant GPOs there is a large amount of work in rationalising them. There is also a significant risk of unintended impact, no matter how careful you are. For this reason, you need to be very sure you want to go ahead, rather than migrating to cloud-only services with no GPOs.

AD Remediation: Delegation

This is one of a series of posts about fixing problems that have accumulated in an old instance of Active Directory (AD). In this case, it is about delegation of control over objects in the directory.

Delegation in AD is the assignment of non-default permissions to objects in the directory: for example, permission to delete a user account. Over time, and with different service providers in that time, delegation can become muddled, creating a risk that administrators may have much wider permissions than they really need. If their account is used maliciously, or if their credentials are obtained by someone else, this can result in extensive damage to the organisation. This post covers how to restore delegation to a manageable state.

In Entra ID, and other modern applications, the rights to perform different administrative tasks are organised into different roles: role-based access control (RBAC). The idea is that different administrators should have different levels of rights, according to their role. In Entra ID, for example, there are built-in roles for Helpdesk Administrator, Password Administrator, Groups Administrator and so on. Administrative staff can be assigned one or more of these roles. This is a fundamental part of securing your service administration.

AD does not have these roles. It does have built-in and default groups, such as Account Operators; but these are not organised into intended roles and are not suitable for least-privilege delegation: Default Security Groups. There are no groups, for example, for Helpdesk, Password or Groups administration.

If you are curious about the difference between rights and permissions, see the footnote.

In AD, permissions are assigned by Access Control Lists (ACLs) applying to objects in the directory. Like other ACLs, these can be inherited, or applied directly, and permissions can be granted or denied. In AD, they can apply to an entire object (like a user or computer account), or to specific attributes, or a set of attributes. It is an extremely complicated system. Simple delegations, like Full Control of computer objects, are quite easy to set and to see. But more granular permissions can be more difficult. For example, you may want helpdesk staff to be able to read the BitLocker recovery information for a computer. But this attribute has a Confidential flag and cannot be set in the Delegation GUI.

Over the two decades of AD, it is quite likely that different delegations have been made. You may have different delegations for each era of managed service provider, or each era of desktop. You may have some that have been applied at the root of the domain, and some Organizational Units (OUs) where the inheritance of these root delegations is blocked. If they apply at the root, then they will take effect on the default Users and Computers containers; whereas, if they have not been applied at the root, these containers will have the default permissions. This makes it difficult to know what level of control has been delegated. As an example:

  • Let’s say that the computer accounts for new laptops are stored in an OU called “Workstations”.
  • Let’s assume that the permissions on that OU are exactly what you want them to be. Helpdesk staff can do no more with computer accounts in that OU than you intend. They get these rights by being in Group A.
  • But there are also some laptops (possibly) in an old OU. This OU does not have direct permissions assigned over computer objects, but inherits them from the root of the directory, where full control of computer objects is delegated to Group B. So the helpdesk staff go in Group B as well.
  • Because the permission is assigned at the root of the directory, it is inherited by the default Computers container.
  • When new servers are built by server engineers, they are created initially, by default, in the Computers container. So the helpdesk engineers find that they have full control of new server objects created in the default container, which is not what was intended.

The first step in resolving this problem is to obtain the existing delegations. The PowerShell cmdlet Get-ACL fetches the ACL for a specified object in AD, for example an OU object.

Get-ACL is one of the more interesting and complex cmdlets to use with Active Directory. It gets the properties of the ACL, not of the Access Control Entries (ACEs) in the list themselves. The ACEs are contained within individual rules, which determine what right is granted, who it is granted to, and how it is granted. To get the collection of rules, you use the property ‘Access’ like so: $rules = (Get-ACL -Path "AD:[distinguished name of the object on which permissions are set]").Access.

An example of a rule is:

ActiveDirectoryRights : CreateChild, DeleteChild
InheritanceType : None
ObjectType : bf967aba-0de6-11d0-a285-00aa003049e2
InheritedObjectType : 00000000-0000-0000-0000-000000000000
ObjectFlags : ObjectAceTypePresent
AccessControlType : Allow
IdentityReference : BUILTIN\Account Operators
IsInherited : False
InheritanceFlags : None
PropagationFlags : None

The next thing you will notice is that the rules set a right on an object, identified by a GUID. So the rights are Create Child, Delete Child, and the object to which this is applied is referenced by the GUID bf967aba-0de6-11d0-a285-00aa003049e2. The object might be a primary object, like a user account, or it might be a property of the account, like Lockout-Time. There are many hundreds of these. To match them to a readable name, you need to refer to the directory schema.
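
A sketch of building that lookup with the ActiveDirectory module: schema classes and attributes are keyed by schemaIDGUID, and extended rights (such as Reset Password) by rightsGuid:

Import-Module ActiveDirectory
$rootDse = Get-ADRootDSE
$guidMap = @{}

# Classes and attributes, from the schema partition
Get-ADObject -SearchBase $rootDse.schemaNamingContext -LDAPFilter '(schemaIDGUID=*)' `
    -Properties lDAPDisplayName, schemaIDGUID |
    ForEach-Object { $guidMap[[Guid]$_.schemaIDGUID] = $_.lDAPDisplayName }

# Extended rights, from the configuration partition
Get-ADObject -SearchBase "CN=Extended-Rights,$($rootDse.configurationNamingContext)" `
    -LDAPFilter '(objectClass=controlAccessRight)' -Properties displayName, rightsGuid |
    ForEach-Object { $guidMap[[Guid]$_.rightsGuid] = $_.displayName }

# The ObjectType in the rule above resolves to the 'user' class
$guidMap[[Guid]'bf967aba-0de6-11d0-a285-00aa003049e2']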

Fortunately, there is an excellent description of how this works, by Faris Malaeb: Understanding Active Directory ACL via PowerShell. Faris also publishes an excellent script to export the rules: ADSecurityReporter.

Once you have the existing delegations in Excel, you can sort and filter them to make sense of what has been done in the past.

The next step is to define what delegations you would like to be applied; and the step after that is to plan the migration from the current set to the new set.

In an ideal world, you might perform an analysis of the separate tasks performed in administration of the directory, and then assemble these into roles. In practice, you may have a good idea what some of those roles are, based on first, second and third line support. From a security perspective, you want to understand the least privileges that are required to perform the task. Does someone on the helpdesk need to be able to create or delete a user account? Probably not. Do they need to be able to add a user to a group? Maybe.

As an example, I have used the following starter set of roles:

  • A Level 1 and a Level 2 role, corresponding to first and second line, for different types of object. Level 1 typically can make small modifications. Level 2 typically has full control.
  • Level 1 user administration, for example, might include only: Reset password; Read/write pwdLastSet; Read/write lockoutTime (to unlock an account). A sketch of delegating these permissions follows this list.
  • Separate roles for administration of different types of object: user accounts, workstation accounts, server accounts, groups, GPOs, OUs.
  • For server administration, separate roles for services that are administered separately, e.g. finance, Exchange.
  • Possibly separate again for Exchange related objects such as distribution groups and shared mailboxes, depending on how your Exchange administration is organised.
  • It is then up to the managers of the support service to assign one or more of those roles to individuals.
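
As a minimal sketch of delegating part of that Level 1 user role to a group on an OU (the group name, OU and domain are hypothetical, and the GUIDs are looked up from the schema rather than hard-coded):

Import-Module ActiveDirectory
$ouDn  = 'OU=Staff,DC=example,DC=com'    # hypothetical OU
$group = Get-ADGroup 'AD-L1-UserAdmins'  # hypothetical role group

$rootDse = Get-ADRootDSE
$userClass = [Guid](Get-ADObject -SearchBase $rootDse.schemaNamingContext `
    -LDAPFilter '(lDAPDisplayName=user)' -Properties schemaIDGUID).schemaIDGUID
$lockoutTime = [Guid](Get-ADObject -SearchBase $rootDse.schemaNamingContext `
    -LDAPFilter '(lDAPDisplayName=lockoutTime)' -Properties schemaIDGUID).schemaIDGUID
$resetPassword = [Guid](Get-ADObject -SearchBase "CN=Extended-Rights,$($rootDse.configurationNamingContext)" `
    -LDAPFilter '(displayName=Reset Password)' -Properties rightsGuid).rightsGuid

$sid         = [System.Security.Principal.SecurityIdentifier]$group.SID
$allow       = [System.Security.AccessControl.AccessControlType]::Allow
$descendents = [System.DirectoryServices.ActiveDirectorySecurityInheritance]::Descendents

$acl = Get-Acl -Path "AD:\$ouDn"

# Reset Password extended right on user objects below the OU
$acl.AddAccessRule([System.DirectoryServices.ActiveDirectoryAccessRule]::new(
    $sid, [System.DirectoryServices.ActiveDirectoryRights]::ExtendedRight, $allow,
    $resetPassword, $descendents, $userClass))

# Read and write lockoutTime (to unlock accounts) on user objects below the OU
$acl.AddAccessRule([System.DirectoryServices.ActiveDirectoryAccessRule]::new(
    $sid, [System.DirectoryServices.ActiveDirectoryRights]'ReadProperty, WriteProperty', $allow,
    $lockoutTime, $descendents, $userClass))

Set-Acl -Path "AD:\$ouDn" -AclObject $acl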

A second problem, in addition to muddled delegation, is that it is common in my experience to find a large, even very large, number of people with domain administrator rights. This is a problem to solve in itself, by reducing the number to those that actually administer the directory itself. It is also a particular problem for delegation, because it means the actual rights needed are not explicit. Mostly these people will need a collection of Level 2 roles. But there will also be a wide range of rights that are only used occasionally, for example: DNS admin; DHCP admin; Sites admin, OU admin. You might use custom delegation for these, or you might use some version of Privileged Identity Management (PIM) to assign a domain administrator role when needed for a specific task.

As with most operational and organisational changes, designing the change is one thing; migrating to it is another. You can apply the new delegation to your OUs, and you can add staff to groups for the new roles. But the new delegation does not replace the old until you remove it. You probably cannot simply remove staff from the old groups used for delegation. These groups may well have permissions granted elsewhere, for example in the file system, or in an application like SCCM. So you cannot remove a member from a group without affecting their ability to do their job outside the directory. This makes removal of the old delegation a big bang: you have to remove the old delegation entirely, in one go.

An alternative is to create a new OU structure and apply the new delegation there. You can migrate objects (like workstations) progressively, to limit the risk. When an object is migrated, it can only be managed with the new delegations, regardless of the group membership of the administrator. However, that is a lot of work, which goes back to the original argument that it may be better to move to cloud-only services wherever possible to avoid this.

*** Permissions and rights. There is a difference in the way that Microsoft uses these terms. Broadly, I think it is true that a user is granted a right (or privilege), while a permission exists on an object. But the terms are not used consistently in the implementation. In the GUI, when you create a delegation, you select the permission to delegate. In PowerShell, the same thing is called a right. So I think something like “Full Control” is both a right assigned to a user and a permission set on an object.

AD Remediation: Obsolete Objects

This is one of a series of posts about fixing problems that have accumulated in an old instance of Active Directory (AD). In this case, it is about removing obsolete objects that remain in the directory but are no longer used.

If you have objects in AD that are obsolete, then this post will cover how to find them, and what to do about them. These objects can be: staff accounts, service accounts, administrator accounts, shared mailboxes and contacts; desktop computer accounts and server accounts; security groups and distribution groups; Organizational Units (OUs) and others. They also include Group Policy Objects (GPOs), but I will deal with those separately. There are many other object classes and categories, but these are the main ones we need to deal with.

Obsolete objects make the directory untidy, and perhaps more difficult to administer. But obsolete accounts are also a security risk. If an account is not disabled (or expired) it may be used maliciously – for example the account of a domain admin who has now left. Even if the account is disabled, it can easily be maliciously re-enabled, used and re-disabled. Obsolete security groups may give staff permissions they should not have. And obsolete distribution groups create a muddle as to which ones staff should use. The trouble with obsolete groups is that members will continue to be added, because memberships are often copied from one account to another. So you can have a situation where new staff, or administrators, are being added to groups and no-one knows whether they are needed or not.

To tackle obsolete objects, you really need to have policies for the administration of the lifecycle of an object. For example, when should an account be disabled? And should it be deleted, or left disabled permanently? If you have many obsolete objects, then you probably don’t have these policies. Developing these policies is a significant piece of service design, because you need to involve people from Security, HR, Legal, and service management. It is far from straightforward. With a user account, for example, what do you want to happen to their mailbox and OneDrive when they leave the organisation, or go on maternity leave?

For user and computer accounts, my preferred approach is to disable the account, remove it from all groups, and move it to an OU with read-only permissions so it cannot easily be re-enabled. Then, after a period (say, a year) it can be deleted, unless it is on an authorised list of accounts not to be deleted.
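
Something like this, for a single user account (the account name and target OU are placeholders):

Import-Module ActiveDirectory
$user = Get-ADUser 'jsmith' -Properties MemberOf

Disable-ADAccount -Identity $user

# Remove from every group except the primary group (normally Domain Users)
foreach ($groupDn in $user.MemberOf) {
    Remove-ADGroupMember -Identity $groupDn -Members $user -Confirm:$false
}

# Park the account in an OU where write access is restricted
Move-ADObject -Identity $user -TargetPath 'OU=Disabled Accounts,DC=example,DC=com'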

But, just to give an example of the complexity, a shared mailbox uses a regular user account. It should be disabled by default, because no logon is required, so being disabled does not mean the account is no longer needed. There is no purpose in the account being a member of a security group (because no-one logs on to it), but it can legitimately be a member of a distribution group. So how can you know if it is needed or not? You need a system of ownership, so that one person is responsible for controlling who has permissions to the mailbox. If you think a shared mailbox may not be needed any longer, you can remove the mail-related permissions first, before deleting it, to give an opportunity to discover whether anyone is still using it.

For accounts, you may use the Last Logon Timestamp attribute to give an indication of whether the account is being used to log on or not. This is a replicated attribute, updated about every 14 days. This still isn’t perfect. You may have a service account that is used to authenticate to an application, for example, and this will not be recorded as a logon. So, even with the Last Logon Timestamp, you need to filter the lists for what you think are real people.

Groups in AD do not, themselves, perform authentication, and there is no attribute to indicate whether they are being used or not. Group membership is part of the user logon token, but the group that enabled an authentication to take place is not recorded in the audit. With groups, you will probably want to establish a system of ownership (the ManagedBy attribute), so that owners periodically confirm the membership and whether the group is still needed. You could also use the description field to describe the purpose of the group. Security groups should probably belong to a service, and therefore have a service owner. Distribution groups could have as owner the person who requests the group.

Since groups perform no logon, they cannot be disabled. However, if you think a group may no longer be needed, you can move it to a different OU with read-only permissions. That way, members cannot be added easily. If they do need to be added, then the opportunity can be taken to record the purpose and ownership of the group. When a read-only group becomes empty, because all its members have been disabled and removed, then it can be deleted.

Finding obsolete objects is conceptually easy, but in practice more difficult and not clear-cut. I use PowerShell scripts to export all the important attributes of an object to Excel, where they can be sorted and filtered to produce a list of objects to tackle. I then use the same attributes to check an object before taking action on it. This takes care of the case where the object has changed since being found. For example, if a computer is named with its organisation-assigned asset number, then the computer may in fact have been rebuilt with the same name since you identified a previous instance as inactive.

The discovery and remediation of obsolete objects in AD is a significant piece of work, if it has been neglected. It can easily take three months or more in a large directory. It is a rolling process. For example, you may identify inactive user and computer accounts, disable them, remove them from groups and move them to a new OU. When you have done that, you may have security and distribution groups that are newly empty, so you can delete those. When you have done that, you may have GPOs that are no longer applied to anyone, and you can remove those. When you have done that, you may have whole OUs that are newly empty and can be deleted.

Cleaning up requires a lot of PowerShell scripts, with a lot of gotchas for the attributes of different objects. I have provided a few scripts I use, for user accounts, computer accounts, security groups and distribution groups, here: AD Remediation scripts.

A few notes on the scripts:

  • They are not intended as off-the-shelf tools for finding obsolete objects. You should customise them for your needs.
  • For export to Excel I use a PSCustomObject and the pipeline. Each key value pair in the object is a column in the Excel spreadsheet. This makes it easy to add or change attributes that you want to export.
  • In Excel, the data can be filtered and sorted to find what you want. This can then be exported to a CSV, which can be used by another script to delete or disable the objects. This keeps an audit trail between what you discover and what you change.
  • I use a timespan to get the number of days since accounts last logged on. This means I don’t have to hard code an interval into the script. I can simply filter or sort the Excel spreadsheet based on the number of days: 90, 180, 360 or whatever.
  • I always fetch the GUID of the object because it is possible that, since the date of the discovery, an object has been changed. It can even have been deleted and another object created with the same name.
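
A sketch of that export pattern, for computer accounts (the attribute list and output path are illustrative, not my production script):

Import-Module ActiveDirectory
$now = Get-Date

Get-ADComputer -Filter * -Properties lastLogonTimestamp, whenCreated, operatingSystem, description |
    ForEach-Object {
        # Convert the replicated timestamp, then express it as days since last logon
        $lastLogon = if ($_.lastLogonTimestamp) { [DateTime]::FromFileTime($_.lastLogonTimestamp) }
        $days      = if ($lastLogon) { (New-TimeSpan -Start $lastLogon -End $now).Days }
        [PSCustomObject]@{
            Name            = $_.Name
            Guid            = $_.ObjectGUID
            OperatingSystem = $_.operatingSystem
            Created         = $_.whenCreated
            LastLogon       = $lastLogon
            DaysSinceLogon  = $days
            Description     = $_.description
        }
    } | Export-Csv -Path '.\ComputerDiscovery.csv' -NoTypeInformation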

It is really a fascinating exercise to piece together the history of the directory in the discovery process. There are endless intricacies.

Active Directory (AD) Remediation

Active Directory (AD) was introduced by Microsoft in 2000, and became mainstream for desktop management with the launch of Windows XP in 2001. It was accompanied by a set of technologies called IntelliMirror, though that term was soon discontinued. These technologies included: Group Policy; Folder Redirection; Roaming Profiles; imaging (Remote Installation Services, later Windows Deployment Services) and software distribution (Windows Installer). They are only now being replaced, with services (rather than software) wrapped up as Microsoft 365: Entra ID (replacing AD); Intune; OneDrive; Autopilot.

The problem is that, if an organisation has not been through mergers and acquisitions, and has not yet fully adopted Microsoft 365, it may still have remnants of configurations dating all the way back to the early 2000s. This is especially true if it has outsourced to a service provider, or many providers, over that time. The result is a mish-mash of configurations that, quite possibly, no-one fully understands.

This matters for several different reasons:

  • You may not know whether computers have the right security settings or not; or you may know for sure that they do not
  • Administrators may have wildly greater permissions in the directory than they need; for example, a large number of administrators may have domain or enterprise administrator rights, simply because no-one knows what rights they really need for their job
  • Administration may be inefficient; it may take too long, with too many mistakes, to create accounts for new staff, or to disable accounts when staff leave
  • Staff and contractors may obtain permissions (e.g. to files, applications, mailboxes) that they should not have

The security risk is acute. If an administrator has domain admin rights, and if the credentials of the account are exposed, then there is a risk of catastrophic damage; for example through a ransomware attack.

You might wonder how that is possible. Why does the current service provider not understand everything that is in AD? Surely there must be people they can ask? But they don’t, and they can’t. The reason is that service providers generally come in to run services as they are, or perhaps to introduce a new service. They don’t (in my experience) have a contract to repair all the existing services. And staff move on. The person responsible for File and Print services today, for example, was not responsible for the perhaps several previous generations of services. They won’t know who is supposed to have permissions to old file shares or whether an old print server is still used. Likewise, the person responsible for email won’t know whether old distribution groups are still needed or not.

One problem is lack of ownership of AD. You can imagine that someone is responsible for managing the current desktop, or the finance system, or the Intranet; but usually (in my experience, again) no single owner is responsible for the directory. Although Group Policies, for example, are administered in the directory, the configurations they apply belong to the owners of the service using the configurations, not to the directory service owner.

This will be a series of articles about how to fix the problems in old Active Directories. It will cover things like what to do with inactive or obsolete objects; delegation of administrative permissions; how to tidy up old Group Policy Objects (GPOs); how to remove old Organizational Units (OUs).

The main conclusion to take away is that it is likely to take far longer, and be far more difficult, than you might imagine. If this is true, then it makes a stronger case for moving away from Active Directory to cloud-only services. For example, if you move your desktop management to Intune, you no longer need the GPOs, or the delegation of control, for desktop computers in AD.

A second conclusion is that it is impossible to clean up the objects in AD without, at least implicitly, setting policies for the administration of AD. How long should accounts be kept active before they are disabled? Should accounts be deleted or only disabled? What roles are performed in administering the directory, and what permissions does each role need? Are security configurations mandatory or optional? Who should have domain admin rights? How do you keep track of which security groups and distribution groups are needed and which are obsolete? To set policies, you need to have an idea of who is responsible for each policy and each service the policy applies to. If you do not currently have these policies, or service owners, you may find this is a big cultural change.


Government Commercial Problems with IT Procurement

Working in IT, I come across procurement problems frequently. The root cause, it seems to me, is that government procurement rules are implicitly designed for a steady state, whereas IT projects implement change, which is inherently imprecise. These rules need a radical overhaul. The new Procurement Bill, currently (Feb 2023) going through the House of Commons, aims to do this.

Problems

What sort of problems? 1) Long delays. A procurement that might be a simple executive decision in the private sector can be a three or six month exercise in the public sector. On a project, delay has a cost. This cost often outweighs the potential benefit of the procurement process. 2) Inflexibility as requirements evolve. Sometimes you don’t know exactly what you need until you talk to suppliers. But you can’t talk to suppliers without a formal procurement process.

I cannot give specific cases, for reasons of client confidentiality. But I can highlight the areas of the procurement rules that create these problems. The intention of the public procurement policy is clear and legitimate: to achieve “the best mix of quality and effectiveness for the least outlay over the period of use of the goods or services bought”. The question is whether the rules do this in practice.

I must say at the outset, these thoughts are from a “user” perspective. I have no great knowledge of the procurement rules, only my experience in performing procurements as part of an IT project. The amount of regulation and guidance applying to procurement is vast, and I don’t know how anyone could master it. The scope is vast too: hundreds of billions of pounds of contracts, of every conceivable type, and ranging in value from billions of pounds down to £10,000. I don’t believe it is realistic to try to codify the rules for this vast enterprise, but that is what the policy does.

Long delays

I led a piece of work to implement a small piece of software that integrated two different systems. There are four products that do this. It is quite a niche area, with not much published information. The value of the purchase would be small, in relation to the two systems being integrated. The products are priced by volume of usage, with annual subscriptions. There were various technical complications about integrating with the two specific systems in our case.

The obvious thing to do was to buy a few licences and learn on the job. We were not allowed to do this. The rules said that no purchase of any kind could be made without a selection process, in this case to decide which ones to trial. The public information was not sufficient to justify the selection of a single product to trial. The next obvious thing was to talk to vendors. We were strictly not allowed to do this. Talking informally to any vendor would prejudice a fair selection.

So we developed our selection criteria as best we could (based on what we could glean from the published information), and then carried out a systematic trial of all four products sequentially. The trial involved actually implementing all four products, and asking staff to evaluate their experience when using them. The experience was almost identical, as we expected.

Some of our important selection criteria were technical, for example compliance with security requirements, and licensing terms. For these, we had to ask the vendors to respond to an RFP. As you can imagine, the responses were inadequate to provide any assurance, without speaking further to the vendors.

After going through the selection process, amazingly, we had not actually completed the procurement. All the vendors sold licences through resellers, as you would expect. So, after the selection, we needed to pick a reseller. You’ve guessed it! We needed a procurement to pick a reseller to sell us the licences for the product we had selected. Fortunately, we were able to use the Crown Commercial Services framework to ask for quotes.

The end result was that we purchased a few licences for the product we expected to pick at the beginning, but many months later and at considerably greater cost than the cost of the licences.

The basic problem here is that we do not live in a world of perfect information. At the outset, we cannot know all the ins and outs of different products. Vendors design their product information to highlight advantages and hide weaknesses. Vendors do not publish real prices. Vendors do not respond to RFPs with full and honest answers to questions.

Think of it from the vendor’s point of view. Some government department wants to make a small purchase. The department invents a long and complicated process and invites them to participate. What should they do? Obviously, just send them the data sheet and the price list. Why would they go to the effort and expense of responding when the total profit if they won would be less than the cost of responding?

Inflexibility

I led a project to upgrade the technology of an existing system, the purpose of which was to enable integration with another system. Sorry if that is a bit obscure: the reason is confidentiality.

The original system was contracted for before the integration even existed. We were not allowed to select our new network supplier with the integration built in to their product. This service was not in the scope of their new contract, because no-one at the time knew we would need to do this. It would have required a completely fresh procurement of the primary product, which would have taken at least a year.

In this case we were allowed to vary the existing contract. The rules on variation are highly complex. They require a good understanding of Annex A – Regulations 72 and 73 of the Guidance on Amendments to Contracts 2016. We were allowed to vary the contract, but only on the basis that it used different technology to do the same thing.

This gave us a few big challenges to negotiate. One, we needed a new type of support for the new technology not provided in the original contract. Two, we needed a third party (at additional cost) to provide a service to assist in the integration.

After something like a year we had completed the integration. At this point there was less than a year to run on the existing contract. But we could not extend the contract. The rules on extension are especially severe: they are one of the “red lines” for IT procurement. So the next stage had to be a full procurement of the whole service, having just completed the transformation of the previous service.

The basic problem here is that we don’t live in a world of isolated products and services. They are all inter-related in some way. It is not possible to have perfect foreknowledge of all the ways the services might need to change in the future.

Observations

I have a few observations.

  1. Procurement rules do not take account of the cost of complying, in relation to the value obtained.
  2. They assume the availability of adequate market information to make perfect choices without speaking to vendors.
  3. They also assume vendors can and will respond with accurate detailed information about what they offer.
  4. They do not take sufficient account of the relationships with other products and services, and the way these all evolve over time.
  5. It is simply not possible to comply with the rules intelligently, without having a large and skilled Commercial department.
  6. A Commercial department cannot have full knowledge of the product or service being procured; therefore, there will be extensive delay, or bad choices will be made.
  7. Delay is built in to the system, and the cost of delay is not accounted for.
  8. The cost and delay of procurement means that people are incentivised to wrap up products and services into large contracts that preclude innovation and competition – the exact opposite of what is intended.

Procurement Bill

The original Public Contracts Regulations 2015 stemmed directly from the EU Public Contracts Directive. The intention was to make contracts open across Europe.

But the idea that you can regulate all procurement across all of Europe with a value of more than £138,760 (Jan 2022 threshold) seems unrealistic. Let’s say you have an organisation of 10,000 staff. Let’s say a contract might run for 5 years (printing, laptops, software etc.). The threshold means that any contract worth about £3 per member of staff per year must be subject to a full, open, procurement. Let’s say the vendor profit on the procurement is 20%, or £27,752. The procurement process will cost more than that!

The explicit aim of the current Public Procurement Policy is to obtain value for money. But people don’t need rules to enable them to obtain value for money when buying a holiday, or a car, or the weekly shopping. People will do this for themselves. What the public needs is rules to prevent corruption. Anything that knowingly does not obtain value for money is corrupt. The new Procurement Bill says it aims to do this: “Integrity must sit at the heart of the process. It means there must be good management, prevention of misconduct, and control in order to prevent fraud and corruption.”

I will leave it to others to describe the changes in the new bill. But it is interesting to consider how it might affect the two cases I mentioned.

  • A below-threshold contract is one worth more than £12,000 and less than (I think) £138,760
  • For a below-threshold contract, the contracting authority “may not restrict the submission of tenders by reference to an assessment of a supplier’s suitability to perform the contract [including technical ability]”. I take that to mean that all procurements must be open to all potential suppliers and not shortlisted. That is admirable, and I see no difficulty in making all these tenders public. But for obscure and specialised requirements the result is likely to be a deluge of irrelevant tenders and/or no valid submissions at all.
  • This does not apply to frameworks, so the best way to procure anything below-threshold will always be through a framework. But frameworks can only sell commodities. They can’t sell niche specialised products.
  • Modifying an existing contract is covered in Section 74 and Schedule 8. I think a contract extension is limited to 10% of the term, i.e. 6 months of a five year contract. This is still not enough where a change of circumstances occurs during the contract.
  • The provision for additional goods, services or works during a contract seems less restrictive than before. “A modification is a permitted modification if (a) the modification provides for the supply of goods, services or works in addition to the goods, services or works already provided for in the contract, (b) using a different supplier would result in the supply of goods, services or works that are different from, or incompatible with, those already provided for in the contract, (c) the contracting authority considers that the difference or incompatibility would result in (i) disproportionate technical difficulties in operation or maintenance or other significant inconvenience, and (ii) the substantial duplication of costs for the authority, and (d) the modification would not increase the estimated value of the contract by more than 50 per cent.” That seems to be a lot more flexible than before.

The scope of government contracts, even just IT contracts, is vast and I don’t know how it is possible to codify the rules governing them except by introducing a great deal of bureaucracy and expense.

Curiously, the word “integrity”, despite being one of the bill’s objectives, only occurs once in the bill, other than in the statement of the objective. It occurs in the context of the supplier’s integrity. But, when a private sector organisation contracts with a vendor, the organisation is relying on the integrity of the staff, not the vendor. If the staff act with integrity, the organisation is confident the best choice will be made.

Speaking for an SME, I’m glad the bill has provisions to make it easier for small businesses to obtain contracts from government. But I have difficulty seeing how that will work in practice. Bidding is an expensive process. The way a small business manages the cost of bidding is to screen the opportunities for a competitive advantage. This might be having a good reputation with previous clients, or offering a high quality of service, or having strong skills in a particular area. These are intangibles that are screened out in a bureaucratic tendering process.

Fault with Company Portal

This is a story about the complete failure of Microsoft Premier Support to diagnose and resolve a fault in the Company Portal.

It is difficult to put into words how complete the failure is. But it includes a failure to define the problem; a failure to capture it or reproduce it; and a failure to provide any diagnosis of the cause.

The Company Portal is the Modern App, or Store App, that enables a user to install applications that have been published and made available to them from Intune. It is an essential part of the Mobile Device Management (MDM) platform. Without the Company Portal, a user can only have the applications that are “Required”. So, after Autopilot, Company Portal will often be the first place a user goes, to obtain the rest of the applications that they need to work with. An example is specialist finance applications. These might be made available to a community of finance users, but each person will install the ones they need individually.

The problem we have had for several months is that the Company Portal will suddenly disappear from a user’s desktop. It is gone. The user can search for “Company Portal” and it is not there. Where has it gone? No idea. How do you get it back? Well, you obviously can’t use the Company Portal to get it!

The facts of the problem are simple and clear, though you would not believe it from the number of times we have been asked to explain and provide logs:

  • After Autopilot completes, Company Portal is present and working.
  • Some short time later, it has disappeared from the user’s Start menu.
  • If you run Get-AppXPackage as the user, the Company Portal is not listed. However, if you log on as an admin and run Get-AppXPackage -AllUsers, then the portal is shown as installed and available, but only for the admin user (see the sketch after this list).
  • The AppX event log does not show any obvious relevant events.
  • It seems to happen in episodes. And it seems to happen sometimes and not others.
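
For illustration, the two checks look like this (assuming the package name is Microsoft.CompanyPortal):

# Run as the affected user: nothing is returned
Get-AppxPackage -Name Microsoft.CompanyPortal

# Run elevated as an administrator: the package appears, but only for the admin profile
Get-AppxPackage -Name Microsoft.CompanyPortal -AllUsers |
    Select-Object Name, Version, PackageUserInformation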

We have been asked repeatedly to provide logs: Autopilot logs and One Data Collector logs. But, obviously, if you gather logs before it has disappeared, then there is nothing to see. If you gather logs after it has disappeared, then there is also nothing to see.

After a while, we asked Microsoft Premier Support to try to reproduce the fault themselves instead of continuously asking us for logs. Amazingly, they are unable to do this. Microsoft Premier Support does not have access to virtual machines, or physical machines, that can be used to reproduce faults in Intune and Autopilot. Just let that sink in. Premier Support is unable to attempt to reproduce a fault in Autopilot. It depends on the customer to reproduce it.

We had a long discussion with Premier Support about Offline versus Online apps. The Microsoft documentation for Autopilot recommends in several places that you should use the Offline version of Company Portal. This is counter-intuitive. Offline apps are designed, intended, to be used offline. The scenario given in Microsoft documentation is a kiosk or shared device that is not connected to the internet. The Offline app is installed by DISM in a task sequence, and is used offline. Company Portal, by definition, is of no use offline. It is used to install applications from Intune. If the device were offline, it would not connect to Intune. So why install the Offline version?

We eventually established, at least we think, that an Offline app is in some way cached by Intune; whereas an Online app is obtained directly from a Microsoft Store repository. This seems relevant to the case of the disappearing portal, but we never discovered more about the true difference.

In an early occurrence, we found an AppX event to say that the Company Portal was not installed because of a missing dependency. The missing dependency was the Microsoft Services Store Engagement app. This is the app that enables users to provide feedback. But this app is (apparently) an embedded part of Windows 10 and cannot be missing. We heard no more about this.

The Company Portal stopped disappearing for a while, and we deduced that the fault was in some way related to version updates. It occurred frequently when the version changed from 11.1.107.0 to 11.1.146.0. It has started to occur frequently again now the version is 11.1.177.0. Of course, we have no idea how it is related to the update. We don’t even really know how an update of an Offline app happens.

Finally Microsoft Premier Support has asked us to gather a SysInternals Procmon log, together with a UXTrace log. I have done a lot of troubleshooting with Procmon. It generates huge log files, of every file and registry operation, as well as some TCP operations. To use Procmon effectively, you need a way to stop it when the fault occurs. Microsoft Premier Support simply asked us to run it and stop it when the fault occurred. There are several problems with this. The first is that the user needs to run UXTrace and Procmon with elevated rights. In our environment (as in almost any production environment) the user does not have admin rights and cannot elevate. The second is that Procmon creates huge logs. You can’t just run it for an unspecified length of time, then stop it and save the log. Microsoft Premier Support were clearly unable to understand the problem of gathering the logs, let alone provide a solution. This is dismal. I would expect a competent second-line engineer to have the skills to work out a strategy for collecting logs. It is part of the basic skill of troubleshooting.

So, three months on, Microsoft Premier Support has no clue, and no practical problem-solving approach.

The thing we have found is that Premier Support seems to have no access to the people who develop and run Intune, Autopilot and Company Portal. They are just as much in the dark as you or I.