The list of network requirements for Intune and Autopilot is extensive. This post is about finding out which of the required endpoints, if any, the client cannot reach during Autopilot.
Microsoft publishes the list of required endpoints for Autopilot and Intune. There is no point in repeating the information here. The important points to note are:
- The documents also contain references to secondary services. For example, Autopilot requires access to Windows Activation, Azure AD (for authentication), the NTP time service and others.
- In an enterprise, HTTP and HTTPS traffic will usually be intercepted by a proxy. Proxy servers may inspect the traffic and require authentication. The Intune and Autopilot traffic needs either to bypass the proxy altogether, or at least to bypass inspection and authentication. This is not easy to do, and it is easy to make mistakes.
- The lists often contain wildcards as well as full host names, for example *.microsoftaik.azure.net for TPM attestation.
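Wildcards are a common place for those mistakes, so it can help to test a name against the bypass list before looking at any traffic. The following is a minimal sketch in Python, assuming shell-style wildcard matching. The function name and the bypass list are hypothetical (only the TPM attestation wildcard comes from the text above), and your proxy may interpret wildcards differently.

```python
from fnmatch import fnmatchcase

# Hypothetical bypass list for illustration. Replace it with the names
# from the Microsoft documentation that you have configured on the proxy.
BYPASS_PATTERNS = [
    "*.microsoftaik.azure.net",
    "login.microsoftonline.com",
]


def is_bypassed(hostname, patterns=BYPASS_PATTERNS):
    """Return True if hostname matches an exact name or a wildcard pattern."""
    # DNS names are case-insensitive and may carry a trailing dot.
    name = hostname.rstrip(".").lower()
    return any(fnmatchcase(name, pattern.lower()) for pattern in patterns)
```

Note that with this kind of matching the wildcard covers any host below microsoftaik.azure.net, but not the bare name microsoftaik.azure.net itself.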
As a result, you may need to troubleshoot things that should be working, but are not. You need to be able to find out what is blocked, or not connecting.
The simplest way, which everyone will be familiar with, is to use Wireshark. You can install Wireshark at the beginning of Autopilot, and use Alt+Tab to get back to it after the problem has occurred (see Autopilot and Shift+Fn10).
You may think you are looking for a dropped connection. But the place to look for that is on the firewall. In reality the packet capture will require a lot more work to analyse than that. Almost all the traffic is encrypted HTTPS. You cannot see the contents of the transactions. What you will see (roughly) is:
- A DNS query to resolve one of the required endpoints
- A DNS response resolving the query to an IP address corresponding to a traffic manager, load balancer or CDN
- TCP SYN
- TCP SYN, ACK
- TCP ACK
- Client Hello
- Server Hello, Certificate, Server Key Exchange, Server Hello Done
- Client Key Exchange, Change Cipher Spec, Encrypted Handshake Message
- Application Data
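To check a single endpoint without reading a whole capture, the same stages can be exercised deliberately. The following is a minimal sketch in Python, assuming you can run Python on a test machine on the same network segment (it is not available during OOBE). The function name `probe` is hypothetical. It connects directly rather than through an explicit proxy, and a result of "tls" only says that the handshake or certificate validation failed, not why.

```python
import socket
import ssl


def probe(host, port=443, timeout=5.0):
    """Return the first stage that fails: 'dns', 'tcp' or 'tls'; else 'ok'."""
    # Stage 1: DNS query and response.
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns"
    address = infos[0][4][:2]  # (ip, port) of the first answer

    # Stage 2: TCP three-way handshake.
    try:
        sock = socket.create_connection(address, timeout=timeout)
    except OSError:
        return "tcp"

    # Stage 3: TLS handshake, validating the certificate for the host name.
    try:
        context = ssl.create_default_context()
        with context.wrap_socket(sock, server_hostname=host):
            return "ok"
    except (ssl.SSLError, OSError):
        return "tls"
    finally:
        sock.close()
```

For example, `probe("login.microsoftonline.com")` should return "ok" on an unrestricted connection. If a proxy inspects the traffic and the client does not trust the proxy certificate, I would expect "tls".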
Azure Traffic Manager is a DNS service that distributes client traffic to the closest available Azure endpoint. If the client makes a DNS query for (for example) fe2cr.update.microsoft.com, it will obtain the IP address of an endpoint corresponding to fe2cr.update.msft.com.trafficmanager.net.
This means you can’t expect to see traffic to and from the endpoints contained in the lists of endpoints. A query for account.live.com will result in traffic to and from l-0013.l-msedge.net.
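To relate the addresses in the capture back to the documented names, you can build a lookup table from the list of endpoints. The following is a minimal sketch in Python, using the standard resolver and IPv4 only. The function name `map_addresses` is hypothetical. Traffic Manager answers depend on location and time, so the table is only a guide, and it should be built on the same network and close to the time of the capture.

```python
import socket


def map_addresses(names):
    """Map each resolved IPv4 address to (queried name, canonical name)."""
    table = {}
    for name in names:
        try:
            canonical, _aliases, addresses = socket.gethostbyname_ex(name)
        except OSError:
            # The name did not resolve; leave it out of the table.
            continue
        for address in addresses:
            table[address] = (name, canonical)
    return table
```

With the example above, an entry in the table would pair the address seen in the capture with both account.live.com and the l-msedge.net name behind it.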
If you use a display filter of “dns”, you will see all the queries made. You can check on the proxy that all those names bypass inspection and authentication. If they do not, you may expect to see:
- No application data after a handshake
- A TCP FIN followed by another attempt to connect with a new TCP SYN.
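If the capture is large, the first of these symptoms can be searched for rather than spotted by eye. The following is a minimal sketch in Python, assuming you have exported the capture as pairs of TCP stream number and message label (the labels being the ones Wireshark shows, such as "Client Hello"). The function name and the input shape are my assumptions, not a Wireshark format. Treat it as a heuristic for TLS 1.2 only: with TLS 1.3, encrypted handshake records also appear as "Application Data".

```python
from collections import defaultdict


def stalled_streams(records):
    """Return ids of streams that started TLS but carried no application data.

    records: iterable of (stream_id, label) pairs. This shape is an
    assumption made for the sketch.
    """
    seen = defaultdict(set)
    for stream_id, label in records:
        seen[stream_id].add(label)
    return sorted(
        stream_id
        for stream_id, labels in seen.items()
        if "Client Hello" in labels and "Application Data" not in labels
    )
```

The streams returned are the ones to look up on the proxy or firewall first.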
As the Autopilot traffic is nearly all HTTPS, you could also use Fiddler to capture the exchange between client and server. When you set up Fiddler, you enable HTTPS decryption and choose to trust the Fiddler root certificate. This means that Fiddler can act as a proxy and decrypt the traffic. As a result, you should be able to see the content of the transactions.
I recently had a case where Autopilot failed at the start of the Account Setup phase. As it happens, I could see in the Microsoft-Windows-Shell-Core/Operational event log that an “AAD user token request failed”, but not why it failed. I wanted to do a packet capture during the time this request failed. But you cannot break into Autopilot during the Account Setup phase and, even if you could, it would be too late to start a capture. Ideally you want to start a capture at a stopped point in the process, so that you have time to set it up before continuing.
This was a case where the Network Shell utility (netsh.exe) can help. Netsh can record a trace file (*.etl) using different providers. Most importantly, it is not limited to packet capture. It records the activity of each provider specified in the command. Providers are bundled up in “scenarios” for different types of network problem. The “InternetClient” scenario, for example, contains the following providers that we may be interested in:
- Web-Http
- WinInet
- WebIO
- WinHttp
- DNS-Client
- TCPIP
A unique advantage of Netsh is the “Persistent” parameter. This enables the trace to continue through a restart. By making it persistent, we can start a trace in the Device Setup phase, continue through a restart into the Account Setup phase, and then stop the trace when the fault has occurred.
A difficulty to overcome is that the trace needs to be started and stopped in the same security context. In Device Setup, the console user is DefaultUser0 with admin rights. But this account does not persist into Account Setup.
To get around this, we need to use PsExec. This is the SysInternals utility that enables me (among many other things) to run a command in the system context. If I start the trace in the system context, then I can do the same to stop it. So the full plan of attack is:
- Press Shift+F10 (Shift+Fn+F10 on some laptops) to open a command prompt.
- In the console, download PsExec or copy it from a USB stick.
- Run: “psexec.exe -s -i cmd.exe”. This brings up a new console running in the System context.
- In the new console, run: “netsh trace start capture=yes report=yes persistent=yes traceFile=[path to trace.etl] scenario=InternetClient”. Depending on the circumstances, you can use maxSize to limit the size of the trace file. By default, the logging is circular.
- Continue through to when the fault occurs.
- In this case, because Autopilot was failing in Account Setup, I needed to use the “Continue” option in the Enrollment Status Page (ESP) profile. This let me finish and get back to a user session, where I could run PsExec again and stop the trace with “netsh trace stop”.
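If you want to prepare the start command in advance, for example to keep alongside PsExec on the USB stick, it can be generated rather than typed. The following is a minimal sketch in Python, run on an admin workstation and not on the device. The function name and the example path are hypothetical; the netsh parameters are the ones used in the steps above, plus maxSize (in MB).

```python
def netsh_trace_start(trace_file, scenario="InternetClient", max_size_mb=None):
    """Build the 'netsh trace start' command line used in the steps above."""
    parts = [
        "netsh", "trace", "start",
        "capture=yes",
        "report=yes",
        "persistent=yes",  # keep tracing across the restart
        f'traceFile="{trace_file}"',
        f"scenario={scenario}",
    ]
    if max_size_mb is not None:
        parts.append(f"maxSize={max_size_mb}")
    return " ".join(parts)


# Hypothetical path, for illustration only.
print(netsh_trace_start(r"C:\Temp\trace.etl", max_size_mb=512))
```

The matching stop command needs no parameters, but it does need to be run in the same System context.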
A significant disadvantage of Netsh is that the best way to read the ETL file is with Microsoft Message Analyzer. But this is now obsolete and no longer available to download. There are utilities to convert the ETL to pcap, so it can be read in Wireshark, but that loses the benefit of the network provider logging, like DNS-Client and WinInet. I keep a copy of Microsoft Message Analyzer just for this.
So, this post has given three tools and methods to investigate a network fault causing Autopilot to fail. You will need to adapt the methods to the particular circumstances of the case.