Docker Overlay Network Details

Docker Swarm uses an overlay network for communication between containers on different hosts, and for load balancing incoming traffic to a service. On Windows Server 2016 before Windows Update KB4015217 this overlay network is not supported. After KB4015217 the communication between containers works, but the routing mesh that load balances incoming traffic is not supported. Now, with Windows Server 2016 version 1709, the routing mesh works as well. The purpose of this post is to take an in-depth look at how the overlay network and the routing mesh work in practice.

Testing environment

This is my environment for testing:

  1. Two hosts with Windows Server 2016 version 1709 on the same vnet in Azure
  2. Both hosts with the Hyper-V role and the Windows Containers feature
  3. Both hosts running experimental Docker 17.10
  4. A Docker Swarm service with three containers, running the image microsoft/iis:windowsservercore-1709, with a published port 80
  5. A third host running Portainer and the new Project Honolulu server management gateway.

I verified beforehand that I could reach any container on either host, on port 80, from an external client. I also verified that I could ping and telnet between containers.

Theory

The Docker documentation describes how this works on Linux: Designing Scalable, Portable Docker Container Networks. Containers are assigned to a Virtual Extensible LAN (VXLAN) and traffic between containers on different hosts is encapsulated in UDP packets on port 4789. The routing mesh is implemented by Linux IP Virtual Server (IPVS) layer 4 switching.

On Windows, it is a bit more difficult to piece together the documentation. This is because containers on Windows are just part of a swathe of Azure, Hyper-V and Windows technologies.

Software Defined Networking (SDN) comes from implementing multi-tenant architectures in Azure, where VMs on different hosts, in different datacentres, need to communicate securely and in isolation from other tenants. This is not very different from containers in different Swarm services communicating with each other but not with other services.

VXLAN is a generic standard documented in RFC 7348. There are many different diagrams of VXLAN, but essentially a Layer 2 Ethernet frame between containers on different hosts is encapsulated in a UDP packet and sent across the host network.
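As a concrete illustration, the 8-byte VXLAN header defined in RFC 7348 (a flags byte with 0x08 meaning "VNI present", reserved bytes, and a 24-bit VXLAN Network Identifier) can be built and parsed in a few lines. This is just a sketch of the header format; VNI 4096 is the one Docker assigns to the default ingress network, as `docker network inspect` shows later.

```python
import struct

def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header: flags (0x08 = VNI present),
    3 reserved bytes, 24-bit VNI, 1 reserved byte."""
    assert 0 <= vni < 2 ** 24
    return struct.pack("!II", 0x08 << 24, vni << 8)

def unpack_vxlan_vni(header: bytes) -> int:
    """Extract the VNI from an 8-byte VXLAN header."""
    flags_word, vni_word = struct.unpack("!II", header)
    assert (flags_word >> 24) & 0x08, "VNI-present flag not set"
    return vni_word >> 8

# Docker's default ingress network uses VNI 4096.
hdr = pack_vxlan_header(4096)
print(hdr.hex(), unpack_vxlan_vni(hdr))  # 0800000000100000 4096
```

On the wire this header sits between the outer UDP header (destination port 4789) and the encapsulated inner Ethernet frame.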

Implementation

When we initialise the Docker Swarm, a default overlay network is created, called “ingress”. We can see this with docker network ls.

NETWORK ID     NAME      DRIVER    SCOPE
xio0654aj01a   ingress   overlay   swarm
5bcf2a6fe500   nat       nat       local
cef0ceb618b6   none      null      local

This is in addition to the default NAT network created when we add the Containers feature. With docker network inspect ingress we can see the details of this network:

  • It has an ID of xio0654aj01a6x60kfnoe4r12 and a subnet of 10.255.0.0/16
  • Each container on the network has an endpoint ID, an IP address on the subnet, and a unique MAC address
  • Each node has one ingress-endpoint, again with an endpoint ID, an IP address and a MAC address:
"ConfigOnly": false,
"Containers": {
    "206fe3c22aa9682f6db7c0ff2d2665ea647d2d2825218a9a1a6ee6bda4c80de7": {
        "Name": "web.2.03uu9bab6n416jqi0reg59ohh",
        "EndpointID": "136a5e8a952b7bc3da6b395e9ff3fb138cd93c97e3fafda1299f804f9cbe2bf1",
        "MacAddress": "00:15:5d:71:af:d8",
        "IPv4Address": "10.255.0.6/16",
        "IPv6Address": ""
    },
    "92d6b5d2c353d43dad6e072e25865bdf91003b069fd3a527d953b9a62384f0a0": {
        "Name": "web.3.nzxp6uhcvxhejp2iodd29l3gu",
        "EndpointID": "b1937b9d22d2aa9881d0e45b16bc7031b2d4d07d4d0059531d64a6ade5a5242e",
        "MacAddress": "00:15:5d:71:a4:c5",
        "IPv4Address": "10.255.0.7/16",
        "IPv6Address": ""
    },
    "ingress-sbox": {
        "Name": "ingress-endpoint",
        "EndpointID": "7037a8b3628c9d5d49730472c37a800e4d1882f0cb125ec75e75477c02104526",
        "MacAddress": "00:15:5d:71:a7:dd",
        "IPv4Address": "10.255.0.2/16",
        "IPv6Address": ""
    }
},

In this case there are two containers on the host. If we look on the other host, we see the third container (of three replicas in the service) and a different endpoint.
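Since docker network inspect emits JSON, the endpoint table can also be pulled out programmatically. A sketch in Python, run against an abridged copy of the output rather than a live Docker host, so the example is self-contained:

```python
import json

# Abridged `docker network inspect ingress` output; the real command
# returns a JSON array of network objects.
inspect_output = """
[
  {
    "Name": "ingress",
    "Id": "xio0654aj01a6x60kfnoe4r12",
    "Containers": {
      "206fe3c22aa9682f6db7c0ff2d2665ea647d2d2825218a9a1a6ee6bda4c80de7": {
        "Name": "web.2.03uu9bab6n416jqi0reg59ohh",
        "MacAddress": "00:15:5d:71:af:d8",
        "IPv4Address": "10.255.0.6/16"
      },
      "ingress-sbox": {
        "Name": "ingress-endpoint",
        "MacAddress": "00:15:5d:71:a7:dd",
        "IPv4Address": "10.255.0.2/16"
      }
    }
  }
]
"""

network = json.loads(inspect_output)[0]
for endpoint in network["Containers"].values():
    # One line per endpoint: name, IP on the overlay subnet, MAC
    print(f'{endpoint["Name"]:32} {endpoint["IPv4Address"]:15} {endpoint["MacAddress"]}')
```

On a live node the same loop could read from `docker network inspect ingress` piped to a file.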

We can also see the ingress network, the web service and the containers in Portainer, a simple management GUI for containers:

Docker Network Ingress

If we look inside a container, with docker exec -it web.2.03uu9bab6n416jqi0reg59ohh powershell and ipconfig /all, we can see that the endpoint ID is the ID of the NIC, and the IP address and MAC address also belong to this NIC:

Ethernet adapter vEthernet (136a5e8a952b7bc3da6b395e9ff3fb138cd93c97e3fafda1299f804f9cbe2bf1):

   Connection-specific DNS Suffix  . : nehng5n4bb2ejkdqdqbqdv4dxe.zx.internal.cloudapp.net
   Description . . . . . . . . . . . : Hyper-V Virtual Ethernet Adapter #5
   Physical Address. . . . . . . . . : 00-15-5D-71-AF-D8
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes
   Link-local IPv6 Address . . . . . : fe80::7dfd:d3f7:6350:759d%32(Preferred)
   IPv4 Address. . . . . . . . . . . : 10.255.0.6(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.0.0
   Default Gateway . . . . . . . . . : 10.255.0.1
   DNS Servers . . . . . . . . . . . : 10.255.0.1
                                       168.63.129.16
   NetBIOS over Tcpip. . . . . . . . : Disabled

To see how the ingress network is implemented, we need to look at the host networking configuration. With Get-VMSwitch we can see that there is a Hyper-V virtual switch with the same name as the Docker ingress network ID:

Name                       SwitchType  NetAdapterInterfaceDescription
----                       ----------  ------------------------------
nat                        Internal
xio0654aj01a6x60kfnoe4r12  External    Microsoft Hyper-V Network Adapter #5

With Get-VMSwitchExtension -VMSwitchName xio0654aj01a6x60kfnoe4r12 we can see that the switch has a Microsoft Azure VFP Switch Extension:

Id : E9B59CFA-2BE1-4B21-828F-B6FBDBDDC017
Name : Microsoft Azure VFP Switch Extension

If we do ipconfig /all on the host we see two network adapters. The primary host network adapter:

Ethernet adapter vEthernet (Ethernet 5)

and an adapter attached to the Docker NAT network:

Ethernet adapter vEthernet (nat)

But if we run Get-NetAdapter we see three:

Name                     InterfaceDescription                  ifIndex  Status  MacAddress         LinkSpeed
----                     --------------------                  -------  ------  ----------         ---------
vEthernet (Ethernet 5)   Hyper-V Virtual Ethernet Adapter #2   16       Up      00-22-48-01-00-03  40 Gbps
vEthernet (nat)          Hyper-V Virtual Ethernet Adapter      3        Up      00-15-5D-6A-D6-E2  10 Gbps
Ethernet 5               Microsoft Hyper-V Network Adapter #5  11       Up      00-22-48-01-00-03  40 Gbps

The extra one, named “Ethernet 5” with Interface Description “Microsoft Hyper-V Network Adapter #5”, with the same MAC address as the primary host adapter but no IP address, is the ingress endpoint on the overlay network.

We can see this in the Project Honolulu browser-based server manager.

The adapters:

Honolulu Docker1 Adapters

The Hyper-V ingress network switch:

Honolulu Docker1 Ingress Switch

Trace: incoming

I previously took two traces of the traffic: first into a container from a remote client, and second between containers. With Microsoft Message Analyzer we can see what happens.

Here is the flow of an HTTP request on port 80 from a remote client to one of the swarm nodes, and load balanced to a container on the same host.

In the first message a TCP packet arrives at the IP address of the host adapter:

3526  2017-11-08T17:02:15.9863839  TCP
      Flags: ......S., SrcPort: 53711, DstPort: HTTP(80), Length: 0, Seq Range: 1862583515 - 1862583516, Ack: 0, Win: 65535 (negotiating scale factor: 3)

In the second message, the packet is received by the Hyper-V switch for the overlay network:

3527  2017-11-08T17:02:15.9863920  Microsoft_Windows_Hyper_V_VmSwitch
      NBL 0xFFFF880A4600B370 received from Nic /DEVICE/{DAB8937D-9AD5-460E-8652-C2E152CCE573} (Friendly Name: Microsoft Hyper-V Network Adapter #5) in switch A404BC57-741B-4C79-8BA5-1D7D3FDA92C1 (Friendly Name: xio0654aj01a6x60kfnoe4r12)

In the third message the packet is routed to the container adapter:

3591  2017-11-08T17:02:15.9865906  Microsoft_Windows_Hyper_V_VmSwitch
      NBL 0xFFFF880A492B1030 routed from Nic 533EF66B-A5F3-4926-A1EE-79AF499F85C7 (Friendly Name: Ethernet 5) to Nic F3EA5A0C-2253-472F-8FFA-3467568C6D00 (Friendly Name: 136a5e8a952b7bc3da6b395e9ff3fb138cd93c97e3fafda1299f804f9cbe2bf1) on switch A404BC57-741B-4C79-8BA5-1D7D3FDA92C1 (Friendly Name: xio0654aj01a6x60kfnoe4r12)

In the fourth message, the packet is received by the container adapter:

3592  2017-11-08T17:02:15.9865932  Microsoft_Windows_Hyper_V_VmSwitch
      NBL 0xFFFF880A492B1030 delivered to Nic F3EA5A0C-2253-472F-8FFA-3467568C6D00 (Friendly Name: 136a5e8a952b7bc3da6b395e9ff3fb138cd93c97e3fafda1299f804f9cbe2bf1) in switch A404BC57-741B-4C79-8BA5-1D7D3FDA92C1 (Friendly Name: xio0654aj01a6x60kfnoe4r12)

And in the fifth message the first packet is delivered:

3593  2017-11-08T17:02:15.9866168  TCP
      Flags: ......S., SrcPort: 65408, DstPort: HTTP(80), Length: 0, Seq Range: 1862583515 - 1862583516, Ack: 0, Win: 65535 (negotiating scale factor: 3)

You will notice that the sent packet is from port 53711 to port 80, but the arrived packet is from port 65408 to port 80. You can’t see it in this summary of the message, but the sent packet is from the client IP address 92.234.68.72 to the host IP address 10.0.0.4, while the arrived packet is from the ingress-endpoint IP address 10.255.0.2 to the container IP address 10.255.0.6. The virtual switch has rewritten the source port and address of the packet. The container sends a reply packet to the ingress-endpoint, where the switch again rewrites the source and destination addresses to send the reply back to the client.
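Conceptually, the switch is keeping a translation table, much like a NAT device. The toy sketch below is not the actual VFP implementation (the port allocation scheme here is invented for illustration); it only reproduces the rewrite observed in the trace:

```python
class IngressNat:
    """Toy translation table for the ingress-endpoint source rewrite."""

    def __init__(self, endpoint_ip="10.255.0.2", first_port=65408):
        self.endpoint_ip = endpoint_ip
        self.next_port = first_port       # hypothetical allocation scheme
        self.sessions = {}                # masqueraded port -> (client ip, port)

    def inbound(self, src_ip, src_port):
        """Client -> container: masquerade the source behind the endpoint."""
        port = self.next_port
        self.next_port -= 1
        self.sessions[port] = (src_ip, src_port)
        return self.endpoint_ip, port

    def outbound(self, dst_port):
        """Container -> client: restore the original client address."""
        return self.sessions[dst_port]

nat = IngressNat()
print(nat.inbound("92.234.68.72", 53711))  # ('10.255.0.2', 65408)
print(nat.outbound(65408))                 # ('92.234.68.72', 53711)
```

The two printed tuples correspond to the rewritten SYN seen arriving at the container, and the reply being mapped back to the remote client.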

From the point of view of the host, there is:

  • no route to the ingress network 10.255.0.0/16
  • no ARP cache addresses for endpoints on the ingress network
  • no host process listening on port 80
  • a virtual adapter (Friendly Name: Microsoft Hyper-V Network Adapter #5), with the same MAC address as the primary adapter (00-22-48-01-00-03), but with no IP address, attached to a virtual switch (Friendly Name: xio0654aj01a6x60kfnoe4r12), which is the switch for the ingress network.

The virtual switch intercepts the request on the published port 80 (using the Azure Virtual Filtering Platform switch extension?) and forwards it to one of the containers.

From the point of view of the container, there is:

  • no route to the host network 10.0.0.0/24
  • no ARP cache address for endpoints on the host network
  • an ARP cache address for the ingress-endpoint 10.255.0.2, with the same MAC address as the primary host network adapter (00-22-48-01-00-03)
  • a process (web server) listening on port 80
  • a virtual adapter (Friendly Name: 136a5e8a952b7bc3da6b395e9ff3fb138cd93c97e3fafda1299f804f9cbe2bf1) attached to the same virtual switch (Friendly Name: xio0654aj01a6x60kfnoe4r12) as the phantom adapter on the host.

The virtual switch receives the reply from the container and forwards it to the MAC address of the ingress-endpoint, which is the same as the MAC address of the primary network adapter of the host. The host network adapter sends the reply to the remote client.

This trace has been for incoming traffic from an external client. The next trace is for inter-container traffic across hosts.

Trace: inter-container

Here is the flow of a ping from a container on one host to a container on the other. The trace is being performed on the receiving host. We need to dissect each packet to see what happens.

The first packet arrives, an echo (ping) request. This is the content of the packet:

8852  2017-11-08T18:34:34.8066887
      ICMP      Echo Operation
      ICMP      Echo Request
      IPv4      Next Protocol: ICMP, Packet ID: 29796, Total Length: 60
      Ethernet  Type: Internet IP (IPv4)
      VXLAN     VXLAN Frame
      UDP       SrcPort: 1085, DstPort: VXLAN(4789), Length: 90
      IPv4      Next Protocol: UDP, Packet ID: 30052, Total Length: 110
      Ethernet  Type: Internet IP (IPv4)

From inside to outside, the packet is structured as follows:

  • ICMP Echo Request
  • IPv4 protocol ICMP, from source address 10.255.0.5 (the remote container) to destination address 10.255.0.7 (the local container)
  • Ethernet from source MAC address 00-15-5D-BC-F9-AA (the remote container) to destination MAC address 00-15-5D-71-A4-C5 (the local container). These are Hyper-V MAC addresses on the ingress network. The host network does not know anything about these IP or MAC addresses.
  • (everything above this line is the original packet as sent by the remote container)
  • VXLAN header with network identifier 4096. This is the VXLAN ID shown by docker network inspect ingress
  • Outer UDP header, from source port 1085 to destination port 4789 (the standard port for VXLAN traffic)
  • Outer IPv4 header, protocol UDP, from source address 10.0.0.5 (the remote host) to destination address 10.0.0.4 (the local host)
  • Outer Ethernet header, from source MAC address 00-22-48-01-9E-11 (the primary adapter of the remote host) to destination MAC address 00-22-48-01-00-03 (the primary adapter of the local host)
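The lengths reported in the trace are consistent with this layering; a quick arithmetic cross-check of the fixed header sizes (assuming no VLAN tag and no IPv4 options):

```python
# Each header the encapsulation adds, in bytes.
ETHERNET_HEADER = 14   # no 802.1Q tag
VXLAN_HEADER = 8
UDP_HEADER = 8
IPV4_HEADER = 20       # no options

inner_ipv4_total = 60                                 # inner "Total Length: 60"
inner_frame = inner_ipv4_total + ETHERNET_HEADER      # inner Ethernet frame
udp_length = UDP_HEADER + VXLAN_HEADER + inner_frame  # outer UDP "Length" field
outer_ipv4_total = udp_length + IPV4_HEADER           # outer "Total Length" field

print(inner_frame, udp_length, outer_ipv4_total)  # 74 90 110
```

The computed 90 and 110 match the outer UDP Length and outer IPv4 Total Length shown in the trace above, confirming the 50 bytes of VXLAN overhead per packet.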

Following the flow of messages, the packet is received by the Hyper-V switch for the overlay network:

8853  2017-11-08T18:34:34.8066930  Microsoft_Windows_Hyper_V_VmSwitch
      NBL 0xFFFF880A4626D6A0 received from Nic /DEVICE/{DAB8937D-9AD5-460E-8652-C2E152CCE573} (Friendly Name: Microsoft Hyper-V Network Adapter #5) in switch A404BC57-741B-4C79-8BA5-1D7D3FDA92C1 (Friendly Name: xio0654aj01a6x60kfnoe4r12)

The packet is routed to the container adapter:

8867  2017-11-08T18:34:34.8067246  Microsoft_Windows_Hyper_V_VmSwitch
      NBL 0xFFFF880A4626D6A0 routed from Nic /DEVICE/{DAB8937D-9AD5-460E-8652-C2E152CCE573} (Friendly Name: Microsoft Hyper-V Network Adapter #5) to Nic 0330EF2B-74AB-4E06-A32D-86DA92145374 (Friendly Name: b1937b9d22d2aa9881d0e45b16bc7031b2d4d07d4d0059531d64a6ade5a5242e) on switch A404BC57-741B-4C79-8BA5-1D7D3FDA92C1 (Friendly Name: xio0654aj01a6x60kfnoe4r12)

The packet is received by the container adapter:

8868  2017-11-08T18:34:34.8067269  Microsoft_Windows_Hyper_V_VmSwitch
      NBL 0xFFFF880A4626D6A0 delivered to Nic 0330EF2B-74AB-4E06-A32D-86DA92145374 (Friendly Name: b1937b9d22d2aa9881d0e45b16bc7031b2d4d07d4d0059531d64a6ade5a5242e) in switch A404BC57-741B-4C79-8BA5-1D7D3FDA92C1 (Friendly Name: xio0654aj01a6x60kfnoe4r12)

The original packet is delivered, minus the VXLAN header and UDP wrapper:

8869  2017-11-08T18:34:34.8067296
      ICMP      Echo Operation
      ICMP      Echo Request
      IPv4      Next Protocol: ICMP, Packet ID: 29796, Total Length: 60
      Ethernet  Type: Internet IP (IPv4)

You can see that the switch has taken about 0.04 milliseconds (roughly 40 microseconds) to process the packet, from arrival at message 8852 to delivery at message 8869.

Trace: incoming across hosts

With the routing mesh, incoming traffic from a remote client to any node in the swarm can be load balanced and routed to a container on a different node. The routing mesh handles the incoming and outgoing client traffic, and the overlay network carries the traffic between the nodes.

In this example the incoming packet arrives at host Docker2 and is load balanced to a container running on host Docker1. The trace is running on Docker1, receiving the packet from Docker2.

This time the incoming TCP packet has the same VXLAN and UDP headers as inter-container traffic (when it is across hosts):

11165  2017-11-08T17:02:50.3348890
       TCP       Flags: ......S., SrcPort: 65408, DstPort: HTTP(80), Length: 0, Seq Range: 4237068666 - 4237068667, Ack: 0, Win: 29200 (negotiating scale factor: 7)
       IPv4      Next Protocol: TCP, Packet ID: 41609, Total Length: 60
       Ethernet  Type: Internet IP (IPv4)
       VXLAN     VXLAN Frame
       UDP       SrcPort: 40558, DstPort: VXLAN(4789), Length: 90
       IPv4      Next Protocol: UDP, Packet ID: 41865, Total Length: 110
       Ethernet  Type: Internet IP (IPv4)

The UDP and VXLAN headers are stripped off by the switch, routed and presented to the container as standard TCP, coming from the ingress-endpoint on the other host with address 10.255.0.3:

11186  2017-11-08T17:02:50.3349520
       TCP       Flags: ......S., SrcPort: 65408, DstPort: HTTP(80), Length: 0, Seq Range: 4237068666 - 4237068667, Ack: 0, Win: 29200 (negotiating scale factor: 7)
       IPv4      Next Protocol: TCP, Packet ID: 41609, Total Length: 60
       Ethernet  Type: Internet IP (IPv4)

This time the container makes an ARP request to find the MAC address of the ingress-endpoint on the other host that sent it the packet:

11187  2017-11-08T17:02:50.3350373
       ARP       REQUEST, SenderIP: 10.255.0.7, TargetIP: 10.255.0.3
       Ethernet  Type: ARP

The ARP request is intercepted by the VFP extension in the switch and dropped:

11192  2017-11-08T17:02:50.3350578  Microsoft_Windows_Hyper_V_VmSwitch
       NBLs were dropped by extension {24C70E26-D4C4-42B9-854A-0A4B9BA2C286}-{E9B59CFA-2BE1-4B21-828F-B6FBDBDDC017}-0000 (Friendly Name: Virtual Filtering Platform VMSwitch Extension) in switch A404BC57-741B-4C79-8BA5-1D7D3FDA92C1 (Friendly Name: xio0654aj01a6x60kfnoe4r12). Source Nic 0330EF2B-74AB-4E06-A32D-86DA92145374 (Friendly Name: b1937b9d22d2aa9881d0e45b16bc7031b2d4d07d4d0059531d64a6ade5a5242e), Reason Outgoing packet dropped by VFP

The switch fabricates an ARP reply:

11200  2017-11-08T17:02:50.3352219
       ARP       REPLY, SenderIP: 10.255.0.3, TargetIP: 10.255.0.7
       Ethernet  Type: ARP

The container replies to the SYN with a SYN-ACK:

11201  2017-11-08T17:02:50.3352290
       TCP       Flags: ...A..S., SrcPort: HTTP(80), DstPort: 65408, Length: 0, Seq Range: 3626128581 - 3626128582, Ack: 4237068667, Win: 65535 (negotiating scale factor: 8)
       IPv4      Next Protocol: TCP, Packet ID: 17960, Total Length: 52
       Ethernet  Type: Internet IP (IPv4)

This is routed by the virtual switch and emerges at the host adapter as a reply, wrapped in the VXLAN and UDP headers:

11217  2017-11-08T17:02:50.3352851
       TCP       Flags: ...A..S., SrcPort: HTTP(80), DstPort: 65408, Length: 0, Seq Range: 3626128581 - 3626128582, Ack: 4237068667, Win: 65535 (negotiating scale factor: 8)
       IPv4      Next Protocol: TCP, Packet ID: 17960, Total Length: 52
       Ethernet  Type: Internet IP (IPv4)
       VXLAN     VXLAN Frame
       UDP       SrcPort: 37734, DstPort: VXLAN(4789), Length: 82
       IPv4      Next Protocol: UDP, Packet ID: 18216, Total Length: 102
       Ethernet  Type: Internet IP (IPv4)

This reply is forwarded across the host network to the other host. The virtual switch on the other host rewrites the reply and returns it to the remote client. This is not shown here, but it is the same as the reply in the first trace above.

So there we have it: Windows Server 2016 version 1709 with the Docker overlay network and routing mesh, using Software Defined Networking, Hyper-V switches and the Azure Virtual Filtering Platform virtual switch extension.

Docker Swarm Networking

Docker Swarm enables containers to operate together to provide a service, across different nodes in a cluster. It uses an overlay network for communication between containers on different hosts. It also supports a routing mesh, which load-balances and routes incoming connections to the containers. On Windows Server 2016 before the latest version this routing mesh is not supported. Now it is, with the release of version 1709, so we can see how it all works.

It uses an overlay network for communication between containers providing the same service. You can read an excellent description of it here, in the Docker Reference Architecture: Designing Scalable, Portable Docker Container Networks. The overlay network is implemented as a Virtual Extensible LAN (VXLAN) stretched in software across the underlying network connecting the hosts.

The network has a built-in routing mesh that directs incoming traffic on a published port, on any node, to any container running the service on any node. This diagram illustrates the routing mesh on Linux, where it is implemented in the kernel by the IP Virtual Server (IPVS) component:

Docker Reference Architecture: Designing Scalable, Portable Docker Container Networks – routing mesh diagram

On Windows Server 2016 version 1607 the routing mesh does not work. Now, with the new Windows Server 2016 version 1709 it does.

Microsoft introduced support for Docker Swarm with overlay networks in April 2017, with KB4015217. The document Getting Started with Swarm Mode describes it, but notes at the bottom that the routing mesh is not supported. You can still publish a port, but that limits you to either one container per published port on each host, or dynamic ports behind a separate load balancer.

To get the terms straight:

  • Overlay network: a VXLAN shared by containers on different hosts, transported by the underlying host network
  • Routing mesh: load balanced routing of incoming traffic on published ports to the destination port on one of the containers in the service
  • Ingress mode: the port publishing mode that uses the routing mesh, instead of direct connection to ports on the container host (host mode or global mode)
  • "Ingress": the name of the default overlay-type network created by Docker, just as "nat" is the name of the default NAT-type network; but you can create your own overlay network.
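In ingress mode, the behaviour can be pictured as a round-robin balancer sitting in front of the service tasks: a connection arriving on the published port of any node is handed to the next task in turn, wherever it runs. A toy sketch (the real mesh is implemented by the vSwitch/VFP on Windows and IPVS on Linux, not in user code; the addresses are the ones from this test setup):

```python
from itertools import cycle

# Overlay-network addresses of the three service tasks (web.1 .. web.3).
service_tasks = ["10.255.0.5", "10.255.0.6", "10.255.0.7"]
balancer = cycle(service_tasks)

def route(node_ip: str, published_port: int) -> str:
    """Any node, same published port -> next container in round robin."""
    backend = next(balancer)
    return f"{node_ip}:{published_port} -> {backend}:80"

print(route("10.0.0.4", 80))  # 10.0.0.4:80 -> 10.255.0.5:80
print(route("10.0.0.5", 80))  # 10.0.0.5:80 -> 10.255.0.6:80
```

Note that the node a connection arrives at has no bearing on which container serves it; that is exactly what the traces below demonstrate.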

Support for the routing mesh and ingress mode has arrived in Windows Server 2016 version 1709 and is now available in Azure too. It is still at an early stage. It requires:

  • A new installation of Windows Server 2016 version 1709
  • Docker EE version 17.10, still in Preview.

To install Docker EE Preview, run:

Install-Module DockerProvider
Install-Package Docker -ProviderName DockerProvider -RequiredVersion Preview -Force

To test this, I created a Docker Swarm service with three replicas on two nodes. I am using the microsoft/iis:windowsservercore-1709 image to have something to connect to:

docker service create --name web --replicas 3 --publish mode=ingress,target=80,published=80 microsoft/iis:windowsservercore-1709

The service is created by default on the "ingress" overlay network, because it has a published port.

With three containers on two nodes, I should be able to see:

  • Both nodes responding to a connection on port 80
  • Two containers servicing the same published port, on one node
  • One container servicing port 80 on the other node
  • Traffic arriving at a node, and going to a container either on the same node, or crossing to a container on the other node
  • All containers able to communicate with each other, on the same Layer 2 switched network.

I am using Portainer as a simple GUI to view the Docker Swarm service. Here is the web service:

Portainer Service List

and the service details:

Portainer Service Details

with the service overlay network:

Portainer Service Network

Using Portainer or the Docker command line (docker service inspect web and docker network inspect ingress), I can see that the containers are on a subnet of 10.255.0.0/16. The network also has one "ingress-endpoint" for each node, with addresses of 10.255.0.2 and .3.

First let’s check that the routing mesh works. Here you can see four different connections (click to see details):

Docker 1 to web.2 – container on same host;

Docker 1 Container 2 crop

Docker 1 to web.3 – different container on same host;

Docker 1 Container 3 crop

Docker 2 to web.1 – container on the other host;

Docker 2 Container 1 crop

Docker 2 to web.3 – container on different host;

Docker 2 Container 3 crop

If I run a network trace I can see how it works. Below is the conversation between client and container, where the incoming request is routed to a container on the same node:

Connection to Container on Same Host

It consists of exact pairs of packets. If we take a look at one pair:

Source IP      Source MAC         Destination IP  Destination MAC    TCP
92.234.68.72   12:34:56:78:9a:bc  10.0.0.4        00:22:48:01:00:03  53711 → 80 [SYN]
10.255.0.2     00:22:48:01:00:03  10.255.0.6      00:15:5d:71:af:d8  65408 → 80 [SYN]

00:22:48 is the Vendor ID of adapters in the Azure VMs. 00:15:5d is the Vendor ID of Hyper-V adapters created by the Host Network Service for containers.
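A small helper for reading these traces: classifying an adapter by the vendor prefix (OUI) of its MAC address. This is a hypothetical convenience function, covering just the two prefixes noted above:

```python
# Vendor prefixes (OUIs) observed in this environment.
OUI_NAMES = {
    "00:22:48": "Azure VM adapter",
    "00:15:5d": "Hyper-V container adapter",
}

def classify(mac: str) -> str:
    """Map a MAC address to its adapter type by vendor prefix."""
    return OUI_NAMES.get(mac.lower()[:8], "unknown")

print(classify("00:22:48:01:00:03"))  # Azure VM adapter
print(classify("00:15:5D:71:AF:D8"))  # Hyper-V container adapter
```

Applied to the pair of packets above, it shows the rewrite crossing from the Azure host adapter to a Hyper-V container adapter.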

The packet has come from the external client on 92.234.68.72. The host adapter has received the packet from the client on its external IP address of 10.0.0.4, on port 80; and sent it with the same MAC address, but with the IP address of the ingress-endpoint 10.255.0.2, to port 80 on one of the containers. The same process happens in reverse with the reply.

Below is the conversation between client and container when the incoming request is routed to a container on a different node:

Connection to Container on Different Host

In this case we don’t see the translation between node and ingress-endpoint, because it happens on the other node. Instead we see that the request comes from the ingress-endpoint of the sending node, using the MAC address of the host adapter. The reply is sent to the ingress-endpoint using the MAC address of the overlay network adapter.

Source IP      Source MAC         Destination IP  Destination MAC    TCP
10.255.0.3     00:22:48:01:9e:11  10.255.0.7      00:15:5d:71:a4:c5  65408 → 80 [SYN]
10.255.0.7     00:15:5d:71:a4:c5  10.255.0.3      00:15:5d:bc:f5:40  80 → 65408 [SYN, ACK]

In between the two packets, we see the container broadcast to find the MAC address of the ingress-endpoint. All communication between entities in the overlay network is by Layer 2 switching.

Below is the conversation between two containers on different nodes:

Ping Container to Container on Different Host

The containers are on the same Layer 2 broadcast domain. There is no firewall between them, even though the two nodes both operate the Windows Firewall and do not communicate openly with each other. The containers can ping each other and connect on any listening port.

We will have to dig a bit deeper to find out what makes this work, but for the moment we can see that:

  • The overlay network is a switched LAN segment stretched across the hosts
  • The ingress-endpoints act as load-balancing and routing gateways between the nodes and the container network.

Docker Swarm on Windows

Docker Swarm enables containers to be managed across different hosts. It works on Windows Server 2016 hosts, but the built-in routing mesh is not supported until the newest Windows Server version 1709, released in October 2017.

Docker Swarm is the tool for managing containers across separate Docker machines. It defines machines as managers or workers, which communicate with each other to implement Docker services. A service is a collection of containers running with the same configuration, following a set of rules that define the service.

Just to complete the picture, Docker Compose is the tool that creates an application from a set of services. The Containers feature in Windows Server 2016 by default includes Docker Swarm but not Docker Compose.

To set up the Swarm cluster we need more than one machine, obviously. Azure Container Service (ACS) does not currently include Windows hosts, although it is changing so fast that this may be out of date soon. Instead we can create a cluster of Windows hosts using an Azure virtual machine scale set with the Windows Server 2016 Datacenter – with Containers image.

We need to open ports on the Windows firewall on each host to allow communication between the docker machines:

  • TCP port 2377 is for Docker communication between manager and worker.
  • TCP and UDP port 7946 is for the “control plane” communication between hosts (worker to worker). This traffic synchronises the state of a service between hosts.
  • UDP port 4789 is for the “data plane” VXLAN encapsulated traffic between applications in containers.
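Once the firewall rules are in place, the TCP ports can be spot-checked from a peer node. A minimal Python sketch (it only covers the TCP ports; a blocked UDP port shows up as silent packet loss rather than a refused connection):

```python
import socket

SWARM_TCP_PORTS = [2377, 7946]  # cluster management, control plane

def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. from a prospective worker, once the manager is listening:
# for port in SWARM_TCP_PORTS:
#     print(port, tcp_port_open("10.0.0.4", port))
```

The live calls are commented out so the sketch does not depend on a reachable manager; substitute the manager's address for 10.0.0.4.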

To create the swarm, run:

docker swarm init --advertise-addr [IP address of manager]

The default is to listen on all addresses on port 2377 (0.0.0.0:2377), so there is no need to specify it. The command returns a join token.

To join a host as a worker, run:

docker swarm join --token [the token number returned when creating the swarm] [the listening address of the manager]

We can add or remove nodes later, as workers or managers. The documentation for setting up and managing the swarm is here: Docker Swarm.

If we want to use a GUI to see what is going on, we can use Portainer. I have described setting it up here: Windows Containers: Portainer GUI. This is what we see in the dashboard after creating the swarm:

Docker Swarm Portainer Dashboard

In the Swarm section, we can see an overview of the cluster:

Docker Swarm Portainer Swarm Cluster

And the default overlay network:

Docker Swarm Portainer Swarm Network

Before we create a service, we need to decide how external clients will connect to containers, and how containers will connect to each other. The default network type in Docker is nat. A port on the host is translated to a port on the container so, for example, we use --publish 80:80. But this limits us to one container only, on that port. If we do not define the host port (by using --publish 80), then one is created dynamically on the host, and so we can have more than one container listening on the same port. But then the client does not know what port on the host to connect to. We would need to discover the dynamic ports and put them into an external load balancer. In the case of a docker service, we would need to do this whenever a new replica is created or removed.

Alternatively we can set up a transparent network, where the container has an externally reachable IP address. This way we can have more than one container listening on the same port. But we would still need to manage the addresses in a load balancer whenever a replica is created or removed.

This is a general problem with service scaling across hosts. The Docker solution is to use an overlay network for swarm traffic. Connections from external clients arriving at any host are routed to any replica in the service (a “routing mesh”). Connections from one container to another are on a private subnet shared across containers in the swarm, rather than on the subnet shared with the host.
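Conceptually, the routing mesh behaves like a round-robin load balancer spread across every host in the swarm. A toy Python sketch of that behaviour, with made-up replica addresses:

```python
from itertools import cycle

class RoutingMesh:
    """Toy model: a connection arriving at any host is handed
    round-robin to one of the service's replicas, on any host."""
    def __init__(self, replicas):
        self._next = cycle(replicas)

    def route(self, arriving_host):
        # The backend is chosen from the whole service,
        # not just the containers on the host the client hit.
        return next(self._next)

mesh = RoutingMesh(["10.255.0.6:80", "10.255.0.7:80", "10.255.0.8:80"])
print([mesh.route("host1") for _ in range(4)])
# Cycles back to the first replica after all three have been used.
```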

Windows Server before version 1709 supports the overlay network for communication between containers, but not the routing mesh for communication between external clients and containers. This leads to some confusing documentation.

For version 1709 and beyond, the command to create a service using the overlay network and routing mesh is, for example:

  • docker service create to create a new service
  • --name to give the service a friendly name
  • --replicas to specify the number of replicas at any one time
  • --publish if any ports are to be published externally
  • [image name] for the name of the image to run.

We can include other options, both for the configuration of the service, and the configuration of the containers. The full command for an IIS web server would be:

docker service create --name web --replicas 2 --publish 80:80 microsoft/iis

By default the containers are attached to the swarm overlay network (called “ingress”). The publishing mode is also “ingress”. Any client connection to any host on port 80 is routed in a round robin to one of the containers on any host participating in the service. The containers can reach each other on their internal network on any port.

Here is the service in Portainer:

Docker Swarm Portainer Service 2

A wide range of parameters is shown in the Service Details:

Docker Swarm Portainer Service Details 2

Portainer shows the published port, in ingress mode:

Docker Swarm Portainer Service Publish Mode Ingress

We can see all the parameters of the service with docker service inspect [service name]. The overlay network has a subnet of 10.255.0.0/16. The service has created a Virtual IP of 10.255.0.4. With docker container inspect [container name] we can see the IP addresses of the containers are 10.255.0.6 and 10.255.0.7.
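If we only want the addresses, we can pull them out of the inspect output programmatically as well. A hedged Python sketch, using abbreviated samples of the JSON (the real output contains many more fields):

```python
import json

# Abbreviated, assumed samples of the inspect output; the real
# documents contain many more fields.
service_json = '[{"Endpoint": {"VirtualIPs": [{"Addr": "10.255.0.4/16"}]}}]'
container_json = ('{"NetworkSettings": {"Networks":'
                  ' {"ingress": {"IPAddress": "10.255.0.6"}}}}')

vip = json.loads(service_json)[0]["Endpoint"]["VirtualIPs"][0]["Addr"]
ip = json.loads(container_json)["NetworkSettings"]["Networks"]["ingress"]["IPAddress"]
print(vip, ip)  # 10.255.0.4/16 10.255.0.6
```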

For version 1607 the routing mesh does not work. The approach that works on the earlier build is to publish the ports in host mode. Each host publishes the port directly, and maps it to the container. If we use a defined port on the host, then we can only have one container per host. Instead of defining the number of replicas we need to specify --mode global, so that one container is created on each node. The command to create the service this way is:

docker service create --name web --mode global --publish mode=host,published=80,target=80 microsoft/iis

If we use a dynamic port on the host, then we can have more than one, but we have to discover the port to connect to. The command to create the service this way is:

docker service create --name web --replicas 2 --publish mode=host,target=80 microsoft/iis

Doing it this way, the container is created on the “nat” network. Portainer shows the published port, in host mode:

Docker Swarm Portainer Service Publish Mode Host

Now we have containers running as a service. If a container fails, another is created. If a node fails or is shut down, any containers running on it are replaced by new containers on other nodes.
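The reconciliation behind this can be sketched as a simple desired-state loop: compare the declared replica count with the healthy containers, and schedule replacements for the difference. A toy Python illustration (the container records are made up):

```python
def reconcile(desired, running):
    """Return how many replacement containers to create, plus the
    healthy containers to keep, for the desired replica count."""
    alive = [c for c in running if c["healthy"]]
    return max(0, desired - len(alive)), alive

# One of three replicas has failed: the swarm schedules one replacement.
running = [{"id": "c1", "healthy": True},
           {"id": "c2", "healthy": False},
           {"id": "c3", "healthy": True}]
to_create, alive = reconcile(3, running)
print(to_create)  # 1
```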

Windows Containers: Hyper-V

An option with Windows Containers is to run a container in Hyper-V Isolation Mode. This blog shows what happens when we do this.

When we run a container normally, the processes running in the container are running on the kernel of the host. The Process ID and the Session ID of the container process are the same as on the host.

When we run a container in Hyper-V Isolation Mode, a utility VM is created and the container runs within that. We need to have the Hyper-V role installed on the host. Then we need to add --isolation hyperv to the docker run command.

Here are some of the main differences.

The processes in the container are isolated from the host OS kernel. The Session 0 processes do not appear on the host. Session 1 in the container is not Session 1 on the host, and the Session 1 processes of the container do not appear on the host.

Container:

Get Process Hyper-V Container

Host:

Get Process Hyper-V Host Same SI

There is no mounted Virtual Hard Disk (VHD):

Disk Management Hyper-V

Instead we have a set of processes for the Hyper-V virtual machine:

Hyper-V Processes on Host

A set of inbound rules is not automatically created on the host Windows firewall. There are no rules for ICC, RDP, DNS, DHCP as there are when we create a standard container:

Firewall Rules Hyper-V Host

But the container is listening on port 135, and we can connect from the host to the container on that port, as we can with a standard container:

Netstat Hyper-V Container Established

And if we create another, standard, container, they each respond to a ping from the other.

Hyper-V does not add to the manageability of containers. The Hyper-V containers do not appear in the Hyper-V management console.

Hyper-V Manager

So in summary: in Hyper-V Isolation Mode the container processes are fully isolated; but the container is not on an isolated network, and is still open to connections from the host and from other containers by default.

Windows Containers: Data

A container is an instance of an image. The instance consists of the read-only layers of the image, with a unique copy-on-write layer, or sandbox. The writable layer is disposed of when we remove the container. So clearly we need to do something more to make data persist across instances. Docker provides two ways to do this.
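The layering can be modelled with Python's ChainMap: reads fall through to the read-only image layers, while writes land only in the disposable sandbox. A conceptual sketch (the paths and contents are purely illustrative):

```python
from collections import ChainMap

base_layer = {"C:\\Windows": "os files"}          # read-only image layer
app_layer = {"C:\\inetpub": "web server files"}   # read-only image layer
sandbox = {}                                      # per-container writable layer

container_fs = ChainMap(sandbox, app_layer, base_layer)

container_fs["C:\\Logs"] = "new directory"  # writes go to the sandbox only
print(container_fs["C:\\Windows"])          # os files (read falls through)
print(sandbox)                              # {'C:\\Logs': 'new directory'}

# Removing the container discards the sandbox; the image layers survive.
sandbox.clear()
```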

When Docker creates a container on Windows, the container is instantiated as a Virtual Hard Disk (VHD). You can see the disk mounted without a drive letter, in Disk Management on the host. Docker keeps track of the layers, but the file operations take place inside the VHD.

Host Disk Manager

If we use the interactive PowerShell console to create a new directory in the container, C:\Logs, then this is created directly inside the VHD:

Sandbox Logs

When Docker removes the container, the VHD is also removed and the directory is gone.

Docker provides two ways to mount a directory on the host file system inside the container file system, so that data can persist across instances:

  1. Bind mount
  2. Volume mount.

A bind mount is simply a link to a directory on the host. A volume mount is a link to a directory tracked and managed by Docker. Docker generally recommends using volumes. You can read more about it in the Docker Storage Overview.

The parameter you commonly see to specify a mount is -v or --volume. A newer parameter, and the one Docker recommends, is --mount. This has a more explicit syntax.

In this example, we mount a volume on the host called MyNewDockerVolume to C:\MyNewDockerVolume in the container:

docker run -it --rm --name core --mount type=volume,src=MyNewDockerVolume,dst=C:\MyNewDockerVolume microsoft/windowsservercore powershell

If the volume does not already exist, it is created inside the docker configuration folder on the host:

Docker volumes MyNewDockerVolume

The Hyper-V Host Compute Service (vmcompute.exe) carries out three operations inside the VHD:

CreateFile: \Device\HarddiskVolume13\MyNewDockerVolume. Desired Access: Generic Read/Write, Disposition: OpenIf, Options: Directory, Open Reparse Point, Attributes: N, ShareMode: Read, Write, AllocationSize: 0, OpenResult: Created
FileSystemControl: \Device\HarddiskVolume13\MyNewDockerVolume. Control: FSCTL_SET_REPARSE_POINT
CloseFile

Now if we look in the VHD, in Explorer, we see the directory implemented as a shortcut:

Sandbox MyNewDockerVolume

In PowerShell, we can see that the directory mode is “l”, to signify a reparse point, or link:

Dir MyNewDockerVolume

Files already in the volume will be reflected in the folder in the container. Files written to the folder in the container will be redirected to the volume.

Windows links come in several flavours: hard links (files only); junctions (directories); and soft links (“symbolic links” or “symlinks”), which can point to files or directories. If we use the command prompt instead of PowerShell we can see that the Docker volume is implemented as a directory symlink:

Dir in Command
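The redirection that the reparse point performs can be demonstrated with ordinary symlinks. A small Python sketch (run here with POSIX-style symlinks, but the behaviour Docker relies on is the same idea):

```python
import os
import tempfile

root = tempfile.mkdtemp()
volume = os.path.join(root, "volume")  # stands in for the Docker volume
mount = os.path.join(root, "mount")    # stands in for the folder in the container
os.mkdir(volume)

# The link: the "container" path redirects to the "volume" path.
os.symlink(volume, mount, target_is_directory=True)

# A file written through the link lands in the volume.
with open(os.path.join(mount, "log.txt"), "w") as f:
    f.write("hello")

print(os.listdir(volume))     # ['log.txt']
print(os.path.islink(mount))  # True
```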

Working with data in Windows Containers requires keeping three things in mind:

  1. The difference between bind mount and volume mount
  2. The different syntax for --volume and --mount
  3. Differences in behaviour between Docker on Linux and Windows hosts.

The first two are well documented. The third is newer and less well documented. The main differences I can find are:

  • You cannot mount a single file
  • The target folder in the container must be empty
  • Docker allows plugins for different drivers. On Linux you can use different storage drivers to connect remote volumes. On Windows the only driver is “local” and so the volume must be on the same host as the container.

If you reference a VOLUME in the Dockerfile to create an image, then the volume will be created automatically, if it does not already exist, without needing to specify it in the docker run command.

Windows Containers: Build

This post is a building block for working with containers on Windows. I have covered elsewhere installing the Containers feature with Docker, and running containers with the Docker command line. We can’t do much that is useful without building our own images. Doing this tells us a lot about what we can and cannot do with containers on Windows.

Some preamble:

  1. A container is not persistent. It is an instance of an image. You can make changes inside a running container, for example installing or configuring an application, but unless you build a new container image with your changes, they will not be saved.
  2. A Windows container has no GUI. Any installation or configuration will be done at the command line.
  3. Therefore we should make our changes in a script, containing the instructions to build a new image.
  4. This script is a Dockerfile.

The command to build an image is: docker image build with a range of options, including the path to the Dockerfile.

You can also run docker container commit to create a new image from a running container. This gives scope for configuring a container interactively before saving it as a new image. But, since the only interface to configure the container is the command line, and since the same commands can be performed in the Dockerfile, this has limited use.

Building an image in Docker is a similar idea to building an image for OS deployment. The Dockerfile is like the task sequence in MDT or SCCM, being a scripted set of tasks. The documentation is here: Dockerfile reference. An example is this one, from Microsoft, for IIS on Windows Server Core:

FROM microsoft/windowsservercore
RUN powershell -Command Add-WindowsFeature Web-Server
ADD ServiceMonitor.exe /ServiceMonitor.exe
EXPOSE 80 
ENTRYPOINT ["C:\\ServiceMonitor.exe", "w3svc"]

The basic structure of a Dockerfile is:

  • FROM to specify the image that the new image is developed from
  • ADD or COPY from source to destination to put new files into the image
  • RUN to execute commands to configure the image
  • CMD to specify a command to start the container with, if no other command is specified
  • EXPOSE to indicate what port or ports the application listens on
  • ENTRYPOINT to specify the services or executables that should run automatically when a container is created.

We can immediately see some implications:

  1. We don’t have to build every part of the end image in one Dockerfile. We can chain images together. For example, we could build a generic web server FROM microsoft/iis, then build specific web sites with other components in new images based on that.
  2. Adding a single feature is easy, like: Add-WindowsFeature Web-Server. But configuring it with all the required options will be considerably more complicated: add website; application pool; server certificate etc.
  3. We may want to bundle sets of commands into separate scripts and run those instead of the individual commands.
  4. There is no RDP to the container, no remote management, no access to Event Logs: and arguably we don’t need to manage the container in the same way. But we can add agents to the image, for example a Splunk agent.
  5. Static data can be included in the image, of course, but if we want dynamic data then we need to decide which folders it will be in, so we can mount external folders to these when we run the container.

It is rather like doing a scripted OS deployment without MDT. I would not be surprised if a GUI tool emerges soon to automate the build scripting.

You may find a number of Dockerfiles for Windows using the Deployment Image Servicing and Management (DISM) tool. There is a confusing choice of tools and no particular need to use DISM (or reason not to). DISM is typically used for offline servicing of Windows Imaging Format (WIM) images. For example it can be used to slipstream updates and packages into a WIM image by mounting it. But in the case of Docker images the changes are made by instantiating a temporary container for each RUN, and the DISM commands are executed online. This means we can use three different types of command to do the same thing:

  • Install-WindowsFeature from the ServerManager module in PowerShell
  • Enable-WindowsOptionalFeature from the DISM module in PowerShell
  • dism.exe /online /enable-feature from DISM.

Just to make life interesting and keep us busy, the commands to add a feature use different names for the same feature!

Windows Containers: Portainer GUI

When you first set up Containers on Windows Server 2016, you would imagine there would be some kind of management console. But there is none. You have to work entirely from the command line. Portainer provides a management GUI that makes it easier to visualise what is going on.

The Windows Container feature itself only provides the base Host Compute and Host Network Services, as Hyper-V extensions. There is no management console for these. Even if you install the Hyper-V role, as well as the Containers feature, you can’t manage images and containers from the Hyper-V management console.

Images and containers are created and managed by a third party application, Docker. Docker also has no management console. It is managed from the Docker CLI, either in PowerShell or the Command Prompt.

There is a good reason for this. But, for me at least, it makes it hard to visualise what is going on. Portainer is a simple management UI for Docker. It is open source, and itself runs as a container. It works by connecting to the Docker engine on the host server, then providing a web interface to manage Docker.

Portainer Dashboard

Setting up the Portainer container will also give us a better idea of how to work with Docker. Docker has a daunting amount of documentation for the command line, and it is not easy to get to grips with it.

Configure Docker TCP Socket

The first step in setting up Portainer is to enable the Docker service to listen on a TCP socket. By default Docker only allows a named pipe connection between client and service.

Quick version:

  • create a file with notepad in C:\ProgramData\docker\config
  • name the file daemon.json
  • add this to the file:
    {"hosts": ["tcp://0.0.0.0:2375","npipe://"]}
  • restart the Docker service.

The long version is: the Docker service can be configured in two different ways:

  1. By supplying parameters to the service executable
  2. By creating a configuration file, daemon.json, in C:\ProgramData\docker\config.

The parameters for configuring the Docker service executable are here: Daemon CLI Reference. To start Docker with a listening TCP socket on port 2375, use the parameter

-H tcp://0.0.0.0:2375

This needs to be configured either directly in the registry, at HKLM\SYSTEM\CurrentControlSet\Services\Docker; or with the Service Control command line:

sc config Docker binPath= "\"C:\Program Files\docker\dockerd.exe\" --run-service -H tcp://0.0.0.0:2375"

The syntax is made difficult by the spaces, which require quotation with escape characters.

The easier way is to configure the Docker service with a configuration file read at startup, daemon.json. The file does not exist by default. You need to create a new text file and save it in the default location C:\ProgramData\docker\config. The daemon.json file only needs to contain the parameters you are explicitly configuring. To configure a TCP socket, add this to the file:

{
 "hosts": ["tcp://0.0.0.0:2375","npipe://"]
}
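A malformed daemon.json will stop the Docker service from starting, so it is worth validating the file before restarting the service. A small Python check (the helper name is made up):

```python
import json

def check_daemon_json(text):
    """Parse a daemon.json body; raises ValueError on malformed JSON.
    Returns the recognised listener entries from the hosts key."""
    cfg = json.loads(text)
    return [h for h in cfg.get("hosts", [])
            if h.startswith(("tcp://", "npipe://"))]

body = '{"hosts": ["tcp://0.0.0.0:2375", "npipe://"]}'
print(check_daemon_json(body))  # ['tcp://0.0.0.0:2375', 'npipe://']
```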

Other options for the configuration file for Docker in Windows are documented here: Miscellaneous Options. For example you can specify a proxy server to use when pulling images from the Docker Hub.

Just to add complexity:

  • the Docker service will not start if the same parameter is set in service startup and in the configuration file
  • You can change the location of the configuration file by specifying a parameter for the service:
    sc config Docker binPath= "\"C:\Program Files\docker\dockerd.exe\" --run-service --config-file \"[path to file]\""

Ports 2375 (unencrypted) and 2376 (encrypted with TLS) are the standard ports. You will obviously want to use TLS in a production environment, but the Windows Docker package does not include the tools to do this. Standard Windows certificates can’t be used. Instead you will need to follow the documentation to create OpenSSL certificates.

Allow Docker Connection Through Firewall

Configure an inbound rule in the Windows firewall to allow TCP connections to the Docker service on port 2375 or 2376. This needs to be allowed for all profiles, because the container virtual interface is detected as on a Public network.

netsh advfirewall firewall add rule name="Docker" dir=in action=allow protocol=TCP localport=2375 enable=yes profile=domain,private,public

Note that, by default, containers do not have access to services and sockets on the host.

Pull the Portainer Image

Back in an elevated PowerShell console, pull the current Portainer image from the Portainer repository in the Docker Hub:

docker pull portainer/portainer

If we look in the images folder in C:\ProgramData\docker\windowsfilter we can see that we have downloaded 6 new layers. We already had two Nano Server layers, because we pulled those down previously.

Portainer Layers

If we look at the image history, we can see the layers making up the image:

docker image history portainer/portainer

Portainer Image History

The two base layers of the Portainer image are Windows Nano Server. We already had a copy of the Nano Server base image, but ours was update 10.0.14393.1593, so we have downloaded a layer for the newer update 10.0.14393.1715. We can also see the action that created each layer.
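The layer reuse we see here is conceptually just a set difference: only the layers missing from the local cache are downloaded. A toy sketch with hypothetical layer IDs:

```python
def layers_to_pull(image_layers, local_cache):
    """Download only the layers missing from the local cache; shared
    base layers (like Nano Server) are reused."""
    return [layer for layer in image_layers if layer not in local_cache]

# Hypothetical layer IDs:
cached = {"nano-base", "nano-update-1593"}
portainer = ["nano-base", "nano-update-1715", "portainer-app"]
print(layers_to_pull(portainer, cached))  # ['nano-update-1715', 'portainer-app']
```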

If we inspect the image, with:

docker image inspect portainer/portainer

we can see some of the things we need in order to set it up:

  1. The container is going to run portainer.exe when it starts
  2. The exposed port is 9000
  3. The volume (or folder) to mount externally is C:\Data

Set up Portainer container

Quick version:

  1. Create a folder in the host called: C:\ProgramData\Containers\Portainer
  2. Open an elevated PowerShell console on the host
  3. Run this command:
    docker run -d --restart always --name portainer -v C:\ProgramData\Containers\Portainer:C:\Data -p 9000:9000 portainer/portainer

The long version is: we need the command line to run the Portainer image:

  1. Standard command to create a container: docker run
  2. We want to run the container detached as a free standing container, with no attached console: -d or --detach
  3. There is no need to remove the container if it is stopped. Instead, we want to restart the container automatically if, for example, the host is rebooted: --restart always
  4. We can give the container a name, to make it easier to manage: --name portainer
  5. Portainer reads information about images and containers directly from Docker, so it does not need to store that. But it needs to store its own configuration, for example settings and user passwords. To do this, we need to save the configuration data outside the container. We can do this in Docker by mounting an external folder in the file system of the container. The folder in the container has already been designated as C:\Data in the image, but the folder in the host can be anything you choose. In this example we are using C:\ProgramData\Containers\Portainer. The folder needs to exist before using this: -v C:\ProgramData\Containers\Portainer:C:\Data
  6. The Portainer process is listening on port 9000 (see above). We can connect to this directly from the host itself, without doing anything more. But the outside world has no access to it. The container is running on a virtual switch with NAT enabled. This does port forwarding from the host to the container. We need to decide what port on the host we would like to be forwarded to port 9000 on the container. If we don’t specify a port, Docker will assign a random port and we can discover it through docker container inspect portainer. Otherwise we can specify a port on the host, which in this case can also be 9000: -p 9000:9000
  7. The image to run: portainer/portainer
  8. We don’t need to specify a command to run, since the image already has a default command: portainer.exe

Putting the parameters together, the full command is:

docker run -d --restart always --name portainer -v C:\ProgramData\Containers\Portainer:C:\Data -p 9000:9000 portainer/portainer

Connect to Portainer

Using a browser on your desktop, connect to the published Portainer port on the remote host: http://192.168.1.144:9000. Set up a password for the admin user:

Portainer Setup

Set up the Docker host as the endpoint:

Portainer Setup Endpoint

Note that the endpoint is the IP address of the host virtual interface on the container subnet (in this case 172.17.64.1). This address is also the gateway address for the container, but in this context it is not acting as a gateway. The virtual interface on the host is listening on port 2375 for Docker connections.

And we are in:

Portainer Dashboard

We can also connect directly from a browser on the host to the container. For this, we need to use the IP address of the container itself, in this case 172.17.68.78, or whatever address we find from docker container inspect portainer.

The Portainer Container

We don’t need to set up a firewall rule to allow access to the container on port 9000. Docker sets up a bunch of rules automatically when the container is created:

Container Automatic Firewall Rules

These rules include: DHCP; ICMP; and DNS. They also include port 9000 on the host, which we specified would be forwarded to port 9000 in the container:

Container Automatic Firewall Rule for Portainer

In Portainer, when we set up the endpoint (being Docker on the host) we need to specify the virtual interface of the host that is on the same subnet as the container (the 172.17.64.1 address). This is because Windows does not allow the container to connect directly through the virtual interface to a service on the physical interface (192.168.1.144).

If we look at the TCP connections on the host, with: netstat -a -p tcp, we see that there is no active connection to Portainer in the container, although my browser is in fact connected from outside:

Portainer Host TCP Connections

However, if we look at the NAT sessions, with Get-NetNATSession, we see the port forwarding for port 9000 to the container:

Host Get-NetNATSession

Docker has attached a virtual hard disk to the host, being the file system of the container:

Host Disk Manager

If we give it a drive letter we can see inside:

Portainer Container System Drive

The portainer executable is in the root of the drive. C:\Data is the folder that we mounted in the docker run command. Other folders like css and fonts are part of the application. These are contained in the first layer of the image, after the Nano Server layers. The layer was created by the COPY command in the Portainer Dockerfile used to create the image:

FROM microsoft/nanoserver
COPY dist /
VOLUME C:\\data
WORKDIR /
EXPOSE 9000
ENTRYPOINT ["/portainer.exe"]

And here is the portainer process running on the host in Session 2, using:

Get-Process | Where-Object {$_.SI -eq 2} | Sort-Object SI

Portainer Process Running on Host

Security

You can see in the Portainer GUI for creating endpoints that we can connect to Docker with TLS. This assumes we have set up Docker with certificates and specified encrypted TCP connections, covered in the Docker daemon security documentation.

We should also connect to Portainer over an encrypted connection. We can do this by adding more parameters to the docker run command: Securing Portainer using SSL.

More about using Portainer

You can read more about using Portainer in the Portainer documentation.

Windows Containers: Properties

If we create an instance of an image in interactive mode, and run a PowerShell console in it, then we can see inside the container.

In a previous post I used the Nano Server image, because it is small and quick. But Nano Server is a cut down OS so, for the purposes of seeing how a container works, let’s take a look inside a Windows Server Core container. The question of when Nano Server can be used in place of Core is a subject for another time.

The Docker command to do this is:

docker run --rm -it --name core microsoft/windowsservercore powershell

The system information, with systeminfo, shows a Windows server where some of the properties belong to the container, and some to the host. For example, the language and locale belong to the container, but the BIOS and boot time belong to the host:

Container SystemInfo

The TCP/IP information, with ipconfig /all, shows that the container has its own:

  • hostname
  • Hyper-V virtual ethernet adapter
  • MAC address
  • IP address in the private Class B subnet, which we saw previously was allocated to the Hyper-V virtual switch
  • gateway, which we saw previously was the Hyper-V virtual ethernet adapter on the host
  • DNS server addresses.

Container IPConfig All

I can connect to the outside world, with ping 8.8.8.8 and get a reply:

Container Ping World

The running processes, from Get-Process, show the PowerShell process, as well as what look like typical user processes. If I run Get-Process | Sort-Object SI I can see that there are two sessions: a system session in Session 0, and a user session in Session 2.

Container Get Process Sort SI

I can start other processes. For example, if I start Notepad, then I see it running as a new process in Session 2.

Container Start Notepad

The services, from Get-Service, show normal system services. It is easier to see if I filter for running services, with:

Get-Service | Where-Object {$_.Status -eq "Running"} | Sort-Object DisplayName

Container Get Service Filter and Sort

I have listening ports, shown with Get-NetTCPConnection, but nothing connected:

Container Get TCP Connection

There are three local user accounts, shown with Get-LocalUser:

Container Get Local User

PowerShell tells me that it is being executed by the user ContainerAdministrator:

Container Get Process PowerShell UserName

In summary, I have something that looks similar to an operating system running a user session. It can start a new process and it can communicate with the outside world.

Let’s see what it looks like from outside. From the host I can ping the container:

Host Ping Container

I can telnet from the host to port 135 (one of the ports that I saw was listening) in the container, and make a connection:

Host Telnet Container

But I can’t make a connection from outside the host. I already know there is no route to the container subnet. What happens if I supply a route? Still no reply. I am not really surprised. The connection would have to go through the host, and there is nothing in the host firewall to allow a connection to the container.

World Ping Container

If I start another container, though, it can ping and get a reply from the first container:

Container Ping Container

If I look in Task Manager on the host, there is no obvious object that looks like a container. I don’t even know what size I would expect it to be. But I notice that the PowerShell process in the container shows as the same process on the host.

Get-Process PowerShell in the container:

Container Get Process PowerShell

Get-Process PowerShell on the host:

Host Get Process PowerShell

The process ID 3632 is the same process. All three PowerShell processes, including the one in the container, are using the same path to the executable. You could say that the container is a virtual bubble (session, namespace or whatever you want to call it) executing processes on the host:

Host Get Process PowerShell Path

If I look at all the processes on the host, I can see that the container’s Session 2 is also Session 2 on the host. Here are the host processes filtered by session:

Get-Process | Where-Object {$_.SI -eq 0 -or $_.SI -eq 2} | Sort-Object SI

Host Get Process Filter and Sort SI

Session 0 (the System session) has a lot more processes than shown inside the container, but Session 2 is the same. Processes like lsass, wininit, csrss are the normal processes associated with a session.

The host does not see the user who is executing the processes. In the container the user is ContainerAdministrator, but there is no such user on the host, and the host does not have the username:

Host Get Process PowerShell UserName

A container is ephemeral. But if I create files inside the container they must be stored somewhere.

In the image folder of the host I can see a new layer has been created:

Docker Images Folder

The layerchain.json file tells me that the layer is chained to the Windows Server Core base image layers. The layer has a virtual hard disk drive called “sandbox”, which sounds like the kind of place that changes would be saved.

If I look in Disk Manager, I can see that a new hard disk drive has been attached to the host:

Host Disk Manager

The disk is the same size as the apparent system drive inside the container. It is shown as Online, but with no drive letter. However, if I give it a drive letter, then I can see the same file system that I was able to see inside the container:

Container Sandbox Disk on Host

So the file system of the container is created by mounting a Hyper-V virtual hard disk drive. This only exists for the lifetime of the container. When the container is removed, any changes are lost.

In summary:

  • From inside, the container appears to have similar properties to a normal virtual machine.
  • The container has a network identity, with a host name, virtual Ethernet adapter and IP address.
  • It can connect to the outside world, and with other containers, but the outside world cannot (until we change something) connect to it.
  • It has a file system, based on the image the container was created from.
  • On the host, the container processes are implemented as a distinct session.
  • The file system of the container is implemented as a virtual hard disk drive attached to the host.
  • Files can be saved to the virtual hard disk drive, but they are discarded when the container is removed.

Windows Containers: Run an Image

A container is an instance of an image. When we “run” the image, a container is created. Let’s see what happens when we do this.

In previous posts I covered installing the Windows Containers feature, and downloading a base image of Windows Nano Server or Windows Server Core.

Docker is the daemon or service that manages images and containers. It is controlled from the command line, in PowerShell or the Command Prompt. To use Windows Containers we need to get familiar with the Docker commands.

I am using the base image of Nano Server as an example, because it is simple and small. When you see how it works, it is easy to imagine how other images based on Windows Server Core might also work.

Let’s just run the Nano Server image and see what happens. In an elevated PowerShell console:

docker run microsoft/nanoserver

The console blinks, briefly changes to a C:\ prompt, then returns to the PowerShell prompt. It seems that nothing happened at all!

Docker Run Nano

We can try:

docker container ls

to see if any containers exist. It shows none. But:

docker container ls -a

(or --all) shows a container that has exited.

Docker Container LS

So there was a container but it exited. I can see that the container has an ID, and a name.

I can start the container with:

docker start [ID]

but it exits again. Clearly it is configured to do nothing and then exit.

Docker Start Container

This container is no use to me, since it just runs and exits. I can remove it with:

docker container rm [ID]

Docker Container Remove

How can I get it to hang around, so I can see what it is? Normally a server waits for something to do, but this container seems to exit if it has nothing to do. It acts more like a process than a server. I could try giving it a command that continues until stopped, like:

docker run microsoft/nanoserver ping -t 8.8.8.8

Now I can see that the container continues to perform the ping. If I disconnect the terminal with Ctrl+C, the container is still running.

Docker Run Nano Ping
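This is the key behaviour: a container lives exactly as long as its main process. The same rule applies to any process, which a plain shell sketch can illustrate (no Docker involved here):

```shell
# A container's lifetime is the lifetime of its main process.
# A command that returns immediately ends the container at once,
# like running the Nano Server image with its default command:
sh -c 'exit 0'
echo "short-lived process exited with code $?"

# A command that keeps running (like ping -t) keeps the container
# alive until it is stopped:
sh -c 'sleep 1'
echo "long-running process finished with code $?"
```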

If I run:

docker container attach [ID]

then the PowerShell console attaches again to the output of the running ping process.

Docker Container Attach

I need to run:

docker container stop [ID]

to stop the ping process, and:

docker container rm [ID]

to remove the container.

If I want to run a container and see what it is doing, then I can run an interactive container. Putting the commands together:

    • create the container:
docker run [image]
    • remove it when it exits:
--rm
    • double dash introduces a full-word option:
--
    • single dash introduces single-letter options, which can be concatenated:
-
    • so -[interactive][tty] is:
-it
    • give the container a name so that I don’t have to find the ID or the random name:
--name [friendly name]
    • a command parameter at the end is the executable to run in the container:
powershell

So:

docker run --rm -it --name nano microsoft/nanoserver powershell

gives me a running container with an attached PowerShell console:

Docker Run Interactive

Now we can see the inside of the container as well as the outside, and get a good idea of how it works.
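As a sanity check on the flag rules above, the abbreviated command expands mechanically into its long-form equivalent. A quick shell sketch (it only rewrites the command string; it does not run docker):

```shell
# The short command from above
short='docker run --rm -it --name nano microsoft/nanoserver powershell'

# -i is short for --interactive and -t for --tty; -it concatenates them
long=$(printf '%s' "$short" | sed 's/ -it / --interactive --tty /')

echo "$long"
# → docker run --rm --interactive --tty --name nano microsoft/nanoserver powershell
```

Either spelling gives the same running container with an attached PowerShell console.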