Methods for redirecting network connectivity

When a client talks to a service over a network, and the server providing the service fails, or the service needs to be moved for administrative reasons, what methods are available for redirecting network references to that service so that they refer to a different server?

This post outlines the methods that I know of for doing this.  But note the word redirect - that word is the key to all these methods.  These different methods are ways of redirecting various layers in the networking stack.  So let's first look at what happens for a normal IPv4 network connection over Ethernet, which layers are involved, and at which places the network traffic can be redirected (or rerouted) to another server.

For simplicity's sake, we will ignore load balancers - both in hardware form, and DNS-level load balancers, and we assume a modern "switched" network.

Information Routing

What happens when a client establishes a connection to a service on a server?

Here is a brief synopsis of how connections get established.

  1. The client is given or obtains from configuration information, bookmarks, etc. a name of the server running that service.
  2. The client consults a DNS server to translate the server/service name into a 32-bit IPv4 address.
  3. The client holds onto this name/IPv4 mapping in order to optimize future references to the server name.  DNS lookup libraries normally do this themselves, but some applications also perform their own address caching.
  4. The client then examines the IPv4 address, and determines which interface and gateway to send the packet out on, based on its local configuration and the IPv4 address itself.
  5. The client OS then sends out an ARP packet to determine the 48-bit Media Access Control (MAC) address of the gateway, or the server itself, if the client and server are on the same subnet.  It may have this MAC/IP correspondence cached from earlier packets it had received from the server.
  6. The client OS sends out the packet to the MAC address determined in step 5 over the interface selected in step 4.
  7. At some earlier time, the switch network will have "learned" which switch port corresponds to the selected MAC address.  It does this by observing which port sends packets from that given MAC address.
  8. The switch network then forwards the packet to the chosen MAC address on the subnet (this could either be the MAC address of the gateway or the server - as discussed earlier).
  9. If the server is on the same subnet as the client, it receives the packet and examines the packet to see if the destination IP address is one it provides.  If it does, then all is well.  If it does not, then the packet is dropped.    So ends the "same subnet" case.
  10. Assuming the server is not on the same subnet as the client...
  11. The gateway receives the packet, and examines its routing table to decide where to route the packet to.  This is determined by the routing protocol the gateway is running - for example, OSPF or BGP.
  12. The "network cloud" routes the packet to the "final gateway" on the same subnet as the destination server.  As before, this is determined by the various routing protocol(s) along the way from the first gateway to the last one.  (This explanation is similar to the "then a miracle occurs" in the middle of a math proof).
  13. The final gateway sends out an ARP packet to determine the MAC address of the destination server.  It is typically cached for a few minutes up to an hour.
  14. The final gateway then sends the packet out to the MAC address above over the selected interface based on the routing protocol it is running.
  15. The destination server receives the packet and examines the packet to see if the destination IP address is one it provides.  If it does, then all is well.  If it does not, then the packet is dropped.
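
Steps 4 and 5 above can be sketched in a few lines of Python.  This is only an illustration under simplifying assumptions: the client has a single interface and a single default gateway, and the function name and the addresses (taken from the documentation ranges) are made up for the example.

```python
import ipaddress

def next_hop(dest_ip, interface_cidr, gateway_ip):
    """Return the IP address whose MAC the client must ARP for (step 5)."""
    iface = ipaddress.ip_interface(interface_cidr)
    dest = ipaddress.ip_address(dest_ip)
    if dest in iface.network:
        return str(dest)   # same subnet: ARP for the server itself
    return gateway_ip      # different subnet: ARP for the gateway

# Client is 192.0.2.10/24 with (hypothetical) gateway 192.0.2.1
same_subnet = next_hop("192.0.2.20", "192.0.2.10/24", "192.0.2.1")
other_subnet = next_hop("198.51.100.7", "192.0.2.10/24", "192.0.2.1")
print(same_subnet, other_subnet)
```

A real client consults a full routing table rather than a single interface, but the same-subnet test is the part that decides whether the server or the gateway gets ARPed for.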

There are several address transformations along the way, each mapping one conceptual address space into another, lower-level address space.  These are:

  • Translation from "conceptual knowledge" of the server to the DNS name of the server.
  • Translation from the DNS server name to the destination IPv4 address
  • Translation of the destination IPv4 address to the destination gateway using routing information.
  • Translation from the destination IPv4 address to the destination MAC address.
  • Translation from MAC address to destination switch port.

Each of these transformations is a place where a redirection can occur.

  • The conceptual knowledge layer can be redirected by telling all clients to switch to a new server name.
  • The DNS layer can be redirected by updating DNS entries.
  • The network routing layer can be redirected by updating routing information in the network and pushing out the new route information.
  • The IPv4->MAC layer can be redirected by updating the ARP information and forcing the various ARP caches to be updated.
  • The MAC->switch port can be redirected by updating MAC addresses and forcing the switch network to learn the new MAC->switch correspondence.

Subsequent sections present detailed explanations of how to perform these various kinds of redirections.

Conceptual Knowledge Layer

There is no universal automated way to update the conceptual knowledge layer - nevertheless, server relocations are sometimes handled at this layer.  One can use automated client update tools to update client configuration files, or one can use word-of-mouth, email, or any number of ad hoc tools.  This is the least commonly used method for redirection on failure.  Arguably, since it is hard to automate, it doesn't have much place on a blog on managing with automation.

DNS layer

Updating the DNS layer can be easily automated.  The advantages are: it's universal, little or no prior preparation has to occur, no server/network political boundaries have to be dealt with, and the two servers don't have to be on the same subnet.  The disadvantages are: not all clients use DNS names, client OS DNS caching can interfere, and client software itself can interfere by caching the address outside DNS.  Even with Dynamic DNS, it can take minutes to hours for changes to propagate and the new server address to become known (and usable) to all clients.  If the client application caches the address itself, then client applications have to be restarted.  This last subcase can be difficult to automate.
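
To see why caching delays a DNS-level redirect, here is a toy model of a client-side cache.  It is a sketch, not how any particular resolver library works: the class name, the dictionary standing in for the DNS server, the 300-second TTL, and the addresses are all assumptions made for the example.

```python
class CachingResolver:
    """Toy client-side cache: name -> (address, expiry time)."""

    def __init__(self, authoritative, ttl_seconds):
        self.authoritative = authoritative  # dict standing in for the DNS server
        self.ttl = ttl_seconds
        self.cache = {}

    def resolve(self, name, now):
        entry = self.cache.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]                 # cached answer, possibly stale
        address = self.authoritative[name]  # cache miss or expired: ask again
        self.cache[name] = (address, now + self.ttl)
        return address

dns = {"service.example.com": "192.0.2.10"}
client = CachingResolver(dns, ttl_seconds=300)

first = client.resolve("service.example.com", now=0)
dns["service.example.com"] = "198.51.100.10"  # failover: DNS entry is updated
stale = client.resolve("service.example.com", now=60)
fresh = client.resolve("service.example.com", now=300)
print(first, stale, fresh)
```

Until the cached entry expires, the client keeps talking to the old address even though DNS was updated right away.  An application that caches the address forever behaves like this with an infinite TTL, which is why it has to be restarted.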

Network routing layer

If a server fails, routing can be used to redirect the traffic for the failed server to another server on a different physical access segment.  The advantages of this are: the two servers don't have to be on the same network segment, and routing protocols are designed to deal with this kind of situation.  The disadvantages include: if the IP address is public, then you have to move over at least 256 addresses at a time; there are often political boundaries making it hard for servers to automatically update network routing information; and the additional routes for handling a large number of such movable addresses may slow down the routers involved.
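
As a rough illustration of redirecting at this layer, the sketch below models a routing table with longest-prefix matching and then repoints one route.  The router names and addresses are hypothetical, and a real network would make this change through its routing protocol rather than by editing a list.  The /24 prefix is used because it contains 256 addresses, matching the figure mentioned above.

```python
import ipaddress

def lookup(routes, dest_ip):
    """Return the next hop of the most specific route containing dest_ip."""
    dest = ipaddress.ip_address(dest_ip)
    matches = [(net, hop) for net, hop in routes if dest in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]

routes = [
    (ipaddress.ip_network("203.0.113.0/24"), "site-a-router"),
    (ipaddress.ip_network("0.0.0.0/0"), "default-router"),
]
before = lookup(routes, "203.0.113.25")

# Failover: the whole /24 is now reached through the other site
routes[0] = (routes[0][0], "site-b-router")
after = lookup(routes, "203.0.113.25")

block_size = routes[0][0].num_addresses  # every address in the block moves
print(before, after, block_size)
```

Note that every address in the block moves together, not just the address of the failed server.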

ARP layer

When a server fails, another server can bring up the IP address of the dead service, update the ARP caches (typically using gratuitous ARPs - sometimes called ARP spoofing), and packets destined for the now-dead server go to the live one.  The advantages include: IP address takeover can occur in less than a second, there is well-tested software for doing this, and most organizations have a good method for allocating and managing additional IP addresses.  The disadvantages include: the two servers have to be on the same network segment, some organizations lock down their network gear to make this "impossible" (which it doesn't - it just slows it down), and it typically increases the number of IP addresses needed by the servers and services.
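
For the curious, this is roughly what a gratuitous ARP looks like on the wire.  The sketch only builds the bytes of the frame and does not send anything (sending raw frames needs elevated privileges and is platform-specific).  It assumes the request form (opcode 1) with an all-zero target MAC; some takeover tools send the reply form instead.  The function name, MAC address, and IP address are made up for the example.

```python
import socket
import struct

def gratuitous_arp(mac, ip):
    """Build an Ethernet frame carrying a gratuitous ARP request for ip."""
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    ip_bytes = socket.inet_aton(ip)
    broadcast = b"\xff" * 6
    # Ethernet header: destination, source, EtherType 0x0806 (ARP)
    ethernet = broadcast + mac_bytes + struct.pack("!H", 0x0806)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6,            # hardware address length
        4,            # protocol address length
        1,            # opcode: request
        mac_bytes,    # sender MAC: the server taking over
        ip_bytes,     # sender IP: the address being taken over
        b"\x00" * 6,  # target MAC: left as zeros in this sketch
        ip_bytes,     # target IP: same as the sender IP
    )
    return ethernet + arp

frame = gratuitous_arp("02:00:00:00:00:01", "192.0.2.10")
print(len(frame), frame.hex())
```

The key point is that the sender and target IP are the same address, and the frame is broadcast, so every host and gateway on the segment that has the address cached gets a chance to update its IP-to-MAC mapping.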

MAC->switch port layer

MAC address takeover is a technique where a given network card is given multiple MAC addresses - one for an administrative address, and one for each group of independently-failable services.  Retraining the switches to understand which switch port services the given MAC address is accomplished by simply sending any IPv4 packet with the new MAC address.  The advantages include: takeover can be very fast and quite reliable.  The disadvantages include: the two servers have to be on the same network segment, some organizations lock down their network gear to make this "impossible" (which it doesn't - it just slows it down), it typically increases the number of MAC addresses needed by the servers and services, and organizations almost never have methods for allocating and managing MAC addresses like they do IP addresses.
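
The switch retraining described above can be modeled with a toy learning switch.  This is a simplified sketch (one switch, no aging of table entries, no VLANs), and the class name, port numbers, and MAC addresses are invented for the example.

```python
class LearningSwitch:
    """Toy switch: remembers the last port each source MAC was seen on."""

    def __init__(self):
        self.mac_table = {}

    def receive(self, in_port, src_mac, dst_mac):
        self.mac_table[src_mac] = in_port            # learn (or relearn) the source
        return self.mac_table.get(dst_mac, "flood")  # known port, else flood

switch = LearningSwitch()
service_mac = "02:00:00:00:00:aa"  # hypothetical movable service MAC
client_mac = "02:00:00:00:00:01"

switch.receive(1, service_mac, client_mac)           # server A sends from port 1
before = switch.receive(3, client_mac, service_mac)  # client traffic goes to port 1

switch.receive(2, service_mac, client_mac)           # server B takes over on port 2
after = switch.receive(3, client_mac, service_mac)   # client traffic now goes to port 2
print(before, after)
```

A single frame sent by the new server with the service MAC as its source address is enough to move the table entry, which is why this kind of takeover can be so fast.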

Which Method is "Best"?

I have heard it said that when you ask an engineer a question the answer is always the same regardless of the question - "It Depends".  So it is here...

  • For servers on the same VLAN (or network segment) - IP address takeover (IP address spoofing) is the most common.
  • For servers on different VLANs or network segments:
    • Network route updating - if politically and technically feasible
    • DNS updating
    • Updating the conceptual knowledge layer
Note that there are some circumstances in which there is no easy answer.  You may have to change your network configuration, solve political problems, buy an external netblock, or execute various other combinations of uncomfortable or difficult steps.

Quorum Server Illustrated – updated

In two earlier posts [1] [2], I gave brief descriptions of the quorum server which seem to have left as much confusion as they provided clarity.  This post is only about the Linux-HA quorum server, and includes illustrations for clarity.

The Linux-HA Quorum API

In the Linux-HA quorum API, you can configure a number of quorum modules which are used as follows.  If a quorum module returns HAVEQUORUM, then the cluster has quorum.  If it returns NOQUORUM then the cluster does not have quorum.  If a quorum module returns QUORUMTIE, then the next quorum module in the list is consulted.  If the final module returns QUORUMTIE, then it is treated as a NOQUORUM event.
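
The chaining rule is easy to express in code.  The sketch below is written in Python purely for illustration and is not the actual Linux-HA plugin interface: the function names are my own, and each "module" is modeled as a plain function returning one of the three results named above.

```python
HAVEQUORUM, NOQUORUM, QUORUMTIE = "HAVEQUORUM", "NOQUORUM", "QUORUMTIE"

def evaluate_quorum(modules):
    """Consult modules in order; only a tie passes the decision along."""
    for module in modules:
        result = module()
        if result != QUORUMTIE:
            return result  # first definite answer wins
    return NOQUORUM        # a tie from the final module counts as no quorum

# Stand-in modules for the example
def tie():
    return QUORUMTIE

def grant():
    return HAVEQUORUM

def deny():
    return NOQUORUM

print(evaluate_quorum([tie, grant]))  # second module breaks the tie
print(evaluate_quorum([tie, tie]))    # nobody breaks the tie
print(evaluate_quorum([deny, grant])) # second module is never consulted
```

In this model, a later module is only reached when every module before it has returned a tie.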

The quorum daemon is normally used in conjunction with the normal arithmetic voting quorum module, so that it is only consulted when the number of nodes up in the cluster is exactly half the number of nodes configured in the system.  So, it is worth noting that the quorum server will never be consulted if a cluster has an odd number of nodes.
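
Here is a small sketch of the arithmetic behind that statement.  The function name and its string results are assumptions for the example rather than the real module's interface.  It shows that a tie needs exactly half the configured nodes to be up, which cannot happen when the configured count is odd.

```python
def majority_vote(nodes_up, nodes_configured):
    """Arithmetic voting: more than half wins, exactly half is a tie."""
    if 2 * nodes_up > nodes_configured:
        return "HAVEQUORUM"
    if 2 * nodes_up == nodes_configured:
        return "QUORUMTIE"  # only now would a quorum server be consulted
    return "NOQUORUM"

# Two-node cluster: one node up is a tie, both up is a clear majority
print(majority_vote(1, 2), majority_vote(2, 2))

# Three-node cluster: no number of live nodes produces a tie
print([majority_vote(up, 3) for up in range(4)])
```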

Quorum Server Scenarios

Below, I'll go through the basic quorum server cases so you can see how all this works in more detail - with pictures, even!

Normal Situation - Everything up
Quorum_server_normalsm_2

In the picture above, everything is normal.  The quorum server is up, and both sites are also up.  Because the cluster has all its nodes up, the quorum server is irrelevant.

Single Site Failure
Quorum_server_nj_failedsm_3

In the situation above, we show the "New Jersey" site as down.  In this case, the conventional voting quorum has a tie (1/2 - exactly half of the nodes), so the quorum server is consulted.  Since only New York is talking to the quorum server, the quorum server grants quorum to the New York site.

Split Brain Avoided
Quorum_server_splitbrainsm_2

In the case above, the link between the sites has been lost, but both sites and the quorum server are all up.  In this case, both New York and New Jersey contact the quorum server because each sees 1/2 nodes as being up - resulting in a tie condition.

The quorum server will then choose one of the two sites to grant quorum to; I assume here that New York was chosen.  Because New Jersey wasn't granted quorum, it will shut its resources down.

What happens when the quorum server goes down?
Quorum_server_failed_both_upsm

That is the situation shown above.  Because New York and New Jersey are both up, they have 2/2 votes and both provide service as they should.  This illustrates the point that the quorum server is not a single point of failure.

Multiple Failures -> Loss of Service

Multiple_failures_no_servicesm_3

In this final case, multiple failures have occurred - both New Jersey and the quorum server are down.  In this case, New York doesn't have quorum, so it shuts down services and none are provided by any node in the cluster.  Of course, this situation can be overridden in the cluster configuration by changing the quorum policy, but from an automated perspective, this is all that can be (should be) done.

Security Concerns

If you want to run your quorum server communications across networks which mig