Building a Fully Supported High Availability Exchange 2010 Environment on vSphere
Building an Exchange 2010 environment on VMware is a fairly easy task. However, if you want the installation to be fully supported by both VMware and Microsoft, you will have to read a lot of documents to make sure your configuration (even though I’m sure it works) is not deemed unsupported. Of course, if you never have to make a support call to Microsoft or VMware you probably don’t care, but when you need assistance because all hell has broken loose, you may appreciate the extra effort you put in at the beginning. I hope this will serve as a guideline for engineering your Exchange 2010 solution. It assumes that readers understand both the Exchange 2010 architecture and the VMware vSphere architecture.
First of all, let’s talk about why creating a supported installation even requires this article. VMware’s vSphere product can make virtual machines highly available (HA) and fault tolerant (FT) right out of the box. For VMware, HA means that when a host (physical server) goes down, the virtual machines (VMs) that were running on it are automatically restarted on a different host. With HA those servers do experience downtime, but they are powered back on with no manual intervention. FT means running a VM on two hosts simultaneously so that if one of the hosts goes down, the VM never reboots or goes offline. This prevents an outage caused by a host failure, but it does not help if the guest OS itself crashes or becomes unresponsive. Between these two capabilities, a VM is decoupled from any single physical host. Microsoft’s Exchange solution is designed to achieve close to 100% uptime, but the methods it uses to get there are different from, and at odds with, VMware’s (and Hyper-V’s and XenServer’s, for that matter).
In general, VMware’s HA feature is widely accepted as a supported feature to have enabled, since it is analogous to restarting a server after an unexpected power loss, although there is downtime while the server reboots. VMware’s FT is generally not going to be supported by Microsoft, most likely because it is a new and unfamiliar technology. It works, but it limits the number of CPUs the VM can have (in vSphere 4.x an FT-protected VM is restricted to a single vCPU), and in general Microsoft would prefer that more traditional technologies are used to handle an unexpected outage. VMware’s Thin Provisioning and Snapshots are not supported by Microsoft for Exchange.
The five roles in Exchange 2010 are Edge Transport, Hub Transport, Client Access, Mailbox and Unified Messaging. I will step through each of these roles and explain how to configure it to be highly available as a VM and also supported by both vendors.
Client Access
The first role you will be rolling out in your Exchange 2010 environment is the Client Access (CAS) role. Outlook Web App (OWA) and Outlook 2007/2010 clients use these servers to communicate with their mailboxes. From a high availability standpoint you can create a CAS array, which is a collection of CAS servers that are grouped together and referenced in DNS and Active Directory. However, all load balancing is done outside of Exchange. Exchange has a mechanism to group CAS servers, but something like Network Load Balancing or a third-party software/hardware load balancer will have to handle the actual load balancing of traffic.
Because Exchange doesn’t provide the load balancing itself, the only limitation with using VMware’s HA is that you will want to ensure that both of your CAS servers are not running on the same physical host. This is not a support requirement. It is simply that if that host goes down with both CAS servers on it, your clients will lose connectivity. Even if it is only for a few minutes while the VMs are restarted on another host, there will be an outage. Another consideration that should go without saying is that if you are using a third-party software/hardware load balancer, you will want to ensure that it does not become a single point of failure. In other words, don’t buy just one load balancer.
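The "not on the same host" rule is easy to check against an inventory export. The following is a minimal Python sketch, not a VMware API call: the VM names, host names and the `find_colocated_roles` helper are all hypothetical and exist only for illustration.

```python
from collections import defaultdict


def find_colocated_roles(placement, roles):
    """Return {role: host} for every role whose VMs all sit on one host.

    placement: VM name -> ESXi host name
    roles:     VM name -> Exchange role label
    """
    hosts_by_role = defaultdict(set)
    vm_count = defaultdict(int)
    for vm, host in placement.items():
        role = roles[vm]
        hosts_by_role[role].add(host)
        vm_count[role] += 1
    # A role is at risk when it has several VMs but only one host under them.
    return {
        role: next(iter(hosts))
        for role, hosts in hosts_by_role.items()
        if vm_count[role] > 1 and len(hosts) == 1
    }


# Hypothetical inventory: all names below are made up.
roles = {"cas01": "CAS", "cas02": "CAS"}
good = {"cas01": "esx01", "cas02": "esx02"}
bad = {"cas01": "esx01", "cas02": "esx01"}

print(find_colocated_roles(good, roles))  # no role is concentrated on one host
print(find_colocated_roles(bad, roles))   # CAS is entirely on esx01
```

If your cluster is licensed for DRS, a "Separate Virtual Machines" anti-affinity rule is the usual way to have vSphere enforce this for you instead of checking it by hand.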
Hub Transport
In Exchange 2010 the Hub Transport (HT) role needs to be located in each site where you have a Mailbox server. The nice thing about the HT role is that it is designed to be highly available right out of the box. It handles load balancing and rerouting of mail, provided you have multiple servers. If your organization decides to have the HT role directly connected to the Internet in lieu of an Edge Transport server, you will need to provide external redundancy by way of multiple MX records in your external DNS. Remember that each MX record carries a preference value (often called its cost or priority): the record with the lowest value is the primary, and records with higher values are secondaries. This way, if the primary goes down, mail will flow to the host named in the next MX record.
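To make the MX fallback concrete, here is a small Python sketch of how a sending server walks the records. It is a simplification under stated assumptions: the host names are hypothetical, `is_reachable` stands in for a real SMTP connection attempt, and records with equal preference are simply sorted by name rather than chosen at random.

```python
def delivery_order(mx_records):
    """Order MX hosts the way a sender tries them: lowest preference first."""
    return [host for _pref, host in sorted(mx_records)]


def pick_target(mx_records, is_reachable):
    """Return the first reachable host in preference order, or None."""
    for host in delivery_order(mx_records):
        if is_reachable(host):
            return host
    return None


# Hypothetical records for example.com, as (preference, host) pairs.
records = [(20, "mail2.example.com"), (10, "mail1.example.com")]

print(delivery_order(records))
# Simulate the primary being down: delivery falls through to the secondary.
print(pick_target(records, lambda host: host != "mail1.example.com"))
```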
Again, the use of VMware HA is not prohibited for this role, but remember not to place both HT servers on a single host. Many people also combine the HT and CAS roles on the same servers because of their similar high availability requirements.
Mailbox
The Mailbox role is the server that contains all the mail databases. In Exchange 2010 Microsoft completely changed the way that Exchange handles high availability. They introduced the concept of Database Availability Groups (DAG) and dropped the old shared-storage clustering models. (A DAG still uses Windows Failover Clustering components behind the scenes, but Exchange manages them for you and no shared storage is involved.) The implications of this are huge and you are probably going to need additional storage space (read this as “we’re gonna need a bigger boat”).
With the old clustering there was this concept of shared storage. If you had 2 servers clustering 400GB of mailboxes, both servers would connect to the same 400GB volume/LUN, with only the active node reading and writing at any one time. Now that shared storage has been dropped, if you want two servers to make 400GB of databases highly available you need 800GB of storage space. The way that DAGs function in Exchange 2010 is very similar to log shipping with Microsoft SQL. Let’s say you have 8 mailbox databases. One option would be to configure 4 active databases on one server and 4 active on the other, with each server also holding passive copies of the other’s databases. So each server has a copy of all 8 databases. Constant log shipping keeps the active and passive copies in sync, so that if a mailbox server goes down the passive copies are made active and the client doesn’t even know that a failure has happened. Since there can be a maximum of 16 servers in each DAG, you can see that the possibilities for configuring this are near limitless. You could have each server hold copies of all databases, or you could break it up into much more elaborate configurations.
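The storage arithmetic is simple enough to sketch in Python. The split of the 400GB into 8 databases of 50GB each is my assumption for illustration, and the `dag_storage_gb` helper is hypothetical. It counts database copies only and ignores transaction logs, content indexes and free-space headroom, so treat the result as a lower bound.

```python
def dag_storage_gb(database_sizes_gb, copies_per_database):
    """Raw database storage when every database has the same number of copies."""
    return sum(database_sizes_gb) * copies_per_database


# Assumed layout: 400 GB of mailboxes spread evenly over 8 databases.
sizes = [50] * 8

print(dag_storage_gb(sizes, 1))  # one copy of each database: 400
print(dag_storage_gb(sizes, 2))  # full copy on both DAG members: 800
```

Adding a third DAG member with a full set of copies would push the same 400GB of mailboxes to 1200GB of database storage, which is why copy count is worth deciding before you size your LUNs.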
Since Microsoft has built the DAG concept to tolerate failures, it is recommended to avoid HA and FT for these VMs to maintain supportability with Microsoft. Both technologies will work, but it is recommended to let the Exchange 2010 application handle faults. If you are going to have DAG servers, make sure that the VMs holding the active and passive copies of a database are not all running on the same host.
Edge Transport
The Edge Transport (ET) role is an optional role that provides DMZ-based spam and virus filtering for your incoming and outgoing email. If you deploy this role, it is recommended that you place it in the DMZ with a firewall between it and the inside production network, and a firewall between it and the Internet. Much like the Internet-facing HT role mentioned earlier, from an external perspective you want to ensure that you have multiple MX records with proper preference values so that the primary and secondary ET servers are identified. From an internal perspective you want to ensure that your HT and ET servers are properly configured with EdgeSync and Edge Subscriptions so that mail keeps flowing between the two roles if any one of those servers fails.
If your vSphere environment is configured to properly handle DMZ-type VMs, whether by a VLAN tag and separate vSwitch or by dedicated NICs and vSwitches, then feel free to run these as VMs. VMware HA can be used; just ensure that the ET servers are not all running on the same host.
Unified Messaging
This is another optional role. It enables the unified mailbox concept, allowing faxes, voicemail and email to be centrally stored in your mailbox. This role requires a compatible PBX and in some cases additional hardware to allow for the integration of voicemail and faxes. With that being said, this is the easiest role to engineer for a VMware environment in a supported fashion, because at the time of writing Microsoft does not support the Unified Messaging (UM) role on any virtualization platform. The reason is that the product they integrated to build the Unified Messaging platform, which ensures real-time voice recording (call quality) for voicemail messages, does not support virtualized platforms. This is not to say that it wouldn’t work; it just is not supported. So if you want to engineer this role to be highly available, you will need to use physical servers and you will want at least two. When you configure your dial plans, make sure that you configure the same dial plan and configuration settings on both UM servers so that each can service that dial plan should the other be unavailable.
Example
The small example I’ll run through has 5 physical servers and is located at one data center. The server configuration looks like this:
- 1 vCenter 4.1 Server running on Server 2008 R2
- 2 ESXi 4.1 Servers
- 2 Unused Servers
- There is a DMZ network and a Production network that are separated by a firewall.
- The DMZ is separated from the Internet by a firewall.
Now that we know how each of the Exchange 2010 roles relates to high availability, I will show you a quick solution that provides high availability for our scenario.
The CAS and HT roles are installed together on two VMs. The VMs are placed on separate hosts. CAS is configured in a CAS array and Windows Server 2008 NLB is configured to handle the load balancing. The HT role needs no additional configuration to provide redundancy. VMware HA is configured. It is understood that, since there are only 2 hosts in this example, should 1 host fail and HA kick in, both CAS/HT VMs will be running on the same host.
The Mailbox role is installed on two VMs, which are configured in a DAG. There are 4 Mailbox databases (DB). DB 1 and 3 are made active on Mailbox server 1, and DB 2 and 4 are made active on Mailbox server 2. Each server holds passive copies of the other two databases. HA and FT are not configured for these VMs.
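To see what this layout buys you, the Python sketch below simulates losing Mailbox server 1. It is only a model of the outcome, not of how Exchange decides: the server names and the `activate_after_failure` helper are hypothetical, and the sketch just takes the first surviving copy, whereas Exchange has its own logic for choosing which passive copy to activate.

```python
def activate_after_failure(active_on, copies_on, failed_server):
    """Return a new db -> active server map after one mailbox server fails.

    active_on: db -> server currently hosting the active copy
    copies_on: db -> list of servers holding any copy (active or passive)
    """
    result = {}
    for db, server in active_on.items():
        if server != failed_server:
            result[db] = server
            continue
        survivors = [s for s in copies_on[db] if s != failed_server]
        # None means the database has no surviving copy and stays offline.
        result[db] = survivors[0] if survivors else None
    return result


# Layout from the example: DB1/DB3 active on MBX1, DB2/DB4 active on MBX2.
active_on = {"DB1": "MBX1", "DB2": "MBX2", "DB3": "MBX1", "DB4": "MBX2"}
copies_on = {db: ["MBX1", "MBX2"] for db in active_on}

after = activate_after_failure(active_on, copies_on, "MBX1")
print(after)  # every database is now active on MBX2
```

The surviving server ends up carrying all 4 active databases, so each mailbox VM should be sized to handle the full load on its own.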
The ET role is installed on two VMs that are placed on the DMZ vSwitch. EdgeSync and Edge Subscriptions are properly configured between the two ET and two HT servers. Two MX records are configured, with the primary having a preference of 10 and the secondary having a preference of 20. VMware HA is enabled for these VMs.
The UM role servers are deployed on two physical servers, are configured with the same dial plans, and are connected to the PBX.
Hopefully this has shed some light and maybe given you some ideas on how to best deploy your Exchange 2010 solution on vSphere. This is by no means the only way to deploy the solution, but it should be helpful in planning. If you care to read the full Exchange 2010 requirements from Microsoft, you can find them at http://technet.microsoft.com/en-us/library/aa996719.aspx.