Thursday, May 14, 2009

Exchange 2010 High Availability

A. Intro
1. Solution delivered:
· Unified technology for high availability and site resilience
· New framework for creating highly available mailboxes
· Evolution of continuous replication technology
· Can be deployed on a range of storage options
· Native to Exchange, not bolted onto the side; it no longer uses the clustered mailbox server model, and HA is built in
B. Legacy versions:
1. Exchange 2003 server:
· The challenge was having to use cluster tools/resources to manage Exchange
· Failover was always at the server level
· Site resilience required 3rd party products
2. Exchange 2007:
· Clustered servers can host only the Mailbox role
· You need clustering knowledge to deploy mailbox CCR/SCR
C. What’s new in Exchange 2010:
· Replication is at the database (DB) level
· In 2007, replication was at the storage group (SG) level
· You can choose the replication target, whereas in 2007 it was chosen automatically
· Failover is at the database level; Exchange manages failover
· All clients connect to the RPC Client Access server (CAS); the CAS server knows which Mailbox server to connect the client to
· You can have Mailbox servers in different sites and still replicate
1. Feature names:
· Mailbox resiliency – name of unified HA and site resilience solution
· Database availability group (DAG) – a group of up to 16 Mailbox servers that host a set of replicated databases
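· Mailbox database copy – a mailbox database (.edb file and logs) that is either active or passive
· Incremental deployment – the ability to deploy high availability/site resilience after Exchange is installed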
· Exchange 3rd party replication API – an Exchange-provided API that enables use of 3rd party replication for a DAG in lieu of continuous replication
· The new API gives 3rd parties something to write to, so replication can be done by a 3rd party product instead
2. Terminology:
· HA – a solution must provide data availability, service availability, and automatic recovery from failures
· Disaster recovery – the process used to manually recover from a failure; the steps you take when HA is no longer there
· Site resilience – a DR solution used for recovery from a site failure
· *over – short for switchover/failover: a switchover is a manual activation of one or more databases; a failover is an automatic activation of one or more databases after a failure
3. Exchange 2010 *over: (could be failover or switchover)
· Within a datacenter: database or server *over
· Datacenter level: switchover
· Between datacenters:
a. Database or server *overs
b. Assumptions:
- Each datacenter is a separate active directory site
- Each data center has live, active messaging services
- Standby datacenter must be active to support single database *over
4. Exchange 2007 concepts brought forward
· Extensible Storage Engine (ESE): database and log files
· Continuous replication:
a. Log shipping and replay
b. Database seeding
c. Store service, replication service
5. Not coming over:
· Storage groups
· Databases identified by the server on which they live
· Server names as part of database names
· Clustered mailbox servers:
a. Preinstalling a windows failover cluster
b. Running setup in clustered mode
c. Moving a CMS network identity between servers
d. Shared storage
· The limit of two HA copies
· Private and public networks (MAPI and replication networks are used instead)
6. 2010 HA fundamentals:
· Database availability group server
· Database
· Database copy
· Active manager
· RPC client access
7. DAG:
· Base component of HA and site resilience
· A group of up to 16 servers that host a set of replicated databases
· Wraps a Windows failover cluster
a. Manages membership (DAG member = node)
b. Provides heartbeat of DAG member servers
c. Active Manager stores data in the cluster database
· Defines a boundary for:
a. Mailbox database replication
b. Database and server *over
c. Active manager
8. Active manager:
· Brain of HA
· Exchange component that manages *over
a. Runs on every server in a DAG
b. Selects the best available copy on failovers
c. Is the definitive source of information on where a database is active:
- Stores this information in the cluster database
- Provides the information to other Exchange components (e.g. RPC Client Access and Hub Transport)
· Two Active Manager roles:
· PAM – Primary Active Manager
a. Runs on the node that owns the cluster group
b. Gets topology change notifications
c. Reacts to server failures
d. Selects the best database copy on *over
· SAM – Standby Active Manager
a. Runs on every other node in the DAG
b. Responds to queries about which server hosts the active copy
9. How does it select:
· Active Manager selects the best copy to become active when the existing active copy fails
· Ignores servers that are unreachable or where activation is temporarily or regularly blocked
· Sorts copies by currency to minimize data loss
· Breaks ties during the sort based on activation preference
· Selects from the sorted list based on the copy status of each copy
10. 10 criteria:
· Phase 1: the catalog is healthy, the copy status is Healthy, the copy queue length is < 10, and the replay queue length is < 50
· Phases 2-10: as we move down the phases, the criteria become less strict
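
The selection steps in sections 9 and 10 can be sketched roughly as follows. This is only an illustration of the notes above, not Exchange code: the DatabaseCopy fields and the function names are made up for the example, "currency" is approximated by copy queue length, a lower activation preference number is assumed to mean "more preferred", and meets_relaxed is a hypothetical stand-in for phases 2-10, whose actual criteria are not listed in these notes.

```python
from dataclasses import dataclass


@dataclass
class DatabaseCopy:
    # All field names are illustrative, not Exchange property names.
    server: str
    reachable: bool
    activation_blocked: bool
    status: str                 # e.g. "Healthy"
    catalog_healthy: bool
    copy_queue_length: int      # logs not yet copied; lower = more current
    replay_queue_length: int    # logs copied but not yet replayed
    activation_preference: int  # assumed: lower number = more preferred


def meets_phase1(c: DatabaseCopy) -> bool:
    """Phase 1 criteria as written in the notes."""
    return (c.catalog_healthy and c.status == "Healthy"
            and c.copy_queue_length < 10 and c.replay_queue_length < 50)


def meets_relaxed(c: DatabaseCopy) -> bool:
    """Hypothetical placeholder for the less strict later phases."""
    return c.status == "Healthy"


def select_best_copy(copies, phases=(meets_phase1, meets_relaxed)):
    # Ignore servers that are unreachable or blocked from activation.
    candidates = [c for c in copies
                  if c.reachable and not c.activation_blocked]
    # Sort by currency, breaking ties on activation preference.
    candidates.sort(key=lambda c: (c.copy_queue_length,
                                   c.activation_preference))
    # Walk the phases from strictest to least strict.
    for phase in phases:
        for c in candidates:
            if phase(c):
                return c
    return None


copies = [
    DatabaseCopy("MBX2", True, False, "Healthy", True, 3, 10, 2),
    DatabaseCopy("MBX3", True, False, "Healthy", True, 3, 5, 1),
    DatabaseCopy("MBX4", False, False, "Healthy", True, 0, 0, 3),
]
print(select_best_copy(copies).server)  # MBX3
```

MBX4 is the most current copy but is unreachable, so it is skipped; MBX2 and MBX3 tie on copy queue length, and the activation preference breaks the tie.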
11. Example: database failover
· Database failure occurs
· Active Manager moves the active DB
· Database copy is restored
· Similar flow within and across datacenters
12. Server failure:
· Server failure occurs
· Cluster notification of node down
· Active manager moves active db
· Service restored
· Cluster notification of node up
· DB copies resynchronize with the active DB; there is no need to reseed, and this is all automatic
· Similar flow within and across datacenters
13. DAG life cycle:
· A DAG is created initially as an empty object in AD, using either continuous replication or, in 3rd party replication mode, 3rd party replication
· When the first Mailbox server is added to a DAG:
a. A Windows failover cluster is formed with a Node Majority quorum, using the name of the DAG
b. The server is added to the DAG object in AD
c. A cluster network object for the DAG is created in the built-in Computers container
d. One or more IP addresses are assigned to the DAG
e. The name and IP address of the DAG are registered in DNS
· When the second and subsequent Mailbox servers are added to the DAG:
a. The server is joined to the cluster for the DAG
· After servers have been added to a DAG:
a. Configure the DAG: network encryption and network compression
b. Configure DAG networks: network subnets, enable/disable MAPI traffic and replication
c. Create mailbox database copies; seeding is performed automatically (manual seeding is used when seeding from a passive copy of a database, which is now possible with 2010)
d. Monitor the health and status of database copies
e. Perform switchovers as needed
· Before you can remove a server from a DAG, you must first remove all replicated database copies from the server
· When a server is removed from a dag
a. The server is evicted from cluster
b. The cluster quorum is adjusted as needed
c. The server is removed from the dag object in ad
· Before you remove a DAG, you must first remove all servers from the DAG
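
The ordering rules in the life cycle above can be captured in a small model. This is a sketch for reasoning about the sequence only; the Dag class and its method names are invented for the example and do not correspond to any Exchange API. It encodes just what the notes state: the cluster is formed when the first server joins, a DAG holds up to 16 servers, a server cannot leave while it still hosts replicated copies, and a DAG cannot be removed while it still has members.

```python
class Dag:
    """Toy model of the DAG life cycle rules (names are hypothetical)."""

    MAX_MEMBERS = 16  # limit stated in the notes

    def __init__(self, name):
        self.name = name
        self.members = []            # servers, in the order they joined
        self.copies = {}             # server -> set of database names
        self.cluster_formed = False  # a new DAG is an empty object

    def add_server(self, server):
        if server in self.members:
            raise ValueError(f"{server} is already a member")
        if len(self.members) >= self.MAX_MEMBERS:
            raise ValueError("a DAG holds at most 16 servers")
        if not self.members:
            # The first member triggers creation of the failover cluster;
            # later members simply join it.
            self.cluster_formed = True
        self.members.append(server)
        self.copies[server] = set()

    def add_copy(self, server, database):
        if server not in self.members:
            raise ValueError(f"{server} is not a DAG member")
        self.copies[server].add(database)

    def remove_copy(self, server, database):
        self.copies[server].discard(database)

    def remove_server(self, server):
        if server not in self.members:
            raise ValueError(f"{server} is not a DAG member")
        if self.copies[server]:
            raise RuntimeError("remove all replicated copies first")
        self.members.remove(server)
        del self.copies[server]

    def can_be_removed(self):
        # A DAG can only be removed once it has no members left.
        return not self.members
```

For example, after add_server("MBX1"), add_server("MBX2") and add_copy("MBX2", "DB1"), calling remove_server("MBX2") raises until remove_copy("MBX2", "DB1") has been run.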
14. Deploying exchange 2010 HA features:
· Prepare hardware, install the OS and updates
· Run setup and install the Mailbox role
· Create a DAG and replicate databases
· Test *over
· No need to do the ground work for HA, you can do this later
15. 2010 incremental deployment
· Create a DAG: New-DatabaseAvailabilityGroup -Name DAG1 -FileShareWitnessShare
· Add the first Mailbox server to the DAG: Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer <server>
· Add the second and subsequent Mailbox servers to the DAG
· Add mailbox database copy
· Extend as needed
16. Demo:
· Db is decoupled from the server and this is evident from EMC
· Activation preference – a sequence number that sets the preferred order of the copies
· DAG configuration:
a. member server
b. Witness server and witness path
c. DAG networks: a collection of subnets that you provide; if you have multiple sites, their subnets are configured on the DAG here
· Database activation (switchover):
a. Test-ReplicationHealth – this can be run remotely
b. Get-MailboxDatabaseCopyStatus (instead of Get-StorageGroupCopyStatus)
c. Switch over before maintenance
d. Auto database mount configuration: Lossless, Good Availability, Best Availability, Best Effort, None; these refer to log loss
e. A switchover is not moving anything; the data is already there, though it may be copying some logs

· OWA experience:
a. Seamless; the user only had to refresh OWA
· Delta between beta and RTM
· Creating a DAG is really quick
· Adding members to a DAG takes a little longer
a. DAG servers are added serially, though you can multi-select as many as you want
· Create a mailbox database copy
a. It does a db seed basically
b. It assigns activation preference
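
The auto database mount settings mentioned in the demo can be read as a tolerance for missing logs. The sketch below assumes the thresholds I recall from the product documentation (0 logs for Lossless, 6 for Good Availability, 12 for Best Availability); these numbers are not in the session notes, so treat them as an assumption and check them against the documentation. Best Effort and None are left out because the notes do not say how they behave. The function and dictionary names are made up for the example.

```python
# Assumed thresholds (not from the session notes): the largest number of
# missing log files at which a copy is still mounted automatically.
MOUNT_DIAL_MAX_MISSING_LOGS = {
    "Lossless": 0,
    "GoodAvailability": 6,
    "BestAvailability": 12,
}


def can_auto_mount(dial: str, missing_logs: int) -> bool:
    """Return True if a copy missing this many logs would mount
    automatically under the given dial setting (illustrative only)."""
    if missing_logs < 0:
        raise ValueError("missing_logs cannot be negative")
    return missing_logs <= MOUNT_DIAL_MAX_MISSING_LOGS[dial]


for dial in MOUNT_DIAL_MAX_MISSING_LOGS:
    print(dial, can_auto_mount(dial, 5))
```

With 5 logs missing, only the Lossless setting would hold the database back from mounting automatically under these assumed thresholds.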
17. Transitioning to exchange 2010 ha
· Verify that you meet the requirements for 2010
· Deploy 2010
· Use the 2010 mailbox move feature to migrate
· Unsupported transitions:
a. In-place upgrade to Exchange 2010 from any previous version of Exchange
b. Using database portability between Exchange 2010 and non-Exchange 2010 databases
c. Backup and restore of earlier versions of Exchange databases on Exchange 2010
d. Using continuous replication between Exchange 2010 and Exchange 2007
18. End to end improvement:
· Online move mailbox
a. Supported between Exchange 2010 databases, and between Exchange 2007 SP2 and Exchange 2010 databases
b. Users can access their mailbox while the move is in progress
c. The move is performed asynchronously by a new service, the Microsoft Exchange Mailbox Replication Service (MRS), which runs on Client Access servers
d. You can start a move from anywhere because it is controlled remotely
e. Built-in throttling and recovery; doesn't impact replication
· RPC Client Access server
a. New service that establishes an RPC endpoint for client access on the CAS role, replacing the existing RPC endpoint on the Mailbox role
· Shadow redundancy:
a. Protects a message while it's in transit against Edge failure
b. The shadow queue copy is kept until the message is delivered
? what about when
· Transport dumpster:
a. Gets feedback from the replication pipeline to let it know when to delete items
- Once a message has been delivered and the logs containing it have been replicated, the transport dumpster can delete the message
- Replay is not required for deleting items from the dumpster; the only data in the dumpster is data that has not yet been replicated. This reduces IOPS on Hub Transport
b. Responds to requests for redelivery after a lossy failover, both within its AD site and across AD sites (old site and new site)
· Replication for other purposes:
a. Site/server/disk failure
b. If you have at least 3 copies, you can use HA in place of backups
c. Archiving/compliance: the e-mail archive provides this
d. Recover deleted items: use deleted item retention, which you can set to a longer period, and the hold policy (this is a function of the dumpster)
e. No longer need to have a copy at the storage level (RAID)
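
The transport dumpster behaviour described above (keep only what has not yet been replicated, and hand it back after a lossy failover) can be pictured with a toy model. Everything here is an assumption made for illustration: the class, its method names, and the idea of tagging each message with the log generation that contains its delivery are not taken from the session or from Exchange itself.

```python
class TransportDumpster:
    """Toy model: hold delivered messages until replication catches up."""

    def __init__(self):
        # message id -> log generation holding the delivery (hypothetical)
        self._held = {}

    def record_delivery(self, message_id, log_generation):
        self._held[message_id] = log_generation

    def on_replication_feedback(self, replicated_through):
        # Feedback from the replication pipeline: every log up to and
        # including this generation has been replicated, so messages in
        # those logs are safe to drop. Replay is not part of the check.
        self._held = {m: g for m, g in self._held.items()
                      if g > replicated_through}

    def redelivery_candidates(self):
        # After a lossy failover, whatever is still held is exactly what
        # the newly active copy may be missing.
        return sorted(self._held, key=self._held.get)


dumpster = TransportDumpster()
dumpster.record_delivery("msg-1", 10)
dumpster.record_delivery("msg-2", 11)
dumpster.record_delivery("msg-3", 12)
dumpster.on_replication_feedback(11)
print(dumpster.redelivery_candidates())  # ['msg-3']
```

The point of the design, as the notes put it, is that the dumpster stays small: it never holds anything that is already safe on another copy.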

19. Examples: small office:
a. Hardware load balancer
b. CAS/Hub/MBX roles all running on each server, with two servers
c. No more than 8 processor cores and no more than 16 GB of memory
20. Example: double resilience
a. single site
b. 2 nodes, 3 HA copies
c. JBOD, 3 copies
21. Example: Better option, 4 node dag:
a. Single site
b. You protect against failure during maintenance
c. You can lose two copies and still have quorum
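
The quorum claim for the 4 node DAG can be checked with a little vote arithmetic. The sketch assumes the usual majority rule (more than half of all votes must be present) and assumes that a DAG with an even number of members gets one extra vote from a file share witness, while an odd sized DAG relies on node votes alone; the notes only mention the witness server in the demo, so that pairing is my assumption. The function names are made up for the example.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest number of votes that is a strict majority."""
    return total_votes // 2 + 1


def has_quorum(nodes_up: int, total_nodes: int,
               witness_up: bool = True) -> bool:
    # Assumption: an even member count adds one file share witness vote.
    uses_witness = total_nodes % 2 == 0
    total_votes = total_nodes + (1 if uses_witness else 0)
    votes_up = nodes_up + (1 if uses_witness and witness_up else 0)
    return votes_up >= votes_needed(total_votes)


# 4 nodes + witness = 5 votes, so 3 are needed.
print(has_quorum(nodes_up=2, total_nodes=4))  # True
print(has_quorum(nodes_up=1, total_nodes=4))  # False
```

Under these assumptions, losing two of the four nodes leaves two node votes plus the witness, which is still a majority of five; losing a third node, or losing the witness along with two nodes, is not.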
22. Take away
· Greater end-to-end availability through mailbox resilience
· Unified framework for high availability and site resilience
· Faster and easier to deploy with incremental deployment
