Thursday, May 14, 2009

Storage in Exchange 2010 - UNC321

A. Introduction:
1. Exchange storage background
2. Storage technology 2010+
3. Large mailbox value
4. E2010 storage architecture
a. Storage innovation
b. ESE database innovations
5. E2010 storage design
6. Summary
B. Exchange storage 2003 HA /storage design
1. MSIT 4+3 SCC SAN example
a. ~1 IOPS/mailbox
b. 4 active nodes
c. 3 passive nodes
d. 4 GB RAM
e. 4,000 users/server
f. 250 MB mailboxes
g. Backups: daily full, streamed to disk/tape
2. Problem with this example
a. Disk was a single point of failure

C. Exchange 2007 HA / storage design
1. MSIT CCR + DAS example
a. .33 IOPS/mailbox
b. ~4000 mailboxes/cluster
c. 8 processor cores
d. 2 GB mailboxes
e. Backups: DPM , 15 min incremental, daily express full
f. Using RAID 5
g. No single point of failure
D. Disk technology:
1. Disk capacity trend predicted to continue
a. 2 TB desktop-class SATA disks available
b. 1 TB nearline/midline SAS disks available
2. Sequential throughput increasing linearly with areal density
a. 2010 stat = ~250 MB/sec
3. Random IO performance not expected to improve substantially
a. 15K RPM
a. 15k rpm
4. Random vs. sequential disk IO
a. Random IO
· Disk head has to move to process subsequent IO
· Head movement = high IO latency
· Seek latency limits IOPS
b. Sequential IO
· Disk head doesn't move to process subsequent IO
· Stationary head = low IO latency
· Disk rpm speed limits IOPS
· 7.2K RPM SATA disk: ~20 ms latency
5. Flash/SSD : 2010 scenario
a. By 2010, flash is best utilized as a cache within the storage stack
b. Price delta vs. disk is huge
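The random vs. sequential latencies above translate directly into IOPS ceilings. A rough back-of-envelope sketch (the 20 ms and 250 MB/sec figures come from the notes; the 32 KB IO size anticipates the 2010 page size discussed later):

```python
# Rough back-of-envelope IOPS estimates from the latencies in the notes.
# Assumed figures: ~20 ms per random IO on a 7.2K RPM SATA disk, and
# ~250 MB/sec sequential throughput read in 32 KB IOs.

def random_iops(latency_ms):
    """Each random IO waits for seek + rotation; IOPS ~ 1 / latency."""
    return 1000 / latency_ms

def sequential_iops(throughput_mb_s, io_size_kb):
    """Head stays put, so IO rate is bounded only by raw throughput."""
    return throughput_mb_s * 1024 / io_size_kb

print(random_iops(20))            # ~50 random IOPS
print(sequential_iops(250, 32))   # ~8000 sequential 32 KB reads/sec
```

The two-orders-of-magnitude gap is why the 2010 store pushes so hard toward sequential IO.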
E. Email trend
1. The average corporate user today sends and receives about 156 messages a day, and this number is expected to grow to about 233 messages a day
F. Large mailbox value:
1. Expectation is growing for large mailbox
2. Large mailbox = 1–10 GB
a. Aggregate mailbox = primary mailbox + archive mailbox
b. 1 year of mail (minimum)
c. 1 year, 48,000 items, 2,400 MB
3. Increased knowledge worker productivity
a. Reduced mailbox management
b. Client accessibility (OWA, Outlook, mobile)
4. Eliminate/reduce PST
5. Eliminate/reduce 3rd-party archive solutions
6. Client experience:
a. Outlook 2007 performance in cache mode
· Problem with large OSTs
· Office 2007 SP2 solves this problem (will allow up to 10 GB OST)
· 2010 archive mailbox reduces data cached to the OST (archive is online only)
· 2010 store/ESE changes improve cached mode sync speed
b. Outlook 2007 online/OWA performance
· Item/folder count limitations
· View creation performance
· 2010 store/ESE changes will allow up to 100K items
c. Client search performance:
· 2010 search performance improvements: real-time result views, 3x increase in indexing performance
7. Large mailbox challenges and solutions
a. Long backup times:
· Backup off passive copies
· Daily incremental/weekly full backup
· DPM express full backups
· 2010 HA + hold policy as your backup*
b. Fast recovery requirements (RTO)
· 2010 HA is the fast recovery solution
c. High storage cost: IOPS (efficiently utilize low-performance/high-capacity disks), RAID overhead
d. Move mailbox downtime
e. Database maintenance
G. 2010 storage vision:
· IO reduction
· Sequential IO
· Large, fast, low-cost mailboxes: SATA/tier-2 disk optimization
· RAID-less storage (JBOD)
· Storage design flexibility
H. IOPS reduction: store schema changes
1. Store schema = the way the store organizes data in the ESE database
2. 2010: one simple theme
· Move away from doing many random, small disk IOs toward fewer sequential, large disk IOs
3. Significant benefits
4. 2007 schema: store table architecture
a. Mailbox table
b. Folder table
c. Message table
d. Attachment table
e. Message/folder table (per folder)
· Gives the benefit of single-instance storage
· Great for space, but causes random access
5. 2010 schema:
· Per database: mailbox table
· Per mailbox: folder table, message header table, body
· Per view: view table (e.g. sorted by From)
· Single instance storage is gone completely
6. Store schema changes: physical contiguity
· Few large IOs and sequential reads...
7. Store schema changes: logical contiguity
8. Store schema changes: lazy view updates: reduce IO by deferring view updates; view updates utilize sequential IO
· 2007: IO whenever new messages come in
· 2010: no IO if you are not accessing the view
· View updates only happen upon user request
· Reads are sequential so they are fast

9. Demo:
· 2007 user on one side and 2010 on other side looking at 100k items mailbox and opening the mailbox for the first time
· Perfmon is running to capture numbers
· 2007: starting Outlook took about 5 sec
· 2007: doing a first-time sort, you get the progress pop-up
· 2007: looking at the IO, you can see it's random, and it took 25 sec to open a view
· 2010: starting outlook, 2.5 sec
· 2010: first time sort of 100k items – 5 sec
· 2010: looking at Perfmon, sequential IO
· You can put more users on disk and IO goes down, better experience
10. IOPS reduction: ESE changes
· Optimize for the new store schema
a. Allocate DB space in a contiguous manner
b. Maintain DB contiguity over time
c. Utilize space efficiently (DB compression)

· Increase IO size
a. DB page size increased from 8 KB to 32 KB
b. Improved read/write IO coalescing (gap coalescing)
c. Improved asynchronous read capability (pre-read)
· Increased cache effectiveness
a. 100 MB checkpoint depth (HA configuration only)
b. DB cache compression and priority
11. IOPS reduction: space management: allocate space based on contiguity
· Database space allocation hints
· Allocate DB space based on either data compactness or data contiguity (usage pattern)
12. IOPS reduction: maintaining contiguity:
· New DB maintenance architecture
· Cleanup performed at run time when a hard delete occurs; happens during dumpster cleanup (OLM); pages are zeroed by default
· Space compaction: DB is compacted and space reclaimed at run time; auto-throttled
· Maintain contiguity: database is analyzed for contiguity and space at run time and is defragmented in the background (B+ tree defrag/OLD2); auto-throttled; sacrifices space for contiguity
· Database checksumming: two options (both active and passive copies)
a. Run DB checksum during run time
13. IOPS reduction: DB contiguity results:
· 2007 message/folder table – looks random and fragmented
· 2010 message header table – you see contiguity
14. Mitigate DB space growth: database compression:
· Store schema changes, space hints, B+ tree defrag & 32 KB page size combine to increase DB file size by 20%
· Growth is offset by compression of message headers and text/HTML bodies (long values)
· With compression we can bring DB size down, given a mix of HTML/text messages
15. IOPS reduction: DB page size increased to 32 KB
· Now we don't need as many pages; more data fits per page, and it's contiguous
· Comparing 2007 to 2010: a 20 KB message can be read in 1 IO where it took 3 IOs in 2007
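The 3-IO vs. 1-IO claim is just page arithmetic, assuming the message is laid out contiguously on page-aligned IOs:

```python
import math

def ios_per_message(message_kb, page_kb):
    # With contiguous layout, the number of page-sized IOs needed to
    # read a message is simply the number of pages it spans.
    return math.ceil(message_kb / page_kb)

print(ios_per_message(20, 8))   # Exchange 2007, 8 KB pages  -> 3 IOs
print(ios_per_message(20, 32))  # Exchange 2010, 32 KB pages -> 1 IO
```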
16. IOPS reduction: IO gap coalescing:
· Read case
· 2007: 3 read IOs to get a message off disk
· 2010: 1 IO
17. IOPS reduction: 100 MB checkpoint depth:
· Checkpoint depth = the amount of data waiting to be committed to the DB file (.edb)
· 2010 default checkpoint depth max increases from 20 MB to 100 MB, only on DBs protected by 2010 HA (standalone stays at 20 MB)
· Deep checkpoint benefit = more efficient DB writes (~40% reduction)
· Deep checkpoint risks = long store shutdown times, long crash recovery times
· Risk mitigation: shut down DBs in parallel, fail over on store crash
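Why a deeper checkpoint cuts physical writes: a hot page can be dirtied many times in memory but written to the .edb file only once per flush. A toy sketch (the update pattern and flush model are invented for illustration, not Exchange internals):

```python
# Illustrative sketch: deeper checkpoint = less frequent flushing of dirty
# pages, so repeated updates to the same page coalesce into one write.

def db_writes(updates, flush_every):
    """Count physical page writes if dirty pages flush every `flush_every` updates."""
    dirty, writes = set(), 0
    for i, page in enumerate(updates, 1):
        dirty.add(page)                # re-dirtying a cached page is free
        if i % flush_every == 0:
            writes += len(dirty)       # one write per distinct dirty page
            dirty.clear()
    writes += len(dirty)               # final flush
    return writes

# Hot pages A and B are touched repeatedly, C occasionally.
updates = ["A", "B", "A", "C", "A", "B"] * 10
print(db_writes(updates, flush_every=2))   # shallow checkpoint: many writes
print(db_writes(updates, flush_every=20))  # deep checkpoint: far fewer
```

The same mechanism explains the risk side: a deeper checkpoint means more un-flushed data to replay after a crash, hence the parallel-shutdown/failover mitigations above.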
18. Database cache compression:
· Problem: new store schema + 32 KB pages can reduce cache efficiency, e.g. a page with 8 KB of data consumes 32 KB of memory in the DB cache
· Solution: implement DB cache compression to shrink partially used cached pages in memory, allowing more effective cache use
· Up to 30% more cache/mailbox server = less DB IO
· The more mailboxes you have, the more benefit
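The cache arithmetic behind this: if a partially used page can be held at its data size instead of the full 32 KB, the same RAM caches far more pages. A sketch (the 8 GB cache size is a hypothetical example, not from the session):

```python
# Illustrative arithmetic: a 32 KB page holding only 8 KB of data wastes
# 24 KB of cache unless it is compressed in memory.

def pages_cached(cache_mb, cost_per_page_kb):
    """How many pages fit in the cache at a given in-memory cost per page."""
    return cache_mb * 1024 // cost_per_page_kb

CACHE_MB = 8192  # hypothetical 8 GB DB cache

# Uncompressed: every cached page costs the full 32 KB of RAM...
print(pages_cached(CACHE_MB, 32))
# ...compressed: a page that is only 25% full might cost ~8 KB.
print(pages_cached(CACHE_MB, 8))
```

In practice only partially used pages shrink, which is why the session quotes "up to 30% more cache" rather than the 4x this extreme case suggests.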
19. DB cache priority:
· Problem: background and recovery DB operations can pollute the cache, e.g. DB checksumming, OLD2, HA log replay
· Solution: implement DB cache priority to allow lower cache priorities for background/replay operations
· There is competition among past, present, and future data, and cache eviction, along the cache timeline
· HA log replay (passive) comes in as "past"
· DB maintenance comes in at the very tail end of the timeline


I. Exchange 2010 storage speeds and feeds:
· DB IO size increased by 5x
· Log write IO is the same
· For 3,000 mailboxes:
a. 70% reduction in DB IO/sec
J. Exchange IOPS trend:
· ~90% reduction since 2003 = storage design change
K. Optimize for SATA/tier-2 disks:
1. DB write IO 'burstiness': bursty DB writes negatively affect DB read and log write latency; the more write IOs issued at a time, the more disk contention
· Solution: throttle DB writes based on checkpoint target (QoS); DB write smoothing
· Works great on 7.2K RPM midline SATA disks
· Results: 50% reduction in RPC latency due to IO smoothing
L. Putting it all together:
1. 2010 storage improvements cannot be quantified in IOPS reduction alone
M. JBOD/RAID-less storage: now an option:
1. JBOD: 1 disk = 1 database/log
2. Requires 2010 HA (3+ DB copies)
3. Annualized disk failure rate (AFR) ~5%
4. Advantages:
· Reducing storage cost
· Eliminate unnecessary redundancy cost: server and storage redundancy can be symmetrical
· Reduce disk IO
· Enable simpler storage design: 1 disk = 1 DB
· Enable simple storage failure recovery
5. Disadvantages:
· Disk striping performance cannot be leveraged
· Disk failure = DB failover
· Re-enabling resiliency = spare disk assignment/partitioning/format/DB reseed
· Soft disk errors: bad blocks must be detected and repaired
6. 2010 optimizations:
· Improve HA handling of storage failures
· Optimize HA
· Improve storage failure detection (bad blocks/corruption)
a. Active/passive copy background scan
b. Active/passive copy lost-write detection
· Improve DB seeding/repair
a. Utilize DB passive copy as seeding source
b. Seed capability for content index catalog
7. Reduce reseeds by using single page restore (active and passive)
· Page corruption detection on active copy
· Active DB places a marker in the log stream to notify passive copies to ship an up-to-date page
· Passive receives log and replays up to the marker, then retrieves the good page (involves a replay service callback)
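The ~5% AFR above is the figure that makes the failover/reseed story matter: it sets how often a JBOD deployment eats a disk failure. Back-of-envelope, with a hypothetical disk count (not from the session):

```python
# Back-of-envelope: with an annualized failure rate (AFR) of ~5%, how
# many disk failures (and hence DB failovers + reseeds) should a
# deployment expect per year? The 48-disk server is a made-up example.

def expected_failures(disks, afr=0.05):
    """Expected disk failures per year = disk count x AFR."""
    return disks * afr

print(expected_failures(48))   # ~2.4 failed disks/year on a 48-disk server
```

That rate is why the 2010 optimizations above focus on making the failure path (failover, spare assignment, reseed from a passive copy) cheap and automatic.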
