3. Business Continuity: Business Continuity Planning and Design for the Data Center (Final.ppt)

Uploader: fcgy86390  Document ID: 4996021  Upload date: 2019-01-28  Format: PPT  Pages: 50  Size: 7.07 MB
Resource description

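Business Continuity: Business Continuity Planning and Design for the Data Center

Customer requirements

The world carries more risk than it used to
- A constantly changing environment: risk exposure keeps widening; global and regional interdependence keeps growing; supply chains are at risk of disruption at any moment.
- Business interruptions have a greater impact: downtime can cause larger financial losses, damage the brand, and compromise data integrity.
- More complex regulation: industry and regulatory standards keep changing; the division of labor across industries is more geographically dispersed; every country may have its own rules.
- More disasters: large-scale threats from economic crises, terrorism, hurricanes, earthquakes, power outages, fires and disease.

Classification of disasters
[Figure: threats plotted by frequency of occurrence per year (1,000, 100, 10, 1, 1/10, 1/100, 1/1,000, 1/10,000, 1/100,000; frequent to infrequent) against the consequence of each occurrence, that is, the loss per single event in US dollars ($1, $10, $100, $1 thousand, $10 thousand, $100 thousand, $1 million, $10 million, $100 million; low to high). Plotted threats: virus, worm, disk failure, component failure, power failure, application outage, data corruption, network problems, building fire, natural disaster, terrorism/civil unrest. Threats are grouped as availability-related or recovery-related, under the label "continuous business operations".]

Business continuity problems and challenges
- More business is online; applications and data need to keep growing.
- Systems are more complex, recovery windows are shorter, and tolerance for downtime is lower.
- Backup and recovery vs. high availability (HA): rerunning end-of-day batch jobs, manual recovery of applications and data, lost data.
- Best intentions vs. designing to RTO, RPO and SLA specifications.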
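The disaster classification chart plots each threat by how often it occurs per year and how much a single occurrence costs. Multiplying the two gives an expected annual loss, which is one common way to compare frequent low-cost failures with rare catastrophic ones. The sketch below is a minimal illustration in Python; the threat placements are hypothetical values chosen for the example, not figures read from the chart.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Threat:
    name: str
    events_per_year: float     # position on the frequency axis
    loss_per_event_usd: float  # position on the consequence axis

    @property
    def expected_annual_loss_usd(self) -> float:
        # Expected annual loss = frequency per year x loss per occurrence.
        return self.events_per_year * self.loss_per_event_usd


# Hypothetical placements, for illustration only.
THREATS = [
    Threat("component failure", 100, 100),
    Threat("application outage", 1, 100_000),
    Threat("natural disaster", 1 / 100, 100_000_000),
]


def rank_by_expected_loss(threats: List[Threat]) -> List[Threat]:
    """Return the threats ordered from highest to lowest expected annual loss."""
    return sorted(threats, key=lambda t: t.expected_annual_loss_usd, reverse=True)


if __name__ == "__main__":
    for threat in rank_by_expected_loss(THREATS):
        print(f"{threat.name:20s} {threat.expected_annual_loss_usd:>12,.0f} USD/year")
```

With these assumed numbers the rare event dominates the ranking even though it is the least frequent, which is why availability measures alone do not replace recovery planning.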

Impact of business interruption
- Lost revenue and profit
- Negative public impact
- Fines and penalties
- Legal compliance and accounting issues
- Staff workload and cost
- Disruption of day-to-day business planning and operations

What customers want
- 60% of customers are looking at how to improve availability.
- Nearly 50% of customers want a significant improvement in security.
- More than 25% of customers want to implement high-availability clusters.

Considerations for Business Continuity
- High Availability: fault-tolerant hardware, redundancy, automatic detection and isolation, predictive analysis, call-home.
- Data Replication: real-time replication of data over metropolitan and/or continental distances.
- Disaster Recovery: automated protection against unplanned outages that meets recovery point and recovery time objectives.

IBM Power Systems high availability solutions

The difference between HA and DR

High Availability
- Automatic takeover
- Generally applies to faults that occur locally
- Protects physical devices: servers, disks, adapter cards, network
- Protects against fatal software errors: operating system, database, application, services

System components supporting business continuity and disaster recovery
[Figure: solutions plotted by recovery point objective (data currency, from the latest data with no data loss, reflecting the value of each transaction) against recovery time objective and reduced planned downtime (availability level, from basic availability to continuous availability). Single-server solutions (iSeries): backup (weekly to daily), off-site storage, RAID-5, journaling, disk mirroring, SAN, online maintenance, CUoD, high-speed tape, LPAR, TSM, BCRS, SWA. Multi-server solutions: SAN disk, backup servers, AIX/Linux/Intel clusters, continuous data replication clusters, switchable clusters.]

Architecture of high availability solutions for open platforms
- Availability by application: design the application architecture to meet high-availability requirements.
- Availability by middleware: DB2 HADR, WAS clusters, CICS clusters, Oracle RAC.
- Availability by operating system: AIX LVM mirroring, HACMP for AIX.
- Availability by hardware redundancy:
  - Servers: redundant processors, I/O adapter cards and power supplies; internal disks protected by RAID.
  - External disks, I/O buses, SAN switches, LAN, LAN switches: redundant components.
  - Disks: RAID; multi-path software (SDD, RDAC); availability through disk replication (FlashCopy, Metro/Global Mirror).
  - Network.

IBM Power Systems High Availability Solution
- Hardware: Power Systems (RAS), Live Partition Mobility
- Power Systems software: PowerHA, PowerHA/XD
- Application
- Operating system: AIX, i; Live Application Mobility

Impact of Maintenance on Availability
[Figure: IT availability (24 x 7) plotted against time for planned maintenance.]
- Is this you? Your users demand continuous availability (24x7).
- Do you agree? As IT availability approaches 24x7, top-notch maintenance practices become more critical.
- Is this your problem? As IT availability approaches 24x7, the time for maintenance work approaches zero!
- The Power Systems High Availability Solution can show you how POWER6 processors and AIX 6 help to address the maintenance crunch! Learn how Live Partition Mobility, Live Application Mobility, Workload Partitions and PowerHA can enable non-disruptive maintenance anytime!
- Do you want more availability and less work on weekends?

High Availability Hardware - Reliability, Availability and Serviceability

IBM Power Systems RAS architecture

Primary POWER RAS Features
- Processor Instruction Retry
- Alternate Processor Recovery
- First Failure Data Capture
- DDR Chipkill memory
- Bit-steering/redundant memory
- Service Processor Failover*
- Dynamic Firmware Maintenance*
- Hot I/O Drawer Add*
- I/O error handling extended beyond base PCI adapter
- ECC extended to inter-chip connections for the fabric/processor buses
- Memory and L3 Cache soft scrubbing
- Hardware Assisted L2 & L3 Cache Line Delete
- Hardware Assisted Memory Scrubbing
- Live Partition Migration
- 570 Concurrent Add & Cold Repair
* HMC required to enable these functions

Primary POWER RAS Features - Continued
- Redundant power, fans
- Dynamic Processor Deallocation
- Dynamic processor sparing
- ECC memory
- Persistent memory deallocation
- Hot-plug PCI slots, fans, power
- Internal light path diagnostics
- Hot-swappable disk bays

Summary of key Power Systems RAS features (World-class Hardware RAS)
- Core System Design: high quality parts; fewer parts = fewer failures; designed for low power consumption (less heat = fewer failures); manufacturing methods, packaging, cooling; Continuous System and Commodity Quality Actions; integrated RAS features; Failure Avoidance Methodology; designed for ease of service.
- Fault Resilience: N+1 power supplies, regulators, power cords; dual redundant fans; Dynamic Processor Deallocation and sparing; "Chipkill" technology; Predictive Failure Analysis; Auto Path Reassignment (data paths, power); Processor Instruction Retry.
- Fault Isolation & Diagnosis: First Failure Data Capture; run-time self diagnostics; Service Processor; rifle-shot repairs (no "plug and pray" parts replacement approach).
- System Restore: Deferred Repair; Concurrent Repair; LED service identification; service consoles; migration to Guided Maintenance.

High Availability Hardware - Live Partition Mobility

Live Partition Mobility with POWER6*
Allows migration of a running LPAR to another physical server.
- Reduce impact of planned outages
- Relocate workloads to enable growth
- Provision new technology with no disruption to service
- Save energy by moving workloads off underutilized servers
[Figure: movement to a different server with no loss of service, over a virtualized SAN and network infrastructure.]
* All statements regarding IBM future directions and intent are subject to change or withdrawal without notice and represent goals and objectives only. Any reliance on these Statements of General Direction is at the relying party's sole risk and will not create liability or obligation for IBM.

Continuous Application Availability
With Live Partition Mobility and Live Application Mobility, planned outages for hardware and firmware maintenance and upgrades can be a thing of the past. Relocate all partitions from one server to another when performing maintenance. Move the partitions back when maintenance is complete.

Workload Balancing with Live Partition Mobility*
As computing needs spike, redistribute workloads onto multiple physical servers without service interruption. As one server gets overtaxed from a spike in demand, relocate partitions to other servers.

High Availability Operating System - AIX

UNIX Reliability, Availability and Serviceability
The "Number One" Customer Requirement
[Figure: enterprise continuous availability capability over time, comparing AIX 2005, AIX 2006 and AIX 2007 with the competition.]
AIX functionality:
- Kernel Storage Keys
- Concurrent AIX updates
- Cross System Workload Mobility
- Dynamic Tracing with probevue
- Functional Recovery Routines
- Component Trace
- Memory Overlay Protection
- Parallel Dump
- Lightweight Malloc debug
- Lightweight Memory Trace
- Consistency Checkers
- Component RAS infrastructure
- AIX errorlog
- Subsystem Resource Controller

AIX Storage Keys
What is it?
- Exploitation of a POWER6 processor hardware feature to provide additional isolation of kernel and application data.
- Storage keys can prevent invalid changes to memory caused by programming errors.
- Application use of POWER6 storage keys is enabled in AIX V5.3.
- AIX kernel exploitation of POWER6 storage keys is included in AIX V6.1.
AIX exclusive feature, not available in other UNIX systems, Linux, or Windows!

AIX 6 Concurrent Maintenance
Fix selected AIX kernel problems without a service outage.
- Non-disruptive fixes to executable code in a running AIX kernel: base AIX kernel (/unix), kernel extension, or device driver.
- No downtime (reboot) required to apply a fix and make it active.
- Concurrent updates will be packaged as Interim Fixes.
[Figure: in user space, emgr applies an Interim Fix; in kernel space, a concurrent update patches vmmove() alongside getgidx() and sleepx().]

AIX 6 dynamic tracing with probevue
The AIX answer to Solaris dtrace.
- Trace existing programs without recompiling
- Dynamic placement of trace probes
- For debugging and performance analysis
- Traceable calls: AIX system calls, application functions, and application calls to library functions
- Dynamic tracing language called Vue
- Initial support only for "C" programs

"Vue" probe code example:

    #!/usr/bin/probevue
    /* countreads.v */
    syscall.$1.read.entry
        count++;
    interval.*.clock.100
        printf("Number of reads = %d\n", count);
        count = 0;

Sample run:

    # countreads.v 404
    Number of reads = 22
    Number of reads = 0
    Number of reads = 1
    Number of reads = 17

[Figure: probe flow between user and kernel. (1) Some thread in the user process code hits the probe point at the probe location. (2) It branches to the probe code. (3) The probe code (E-code) runs. (4) Control returns to the probe point. (5) The thread continues execution. Trace buffers feed a trace consumer, which produces a trace file or formatted trace output.]
This information is intended only for IBM sellers and Business Partners.

AIX V6.1 Workload Partitions (WPAR)
What is it?
- Virtualized AIX operating system environments within a single AIX image.
- Each WPAR shares the single AIX operating system but can be separately managed.
- Applications and users inside a WPAR cannot affect resources outside the WPAR.
- Each WPAR can have a regulated share of processor, memory and other resources.
- Two types of WPAR:
  - System WPARs have separate security and appear like a completely separate OS.
  - Application WPARs are manageability wrappers around a single application.

AIX V6.1 Live Application Mobility
What is it?
- The capability to relocate a running Workload Partition from one system to another without restarting the application.
- The application running inside the WPAR resumes running after the relocation is complete.
- Works with systems based on POWER4, POWER5 and POWER6 processors.
- Requires the IBM Workload Partitions Manager for AIX.
- Manual or automatic, policy-based relocation.

Operating system downtime survey: AIX is the industry's most stable operating system (400 users in 27 countries)
Source: The Yankee Group "2007-2008 Global Server Operating Systems Reliability Survey" as quoted in "Windows Server: The New King of Downtime" by Mark Joseph Edwards on March 5, 2008 and in http:/ are here!

AIX is "Most Reliable"
According to a recent Yankee Group study* of 400 Windows, Linux and UNIX users, AIX was the most reliable server operating system: "IBM's AIX achieved the highest level of reliability, with corporate enterprises reporting an average of only 36 minutes of downtime per server in a 12-month period".
* Source: "Unix, Linux Uptime and Reliability Increase; Patch Management Woes Plague Windows", 2008 Yankee Group Research, Inc. All rights reserved.

High Availability System Software - PowerHA - PowerHA/XD

IBM PowerHA: PowerHA for AIX
- PowerHA cluster management
  - Monitors, detects and reacts to events
  - Establishes a heartbeat between the systems
  - Enables automatic switch-over
- IBM shared storage clustering
  - Can enable near-continuous application service
  - Helps eliminate impact of planned and unplanned outages
  - Ease of use for HA operations
- PowerHA managing integrated IBM data resiliency
  - Logical Volume Manager (LVM)
  - Shared switchable disk topology
- XD (optional feature of PowerHA)
  - GLVM (Global LVM): AIX-based replication over IP
  - Metro Mirror: IBM storage-based synchronous mirroring (SVC, IBM DS8000)
- Smart Assists: application deployment and configuration

PowerHA for AIX V5.5
- PowerHA V5.5 features
  - Simplified management: manage multiple clusters from a single graphical user interface, which can run on a server outside of the cluster.
  - Support for IPv6 connections to clients; new focus on IPv6 from the US government.
- PowerHA/XD V5.5 disaster recovery
  - Global Logical Volume Manager (GLVM) asynchronous mode mirroring*: asynchronous mode enables geographic dispersion.
  - SAN Volume Controller Global Mirror: asynchronous replication for geographic dispersion.
* GLVM asynchronous mode generally available March 2009.

Shared storage clustering topology
[Figure: switched disk cluster (local only). Network clients reach two Power cluster nodes over an IP network through service and standby network adapters. The nodes exchange IP heartbeats and a serial heartbeat, and both attach to a shared disk.]

Local dual-storage LVM mirroring
- Based on built-in AIX functionality (no software charge)
- Fully redundant, with no switchover interruption
- Particularly well suited to 24x7 environments
- Storage reliability improves geometrically
- The two storage systems can be taken out in turn for scheduled maintenance

PowerHA/XD (HACMP/XD)
Extends the PowerHA concept over longer distances.
- Using SVC or DS8000/DS6000/ESS mirroring technology
- Using Global Logical Volume Manager (GLVM) technology
[Figure: a production site and a recovery site connected through routers. The primary ESS/DS at the production site mirrors to the secondary ESS/DS at the recovery site using DS8/6/ESS mirroring, SVC mirroring (an SVC at each site) or GLVM mirroring.]

IBM AIX Multi System Data Resiliency
PowerHA for AIX
- Strategic building block for IBM AIX High Availability and Disaster Recovery solutions
- Integrated and optimized with IBM AIX
[Figure: cluster resources by tier. HA (local only): switched disk cluster; storage-agnostic LVM mirrored copy of data. Basic SAN copy services: FlashCopy, Metro Mirror, Global Mirror, boot from SAN. DR only: DR/tape backup.]
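The deck pairs synchronous mirroring (Metro Mirror) with metropolitan distances and asynchronous mirroring (GLVM asynchronous mode, SVC Global Mirror) with geographic dispersion. One reason is physical: a synchronous write is not complete until the remote site acknowledges it, so every write pays at least the round-trip propagation delay. The sketch below estimates that penalty, assuming light travels roughly 200 km per millisecond in optical fiber and ignoring switch, protocol and storage overhead. The function names and the sample write rate and lag are hypothetical, chosen only for illustration.

```python
# Rough propagation speed of light in optical fiber (about two thirds of c).
FIBER_KM_PER_MS = 200.0


def sync_write_penalty_ms(distance_km: float, round_trips: int = 1) -> float:
    """Minimum latency added to each write by synchronous mirroring.

    Counts propagation delay only; real links add equipment and protocol overhead.
    """
    if distance_km < 0 or round_trips < 1:
        raise ValueError("distance must be >= 0 and round_trips >= 1")
    return round_trips * 2 * distance_km / FIBER_KM_PER_MS


def async_exposure_mb(write_rate_mb_per_s: float, replication_lag_s: float) -> float:
    """Data not yet at the remote site if the primary fails (a proxy for RPO)."""
    if write_rate_mb_per_s < 0 or replication_lag_s < 0:
        raise ValueError("rate and lag must be non-negative")
    return write_rate_mb_per_s * replication_lag_s


if __name__ == "__main__":
    for km in (50, 300, 1000):
        print(f"{km:>5} km: +{sync_write_penalty_ms(km):.1f} ms per synchronous write")
    print(f"async, 20 MB/s with 30 s lag: {async_exposure_mb(20, 30):.0f} MB at risk")
```

Under these assumptions a 50 km link adds about 0.5 ms per write while a 1,000 km link adds about 10 ms, which illustrates the trade-off: synchronous mirroring keeps the recovery point at zero but limits distance, while asynchronous mirroring removes the latency penalty at the cost of some data exposure.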

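Overall high availability solution
[Figure: database servers and the levels at which high availability is implemented.]

A high-performance, highly reliable parallel file system: GPFS

What is GPFS
A shared-disk parallel file system designed by IBM for AIX and Linux cluster systems.
- Cluster: scales to 4,096 nodes, with fast, stable communication and a single point of management and control.
- Shared disk: data on disk can be accessed directly from any node in the cluster.
- Parallel access: the data streams from all nodes to all disks run in parallel.

Why use the GPFS parallel file system
- Application requirements:
  - Multiple nodes access the same data file or database
  - High-performance file access
  - Failure recovery
- File system requirements:
  - Accessibility: all files can be accessed from any node.
  - Dynamic scaling: nodes and storage can be added or removed dynamically.
  - Single file image: each file exists only once, which makes application development in a cluster environment easier.
  - High capacity: TB-scale files and PB-scale file systems, tested up to 2 PB.
  - High throughput: access to a single file can reach GB/s; the current record is 102 GB/s.
  - Parallel data access: parallel access to a single file or to multiple files.
  - Reliability and fault tolerance: service continues when a node, disk or connection fails.

Main advantages of GPFS
- High performance
  - Striped file reads and writes improve concurrent access performance; measured bandwidth can reach hundreds of GB.
  - Intelligent prefetching and client-side data caching reduce read and write latency.
  - Distributed metadata servers and byte-range lock management.
  - Configurable data block size, from 16 KB to 4 MB.
  - NSD supports InfiniBand RDMA.
- High availability
  - Quorum management and automatic failover.
  - Multi-path disk access; each logical disk can be served by up to 8 NSD servers.
  - Replication of metadata and user data.
  - Nodes and disks can be added and removed dynamically without stopping service; online upgrades are supported.
  - Journaling for fast system recovery.
- High scalability
  - File systems of up to 2^99 bytes and 2 billion files.
  - Clusters of thousands of nodes.
  - Different storage, networks, processors and operating systems.
- Easy management
  - Configuration files and system information are synchronized automatically across nodes.
  - GPFS administration tasks can be run from any node in the cluster, and commands take effect on all nodes.
  - The management network and data network can be separated.
- Other
  - Information lifecycle management
  - CNFS
  - Snapshots and data backup
  - DMAPI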
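The performance claims above rest on striping: a file is cut into fixed-size blocks that are spread across the disks, so a large read or write touches many disks at once. The sketch below shows plain round-robin striping as a minimal illustration of that idea. It is not GPFS's actual allocation logic, and the function names, the 1 MiB block size and the four-disk layout are assumptions made for the example (the block size falls within the 16 KB to 4 MB range quoted above).

```python
from typing import Set, Tuple

MIB = 1024 * 1024


def stripe_location(offset: int, block_size: int, num_disks: int) -> Tuple[int, int, int]:
    """Map a byte offset in a file to (disk index, block index on that disk, offset in block).

    Blocks are placed round-robin: file block 0 on disk 0, block 1 on disk 1, and so on.
    """
    if offset < 0 or block_size <= 0 or num_disks <= 0:
        raise ValueError("offset must be >= 0; block_size and num_disks must be > 0")
    file_block = offset // block_size
    return file_block % num_disks, file_block // num_disks, offset % block_size


def disks_touched(offset: int, length: int, block_size: int, num_disks: int) -> Set[int]:
    """Return the set of disks involved in reading `length` bytes starting at `offset`."""
    if length <= 0:
        raise ValueError("length must be > 0")
    first_block = offset // block_size
    last_block = (offset + length - 1) // block_size
    return {block % num_disks for block in range(first_block, last_block + 1)}


if __name__ == "__main__":
    print(stripe_location(5 * MIB + 10, MIB, 4))
    print(sorted(disks_touched(0, 4 * MIB, MIB, 4)))
```

In this model a 4 MiB sequential read over four disks touches every disk once, so the transfers can proceed in parallel, while a small read stays on a single disk.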
