
VCS Training Materials (Chinese).ppt

Uploader: 精品资料 | Document ID: 11332570 | Uploaded: 2020-03-18 | Format: PPT | Pages: 77 | Size: 5.28MB

VERITAS Cluster Server

CONTENTS
- VCS prerequisites
- VCS basic concepts and terminology
- Managing cluster services with VCS
- Troubleshooting common VCS problems
- Summary

VCS Prerequisites

Common VCS deployment scenarios
- Local clustering (LAN): Cluster Server
- Metropolitan HA and disaster recovery (over SAN, MAN or LAN): Cluster Server, Volume Manager
- Wide-area disaster recovery (WAN): Cluster Server, Volume Manager, Volume Replicator, Global Cluster Manager

Clustering: Application and Database Failover
[Figure: application and database failover]

A Cluster View of Applications
- A "service group" is a collection of resources that monitor the status of an application.
- Application failover is controlled by the service group.

Active/Passive Clustering ("asymmetric configuration")
- The primary server hosts the application while the secondary server stands by.
- When the primary server fails, the secondary server takes over and hosts the primary application.

Active/Active Clustering ("symmetric configuration")
- The primary server hosts the primary application and the secondary server hosts the secondary application. The two nodes provide different services and act as backups for each other.
- When the primary server fails, the secondary server immediately takes over and hosts both the primary and the secondary applications.

VCS Basic Concepts and Terminology

Cluster
- Several networked systems (nodes)
- Shared storage
- Single administrative entity
- Peer monitoring
[Figure: cluster nodes connected to shared SCSI JBODs through Fibre switches]

Systems
- Members of a cluster, also referred to as nodes
- Contain copies of:
  - communication protocol configuration files
  - VCS configuration files
  - VCS libraries and directories (the VCS installation directories)
  - VCS scripts and daemons
- Share a single dynamic cluster configuration
- Provide application services

Service Groups
- A service group is a related collection of resources.
- Resources in a service group must be available to the system.

- Resources and service groups have interdependencies.
[Figure: NFS service group containing the NFS, Share, Mount, Disk, IP and NIC resources]

Service Group Types
- Failover: can be partially or fully online on only one server at a time. When a component fails, VCS controls stopping and restarting the service group.
- Parallel: can be partially or fully online on multiple servers simultaneously. Examples: Oracle Parallel Server; Web and FTP servers.

Resources
- VCS objects that correspond to hardware or software components
- Monitored and controlled by VCS
- Classified by type
- Identified by unique names and attributes
- Can depend on other resources within the same service group

Resource Types
- A resource type is a general description of the attributes of a resource.
- Example attributes of the Mount resource type:
  - MountPoint: the mount point
  - BlockDevice: the device to mount
- Other example resource types: Disk, Share, IP (the floating IP address), NIC (network interface card)

Agents
- Processes that control resources
- One agent per resource type
- An agent controls all resources of its type.
- Agents can be added into the VCS agent framework, so users can plug in their own agents.

Dependencies
- Resources can depend on other resources.
- Parent resources depend on child resources.
- Service groups can depend on other service groups.
- Resource types can depend on other resource types; for example, the IP type must depend on the NIC type.
- Rules govern service group and resource dependencies.
- No cyclic dependencies are allowed.
[Figure: Mount (parent) depends on Disk (child)]

Private Network
- At least two communication channels with separate infrastructure are required:
  - multiple NICs (not just multiple ports)
  - separate hubs, if hubs are used
- Heartbeat communication determines which systems are members of the cluster.
- Cluster configuration broadcasts update every cluster system with the status of each resource and service group.

Low Latency Transport (LLT)
- Provides fast, kernel-to-kernel communications
- Is connection oriented
- Is not routable
- Uses the Data Link Provider Interface (DLPI) over Ethernet

Group Membership Services/Atomic Broadcast (GAB)
- Manages cluster membership
- Maintains cluster state
- Uses broadcasts
- Runs in the kernel over Low Latency Transport (LLT)

VCS Engine (had)
- Maintains configuration and state information for all cluster resources
- Uses GAB to communicate among cluster systems
- Is monitored by the hashadow process
[Figure: SystemA and SystemB each run had and hashadow above LLT in the kernel, linked by the private network]

VCS Architecture
[Figure: SystemA and SystemB share the cluster configuration in memory. On each system had, monitored by hashadow, runs over GAB and LLT in the kernel, and agents control the Mount (/v), Disk (c1d0t0s0), NIC (hme0) and IP (10.1.2.4) resources.]

Managing Cluster Services with VCS

Cluster Configuration
[Figure: cluster configuration]

Starting VCS
[Figure: System1, System2 and System3 on a private network; VCS starts on the first system, which holds main.cf and the cluster configuration]

Starting VCS: Second System
[Figure: had and hashadow running on two systems, each with a copy of the cluster configuration]

Starting VCS: Third System
[Figure: had and hashadow running on all three systems, each with its own main.cf, sharing the cluster configuration in memory]

Stopping VCS

The hastop Command
The hastop command stops the VCS engine.
Syntax:
  hastop -local [-force | -evacuate]
  hastop -sys sys_name [-force | -evacuate]
  hastop -all [-force]
Example: hastop -sys train4 -evacuate

Displaying Cluster Status
The hastatus command displays the status of items in the cluster.
Syntax: hastatus -option arg [-option arg]
Options: -group service_group, -summary

Example: hastatus -group OracleSG

Protecting the Cluster Configuration
1. haconf -makerw: the cluster configuration is opened and a .stale file is created.
2. hares -add: resources are added to the cluster configuration in memory; main.cf is now out of sync with the in-memory configuration.
3. haconf -dump -makero: the changes are saved to disk and .stale is removed.

Opening and Saving the Cluster Configuration
The haconf command opens, closes, and saves the cluster configuration.
Syntax: haconf -option [-option]
Options:
  -makerw          opens the configuration
  -dump            saves the configuration
  -dump -makero    saves and closes the configuration
Example: haconf -dump -makero

Starting VCS with a Stale Configuration
[Figure: hastart on a system whose main.cf is accompanied by a .stale file, while had and hashadow run on System2 and System3 over the private network]

Forcing VCS to Start on the Local System
[Figure: hastart -force on System1, whose main.cf is accompanied by a .stale file, so that VCS builds the cluster configuration from that main.cf]

Forcing a System to Start
[Figure: forcing a system to start]

The hasys Command
The hasys command alters or queries the state of had.
Syntax: hasys -option arg
Options:
  -force system_name
  -list
  -display system_name
  -delete system_name
  -add system_name
Example: hasys -force train11
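
The .stale rules in these slides can be condensed into a small state model. The following Python sketch is a simplified illustration, not VCS code. The System class and the haconf_makerw, haconf_dump_makero and hastart functions are hypothetical stand-ins that only borrow the command names. The startup logic is reduced to three cases taken from this material. A system builds remotely when another system is RUNNING. A system with a stale configuration and no RUNNING system enters STALE_ADMIN_WAIT. Otherwise, or when forced, a system builds locally from its own main.cf. The names train4 and train11 reuse the command examples above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class System:
    """Hypothetical, simplified model of one VCS node (not a VCS API)."""
    name: str
    stale: bool = False   # True while main.cf is accompanied by a .stale file
    state: str = "EXITED"


def haconf_makerw(system: System) -> None:
    # Opening the configuration for writing creates the .stale file.
    system.stale = True


def haconf_dump_makero(system: System) -> None:
    # Saving and closing the configuration removes the .stale file.
    system.stale = False


def hastart(system: System, peers: List[System], force: bool = False) -> str:
    """Approximate the state reached after starting VCS (force mimics -force)."""
    if any(peer.state == "RUNNING" for peer in peers):
        system.state = "RUNNING"            # remote build from a running system
    elif system.stale and not force:
        system.state = "STALE_ADMIN_WAIT"   # stale config and no RUNNING system
    else:
        system.state = "RUNNING"            # local build from this main.cf
    return system.state


train4, train11 = System("train4"), System("train11")
haconf_makerw(train4)                          # configuration left open on train4
print(hastart(train4, [train11]))              # STALE_ADMIN_WAIT
print(hastart(train4, [train11], force=True))  # RUNNING
print(hastart(train11, [train4]))              # RUNNING (remote build)
```

The model shows why closing the configuration with haconf -dump -makero matters. A .stale file left behind turns an ordinary cluster restart into a manual recovery.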

Propagating a Specific Configuration
1. Stop VCS on all systems in the cluster and leave the applications running:
   hastop -all -force
2. Start VCS in the stale state on all other systems:
   hastart -stale
   The -stale option makes these systems wait until a running configuration is available from which they can build.
3. Start VCS on the system with the main.cf that you are propagating:
   hastart

Summary of Start Options
The hastart command starts the had and hashadow daemons.
Syntax: hastart [-option]
Options: -stale, -force
Example: hastart -force

Validating the Cluster Configuration
The hacf utility checks the syntax of the main.cf file.
Syntax: hacf -verify config_directory
Example: hacf -verify /etc/VRTSvcs/conf/config

Modifying Cluster Attributes
The haclus command is used to view and change cluster attributes.
Syntax: haclus -option arg
Options: -display, -help -modify, -modify modify_options, -value attribute, -notes
Example: haclus -value ClusterLocation

Startup States and Transitions
[Figure: startup states and transitions]

Shutdown States and Transitions
[Figure: shutdown states RUNNING, LEAVING, EXITING, EXITED, EXITING_FORCIBLY and FAULTED; transitions are labeled hastop, hastop -force, "Resources offlined, agents stopped" and "Unexpected exit"]

VCS Troubleshooting

Monitor VCS through the following:
- VCS log files
- System log files
- The hastatus command, to view VCS status
- SNMP event notification
- The Cluster Manager graphical interface

VCS Log Entries
- VCS engine log: /var/VRTSvcs/log/engine_A.log
- View the log through the GUI or with the hamsg command: hamsg engine_A
Example entries:
  TAG_D 2001/04/03 12:17:44 VCS:11022:VCS engine (had) started
  TAG_D 2001/04/03 12:17:44 VCS:10114:opening GAB library
  TAG_C 2001/04/03 12:17:45 VCS:10526:IpmHandle:recv peer exited errno 10054
  TAG_E 2001/04/03 12:17:52 VCS:10077:received new cluster membership
  TAG_E 2001/04/03 12:17:52 VCS:10080:Membership: 0x3, Jeopardy: 0x0

Agent Log Entries
- Agent logs are in the /var/VRTSvcs/log directory.
- Each log file is named AgentName_A.log, for example IP_A.log.
- Log levels: none, error (default), info, debug, all
- Change the log level with: hatype -modify res_type LogLevel debug

Troubleshooting Cluster Communication
- Check VCS with hastatus -summary.
- If the output resembles the following, communication within the cluster has a problem:
  VCS:11307:Node has not received cluster membership yet, cannot process HA command
- If the output resembles the following, the VCS engine failed to start properly:
  hatest1 STALE ADMIN WAIT: all system stale
- First use the lltconfig command to check whether the LLT module is running. If it is not, check the /etc/llttab file.

Troubleshooting LLT
- Check the /etc/llthosts file. The hostnames must match those in /etc/llttab, and node IDs must be in the range 0 to 31.
- If LLT is running, use lltstat -n to check that all heartbeat links are healthy. First confirm that every NIC configured in /etc/llttab is UP (check with ifconfig). Sample output:
  LLT node information:
      Node            State    Links
    * 0 test-smc3     OPEN     3
      1 storage-1     OPEN     3

Troubleshooting GAB
- First check whether GAB is running with gabconfig -a. If the output consists only of the following header, GAB has a problem; check the /etc/gabtab file:
  GAB Port Memberships
- If GAB shuts down immediately after starting, check whether LLT has a problem.
- If there is no output for port h, HAD has a problem.
- Normal output looks like this:
  GAB Port Memberships
  ===============================================================
  Port a gen a76401 membership 01
  Port h gen a76404 membership 01

Troubleshooting HAD
- First confirm that the LLT and GAB modules started correctly.
- Run hacf -verify /etc/VRTSvcs/conf/config to check the VCS configuration files. No output means the configuration is correct.
- Check the VCS license with vxlicrep. If the output resembles the following, the license must be re-entered:
  vxlicrep ERROR V-21-3-1003 There are no valid VERITAS License keys installed in the system.
- To enter a valid license, run vxlicinst and follow the prompts.
- Use hastatus -sum to check the state:
  - STALE_ADMIN_WAIT: the system has a stale configuration and no other system is in a RUNNING state.
  - ADMIN_WAIT: the system cannot build or obtain a valid configuration.

STALE_ADMIN_WAIT
To recover from the STALE_ADMIN_WAIT state:
1. Visually inspect the main.cf file to determine whether it is valid.
2. Edit the main.cf file, if necessary.
3. If you modified main.cf, verify its syntax: hacf -verify config_dir
4. Force VCS to start on the system with the valid main.cf file: hasys -force system_name
5. All other systems perform a remote build from the system now running.

ADMIN_WAIT
A system can be in the ADMIN_WAIT state under any of these circumstances:
- A .stale flag exists and the main.cf file has a syntax problem.
- A disk error affecting main.cf occurs during a local build.
- The system is performing a remote build and the last running system fails.
Restore main.cf and use the procedure for STALE_ADMIN_WAIT.

Identifying Other Problems
After verifying that HAD, LLT, and GAB are functioning properly, run hastatus -sum to identify problems in other areas:
- Service groups
- Resources
- Agents and resource types

Service Group Problems: Group Not Configured to Start or Run
- Service group not brought online automatically when VCS starts: check the AutoStart and AutoStartList attributes with hagrp -display service_group
- Service group not configured to run on the system: check the SystemList attribute and verify that the system name is included.

Service Group AutoDisabled
Autodisable occurs when:
- GAB sees a system but had is not running on that system.
- Resources of the service group are not fully probed on all systems in the SystemList.
- A particular system is visible through disk heartbeat only.
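
The first cause above, where GAB sees a system but had is not running on it, can also be spotted in gabconfig -a output. There, port a lists GAB members and port h lists systems running had. Below is a minimal Python sketch. It assumes the "Port <letter> gen <generation> membership <node IDs>" line layout from the GAB troubleshooting samples and single-digit node IDs (0 to 9), so larger clusters are out of scope. The function names and the sample text are hypothetical.

```python
import re
from typing import Dict, Set

# Assumed line layout, as in the gabconfig -a samples:
#   Port <letter> gen <generation> membership <node IDs>
PORT_LINE = re.compile(r"\s*Port\s+(\w)\s+gen\s+\w+\s+membership\s+([0-9 ]+)")


def gab_ports(output: str) -> Dict[str, Set[int]]:
    """Map each GAB port letter to its member node IDs (single-digit IDs only)."""
    ports: Dict[str, Set[int]] = {}
    for line in output.splitlines():
        match = PORT_LINE.match(line)
        if match:
            ports[match.group(1)] = {int(ch) for ch in match.group(2) if ch.isdigit()}
    return ports


def gab_without_had(output: str) -> Set[int]:
    """Node IDs that are GAB members (port a) but not had members (port h)."""
    ports = gab_ports(output)
    return ports.get("a", set()) - ports.get("h", set())


# Hypothetical sample: node 1 appears on port a but not on port h.
sample = """GAB Port Memberships
===============================================================
Port a gen a76401 membership 01
Port h gen a76404 membership 0
"""
print(gab_without_had(sample))  # {1}
```

If port h is missing entirely, the sketch reports every port a member, which matches the rule that no port h output points to a HAD problem.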

To clear the condition:
- Make sure that the service group is offline on all systems in the SystemList attribute.
- Clear the AutoDisabled attribute: hagrp -autoenable service_group -sys system
- Bring the service group online.

Service Group Not Fully Probed
This is usually a result of improperly configured resource attributes.
- Check the ProbesPending attribute: hagrp -display service_group
- Check which resources are not probed: hastatus -sum
- Check the Probes attribute for resources: hares -display
- To probe resources: hares -probe resource -sys system

Service Group Frozen
- Verify the values of the Frozen and TFrozen attributes: hagrp -display service_group
- Unfreeze the service group: hagrp -unfreeze group -persistent
- If you freeze persistently, you must unfreeze persistently.

Service Group Is Not Offline Elsewhere
- Determine which resources are online and which are offline: hastatus -sum
- Verify the State attribute: hagrp -display service_group
- Take the group offline on the other system: hagrp -offline service_group -sys system
- Flush the service group: hagrp -flush service_group -sys system

Service Group Waiting for Resource
- Review the IState attribute of all resources to determine which resource is waiting to go online.
- Use hastatus to identify the resource.
- Make sure the resource is offline (at the operating system level).
- Clear the internal state of the service group: hagrp -flush service_group -sys system
- Bring all other resources in the service group offline and try to bring these resources online on another system.
- Verify that the resource works properly outside VCS.
- Check for errors in attribute values.

Incorrect Local Name
- A service group cannot be brought online if the system name is inconsistent in the llthosts, llttab, or main.cf files.
- Check each file for consistent use of system names and correct any discrepancies.
- If main.cf is changed, stop and restart VCS.
- If llthosts or llttab is changed, stop VCS, GAB, and LLT, then restart LLT, GAB, and VCS.

Concurrency Violations
- A concurrency violation occurs when a failover service group is online or partially online on more than one system.
- Notification is provided by the violation trigger, which:
  - is invoked on the system that caused the concurrency violation
  - notifies the administrator and takes the service group offline on the system causing the violation
  - is configured by default with the violation script in /opt/VRTSvcs/bin/triggers
  - can be customized to send a message to the system log, display a warning on all cluster systems, or send e-mail messages
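
The rule behind a concurrency violation can be expressed as a simple check over group states. This Python sketch assumes a hypothetical snapshot format that maps each group name to its type and its state on each system, instead of parsing real hastatus output. WebSG and NFSSG are made-up group names. OracleSG, train4 and train11 reuse names from the examples in this material. The sketch only detects the condition; in VCS the violation trigger handles notification and offlining.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot format (an assumption, not real VCS output):
# group name -> (group type, {system name: group state on that system})
Snapshot = Dict[str, Tuple[str, Dict[str, str]]]

ACTIVE_STATES = {"ONLINE", "PARTIAL"}  # fully or partially online


def concurrency_violations(snapshot: Snapshot) -> Dict[str, List[str]]:
    """Return failover groups that are (partially) online on more than one system."""
    violations: Dict[str, List[str]] = {}
    for group, (group_type, states) in snapshot.items():
        if group_type != "failover":
            continue  # parallel groups may legitimately run on several systems
        active = sorted(system for system, state in states.items()
                        if state in ACTIVE_STATES)
        if len(active) > 1:
            violations[group] = active
    return violations


snapshot: Snapshot = {
    "OracleSG": ("failover", {"train4": "ONLINE", "train11": "PARTIAL"}),
    "WebSG": ("parallel", {"train4": "ONLINE", "train11": "ONLINE"}),
    "NFSSG": ("failover", {"train4": "ONLINE", "train11": "OFFLINE"}),
}
print(concurrency_violations(snapshot))  # {'OracleSG': ['train11', 'train4']}
```

Treating a partially online group as active mirrors the definition above: a failover group counts as a violation even if it is only partly online on a second system.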
