1.1 Login

A CUDB node contains three types of boards: GEP3 boards, SCXB (DMX) boards, and NWI-E boards. To collect logs from these boards, log in to them with SecureCRT, a terminal, or any other SSH client. There are two ways to log in to CUDB:

1) Direct console connection

A direct console connection is not recommended for routine operation and maintenance. It is normally used only for hardware operations such as replacing a board.

CUDB system console connection settings

    Hardware  Baud rate  Data bits  Parity  Stop bits  Flow control
    SCXB      115200     8          None    1          None
    GEP3      115200     8          None    1          None
    NWI-E     9600       8          None    1          None

2) Connection over the management network

For routine operation and maintenance, connecting to CUDB over the management network is recommended. From the OSS, log in to the SC boards and DMX boards over SSH, and to the NWI over Telnet.

CUDB system management login information

    Node       Method  Port  User    Password  Login command
    CUDB GEP3  SSH     22    root    rootroot  ssh root
    DMX        SSH     2024  expert  expert    ssh expert -p 2024
    NWI        Telnet  23    admin             telnet

1.2 CUDB system checks

The following checks should normally be part of the daily health check.

1.2.1 Overall CUDB system check

Verify the status of the whole system. Run the command on one of the SC boards of the CUDB.

Command:

    # cudbSystemStatus

Description: this command automatically runs the system status checks shown below.

Expected result:

    Execution date: Tue Mar 25 11:29:36 CST 2014
    CUDB Software Version:
    !- CUDB DESIGN DISTRIBUTION: CUDB13B CXP9020214/6 R1K
    Checking BC clusters:
    Site 1
    SM leader: Node 1 OAM2
    Node 10.173.0.2
    BC server in SC_2_1 . running
    BC server in SC_2_2 . running (Leader)
    BC server in PL_2_5 . running
    Site 2
    NoLeader
    Node 10.173.0.34
    BC server in SC_2_1 . running
    BC server in SC_2_2 . running
    BC server in PL_2_5 . running
    Checking System Monitor BC status in local node:
    SM-BC in OAM1 . running
    SM-BC in OAM2 . running
    Checking Clusters status:
    Node 1:
    PL Cluster (2%) .OK
    DSG1 Cluster (1%) .OK
    DSG2 Cluster (1%) .OK
    DSG3 Cluster (1%) .OK
    DSG4 Cluster (1%) .OK
    DSG5 Cluster (1%) .OK
    DSG6 Cluster (1%) .OK
    DSG7 Cluster (1%) .OK
    DSG8 Cluster (1%) .OK
    DSG9 Cluster (1%) .OK
    DSG10 Cluster (1%) .OK
    DSG11 Cluster (1%) .OK
    DSG12 Cluster (1%) .OK
    DSG13 Cluster (1%) .OK
    Node 2:
    PL Cluster (2%) .OK
    DSG1 Cluster (1%) .OK
    DSG2 Cluster (1%) .OK
    DSG3 Cluster (1%) .OK
    DSG4 Cluster (1%) .OK
    DSG5 Cluster (1%) .OK
    DSG6 Cluster (1%) .OK
    DSG7 Cluster (1%) .OK
    DSG8 Cluster (1%) .OK
    DSG9 Cluster (1%) .OK
    DSG10 Cluster (1%) .OK
    DSG11 Cluster (1%) .OK
    DSG12 Cluster (1%) .OK
    DSG13 Cluster (1%) .OK
    Checking NDB status:
    PL NDBs (6/6) .OK
    DS1 NDBs (2/2) .OK
    DS2 NDBs (2/2) .OK
    DS3 NDBs (2/2) .OK
    DS4 NDBs (2/2) .OK
    DS5 NDBs (2/2) .OK
    DS6 NDBs (2/2) .OK
    DS7 NDBs (2/2) .OK
    DS8 NDBs (2/2) .OK
    DS9 NDBs (2/2) .OK
    DS10 NDBs (2/2) .OK
    DS11 NDBs (2/2) .OK
    DS12 NDBs (2/2) .OK
    DS13 NDBs (2/2) .OK
    Checking Replication Channels in the System:
    Node   | 1 | 2
    =
    PLDB   _|_M_|_S1_
    DSG 1  _|_M_|_S1_
    DSG 2  _|_M_|_S2_
    DSG 3  _|_M_|_S1_
    DSG 4  _|_M_|_S1_
    DSG 5  _|_M_|_S2_
    DSG 6  _|_M_|_S2_
    DSG 7  _|_M_|_S1_
    DSG 8  _|_M_|_S2_
    DSG 9  _|_M_|_S1_
    DSG 10 _|_M_|_S2_
    DSG 11 _|_M_|_S2_
    DSG 12 _|_M_|_S1_
    DSG 13 _|_M_|_S2_
    Printing Alarms.
    Mar 23 12:50:05( Preventive Maintenance Logchecker has found major error(s). )
    Checking MySQL server connection:
    MySQL Master Servers connection .OK
    MySQL Slave Servers connection .OK
    MySQL Access Servers connection .OK
    Checking Process:
    OAMs.
    Cluster Supervisor.Running
    System Monitor BC.Running
    Reconciliation process.Running in: OAM2
    Smp-client.Running
    Management Server Process (ndb_mgmd).Running
    KeepAlive process.Running
    ESA.Running
    LDAP counter.Running
    Log Handler process.Running
    PLs.
    Storage Engine process (ndbd).Running
    LDAP FE.Running
    KeepAlive process.Running
    MySQL server process (Master).Running
    MySQL server process (Slave).Running
    MySQL server process (Access).Running
    CudbNotifications process.Running
    LDAP FE Monitor process.Running
    DSs.
    Storage Engine process (ndbd).Running
    LDAP FE.Running
    KeepAlive process.Running
    MySQL server process (Master).Running
    MySQL server process (Slave).Running
    MySQL server process (Access).Running
    LDAP FE Monitor process.Running

1.2.2 HA status check

On the active OAM board of the CUDB, verify that all GEP3 boards have joined the cluster.

Command:

    # cudbHaState

Expected result:

    LOTC cluster uptime:
    -
    Thu Mar 27 18:13:44 2014
    LOTC cluster state:
    -
    Node safNode=SC_2_1 joined cluster | Thu Mar 27 18:13:44 2014
    Node safNode=SC_2_2 joined cluster | Thu Mar 27 18:14:23 2014
    Node safNode=PL_2_3 joined cluster | Thu Mar 27 18:15:21 2014
    Node safNode=PL_2_4 joined cluster | Thu Mar 27 18:15:25 2014
    .
    AMF cluster state:
    -
    saAmfNodeAdminState."safAmfNode=SC-1,safAmfCluster=myAmfCluster": Unlocked
    saAmfNodeOperState."safAmfNode=SC-1,safAmfCluster=myAmfCluster": Enabled
    saAmfNodeAdminState."safAmfNode=SC-2,safAmfCluster=myAmfCluster": Unlocked
    saAmfNodeOperState."safAmfNode=SC-2,safAmfCluster=myAmfCluster": Enabled
    saAmfNodeAdminState."safAmfNode=PL-3,safAmfCluster=myAmfCluster": Unlocked
    saAmfNodeOperState."safAmfNode=PL-3,safAmfCluster=myAmfCluster": Enabled
    CoreMW HA state:
    -
    CoreMW is assigned as ACTIVE in controller SC-1
    CoreMW is assigned as STANDBY in controller SC-2
    COM state:
    -
    COM is assigned as ACTIVE in controller SC-1
    COM is assigned as STANDBY in controller SC-2
    SI HA state:
    -
    saAmfSISUHAState."safSu=SC-1,safSg=2N,safApp=ERIC-CUDB_BC_SERVER_MONITOR"."safSi=2N-1": active(1)
    saAmfSISUHAState."safSu=SC-1,safSg=2N,safApp=ERIC-CUDB_LDAPFE_MONITOR"."safSi=2N-1": active(1)
    saAmfSISUHAState."safSu=SC-1,safSg=DS3_2N,safApp=ERIC-CUDB_CS"."safSi=DS3_2N-1": active(1)
    saAmfSISUHAState."safSu=SC-1,safSg=DS4_2N,safApp=ERIC-CUDB_CS"."safSi=DS4_2N-1": active(1)
    saAmfSISUHAState."safSu=SC-1,safSg=DS13_2N,safApp=ERIC-CUDB_CS"."safSi=DS13_2N-1": active(1)
    saAmfSISUHAState."safSu=SC-1,safSg=DS12_2N,safApp=ERIC-CUDB_CS"."safSi=DS12_2N-1": active(1)
    saAmfSISUHAState."safSu=SC-1,safSg=DS11_2N,safApp=ERIC-CUDB_CS"."safSi=DS11_2N-1": active(1)
    saAmfSISUHAState."safSu=SC-1,safSg=DS2_2N,safApp=ERIC-CUDB_CS"."safSi=DS2_2N-1": active(1)
    saAmfSISUHAState."safSu=SC-1,safSg=DS1_2N,safApp=ERIC-CUDB_CS"."safSi=DS1_2N-1": active(1)
    saAmfSISUHAState."safSu=SC-1,safSg=DS7_2N,safApp=ERIC-CUDB_CS"."safSi=DS7_2N-1": active(1)
    saAmfSISUHAState."safSu=Control1,safSg=2N,safApp=ERIC-EVIP"."safSi=2N": active(1)
    .
    SU States:
    -
    Status OK

1.2.3 CMW status query

On one of the SC boards, query the CoreMW (CMW) status.

Command:

    # cmw-status app csiass comp node sg si siass su pm

Description: checks the CMW status.

1.2.4 Disk usage check

On one of the SC boards, print the disk usage of all CUDB servers (OAM, PL and DS).

Command:

    for a in `awk '/node/ {print $4}' /cluster/etc/cluster.conf`; do echo $a; ssh $a df -h; done

Description: checks disk usage on every node listed in /cluster/etc/cluster.conf.

Expected result:

    SC_2_1
    Filesystem                      Size  Used  Avail  Use%  Mounted on
    rootfs                          2.0G  1.5G  543M   74%   /
    root                            2.0G  1.5G  543M   74%   /
    tmpfs                           12G   740K  12G    1%    /dev/shm
    shm                             12G   740K  12G    1%    /dev/shm
    /dev/sdb1                       4.0G  220M  3.6G   6%    /boot
    /dev/sdb2                       9.9G  3.5G  6.0G   37%   /var/log
    /dev/mapper/cluster_vg-data_lv  63G   11G   50G    18%   /.cluster
    192.168.0.100:/.cluster         63G   11G   50G    18%   /cluster
    /dev/sdb7                       136G  1.2G  128G   1%    /local
    com_fuse_module                 2.0G  1.5G  543M   74%   /var/filem/nbi_root
    SC_2_2
    Filesystem                      Size  Used  Avail  Use%  Mounted on
    rootfs                          2.0G  1.5G  544M   74%   /
    root                            2.0G  1.5G  544M   74%   /
    tmpfs                           12G   740K  12G    1%    /dev/shm
    shm                             12G   740K  12G    1%    /dev/shm
    /dev/sdb1                       4.0G  220M  3.6G   6%    /boot
    /dev/sdb2                       9.9G  3.5G  5.9G   38%   /var/log
    192.168.0.100:/.cluster         63G   11G   50G    18%   /cluster
    /dev/sdb7                       136G  1.1G  128G   1%    /local

1.2.5 Network status check

Print the network status of every interface on all CUDB servers (OAM, PL and DS).

Command:

    for a in `awk '/node/ {print $4}' /cluster/etc/cluster.conf`; do echo $a; ssh $a netstat -i; done

Description: netstat reports the system's network connections, routing tables, interface information, and multicast memberships. With the -i option it prints the status table of all network interfaces.

Expected result:

    CUDB1 SC_2_1 # netstat -i
    warning: no inet socket available: Success
    Kernel Interface table
    Iface    MTU    Met  RX-OK      RX-ERR  RX-DRP  RX-OVR  TX-OK      TX-ERR  TX-DRP  TX-OVR  Flg
    bond0    1500   0    29292700   0       0       0       20908491   0       0       0       BMmRU
    bond1    1500   0    62795      0       0       0       2407895    0       0       0       BMmRU
    bond1:1  1500   0    - no statistics available -                                           BMmRU
    bond1:2  1500   0    - no statistics available -                                           BMmRU
    eth0     1500   0    28197145   0       0       0       20908491   0       0       0       BMsRU
    eth1     1500   0    31394      0       0       0       2407895    0       0       0       BMsRU
    eth2     1500   0    1095555    0       0       0       0          0       0       0       BMsRU
    eth3     1500   0    31401      0       0       0       0          0       0       0       BMsRU
    lo       16436  0    313493589  0       0       0       313493589  0       0       0       LRU

1.2.6 CPU load check

Log in to one of the SC boards to query the CUDB CPU load.

Command:

    # cudbMpstat

Press Ctrl+C to exit and return to the CLI.

Description: this command collects and reports CPU performance statistics for every board.
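
The per-node loops in 1.2.4 and 1.2.5 print the raw df and netstat output for every board, which has to be read by eye. The following is a minimal sketch of how that output could be filtered during a daily check. It is not part of the CUDB tool set: the function names list_nodes, flag_full_filesystems and flag_interface_errors are hypothetical, the 80% default threshold is an assumed value rather than a CUDB recommendation, and the sketch assumes only what the awk expression in 1.2.4 implies about cluster.conf, namely that the node name is the fourth field of each line containing "node".

```shell
#!/bin/sh
# Hypothetical helpers for the checks in 1.2.4 and 1.2.5 (not CUDB commands).

# Print the 4th field of every cluster.conf line that contains "node",
# using the same awk expression as the loops in 1.2.4 and 1.2.5.
list_nodes() {
    awk '/node/ {print $4}' "${1:-/cluster/etc/cluster.conf}"
}

# Read `df -h` output on stdin and print "<mount point> <use%>" for every
# filesystem whose Use% is at or above the limit (assumed default: 80).
flag_full_filesystems() {
    limit="${1:-80}"
    awk -v limit="$limit" '
        NR > 1 {
            use = $(NF - 1)        # Use% is the second-to-last column
            sub(/%/, "", use)
            if (use ~ /^[0-9]+$/ && use + 0 >= limit)
                print $NF, use "%"
        }'
}

# Read `netstat -i` output on stdin and print every interface whose
# RX-ERR, RX-DRP, TX-ERR or TX-DRP counter is non-zero.
flag_interface_errors() {
    awk '
        # Data rows have 12 columns with numeric counters; the header lines
        # and the "no statistics available" alias rows are skipped.
        NF == 12 && $5 ~ /^[0-9]+$/ {
            if ($5 + $6 + $9 + $10 > 0)
                print $1, "RX-ERR=" $5, "RX-DRP=" $6, "TX-ERR=" $9, "TX-DRP=" $10
        }'
}

# Example use on an SC board (commented out because it needs the cluster):
# for a in $(list_nodes); do
#     echo "$a"
#     ssh "$a" df -h | flag_full_filesystems 70
#     ssh "$a" netstat -i | flag_interface_errors
# done
```

With the sample output shown in 1.2.4, a limit of 70 would report the root filesystem and /var/filem/nbi_root at 74% on SC_2_1, while the netstat sample in 1.2.5 would produce no lines because all error and drop counters are zero.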