Date: 2021-07-01 10:21:17
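Environment:
Dual-host setup. Operating system: Solaris 10. Database version: Oracle 11g R1, 64-bit.

1. In the middle of the night I got a phone call saying the database was reporting a large number of errors. When I got up and checked, the database had already crashed. The alert log showed I/O errors: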
    Wed Dec 18 00:36:57 2013
    KCF: write/open error block=0x98abe online=1 file=89 /dev/raw/raw03 error=27063 txt: 'SVR4 Error: 5: I/O error
    Additional information: -1
    Additional information: 8192'
    Wed Dec 18 00:36:57 2013
    KCF: write/open error block=0x9d70f online=1 file=91 /dev/raw/raw05 error=27063 txt: 'SVR4 Error: 5: I/O error
    Additional information: -1
    Additional information: 8192'
    Automatic datafile offline due to write error on
    Automatic datafile offline due to write error on

2. We had previously had a production incident caused by a construction crew bumping a cable, so I asked the data-center staff whether any work was scheduled that night. They said new equipment was being connected to the SAN that night but, as far as they could tell, nobody had touched any cables. The system log on this host showed the following errors:
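    Dec 17 23:33:10 fly-db01 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@fly6000c5d0008a0000006b131400440 (ssd28):
    Dec 17 23:33:10 fly-db01 SCSI transport failed: reason 'tran_err': retrying command
    Dec 17 23:33:10 fly-db01 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@fly6000c5d0008a0000006b131400930 (ssd52):

3. I then checked the other servers that use SAN storage: whether their databases were still running, whether their alert logs and operating system logs showed errors, and whether any file systems mounted from the SAN had become read-only. Some of those databases had also crashed, and on some hosts the file systems had gone read-only. The operating system logs on those hosts reported lpfc errors. We had seen this error before: usually, some time after the lpfc errors appear, the file system becomes read-only.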
lpfc errors:

    fly008:/var/log # cat messages | grep lpfc
    Dec 18 00:34:05 fly008 kernel: [10201542.768302] lpfc 0000:03:00.0: 0:(0):0203 Devloss timeout on WWPN 21:4g:00:0b:5e:6a:18:14 NPort x014400 Data: x40000 x1 x0
    Dec 18 00:34:07 fly008 kernel: [10201544.816750] lpfc 0000:03:00.0: 0:(0):0203 Devloss timeout on WWPN 21:4h:00:0b:5e:6a:18:14 NPort x014500 Data: x0 x7 x0
    Dec 18 00:34:07 fly008 kernel: [10201544.816802] lpfc 0000:03:00.0: 0:(0):0203 Devloss timeout on WWPN 21:4k:00:0b:5e:6a:18:14 NPort x014600 Data: x0 x7 x0

Read-only file system error:

    fly008~ # df -h
    Filesystem                               Size  Used Avail Use% Mounted on
    /dev/mapper/vg_fly008_app-lv_fly008_app   99G   41G   53G  44% /home/fly008
    fly008~ # cd /home/fly008
    fly008/home/fly008 # touch 1.txt
    touch: cannot touch `1.txt': Read-only file system

4. After the configuration changes made that night were rolled back, the problem disappeared. I restarted the database and it came up normally, but the application failed to start. The application log reported the following error:
    SQLErrorCode: 376
    ORA-00376: file 92 cannot be read at this time
    ORA-01110: data file 92: '/dev/raw/raw06'

5. The database alert log reported related errors as well:
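    DDE: Problem Key 'ORA 1110' was flood controlled (0x5) (no incident)
    ORA-01110: data file 92: '/dev/raw/raw06'
    *** 2013-12-18 05:04:16.284
    ORA-12012: error on auto execute of job 226
    ORA-00372: file 92 cannot be modified at this time
    ORA-06512: at "FLY.DELETE_FLY_EXCEPTION_INFO", line 8
    ORA-06512: at line 1

6. I checked the status of the datafile. It was marked RECOVER, which means it needs media recovery before it can be brought back online: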
    SQL> SELECT file_name, file_id, tablespace_name, status, online_status
         FROM DBA_DATA_FILES
         ORDER BY TABLESPACE_NAME;

    FILE_NAME        FILE_ID  TABLESPACE_NAME  STATUS     ONLINE_STATUS
    /dev/raw/raw06        92  FLY              AVAILABLE  RECOVER

7. The database was running in archivelog mode and a database backup was available, so I recovered file 92:
    # su - oracle
    $ sqlplus / as sysdba
    SQL> archive log list;
    SQL> recover datafile 92;
    SQL> alter database datafile 92 online;

8. After the recovery, the application started normally and the business tests passed.
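When several datafiles are hit at once, it can help to pull the affected file numbers out of the alert log before deciding what to recover. The following is a minimal Python sketch, not part of the original procedure. It assumes the alert log entries look like the "KCF: write/open error" lines quoted in step 1; the function names `affected_datafiles` and `recovery_statements` are hypothetical helpers introduced only for illustration. The generated statements mirror the ones used in step 7 and are meant to be reviewed by hand. Only files that DBA_DATA_FILES actually shows as RECOVER need them.

```python
import re

# Matches the "KCF: write/open error" entries as quoted in step 1.
# \s+ tolerates the entry being on one line or wrapped over several.
KCF_RE = re.compile(
    r"KCF: write/open error block=(0x[0-9a-fA-F]+) online=\d+\s+"
    r"file=(\d+)\s+(\S+)"
)


def affected_datafiles(alert_text):
    """Return {file_id: device_path} for every KCF write/open error found."""
    files = {}
    for _block, file_id, path in KCF_RE.findall(alert_text):
        files[int(file_id)] = path
    return files


def recovery_statements(file_ids):
    """Build the per-file SQL*Plus statements from step 7, for manual review."""
    statements = []
    for file_id in sorted(file_ids):
        statements.append(f"recover datafile {file_id};")
        statements.append(f"alter database datafile {file_id} online;")
    return statements


# Sample text copied from the alert log excerpt in step 1 (line layout assumed).
sample = """Wed Dec 18 00:36:57 2013
KCF: write/open error block=0x98abe online=1 file=89 /dev/raw/raw03 error=27063 txt: 'SVR4 Error: 5: I/O error
Wed Dec 18 00:36:57 2013
KCF: write/open error block=0x9d70f online=1 file=91 /dev/raw/raw05 error=27063 txt: 'SVR4 Error: 5: I/O error
"""

if __name__ == "__main__":
    hit = affected_datafiles(sample)
    for file_id, path in sorted(hit.items()):
        print(file_id, path)
    for statement in recovery_statements(hit):
        print(statement)
```

The script only reads text and prints candidate statements; it never connects to the database. The excerpt in step 1 lists files 89 and 91, while the file that needed recovery in this incident was 92, so the alert log output should always be cross-checked against the ONLINE_STATUS column before anything is run.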
This article comes from the “斜阳悠悠寸草心” blog; please keep this attribution when reposting.