观察警告日志时发现有些数据库存在一些情况,针对这些情况,做如下的分析及建议,下面的数据均来自10.2.10.39数据库。
一、问题的发现
警告日志
Current log# 3 seq# 159565 mem# 0: +SMS_DATA/sms/onlinelog/group_3.274.666752077
Current log# 3 seq# 159565 mem# 1: +SMS_DATA/sms/onlinelog/group_3.305.669404401
Wed Jun 23 09:40:12 2010
Thread 2 cannot allocate new log, sequence 159566
Checkpoint not complete
Current log# 3 seq# 159565 mem# 0: +SMS_DATA/sms/onlinelog/group_3.274.666752077
Current log# 3 seq# 159565 mem# 1: +SMS_DATA/sms/onlinelog/group_3.305.669404401
Wed Jun 23 09:40:15 2010
Thread 2 advanced to log sequence 159566 (LGWR switch)
Current log# 4 seq# 159566 mem# 0: +SMS_DATA/sms/onlinelog/group_4.275.666752077
Current log# 4 seq# 159566 mem# 1: +SMS_DATA/sms/onlinelog/group_4.306.669404413
Wed Jun 23 09:42:00 2010
Thread 2 cannot allocate new log, sequence 159567
Checkpoint not complete
Current log# 4 seq# 159566 mem# 0: +SMS_DATA/sms/onlinelog/group_4.275.666752077
Current log# 4 seq# 159566 mem# 1: +SMS_DATA/sms/onlinelog/group_4.306.669404413
Wed Jun 23 09:42:03 2010
Thread 2 advanced to log sequence 159567 (LGWR switch)
Current log# 3 seq# 159567 mem# 0: +SMS_DATA/sms/onlinelog/group_3.274.666752077
Current log# 3 seq# 159567 mem# 1: +SMS_DATA/sms/onlinelog/group_3.305.669404401
Wed Jun 23 09:44:03 2010
Thread 2 cannot allocate new log, sequence 159568
Checkpoint not complete
Current log# 3 seq# 159567 mem# 0: +SMS_DATA/sms/onlinelog/group_3.274.666752077
Current log# 3 seq# 159567 mem# 1: +SMS_DATA/sms/onlinelog/group_3.305.669404401
Wed Jun 23 09:44:06 2010
Thread 2 advanced to log sequence 159568 (LGWR switch)
Current log# 4 seq# 159568 mem# 0: +SMS_DATA/sms/onlinelog/group_4.275.666752077
Current log# 4 seq# 159568 mem# 1: +SMS_DATA/sms/onlinelog/group_4.306.669404413
Wed Jun 23 09:46:00 2010
Thread 2 cannot allocate new log, sequence 159569
Checkpoint not complete
Current log# 4 seq# 159568 mem# 0: +SMS_DATA/sms/onlinelog/group_4.275.666752077
Current log# 4 seq# 159568 mem# 1: +SMS_DATA/sms/onlinelog/group_4.306.669404413
Wed Jun 23 09:46:03 2010
Thread 2 advanced to log sequence 159569 (LGWR switch)
Current log# 3 seq# 159569 mem# 0: +SMS_DATA/sms/onlinelog/group_3.274.666752077
Current log# 3 seq# 159569 mem# 1: +SMS_DATA/sms/onlinelog/group_3.305.669404401
Wed Jun 23 09:47:36 2010
Thread 2 cannot allocate new log, sequence 159570
Checkpoint not complete
Current log# 3 seq# 159569 mem# 0: +SMS_DATA/sms/onlinelog/group_3.274.666752077
Current log# 3 seq# 159569 mem# 1: +SMS_DATA/sms/onlinelog/group_3.305.669404401
Wed Jun 23 09:47:40 2010
Thread 2 advanced to log sequence 159570 (LGWR switch)
Current log# 4 seq# 159570 mem# 0: +SMS_DATA/sms/onlinelog/group_4.275.666752077
Current log# 4 seq# 159570 mem# 1: +SMS_DATA/sms/onlinelog/group_4.306.669404413
原因和后果:
重点分析这两行
Thread 2 cannot allocate new log, sequence 159570
Checkpoint not complete
这两行说明日志将被重用的时候,该日志上涉及的数据的checkpoint不能正常完成,所以数据库发生等待,影响数据库的性能,影响事务完成的速度,使数据库出现有停顿的感觉。
checkpoint是一个数据库事件,它将已修改的数据从高速缓存刷新到磁盘,并更新控制文件和数据文件,DBWn在写出脏块时如果发现对应的日志还在日志缓冲区,就触发LGWR写出日志条.
二. 看I/O 情况
数据库存储状况:
SQL> col name format a10
SQL> col path format a10
SQL> select g.name, d.path,g.total_mb,g.free_mb from v$asm_disk d, v$asm_diskgroup g where d.group_number=g.group_number;
NAME PATH TOTAL_MB FREE_MB
---------- ------------------------- ---------- ----------
SMS_DATA ORCL:VOL1 953686 167085
SMS_DATA ORCL:VOL2 953686 167085
SMS_BAK ORCL:VOL3 476843 141991
注:数据库所使用的磁盘组情况。
[oracle@db2 bdump]$ /etc/init.d/oracleasm querydisk VOL1 VOL2 VOL3
Disk "VOL1" is a valid ASM disk on device [8, 2]
Disk "VOL2" is a valid ASM disk on device [8, 3]
Disk "VOL3" is a valid ASM disk on device [8, 4]
[oracle@db2 bdump]$ ls -la /dev/sd*
brw-rw---- 1 root disk 8, 0 Sep 30 2008 /dev/sda
brw-rw---- 1 root disk 8, 1 Sep 30 2008 /dev/sda1
brw-rw---- 1 root disk 8, 2 Sep 30 2008 /dev/sda2
brw-rw---- 1 root disk 8, 3 Sep 30 2008 /dev/sda3
brw-rw---- 1 root disk 8, 4 Sep 30 2008 /dev/sda4
说明:+SMS_DATA对应的磁盘是/dev/sda2,/dev/sda3,+SMS_BAK对应的磁盘是/dev/sda4
SQL> l
1* select a.member,b.GROUP#,b.THREAD#,b.BYTES,b.MEMBERS,b.ARCHIVED,b.STATUS from v$logfile a,v$log b where a.group#=b.group#
SQL> /
MEMBER GROUP# THREAD# BYTES MEMBERS ARC STATUS
--------------------------------------------- ------ ------- ---------- ------- --- ----------
+SMS_DATA/sms/onlinelog/group_1.266.666750973 1 1 52428800 2 YES ACTIVE
+SMS_DATA/sms/onlinelog/group_2.267.666750973 2 1 52428800 2 NO CURRENT
+SMS_DATA/sms/onlinelog/group_3.274.666752077 3 2 52428800 2 YES ACTIVE
+SMS_DATA/sms/onlinelog/group_4.275.666752077 4 2 52428800 2 NO CURRENT
+SMS_DATA/sms/onlinelog/group_1.303.669404377 1 1 52428800 2 YES ACTIVE
+SMS_DATA/sms/onlinelog/group_2.304.669404389 2 1 52428800 2 NO CURRENT
+SMS_DATA/sms/onlinelog/group_3.305.669404401 3 2 52428800 2 YES ACTIVE
+SMS_DATA/sms/onlinelog/group_4.306.669404413 4 2 52428800 2 NO CURRENT
8 rows selected.
说明:日志文件放在+SMS_DATA,所对应的磁盘设备是/dev/sda2,/dev/sda3
下面查看系统的I/O:
[oracle@db2 ~]$ iostat -d -x -k 1 5
Linux 2.6.9-42.7AXlargesmp (db2) 06/23/10
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
cciss/c0d0 0.00 7.02 0.01 2.09 0.67 72.94 0.33 36.47 35.07 0.08 36.34 6.18 1.30
sda 8.12 3.28 69.73 56.26 17.16 55.85 8.58 27.93 0.58 0.06 0.49 0.43 5.38
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
cciss/c0d0 0.00 17.00 0.00 2.00 0.00 152.00 0.00 76.00 76.00 0.01 7.50 7.50 1.50
sda 21.00 0.00 331.00 96.00 72948.00 520.00 36474.00 260.00 172.06 1.08 2.52 2.03 86.60
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
cciss/c0d0 0.00 12.12 0.00 2.02 0.00 113.13 0.00 56.57 56.00 0.01 6.50 6.50 1.31
sda 0.00 0.00 303.03 98.99 58701.01 449.49 29350.51 224.75 147.13 1.24 3.09 2.34 94.24
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
cciss/c0d0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 262.63 97.98 51834.34 426.26 25917.17 213.13 144.92 1.21 3.36 2.63 94.75
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
cciss/c0d0 0.00 30.69 0.00 3.96 0.00 277.23 0.00 138.61 70.00 0.03 8.25 6.25 2.48
sda 0.00 0.00 265.35 194.06 54069.31 826.73 27034.65 413.37 119.49 1.14 2.50 1.97 90.50
说明如下:
rrqm/s:每秒这个设备相关的读取请求有多少被Merge了(当系统调用需要读取数据的 时候,VFS将请求发到各个FS,如果FS发现不同的读取请求读取的是相同Block的数据,FS会将这个请求合并Merge);wrqm/s:每秒这个 设备相关的写入请求有多少被Merge了。
rsec/s:每秒读取的扇区数;wsec/: 每秒写入的扇区数。r/s:The number of read requests that were issued to the device per second;w/s:The number of write requests that were issued to the device per second;
await:每一个IO请求的处理的平均时间(单位是微秒)。这里可以理解为IO的响应时 间,一般地系统IO响应时间应该低于5ms,如果大于10ms就比较大了。
%util:在统计时间内所有处理IO时间,除以总共统计时间。例如,如果统计间隔1秒,该 设备有0.8秒在处理IO,而0.2秒闲置,那么该设备的%util = 0.8/1 = 80%,所以该参数暗示了设备的繁忙程度。一般地,如果该参数是100%表示设备已经接近满负荷运行了(当然如果是多磁盘,即使%util是100%,因 为磁盘的并发能力,所以磁盘使用未必就到了瓶颈)。
三、oracle 10g checkpoint的调整
从oracle 10g开始,数据库可以实现自动调整的检查点,使用自动调整的检查点,oracle数据库可以利用系统的低I/O负载时段写出内存中的脏数据,从而提高数据库的效率。因此,即使设置了不合理的检查点相关参数,oracle仍然能够通过自动调整将数据库的Crash Recovery 时间控制在合理的范围之内。
当FAST_START_MTTR_TARGET 参数未设定是,自动检查点生效。
SQL> show parameter fast_start_mttr_target
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
fast_start_mttr_target integer 0
说明:数据库是可以实现自动调整的检查点
四、确定重做日志文件的大小是否存在问题
下面显示了两个日志切换的间隔时间
SQL> select b.recid,
2 to_char(b.first_time,'YYYY-MM-DD HH24:MI:SS') START_TIME,a.recid,
3 to_char(a.first_time,'YYYY-MM-DD HH24:MI:SS') END_TIME,
4 round((a.first_time - b.first_time) * 24 * 60, 2) minutes
5 from v$log_history a, v$log_history b
6 where a.recid = b.recid + 1
7 order by a.first_time desc
8 /
RECID START_TIME RECID END_TIME MINUTES
---------- ------------------- ---------- ------------------- ----------
596092 2010-06-23 10:07:51 596093 2010-06-23 10:10:08 2.28
596090 2010-06-23 10:08:42 596091 2010-06-23 10:09:24 .7
596089 2010-06-23 10:07:57 596090 2010-06-23 10:08:42 .75
596088 2010-06-23 10:05:45 596089 2010-06-23 10:07:57 2.2
596091 2010-06-23 10:09:24 596092 2010-06-23 10:07:51 -1.55
596086 2010-06-23 10:06:33 596087 2010-06-23 10:07:18 .75
596085 2010-06-23 10:05:51 596086 2010-06-23 10:06:33 .7
596084 2010-06-23 10:04:11 596085 2010-06-23 10:05:51 1.67
596087 2010-06-23 10:07:18 596088 2010-06-23 10:05:45 -1.55
596082 2010-06-23 10:04:45 596083 2010-06-23 10:05:20 .58
596081 2010-06-23 10:04:15 596082 2010-06-23 10:04:45 .5
RECID START_TIME RECID END_TIME MINUTES
---------- ------------------- ---------- ------------------- ----------
596080 2010-06-23 10:02:11 596081 2010-06-23 10:04:15 2.07
596083 2010-06-23 10:05:20 596084 2010-06-23 10:04:11 -1.15
596078 2010-06-23 10:02:56 596079 2010-06-23 10:03:34 .63
596077 2010-06-23 10:02:15 596078 2010-06-23 10:02:56 .68
596076 2010-06-23 10:00:05 596077 2010-06-23 10:02:15 2.17
596079 2010-06-23 10:03:34 596080 2010-06-23 10:02:11 -1.38
596074 2010-06-23 10:00:47 596075 2010-06-23 10:01:32 .75
596073 2010-06-23 10:00:08 596074 2010-06-23 10:00:47 .65
596072 2010-06-23 09:58:13 596073 2010-06-23 10:00:08 1.92
596075 2010-06-23 10:01:32 596076 2010-06-23 10:00:05 -1.45
596070 2010-06-23 09:58:52 596071 2010-06-23 09:59:32 .67
RECID START_TIME RECID END_TIME MINUTES
---------- ------------------- ---------- ------------------- ----------
596069 2010-06-23 09:58:17 596070 2010-06-23 09:58:52 .58
596068 2010-06-23 09:56:07 596069 2010-06-23 09:58:17 2.17
596071 2010-06-23 09:59:32 596072 2010-06-23 09:58:13 -1.32
596066 2010-06-23 09:56:59 596067 2010-06-23 09:57:40 .68
596065 2010-06-23 09:56:11 596066 2010-06-23 09:56:59 .8
596064 2010-06-23 09:53:52 596065 2010-06-23 09:56:11 2.32
596067 2010-06-23 09:57:40 596068 2010-06-23 09:56:07 -1.55
596062 2010-06-23 09:54:42 596063 2010-06-23 09:55:29 .78
596061 2010-06-23 09:53:58 596062 2010-06-23 09:54:42 .73
596060 2010-06-23 09:52:10 596061 2010-06-23 09:53:58 1.8
596063 2010-06-23 09:55:29 596064 2010-06-23 09:53:52 -1.62
说明:日志文件的切换间隔时间不合理。oracle建议使日志切换的发生间隔在15到30分钟之间为宜
建议:
(1) 增大日志文件
(2) 增加日志组以增加覆盖的时间间隔。
来自oracle官方的建议
1.把重做日志文件和数据库数据文件放在两个不同的硬盘上面,此时,任何一个硬盘发生损坏,都可以凭借另外一块硬盘的数据,来挽回损失。
2. 如采用归档模式,应该将重做日志成员放置到不同的硬盘中去,以消除LGWR和ARCH后台进程对重做日志成员的争夺。
3。不该把日志文件存放在非常活跃的数据或索引表空间的硬盘上,这会降低数据库正常读取的效率。
五、 具体操作方法:
方法一:添加日志文件组
---------------------------
ALTER DATABASE ADD LOGFILE thread 1 group 5('+SMS_DATA/sms/onlinelog/group_1.501','+SMS_DATA/sms/onlinelog/group_1.502') size 50M
ALTER DATABASE ADD LOGFILE thread 1 group 6('+SMS_DATA/sms/onlinelog/group_1.601','+SMS_DATA/sms/onlinelog/group_1.602') size 50M
ALTER DATABASE ADD LOGFILE thread 2 group 7('+SMS_DATA/sms/onlinelog/group_1.701','+SMS_DATA/sms/onlinelog/group_1.702') size 50M
ALTER DATABASE ADD LOGFILE thread 2 group 8('+SMS_DATA/sms/onlinelog/group_1.801','+SMS_DATA/sms/onlinelog/group_1.802') size 50M
方法二:添加日志文件组 ,并修改日志文件的大小
---------------------------
添加日志组:
ALTER DATABASE ADD LOGFILE thread 1 group 5('+SMS_DATA/sms/onlinelog/group_1.501','+SMS_DATA/sms/onlinelog/group_1.502') size 100M
ALTER DATABASE ADD LOGFILE thread 1 group 6('+SMS_DATA/sms/onlinelog/group_1.601','+SMS_DATA/sms/onlinelog/group_1.602') size 100M
ALTER DATABASE ADD LOGFILE thread 2 group 7('+SMS_DATA/sms/onlinelog/group_1.701','+SMS_DATA/sms/onlinelog/group_1.702') size 100M
ALTER DATABASE ADD LOGFILE thread 2 group 8('+SMS_DATA/sms/onlinelog/group_1.801','+SMS_DATA/sms/onlinelog/group_1.802') size 100M
切换当前日志到新的日志组
SQL> alter system switch logfile;
删除旧的日志组
alter database drop logfile group 1;
alter database drop logfile group 2;
alter database drop logfile group 3;
alter database drop logfile group 4;
删除旧的日志组后重新生成相应的日志组
ALTER DATABASE ADD LOGFILE thread 1 group 1('+SMS_DATA/sms/onlinelog/group_1.101','+SMS_DATA/sms/onlinelog/group_1.102') size 100M reuse
ALTER DATABASE ADD LOGFILE thread 1 group 2('+SMS_DATA/sms/onlinelog/group_1.201','+SMS_DATA/sms/onlinelog/group_1.202') size 100M reuse
ALTER DATABASE ADD LOGFILE thread 2 group 1('+SMS_DATA/sms/onlinelog/group_2.101','+SMS_DATA/sms/onlinelog/group_2.102') size 100M reuse
ALTER DATABASE ADD LOGFILE thread 2 group 2('+SMS_DATA/sms/onlinelog/group_2.201','+SMS_DATA/sms/onlinelog/group_2.202') size 100M reuse
六、对数据库的影响
删除的日志组注意STATUS要在INACTIVE,最好在业务较闲的时候操作,对数据库基本上没什么影响。
|
|