mysql checkpoint理解

在mysql的doc并没有深入介绍checkpoint,本文主要对网上两篇文章阅读加上自己的理解,描述一下transaction log(redo log)和checkpoint之间的关系。
Innodb实现了Fuzzy Checkpoint和Sharp Checkpoint的机制,但在事务日志中,采用了Fuzzy Checkpoint,那何为Fuzzy Checkpoint? Innodb每次取最老的modified page(last checkpoint)对应的LSN,确保此page位置的LSN之前page都已flush到redo log,当mysql crash后,mysql扫描redo log,从last checkpoint开始apply redo log到buffer pool,直到last checkpoint对应的LSN等于Log flushed up to对应的LSN,mysql认为恢复完成。

查看redo log和checkpoint信息:

root@(none) 11:37:27>show engine innodb status \G;

 

 

 

Innodb的checkpoint和redo log有哪些紧密关系?
有几上名词需要解释一下:

Ckp age(动态移动):最老的dirty page还没有flush到数据文件,即没有做last checkpoint的范围
Buf age(动态移动):modified page information没有写到log中,但已在log buffer
Buf async(固定点):日志空间大小的7/8,当buf age移动到Buf async点时,强制把没有写到log中的modified page information开始写入到log中,不阻塞事务
Buf sync(固定点): 日志空间大小的15/16,当写入很大的,buf age移动非常快,一下子到buf sync的点,阻塞事务,强制把modified page information开始写入到log中。如果不阻塞事务,未做last checkpoint的redo log存在覆盖危险
Ckp async(固定点): 日志空间大小的31/32,当ckp age到达ckp async,强制做last checkpoint,不阻塞事务
Ckp sync(固定点):日志空间大小,当ckp age到达ckp sync,强制做last checkpoint,阻塞事务,存在redo log覆盖的危险

接下分析4种情况
1、 如果buf age在buf async和buf sync之间
2、 如果buf age在buf sync之后(当然这种情况是不存在,mysql有保护机制)
3、 如果ckp age在ckp async和ckp sync之间(这种情况是不存在)
4、 如果ckp age在ckp sync之后(这种情况是不存在)

第一种情况:

当写入量巨大时,buf age移动到buf async和buf sync之间,触发写出到log中,mysql把尽量多的log写出,如果写入量减慢,buf age又移回到“图一”状态。如果写入量大于flush log的速度,buf age最终会和buf sync重叠,这时所有的事务都被阻塞,强制将2*(Buf age-Buf async)的脏页刷盘,这时IO会比较繁忙。

第二种情况:

当然这种情况是不可能出现,因为如果出现,redo log存在覆盖的可能,数据就会丢失。buf age会越过log size,buf age的大小可能就超过log size,如果要刷buf age,那么整个log size都不够容纳所有的buf age。

第三种和第四种情况不存在分析:

ckp age始终位于buf age的后面(左边),因为ckp age是last checkpoint点,总是追赶buf age(将尽可能多的modified page flush到磁盘),所以buf age肯定是先到达到buf sync。

疑问:ckp async及ckp sync存在意义?

BTW:上周听了褚霸关于OS及mysql分享,收益不少,讲到page cache时候,也提到page cache也存在high water及low water,当dirty page触到low water时,os是开始flush dirty page到磁盘,到high water时,会阻塞一切动作,os会疯狂的flush dirty page,磁盘会很忙,存在IO Storm,如果来避免?就想褚霸说的,innodb其实就是操作系统,只是业务场景不同。Percona的Vadim Tkachenko提出增加类似mid water的点,详细参考(http://www.mysqlperformanceblog.com/2011/04/04/innodb-flushing-theory-and-solutions/)

本文参考:

http://blog.csdn.net/yah99_wolf/article/category/539408

create table as 与binlog

使用create table as时,binlog_format不同,存在binlog中的内容也不是一样的。
1、statement与mixed,会和执行语句一样被记录到binlog中
2、row,在binlog会被拆成create table和insert into

mysql> select @@version;
+------------+
| @@version  |
+------------+
| 5.1.56-log |
+------------+
mysql> show variables like '%ignore%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| ignore_builtin_innodb | ON    |
+-----------------------+-------+

1、binlog_format:ROW

mysql> flush logs;
mysql>  create table b as select * from a;
Records: 1  Duplicates: 0  Warnings: 0
mysql> flush logs;
[root@dg3 var]# /usr/local/mysql/bin/mysqlbinlog -v -v mysql-bin.000022
SET TIMESTAMP=1321920783/*!*/;
CREATE TABLE `b` (
  `a` int(11) NOT NULL DEFAULT '0',
  `b` varchar(10) DEFAULT NULL
)
 
BINLOG '
D+nKThMCAAAAKQAAAGkBAAAAAFcAAAAAAAEAAmx4AAFiAAIDDwIUAAI=
D+nKThcCAAAAJQAAAI4BAAAAAFcAAAAAAAEAAv/8AQAAAAJseA==
'/*!*/;
### INSERT INTO lx.b
### SET
###   @1=1 /* INT meta=0 nullable=0 is_null=0 */
###   @2='lx' /* VARSTRING(20) meta=20 nullable=1 is_null=0 */
# at 398

2、binlog_format:MIXED、STATEMENT

[root@dg3 var]# /usr/local/mysql/bin/mysqlbinlog -v -v mysql-bin.000026
 
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
create table b1 as select * from a
/*!*/;
# at 201
#111122  8:17:42 server id 2  end_log_pos 244   Rotate to mysql-bin.000027  pos: 4
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

max_seeks_for_key

看到max_seeks_for_key让我想到了oracle的optimizer_index_cost_adj,索引扫描与全表扫描成本比较

文档:
Limit the assumed maximum number of seeks when looking up rows based on a key. The MySQL optimizer assumes that no more than this number of key seeks are required when searching for matching rows in a table by scanning an index, regardless of the actual cardinality of the index (see Section 12.7.5.23, “SHOW INDEX Syntax”). By setting this to a low value (say, 100), you can force MySQL to prefer indexes instead of table scans.

但从测试来看,总感觉max_seeks_for_key没起作用

+------------+
| version()  |
+------------+
| 5.1.48-log |
+------------+
 
root@xx 08:29:28>select count(*),user_id from t group by user_id;
+----------+---------+
| count(*) | user_id |
+----------+---------+
|   445344 |      10 |
+----------+---------+
1 row in set (1.09 sec)
 
root@xx 08:31:40>show variables like 'max_seeks_for_key';
+-------------------+----------------------+
| Variable_name     | Value                |
+-------------------+----------------------+
| max_seeks_for_key | 18446744073709547520 |
+-------------------+----------------------+
1 row in set (0.00 sec)
 
root@xx 08:31:55>explain select * from t where user_id=10;
+----+-------------+-------+------+---------------+---------+---------+-------+--------+-------------+
| id | select_type | table | type | possible_keys | key     | key_len | ref   | rows   | Extra       |
+----+-------------+-------+------+---------------+---------+---------+-------+--------+-------------+
|  1 | SIMPLE      | t     | ref  | user_id       | user_id | 9       | const | 213510 | Using where |
+----+-------------+-------+------+---------------+---------+---------+-------+--------+-------------+
 
root@(none) 08:32:34>set global max_seeks_for_key=1;
Query OK, 0 rows affected (0.00 sec)
 
root@(none) 08:32:50>show variables like 'max_seeks_for_key';
+-------------------+----------------------+
| Variable_name     | Value                |
+-------------------+----------------------+
| max_seeks_for_key | 18446744073709547520 |
+-------------------+----------------------+
1 row in set (0.00 sec)
 
root@(none) 08:32:58>exit
 
root@(none) 08:33:01>show variables like 'max_seeks_for_key';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| max_seeks_for_key | 1     |
+-------------------+-------+
1 row in set (0.01 sec)
 
root@xx 08:33:21>explain select * from t where user_id=10;
+----+-------------+-------+------+---------------+---------+---------+-------+--------+-------------+
| id | select_type | table | type | possible_keys | key     | key_len | ref   | rows   | Extra       |
+----+-------------+-------+------+---------------+---------+---------+-------+--------+-------------+
|  1 | SIMPLE      | t     | ref  | user_id       | user_id | 9       | const | 213510 | Using where |
+----+-------------+-------+------+---------------+---------+---------+-------+--------+-------------+