统计MySQL的DML

在oracle里可以比较方便的统计(user_tab_modifications)每天对某个表做了多次的DML(IUD),在mysql怎么实现呢?有两种方法:
1、 通过mysql表中的gmt_create、gmt_modified字段来判断
2、 Binlog

一、 在TB所有表中都有有gmt_create、gmt_modified字段,gmt_create表示记录创建时间,gmt_modified表示最近修改时间,如果记录没有修改,gmt_create和gmt_modified一致

root@lx 04:03:40>SHOW CREATE TABLE test_rows \G;
*************************** 1. ROW ***************************
       TABLE: test_rows
CREATE TABLE: CREATE TABLE `test_rows` (
  `a` INT(11) NOT NULL AUTO_INCREMENT,
  `gmt_create` datetime NOT NULL,
  `gmt_modified` datetime NOT NULL,
  PRIMARY KEY (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=gbk;
root@lx 04:27:02>SELECT * FROM test_rows;
+---+---------------------+---------------------+
| a | gmt_create          | gmt_modified        |
+---+---------------------+---------------------+
| 1 | 2012-12-26 16:15:18 | 2012-12-27 16:15:18 |
| 2 | 2012-12-26 16:15:19 | 2012-12-26 16:16:09 |
| 3 | 2012-12-26 16:15:20 | 2012-12-27 16:15:20 |
| 4 | 2012-12-27 16:15:25 | 2012-12-27 16:15:25 |
| 5 | 2012-12-27 16:15:25 | 2012-12-27 16:15:25 |
| 6 | 2012-12-27 16:15:26 | 2012-12-27 16:15:26 |
+---+---------------------+---------------------+

昨天INSERT:

root@lx 04:29:04>SELECT COUNT(*)
    ->   FROM test_rows
    ->  WHERE gmt_create < date_format(DATE(now()), '%Y-%m-%e %H-%i-%s')
    ->    AND gmt_create >= date_format(DATE(date_add(now(), INTERVAL - 1 DAY)),
    ->                                  '%Y-%m-%e %H-%i-%s');

昨天UPDATE:

root@lx 04:31:48>SELECT *
    ->   FROM test_rows
    ->  WHERE gmt_create < date_format(DATE(now()), '%Y-%m-%e %H-%i-%s')
    ->    AND gmt_create >= date_format(DATE(date_add(now(), INTERVAL - 1 DAY)),
    ->                                  '%Y-%m-%e %H-%i-%s')
->    AND date_add(gmt_create, INTERVAL - 1 DAY) != date_add(gmt_modified, INTERVAL - 1 DAY)

昨天DELETE:
每天记录information_schema.TABLES中TABLE_ROWS值,昨天记数数与今天0点记录数相减等到

问题:如果表很大的话,没有索引,这些查询性能会是个问题,可以考虑放到凌晨备库(加个gmt_create索引)去跑

二、 binlog解析

root@lx 04:39:59>INSERT INTO test_rows(gmt_create,gmt_modified) VALUES(date_add(now(),INTERVAL -1 DAY),now());
Query OK, 1 ROW affected (0.01 sec)
 
root@lx 04:40:08>INSERT INTO test_rows(gmt_create,gmt_modified) VALUES(date_add(now(),INTERVAL -1 DAY),now());
Query OK, 1 ROW affected (0.00 sec)
 
root@lx 04:40:09>INSERT INTO test_rows(gmt_create,gmt_modified) VALUES(now(),now());
Query OK, 1 ROW affected (0.00 sec)
 
root@lx 04:40:14>INSERT INTO test_rows(gmt_create,gmt_modified) VALUES(now(),now());
Query OK, 1 ROW affected (0.00 sec)
 
root@lx 04:40:43>UPDATE test_rows SET gmt_modified=now() WHERE a=4;
Query OK, 1 ROW affected (0.00 sec)
ROWS matched: 1  Changed: 1  Warnings: 0
 
root@lx 04:40:48>UPDATE test_rows SET gmt_modified=now() WHERE a=1;
Query OK, 1 ROW affected (0.00 sec)
ROWS matched: 1  Changed: 1  Warnings: 0
root@lx 04:41:04>DELETE FROM test_rows WHERE a=2;
Query OK, 1 ROW affected (0.00 sec)
$ sudo mysqlbinlog --no-defaults -v -v mysql-bin.000081| tr ' ' '\n' | sort | uniq -c | grep -E "INSERT|UPDATE|DELETE"
      1 DELETE
      4 INSERT
      2 UPDATE

如果需要对某个表统计IUD,可以以下命令:

$sudo mysqlbinlog --no-defaults -v -v mysql-bin.000083| grep -w 'lx.t'|uniq -c|grep -E "INSERT|UPDATE|DELETE"
      3 ### INSERT INTO lx.t

问题:如果写入量比较大的话,binlog会很大,解析时间和分析时间会是瓶颈

小结:mysql的ROW_COUNT()能够记录每次DML影响的行数,如果每次影响函数能够累加到内存表中,这样统计就更方便了,修改源码。

《MySQL主库xtrabackup备份导致应用异常》遗留问题(二)

案例重现
Session 1:跑procedure,主要是向表中insert记录

root@lx 10:55:17>call sp_myddl;

session 2: tps上升

                    -QPS- -TPS-
  ins   upd   del    sel   iud|
    0     0     0      0     0|
    0     1     0      1     1|
    0     0     0      1     0|
    0     0     0     19     0|
 1640     1     0      1  1641|
 2305     0     0      1  2305|
 2318     1     0      1  2319|

Session 3:跑长查询

root@lx 10:55:45>select count(*) from test_myddl a,test_myddl b;

session 4:FTWRL

root@lx 10:55:57>flush tables with read lock; --堵住

session 2:所以有tps都会堵住

  ins   upd   del    sel   iud|
    0     0     0      9     0|
    0     0     0     11     0|
    0     0     0      1     0|
    0     0     0      2     0|
    0     0     0      1     0|
    0     0     0      1     0|
    0     0     0      1     0|
    0     0     0      1     0|

Session 5:杀掉FTWRL mysql process id

root@(none) 10:56:58>kill 72863;
Query OK, 0 rows affected (0.00 sec)

Session 4:表现

root@lx 10:55:57>flush tables with read lock;
ERROR 2013 (HY000): Lost connection to MySQL server during query

Session 2:insert test_myddl未恢复正常,但对其他表的DML操作恢复

  ins   upd   del    sel   iud|
    0     1     0      1     1|
    0     0     0      1     0|
    0     1     0      1     1|
    0     0     0      1     0|
    0     1     0      1     1|
    0     0     0     19     0|
    0     1     0      1     1|

Session 3:强制退出长查询

root@lx 10:55:45>select count(*) from test_myddl a,test_myddl b;
^CCtrl-C -- sending KILL QUERY 72826" to server ...

Session 2:insert写入恢复

  ins   upd   del    sel   iud|
 2345     0     0     19  2345|
 2346     1     0      1  2347|
 2319     0     0      1  2319|
 2341     1     0      4  2342|
 2377     0     0      1  2377|
 2401     1     0      1  2402|
 2393     0     0      1  2393|
 2410     1     0      1  2411|
 2320     0     0      1  2320|
 2332     1     0      1  2333|

Slow.log记录信息:

# Time: 121227 22:57:07
# User@Host: root[root] @ localhost []
# Thread_id: 72863  Schema: lx  Last_errno: 1160  Killed: 1053
# Query_time: 64.780099  Lock_time: 0.000000  Rows_sent: 0  Rows_examined: 0  Rows_affected: 0  Rows_read: 0
# Bytes_sent: 0  Tmp_tables: 0  Tmp_disk_tables: 0  Tmp_table_sizes: 0
SET timestamp=1356620227;
flush tables with read lock;
# Time: 121227 22:57:38
# User@Host: root[root] @ localhost []
# Thread_id: 72826  Schema: lx  Last_errno: 1160  Killed: 0
# Query_time: 106.972363  Lock_time: 0.000156  Rows_sent: 0  Rows_examined: 7126116  Rows_affected: 0  Rows_read: 0
# Bytes_sent: 0  Tmp_tables: 0  Tmp_disk_tables: 0  Tmp_table_sizes: 0
# InnoDB_trx_id: 3A3A27
SET timestamp=1356620258;
select count(*) from test_myddl a,test_myddl b;

小结:当FTWRL时,insert等待Waiting for global read lock,当kill掉FTWRL时insert(select也是)等待Waiting for table flush,其他表的写入没有影响。退出长查询,insert恢复正常。

新问题:test_myddl都没有closing table成功,为什么不能insert和select成功呢?

下次解释,写周报

《MySQL主库xtrabackup备份导致应用异常》遗留问题(一)

先从xtrabackup代码分析一下

# flush tables with read lock
        mysql_check();
        mysql_lockall() if !$option_no_lock;

因为mysql_check()去检查一下mysql是否还活着,如果活着,就调用mysql_lockall,这个案例中mysql_check()还活着,走到mysql_lockall

sub mysql_lockall {
    $now = current_time();
    print STDERR "$now  $prefix Starting to lock all tables...\n";
    mysql_send "USE mysql;";
    mysql_send "SET AUTOCOMMIT=0;";
    if (compare_versions($mysql_server_version, '4.0.22') == 0
        || compare_versions($mysql_server_version, '4.1.7') == 0) {
        # MySQL server version is 4.0.22 or 4.1.7
        mysql_send "COMMIT;";
        mysql_send "FLUSH TABLES WITH READ LOCK;";
    } else {
        # MySQL server version is other than 4.0.22 or 4.1.7
        mysql_send "FLUSH TABLES WITH READ LOCK;";
        mysql_send "COMMIT;";
    }

Mysql执行flush tables with read lock(简称FTWRL)

sub mysql_send {
    my $request = shift;
    $current_mysql_request = $request;
    print MYSQL_WRITER "$request\n";
    mysql_check();
    $current_mysql_request = '';
}

在xtrabackup mysql_open时,会执行:$mysql_pid = open(*MYSQL_WRITER, “| mysql $options >$mysql_stdout 2>$mysql_stderr “);来得到执行FTWRL的mysql线程ID,再执行mysql_check()

sub mysql_check {
    my $mysql_pid_copy = $mysql_pid;
 
    # send a dummy query to mysql child process
    $hello_id++;
    my $hello_message = "innobackup hello $hello_id";
    print MYSQL_WRITER "select '$hello_message';\n"
        or Die "Connection to mysql child process failed: $!";
 
    # wait for reply
    eval {
        local $SIG{ALRM} = sub { die "alarm clock restart" };
        my $stdout = '';
        my $stderr = '';
        alarm $mysql_response_timeout;
        while (index($stdout, $hello_message) < 0) {
            sleep 2;
            if ($mysql_pid && $mysql_pid == waitpid($mysql_pid, &WNOHANG)) {
                my $reason = `cat $mysql_stderr`;
                $mysql_pid = '';
                die "mysql child process has died: $reason";
            }
            $stdout = `cat $mysql_stdout`;
            $stderr = `cat $mysql_stderr | grep -v ^Warning`;
            if ($stderr) {
                # mysql has reported an error, do exit
                die "mysql error: $stderr";
            }
        }
        alarm 0;
    };
    if ($@ =~ /alarm clock restart/) {
        Die "Connection to mysql child process (pid=$mysql_pid_copy) timedout."
            . " (Time limit of $mysql_response_timeout seconds exceeded."
            . " You may adjust time limit by editing the value of parameter"
            . " \"\$mysql_response_timeout\" in this script.)";
} elsif ($@) { Die $@; }
$mysql_last_access_time = time();
}

因为eval块写出到mysql-stdout中,未找到“select $hello_message”,返回-1,进入while,但waitpid($mysql_pid, &WNOHANG)返回-1(FTWRL还在running),退出if。直到超时900S后,xtrabackup备份退出,退时调用override的Die函数

sub Die {
    my $message = shift;
    my $extra_info = '';
 
    # kill all child processes of this process
    kill_child_processes();
 
    if ($current_mysql_request) {
        $extra_info = " while waiting for reply to MySQL request:" .
            " '$current_mysql_request'";
    }
    die "$prefix Error: $message$extra_info";
}

会杀掉所有mysql的线程

sub kill_child_processes {
    if ($ibbackup_pid) {
        kill($kill_signal, $ibbackup_pid);
        $ibbackup_pid = '';
    }
 
    if ($mysql_pid) {
        kill($kill_signal, $mysql_pid);
        $mysql_pid = '';
    }
}

因为$mysql_pid就是执行FTWRL的mysql process id。

MySQL主库xtrabackup备份导致应用异常

昨天有位同学问我:有台主库tps一下子跌没了,应用写入全部报错,她提供几个时间点给我。tps一下子跌没,我条件反射:莫非有全局读锁。开始排查
第一步:
查看数据监控 mysql thread running 23:47分左右开始上涨,到第二天01:17恢复正常。其他指标变化不是非常明显

又得到非常有用信息:因备库问题,在主库16:00启动xtrabackup,这时感觉找到问题的根源
第二步:
查看slow,一条子查询跑了23659S、flush跑了 5216S,这条子查询应该在19:00左右跑起,备份应该在23:45左右跑flush tables with read lock

# Time: 121225  1:09:54
# User@Host: xxx[xxx] @  [xxxx.xxxx.xxxx.xxxx]
# Query_time: 23659.989155  Lock_time: 0.000104 Rows_sent: 0  Rows_examined: 47501046480
SET timestamp=1356368994;
select id, parent_biz_order_id, buyer_id, status, type, gmt_modify from ttt where parent_biz_order_id 
in (select parent_biz_order_id from ttt where type='哈哈');
# Time: 121225  1:13:25
# User@Host: root[root] @ localhost []
# Query_time: 5216.430881  Lock_time: 0.000000 Rows_sent: 0  Rows_examined: 0
use mysql;
SET timestamp=1356369205;
FLUSH TABLES WITH READ LOCK;

第三步:
查看备份信息:备份脚本实际跑完(备份未成功)时间是在第二天00:02:15

+--------------+---------------------+---------------------+
| db_name      | start_time          | end_time |
+--------------+---------------------+---------------------+
| xxxxxxxxx | 2012-12-24 17:40:56 | 2012-12-25 00:02:15 |

从备份日志看出,23:46:24 xtrabackup触发flush tables with read lock,等了900秒后,因为close tables失败,最终导致备份失败。触发flush tables时间点+900S=xtrabackup脚本跑完时间end_time
备份脚本报错:
121224 23:46:24 innobackupex-1.5.1: Starting to lock all tables…
innobackupex-1.5.1: Error: Connection to mysql child process (pid=18366) timedout. (Time limit of 900 seconds exceeded. You may adjust time limit by editing the value of parameter “$mysql_response_timeout” in this script.) while waiting for reply to MySQL request: ‘FLUSH TABLES WITH READ LOCK;’ at /usr/bin/innobackupex-1.5.1 line 336.

xtrabackup源码解释:

# flush tables with read lock
        mysql_check(); --没有问题
        mysql_lockall() if !$option_no_lock; 
--在执行flush tables with read lock时,mysql_send函数执行超时900S,备份失败
 
# timeout in seconds for a reply from mysql
my $mysql_response_timeout = 900;
 
sub mysql_send {
    my $request = shift;
 
    $current_mysql_request = $request;
    print MYSQL_WRITER "$request\n";
    mysql_check();
    $current_mysql_request = '';
}
 
sub mysql_lockall {
    $now = current_time();
    print STDERR "$now  $prefix Starting to lock all tables...\n";
 
    mysql_send "USE mysql;";
    mysql_send "SET AUTOCOMMIT=0;";
    if (compare_versions($mysql_server_version, '4.0.22') == 0
        || compare_versions($mysql_server_version, '4.1.7') == 0) {
        # MySQL server version is 4.0.22 or 4.1.7
        mysql_send "COMMIT;";
        mysql_send "FLUSH TABLES WITH READ LOCK;";
    } else {
        # MySQL server version is other than 4.0.22 or 4.1.7
        mysql_send "FLUSH TABLES WITH READ LOCK;";
        mysql_send "COMMIT;";
    }
    write_binlog_info;
    write_slave_info if $option_slave_info;
 
    $now = current_time();
    print STDERR "$now  $prefix All tables locked and flushed to disk\n";
}

把整个过程连贯起来描述:

1、16:00 xtracbackup启动,因为xtrabackup会先copy innodb data file
2、19:00 查询sql跑起
3、23:45 xtrabackup加flush tables with read lock,跑成功这条语句要做三件事:1、加全局锁 2、关闭open tables 3、打上阻塞commit的flag。因为加全局锁成功,但在关闭open tables时,因为长查询,一直hang住
4、23:45以后,所有的写入都会被阻塞,应用不停创建连接,thread running升高
5、第二天1:09:54 sql跑完
6、第二天1:17:25thread running恢复正常

小结:还有几个问题没有解释,flush tables with read lock失败,thread running为什么没有在flush失败时间点下降?slow.log为什么记得5216S?

python远程调用脚本(二)

Socket方式调用远程脚本这里只贴一下代码,代码只是简单实现这个功能,并没有考虑异常处理情况,因为这两个方案(rpyc与socket),需要再次讨论确定后,再完善。

Server端:

import os,sys
import socket
 
def nserver():
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.bind(("xxxx.xxxx.xxxx.xxxx", 11111))
        sock.listen(5)
        while True:
                  try:
                     conn, addr = sock.accept()
                     ret = conn.recv(1024)
                     print ret
                     result = os.system(ret)
                     conn.send(str(result))
                  except KeyboardInterrupt:
                     print "exit"
                     sys.exit(0)
        sock.close()
if __name__ == '__main__':
        nserver()

client端:

#!/usr/bin/env python
import os,sys
import socket
def send_command(command):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.connect(("xxxx.xxxx.xxxx.xxxx", 11111))
        sock.send(command)
        result = sock.recv(1024)
        sock.close()
        return result
if __name__ == '__main__':
   cmd=sys.argv[1]
   ifsuccess = send_command(cmd)
   if (ifsuccess == "0"):
      print "command is successful!"

client端分别执行:

$python client.py "/home/luoxuan/iamshell.sh"
command is successful!
 
$python client.py "/home/luoxuan/iamperl.pl"
command is successful!
 
$python client.py "/home/luoxuan/iampython.py"
command is successful!

服务端打印结果:

/home/luoxuan/iamshell.sh
I am shell
/home/luoxuan/iamperl.pl
I am perl/home/luoxuan/iampython.py
I am python

注意:相同的代码,perl执行时,少了换行符