perl查询空表引起的bug

在ali集团对数据库的数据采集、监控、分析等由xxagent完成,所以对xxagent运行稳定非常重要。下面介绍xxagent一个bug,导致数据查询异常。

1、数据库的lijie表中没有记录:

root@xx 10:28:26>show create table lijie \G;
*************************** 1. row ***************************
       Table: lijie
Create Table: CREATE TABLE `lijie` (
  `id` int(11) DEFAULT NULL,
  `name` varchar(32) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=gbk;
root@xx 10:28:30>select count(*) from lijie;
+----------+
| count(*) |
+----------+
|        0 |
+----------+

2、执行以下代码:

my $sql3 = "select id,name from lijie";
my @sql3re = DB::query($sql3);
$Common::log->debug("luoxuansun3->@sql3re->") if (@sql3re);
$Common::log->debug("luoxuansun4") if (DB::query($sql3));
my $sql3n  = scalar @sql3re;
my $sql3nn = @sql3re;
my $sql3nnn = DB::query($sql3);
$Common::log->debug("luoxuansun5"."->".$sql3n."->".$sql3nn."->".$sql3nnn."aa");

3、log信息

$xxagent logf|grep luoxuansun
2013-07-28 10:39:50 [DEBUG] luoxuansun3->-> 
2013-07-28 10:39:50 [DEBUG] luoxuansun5->1->1->aa
2013-07-28 10:39:52 [DEBUG] luoxuansun3->->
2013-07-28 10:39:52 [DEBUG] luoxuansun5->1->1->aa

4、现象
通过执行DB模块的query函数返回给@sql3re后,里面既然有了记录(实际是没有记录),但直接执行”DB::query($sql3)”却是没有数据,前后矛盾

5、写入一条记录

root@xx 10:28:57>insert into lijie values(1,'a');
Query OK, 1 row affected (0.00 sec)
log信息
2013-07-28 10:44:39 [DEBUG] luoxuansun3->HASH(0x1bbb1f8)-> 
2013-07-28 10:44:39 [DEBUG] luoxuansun4
2013-07-28 10:44:39 [DEBUG] luoxuansun5->1->1->1aa
注意:结果"luoxuansun5->1->1->1aa"并没有发生变化,但"luoxuansun4"打印出来,说明的确有数据存在

6、再写一条记录

root@xx 10:44:35>insert into lijie values(1,'b');
Query OK, 1 row affected (0.00 sec)
log信息
2013-07-28 10:46:53 [DEBUG] luoxuansun3->HASH(0x1bd6628) HASH(0x1bd7df8)->  
2013-07-28 10:46:53 [DEBUG] luoxuansun4 at  
2013-07-28 10:46:53 [DEBUG] luoxuansun5->2->2->2aa 
注意:结果集中有两条记录了,变化符合预期

7、排查DB的query函数

sub query { 
  my $sql = join '', @_; 
  if ( !defined($dbh) ) {
    $dbh = _connect;
  }
  for ( my $count = 1 ; $count <= $connect_retries ; $count++ ) {
    if ( defined($dbh) ) {
      my @res = $dbh->query($sql);
      my $x = scalar @res;
      $Common::log->debug("luoxuansat>$sql->$x");
      if ( scalar(@res) ) {
        $Common::log->debug("luoxuansat2->$sql->$x");
        return @res;
      } else {
        $dbh = _connect;
        $Common::log->error("query retires $count $sql"); 
        next;   
      }
    } else {
      $dbh = _connect;
      $Common::log->error("query retires $count $sql"); 
      next;   
    }
  }
}
 
问题是出在空表时,先把记录删除,看一下log信息
root@xx 10:46:44>delete from lijie;
Query OK, 2 rows affected (0.00 sec)
log信息
$xxagent logf|grep luoxuansat|grep lijie
2013-07-28 10:53:31 [DEBUG] luoxuansat>select id,name from lijie->0 
2013-07-28 10:53:31 [DEBUG] luoxuansat>select id,name from lijie->0
注意:调用$dbh->query返回记录集为0,那说问题就出在DB的query函数上了
log信息
$xxagent logf|grep "query retires"|grep lijie
2013-07-28 10:56:18 [ERROR] query retires 1 select id,name from lijie 
2013-07-28 10:56:18 [ERROR] query retires 2 select id,name from lijie 
2013-07-28 10:56:18 [ERROR] query retires 3 select id,name from lijie 
注意:结果为0的情况下,却走到了:
else {
        $dbh = _connect;
        $Common::log->error("query retires $count $sql"); 
        next;   
      }
而没有结果集返回

8、bug fixed:
但没有结果返回时,应该明确返回空数组给调用方或对空结果做特殊处理

sub query { 
  my $sql = join '', @_; 
  my @nores;   --设置空数组
  if ( !defined($dbh) ) {
    $dbh = _connect;
  }
  for ( my $count = 1 ; $count <= $connect_retries ; $count++ ) {
    if ( defined($dbh) ) {
      my @res = $dbh->query($sql);
      my $x = scalar @res;
      $Common::log->debug("luoxuansat>$sql->$x");
      if ( scalar(@res) ) {
        $Common::log->debug("luoxuansat2->$sql->$x");
        return @res;
      } else {
        $dbh = _connect;
        $Common::log->error("query retires $count $sql"); 
        next;   
      }
    } else {
      $dbh = _connect;
      $Common::log->error("query retires $count $sql"); 
      next;   
    }
   return @nores;--无结果集时返回
  }

9、修改效果

$xxagent logf|grep luoxuansun
2013-07-28 11:06:45 [DEBUG] luoxuansun5->0->0->0aa 
2013-07-28 11:06:48 [DEBUG] luoxuansun5->0->0->0aa 
2013-07-28 11:07:30 [DEBUG] luoxuansun5->0->0->0aa 
下面代码不再执行打印
$Common::log->debug("luoxuansun3->@sql3re->") if (@sql3re);
写入一条记录
root@xx 11:11:05>insert into lijie values(1,'a');
Query OK, 1 row affected (0.00 sec)
log信息
$xxagent logf|grep luoxuansun
2013-07-28 11:12:14 [DEBUG] luoxuansun3->HASH(0x212f5f0)->  
2013-07-28 11:12:14 [DEBUG] luoxuansun4 at  
2013-07-28 11:12:14 [DEBUG] luoxuansun5->1->1->1aa 
#xxagent logf|grep luoxuansat|grep lijie
2013-07-28 11:12:14 [DEBUG] luoxuansat>select id,name from lijie->1  
2013-07-28 11:12:14 [DEBUG] luoxuansat2->select id,name from lijie->1  
再写入一条
root@xx 11:11:11>insert into lijie values(1,'b');
Query OK, 1 row affected (0.00 sec)
log信息
$xxagent logf|grep luoxuansun
2013-07-28 11:14:51 [DEBUG] luoxuansun3->HASH(0x2140e58) HASH(0x21408a0)->  
2013-07-28 11:14:51 [DEBUG] luoxuansun4 at  
2013-07-28 11:14:51 [DEBUG] luoxuansun5->2->2->2aa 
#xxagent logf|grep luoxuansat|grep lijie
2013-07-28 11:14:51 [DEBUG] luoxuansat>select id,name from lijie->2  
2013-07-28 11:14:51 [DEBUG] luoxuansat2->select id,name from lijie->2

10、小结
上面把很多代码都省略,其实原本的代码函数之间调用还要复杂,为了把问题尽量简单,没有例出。问题总是有它的原因,只要你坚持。

rm两点建议

看到下厨房(http://tech.xiachufang.com/?p=18)因为rm把mysql的数据文件删除,这里不介绍如何恢复数据文件,对于rm我们要有敬畏之心,那我们可以对rm做些什么,防止误删呢?

1、建议为rm创建别名,对rm加-i –preserve-root

$which rm
alias rm='rm -i --preserve-root'
/bin/rm
 
$ rm rm.txt
rm:是否删除普通空文件 "rm.txt"?y
$ rm -r /
rm: 在"/" 进行递归操作十分危险
rm: 使用 --no-preserve-root 选项跳过安全模式

2、safe-rm使用麻烦一点,为了-f破坏也是值得的
可到下载http://code.google.com/p/safe-rm/。目前该工具还在完善中,有人维护,关键是用perl写的,激动

使用方法源码中:

safe-rm is meant to replace the rm command so you can achieve this by
putting a symbolic link with the name "rm" in a directory which sits
at the front of your path. For example, given this path:
 
PATH=/usr/local/bin:/bin:/usr/bin
 
You could create the following symbolic link:
 
ln -s /usr/local/bin/safe-rm /usr/local/bin/rm
 
Protected paths can be set both at the site and user levels.
 
Both of these configuration files can contain a list of important files
or directories (one per line):
 
/etc/safe-rm.conf
~/.safe-rm

环境:safe-rm放到/usr/local/bin/,并创建safe-rm软链接为rm,真正的rm在/bin下
注意:safe-rm命令路径一定要在真正rm路径之前,才会使用safe-rm。

有人会说为什么不把真正的rm删除呢?

$ sudo mv /bin/rm /bin/rm.bak
$ sudo rm -rf /tmp
Can't exec "/bin/rm": 没有那个文件或目录 at /usr/local/bin/rm line 113.

编辑/etc/safe-rm.conf

/tmp
/tmp/aa

注意:文件夹要以”/”结尾,不然起不到安全删除。eg:”/tmp”不能配置成”/tmp/”

创建两个文件

$ touch /tmp/aa
$ touch /tmp/ab

尝试删除文件夹及文件

$ sudo rm -rf /tmp
safe-rm: skipping /tmp
$ sudo rm -rf /tmp/aa
safe-rm: skipping /tmp/aa
$ sudo rm -rf /tmp/bb
$ ls
aa bb
$ ls
aa

log4perl日志模块使用

1

recover table from full backup(二)

1、目标要为 XtraDB
2、 innodb_file_per_table=on
3、目标库innodb_expand_import(innodb_import_table_from_xtrabackup较新版本)开启

背景:

root@luoxuan >desc export_test;
+-------+---------+------+-----+---------+-------+
| Field | Type    | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| a     | int(11) | YES  |     | NULL    |       |
+-------+---------+------+-----+---------+-------+
1 row in set (0.00 sec)
 
root@luoxuan >insert into export_test values(1);
Query OK, 1 row affected (0.00 sec)

instance 1:
一、原db备份

$ sudo ./innobackupex --defaults-file=/u01/mysql/my.cnf --user=root /u01/bak2

二、prepare

$ sudo ./xtrabackup --prepare --export --target-dir=/u01/bak2/2013-03-27_11-33-39

三、在instance 2上创建和原表一样的表结构,并discard掉

root@luoxuan >CREATE TABLE export_test (
    ->   a int(11) DEFAULT NULL
    -> ) ENGINE=InnoDB;
Query OK, 0 rows affected (0.00 sec)
 
root@luoxuan > ALTER TABLE luoxuan.export_test DISCARD TABLESPACE;

四、把prepare后的文件夹中找到export_test.ibd、export_test.exp并cp instance 2的数据目标
五、在instance 2 上import

root@(none) >ALTER TABLE luoxuan.export_test IMPORT TABLESPACE;
Query OK, 0 rows affected (0.00 sec)
 
root@(none) >select * from luoxuan.export_test;
+------+
| a    |
+------+
|    1 |
+------+

六、得到备份时binlog位置

$ more xtrabackup_binlog_info 
mysql-bin.002520        6244

同时在instance 1写入一条记录insert into luoxuan.export_test values(2)

七、解析binlog,当然可以有stop-position。自己开发脚本对binlog中只抽取export_test相关的DML

sudo mysqlbinlog --no-defaults -v -v  /u01/mysql/log/mysql-bin.002604 --start-position=6244 > /tmp/recover.sql

八、在instance 2上使用解析后的binlog,可以恢复到

root@luoxuan 04:16:30>select * from luoxuan.export_test;
+------+
| a    |
+------+
|    2 |
|    1 |
+------+

九、instance 1上的数据

root@luoxuan 04:17:10>select * from luoxuan.export_test;
+------+
| a    |
+------+
|    1 |
|    2 |
+------+

recover table from full backup(一)

前提:
1、innodb_fast_shutdown = 0
2、You must not drop, truncate or alter the schema of the table after the backup has been taken.
3、The variable innodb_file_per_table must be enabled.

背景:

root@luoxuan 05:14:25>select * from lijie;
+------+
| b    |
+------+
|    1 |
|    2 |
+------+
2 rows in set (0.00 sec)
 
root@luoxuan 05:14:29>select * from test_recover;
+------+
| a    |
+------+
|    1 |
|    2 |
|    3 |
+------+

一、获得一致性备份:

session 1: sudo ./innobackupex --defaults-file=/u01/mysql/my.cnf --user=root /u01/bak2 --slave-info --export 2> /tmp/xtrabackup.log

session 2:备份过程进行数据写入

root@luoxuan 05:17:23>insert into lijie values(3);
Query OK, 1 row affected (0.00 sec)
 
root@luoxuan 05:17:27>insert into test_recover values(4);     
Query OK, 1 row affected (0.00 sec)

session 1:

sudo ./innobackupex --defaults-file=/u01/mysql/my.cnf --apply-log --export /u01/bak2/2013-03-26_17-16-55

session 2:获得一致性过程中写入数据

root@luoxuan 05:17:36>insert into lijie values(4);
Query OK, 1 row affected (0.61 sec)
 
root@luoxuan 05:20:43>insert into test_recover values(5);
Query OK, 1 row affected (0.00 sec)

二、删除需单独恢复表的数据

root@luoxuan 05:20:57>select * from lijie;
+------+
| b    |
+------+
|    1 |
|    2 |
|    3 |
|    4 |
+------+
4 rows in set (0.00 sec)
 
root@luoxuan 05:21:40>select * from test_recover;
+------+
| a    |
+------+
|    1 |
|    2 |
|    3 |
|    4 |
|    5 |
+------+
5 rows in set (0.00 sec)
 
root@luoxuan 05:21:45>delete from test_recover where a = 1;
Query OK, 1 row affected (0.00 sec)

三、
discard需要恢复表

mysql> set FOREIGN_KEY_CHECKS=0;
mysql> ALTER TABLE test_recover DISCARD TABLESPACE;

四、把apply-log后的数据文件cp回/u01/mysql/data/库名/

$ sudo cp ./luoxuan/test_recover.ibd /u01/mysql/data/luoxuan

记得修改权限

五、import需要恢复的表

mysql> set FOREIGN_KEY_CHECKS=0;
mysql> ALTER TABLE test_recover IMPORT TABLESPACE;
mysql> set FOREIGN_KEY_CHECKS=1;
 
root@luoxuan 05:25:24>select * from test_recover;(正确)
+------+
| a    |
+------+
|    1 |
|    2 |
|    3 |
|    4 |
+------+
4 rows in set (0.00 sec)
 
root@luoxuan 05:25:45>select * from lijie;  记录应该(1,2,3,4,5)
+------+
| b    |
+------+
|    1 |
|    2 |
|    3 |
|    4 |
+------+

六、拿到备份时的binlog file及position

$ more xtrabackup_binlog_info 
mysql-bin.002603        12634

七、把binlog解析后发现,postion位置也不对

http://www.mysqlperformanceblog.com/2012/01/25/how-to-recover-a-single-innodb-table-from-a-full-backup/
这篇文章只针对在恢复过程没有写入情况下,恢复出来数据正确,能单独恢复一个表