An experiment: sending SIGQUIT to the bgwriter

The bgwriter.c source contains the following line:

    pqsignal(SIGQUIT, bg_quickdie);        /* hard crash time */
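pqsignal() is PostgreSQL's own helper for installing a signal handler. As a rough sketch of what such a helper does, assuming a POSIX system, here is one built on sigaction(). The name install_signal_handler and the SA_RESTART flag are choices made for this example, not a copy of PostgreSQL's pqsignal.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>

typedef void (*handler_fn)(int);

/*
 * Hypothetical pqsignal-style helper built on sigaction(): install func
 * for signo and return the previous handler, or SIG_ERR on failure.
 */
static handler_fn install_signal_handler(int signo, handler_fn func)
{
    struct sigaction act, oact;

    memset(&act, 0, sizeof(act));
    act.sa_handler = func;
    sigemptyset(&act.sa_mask);      /* no extra signals blocked in the handler */
    act.sa_flags = SA_RESTART;      /* restart interrupted system calls */
    if (sigaction(signo, &act, &oact) < 0)
        return SIG_ERR;
    return oact.sa_handler;
}

/* Demo handler: only records that the signal arrived (async-signal-safe). */
static volatile sig_atomic_t got_signal = 0;

static void note_signal(int signo)
{
    (void) signo;
    got_signal = 1;
}
```

With a helper like this, pqsignal(SIGQUIT, bg_quickdie) would correspond to install_signal_handler(SIGQUIT, bg_quickdie).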

[Author: 技术者高健 @ 博客园 (cnblogs), mail: luckyjackgao@gmail.com ]

And the handler itself:

/*
 * bg_quickdie() occurs when signalled SIGQUIT by the postmaster.
 *
 * Some backend has bought the farm,
 * so we need to stop what we're doing and exit.
 */
static void
bg_quickdie(SIGNAL_ARGS)
{
    PG_SETMASK(&BlockSig);

    /*
     * We DO NOT want to run proc_exit() callbacks -- we're here because
     * shared memory may be corrupted, so we don't want to try to clean up our
     * transaction.  Just nail the windows shut and get out of town.  Now that
     * there's an atexit callback to prevent third-party code from breaking
     * things by calling exit() directly, we have to reset the callbacks
     * explicitly to make this work as intended.
     */
    on_exit_reset();

    /*
     * Note we do exit(2) not exit(0).    This is to force the postmaster into a
     * system reset cycle if some idiot DBA sends a manual SIGQUIT to a random
     * backend.  This is necessary precisely because we don't clean up our
     * shared memory state.  (The "dead man switch" mechanism in pmsignal.c
     * should ensure the postmaster sees this as a crash, too, but no harm in
     * being doubly sure.)
     */
    exit(2);
}

Here is how I ran the experiment:

First, I added one line of code to bg_quickdie in bgwriter.c, so that it became:

/*
 * bg_quickdie() occurs when signalled SIGQUIT by the postmaster.
 *
 * Some backend has bought the farm,
 * so we need to stop what we're doing and exit.
 */
static void
bg_quickdie(SIGNAL_ARGS)
{
    fprintf(stderr,"bg_quickdie happend.\n");    /* added for this experiment */
    PG_SETMASK(&BlockSig);

    /*
     * We DO NOT want to run proc_exit() callbacks -- we're here because
     * shared memory may be corrupted, so we don't want to try to clean up our
     * transaction.  Just nail the windows shut and get out of town.  Now that
     * there's an atexit callback to prevent third-party code from breaking
     * things by calling exit() directly, we have to reset the callbacks
     * explicitly to make this work as intended.
     */
    on_exit_reset();

    /*
     * Note we do exit(2) not exit(0).    This is to force the postmaster into a
     * system reset cycle if some idiot DBA sends a manual SIGQUIT to a random
     * backend.  This is necessary precisely because we don't clean up our
     * shared memory state.  (The "dead man switch" mechanism in pmsignal.c
     * should ensure the postmaster sees this as a crash, too, but no harm in
     * being doubly sure.)
     */
    exit(2);
}
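The same mechanics can be tried without rebuilding PostgreSQL. Below is a standalone POSIX sketch (not PostgreSQL code; quickdie_sketch and run_sigquit_experiment are made-up names): a child process installs a quickdie-style SIGQUIT handler, the parent sends it SIGQUIT with kill(), as the shell command kill -s SIGQUIT does, and then reads the child's exit code. It uses write() instead of fprintf() and _exit() instead of on_exit_reset() plus exit(), because write() and _exit() are async-signal-safe.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Quickdie-style SIGQUIT handler (a sketch, not PostgreSQL code): block
 * further signals, print a marker, and leave with exit status 2.
 */
static void quickdie_sketch(int signo)
{
    static const char msg[] = "quickdie_sketch happened.\n";
    sigset_t all;
    ssize_t n;

    (void) signo;
    sigfillset(&all);
    sigprocmask(SIG_SETMASK, &all, NULL);   /* plays the role of PG_SETMASK(&BlockSig) */
    n = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void) n;
    _exit(2);                               /* no atexit callbacks, no cleanup */
}

/*
 * Fork a child that installs the handler, send it SIGQUIT with kill(),
 * and return the child's exit code (-1 if fork()/waitpid() failed or the
 * child was killed by a signal instead of exiting).
 */
static int run_sigquit_experiment(void)
{
    sigset_t quit_only, old;
    pid_t pid;
    int status;

    /* Block SIGQUIT across fork(): the child inherits the mask, so the
       signal stays pending until the handler is installed. */
    sigemptyset(&quit_only);
    sigaddset(&quit_only, SIGQUIT);
    sigprocmask(SIG_BLOCK, &quit_only, &old);

    pid = fork();
    if (pid == 0) {
        struct sigaction act;
        sigset_t none;

        memset(&act, 0, sizeof(act));
        act.sa_handler = quickdie_sketch;
        sigemptyset(&act.sa_mask);
        sigaction(SIGQUIT, &act, NULL);
        sigemptyset(&none);
        for (;;)
            sigsuspend(&none);              /* wait with SIGQUIT unblocked */
    }
    if (pid > 0)
        kill(pid, SIGQUIT);
    sigprocmask(SIG_SETMASK, &old, NULL);
    if (pid < 0)
        return -1;

    while (waitpid(pid, &status, 0) < 0)
        if (errno != EINTR)
            return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_sigquit_experiment() should print the marker line on stderr and return 2, the same exit code the postmaster reports for the bgwriter in the log that follows.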

Next, I started PostgreSQL and checked the processes:

[postgres@localhost bin]$ ./postgres -D /usr/local/pgsql/data
LOG:  database system was shut down at 2012-10-31 10:25:11 CST
LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections

[root@localhost postgresql-9.2.0]# ps -ef|grep post
root      2928  2897  0 10:34 pts/1    00:00:00 su - postgres
postgres  2929  2928  0 10:34 pts/1    00:00:00 -bash
postgres  2967  2929  0 10:34 pts/1    00:00:00 ./postgres -D /usr/local/pgsql/data
postgres  2969  2967  0 10:34 ?        00:00:00 postgres: checkpointer process     
postgres  2970  2967  0 10:34 ?        00:00:00 postgres: writer process           
postgres  2971  2967  0 10:34 ?        00:00:00 postgres: wal writer process       
postgres  2972  2967  0 10:34 ?        00:00:00 postgres: autovacuum launcher process   
postgres  2973  2967  0 10:34 ?        00:00:00 postgres: stats collector process   
root      3000  2977  0 10:35 pts/2    00:00:00 grep post
[root@localhost postgresql-9.2.0]#

Then I sent SIGQUIT to the bgwriter, which ps lists as "writer process" (PID 2970):

[root@localhost postgresql-9.2.0]# kill -s SIGQUIT 2970

What shows up on pts/1, the terminal where the postmaster is running?

bg_quickdie happend.
LOG:  background writer process (PID 2970) exited with exit code 2
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

HINT:  In a moment you should be able to reconnect to the database and repeat your command.
LOG:  all server processes terminated; reinitializing
LOG:  database system was interrupted; last known up at 2012-10-31 10:34:47 CST
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  record with zero length at 0/192D458
LOG:  redo is not required
LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections

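In other words, the bgwriter caught SIGQUIT and exited with exit code 2; the postmaster (the postgres process) treated this as a crash, terminated the other server processes, ran automatic recovery, and then restarted all of its child processes!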
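The postmaster's reaction in the log (notice a child's non-zero exit, signal the remaining children, wait until all of them are gone, then reinitialize and start new ones) can be imitated in a much simplified form. This is a hypothetical sketch, not the postmaster's actual code: there is no shared memory and no crash recovery, the children do nothing but wait, and NCHILD, supervise_once and stop_children are invented names. The printed messages only imitate the log lines above.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NCHILD 3                /* hypothetical number of child processes */

static pid_t children[NCHILD];

/* Child-side SIGQUIT handler: exit at once with status 2, like bg_quickdie. */
static void child_quit(int signo)
{
    (void) signo;
    _exit(2);
}

/* Fork one child; with crash_now set it exits with status 2 immediately. */
static pid_t start_child(int crash_now)
{
    pid_t pid = fork();

    if (pid == 0) {
        struct sigaction act;
        sigset_t none;

        if (crash_now)
            _exit(2);
        memset(&act, 0, sizeof(act));
        act.sa_handler = child_quit;
        sigemptyset(&act.sa_mask);
        sigaction(SIGQUIT, &act, NULL);
        sigemptyset(&none);
        for (;;)
            sigsuspend(&none);  /* sleep with SIGQUIT unblocked */
    }
    return pid;                 /* -1 if fork() failed */
}

/*
 * Start NCHILD children (child 0 "crashes" at once), treat its non-zero
 * exit as a crash, SIGQUIT and reap the rest, then start a fresh set.
 * Returns the number of crashes handled.
 */
static int supervise_once(void)
{
    sigset_t quit_only, old;
    int i, status, crashes = 0;
    pid_t pid;

    /* Children inherit SIGQUIT blocked, so an early SIGQUIT stays pending
       until their handler is installed instead of killing them. */
    sigemptyset(&quit_only);
    sigaddset(&quit_only, SIGQUIT);
    sigprocmask(SIG_BLOCK, &quit_only, &old);

    for (i = 0; i < NCHILD; i++)
        children[i] = start_child(i == 0);

    pid = waitpid(-1, &status, 0);      /* only child 0 exits on its own */
    if (pid > 0 && (!WIFEXITED(status) || WEXITSTATUS(status) != 0)) {
        crashes++;
        if (WIFEXITED(status))
            printf("LOG:  child process (PID %d) exited with exit code %d\n",
                   (int) pid, WEXITSTATUS(status));
        printf("LOG:  terminating any other active server processes\n");
        for (i = 0; i < NCHILD; i++)
            if (children[i] > 0 && children[i] != pid)
                kill(children[i], SIGQUIT);
        while (waitpid(-1, &status, 0) > 0)
            ;                           /* reap until no children remain */
        printf("LOG:  all server processes terminated; reinitializing\n");
        for (i = 0; i < NCHILD; i++)
            children[i] = start_child(0);
    }
    sigprocmask(SIG_SETMASK, &old, NULL);
    return crashes;
}

/* SIGQUIT the current children and count how many exit with status 2. */
static int stop_children(void)
{
    int i, status, stopped = 0;

    for (i = 0; i < NCHILD; i++)
        if (children[i] > 0)
            kill(children[i], SIGQUIT);
    for (i = 0; i < NCHILD; i++)
        if (children[i] > 0 && waitpid(children[i], &status, 0) == children[i]
            && WIFEXITED(status) && WEXITSTATUS(status) == 2)
            stopped++;
    return stopped;
}
```

supervise_once() should return 1 after handling the single simulated crash, and stop_children() then shuts the fresh set down. As in the ps output, the replacement children get new PIDs while the supervising process keeps its own. The children leave through _exit(), which also keeps them from flushing a copy of the parent's buffered stdout.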

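Running ps again confirms it: the postmaster (PID 2967) is still the same process, while every child process now has a new PID: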

[root@localhost postgresql-9.2.0]# ps -ef|grep post
root      2928  2897  0 10:34 pts/1    00:00:00 su - postgres
postgres  2929  2928  0 10:34 pts/1    00:00:00 -bash
postgres  2967  2929  0 10:34 pts/1    00:00:00 ./postgres -D /usr/local/pgsql/data
postgres  3002  2967  0 10:35 ?        00:00:00 postgres: checkpointer process     
postgres  3003  2967  0 10:35 ?        00:00:00 postgres: writer process           
postgres  3004  2967  0 10:35 ?        00:00:00 postgres: wal writer process       
postgres  3005  2967  0 10:35 ?        00:00:00 postgres: autovacuum launcher process   
postgres  3006  2967  0 10:35 ?        00:00:00 postgres: stats collector process   
root      3010  2977  0 10:36 pts/2    00:00:00 grep post
[root@localhost postgresql-9.2.0]#


The end.

posted @ 2012-10-31 11:07 健哥的数据花园
