Linux multithreaded programming: deadlock scenarios, root-cause analysis, and solutions

1. Preface
2. Goal of this article
3. Deadlock scenarios
- 3.1 Case 1

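1. Preface

The author's knowledge is limited, so this article may contain mistakes. The author accepts no responsibility for any loss they cause the reader.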

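2. Goal of this article

List deadlock scenarios that come up in pthread programming, briefly analyze the cause of each, and give a solution (not necessarily the best one).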

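3. Deadlock scenarios

3.1 Case 1

3.1.1 Scenario

Several threads share a lock, and one of them is then targeted by a pthread_kill()/pthread_cancel() call while it holds the lock, which causes a deadlock. The code is as follows: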

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;

void *thread_func(void *arg)
{
	printf("thread started\n");
	
	for (;;) {
		pthread_mutex_lock(&mtx);
		printf("thread: lock\n");
		sleep(10);
		pthread_mutex_unlock(&mtx);
		printf("thread: unlock\n");
	}
}


int main(void)
{
	pthread_t pth;

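	int err = pthread_create(&pth, NULL, thread_func, NULL);
	if (err != 0) {
		/* pthread_create() returns the error number; it does not set errno */
		fprintf(stderr, "pthread_create: error %d\n", err);
		return -1;
	}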

	sleep(3); /* wait a while so that pth takes the lock before the main thread does */

	printf("stop the thread... ");
	pthread_cancel(pth);
	pthread_join(pth, NULL);
	printf("done.\n");

	printf("main thread: try to get lock...\n");
	pthread_mutex_lock(&mtx);
	printf("main thread: do something with lock\n");
	pthread_mutex_unlock(&mtx);


	printf("main thread: exit\n");
	
	return 0;
}

# Output
$ ./pthread_deadlock 
thread started
thread: lock
stop the thread... done.
main thread: try to get lock...

As the output shows, the main thread waits for the lock forever.

3.1.2 Root-cause analysis

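Unless the thread has been made cancellable at any point (asynchronous cancellation, set with pthread_setcanceltype()), a pthread_cancel() request is deferred: the target thread terminates only when it next reaches a cancellation point. What is a cancellation point? Put simply, it is a place in the code where a thread with a pending cancellation request actually exits. Note that pthread_kill() delivers a signal rather than a cancellation request, so cancellation points do not apply to it; the example above uses pthread_cancel(). man 7 pthreads lists the library functions that are cancellation points:
https://man7.org/linux/man-pages/man7/pthreads.7.html
Search that page for "Cancellation points". Unfortunately, sleep() is on the list, so the thread may exit inside sleep(10) while it still holds mtx. The lock is then never released, the main thread can never acquire it, and the program deadlocks.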
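The leaked lock can be observed without hanging the program. The following is a minimal sketch, assuming Linux with glibc and a default (non-robust) mutex; struct demo, holder() and lock_leaked_by_cancel() are hypothetical names introduced for illustration. After the thread is cancelled inside sleep(), pthread_mutex_trylock() is expected to report EBUSY instead of blocking.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

/* Hypothetical helper type: the shared lock plus a "lock is held" signal. */
struct demo {
	pthread_mutex_t mtx;
	sem_t lock_taken;
};

static void *holder(void *arg)
{
	struct demo *d = arg;

	pthread_mutex_lock(&d->mtx);
	sem_post(&d->lock_taken); /* tell the caller that the lock is now held */
	for (;;)
		sleep(1);         /* cancellation point: the thread exits here */
	return NULL;
}

/* Returns 1 if the cancelled thread left the mutex locked, 0 otherwise. */
int lock_leaked_by_cancel(void)
{
	struct demo d;
	pthread_t tid;
	void *res = NULL;

	pthread_mutex_init(&d.mtx, NULL);
	sem_init(&d.lock_taken, 0, 0);

	int err = pthread_create(&tid, NULL, holder, &d);
	assert(err == 0);         /* sketch only: abort if the thread cannot start */

	sem_wait(&d.lock_taken);  /* wait until holder() owns the mutex */
	pthread_cancel(tid);
	pthread_join(tid, &res);
	sem_destroy(&d.lock_taken);

	/* trylock never blocks; EBUSY means a (now dead) owner still holds the lock */
	int rc = pthread_mutex_trylock(&d.mtx);
	return res == PTHREAD_CANCELED && rc == EBUSY;
}
```

Calling lock_leaked_by_cancel() from main() and printing the result is enough to try it; build with -pthread.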

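3.1.3 Solution

For all shared locks we can maintain a per-thread lock nesting counter. On lock, increment the counter; when it goes from 0 to 1, call pthread_setcancelstate() to disable cancellation for the thread. On unlock, decrement the counter; when it drops back to 0, restore the previous cancel state and call pthread_testcancel(), which checks for a pending pthread_cancel() request and, if there is one, terminates the thread.
This approach has an obvious drawback: if the work done under the lock takes a long time, the thread's exit is delayed by just as long.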
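The nesting-counter idea might be sketched as follows. This is an illustration under assumptions, not the author's implementation: guarded_lock(), guarded_unlock(), lock_depth, saved_cancelstate, struct demo, worker() and lock_free_after_cancel() are all hypothetical names, and the per-thread counter relies on C11 _Thread_local. The demo function cancels a worker while it sleeps under the lock and then checks that the mutex can still be taken.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

/* Per-thread bookkeeping (hypothetical names, not from the original). */
static _Thread_local int lock_depth;        /* guarded locks held by this thread */
static _Thread_local int saved_cancelstate; /* cancel state to restore at depth 0 */

/* Lock wrapper: disable cancellation when the nesting count goes 0 -> 1. */
int guarded_lock(pthread_mutex_t *m)
{
	int unused;

	if (lock_depth++ == 0)
		pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &saved_cancelstate);
	int err = pthread_mutex_lock(m);
	if (err != 0 && --lock_depth == 0) /* lock failed: undo the bookkeeping */
		pthread_setcancelstate(saved_cancelstate, &unused);
	return err;
}

/* Unlock wrapper: at depth 0, restore the cancel state and act on a pending request. */
int guarded_unlock(pthread_mutex_t *m)
{
	int unused;
	int err = pthread_mutex_unlock(m);

	if (err == 0 && --lock_depth == 0) {
		pthread_setcancelstate(saved_cancelstate, &unused);
		pthread_testcancel(); /* the thread may exit here, holding no lock */
	}
	return err;
}

struct demo {
	pthread_mutex_t mtx;
	sem_t lock_taken;
};

static void *worker(void *arg)
{
	struct demo *d = arg;

	for (;;) {
		guarded_lock(&d->mtx);
		sem_post(&d->lock_taken); /* tell the caller the lock is held */
		sleep(1);                 /* cancellation point, but cancellation is disabled */
		guarded_unlock(&d->mtx);
	}
	return NULL;
}

/* Returns 1 if the worker was cancelled and left the mutex unlocked. */
int lock_free_after_cancel(void)
{
	struct demo d;
	pthread_t tid;
	void *res = NULL;

	pthread_mutex_init(&d.mtx, NULL);
	sem_init(&d.lock_taken, 0, 0);

	int err = pthread_create(&tid, NULL, worker, &d);
	assert(err == 0);         /* sketch only: abort if the thread cannot start */

	sem_wait(&d.lock_taken);  /* the worker now holds the lock */
	pthread_cancel(tid);      /* request stays pending until guarded_unlock() */
	pthread_join(tid, &res);
	sem_destroy(&d.lock_taken);

	int rc = pthread_mutex_trylock(&d.mtx);
	if (rc == 0) {
		pthread_mutex_unlock(&d.mtx);
		pthread_mutex_destroy(&d.mtx);
	}
	return res == PTHREAD_CANCELED && rc == 0;
}
```

In this sketch pthread_join() returns only after the worker finishes its sleep(1), which shows the drawback described above: the cancel request waits for the whole critical section to complete.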

posted @ 2025-04-08 08:59 JiMoKuangXiangQu
