Is shared_ptr thread-safe? Plus some ramblings on the kernel's spinlock implementation and usage
intro
In multithreaded programs there is always a trade-off between data safety and performance: concurrency control needs locks for synchronization, and locks always risk introducing performance loss. To improve performance while keeping data safe, Linux introduced the futex in user space and RCU in the kernel, and C++'s shared_ptr likewise tries to minimize locking overhead.
shared_ptr is a key part of the C++ standard library's smart pointers: it brings automatic reference counting and object destruction into the standard library. However, I could not find a particularly clear-cut example regarding the thread safety of this structure, so this post tries to verify it.
shared_ptr
Documentation
- boost
shared_ptr objects offer the same level of thread safety as built-in types. A shared_ptr instance can be "read" (accessed using only const operations) simultaneously by multiple threads. Different shared_ptr instances can be "written to" (accessed using mutable operations such as operator= or reset) simultaneously by multiple threads (even when these instances are copies, and share the same reference count underneath.)
The key phrase is "the same level of thread safety as built-in types". int is a built-in type, and concurrent reads and writes of the same int object from multiple threads are not thread-safe (concurrent reads alone are fine, of course), so by this wording, concurrent reads and writes of the same shared_ptr object from different threads are not safe either.
- The C++ standard
All member functions (including copy constructor and copy assignment) can be called by multiple threads on different shared_ptr objects without additional synchronization even if these objects are copies and share ownership of the same object. If multiple threads of execution access the same shared_ptr object without synchronization and any of those accesses uses a non-const member function of shared_ptr then a data race will occur; the std::atomic<shared_ptr> can be used to prevent the data race.
The second half, "If multiple threads of execution access the same shared_ptr object", appears to include read access as well: while thread1 is reading object a, thread2 must not modify a. It also names the fix, std::atomic<shared_ptr>, sketched at the end of this list.
- stackoverflow
I have a double free bug using std::shared_ptr and trying to get know why. I am using shared_ptr in multithread environment, one thread sometimes replaces some element in a global array

std::shared_ptr globalTable[100]; // global elements storage

using: globalTable[idx].reset(newBucket);

and the other thread reads this table sometimes using:

std::shared_ptr bkt(globalTable[pIdx]);
// do calculations with bkt-> items

After this I am receiving double-free error, and AddressSanitizer says that the second code tries to free an object that was destroyed by the first one. How it is possible? As I know shared_ptr must be completly thread safe.
The accepted answer to that question also points out that when a thread "accesses the same shared_ptr", "access" covers reads as well as writes; if any one of the concurrent accesses uses a non-const member function of shared_ptr, a data race is possible.
- ChatGPT
As a supplement, ChatGPT also answers "is read and write same shared_ptr Object in multiple threads safe" in the negative:
No, reading and writing to the same std::shared_ptr object from multiple threads is not safe without synchronization. While multiple threads can safely read from a shared_ptr, if one thread writes (modifies) it, you need to use synchronization mechanisms (like mutexes) to avoid data races and undefined behavior.
Since AI answers can hallucinate, this only serves as one more piece of corroboration.
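If concurrent reads and writes of the same shared_ptr object really are required, the fix the standard points to can be sketched as follows (a minimal sketch, assuming a C++20 toolchain with std::atomic<std::shared_ptr>; before C++20 the std::atomic_load/std::atomic_store overloads for shared_ptr served the same purpose):

#include <atomic>
#include <memory>
#include <thread>

// One shared_ptr object accessed by both threads, wrapped in std::atomic
// so that concurrent load/store on the *same* object is well defined.
std::atomic<std::shared_ptr<int>> g_ptr{std::make_shared<int>(0)};

int main()
{
    std::thread writer([] {
        for (int i = 0; i < 1000; ++i)
            g_ptr.store(std::make_shared<int>(i)); // atomic "reset"
    });
    std::thread reader([] {
        for (int i = 0; i < 1000; ++i) {
            std::shared_ptr<int> local = g_ptr.load(); // atomic copy
            (void)*local; // safe: local keeps the pointee alive
        }
    });
    writer.join();
    reader.join();
}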
Implementation
In libstdc++'s implementation, shared_ptr's destructor atomically decrements the reference count (__gnu_cxx::__exchange_and_add_dispatch(&_M_use_count, -1) == 1); if the count was 1 before the decrement, it is 0 afterwards, and the actual destruction of the managed object begins.
Suppose that right after this point a new shared_ptr is copied from that same shared_ptr object. The new copy simply increments the reference count again. Its count is now non-zero, yet the object it refers to has in fact already been destroyed: dereferencing it touches freed memory (stale data or a coredump), and when the copy itself is destroyed, the count drops to 0 once more and destruction runs a second time (the double free).
///@file: bits/shared_ptr_base.h
      void
      _M_release() noexcept
      {
        // Be race-detector-friendly. For more info see bits/c++config.
        _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_use_count);
        if (__gnu_cxx::__exchange_and_add_dispatch(&_M_use_count, -1) == 1)
          {
            _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_use_count);
            _M_dispose();
            // There must be a memory barrier between dispose() and destroy()
            // to ensure that the effects of dispose() are observed in the
            // thread that runs destroy().
            // See http://gcc.gnu.org/ml/libstdc++/2005-11/msg00136.html
            if (_Mutex_base<_Lp>::_S_need_barriers)
              {
                __atomic_thread_fence (__ATOMIC_ACQ_REL);
              }

            // Be race-detector-friendly. For more info see bits/c++config.
            _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_weak_count);
            if (__gnu_cxx::__exchange_and_add_dispatch(&_M_weak_count,
                                                       -1) == 1)
              {
                _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_weak_count);
                _M_destroy();
              }
          }
      }
When a new shared_ptr is copied from an existing one, the copy constructor increments the reference count immediately after copying the control-block pointer, without checking whether the count is already zero. So if the incoming __r has a non-null control-block pointer whose use count has dropped to 0, the copy resurrects the count from 0 back to a positive value.
///@file: bits/shared_ptr_base.h
      __shared_count(const __shared_count& __r) noexcept
      : _M_pi(__r._M_pi)
      {
        if (_M_pi != 0)
          _M_pi->_M_add_ref_copy();
      }
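For contrast, a copy that refuses to resurrect a dead count needs a compare-and-swap loop of the kind weak_ptr::lock() uses internally (_M_add_ref_lock). A minimal sketch with std::atomic; add_ref_if_alive is a hypothetical helper for illustration, not libstdc++'s actual code:

#include <atomic>

// Hypothetical helper: atomically increment a use count, but refuse to
// resurrect it once it has reached zero. Returns false if the managed
// object is already (being) destroyed. This is exactly the check that
// the plain __shared_count copy constructor above skips.
bool add_ref_if_alive(std::atomic<long>& use_count)
{
    long cur = use_count.load(std::memory_order_relaxed);
    while (cur != 0) {
        if (use_count.compare_exchange_weak(cur, cur + 1,
                                            std::memory_order_acq_rel,
                                            std::memory_order_relaxed))
            return true;  // took a reference while the object was alive
        // on failure, cur has been reloaded; loop and re-check for zero
    }
    return false;         // count already hit zero: too late to copy
}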
Verification
The program below demonstrates this: an object created in the main thread is copied (as a shared_ptr) in a sub-thread, and the copy ends up accessing an already-destroyed object.
tsecer@harry: cat mp.cpp
#include <chrono>
#include <iostream>
#include <memory>
#include <mutex>
#include <thread>

using namespace std::chrono_literals;

struct Sleeper
{
    Sleeper()
    {
        m_i = new int(1111);
    }
    ~Sleeper()
    {
        m_i = nullptr;
        // sleep so that the sub thread copies the shared_ptr while the
        // use count is already 0 and destruction is in progress
        std::this_thread::sleep_for(100000ms);
    }
    int *m_i = nullptr;
};

void thr(std::shared_ptr<Sleeper> *p)
{
    // wait for the destruction of the Sleeper object to start in the main thread
    std::this_thread::sleep_for(1000ms);
    std::shared_ptr<Sleeper> lp = *p;
    while (true)
    {
        // !!! WILL CRASH HERE
        printf("val is %d\n", *lp->m_i);
        std::this_thread::sleep_for(1000ms);
    }
}

int main()
{
    std::thread *pstThread;
    {
        std::shared_ptr<Sleeper> p = std::make_shared<Sleeper>();
        pstThread = new std::thread{thr, &p};
    }
    std::this_thread::sleep_for(100000ms);
    pstThread->join();
}
tsecer@harry: g++ mp.cpp -g -lpthread
tsecer@harry: gdb -quiet ./a.out
Reading symbols from ./a.out...
(gdb) r
[New Thread 0x7ffff6cb5700 (LWP 960435)]
Thread 2 "a.out" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff6cb5700 (LWP 960435)]
thr (p=0x7fffffffd750) at mp.cpp:32
32 printf("val is %d\n", *lp->m_i);
(gdb) bt
#0 thr (p=0x7fffffffd750) at mp.cpp:32
#1 0x0000000000401f46 in std::__invoke_impl<void, void (*)(std::shared_ptr<Sleeper>*), std::shared_ptr<Sleeper>*> (__f=@0x418f20: 0x40124c <thr(std::shared_ptr<Sleeper>*)>)
at /usr/include/c++/8/bits/invoke.h:60
#2 0x0000000000401aa2 in std::__invoke<void (*)(std::shared_ptr<Sleeper>*), std::shared_ptr<Sleeper>*> (__fn=@0x418f20: 0x40124c <thr(std::shared_ptr<Sleeper>*)>)
at /usr/include/c++/8/bits/invoke.h:95
#3 0x0000000000402db1 in std::thread::_Invoker<std::tuple<void (*)(std::shared_ptr<Sleeper>*), std::shared_ptr<Sleeper>*> >::_M_invoke<0ul, 1ul> (this=0x418f18)
at /usr/include/c++/8/thread:244
#4 0x0000000000402d3e in std::thread::_Invoker<std::tuple<void (*)(std::shared_ptr<Sleeper>*), std::shared_ptr<Sleeper>*> >::operator() (this=0x418f18)
at /usr/include/c++/8/thread:253
#5 0x0000000000402ce2 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)(std::shared_ptr<Sleeper>*), std::shared_ptr<Sleeper>*> > >::_M_run (this=0x418f10)
at /usr/include/c++/8/thread:196
#6 0x00007ffff78dbb13 in ?? () from /lib64/libstdc++.so.6
#7 0x00007ffff7bb61ca in start_thread () from /lib64/libpthread.so.0
#8 0x00007ffff6ef3e73 in clone () from /lib64/libc.so.6
(gdb) p lp
$1 = std::shared_ptr<Sleeper> (use count 1, weak count 0) = {get() = 0x418ec0}
(gdb)
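The crash goes away if the sub-thread is handed its own copy of the shared_ptr while the original is still guaranteed to be alive, instead of a raw pointer to a shared_ptr that may already be mid-destruction. A minimal sketch of the safe hand-off (pass the shared_ptr by value; the copy, and its use-count increment, happens in the parent thread before p can die):

#include <memory>
#include <thread>

struct Sleeper { int v = 1111; };

void thr(std::shared_ptr<Sleeper> lp)  // by value: lp owns a reference
{
    // lp keeps the Sleeper alive for as long as this thread needs it
    (void)lp->v;
}

int main()
{
    std::thread t;
    {
        auto p = std::make_shared<Sleeper>();
        t = std::thread{thr, p};  // copy is made here, while p is alive
    }                             // p's destruction is now harmless
    t.join();
}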
spin_lock
The supposedly "efficient" locks people roll by hand in user space can suffer from all sorts of problems, which is why the kernel is the last resort for concurrency. So how does the kernel guarantee that its locks are correct?
Usage
In the kernel, interrupts can be disabled at will, so to run a short but non-atomic instruction sequence, the code can first disable interrupts, then acquire a spinlock, execute the sequence, and finally release the lock.
For example, when manipulating a futex's wait queue, futex_q_lock acquires the spinlock via spin_lock(&hb->lock); to avoid races between processes.
///@file: kernel\futex\core.c
/* The key must be already stored in q->key. */
struct futex_hash_bucket *futex_q_lock(struct futex_q *q)
	__acquires(&hb->lock)
{
	struct futex_hash_bucket *hb;

	hb = futex_hash(&q->key);

	/*
	 * Increment the counter before taking the lock so that
	 * a potential waker won't miss a to-be-slept task that is
	 * waiting for the spinlock. This is safe as all futex_q_lock()
	 * users end up calling futex_queue(). Similarly, for housekeeping,
	 * decrement the counter at futex_q_unlock() when some error has
	 * occurred and we don't end up adding the task to the list.
	 */
	futex_hb_waiters_inc(hb); /* implies smp_mb(); (A) */

	q->lock_ptr = &hb->lock;

	spin_lock(&hb->lock);
	return hb;
}
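The comment in futex_q_lock encodes a general ordering trick: announce "there is a waiter" (with a full memory barrier) before competing for the lock, so a waker that checks the counter can never miss a waiter that is still on its way in. A rough user-space analogue of just this ordering in standard C++ (waiters, bucket_lock and the two functions are illustrative names, not kernel code):

#include <atomic>
#include <mutex>

std::atomic<int> waiters{0};   // analogue of the hash bucket waiter counter
std::mutex       bucket_lock;  // analogue of hb->lock

void waiter_enter()
{
    // Increment BEFORE taking the lock; the default seq_cst ordering plays
    // the role of smp_mb(), so a waker reading 'waiters' afterwards is
    // guaranteed to see this waiter.
    waiters.fetch_add(1);
    bucket_lock.lock();
    // ... queue ourselves, unlock, and go to sleep ...
}

bool waker_has_work()
{
    // Cheap early-out, like futex_hb_waiters_pending(): if no waiter has
    // announced itself, there is provably nobody to wake.
    return waiters.load() != 0;
}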
Problems with user-space spinlocks
However, the fact that the kernel can use spinlocks does not mean user space can use the same mechanism, as Linus explains in the post quoted below. The kernel has the privilege to drive the hardware, and the crucial part here is disabling interrupts, which keeps the lock holder from being scheduled away. User space has no such privilege (looking at it the other way: if it did, a user process could simply refuse to yield and monopolize the CPU), so a thread may be switched out right after acquiring the lock because its time slice expired. The other waiters on the spinlock then can do nothing but spin until that thread is scheduled again and finally releases the lock.
Beware Spinlocks in User Space
So now you still hold the lock, but you got scheduled away from the CPU, because you had used up your time slice. The “current time” you read is basically now stale, and has nothing to do with the (future) time when you are actually going to release the lock.
Somebody else comes in and wants that “spinlock”, and that somebody will now spin for a long while, since nobody is releasing it - it’s still held by that other thread entirely that was just scheduled out. At some point, the scheduler says “ok, now you’ve used your time slice”, and schedules the original thread, and now the lock is actually released. Then another thread comes in, gets the lock again, and then it looks at the time and says “oh, a long time passed without the lock being held at all”.
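The kind of lock Linus is criticizing looks roughly like the sketch below: a bare busy-wait in user space, with nothing stopping the scheduler from preempting the holder mid-critical-section (deliberately naive, for illustration only):

#include <atomic>

// A naive user-space spinlock. If the holder is preempted inside its
// critical section, every other thread burns its whole time slice in
// lock(); the scheduler has no idea they are waiting for anything.
class NaiveSpinlock
{
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
public:
    void lock()
    {
        while (flag.test_and_set(std::memory_order_acquire))
            ; // spin
    }
    void unlock()
    {
        flag.clear(std::memory_order_release);
    }
};

A futex-based lock avoids exactly this: instead of spinning, the waiter asks the kernel to put it to sleep and to wake it when the lock is released.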
How the kernel avoids lost wakeups
If a process cannot obtain the resource it needs (a mutex, say), the system puts it to sleep. There is a boundary condition here: process A fails to acquire the lock, marks itself non-runnable and prepares to sleep, and at that moment process B releases the lock. Does A now go to sleep and miss the wakeup? Put differently, how exactly does A get woken up?
Waiting / switching out
Taking the typical futex as an example: while futex_wait_queue runs, the process still holds the spinlock, which is only released inside futex_queue.
The key point is that marking the process non-runnable (TASK_INTERRUPTIBLE|TASK_FREEZABLE) and adding it to the wait queue both happen under the protection of that spinlock, and both happen before the call to schedule().
///@file: kernel\futex\waitwake.c
/**
 * futex_wait_queue() - futex_queue() and wait for wakeup, timeout, or signal
 * @hb:		the futex hash bucket, must be locked by the caller
 * @q:		the futex_q to queue up on
 * @timeout:	the prepared hrtimer_sleeper, or null for no timeout
 */
void futex_wait_queue(struct futex_hash_bucket *hb, struct futex_q *q,
			    struct hrtimer_sleeper *timeout)
{
	/*
	 * The task state is guaranteed to be set before another task can
	 * wake it. set_current_state() is implemented using smp_store_mb() and
	 * futex_queue() calls spin_unlock() upon completion, both serializing
	 * access to the hash list and forcing another memory barrier.
	 */
	set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE);
	futex_queue(q, hb);

	/* Arm the timer */
	if (timeout)
		hrtimer_sleeper_start_expires(timeout, HRTIMER_MODE_ABS);

	/*
	 * If we have been removed from the hash list, then another task
	 * has tried to wake us, and we can skip the call to schedule().
	 */
	if (likely(!plist_node_empty(&q->list))) {
		/*
		 * If the timer has already expired, current will already be
		 * flagged for rescheduling. Only call schedule if there
		 * is no timeout, or if it has yet to expire.
		 */
		if (!timeout || timeout->task)
			schedule();
	}
	__set_current_state(TASK_RUNNING);
}
Wakeup
The wakeup side runs under the same spinlock as the wait side. In the scenario described above, process A, which is about to be switched out, is already on the queue in state TASK_INTERRUPTIBLE, so futex_wake will attempt to wake it (through the this->wake(&wake_q, this) call).
If process A has not yet reached schedule(), that this->wake(&wake_q, this) call moves it from TASK_INTERRUPTIBLE back to a runnable state; when A then executes schedule(), it is runnable and is not actually switched out. The wakeup is therefore never lost.
///@file: kernel\futex\waitwake.c
/*
 * Wake up waiters matching bitset queued on this futex (uaddr).
 */
int futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset)
{
	struct futex_hash_bucket *hb;
	struct futex_q *this, *next;
	union futex_key key = FUTEX_KEY_INIT;
	DEFINE_WAKE_Q(wake_q);
	int ret;

	if (!bitset)
		return -EINVAL;

	ret = get_futex_key(uaddr, flags, &key, FUTEX_READ);
	if (unlikely(ret != 0))
		return ret;

	if ((flags & FLAGS_STRICT) && !nr_wake)
		return 0;

	hb = futex_hash(&key);

	/* Make sure we really have tasks to wakeup */
	if (!futex_hb_waiters_pending(hb))
		return ret;

	spin_lock(&hb->lock);

	plist_for_each_entry_safe(this, next, &hb->chain, list) {
		if (futex_match (&this->key, &key)) {
			if (this->pi_state || this->rt_waiter) {
				ret = -EINVAL;
				break;
			}

			/* Check if one of the bits is set in both bitsets */
			if (!(this->bitset & bitset))
				continue;

			this->wake(&wake_q, this);
			if (++ret >= nr_wake)
				break;
		}
	}

	spin_unlock(&hb->lock);

	wake_up_q(&wake_q);

	return ret;
}
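The same discipline, marking yourself asleep under the lock and waking under the same lock, is what makes the classic condition-variable pattern in user space immune to lost wakeups. A minimal sketch of the analogous structure in standard C++ (an analogy, not kernel code):

#include <condition_variable>
#include <mutex>

std::mutex              m;   // plays the role of hb->lock
std::condition_variable cv;
bool                    ready = false;

void waiter()
{
    std::unique_lock<std::mutex> lk(m);
    // The predicate is checked under the lock, and cv.wait() releases the
    // lock and sleeps atomically: a waker running between the check and
    // the sleep cannot slip through unnoticed.
    cv.wait(lk, [] { return ready; });
}

void waker()
{
    {
        std::lock_guard<std::mutex> lk(m);
        ready = true;        // state change under the same lock
    }
    cv.notify_one();         // analogous to wake_up_q() after spin_unlock()
}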
outro
In multithreaded programming, concurrency control is complicated enough on its own; add the various flavors of locks on top, and it is all too easy for things to go wrong~