A Brief Look at Data Isolation Under Concurrency in Python and Golang

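🍎 Data isolation in Golang goroutines

In Golang, every goroutine has its own stack and every function call gets its own set of local variables, so local variables inside a goroutine are isolated from other goroutines. Data that several goroutines can reach through a package-level variable, pointer, slice, or map is shared. The details follow.

✅ Goroutine-local variables are isolated

Isolation: each call to a function creates a fresh set of local variables, normally on the stack of the goroutine that runs it, so goroutines do not affect each other as long as they do not hand out references to those variables.

Example 1: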

package main

import (
	"fmt"
	"time"
)

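func worker(id int) {
	counter := 0 // local variable, separate for every goroutine
	counter++
	fmt.Printf("Goroutine-%d: %d\n", id, counter)
}

func main() {
	for i := 0; i < 5; i++ {
		// Arguments are evaluated when the go statement runs,
		// so every goroutine receives its own copy of i.
		go worker(i) // start several goroutines
	}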

	time.Sleep(2 * time.Second)
}

Sample output (the order varies between runs; the goroutines do not interfere with each other):

Goroutine-1: 1
Goroutine-0: 1
Goroutine-3: 1
Goroutine-4: 1
Goroutine-2: 1

Example 2: isolating variables with a goroutine closure

package main

import (
	"fmt"
	"time"
)

func main() {
	for i := 0; i < 5; i++ {
		go func(id int) {
			counter := id * 10
			fmt.Printf("Goroutine-%d: counter=%d\n", id, counter)
		}(i) // pass i as an argument so each goroutine gets its own copy instead of capturing the loop variable
	}

	time.Sleep(time.Second)
}

Sample output (the order varies between runs):

Goroutine-0: counter=0
Goroutine-1: counter=10
Goroutine-2: counter=20
Goroutine-3: counter=30
Goroutine-4: counter=40

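✅ Package-level variables and other shared data are visible to every goroutine

When several goroutines reference the same data (a package-level variable, or anything reached through a pointer, slice, or map), that data is shared. Concurrent modification therefore needs a lock or another synchronization mechanism to avoid data races.
❗️ Keep "escape analysis" in mind in day-to-day code: returning a pointer to a local variable (for example a struct pointer) makes the compiler move that variable to the heap. Escaping alone does not make a value shared; it becomes shared only once more than one goroutine holds a reference to it. The topic is not covered further here.

Example (contains a data race):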

package main

import (
	"fmt"
	"time"
)

var counter = 0

func worker() {
	for i := 0; i < 1000; i++ {
		counter++ // every goroutine shares the same counter
	}
}

func main() {
	for i := 0; i < 50; i++ {
		go worker()
	}

	time.Sleep(2 * time.Second)
	fmt.Println("Final Counter:", counter)
}

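The result is usually wrong because of the data race (the expected value is 50000; running with go run -race reports the race):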

Final Counter: 26711

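🍎 Data isolation between Python threads

As is well known, separate Python processes do not share data, while threads inside one process do. This section looks at what that means in practice.

✅ Local variables inside a thread are isolated

Isolation: every function call has its own frame, and each thread runs its own call stack, so local variables inside a thread's functions are invisible to other threads.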

import threading

def worker(id):
    counter = 0  # local variable, private to this call
    counter += 1
    print(f"Thread-{id}: {counter}")

for i in range(5):
    t = threading.Thread(target=worker, args=(i,))
    t.start()

Output (the data is isolated; the order may vary):

Thread-0: 1
Thread-1: 1
Thread-2: 1
Thread-3: 1
Thread-4: 1
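
Isolation applies to the local names, not to the objects they refer to. As a small sketch (the names `shared` and `results` are made up for this illustration), the following passes the same list to every thread: the local `counter` stays private to each call, while appends to the list are visible to all threads.

```python
import threading

shared = []  # one list object, passed to every thread (hypothetical name)


def worker(thread_id, results):
    counter = 0  # local name: private to this call
    counter += 1
    # `results` is a local name too, but it refers to the shared list object
    results.append((thread_id, counter))


threads = [threading.Thread(target=worker, args=(i, shared)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every thread saw counter == 1, yet all five entries ended up in one list
print(sorted(shared))
```

Passing a mutable object to a thread is therefore already a form of sharing, even though no global variable is involved.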

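✅ Shared data such as global variables (or class attributes) is not isolated

Python threads can read and write global variables and class attributes directly, so a Lock is needed to avoid race conditions.

Example (contains a race condition):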

import threading

counter = 0

def worker():
    global counter
    for _ in range(1000):
        counter += 1  # shared data: this read-modify-write is not atomic

threads = [threading.Thread(target=worker) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("Final Counter:", counter)

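Output (without a lock the total can come out below the expected 50000; the exact number depends on the interpreter version and on thread scheduling, and with a loop this short some CPython versions may not lose any update at all):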

Final Counter: 4783
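
A minimal sketch of the fix, assuming the same 50 threads and 1000 increments as above: a `threading.Lock` (here called `counter_lock`, a name chosen for this example) lets only one thread at a time perform the read-modify-write on `counter`, so no update is lost.

```python
import threading

counter = 0
counter_lock = threading.Lock()  # hypothetical name; guards `counter`


def worker():
    global counter
    for _ in range(1000):
        # Only one thread at a time may run the read-modify-write
        with counter_lock:
            counter += 1


threads = [threading.Thread(target=worker) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("Final Counter:", counter)  # 50000
```

The lock is taken inside the loop here to keep the critical section small; taking it once around the whole loop would also be correct, at the cost of running the threads strictly one after another.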

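🍎 A problem that comes up in practice

❗️ In both Golang and Python, local variables give each goroutine or thread isolated data, but a local variable is only visible inside one function. A real "operation flow" usually calls several functions in sequence. What we want is data that has the same name in every thread, is shared by all the functions called within one thread, and is different in every other thread, even when those threads run exactly the same functions.

❗️❗️ The practical use case is probably obvious: a web server handles HTTP or RPC requests concurrently. For a given API every thread runs the same chain of functions, yet the data of different threads must stay isolated. A traceId is the typical example: each thread has its own traceId, so this unique identifier can be used to track down a problem with a single request.

✅ The Golang approach: context

context is one of the most important concurrency tools in Golang. It is used to manage goroutine lifetimes, cancellation signals, and timeouts, and to pass request-scoped values.

A detailed description of context is out of scope here. The example below shows how a ctx keeps data isolated between goroutines when one flow calls several functions. Put simply: create a separate parent context for each goroutine, then keep passing that context down the call chain.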

package main

import (
	"context"
	"fmt"
	"time"
)

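// ctxKey is a private key type; it keeps context keys from colliding
// with plain string keys used by other packages.
type ctxKey string

const idKey ctxKey = "id"

func son_worker(ctx context.Context, i int) {
	id := ctx.Value(idKey).(int)
	fmt.Printf("son %d: id=%d\n", i, id)
}

func worker(ctx context.Context, i int) {
	id := ctx.Value(idKey).(int)
	fmt.Printf("worker %d: id=%d\n", i, id)
	// Note: pass ctx on to every function further down the call chain.
	son_worker(ctx, i)
}

func main() {
	for i := 0; i < 5; i++ {
		// ❗️ Create a separate parent context for each goroutine.
		ctx := context.WithValue(context.Background(), idKey, i)
		go worker(ctx, i)
	}

	time.Sleep(time.Second * 5)
}

// ❗️❗️❗️ The output shows that the data is isolated between goroutines, and unlike a local variable it is available to every function in the call chain.
/*
Sample output (the order varies between runs):
worker 0: id=0
son 0: id=0
worker 2: id=2
son 2: id=2
worker 3: id=3
worker 4: id=4
son 4: id=4
worker 1: id=1
son 3: id=3
son 1: id=1
*/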

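✅ The Python approach: threading.local

The code below shows that with threading.local every thread runs the same call chain, the trace_id is the same everywhere inside one thread, and the data of different threads is isolated: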

import time
import random
import threading

local_data = threading.local()


def snowflake_key():
    """
    Generate a trace_id from the millisecond timestamp plus four random digits.
    (Unique enough for a demo; this is not a real snowflake ID.)
    """
    sid = int(round(time.time() * 1000))
    return str(sid) + str(random.randint(1000, 9999))


def create_trace_id(trace_id=None):
    """
    Create the trace_id for the current thread.
    """
    local_data.trace_id = trace_id or snowflake_key()
    return local_data.trace_id


def get_trace_id():
    """
    Return the trace_id of the current thread, creating one if needed.
    """
    # An attribute this thread never set raises AttributeError on access,
    # so getattr with a default covers both "missing" and "empty".
    if not getattr(local_data, "trace_id", None):
        create_trace_id(None)
    return local_data.trace_id


def son():
    print(f"son thread: {threading.current_thread().name}, trace_id: {local_data.trace_id}")


def worker():
    local_data.trace_id = get_trace_id()
    print(f"worker thread: {threading.current_thread().name}, trace_id: {local_data.trace_id}")
    # Note: worker and son run in the same thread and read the same local_data,
    # which holds a separate value for each thread
    son()


threads = [threading.Thread(target=worker) for _ in range(5)]

for t in threads:
    t.start()
for t in threads:
    t.join()
"""
worker thread: Thread-1 (worker), trace_id: 17436804212019497
son thread: Thread-1 (worker), trace_id: 17436804212019497
worker thread: Thread-2 (worker), trace_id: 17436804212019903
worker thread: Thread-3 (worker), trace_id: 17436804212025156
son thread: Thread-2 (worker), trace_id: 17436804212019903
worker thread: Thread-4 (worker), trace_id: 17436804212023512
worker thread: Thread-5 (worker), trace_id: 17436804212024771
son thread: Thread-5 (worker), trace_id: 17436804212024771
son thread: Thread-3 (worker), trace_id: 17436804212025156
son thread: Thread-4 (worker), trace_id: 17436804212023512
"""
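
One caveat is worth a sketch: thread pools reuse threads, so a value left in threading.local by one task is still there when the next task runs on the same thread. The example below uses a ThreadPoolExecutor with a single worker to make this deterministic; the function names are made up for the illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

local_data = threading.local()


def first_request():
    local_data.trace_id = "trace-A"
    return local_data.trace_id


def second_request_buggy():
    # Runs on the same pooled thread and never sets its own trace_id,
    # so it still sees the value left behind by first_request
    return getattr(local_data, "trace_id", None)


def second_request_fixed():
    local_data.trace_id = "trace-B"  # always start from a fresh value
    return local_data.trace_id


with ThreadPoolExecutor(max_workers=1) as pool:
    first = pool.submit(first_request).result()
    stale = pool.submit(second_request_buggy).result()
    fresh = pool.submit(second_request_fixed).result()

print(first, stale, fresh)  # trace-A trace-A trace-B
```

Setting the value at the start of each unit of work, or clearing it at the end, avoids picking up stale data from an earlier request.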

✅ Real-world use in a Flask project

import threading

from flask import Flask, request

# Create a Flask application
app = Flask(__name__)

# Create a thread-local storage object
request_local = threading.local()

"""
Note: the trace_id is passed via threading.local (in Golang, context plays this role).
"""


# A simple per-request context: fill it in before the request is handled
@app.before_request
def before_request():
    # At the start of each request, store request-specific information in thread-local storage
    request_local.request_id = request.headers.get('X-Request-ID', 'default-request-id')
    request_local.user_agent = request.headers.get('User-Agent', 'unknown')


@app.after_request
def after_request(response):
    # Clear the thread-local data once the request has been handled,
    # because the worker thread may be reused for the next request
    del request_local.request_id
    del request_local.user_agent
    return response


# A route that reads the data back from thread-local storage
@app.route('/index')
def index():
    # Read the data from thread-local storage
    request_id = getattr(request_local, 'request_id', 'no-request-id')
    user_agent = getattr(request_local, 'user_agent', 'unknown')

    return f"""
    <h1>Request Info</h1>
    <p>Request ID: {request_id}</p>
    <p>User Agent: {user_agent}</p>
    """


# Start the Flask application.
# Try it with curl:
#   curl --location --request GET 'http://127.0.0.1:5000/index' \
#       --header 'X-Request-ID: 666' \
#       --header 'User-Agent: Chrome'
if __name__ == '__main__':
    app.run(debug=True)

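🍎 Other approaches

❗️❗️ Note: the sync.Pool and channel approaches below are a stretch. A Golang concurrency primitive such as a channel is a data structure for communication and data sharing between goroutines, so it is a poor fit for data isolation. In practice this can lead to problems such as memory leaks or running out of memory.

❗️❗️ In Python, the contextvars module is aimed mainly at coroutines (asyncio), where all tasks run in one thread and threading.local cannot tell them apart. A ContextVar also works with plain threads, since every thread runs in its own context, but for purely multithreaded code threading.local is sufficient.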
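
As a quick check of how a ContextVar behaves with plain threads (a sketch; `trace_id`, `seen`, and `worker` are names chosen for this example): every thread runs in its own context, so a value set in one thread is not visible in another thread or in the main thread.

```python
import contextvars
import threading

trace_id = contextvars.ContextVar("trace_id", default="unset")
seen = {}  # thread index -> (value before set, value after set)


def worker(n):
    # Values set by other worker threads are not visible here
    before = trace_id.get()
    trace_id.set(f"trace-{n}")
    seen[n] = (before, trace_id.get())


threads = [threading.Thread(target=worker, args=(n,)) for n in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(seen)
print(trace_id.get())  # the main thread never called set()
```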

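✅ sync.Pool

❗️❗️ This example is admittedly a bit of a stretch. sync.Pool was designed to reduce GC pressure by letting many goroutines reuse temporary objects (buffers, for example) from one common place. It is not a per-goroutine store: Get may return an object that another goroutine put back earlier, and the runtime may drop pooled objects at any time without notice. It is therefore a poor home for state that must survive or for resources that need an explicit close, such as connections, and pooling objects that have grown very large can keep memory in use longer than expected.

How it works: sync.Pool is an object pool that is safe for concurrent use by multiple goroutines. A goroutine that calls Get has exclusive use of the returned object until it calls Put, but the object may be a recycled one, so it should be reset before use. The pool suits frequently used temporary objects, where reuse avoids repeated allocation and collection.

➤ Example 1: a counter for each goroutine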

package main

import (
	"fmt"
	"sync"
	"time"
)

var counterPool = &sync.Pool{
	New: func() interface{} {
		counter := 0
		return &counter
	},
}

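func worker(id int) {
	counter := counterPool.Get().(*int)
	*counter = 0 // reset: Get may return an object that another goroutine put back
	*counter += id
	fmt.Printf("Goroutine-%d: counter=%d\n", id, *counter)
	counterPool.Put(counter) // return the object to the pool
}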

func main() {
	for i := 0; i < 5; i++ {
		go worker(i)
	}

	time.Sleep(time.Second)
}

✅ Sample output (the order varies between runs; each goroutine has exclusive use of its object between Get and Put):

Goroutine-0: counter=0
Goroutine-1: counter=1
Goroutine-2: counter=2
Goroutine-3: counter=3
Goroutine-4: counter=4

➤ Example 2: a buffer for each goroutine

package main

import (
	"fmt"
	"sync"
	"time"
)

var bufferPool = &sync.Pool{
	New: func() interface{} {
		return make([]byte, 10) // called when the pool has no buffer to hand out
	},
}

func worker(id int) {
	buf := bufferPool.Get().([]byte)
	defer bufferPool.Put(buf) // return the buffer to the pool

	buf[0] = byte(id) // this goroutine has exclusive use of buf until Put
	fmt.Printf("Goroutine-%d: buf=%v\n", id, buf[:1])
}

func main() {
	for i := 0; i < 5; i++ {
		go worker(i)
	}

	time.Sleep(time.Second)
}

✅ channel

➤ Example: a separate channel for each goroutine

package main

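import (
	"fmt"
	"sync"
)

func worker(id int, ch chan int, wg *sync.WaitGroup) {
	defer wg.Done()
	for data := range ch {
		fmt.Printf("Goroutine-%d: %d\n", id, data)
	}
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		// Each goroutine gets its own channel
		ch := make(chan int)
		wg.Add(1)
		go worker(i, ch, &wg)
		ch <- i * 10
		close(ch)
	}
	wg.Wait() // without this, main could exit before the last goroutine prints
}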

✅ Output (each goroutine has its own channel; the lines may appear in a different order):

Goroutine-0: 0
Goroutine-1: 10
Goroutine-2: 20

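✅ With asyncio, Python also offers contextvars

In Python, contextvars is the standard-library module for managing context variables. It plays a role similar to context in Go: it lets you store context-related data and have it follow the current asynchronous task or thread. The core concepts and usage are as follows:

1. Core features of ContextVar

Context isolation: each task or coroutine can hold its own value of a context variable.
Thread safety: context variables are safe to use in both multithreaded and asynchronous code.
Dynamic scope: the value of a context variable follows the call chain without being passed as an argument.

2. Creating and using a ContextVar

contextvars.ContextVar(name, default=...): creates a context variable; the default is optional.
var.set(value): sets the value in the current context and returns a token that var.reset(token) accepts to restore the previous value.
var.get(default): returns the value of the context variable; if it is not set, returns the given default, and raises LookupError when no default is available.
contextvars.copy_context(): returns a copy of the current context.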
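
The calls listed above can be sketched as follows (the variable and function names are illustrative). `set` returns a token that `reset` uses to restore the previous value, and a function run inside a copied context cannot change the caller's value.

```python
import contextvars

user = contextvars.ContextVar("user", default="anonymous")

print(user.get())          # anonymous (the default)

token = user.set("alice")  # set() returns a token
print(user.get())          # alice
user.reset(token)          # restore the value from before set()
print(user.get())          # anonymous


def change_user():
    user.set("bob")        # only affects the context it runs in
    return user.get()


ctx = contextvars.copy_context()  # copy of the current context
inside = ctx.run(change_user)     # run change_user inside the copy
print(inside)              # bob
print(user.get())          # anonymous: the caller's context is untouched
print(ctx[user])           # bob: the copy kept the change
```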

3. ContextVar example

A simple example of using contextvars in asynchronous tasks:

import contextvars
import asyncio

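# Create a context variable
request_id = contextvars.ContextVar('request_id')

async def process_request(rid):
    # Set the value; it is only visible inside this task's context
    request_id.set(rid)
    print(f"handling request: {request_id.get()}")
    await asyncio.sleep(1)
    # Still this task's own value, even though other tasks called set() meanwhile
    print(f"request done: {request_id.get()}")

async def main():
    # Start several asynchronous tasks, each with its own request id
    tasks = [asyncio.create_task(process_request(str(12345 + n))) for n in range(3)]
    await asyncio.gather(*tasks)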

asyncio.run(main())

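4. Things to keep in mind with ContextVar

Scope: a value set in one context is visible only in that context and does not affect other contexts.
Performance: contextvars has low overhead and is suitable for frequently executed code.
Compatibility: contextvars was added in Python 3.7, so make sure your Python version supports it.

contextvars is an important tool for managing context data in Python and is particularly well suited to asynchronous programming and concurrent tasks.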
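
To make the difference from threading.local concrete, here is a small sketch (the names are illustrative): two asyncio tasks run in the same thread, so they overwrite each other's threading.local value, while each task keeps its own ContextVar value.

```python
import asyncio
import contextvars
import threading

local_data = threading.local()
request_id = contextvars.ContextVar("request_id")


async def handle(name):
    local_data.request_id = name  # shared by all tasks in this thread
    request_id.set(name)          # private to this task's context
    await asyncio.sleep(0.01)     # let the other task run
    return name, local_data.request_id, request_id.get()


async def main():
    return await asyncio.gather(handle("A"), handle("B"))


results = asyncio.run(main())
for name, from_local, from_ctxvar in results:
    print(f"task {name}: threading.local={from_local}, ContextVar={from_ctxvar}")
```

Task A reads back "B" from threading.local because task B overwrote it while A was waiting; the ContextVar value of each task is unaffected.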

Posted on 2025-03-28 15:32 by 江湖乄夜雨