1.0 Block I/O
Because Flask handles each request with blocking I/O, even torch.cuda.empty_cache() cannot release the GPU memory the pipeline holds.
Eventually, I found that the solution is to create a new thread for the SD (Stable Diffusion) pipeline.
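To make the problem concrete, here is a minimal sketch of the failing pattern (the model ID and prompt are placeholders, and this reconstruction is my assumption of the original setup): the pipeline is built directly inside the Flask handler, so its weights are still referenced when torch.cuda.empty_cache() runs and nothing is returned to the system.

import torch
from diffusers import StableDiffusionPipeline
from flask import Flask

app = Flask(__name__)

@app.route("/", methods=["POST"])
def index():
    # the pipeline lives in the request handler itself
    pipe = StableDiffusionPipeline.from_pretrained("YOUR_MODEL_ID", torch_dtype=torch.float16).to('cuda')
    image = pipe("YOUR_PROMPT").images[0]
    torch.cuda.empty_cache()  # no effect: pipe still holds its weights here
    return 'ok'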
2.0 Create a new thread
import threading
import torch
import json
import gc
from diffusers import StableDiffusionPipeline
from flask import Flask

app = Flask(__name__, static_url_path='', static_folder='', template_folder='')

@app.route("/", methods=["POST"])
def index():
    # 1.0 thread: run the SD pipeline in its own thread so the GPU memory
    # it holds can be released after the thread finishes
    class MyThread(threading.Thread):
        def __init__(self, thread_id):
            threading.Thread.__init__(self)
            self.thread_id = thread_id

        # 1.1 pipeline
        def run(self):
            # 1.2 clean the CUDA cache before running the pipeline
            if torch.cuda.is_available():
                gc.collect()
                torch.cuda.empty_cache()
                torch.cuda.ipc_collect()
            model_id, prompt = "YOUR_MODEL_ID", "YOUR_PROMPT"
            pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to('cuda')
            image = pipe(prompt).images[0]
            # save the result so each thread's output is kept (filename is an example)
            image.save(f'output-{self.thread_id}.png')

    # 2.0 create 5 threads
    threads = []
    for i in range(5):
        threads.append(MyThread(i))
        threads[i].start()

    return app.response_class(response=json.dumps({'status': 'success'}),
                              status=200, mimetype='application/json')

if __name__ == '__main__':
    app.debug = True
    app.run(host='0.0.0.0', port=82)
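Once the server is up, the endpoint can be exercised like this (my own usage example; it assumes the server above is running locally on port 82):

import requests

resp = requests.post('http://localhost:82/')
print(resp.json())  # {'status': 'success'}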
PS: This code is simplified from my project and has not been tested.
First, at 1.0, define the pipeline thread class.
Second, at 1.2, clean the CUDA cache before running the pipeline.
Finally, at 2.0, create and start the pipeline threads.
If the pipeline runs in a new thread, the CUDA memory can actually be released by torch.cuda.empty_cache() once that thread exits.
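One way to check that the memory really comes back (this measurement is my own sketch, not part of the original project) is to join the thread and then read torch.cuda.memory_allocated():

import gc
import threading
import torch
from diffusers import StableDiffusionPipeline

def worker():
    pipe = StableDiffusionPipeline.from_pretrained('YOUR_MODEL_ID', torch_dtype=torch.float16).to('cuda')
    pipe('YOUR_PROMPT')
    del pipe  # drop the last reference before the thread exits

t = threading.Thread(target=worker)
t.start()
t.join()

gc.collect()
torch.cuda.empty_cache()  # cached blocks can now be returned to the driver
print(torch.cuda.memory_allocated())  # should drop back close to zero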
3.0 Project Code
https://github.com/kenny-chen/ai.diffusers