Celery
Celery: a distributed task queue
Introduction and basic usage of Celery
How to use Celery in a project
Starting multiple workers
Celery periodic tasks
Integrating with Django
Configuring Celery periodic tasks through Django
1. Introduction and Basic Usage of Celery
Celery is a distributed asynchronous message task queue written in Python. It makes asynchronous task processing easy, so whenever your business logic calls for asynchronous tasks, Celery is worth considering. Examples:
Requirement scenarios
1. Run a batch command on 100 machines. The command takes a long time, but the main program should not block waiting for the results; instead it should get back a task ID (task_id) right away. Some time later the main program uses the task_id to fetch the execution result, and while the command is running it is free to do other work.
2. Scheduled tasks, e.g. check all customer records once a day and send a birthday greeting SMS to every customer whose birthday it is.
Possible solutions
1. Start a child process in the view logic
   If the parent process exits, the child exits with it and the task never finishes: does not meet the requirement.
   If the parent waits for the child to finish, it blocks on the result: does not meet the requirement either.
   Summary: this approach cannot avoid blocking; something always has to wait.
2. Start a subprocess and hand the task over to the operating system
   This gives you a task_id and asynchrony, solving the blocking problem.
   Summary: under large-scale high concurrency the main server runs into trouble; this does not solve the concurrency problem.
3. Celery
   Celery provides multiple worker nodes, which solves the concurrency problem.
About Celery
Celery is a distributed asynchronous message queue written in Python that makes asynchronous task processing easy.
To run, Celery needs a message broker to send and receive task messages, and a backend to store task results; RabbitMQ or Redis is typically used.
Advantages of Celery
Simple: once you are familiar with Celery's workflow, it is easy to configure and use.
Highly available: if a task fails or the connection drops mid-execution, Celery automatically retries the task.
Fast: a single Celery process can handle millions of tasks per minute.
Flexible: almost every Celery component can be extended or customized.
Basic Celery workflow

The message queue in the middle both distributes tasks and stores execution results.

Installing Celery
Celery's default broker is RabbitMQ, which takes just one line of configuration:
broker_url = 'amqp://guest:guest@localhost:5672//'
Redis also works as a broker:
broker_url = 'redis://localhost:6379/0'  # port 6379, database 0
To connect to a remote Redis instance, use:
broker_url = 'redis://:password@hostname:port/db_number'
If you want to fetch each task's execution result, you also need to configure where results are stored:
app.conf.result_backend = 'redis://localhost:6379/0'
Using Celery
Install the celery module:
pip install celery
Create a Celery application to define the list of tasks.
Create a task file named tasks.py:

from celery import Celery

app = Celery('tasks',
             broker='redis://localhost',
             backend='redis://localhost')

@app.task
def add(x, y):
    print("running...", x, y)
    return x + y

Remember to start redis-server or rabbitmq-server first.
Start a Celery worker to begin listening for and executing tasks:
celery -A tasks worker --loglevel=info

Calling tasks
Open another terminal, start an interactive session, and call the task:

$ python3
>>> from tasks import add
>>> add.delay(4, 4)

The worker terminal shows that it received a task.

To see the task's result, assign the return value to a variable when calling the task:

>>> result = add.delay(4, 4)

The ready() method returns whether the task has finished processing or not:

>>> result.ready()
True

You can wait for the result to complete, but this is rarely used since it turns the asynchronous call into a synchronous one:

>>> result.get(timeout=1)
8

In case the task raised an exception, get() will re-raise the exception, but you can override this by specifying the propagate argument:

>>> result.get(propagate=False)

If the task raised an exception you can also gain access to the original traceback:

>>> result.traceback

- tasks.add(4, 4) ---> runs locally
- tasks.add.delay(4, 4) --> runs on the worker
- t = tasks.add.delay(4, 4); t.get() fetches the result, blocking until it is available
- t.ready() ---> False: not finished yet; True: finished
- t.get(propagate=False) returns the exception instead of re-raising it, so the program does not stop
- t.traceback shows the full traceback
Summary:
1. Import the package: from celery import Celery
2. Instantiate it: app = Celery(......)
3. Register task functions on the instance with the @app.task decorator
4. Division of roles: workers execute tasks, the broker distributes them
2. How to Use Celery in a Project
Celery can be set up as an application.
The directory layout looks like this:

proj/
    __init__.py
    celery.py
    tasks.py

Contents of proj/celery.py:
from __future__ import absolute_import, unicode_literals
from celery import Celery

app = Celery('proj',
             broker='redis://localhost',
             backend='redis://localhost',
             include=['proj.tasks'])

# Optional configuration, see the application user guide.
app.conf.update(
    result_expires=3600,
)

if __name__ == '__main__':
    app.start()
Contents of proj/tasks.py:
from __future__ import absolute_import, unicode_literals

import subprocess
import time

from .celery import app

@app.task
def add(x, y):
    print('running...', x, y)
    return x + y

@app.task
def mul(x, y):
    return x * y

@app.task
def xsum(numbers):
    return sum(numbers)

@app.task
def cmd(cmd):
    print('running cmd....', cmd)
    time.sleep(5)
    cmd_obj = subprocess.Popen(cmd, shell=True,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    return cmd_obj.stdout.read().decode('utf-8')
Start a worker in the foreground:
celery -A proj worker -l info

Trying out the tasks


Running workers in the background
Use the celery multi command to start one or more workers in the background:
celery multi start w1 -A proj -l info

Restart a worker:
celery multi restart w1 -A proj -l info
Stop a worker:
celery multi stop w1 -A proj -l info

The stop command is asynchronous, so it does not wait for the worker to shut down. Use the stopwait command instead to make sure all currently executing tasks complete before exiting:
celery multi stopwait w1 -A proj -l info
3. Celery Periodic Tasks
Celery supports periodic tasks: set the execution time and Celery runs the task on schedule for you. The scheduling module is called celery beat.
from celery import Celery
from celery.schedules import crontab

app = Celery('tasks',
             broker='redis://localhost',
             backend='redis://localhost')

@app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    # Calls test('hello') every 10 seconds.
    sender.add_periodic_task(10.0, test.s('hello'), name='add every 10')

    # Calls test('world') every 30 seconds.
    sender.add_periodic_task(30.0, test.s('world'), expires=10)

    # Executes every Monday morning at 7:30 a.m.
    sender.add_periodic_task(
        crontab(hour=7, minute=30, day_of_week=1),
        test.s('Happy Mondays!'),
    )

@app.task
def test(arg):
    print(arg)
add_periodic_task adds one periodic task.
The example above adds periodic tasks by calling a function; you can also add them declaratively, configuration-file style. The following task runs every 30 seconds:

app.conf.beat_schedule = {
    'add-every-30-seconds': {
        'task': 'tasks.add',
        'schedule': 30.0,
        'args': (16, 16)
    },
}
app.conf.timezone = 'UTC'

Once the tasks are added, Celery needs a separate process that launches them on schedule. Note: this process launches tasks, it does not execute them. It keeps checking the task schedule, and whenever a task is due it sends a task message to the queue for a Celery worker to execute.
Start the task scheduler, celery beat:
celery -A periodic_task beat

You also need a worker to execute the tasks that celery beat launches.
Start a celery worker to execute the tasks:
celery -A periodic_task worker

The worker output now shows the periodic task being executed at each interval.
Note: Beat needs to store the last run times of the tasks in a local database file (named celerybeat-schedule by default), so it needs access to write in the current directory, or alternatively you can specify a custom location for this file:
$ celery -A periodic_task beat -s /home/celery/var/run/celerybeat-schedule
More complex schedules
The crontab feature works just like the Linux crontab and lets you customize task execution times.
For example, to run the tasks.add task every Monday morning at 7:30:
from celery.schedules import crontab

app.conf.beat_schedule = {
    # Executes every Monday morning at 7:30 a.m.
    'add-every-monday-morning': {
        'task': 'tasks.add',
        'schedule': crontab(hour=7, minute=30, day_of_week=1),
        'args': (16, 16),
    },
}
More scheduling options:
| Example | Meaning |
| --- | --- |
| crontab() | Execute every minute. |
| crontab(minute=0, hour=0) | Execute daily at midnight. |
| crontab(minute=0, hour='*/3') | Execute every three hours: midnight, 3am, 6am, 9am, noon, 3pm, 6pm, 9pm. |
| crontab(minute=0, hour='0,3,6,9,12,15,18,21') | Same as previous. |
| crontab(minute='*/15') | Execute every 15 minutes. |
| crontab(day_of_week='sunday') | Execute every minute (!) at Sundays. |
| crontab(minute='*', hour='*', day_of_week='sun') | Same as previous. |
| crontab(minute='*/10', hour='3,17,22', day_of_week='thu,fri') | Execute every ten minutes, but only between 3-4 am, 5-6 pm, and 10-11 pm on Thursdays or Fridays. |
| crontab(minute=0, hour='*/2,*/3') | Execute every even hour, and every hour divisible by three. This means: at every hour except 1am, 5am, 7am, 11am, 1pm, 5pm, 7pm, 11pm. |
| crontab(minute=0, hour='*/5') | Execute hour divisible by 5. This means that it is triggered at 3pm, not 5pm (since 3pm equals the 24-hour clock value of "15", which is divisible by 5). |
| crontab(minute=0, hour='*/3,8-17') | Execute every hour divisible by 3, and every hour during office hours (8am-5pm). |
| crontab(0, 0, day_of_month='2') | Execute on the second day of every month. |
| crontab(0, 0, day_of_month='2-30/2') | Execute on every even numbered day. |
| crontab(0, 0, day_of_month='1-7,15-21') | Execute on the first and third weeks of the month. |
| crontab(0, 0, day_of_month='11', month_of_year='5') | Execute on the eleventh of May every year. |
| crontab(0, 0, month_of_year='*/3') | Execute on the first month of every quarter. |
http://docs.celeryproject.org/en/latest/userguide/periodic-tasks.html#solar-schedules
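Beyond crontab, the solar schedules documented at the link above fire relative to astronomical events (sunrise, sunset, etc.) at a given location; they require the extra ephem package to be installed. A hedged sketch, with illustrative coordinates and a hypothetical tasks.add task:

```python
from celery import Celery
from celery.schedules import solar

app = Celery('tasks', broker='redis://localhost')

app.conf.beat_schedule = {
    # Run tasks.add at sunset each day; the latitude/longitude here
    # (roughly Melbourne) are illustrative only.
    'add-at-sunset': {
        'task': 'tasks.add',
        'schedule': solar('sunset', -37.81753, 144.96715),
        'args': (16, 16),
    },
}
```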
4. Integrating with Django

Django can be combined with Celery to run asynchronous tasks; only a little configuration is needed.

django_celery/django_celery/celery.py
from __future__ import absolute_import, unicode_literals
import os
from celery import Celery

# Set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'django_celery.settings')

app = Celery('django_celery')

# Using a string here means the worker doesn't have to serialize
# the configuration object to child processes.
# - namespace='CELERY' means all celery-related configuration keys
#   should have a `CELERY_` prefix.
app.config_from_object('django.conf:settings', namespace='CELERY')

# Load task modules from all registered Django app configs.
app.autodiscover_tasks()

@app.task(bind=True)
def debug_task(self):
    print('Request: {0!r}'.format(self.request))
Let's break down what happens in the first module. First we import absolute imports from the future, so that our celery.py module won't clash with the library:

from __future__ import absolute_import

Then we set the default DJANGO_SETTINGS_MODULE environment variable for the celery command-line program:

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'proj.settings')

You don't need this line, but it saves you from always passing in the settings module to the celery program. It must always come before creating the app instance, which is what we do next:

app = Celery('proj')

This is our instance of the library. We also add the Django settings module as a configuration source for Celery. This means that you don't have to use multiple configuration files, and instead configure Celery directly from the Django settings; but you can also separate them if wanted. The uppercase namespace means that all Celery configuration options must be specified in uppercase instead of lowercase, and start with CELERY_, so for example the task_always_eager setting becomes CELERY_TASK_ALWAYS_EAGER, and the broker_url setting becomes CELERY_BROKER_URL. You can pass the object directly here, but using a string is better since then the worker doesn't have to serialize the object.

app.config_from_object('django.conf:settings', namespace='CELERY')

Next, a common practice for reusable apps is to define all tasks in a separate tasks.py module, and Celery does have a way to auto-discover these modules:

app.autodiscover_tasks()

With the line above Celery will automatically discover tasks from all of your installed apps, following the tasks.py convention:

- app1/
    - tasks.py
    - models.py
- app2/
    - tasks.py
    - models.py

Finally, the debug_task example is a task that dumps its own request information. This uses the bind=True task option introduced in Celery 3.1 to easily refer to the current task instance.
Copy the following into django_celery/django_celery/__init__.py. This makes sure the app is imported when Django starts, so that the @shared_task decorator will use it:

from __future__ import absolute_import, unicode_literals

# This will make sure the app is always imported when
# Django starts so that shared_task will use this app.
from .celery import app as celery_app

__all__ = ['celery_app']
Then configure settings.py:

# For celery
CELERY_BROKER_URL = 'redis://localhost'
CELERY_RESULT_BACKEND = 'redis://localhost'
Write the tasks in each concrete app's tasks.py:

# Create your tasks here
from __future__ import absolute_import, unicode_literals
from celery import shared_task

@shared_task
def add(x, y):
    return x + y

@shared_task
def mul(x, y):
    return x * y

@shared_task
def xsum(numbers):
    return sum(numbers)
Call a Celery task from a Django view:

from django.shortcuts import render, HttpResponse
from celery.result import AsyncResult

# Create your views here.
from app01 import tasks

def task_test(request):
    res = tasks.add.delay(228, 24)
    print("start running task")
    print(res.task_id)
    print("async task res", res.get())
    result = AsyncResult(id=res.task_id)
    print(result.get())
    return HttpResponse('res %s' % res.get())

Start the worker:
celery -A django_celery worker -l info

5. Using Periodic Tasks in Django
1. Install the package:
pip3 install django-celery-beat
2. Add it to INSTALLED_APPS:
INSTALLED_APPS = [
    ....
    'django_celery_beat',
]
3. Run the database migrations:
python manage.py migrate
4. Start celery beat:
celery -A django_celery beat -l info -S django
Periodic tasks are stored in the database; beat keeps checking the schedule and puts due tasks onto the queue for workers to execute.
5. Manage the tasks through the Django admin


With celery beat and a worker both running, you will see beat send a task message every 2 seconds for the worker to execute the tasks.
Note: in testing, celery beat has to be restarted every time a task is added or modified; otherwise the beat process will not pick up the new configuration.
