Celery

Celery: a distributed task queue

Introduction and basic usage of Celery

Using Celery in a project

Running multiple workers

Celery periodic tasks

Integrating with Django

Configuring Celery periodic tasks through Django


1. Introduction and Basic Usage of Celery

Celery is a Python-based distributed asynchronous task queue. It makes asynchronous task processing easy, so whenever your business logic needs background work, Celery is worth considering. Examples:

Typical scenarios

1.  Run a batch command on 100 servers. The command takes a long time, but the main program should not block waiting for the result; instead it should get back a task ID (task_id) right away.

The main program can later fetch the execution result by task_id, and is free to do other work while the command runs.

2.  Scheduled jobs, e.g. scan all customer records every day and send a greeting SMS to customers whose birthday it is.

Possible solutions

1.  Spawn a child process from the view logic

If the parent process exits, the child exits with it before finishing its task: does not meet the requirement.

If the parent waits for the child to finish, the parent blocks on the result: does not meet the requirement either.

Summary: this approach cannot avoid blocking; someone always has to wait.

2. Launch a subprocess and hand the task over to the operating system

This gives you a task_id and asynchrony, so blocking is solved.

Summary: under large-scale, high-concurrency load the main server runs into trouble; concurrency is not solved.

3. Celery

Celery provides multiple worker nodes, which solves the concurrency problem.
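Option 2 above can be sketched with nothing but the standard library: hand a slow job to a pool, return a task_id immediately, and poll for the result later. This is only an illustration of the idea, not Celery code; the in-memory `task_store` and helper names are made up, and everything runs in one process rather than across 100 servers:

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=4)
task_store = {}  # task_id -> Future (hypothetical in-memory store)

def slow_command(host):
    time.sleep(0.1)          # stand-in for a long-running remote command
    return f"{host}: done"

def submit(host):
    """Kick off the job and return a task_id immediately."""
    task_id = str(uuid.uuid4())
    task_store[task_id] = pool.submit(slow_command, host)
    return task_id

def get_result(task_id):
    """Later, fetch the result by task_id (blocks if still running)."""
    return task_store[task_id].result()

tid = submit("host-01")      # main program is free to do other work here
print(get_result(tid))       # -> host-01: done
```

Celery follows the same submit/poll pattern, but the "pool" is a fleet of worker processes behind a message broker, which is what makes it scale.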

 

What Celery is

Celery is a Python-based distributed asynchronous task queue that makes asynchronous task processing easy.

To run tasks, Celery needs a message broker to receive and deliver task messages and a backend to store task results; RabbitMQ or Redis is typically used.

Advantages of Celery

Simple: once you are familiar with its workflow, Celery is easy to configure and use.

Highly available: if a task fails or the connection drops mid-execution, Celery automatically retries.

Fast: a single Celery process can handle millions of tasks per minute.

Flexible: almost every Celery component can be extended or customized.

Basic workflow

The broker in the middle distributes tasks to workers, and the result backend stores execution results.

Installing Celery

Celery's default broker is RabbitMQ; configuring it takes a single line:

broker_url = 'amqp://guest:guest@localhost:5672//'

Redis works as a broker too:

broker_url = 'redis://localhost:6379/0'  # port 6379, database 0

For a remote Redis instance:

broker_url = 'redis://:password@hostname:port/db_number'

To retrieve the result of each task, you also need to configure where results are stored:

app.conf.result_backend = 'redis://localhost:6379/0'

 


Using Celery

Install the celery module:

pip install celery

Create a Celery application to define your task list.

Create a task file named tasks.py:

from celery import Celery

app = Celery('tasks',
             broker='redis://localhost',
             backend='redis://localhost')

@app.task
def add(x, y):
    print("running...", x, y)
    return x + y

Remember to start redis-server (or rabbitmq-server) first.

Start a Celery worker to begin listening for and executing tasks:

celery -A tasks worker --loglevel=info

Call the task

Open another terminal and start an interactive Python session to call the task:

$ python3
>>> from tasks import add
>>> add.delay(4, 4)

The worker terminal will show that it received a task.

To inspect the task result, assign the return value of delay() to a variable:

>>> result = add.delay(4, 4)

The ready() method returns whether the task has finished processing or not:
>>> result.ready()
True


You can wait for the result to complete, but this is rarely used since it turns the asynchronous call into a synchronous one:
>>> result.get(timeout=1)
8


In case the task raised an exception, get() will re-raise the exception, but you can override this by specifying the propagate argument:
>>> result.get(propagate=False)


If the task raised an exception you can also gain access to the original traceback:
>>> result.traceback
…

  • tasks.add(4,4) ---> runs locally

  • tasks.add.delay(4,4) --> runs on the worker

  • t=tasks.add.delay(4,4) --> t.get() fetches the result, blocking until it is ready

  • t.ready() ---> False: not finished yet; True: finished

  • t.get(propagate=False) returns without re-raising the task's exception, so the caller keeps running

  • t.traceback gives the full traceback of the task's exception

Summary:

1. Import the package: from celery import Celery

2. Instantiate the application: app = Celery(......)

3. Register task functions on that instance with @app.task

4. Division of roles: workers execute tasks, the broker distributes them
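These four steps can be mirrored with a toy, stdlib-only model. This is not Celery and the `ToyCelery`/`ToyTask` names are made up: real Celery serializes the call and sends it to the broker, while this toy `delay()` just runs the function inline; it only shows how the decorator wires a function to an app instance:

```python
class ToyTask:
    """Stand-in for a Celery task: wraps a function and gives it .delay()."""
    def __init__(self, func):
        self.func = func

    def delay(self, *args, **kwargs):
        # real Celery would enqueue a message here; the toy runs inline
        return self.func(*args, **kwargs)

class ToyCelery:
    """Stand-in for the Celery app: keeps a registry of tasks."""
    def __init__(self, name):
        self.name = name
        self.tasks = {}

    def task(self, func):
        # step 3: the decorator registers the function on the app
        t = ToyTask(func)
        self.tasks[func.__name__] = t
        return t

app = ToyCelery('tasks')   # step 2: instantiate the app

@app.task                  # step 3: register the task
def add(x, y):
    return x + y

print(add.delay(4, 4))     # -> 8
print('add' in app.tasks)  # -> True
```

The registry (`app.tasks`) is why the worker, started with `-A tasks`, knows which function to run when a task message arrives.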

 


2. Using Celery in a Project

Celery can be configured as an application of its own.

Directory layout:

proj/__init__.py
    /celery.py
    /tasks.py

Contents of proj/celery.py:

from __future__ import absolute_import, unicode_literals
from celery import Celery

app = Celery('proj',
             broker='redis://localhost',
             backend='redis://localhost',
             include=['proj.tasks'])

# Optional configuration, see the application user guide.
app.conf.update(
    result_expires=3600,
)

if __name__ == '__main__':
    app.start()

Contents of proj/tasks.py:

from __future__ import absolute_import, unicode_literals
from .celery import app
import subprocess
import time

@app.task
def add(x, y):
    print('running...',x,y)
    return x + y

@app.task
def mul(x, y):
    return x * y

@app.task
def xsum(numbers):
    return sum(numbers)

@app.task
def cmd(command):
    print('running cmd....', command)
    time.sleep(5)
    cmd_obj = subprocess.Popen(command, shell=True,
                               stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, _ = cmd_obj.communicate()  # read both pipes to avoid blocking on a full stderr
    return stdout.decode('utf-8')

Start a worker in the foreground:

celery -A proj worker -l info

Call the tasks to verify they work.

Run workers in the background

Use the celery multi command to start one or more workers in the background:

celery multi start w1 -A proj -l info

Restart a worker:

celery multi restart w1 -A proj -l info

Stop a worker:

celery multi stop w1 -A proj -l info

The stop command is asynchronous, so it does not wait for the worker to shut down. Use the stopwait command instead to ensure all currently executing tasks complete before shutdown:

celery multi stopwait w1 -A proj -l info

 


3. Celery Periodic Tasks

Celery supports periodic tasks: set the schedule, and Celery will run the tasks automatically at those times. The scheduling component is called celery beat.

Create a script named periodic_task.py:
from celery import Celery
from celery.schedules import crontab

app = Celery('tasks',
             broker='redis://localhost',
             backend='redis://localhost')

@app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    # Calls test('hello') every 10 seconds.
    sender.add_periodic_task(10.0, test.s('hello'), name='add every 10')

    # Calls test('world') every 30 seconds
    sender.add_periodic_task(30.0, test.s('world'), expires=10)

    # Executes every Monday morning at 7:30 a.m.
    sender.add_periodic_task(
        crontab(hour=7, minute=30, day_of_week=1),
        test.s('Happy Mondays!'),
    )


@app.task
def test(arg):
    print(arg)

add_periodic_task registers one periodic task.

 

The example above adds periodic tasks by calling a function; you can also declare them configuration-style. The following runs tasks.add every 30 seconds:

app.conf.beat_schedule = {
    'add-every-30-seconds': {
        'task': 'tasks.add',
        'schedule': 30.0,
        'args': (16, 16)
    },
}
app.conf.timezone = 'UTC'

 

Once the tasks are defined, celery beat must run as a separate process to dispatch them on schedule. Note: beat only dispatches tasks, it does not execute them. It keeps checking the schedule, and whenever a task is due it sends a task message for a celery worker to execute.

Start the task scheduler, celery beat:

celery -A periodic_task beat

A worker is also needed to execute the tasks that beat dispatches.

Start a celery worker to execute the tasks:

celery -A periodic_task worker

The worker's output will now show the periodic tasks being executed at each interval.

Note: Beat needs to store the last run times of the tasks in a local database file (named celerybeat-schedule by default), so it needs write access to the current directory; alternatively, you can specify a custom location for this file:

$ celery -A periodic_task beat -s /home/celery/var/run/celerybeat-schedule

More complex schedules

The crontab schedule works just like the Linux crontab, letting you customize execution times precisely.

For example, to run tasks.add every Monday morning at 7:30:

from celery.schedules import crontab
 
app.conf.beat_schedule = {
    # Executes every Monday morning at 7:30 a.m.
    'add-every-monday-morning': {
        'task': 'tasks.add',
        'schedule': crontab(hour=7, minute=30, day_of_week=1),
        'args': (16, 16),
    },
}

More scheduling examples:

| Example | Meaning |
| --- | --- |
| crontab() | Execute every minute. |
| crontab(minute=0, hour=0) | Execute daily at midnight. |
| crontab(minute=0, hour='*/3') | Execute every three hours: midnight, 3am, 6am, 9am, noon, 3pm, 6pm, 9pm. |
| crontab(minute=0, hour='0,3,6,9,12,15,18,21') | Same as previous. |
| crontab(minute='*/15') | Execute every 15 minutes. |
| crontab(day_of_week='sunday') | Execute every minute (!) on Sundays. |
| crontab(minute='*', hour='*', day_of_week='sun') | Same as previous. |
| crontab(minute='*/10', hour='3,17,22', day_of_week='thu,fri') | Execute every ten minutes, but only between 3-4 am, 5-6 pm, and 10-11 pm on Thursdays or Fridays. |
| crontab(minute=0, hour='*/2,*/3') | Execute every even hour, and every hour divisible by three. That is, every hour except 1am, 5am, 7am, 11am, 1pm, 5pm, 7pm, 11pm. |
| crontab(minute=0, hour='*/5') | Execute every hour divisible by 5. Note this triggers at 3pm, not 5pm (3pm is hour 15 on the 24-hour clock, and 15 is divisible by 5). |
| crontab(minute=0, hour='*/3,8-17') | Execute every hour divisible by 3, and every hour during office hours (8am-5pm). |
| crontab(0, 0, day_of_month='2') | Execute on the second day of every month. |
| crontab(0, 0, day_of_month='2-30/2') | Execute on every even-numbered day. |
| crontab(0, 0, day_of_month='1-7,15-21') | Execute during the first and third weeks of the month. |
| crontab(0, 0, day_of_month='11', month_of_year='5') | Execute on the eleventh of May every year. |
| crontab(0, 0, month_of_year='*/3') | Execute on the first month of every quarter. |
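To check entries like hour='*/2,*/3' by hand, here is a small stdlib helper. It is not part of Celery (the `expand_hours` name is made up); it just expands a crontab hour field into the set of matching hours:

```python
def expand_hours(expr, lo=0, hi=23):
    """Expand a crontab hour field like '*/3,8-17' into a sorted list of hours."""
    hours = set()
    for part in expr.split(','):
        step = 1
        if '/' in part:                      # e.g. '*/3' or '2-30/2'
            part, step_str = part.split('/')
            step = int(step_str)
        if part == '*':
            start, end = lo, hi
        elif '-' in part:                    # e.g. '8-17'
            a, b = part.split('-')
            start, end = int(a), int(b)
        else:                                # a single value, e.g. '7'
            start = end = int(part)
        hours.update(range(start, end + 1, step))
    return sorted(hours)

print(expand_hours('*/3'))      # -> [0, 3, 6, 9, 12, 15, 18, 21]
print(expand_hours('*/2,*/3'))  # every hour except 1, 5, 7, 11, 13, 17, 19, 23
```

Expanding '*/2,*/3' this way confirms the table entry above: the union of the even hours and the multiples of three covers everything except 1am, 5am, 7am, 11am, 1pm, 5pm, 7pm, and 11pm.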

http://docs.celeryproject.org/en/latest/userguide/periodic-tasks.html#solar-schedules   


4. Integrating with Django

Django can work with Celery to run asynchronous tasks; only a little configuration is needed.

django_celery/django_celery/celery.py:

from __future__ import absolute_import, unicode_literals
import os
from celery import Celery
 
# set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'django_celery.settings')  # !!!!!
 
app = Celery('django_celery')   #!!!!!!
 
# Using a string here means the worker don't have to serialize
# the configuration object to child processes.
# - namespace='CELERY' means all celery-related configuration keys
#   should have a `CELERY_` prefix.
app.config_from_object('django.conf:settings', namespace='CELERY')
 
# Load task modules from all registered Django app configs.
app.autodiscover_tasks()
 
@app.task(bind=True)
def debug_task(self):
    print('Request: {0!r}'.format(self.request))

Let's break down what happens in the first module. First we import absolute imports from the future, so that our celery.py module won't clash with the library:
#####
from __future__ import absolute_import


Then we set the default DJANGO_SETTINGS_MODULE environment variable for the celery command-line program:
#####
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'proj.settings')


You don’t need this line, but it saves you from always passing in the settings module to the celery program. It must always come before creating the app instances, as is what we do next:
######
app = Celery('proj')

This is our instance of the library.



We also add the Django settings module as a configuration source for Celery. This means that you don’t have to use multiple configuration files, and instead configure Celery directly from the Django settings; but you can also separate them if wanted.

The uppercase name-space means that all Celery configuration options must be specified in uppercase instead of lowercase, and start with CELERY_, so for example the task_always_eager setting becomes CELERY_TASK_ALWAYS_EAGER, and the broker_url setting becomes CELERY_BROKER_URL.

You can pass the object directly here, but using a string is better since then the worker doesn’t have to serialize the object.
#####
app.config_from_object('django.conf:settings', namespace='CELERY')
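The renaming rule is mechanical: uppercase the option name and prefix it with the namespace. A quick sketch of the rule (this is not Celery's internal code, and the `to_django_setting` name is made up):

```python
def to_django_setting(option, namespace='CELERY'):
    """Map a lowercase Celery option to its namespaced Django settings name."""
    return f"{namespace}_{option.upper()}"

print(to_django_setting('broker_url'))         # -> CELERY_BROKER_URL
print(to_django_setting('task_always_eager'))  # -> CELERY_TASK_ALWAYS_EAGER
```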


Next, a common practice for reusable apps is to define all tasks in a separate tasks.py module, and Celery does have a way to auto-discover these modules:
#####
app.autodiscover_tasks()


With the line above Celery will automatically discover tasks from all of your installed apps, following the tasks.py convention:
######
- app1/
    - tasks.py
    - models.py
- app2/
    - tasks.py
    - models.py

Finally, the debug_task example is a task that dumps its own request information. This uses the bind=True task option introduced in Celery 3.1 to easily refer to the current task instance.

Copy the following into django_celery/django_celery/__init__.py. This makes sure the app is always imported when Django starts, so that the @shared_task decorator will use it:

from __future__ import absolute_import, unicode_literals
 
# This will make sure the app is always imported when
# Django starts so that shared_task will use this app.
from .celery import app as celery_app
 
__all__ = ['celery_app']

Then configure the broker and result backend in settings.py:

# For celery
CELERY_BROKER_URL = 'redis://localhost'
CELERY_RESULT_BACKEND = 'redis://localhost'

Write tasks in each app's own tasks.py:

# Create your tasks here
from __future__ import absolute_import, unicode_literals
from celery import shared_task
 
@shared_task
def add(x, y):
    return x + y
 
@shared_task
def mul(x, y):
    return x * y
 
@shared_task
def xsum(numbers):
    return sum(numbers)

Call the celery tasks from Django views:

from django.shortcuts import render, HttpResponse
from celery.result import AsyncResult

# Create your views here.

from app01 import tasks

def task_test(request):
    res = tasks.add.delay(228, 24)
    print("start running task")
    print(res.task_id)
    # note: res.get() blocks until the task finishes, which defeats the
    # purpose of async in a real view; it is used here only for demonstration
    print("async task res", res.get())

    result = AsyncResult(id=res.task_id)
    print(result.get())

    return HttpResponse('res %s' % res.get())

Start the worker:

celery -A django_celery worker -l info

 


5. Periodic Tasks in Django

1.  Install the plugin:

pip3 install django-celery-beat

2.  Update settings.py:

INSTALLED_APPS = [
    ....
    'django_celery_beat',
]

3. Run the database migrations:

python manage.py migrate

4.  Start celery beat with the Django scheduler:

celery -A django_celery beat -l info -S django

Periodic tasks are stored in the database; beat reads them from there and dispatches due tasks onto the queue.

5.  Manage the tasks in the Django admin.

 

 

 

Start celery beat and a worker, and you will see that every 2 seconds beat dispatches a task message for the worker to execute.

Note: in my testing, celery beat has to be restarted every time a task is added or modified; otherwise the beat process does not pick up the new configuration.

  

http://python.jobbole.com/87238/

http://docs.celeryproject.org/en/latest/index.html

 

posted @ 2018-05-23 11:23  Charonnnnn