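Original article: getting-started-with-kestrel-from-a-php-application
We have been using Twitter's kestrel queue server in our Python server layer for a while, and it is still working well. We now have some requirements that call for queueing in our PHP application layer too, so I spent several days on it and got message queue support into our web application this week. Here is what I learned along the way and how I implemented it.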
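Goals
Deploying and running kestrel itself is very simple. The only thing worth pointing out is that I recommend using a branch version, because master was very unstable when I tried it. As for the client implementation, I had a few initial goals in mind:
1. Since kestrel speaks the memcache protocol, use an existing memcache client if possible rather than building a new one.
2. Make full use of our existing infrastructure, which I described in an earlier post, and make sure our multi-tenant requirements are met.
3. Keep the queue interface stable even if we change queue servers later.
4. Use the existing kestrel management tools, and build only the features we need.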
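Based on these goals, I ended up with four pieces: a kestrel client, a producer, a consumer, and a very small command-line tool for running the consumer. Before writing any code, though, I set up kestrel web, a web interface for kestrel written by my colleague Matt Erkkila. Kestrel web lets you view statistics for kestrel queues, manage them, and manually filter and sort queues. With this tool I could easily watch jobs being added to and consumed from my test queue, and flush a queue whenever I needed to.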
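The kestrel client
I couldn't find an existing kestrel client for PHP, so I looked at the two existing memcache extensions: the older memcache, and Andrei Zmievski's memcached, which is based on the libmemcached library. I started with memcache, and it worked fine, but I soon found that I couldn't change its timeout options. Kestrel recommends polling for new jobs with a timeout, and with memcache a poll timeout of one second or more produced timeout errors. The memcached extension doesn't have this problem, so I decided to go with memcached.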
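The first gotcha I ran into was serialization. When you write data to kestrel you can use memcached's serializer, but when you read the data back, kestrel gives no indication that it was serialized, so the extension hands you the raw string. I therefore serialize and unserialize manually in my client, and all is well. One more warning about compression: if you plan to disable it or handle it yourself, be aware that the memcached extension by default compresses any value larger than 100 bytes, and that data will not be decompressed when it is read back from kestrel.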
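To make the serialization point concrete, here is a minimal sketch of doing the encoding on the client side so that nothing depends on flags coming back from the server. The helper names `encodeJobPayload` and `decodeJobPayload` are made up for this illustration and are not part of the original code; the real client shown later does the same thing inline with `json_encode` and `json_decode`.

```php
<?php
// Hypothetical helpers (not from the original post): JSON is applied by the
// caller, so the stored value is always a plain string.

/** Encode a job payload to a string before writing it to the queue. */
function encodeJobPayload($data)
{
    return json_encode($data);
}

/** Decode a string read from the queue; false means "no job available". */
function decodeJobPayload($raw)
{
    if ($raw === false) {
        return false;
    }
    // The second argument makes json_decode return associative arrays
    return json_decode($raw, true);
}

$raw = encodeJobPayload(
    array('jobName' => 'HelloWorld', 'data' => array('foo' => 'bar'))
);
$job = decodeJobPayload($raw);

echo $raw . "\n";                 // the string that would be stored
echo $job['data']['foo'] . "\n";  // bar
```

Because the payload is an ordinary string on the wire, a non-PHP consumer could read the same queue, which PHP's native serialization format would make harder.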
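Another limitation is that you can't use any of kestrel's custom commands. The application layer doesn't need anything fancy, though, so the memcached extension is enough. If we ever need to add monitoring support for kestrel, we may have to rewrite the client from scratch, but for now it covers everything we need.
Once I had settled on memcached, I wrote a thin decorator around it, EC_KestrelClient. It handles instantiating the client, serialization, and the kestrel-specific options on the GET command. It also lets you pass memcached options through. The class looks like this: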
<?php
/**
 * A thin kestrel client that wraps Memcached (libmemcached extension)
 *
 * @author    Bill Shupp <hostmaster@shupp.org>
 * @copyright 2010-2011 Empower Campaigns
 */
class EC_KestrelClient
{
    /**
     * The Memcached instance
     *
     * @var Memcached
     */
    protected $_memcached = null;

    /**
     * The Kestrel server IP
     *
     * @var string
     */
    protected $_host = '127.0.0.1';

    /**
     * The Kestrel server port
     *
     * @var int
     */
    protected $_port = 22133;

    /**
     * Optional options, not currently used
     *
     * @var array
     */
    protected $_options = array();

    /**
     * Sets the host, port, and options to be used
     *
     * @param string $host    The host to use, defaults to 127.0.0.1
     * @param int    $port    The port to use, defaults to 22133
     * @param array  $options Memcached options, not currently used
     *
     * @return void
     */
    public function __construct(
        $host = '127.0.0.1', $port = 22133, array $options = array()
    )
    {
        $this->_host = $host;
        $this->_port = $port;
        $this->setOptions($options);
    }

    /**
     * Sets job data on the queue, json_encoding the value to avoid problematic
     * serialization.
     *
     * @param string $queue The queue name
     * @param mixed  $data  The data to store
     *
     * @return bool
     */
    public function set($queue, $data)
    {
        // Local json serialization, as kestrel doesn't send serialization flags
        return $this->getMemcached()->set($queue, json_encode($data));
    }

    /**
     * Reliably read an item off of the queue. Meant to be run in a loop, and
     * call closeReliableRead() when done to make sure the final job is not left
     * on the queue.
     *
     * @param mixed $queue   The queue name to read from
     * @param int   $timeout The timeout to wait for a job to appear
     *
     * @return array|false
     * @see closeReliableRead()
     */
    public function reliableRead($queue, $timeout = 1000)
    {
        $queue  = $queue . '/close/open/t=' . $timeout;
        $result = $this->getMemcached()->get($queue);
        if ($result === false) {
            return $result;
        }
        // Local json serialization, as kestrel doesn't send serialization flags
        return json_decode($result, true);
    }

    /**
     * Closes any existing open read
     *
     * @param string $queue The queue name
     *
     * @return false
     */
    public function closeReliableRead($queue)
    {
        $queue = $queue . '/close';
        return $this->getMemcached()->get($queue);
    }

    /**
     * Aborts an existing reliable read
     *
     * @param string $queue The queue name
     *
     * @return false
     */
    public function abortReliableRead($queue)
    {
        $queue = $queue . '/abort';
        return $this->getMemcached()->get($queue);
    }

    /**
     * Set an option to be used with the Memcached client. Not used.
     *
     * @param string $name  The option name
     * @param mixed  $value The option value
     *
     * @return void
     */
    public function setOption($name, $value)
    {
        $this->_options[$name] = $value;
    }

    /**
     * Sets multiple options
     *
     * @param array $options Array of key/values to set
     *
     * @return void
     */
    public function setOptions(array $options)
    {
        foreach ($options as $name => $value) {
            $this->setOption($name, $value);
        }
    }

    /**
     * Gets a current option's value
     *
     * @param string $name The option name
     *
     * @return mixed
     */
    public function getOption($name)
    {
        if (isset($this->_options[$name])) {
            return $this->_options[$name];
        }
        return null;
    }

    /**
     * Gets all current options
     *
     * @return array
     */
    public function getOptions()
    {
        return $this->_options;
    }

    /**
     * Gets a singleton instance of the Memcached client
     *
     * @return Memcached
     */
    public function getMemcached()
    {
        if ($this->_memcached === null) {
            $this->_initMemcached();
        }
        return $this->_memcached;
    }

    /**
     * Initializes the Memcached client instance
     *
     * @return void
     */
    protected function _initMemcached()
    {
        $this->_memcached = $this->_getMemcachedInstance();
        foreach ($this->_options as $option => $value) {
            $this->_memcached->setOption($option, $value);
        }
        $this->_memcached->addServer($this->_host, $this->_port);
        // Compression is always disabled, since reads would not be decompressed
        $this->_memcached->setOption(Memcached::OPT_COMPRESSION, false);
    }

    // @codeCoverageIgnoreStart
    /**
     * Returns a new instance of Memcached. Abstracted for testing.
     *
     * @return Memcached
     */
    protected function _getMemcachedInstance()
    {
        return new Memcached();
    }
    // @codeCoverageIgnoreEnd
}
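The reliable-read methods in the class above all work by appending kestrel options to the queue name before issuing a memcache GET. As a rough sketch of just that key-building step, the hypothetical function below (`kestrelKey` is an illustrative name, not something in the original class) assembles the same strings; the timeout is in milliseconds.

```php
<?php
// Hypothetical helper, for illustration only: builds the key strings that
// EC_KestrelClient passes to Memcached::get().
function kestrelKey($queue, $action, $timeoutMs = null)
{
    $suffixes = array(
        'read'  => '/close/open', // ack the previous item, open the next one
        'close' => '/close',      // ack the currently open item
        'abort' => '/abort',      // give the open item back to the queue
    );
    if (!isset($suffixes[$action])) {
        throw new InvalidArgumentException('Unknown action: ' . $action);
    }

    $key = $queue . $suffixes[$action];
    if ($action === 'read' && $timeoutMs !== null) {
        // Ask the server to wait up to this many milliseconds for an item
        $key .= '/t=' . (int) $timeoutMs;
    }
    return $key;
}

echo kestrelKey('enterprise_hello_world', 'read', 500) . "\n";
echo kestrelKey('enterprise_hello_world', 'abort') . "\n";
```

Combining `/close` and `/open` in a single GET is what lets a polling loop acknowledge the previous job and fetch the next one in one round trip, which is why a final explicit close is still needed after the loop ends.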
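The producer
The producer is very simple. It just builds a formatted data structure containing the current instance (tenant) information and the job name, and namespaces the queue so that newly added jobs don't collide with queues belonging to other projects. The class looks like this: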
<?php
/**
 * Interface for adding jobs to a queue server
 *
 * @author    Bill Shupp <hostmaster@shupp.org>
 * @copyright 2010-2011 Empower Campaigns
 */
class EC_Producer
{
    /**
     * Adds a job onto a queue
     *
     * @param string $queue   The queue name to add a job to
     * @param string $jobName The job name for the consumer to run
     * @param mixed  $data    Optional additional data to pass to the job
     *
     * @return bool
     */
    public function addJob($queue, $jobName, $data = null)
    {
        $item = array(
            'instance' => EC::getCurrentInstanceName(),
            'jobName'  => $jobName
        );
        if ($data !== null) {
            $item['data'] = $data;
        }

        // Namespace queue with project
        $queue = 'enterprise_' . $queue;

        $client = $this->_getKestrelClient();
        return $client->set($queue, $item);
    }

    // @codeCoverageIgnoreStart
    /**
     * Gets a single instance of EC_KestrelClient. Abstracted for testing.
     *
     * @return EC_KestrelClient
     */
    protected function _getKestrelClient()
    {
        if (APPLICATION_ENV === 'testing') {
            throw new Exception(__METHOD__ . ' was not mocked when testing');
        }
        static $client = null;
        if ($client === null) {
            $host   = EC::getConfigOption('kestrel.host');
            $port   = EC::getConfigOption('kestrel.port');
            $client = new EC_KestrelClient($host, $port);
        }
        return $client;
    }
    // @codeCoverageIgnoreEnd
}
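To show what actually ends up on the queue, here is a small standalone sketch of the job envelope and queue namespacing performed by `addJob()`. The functions `buildJobEnvelope` and `namespacedQueue` are invented for this example, and the instance name `dev` is simply a sample value standing in for whatever `EC::getCurrentInstanceName()` returns.

```php
<?php
// Hypothetical standalone versions of the two steps inside addJob().

/** Build the array that is JSON-encoded and stored on the queue. */
function buildJobEnvelope($instance, $jobName, $data = null)
{
    $item = array(
        'instance' => $instance,
        'jobName'  => $jobName,
    );
    if ($data !== null) {
        $item['data'] = $data; // only present when the caller passed data
    }
    return $item;
}

/** Prefix the queue name with the project name to avoid collisions. */
function namespacedQueue($project, $queue)
{
    return $project . '_' . $queue;
}

$item = buildJobEnvelope('dev', 'HelloWorld', array('foo' => 'bar'));

echo namespacedQueue('enterprise', 'hello_world') . "\n";
echo json_encode($item) . "\n";
```

The consumer relies on the `instance` and `jobName` keys being present and treats `data` as optional, so the envelope shape is effectively the contract between the two sides.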
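The consumer
The consumer is slightly bigger than the producer, but still very simple. The command-line tool mentioned earlier is designed to be run under a supervisor such as daemontools or supervisord, so it is a very thin script that just hands its command-line arguments to EC_Consumer. After parsing those arguments, the consumer polls kestrel for new jobs and runs each one through our existing batch job infrastructure. Until we are more confident about PHP's ability to run as a long-lived process, I added an optional parameter that makes the consumer stop and exit after X jobs; the supervisor then starts it again a few seconds later. I also added a debug option for testing, so you can watch each action as it happens. The command-line tool looks like this: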
#!/bin/env php
<?php
// External application bootstrapping
require_once __DIR__ . '/cli_init.php';

// Instantiate and run the consumer
$consumer = new EC_Consumer($argv);
$consumer->run();

And the consumer class, EC_Consumer, looks like this:
<?php
/**
 * Enterprise queue consumer interface, called by bin/consumer_cli.php
 *
 * @author    Bill Shupp <hostmaster@shupp.org>
 * @copyright 2010-2011 Empower Campaigns
 */
class EC_Consumer
{
    /**
     * Instance of {@link Zend_Console_Getopt}
     *
     * @var Zend_Console_Getopt
     */
    protected $_opt = null;

    /**
     * Which APPLICATION_ENV to run under (see -e)
     *
     * @var string
     */
    protected $_environment = null;

    /**
     * The kestrel server IP
     *
     * @var string
     */
    protected $_host = null;

    /**
     * The kestrel server port
     *
     * @var int
     */
    protected $_port = null;

    /**
     * The kestrel queue name to connect to
     *
     * @var string
     */
    protected $_queue = null;

    /**
     * Whether we should show debug output
     *
     * @var bool
     */
    protected $_debug = false;

    /**
     * Maximum # of jobs for this process to perform (for memory fail safe)
     *
     * @var int
     */
    protected $_maxJobs = null;

    /**
     * Current job count
     *
     * @var int
     */
    protected $_jobCount = 0;

    /**
     * Parses arguments from the command line and does error handling
     *
     * @param array $argv The $argv from bin/consumer_cli.php
     *
     * @throws Zend_Console_Getopt_Exception on failure
     * @return void
     */
    public function __construct(array $argv)
    {
        try {
            $opt = new Zend_Console_Getopt(
                array(
                    'environment|e=s' => 'environment name (e.g. development)'
                                         . ', required',
                    'server|s=s'      => 'kestrel server, format of host:port'
                                         . ', required',
                    'queue|q=s'       => 'queue name (e.g. crawler_campaign)'
                                         . ', required',
                    'max-jobs|m=s'    => 'max jobs to run before exiting'
                                         . ', optional',
                    'debug|d'         => 'show debug output'
                                         . ', optional',
                )
            );
            $opt->setArguments($argv);
            $opt->parse();

            // Set environment
            if ($opt->e === null) {
                throw new Zend_Console_Getopt_Exception(
                    'Error: missing environment'
                );
            }
            $this->_environment = $opt->e;
            // @codeCoverageIgnoreStart
            if (!defined('APPLICATION_ENV')) {
                define('APPLICATION_ENV', $this->_environment);
            }
            // @codeCoverageIgnoreEnd

            // Set server
            if ($opt->s === null) {
                throw new Zend_Console_Getopt_Exception(
                    'Error: missing server'
                );
            }
            $parts = explode(':', $opt->s);
            if (count($parts) !== 2) {
                throw new Zend_Console_Getopt_Exception(
                    'Error: invalid server: ' . $opt->s
                );
            }
            $this->_host = $parts[0];
            // Option values arrive as strings; store the port as an int
            $this->_port = (int) $parts[1];

            // Set queue
            if ($opt->q === null) {
                throw new Zend_Console_Getopt_Exception(
                    'Error: missing queue'
                );
            }
            $this->_queue = $opt->q;

            // Set max-jobs (cast so the job limit is compared as an int)
            if ($opt->m !== null) {
                $this->_maxJobs = (int) $opt->m;
            }

            // Set debug
            if ($opt->d !== null) {
                $this->_debug = true;
            }
        } catch (Zend_Console_Getopt_Exception $e) {
            echo "\n" . $e->getMessage() . "\n\n";
            echo $opt->getUsageMessage();
            // @codeCoverageIgnoreStart
            if (!defined('APPLICATION_ENV') || APPLICATION_ENV !== 'testing') {
                exit(1);
            }
            // @codeCoverageIgnoreEnd
        }

        $this->_opt = $opt;
    }

    /**
     * Polls the queue server for jobs and runs them as they come in
     *
     * @return void
     */
    public function run()
    {
        $client = $this->_getKestrelClient();
        $queue  = 'enterprise_' . $this->_queue;

        while ($this->_keepRunning()) {
            // Pull job from queue
            $job = $client->reliableRead($queue, 500);
            if ($job === false) {
                $this->_debug('Nothing on queue ' . $queue);
                continue;
            }

            if (!isset($job['instance'])) {
                echo 'Instance not set in queue job: ' . print_r($job, true);
                continue;
            }
            $instance = $job['instance'];

            if (!isset($job['jobName'])) {
                echo 'Job name not set in queue job: ' . print_r($job, true);
                continue;
            }
            $jobName = $job['jobName'];

            $data = null;
            if (isset($job['data'])) {
                $data = $job['data'];
            }

            // Run the job
            $returnCode = $this->runJob($instance, $jobName, $data);
            if ($returnCode !== 0) {
                $client->abortReliableRead($queue);
                continue;
            }
        }
        $client->closeReliableRead($queue);
    }

    /**
     * Runs the job via bin/ecli.php
     *
     * @param string $instance The instance name to run the job under
     * @param string $jobName  The job name
     * @param mixed  $data     Optional extra data
     *
     * @return int
     */
    public function runJob($instance, $jobName, $data)
    {
        // Values read from the queue are escaped before reaching the shell
        $cmd = BASE_PATH . '/bin/ecli.php '
            . '-e ' . escapeshellarg($this->_environment)
            . ' -i ' . escapeshellarg($instance)
            . ' -j ' . escapeshellarg($jobName);
        if ($data) {
            $cmd .= ' ' . escapeshellarg(base64_encode(json_encode($data)));
        }
        $returnCode = $this->_passthru($cmd);
        $this->_jobCount++;
        $this->_debug('Job count: ' . $this->_jobCount);

        return $returnCode;
    }

    /**
     * Check to see if the job limit has been reached
     *
     * @return bool
     */
    protected function _keepRunning()
    {
        return ($this->_maxJobs === null) ? true
               : ($this->_jobCount < $this->_maxJobs);
    }

    /**
     * Show debug messages
     *
     * @param mixed $message
     *
     * @return void
     */
    protected function _debug($message)
    {
        if (!$this->_debug) {
            return;
        }
        echo $message . "\n";
    }

    // @codeCoverageIgnoreStart
    /**
     * Calls the passthru() function and returns the exit code. Abstracted
     * for testing.
     *
     * @param string $cmd The command to execute
     *
     * @return int
     */
    protected function _passthru($cmd)
    {
        passthru($cmd, $returnCode);
        return $returnCode;
    }

    /**
     * Gets a single instance of EC_KestrelClient. Abstracted for testing.
     *
     * @return EC_KestrelClient
     */
    protected function _getKestrelClient()
    {
        if (APPLICATION_ENV === 'testing') {
            throw new Exception(__METHOD__ . ' was not mocked when testing');
        }
        return new EC_KestrelClient($this->_host, $this->_port);
    }
    // @codeCoverageIgnoreEnd
}
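The interplay between reliable reads, aborts, and the job limit in `run()` is easier to see without a server involved. The sketch below replaces kestrel with a tiny in-memory stand-in; `FakeReliableQueue` and `consume` are hypothetical names written for this illustration, and the fake only approximates kestrel's behaviour (an aborted item goes back to the head of the queue). Unlike the real consumer, it stops when the queue is empty instead of continuing to poll.

```php
<?php
// In-memory stand-in for the kestrel client (hypothetical, illustration only).
class FakeReliableQueue
{
    private $items;
    private $open = null; // item currently checked out by a reliable read

    public function __construct(array $items)
    {
        $this->items = $items;
    }

    // Mimics "/close/open": ack the open item, then check out the next one.
    public function reliableRead()
    {
        $this->open = null;
        if (count($this->items) === 0) {
            return false;
        }
        $this->open = array_shift($this->items);
        return $this->open;
    }

    // Mimics "/abort": put the open item back at the head of the queue.
    public function abortReliableRead()
    {
        if ($this->open !== null) {
            array_unshift($this->items, $this->open);
            $this->open = null;
        }
    }

    // Mimics "/close": ack the open item without reading another.
    public function closeReliableRead()
    {
        $this->open = null;
    }
}

// Runs jobs until $maxJobs attempts have been made or the queue is empty.
function consume(FakeReliableQueue $queue, $handler, $maxJobs)
{
    $attempts = 0;
    $done     = array();
    while ($attempts < $maxJobs) {
        $job = $queue->reliableRead();
        if ($job === false) {
            break; // the real consumer would keep polling here
        }
        $attempts++;
        if ($handler($job) !== 0) {
            $queue->abortReliableRead(); // failed: leave it for a retry
            continue;
        }
        $done[] = $job;
    }
    $queue->closeReliableRead(); // make sure the last job is acknowledged
    return $done;
}

// Handler that fails the first time it sees job "b", then succeeds.
$seen    = array();
$handler = function ($job) use (&$seen) {
    $seen[$job] = isset($seen[$job]) ? $seen[$job] + 1 : 1;
    return ($job === 'b' && $seen[$job] === 1) ? 1 : 0;
};

$done = consume(new FakeReliableQueue(array('a', 'b', 'c')), $handler, 10);
echo implode(',', $done) . "\n"; // a,b,c
```

Note that a failed attempt still counts toward the limit, mirroring how `runJob()` increments the job count regardless of the return code.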
Putting it together
Now let's put the pieces above together and see how they work. Adding a sample "Hello World" job to the "hello_world" queue from our application looks like this:
<?php
$producer = new EC_Producer();
$producer->addJob('hello_world', 'HelloWorld', array('foo' => 'bar'));
?>
Finally, here is sample output from the command-line tool with debug output enabled (-d) and a limit of two jobs (-m 2):
./bin/consumer_cli.php -e development -s 127.0.0.1:22133 -q hello_world -d -m 2
Nothing on queue enterprise_hello_world
Nothing on queue enterprise_hello_world
Nothing on queue enterprise_hello_world
Nothing on queue enterprise_hello_world
Running EC_Job_HelloWorld on instance dev under environment development
Hello, world! Here is my data array:
stdClass Object
(
[foo] => bar
)
And here are my args: ./bin/ecli.php eyJmb28iOiJiYXIifQ==
Completed job in 0 seconds.
Job count: 1
Nothing on queue enterprise_hello_world
Nothing on queue enterprise_hello_world
Nothing on queue enterprise_hello_world
Nothing on queue enterprise_hello_world
Running EC_Job_HelloWorld on instance dev under environment development
Hello, world! Here is my data array:
stdClass Object
(
[foo] => bar
)
And here are my args: ./bin/ecli.php eyJmb28iOiJiYXIifQ==
Completed job in 0 seconds.
Job count: 2
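The opaque argument in that output is just the job data after `json_encode` and `base64_encode`. As a small sketch of the receiving side, assuming the job script gets the payload as a single command-line argument (the job code itself is not shown in this post), decoding it reproduces the object printed above:

```php
<?php
// The argument shown in the sample output above
$arg = 'eyJmb28iOiJiYXIifQ==';

// Reverse the two encoding steps applied by the consumer. Without the
// second argument, json_decode returns a stdClass object, not an array.
$data = json_decode(base64_decode($arg));

print_r($data);
```

Base64 keeps the JSON free of quotes and spaces, so it survives being passed through the shell as one argument.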
That's my implementation. I'd love to hear how other people are using kestrel from PHP.
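Translator's note: my English is shaky, so parts of the translation above are guesswork.
BTW, I'm not sure I translated "producer" and "consumer" correctly. Corrections are welcome.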