Ceph Source Code Notes: OSD Side
(0) Overview of the OSD-side flow

(1) Messenger handling on the OSD side

# ms_pipe_read thread
OSD->ms_fast_dispatch
OSD->dispatch_session_waiting
OSD->dispatch_op_fast
OSD->handle_op
OSD->get_pg_or_queue_for_pg
// Enqueue
OSD->enqueue_op
######################
// Dequeue
OSD->ShardedOpWQ->_process
PGQueueable->run
PGQueueable->RunVis->handle
OSD->dequeue_op
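As a rough illustration of the enqueue/dequeue split above, here is a minimal single-threaded sketch of a sharded queue. It assumes that ops are mapped to a shard by PG id so that ops for one PG keep their order. The names ShardedQueueSketch, QueuedOp, shard_of and process_one are hypothetical, not Ceph identifiers, and the real ShardedOpWQ adds worker threads, locking and priorities that are left out here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a queued PG work item (PGQueueable in the notes).
struct QueuedOp {
    std::uint64_t pg_id;        // PG that the op targets
    std::function<void()> run;  // work that the dequeue side eventually performs
};

// Toy model of a sharded work queue: every op of a given PG lands in the
// same shard, so ops for one PG are processed in arrival order.
class ShardedQueueSketch {
public:
    explicit ShardedQueueSketch(std::size_t num_shards) : shards_(num_shards) {
        assert(num_shards > 0);
    }

    // Assumption: a plain modulo stands in for the real PG-to-shard mapping.
    std::size_t shard_of(std::uint64_t pg_id) const {
        return static_cast<std::size_t>(pg_id % shards_.size());
    }

    // Enqueue side (the role played by enqueue_op in the call chain).
    void enqueue(QueuedOp op) {
        shards_[shard_of(op.pg_id)].push_back(std::move(op));
    }

    // Dequeue side (the role played by _process): pop and run one item of a
    // shard. Returns false when the shard is empty.
    bool process_one(std::size_t shard) {
        std::deque<QueuedOp>& q = shards_.at(shard);
        if (q.empty()) return false;
        QueuedOp op = std::move(q.front());
        q.pop_front();
        op.run();
        return true;
    }

private:
    std::vector<std::deque<QueuedOp>> shards_;
};
```

In this model a worker thread would own one shard and call process_one in a loop; two ops for the same PG can never be reordered because they always share a shard.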
(2) ReplicatedPG processes the request queue

(2-a) ReplicatedPG processes the request queue
ReplicatedPG->do_request
// Branch: the PG on the primary OSD handles the request
CEPH_MSG_OSD_OP
ReplicatedPG->do_op
// Branch: the PG on a replica OSD handles the request
MSG_OSD_SUBOP
ReplicatedPG->do_sub_op
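The two branches above amount to a dispatch on message type. A minimal sketch of that idea follows; MsgKind and route_request are hypothetical names introduced only for illustration (real Ceph switches on integer message type codes such as CEPH_MSG_OSD_OP and MSG_OSD_SUBOP, and do_request handles more cases than the two traced here).

```cpp
#include <string>

// Hypothetical message kinds standing in for Ceph's integer type codes.
enum class MsgKind { ClientOp, ReplicaSubOp, Unknown };

// Returns the name of the handler that the notes associate with each kind.
inline std::string route_request(MsgKind kind) {
    switch (kind) {
    case MsgKind::ClientOp:      // CEPH_MSG_OSD_OP in the notes
        return "do_op";
    case MsgKind::ReplicaSubOp:  // MSG_OSD_SUBOP in the notes
        return "do_sub_op";
    case MsgKind::Unknown:
        break;
    }
    return "unhandled";
}
```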
(2-b) ReplicatedPG branch A: handling on the primary OSD
ReplicatedPG->do_request
// Handled by the PG on the primary OSD
CEPH_MSG_OSD_OP
ReplicatedPG->do_op
ReplicatedPG->execute_ctx
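// Pack the sub-operations held in the context into a transaction; an ordinary read or write has only one sub-operation
// Reads fetch their data inside prepare_transaction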
ReplicatedPG->prepare_transaction
ReplicatedPG->do_osd_ops
// Read path continues
ReplicatedPG->complete_read_ctx
ReplicatedPG->OpContext->start_async_reads
// Decide whether old PG log entries should be trimmed
ReplicatedPG->calc_trim_to
// Write path continues: send messages to the replica OSDs (replicated or EC pools)
ReplicatedPG->issue_repop
ReplicatedBackend->submit_transaction
# Primary OSD -> replica OSDs
ReplicatedBackend->issue_op
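>> ctx->obc->ondisk_write_lock(); // take the lock; it is released once the data has reached disk
>> Context *on_all_commit = new C_OSD_RepopCommit(this, repop); // callback fired once the journal write completes
>> Context *on_all_applied = new C_OSD_RepopApplied(this, repop); // callback fired once the write has been applied to XFS (not yet flushed to disk, but the data is readable)
>> Context *onapplied_sync = new C_OSD_OndiskWriteUnlock(); // synchronous callback that releases the lock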
>> pgbackend->submit_transaction(…, onapplied_sync, on_all_applied, on_all_commit, …);
>> PGBackend->get_parent->send_message_osd_cluster
# The primary OSD submits its own copy to the journal
ReplicatedBackend->get_parent->queue_transactions
// Write path: check whether the replica operations have completed successfully
ReplicatedPG->eval_repop
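The interplay between on_all_applied, on_all_commit and the completion check can be sketched as a small tracker that waits for every OSD in the acting set to report both events. This is a simplified model under stated assumptions, not the Ceph implementation: RepOpSketch, applied, committed and done are hypothetical names, OSDs are identified by plain integers, and locking and messaging are omitted.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <utility>

// Toy model of one in-flight replicated write on the primary.
class RepOpSketch {
public:
    RepOpSketch(std::set<int> osds,
                std::function<void()> on_all_applied,
                std::function<void()> on_all_commit)
        : waiting_for_applied_(osds),
          waiting_for_commit_(std::move(osds)),
          on_all_applied_(std::move(on_all_applied)),
          on_all_commit_(std::move(on_all_commit)) {
        assert(!waiting_for_commit_.empty());
    }

    // An OSD reports that the write is readable (applied to the filesystem).
    void applied(int osd) { waiting_for_applied_.erase(osd); eval(); }

    // An OSD reports that the write is durable (journal write finished).
    void committed(int osd) { waiting_for_commit_.erase(osd); eval(); }

    bool done() const {
        return waiting_for_applied_.empty() && waiting_for_commit_.empty();
    }

private:
    // Plays the role of the completion check: fire each callback exactly
    // once, as soon as its waiting set becomes empty.
    void eval() {
        if (waiting_for_applied_.empty() && on_all_applied_)
            std::exchange(on_all_applied_, nullptr)();
        if (waiting_for_commit_.empty() && on_all_commit_)
            std::exchange(on_all_commit_, nullptr)();
    }

    std::set<int> waiting_for_applied_;
    std::set<int> waiting_for_commit_;
    std::function<void()> on_all_applied_;
    std::function<void()> on_all_commit_;
};
```

Keeping two separate waiting sets mirrors the distinction drawn in the notes: "applied" means the data can be read, "commit" means it is safely journaled, and the two can complete in either order.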
(2-c) ReplicatedPG branch B: message handling on a replica OSD
// Handled by the PG on a replica OSD
MSG_OSD_SUBOP
ReplicatedPG->do_sub_op
ReplicatedPG->sub_op_remove
ReplicatedPG->sub_op_scrub_reserve
ReplicatedPG->sub_op_scrub_stop
ReplicatedPG->sub_op_scrub_map
(2-d) Message handling in ReplicatedBackend
ReplicatedBackend::handle_message
ReplicatedBackend::sub_op_modify
PGBackend->get_parent->queue_transactions
FileStore::queue_transactions
FileStore->build_op
Journal->prepare_entry
JournalingObjectStore->submit_manager.op_submit_start
JournalingObjectStore->_op_journal_transactions
Journal->submit_entry
JournalingObjectStore->submit_manager.op_submit_finish
(3) The OSD submits to the journal
# Replica OSD side submitting to the journal
ReplicatedBackend->handle_message()
ReplicatedBackend->sub_op_modify
ReplicatedBackend->get_parent->queue_transactions
FileStore->queue_transactions
FileStore->build_op
journal->prepare_entry
JournalingObjectStore->SubmitManager->op_submit_start
// Submit the entry to the journal queue and wrap it with a C_JournaledAhead callback object
JournalingObjectStore->_op_journal_transactions
journal->submit_entry
FileJournal->writeq
JournalingObjectStore->SubmitManager->op_submit_finish
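The start / submit / finish bracket above can be sketched as follows. The sketch assumes that op_submit_start hands out a monotonically increasing sequence number and that op_submit_finish checks that ops finish in that same order; SubmitSketch, QueuedEntry and queue_transaction are hypothetical names, and the lock that the real SubmitManager holds between start and finish is modeled by a simple flag.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical journal queue item: sequence number, payload, and the
// callback to run once the entry has been journaled.
struct QueuedEntry {
    std::uint64_t seq;
    std::string bytes;
    std::function<void()> on_journaled;
};

class SubmitSketch {
public:
    // Begin a submission and assign the next sequence number.
    std::uint64_t op_submit_start() {
        assert(!in_submit_);
        in_submit_ = true;
        return ++op_seq_;
    }

    // Push the prepared entry and its callback onto the write queue.
    void submit_entry(std::uint64_t seq, std::string bytes,
                      std::function<void()> on_journaled) {
        assert(in_submit_ && seq == op_seq_);
        writeq_.push_back(
            QueuedEntry{seq, std::move(bytes), std::move(on_journaled)});
    }

    // End the submission; ops must finish in the order they started.
    void op_submit_finish(std::uint64_t seq) {
        assert(in_submit_ && seq == op_submitted_ + 1);
        op_submitted_ = seq;
        in_submit_ = false;
    }

    // The three steps in the order the call chain lists them.
    std::uint64_t queue_transaction(std::string bytes,
                                    std::function<void()> on_journaled) {
        const std::uint64_t seq = op_submit_start();
        submit_entry(seq, std::move(bytes), std::move(on_journaled));
        op_submit_finish(seq);
        return seq;
    }

    const std::deque<QueuedEntry>& writeq() const { return writeq_; }

private:
    std::uint64_t op_seq_ = 0;        // last sequence number handed out
    std::uint64_t op_submitted_ = 0;  // last sequence number fully submitted
    bool in_submit_ = false;          // stand-in for the submit lock
    std::deque<QueuedEntry> writeq_;
};
```

Note that the callback is only stored here, not invoked: in this model it fires later, once a writer has actually persisted the entry.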
(4) FileJournal performs the actual reads and writes
FileJournal->write_finish_thread
FileJournal->write_thread_entry // fetches pending journal entries from the writeq queue
FileJournal->prepare_multi_write // builds the journal entries to write
FileJournal->do_write // performs the actual journal write
FileJournal->queue_completions_thru
FileJournal->completion_peek_front
FileJournal->finisher->queue
FileJournal->finisher_cond.Signal
FileJournal->complete_write
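One pass of the write thread described above (gather a batch from writeq, write it, then hand completions to the finisher) can be sketched like this. It is a single-threaded model under stated assumptions: WriteLoopSketch, PendingWrite, write_once and run_finisher are hypothetical names, the "disk" is an in-memory string, and the batch size limit is an illustrative parameter rather than a real Ceph option.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

struct PendingWrite {
    std::uint64_t seq;              // journal sequence number
    std::string data;               // bytes to write
    std::function<void()> on_done;  // completion callback
};

class WriteLoopSketch {
public:
    explicit WriteLoopSketch(std::size_t max_batch_bytes)
        : max_batch_bytes_(max_batch_bytes) {
        assert(max_batch_bytes > 0);
    }

    void submit(PendingWrite w) { writeq_.push_back(std::move(w)); }

    // One pass of the write thread. Returns the highest sequence number
    // written, or 0 if the queue was empty.
    std::uint64_t write_once() {
        // Gather a batch (the prepare_multi_write step). At least one entry
        // is always taken so an oversized entry cannot stall the queue.
        std::vector<PendingWrite> batch;
        std::size_t bytes = 0;
        while (!writeq_.empty()) {
            const std::size_t next = writeq_.front().data.size();
            if (!batch.empty() && bytes + next > max_batch_bytes_) break;
            bytes += next;
            batch.push_back(std::move(writeq_.front()));
            writeq_.pop_front();
        }
        if (batch.empty()) return 0;

        // Write the whole batch (the do_write step).
        for (const PendingWrite& w : batch) storage_ += w.data;

        // Only after the write, queue completions for the finisher.
        for (PendingWrite& w : batch) finisher_.push_back(std::move(w.on_done));
        return batch.back().seq;
    }

    // Stand-in for the finisher thread: run queued callbacks in order.
    std::size_t run_finisher() {
        std::size_t n = 0;
        while (!finisher_.empty()) {
            std::function<void()> cb = std::move(finisher_.front());
            finisher_.pop_front();
            if (cb) cb();
            ++n;
        }
        return n;
    }

    const std::string& storage() const { return storage_; }

private:
    std::size_t max_batch_bytes_;
    std::deque<PendingWrite> writeq_;
    std::deque<std::function<void()>> finisher_;
    std::string storage_;
};
```

Running callbacks from a separate finisher step, rather than inline in write_once, keeps slow completion handlers from delaying the next journal write.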
