MySQL Startup Process Explained, Part 3: Starting the InnoDB Storage Engine

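The InnoDB startup process is as follows:

1. Initialize innobase_hton, a pointer of type handlerton, so that the server layer can call the storage engine's interface.

2. Check and initialize InnoDB-related parameters, including those for the system tablespace, temporary tablespace, undo tablespaces, redo log files, doublewrite files, and so on.

3. Call innobase_start_or_create_for_mysql() to start InnoDB, or create it if it does not exist yet.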

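To make step 1 concrete: a handlerton is essentially a table of function pointers that the engine fills in from its init function and that the server then calls without knowing which engine is behind it. The following is a minimal sketch of that pattern only; Handlerton, toy_engine_init, server_commit and the toy_* callbacks are hypothetical names, and MySQL's real handlerton has many more members (see the assignments in innobase_init() below).

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-in for MySQL's handlerton:
// a struct of function pointers through which the server layer calls
// whichever storage engine filled it in.
struct Handlerton {
    int (*commit)(bool all) = nullptr;
    int (*rollback)(bool all) = nullptr;
};

// Hypothetical engine callbacks (InnoDB's real ones include
// innobase_commit and innobase_rollback).
static int toy_commit(bool /*all*/) { return 0; }
static int toy_rollback(bool /*all*/) { return 0; }

// Same shape as innobase_init(void* p): the server passes an untyped
// pointer, the engine casts it and installs its entry points.
int toy_engine_init(void* p) {
    Handlerton* hton = static_cast<Handlerton*>(p);
    assert(hton != nullptr);
    hton->commit = toy_commit;
    hton->rollback = toy_rollback;
    return 0;  // 0 on success, as in innobase_init()
}

// Server-side call site: dispatch through the table without knowing the
// engine; returns -1 if the engine did not provide a commit callback.
int server_commit(const Handlerton& hton, bool all) {
    return hton.commit != nullptr ? hton.commit(all) : -1;
}
```

The design keeps the server and the engine decoupled: the server only ever sees the struct, so swapping engines means swapping the table, not the call sites.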
 

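innobase_start_or_create_for_mysql() proceeds as follows:

1. Reset the start state.

2. Process innodb_flush_method; in production it is typically set to O_DIRECT or O_DIRECT_NO_FSYNC.

3. Set the maximum number of InnoDB threads.

4. Adjust innodb_buffer_pool_instances and innodb_buffer_pool_size.

5. Adjust the number of innodb_page_cleaners according to srv_buf_pool_instances.

6. Boot the InnoDB server (srv_boot), initializing the related parameters and components.

7. Initialize the asynchronous I/O subsystem.

8. Create the InnoDB buffer pool; an error is reported if there is not enough memory.

9. Call fsp_init and log_init to initialize the fsp (file space) system and the redo log system.

10. Call recv_sys_create and recv_sys_init to create and initialize the recovery system.

11. Call lock_sys_create to create the lock system.

12. Call os_thread_create to create the I/O threads.

13. Call buf_flush_page_cleaner_init to initialize the page_cleaner system, then create the buf_flush_page_cleaner_coordinator and buf_flush_page_cleaner_worker threads.

14. Wait for the page_cleaner to become active.

15. Call check_file_spec to check whether the data files (ibdata1, ibdata2, and so on) exist, and decide whether a new database needs to be created.

16. If a new database needs to be created, check whether redo log files and undo tablespaces already exist.

17. Call srv_sys_space.open_or_create() to open or create the data files [ibdata..]. If not creating a new database, read flushed_lsn from the ibdata1 file.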

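18. If create_new_db is true:

    18.1 Synchronously flush dirty pages from the tail of the flush list of every buffer pool instance.

    18.2 Get the current LSN.

    18.3 Create the redo log files.

19. If create_new_db is false, open the redo log files.

20. Call fil_space_create to create the in-memory space object for the redo log.

21. Add the redo log files to the redo log space.

22. Initialize the redo log group.

23. Call fil_open_log_and_system_tablespace_files to open all log files and the system tablespace data files.

24. Call srv_undo_tablespaces_init to open the undo tablespaces; once all undo files have been found and opened, add them all to the file management system.

25. Call trx_sys_file_format_init to initialize the variable file_format_max.

26. Create the trx_sys instance and initialize purge_queue and its mutex.

27. If create_new_db is true:

    27.1 Call fsp_header_init to allocate space at the start of the ibdata file, so that system modules such as the transaction system can be stored and managed there.

    27.2 Call trx_sys_create_sys_pages to create the transaction system page, the sixth page (page number 5) of ibdata.

    27.3 Call trx_sys_init_at_db_start to create and initialize the in-memory transaction system structures.

    27.4 Call trx_purge_sys_create to create and initialize the trx purge system.

    27.5 Call dict_create to create a new data dictionary and initialize the change buffer.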

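28. Invalidate the entire buffer pool to ensure that, during recovery, pages read earlier are read again. This is a very lightweight operation: at this point there is only one page in the LRU list and none in the flush list.

29. Call recv_recovery_from_checkpoint_start() to begin recovery:

    29.1 Initialize the flush red-black tree so that pages can be inserted into the flush list quickly during recovery.

    29.2 Find the latest checkpoint in the log groups.

    29.3 Read the redo log page containing the latest checkpoint into log_sys->checkpoint_buf.

    29.4 Get checkpoint_lsn and checkpoint_no.

    29.5 Read the redo log starting from checkpoint_lsn into the hash table.

    29.6 Check the tablespaces needed for crash recovery, then process and discard the pages in the doublewrite buffer: for each page in the doublewrite buffer, the integrity of the corresponding page in the data file is checked, and if that page is corrupt it is restored from the doublewrite copy. A background thread, recv_writer_thread, is also spawned to flush dirty pages from the buffer pool.

    29.7 Copy the log segment from the most up-to-date log group to the other groups; currently there is only one log group.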

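30. Clear the pages held in the doublewrite buffer.

31. Call dict_boot to initialize the data dictionary system and the change buffer.

32. Call trx_sys_init_at_db_start to create and initialize the transaction system.

33. Call recv_apply_hashed_log_recs to apply the redo log.

34. Call trx_purge_sys_create to create the trx purge system.

35. Call recv_recovery_from_checkpoint_finish to complete recovery from a checkpoint:

    35.1 Make sure the recv_writer thread has finished.

    35.2 Wait for the flush operations to complete, so that all dirty pages have been flushed.

    35.3 Wait for the recv_writer thread to terminate.

    35.4 Free the flush red-black tree.

    35.5 Roll back any data dictionary transactions so that the data dictionary tables are not left locked. The data dictionary latch should guarantee that only one data dictionary transaction is active at a time.

36. Call recv_recovery_rollback_active to roll back incomplete transactions that were not committed in InnoDB (those in TRX_STATE_ACTIVE that have not yet reached TRX_STATE_PREPARED); this is done in a background thread.

37. Call srv_open_tmp_tablespace to open the temporary tablespace.

38. Call trx_sys_create_rsegs to create the rollback segments.

39. Create the lock wait timeout thread; the thread function is lock_wait_timeout_thread.

40. Create the semaphore timeout monitor thread, which prints a warning when a semaphore wait lasts too long; the thread function is srv_error_monitor_thread.

41. Create the master thread; the thread function is srv_master_thread.

42. Create the purge system threads, srv_purge_coordinator_thread and srv_worker_thread.

43. Call srv_start_wait_for_purge_to_start to wait for the purge system to start.

44. Create the buffer pool dump/load thread; the thread function is buf_dump_thread.

45. Create the statistics collection thread; the thread function is dict_stats_thread.

46. Call fts_optimize_init to create the full-text search optimize thread; the thread function is fts_optimize_thread.

47. Create the thread that resizes the buffer pool dynamically; the thread function is buf_resize_thread.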

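Step 2 is a plain string-to-enum mapping. Below is a minimal sketch of the non-Windows branch, assuming a hypothetical UnixFlushMethod enum in place of the SRV_UNIX_* constants; as in the source, an unset option falls back to fsync and an unrecognized value is rejected (InnoDB then aborts startup).

```cpp
#include <cassert>
#include <cstring>
#include <optional>

// Hypothetical mirror of the SRV_UNIX_* flush methods (non-Windows only).
enum class UnixFlushMethod {
    Fsync,
    ODsync,
    ODirect,
    ODirectNoFsync,
    LittleSync,
    NoSync
};

// Returns fsync when the option is unset (nullptr), the matching method
// for a known value, or std::nullopt for an unrecognized value.
std::optional<UnixFlushMethod> parse_flush_method(const char* s) {
    struct Entry {
        const char* name;
        UnixFlushMethod method;
    };
    static const Entry table[] = {
        {"fsync", UnixFlushMethod::Fsync},
        {"O_DSYNC", UnixFlushMethod::ODsync},
        {"O_DIRECT", UnixFlushMethod::ODirect},
        {"O_DIRECT_NO_FSYNC", UnixFlushMethod::ODirectNoFsync},
        {"littlesync", UnixFlushMethod::LittleSync},
        {"nosync", UnixFlushMethod::NoSync},
    };
    if (s == nullptr) {
        return UnixFlushMethod::Fsync;  // default when not configured
    }
    for (const Entry& e : table) {
        if (std::strcmp(s, e.name) == 0) {  // exact, case-sensitive match
            return e.method;
        }
    }
    return std::nullopt;  // caller reports the error and aborts
}
```

A table-driven lookup keeps the accepted spellings in one place, which is easier to audit than a chain of else-if branches.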
 
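Steps 4 and 5 boil down to integer arithmetic. The sketch below restates it under stated assumptions: a 1 GiB threshold (BUF_POOL_SIZE_THRESHOLD), instances == 0 meaning the option was left at its default, the non-32-bit-Windows default of 8 instances, and buf_pool_size_align() rounding the total up to a multiple of chunk_unit * instances. BufPoolConfig and adjust_buf_pool() are hypothetical, and the real alignment also enforces a minimum pool size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical bundle of the settings adjusted in steps 4 and 5.
struct BufPoolConfig {
    uint64_t size;           // innodb_buffer_pool_size, bytes
    uint64_t instances;      // innodb_buffer_pool_instances (0 = default)
    uint64_t chunk_unit;     // innodb_buffer_pool_chunk_size, bytes
    uint64_t page_cleaners;  // innodb_page_cleaners
};

constexpr uint64_t kThreshold = 1024ULL * 1024 * 1024;  // assumed 1 GiB

BufPoolConfig adjust_buf_pool(BufPoolConfig c) {
    if (c.size >= kThreshold) {
        if (c.instances == 0) {
            c.instances = 8;  // default for large pools (non-Win32 branch)
        }
    } else {
        c.instances = 1;      // small pools use a single instance
    }
    // Shrink the chunk if chunk_unit * instances exceeds the pool size,
    // rounding the per-instance share up.
    if (c.chunk_unit * c.instances > c.size) {
        c.chunk_unit = c.size / c.instances;
        if (c.size % c.instances != 0) {
            ++c.chunk_unit;
        }
    }
    // Assumed behavior of buf_pool_size_align(): round the total size up
    // to a multiple of chunk_unit * instances.
    const uint64_t m = c.chunk_unit * c.instances;
    assert(m > 0);
    c.size = (c.size + m - 1) / m * m;
    // At most one page cleaner per buffer pool instance.
    c.page_cleaners = std::min(c.page_cleaners, c.instances);
    return c;
}
```

For example, a 1536 MiB pool with default instances and a 128 MiB chunk gets 8 instances, and its size is rounded up to 2048 MiB because the alignment unit is 8 * 128 MiB = 1 GiB.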

The InnoDB storage engine's startup code is the innobase_init() function in ha_innodb.cc; its source is as follows:

 

/*********************************************************************//**
Initializes the InnoDB plugin.
Opens an InnoDB database.
@return 0 on success, 1 on failure */
static
int
innobase_init(
/*==========*/
	void	*p)	/*!< in: InnoDB handlerton */
{
	static char	current_dir[3];		/*!< Set if using current lib */
	int		err;
	char		*default_path;
	uint		format_id;
	ulong		num_pll_degree;
	// Initialize innobase_hton so that the server layer can call InnoDB's interface
	DBUG_ENTER("innobase_init");
	handlerton* innobase_hton= (handlerton*) p;
	innodb_hton_ptr = innobase_hton;
     
	innobase_hton->state = SHOW_OPTION_YES;
	innobase_hton->db_type = DB_TYPE_INNODB;
	innobase_hton->savepoint_offset = sizeof(trx_named_savept_t);
	innobase_hton->close_connection = innobase_close_connection;
	innobase_hton->kill_connection = innobase_kill_connection;
	innobase_hton->savepoint_set = innobase_savepoint;
	innobase_hton->savepoint_rollback = innobase_rollback_to_savepoint;

	innobase_hton->savepoint_rollback_can_release_mdl =
				innobase_rollback_to_savepoint_can_release_mdl;

	innobase_hton->savepoint_release = innobase_release_savepoint;
	innobase_hton->commit = innobase_commit;
	innobase_hton->rollback = innobase_rollback;
	innobase_hton->prepare = innobase_xa_prepare;
	innobase_hton->recover = innobase_xa_recover;
	innobase_hton->commit_by_xid = innobase_commit_by_xid;
	innobase_hton->rollback_by_xid = innobase_rollback_by_xid;
	innobase_hton->create = innobase_create_handler;
	innobase_hton->alter_tablespace = innobase_alter_tablespace;
	innobase_hton->drop_database = innobase_drop_database;
	innobase_hton->panic = innobase_end;
	innobase_hton->partition_flags= innobase_partition_flags;

	innobase_hton->start_consistent_snapshot =
		innobase_start_trx_and_assign_read_view;

	innobase_hton->flush_logs = innobase_flush_logs;
	innobase_hton->show_status = innobase_show_status;
	innobase_hton->fill_is_table = innobase_fill_i_s_table;
	innobase_hton->flags =
		HTON_SUPPORTS_EXTENDED_KEYS | HTON_SUPPORTS_FOREIGN_KEYS |
		HTON_SUPPORTS_TABLE_ENCRYPTION;

	innobase_hton->release_temporary_latches =
		innobase_release_temporary_latches;
	innobase_hton->replace_native_transaction_in_thd =
		innodb_replace_trx_in_thd;
	innobase_hton->data = &innodb_api_cb;
	innobase_hton->is_reserved_db_name= innobase_check_reserved_file_name;

	innobase_hton->is_supported_system_table=
		innobase_is_supported_system_table;

	innobase_hton->rotate_encryption_master_key =
		innobase_encryption_key_rotation;

	ut_a(DATA_MYSQL_TRUE_VARCHAR == (ulint)MYSQL_TYPE_VARCHAR);

#ifndef NDEBUG
	static const char	test_filename[] = "-@";
	char			test_tablename[sizeof test_filename
				+ sizeof(srv_mysql50_table_name_prefix) - 1];
	if ((sizeof(test_tablename)) - 1
			!= filename_to_tablename(test_filename,
						 test_tablename,
						 sizeof(test_tablename), true)
			|| strncmp(test_tablename,
				   srv_mysql50_table_name_prefix,
				   sizeof(srv_mysql50_table_name_prefix) - 1)
			|| strcmp(test_tablename
				  + sizeof(srv_mysql50_table_name_prefix) - 1,
				  test_filename)) {

		sql_print_error("tablename encoding has been changed");
		DBUG_RETURN(innobase_init_abort());
	}
#endif /* NDEBUG */

	/* Check that values don't overflow on 32-bit systems. */
	if (sizeof(ulint) == 4) {
		if (innobase_buffer_pool_size > UINT_MAX32) {
			sql_print_error(
				"innodb_buffer_pool_size can't be over 4GB"
				" on 32-bit systems");

			DBUG_RETURN(innobase_init_abort());
		}
	}

	os_file_set_umask(my_umask);

	/* Setup the memory alloc/free tracing mechanisms before calling
	any functions that could possibly allocate memory. */
	ut_new_boot();

	/* First calculate the default path for innodb_data_home_dir etc.,
	in case the user has not given any value.

	Note that when using the embedded server, the datadirectory is not
	necessarily the current directory of this program. */

	if (mysqld_embedded) {
		default_path = mysql_real_data_home;
	} else {
		/* It's better to use current lib, to keep paths short */
		current_dir[0] = FN_CURLIB;
		current_dir[1] = FN_LIBCHAR;
		current_dir[2] = 0;
		default_path = current_dir;
	}

	ut_a(default_path);

	fil_path_to_mysql_datadir = default_path;
	folder_mysql_datadir = fil_path_to_mysql_datadir;

	/* Set InnoDB initialization parameters according to the values
	read from MySQL .cnf file */

	/* The default dir for data files is the datadir of MySQL */
	srv_data_home = innobase_data_home_dir
		? innobase_data_home_dir : default_path;

	/*--------------- Shared tablespaces -------------------------
	Shared tablespaces comprise the system tablespace and the shared
	temporary tablespace. */

	/* Check that the value of system variable innodb_page_size was
	set correctly.  Its value was put into srv_page_size. If valid,
	return the associated srv_page_size_shift. */
	// Validate the value of the system variable innodb_page_size.
	srv_page_size_shift = innodb_page_size_validate(srv_page_size);
	if (!srv_page_size_shift) {
		sql_print_error("InnoDB: Invalid page size=%lu.\n",
				srv_page_size);
		DBUG_RETURN(innobase_init_abort());
	}

	/* Set the default InnoDB system data file (ibdata1) size to 12 MB
	and let it be auto-extending. */
	if (!innobase_data_file_path) {
		innobase_data_file_path = (char*) "ibdata1:12M:autoextend";
	}

	/* This is the first time univ_page_size is used.
	It was initialized to 16k pages before srv_page_size was set. */
	univ_page_size.copy_from(
		page_size_t(srv_page_size, srv_page_size, false));
	// Set the space_id of the system tablespace
	srv_sys_space.set_space_id(TRX_SYS_SPACE);

	/* Create the filespace flags, then set the flags, name and path
	of the system tablespace. */
	ulint	fsp_flags = fsp_flags_init(
		univ_page_size, false, false, false, false);
	srv_sys_space.set_flags(fsp_flags);
        
	srv_sys_space.set_name(reserved_system_space_name);
	srv_sys_space.set_path(srv_data_home);

	/* Supports raw devices */
	if (!srv_sys_space.parse_params(innobase_data_file_path, true)) {
		ib::error() << "Unable to parse innodb_data_file_path="
			    << innobase_data_file_path;
		DBUG_RETURN(innobase_init_abort());
	}

	/* Set default InnoDB temp data file size to 12 MB and let it be
	auto-extending. */
	if (!innobase_temp_data_file_path) {
		innobase_temp_data_file_path = (char*) "ibtmp1:12M:autoextend";
	}

	/* We set the temporary tablespace id later, after recovery.
	The temp tablespace doesn't support raw devices.
	Set the name and path here. */
	srv_tmp_space.set_name(reserved_temporary_space_name);
	srv_tmp_space.set_path(srv_data_home);

	/* Create the filespace flags of the temporary tablespace, with
	the temp flag set. */
	fsp_flags = fsp_flags_init(
		univ_page_size, false, false, false, true);
	srv_tmp_space.set_flags(fsp_flags);
        
	if (!srv_tmp_space.parse_params(innobase_temp_data_file_path, false)) {
		ib::error() << "Unable to parse innodb_temp_data_file_path="
			    << innobase_temp_data_file_path;
		DBUG_RETURN(innobase_init_abort());
	}

	/* Perform all sanity check before we take action of deleting files*/
	// Check whether the system and temporary tablespaces share a data file.
	if (srv_sys_space.intersection(&srv_tmp_space)) {
		sql_print_error("%s and %s file names seem to be the same.",
			srv_tmp_space.name(), srv_sys_space.name());
		DBUG_RETURN(innobase_init_abort());
	}

	/* ------------ UNDO tablespaces files --------------------- */
	// Undo tablespace directory (defaults to the MySQL datadir)
	if (!srv_undo_dir) {
		srv_undo_dir = default_path;
	}
	// Normalize the undo tablespace directory path
	os_normalize_path(srv_undo_dir);
        
	if (strchr(srv_undo_dir, ';')) {
		sql_print_error("syntax error in innodb_undo_directory");
		DBUG_RETURN(innobase_init_abort());
	}

	/* -------------- All log files --------------------------- */

	/* The default dir for log files (innodb_log_group_home_dir) is
	the datadir of MySQL */
	if (!srv_log_group_home_dir) {
		srv_log_group_home_dir = default_path;
	}
	// Normalize the directory path
	os_normalize_path(srv_log_group_home_dir);

	if (strchr(srv_log_group_home_dir, ';')) {
		sql_print_error("syntax error in innodb_log_group_home_dir");
		DBUG_RETURN(innobase_init_abort());
	}
        
	if (!innobase_large_prefix) {
		ib::warn() << deprecated_large_prefix;
	}

	if (!THDVAR(NULL, support_xa)) {
		ib::warn() << deprecated_innodb_support_xa_off;
		THDVAR(NULL, support_xa) = TRUE;
	}

	if (innobase_file_format_name != innodb_file_format_default) {
		ib::warn() << deprecated_file_format;
	}

	/* Validate the file format (innodb_file_format) by animal name */
	if (innobase_file_format_name != NULL) {

		format_id = innobase_file_format_name_lookup(
			innobase_file_format_name);

		if (format_id > UNIV_FORMAT_MAX) {

			sql_print_error("InnoDB: wrong innodb_file_format.");

			DBUG_RETURN(innobase_init_abort());
		}
	} else {
		/* Set it to the default file format id. Though this
		should never happen. */
		format_id = 0;
	}

	srv_file_format = format_id;

	/* Given the type of innobase_file_format_name we have little
	choice but to cast away the constness from the returned name.
	innobase_file_format_name is used in the MySQL set variable
	interface and so can't be const. */

	innobase_file_format_name =
		(char*) trx_sys_file_format_id_to_name(format_id);

	/* Check the innodb_file_format_check variable */
	if (!innobase_file_format_check) {
		ib::warn() << deprecated_file_format_check;

		/* Set the value to disable checking. */
		srv_max_file_format_at_startup = UNIV_FORMAT_MAX + 1;

	} else {

		/* Set the value to the lowest supported format. */
		srv_max_file_format_at_startup = UNIV_FORMAT_MIN;
	}

	if (innobase_file_format_max != innodb_file_format_max_default) {
		ib::warn() << deprecated_file_format_max;
	}

	/* Did the user specify a format name that we support?
	As a side effect it will update the variable
	srv_max_file_format_at_startup */
	if (innobase_file_format_validate_and_set(
			innobase_file_format_max) < 0) {

		sql_print_error("InnoDB: invalid"
				" innodb_file_format_max value:"
				" should be any value up to %s or its"
				" equivalent numeric id",
				trx_sys_file_format_id_to_name(
					UNIV_FORMAT_MAX));

		DBUG_RETURN(innobase_init_abort());
	}
	/* InnoDB change buffer: map innodb_change_buffering to ibuf_use */
	if (innobase_change_buffering) {
		ulint	use;

		for (use = 0;
		     use < UT_ARR_SIZE(innobase_change_buffering_values);
		     use++) {
			if (!innobase_strcasecmp(
				    innobase_change_buffering,
				    innobase_change_buffering_values[use])) {
				ibuf_use = (ibuf_use_t) use;
				goto innobase_change_buffering_inited_ok;
			}
		}

		sql_print_error("InnoDB: invalid value"
				" innodb_change_buffering=%s",
				innobase_change_buffering);
		DBUG_RETURN(innobase_init_abort());
	}

innobase_change_buffering_inited_ok:
	// Store the effective setting back (the default is "all")
	ut_a((ulint) ibuf_use < UT_ARR_SIZE(innobase_change_buffering_values));
	innobase_change_buffering = (char*)
		innobase_change_buffering_values[ibuf_use];

	/* Check that interdependent parameters have sane values:
	srv_max_buf_pool_modified_pct vs. srv_max_dirty_pages_pct_lwm, and
	srv_max_io_capacity vs. srv_io_capacity
	(SRV_MAX_IO_CAPACITY_DUMMY_DEFAULT means "not set by the user"). */
	if (srv_max_buf_pool_modified_pct < srv_max_dirty_pages_pct_lwm) {
		sql_print_warning("InnoDB: innodb_max_dirty_pages_pct_lwm"
				  " cannot be set higher than"
				  " innodb_max_dirty_pages_pct.\n"
				  "InnoDB: Setting"
				  " innodb_max_dirty_pages_pct_lwm to %lf\n",
				  srv_max_buf_pool_modified_pct);

		srv_max_dirty_pages_pct_lwm = srv_max_buf_pool_modified_pct;
	}

	if (srv_max_io_capacity == SRV_MAX_IO_CAPACITY_DUMMY_DEFAULT) {

		if (srv_io_capacity >= SRV_MAX_IO_CAPACITY_LIMIT / 2) {
			/* Avoid overflow. */
			srv_max_io_capacity = SRV_MAX_IO_CAPACITY_LIMIT;
		} else {
			/* The user has not set the value. We should
			set it based on innodb_io_capacity. */
			srv_max_io_capacity =
				ut_max(2 * srv_io_capacity, 2000UL);
		}

	} else if (srv_max_io_capacity < srv_io_capacity) {
		sql_print_warning("InnoDB: innodb_io_capacity"
				  " cannot be set higher than"
				  " innodb_io_capacity_max.\n"
				  "InnoDB: Setting"
				  " innodb_io_capacity to %lu\n",
				  srv_max_io_capacity);

		srv_io_capacity = srv_max_io_capacity;
	}
	// Validate the innodb_buffer_pool_filename setting
	if (!is_filename_allowed(srv_buf_dump_filename,
				 strlen(srv_buf_dump_filename), FALSE)) {
		sql_print_error("InnoDB: innodb_buffer_pool_filename"
			" cannot have colon (:) in the file name.");
		DBUG_RETURN(innobase_init_abort());
	}

	/* --------------------------------------------------
	The remaining settings: innodb_flush_method, innodb_log_file_size,
	innodb_log_write_ahead_size, innodb_log_buffer_size,
	innodb_buffer_pool_size, innodb_read_io_threads,
	innodb_write_io_threads, innodb_doublewrite, innodb_log_checksums,
	innodb_rollback_on_timeout, innodb_locks_unsafe_for_binlog,
	innodb_open_files, the monitor counters, innodb_old_blocks_pct and
	innodb_undo_logs */

	srv_file_flush_method_str = innobase_file_flush_method;

	srv_log_file_size = (ib_uint64_t) innobase_log_file_size;

	if (UNIV_PAGE_SIZE_DEF != srv_page_size) {
		ib::warn() << "innodb-page-size has been changed from the"
			" default value " << UNIV_PAGE_SIZE_DEF << " to "
			<< srv_page_size << ".";
	}

	if (srv_log_write_ahead_size > srv_page_size) {
		srv_log_write_ahead_size = srv_page_size;
	} else {
		ulong	srv_log_write_ahead_size_tmp = OS_FILE_LOG_BLOCK_SIZE;

		while (srv_log_write_ahead_size_tmp
		       < srv_log_write_ahead_size) {
			srv_log_write_ahead_size_tmp
				= srv_log_write_ahead_size_tmp * 2;
		}
		if (srv_log_write_ahead_size_tmp
		    != srv_log_write_ahead_size) {
			srv_log_write_ahead_size
				= srv_log_write_ahead_size_tmp / 2;
		}
	}

	srv_log_buffer_size = (ulint) innobase_log_buffer_size;

	srv_buf_pool_size = (ulint) innobase_buffer_pool_size;

	srv_n_read_io_threads = (ulint) innobase_read_io_threads;
	srv_n_write_io_threads = (ulint) innobase_write_io_threads;

	srv_use_doublewrite_buf = (ibool) innobase_use_doublewrite;

	if (!innobase_use_checksums) {
		ib::warn() << "Setting innodb_checksums to OFF is DEPRECATED."
			" This option may be removed in future releases. You"
			" should set innodb_checksum_algorithm=NONE instead.";
		srv_checksum_algorithm = SRV_CHECKSUM_ALGORITHM_NONE;
	}

	innodb_log_checksums_func_update(innodb_log_checksums);

#ifdef HAVE_LINUX_LARGE_PAGES
	if ((os_use_large_pages = my_use_large_pages)) {
		os_large_page_size = opt_large_page_size;
	}
#endif

	row_rollback_on_timeout = (ibool) innobase_rollback_on_timeout;

	srv_locks_unsafe_for_binlog = (ibool) innobase_locks_unsafe_for_binlog;
	if (innobase_locks_unsafe_for_binlog) {
		ib::warn() << "Using innodb_locks_unsafe_for_binlog is"
			" DEPRECATED. This option may be removed in future"
			" releases. Please use READ COMMITTED transaction"
			" isolation level instead; " << SET_TRANSACTION_MSG;
	}

	if (innobase_open_files < 10) {
		innobase_open_files = 300;
		if (srv_file_per_table && table_cache_size > 300) {
			innobase_open_files = table_cache_size;
		}
	}

	if (innobase_open_files > (long) open_files_limit) {
		ib::warn() << "innodb_open_files should not be greater"
                       " than the open_files_limit.\n";
		if (innobase_open_files > (long) table_cache_size) {
			innobase_open_files = table_cache_size;
		}
	}

	srv_max_n_open_files = (ulint) innobase_open_files;
	srv_innodb_status = (ibool) innobase_create_status_file;

	srv_print_verbose_log = mysqld_embedded ? 0 : 1;

	/* Round up fts_sort_pll_degree to nearest power of 2 number */
	for (num_pll_degree = 1;
	     num_pll_degree < fts_sort_pll_degree;
	     num_pll_degree <<= 1) {

		/* No op */
	}

	fts_sort_pll_degree = num_pll_degree;

	/* Store the default charset-collation number of this MySQL
	installation */
	data_mysql_default_charset_coll = (ulint) default_charset_info->number;
	// Initialize the default value of innodb_commit_concurrency (limits concurrent commits)
	innobase_commit_concurrency_init_default();

	// Initialize the global os_event state.
	os_event_global_init();

	/* Set buffer pool size to default for fast startup when mysqld is
	run with --help --verbose options. */
	ulint	srv_buf_pool_size_org = 0;
	if (opt_help && opt_verbose
	    && srv_buf_pool_size > srv_buf_pool_def_size) {
		ib::warn() << "Setting innodb_buf_pool_size to "
			<< srv_buf_pool_def_size << " for fast startup, "
			<< "when running with --help --verbose options.";
		srv_buf_pool_size_org = srv_buf_pool_size;
		srv_buf_pool_size = srv_buf_pool_def_size;
	}

	/* Since we in this module access directly the fields of a trx
	struct, and due to different headers and flags it might happen that
	ib_mutex_t has a different size in this module and in InnoDB
	modules, we check at run time that the size is the same in
	these compilation modules. */
	// Start InnoDB, or create it if it does not exist yet
	err = innobase_start_or_create_for_mysql();
	// Publish the effective (aligned) buffer pool size to innodb_buffer_pool_size
	if (srv_buf_pool_size_org != 0) {
		/* Set the original value back to show in help. */
		srv_buf_pool_size_org =
			buf_pool_size_align(srv_buf_pool_size_org);
		innobase_buffer_pool_size =
			static_cast<long long>(srv_buf_pool_size_org);
	} else {
		innobase_buffer_pool_size =
			static_cast<long long>(srv_buf_pool_size);
	}

	if (err != DB_SUCCESS) {
		DBUG_RETURN(innobase_init_abort());
	}

	/* Create mutex to protect encryption master_key_id. */
	mutex_create(LATCH_ID_MASTER_KEY_ID_MUTEX, &master_key_id_mutex);

	/* Adjust the innodb_undo_logs config object */
	innobase_undo_logs_init_default_max();

	innobase_old_blocks_pct = static_cast<uint>(
		buf_LRU_old_ratio_update(innobase_old_blocks_pct, TRUE));

	ibuf_max_size_update(srv_change_buffer_max_size);

	innobase_open_tables = hash_create(200);
	mysql_mutex_init(innobase_share_mutex_key.m_value,
			 &innobase_share_mutex,
			 MY_MUTEX_INIT_FAST);
	mysql_mutex_init(commit_cond_mutex_key.m_value,
			 &commit_cond_m, MY_MUTEX_INIT_FAST);
	mysql_cond_init(commit_cond_key.m_value, &commit_cond);

	innodb_inited= 1;
#ifdef MYSQL_DYNAMIC_PLUGIN
	if (innobase_hton != p) {
		innobase_hton = reinterpret_cast<handlerton*>(p);
		*innobase_hton = *innodb_hton_ptr;
	}
#endif /* MYSQL_DYNAMIC_PLUGIN */

	/* Get the current high water mark format. */
	innobase_file_format_max = (char*) trx_sys_file_format_max_get();

	/* InnoDB monitor: currently, monitor counter information is not
	persistent. */
	memset(monitor_set_tbl, 0, sizeof monitor_set_tbl);

	memset(innodb_counter_value, 0, sizeof innodb_counter_value);

	/* Do this as late as possible so the server is fully started up,
	since we might get some initial stats if the user chooses to turn
	on some counters from startup */
	if (innobase_enable_monitor_counter) {
		innodb_enable_monitor_at_startup(
			innobase_enable_monitor_counter);
	}

	/* Turn on monitor counters that are default on */
	srv_mon_default_on();


	/* Unit Tests */
#ifdef UNIV_ENABLE_UNIT_TEST_GET_PARENT_DIR
	unit_test_os_file_get_parent_dir();
#endif /* UNIV_ENABLE_UNIT_TEST_GET_PARENT_DIR */

#ifdef UNIV_ENABLE_UNIT_TEST_MAKE_FILEPATH
	test_make_filepath();
#endif /*UNIV_ENABLE_UNIT_TEST_MAKE_FILEPATH */

#ifdef UNIV_ENABLE_DICT_STATS_TEST
	test_dict_stats_all();
#endif /*UNIV_ENABLE_DICT_STATS_TEST */

#ifdef UNIV_ENABLE_UNIT_TEST_ROW_RAW_FORMAT_INT
# ifdef HAVE_UT_CHRONO_T
	test_row_raw_format_int();
# endif /* HAVE_UT_CHRONO_T */
#endif /* UNIV_ENABLE_UNIT_TEST_ROW_RAW_FORMAT_INT */

#ifndef UNIV_HOTBACKUP
#ifdef _WIN32
	if (ut_win_init_time()) {
		DBUG_RETURN(innobase_init_abort());
	}
#endif /* _WIN32 */
#endif /* !UNIV_HOTBACKUP */

	DBUG_RETURN(0);
}
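Two of the normalizations in innobase_init() are easy to misread: innodb_log_write_ahead_size is capped at the page size and otherwise rounded down to a power-of-two multiple of OS_FILE_LOG_BLOCK_SIZE, while fts_sort_pll_degree (innodb_ft_sort_pll_degree) is rounded up to the nearest power of two. A minimal restatement, assuming a 512-byte log block (the hypothetical kLogBlockSize) and a write-ahead size of at least one block:

```cpp
#include <cassert>

// Stand-in for OS_FILE_LOG_BLOCK_SIZE (assumed to be 512 bytes).
constexpr unsigned long kLogBlockSize = 512;

// innodb_log_write_ahead_size: cap at the page size, otherwise round
// DOWN to a power-of-two multiple of the log block size.
unsigned long normalize_write_ahead(unsigned long ahead_size,
                                    unsigned long page_size) {
    assert(ahead_size >= kLogBlockSize);  // assumed minimum
    if (ahead_size > page_size) {
        return page_size;
    }
    unsigned long tmp = kLogBlockSize;
    while (tmp < ahead_size) {
        tmp *= 2;  // smallest block multiple >= ahead_size
    }
    return tmp == ahead_size ? ahead_size : tmp / 2;
}

// fts_sort_pll_degree: round UP to the nearest power of two.
unsigned long round_up_pll_degree(unsigned long degree) {
    unsigned long n = 1;
    while (n < degree) {
        n <<= 1;
    }
    return n;
}
```

So a write-ahead size of 6000 on 16 KiB pages becomes 4096, and a parallel sort degree of 3 becomes 4.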

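The interdependent-parameter checks in innobase_init() follow a simple pattern. An unset innodb_io_capacity_max is derived as max(2 * innodb_io_capacity, 2000), guarded against overflow. An explicit maximum below innodb_io_capacity pulls innodb_io_capacity down. innodb_max_dirty_pages_pct_lwm is clamped to innodb_max_dirty_pages_pct. In this sketch std::optional stands in for the SRV_MAX_IO_CAPACITY_DUMMY_DEFAULT sentinel and the `limit` parameter for SRV_MAX_IO_CAPACITY_LIMIT; both substitutions and all names here are illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>

// Hypothetical result of resolving the two I/O capacity settings.
struct IoCapacity {
    unsigned long io_capacity;      // innodb_io_capacity
    unsigned long io_capacity_max;  // innodb_io_capacity_max
};

IoCapacity resolve_io_capacity(unsigned long io_capacity,
                               std::optional<unsigned long> max_setting,
                               unsigned long limit) {
    if (!max_setting) {
        // Unset: derive from innodb_io_capacity, avoiding 2 * x overflow.
        const unsigned long max_cap =
            io_capacity >= limit / 2 ? limit
                                     : std::max(2 * io_capacity, 2000UL);
        return {io_capacity, max_cap};
    }
    // Set explicitly: innodb_io_capacity may not exceed the maximum.
    return {std::min(io_capacity, *max_setting), *max_setting};
}

// innodb_max_dirty_pages_pct_lwm may not exceed innodb_max_dirty_pages_pct.
double clamp_dirty_pages_lwm(double lwm, double max_pct) {
    return lwm > max_pct ? max_pct : lwm;
}
```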
  
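The innodb_open_files logic has a quirk worth noting. Values below 10 are replaced by 300, or by table_cache_size when file-per-table is on and the table cache is larger. A value above open_files_limit only triggers a warning plus a cap at table_cache_size, not at open_files_limit. A sketch, assuming table_cache_size corresponds to the server's table_open_cache; OpenFiles and resolve_open_files() are hypothetical:

```cpp
#include <cassert>

// Hypothetical result: the effective value and whether a warning fired.
struct OpenFiles {
    long value;   // resulting innodb_open_files
    bool warned;  // true if the "> open_files_limit" warning is printed
};

OpenFiles resolve_open_files(long open_files, bool file_per_table,
                             long table_cache_size, long open_files_limit) {
    bool warned = false;
    if (open_files < 10) {
        open_files = 300;
        if (file_per_table && table_cache_size > 300) {
            open_files = table_cache_size;
        }
    }
    if (open_files > open_files_limit) {
        warned = true;
        // Only capped at table_cache_size, not at open_files_limit.
        if (open_files > table_cache_size) {
            open_files = table_cache_size;
        }
    }
    return {open_files, warned};
}
```

For example, innodb_open_files = 8000 with open_files_limit = 5000 and a table cache of 9000 stays at 8000 after the warning.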

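Values such as ibdata1:12M:autoextend for innodb_data_file_path (and innodb_temp_data_file_path) are parsed by srv_sys_space.parse_params(). The following deliberately simplified parser only illustrates the format. Files are separated by ';', each written as name:size with a K/M/G suffix, and autoextend is allowed only on the last file. DataFile and parse_data_file_path() are hypothetical. The real parser also handles raw devices and the max: attribute, which this sketch does not, and this sketch would mis-split a file name that itself contains ':'.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

struct DataFile {
    std::string name;
    uint64_t size_bytes;
    bool autoextend;
};

static std::vector<std::string> split(const std::string& s, char sep) {
    std::vector<std::string> parts;
    std::string part;
    std::istringstream in(s);
    while (std::getline(in, part, sep)) {
        parts.push_back(part);
    }
    return parts;
}

// "12M" -> 12 MiB. Requires a K/M/G suffix; at most 9 digits so the
// multiplication cannot overflow (a toy limit).
static std::optional<uint64_t> parse_size(const std::string& s) {
    if (s.size() < 2 || s.size() > 10) {
        return std::nullopt;
    }
    uint64_t mult = 0;
    switch (s.back()) {
    case 'K': case 'k': mult = 1024ULL; break;
    case 'M': case 'm': mult = 1024ULL * 1024; break;
    case 'G': case 'g': mult = 1024ULL * 1024 * 1024; break;
    default: return std::nullopt;
    }
    const std::string digits = s.substr(0, s.size() - 1);
    if (digits.find_first_not_of("0123456789") != std::string::npos) {
        return std::nullopt;
    }
    return std::stoull(digits) * mult;
}

std::optional<std::vector<DataFile>> parse_data_file_path(const std::string& path) {
    std::vector<DataFile> files;
    const std::vector<std::string> specs = split(path, ';');
    for (std::size_t i = 0; i < specs.size(); ++i) {
        const std::vector<std::string> f = split(specs[i], ':');
        if (f.size() < 2 || f.size() > 3 || f[0].empty()) {
            return std::nullopt;
        }
        const std::optional<uint64_t> size = parse_size(f[1]);
        if (!size) {
            return std::nullopt;
        }
        const bool autoextend = f.size() == 3;
        // Only "autoextend" is accepted, and only on the last file.
        if (autoextend && (f[2] != "autoextend" || i + 1 != specs.size())) {
            return std::nullopt;
        }
        files.push_back({f[0], *size, autoextend});
    }
    if (files.empty()) {
        return std::nullopt;
    }
    return files;
}
```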
The annotated source of innobase_start_or_create_for_mysql() is as follows:

dberr_t
innobase_start_or_create_for_mysql(void)
/*====================================*/
{
	bool		create_new_db = false;
	lsn_t		flushed_lsn;
	ulint		sum_of_data_file_sizes;
	ulint		tablespace_size_in_header;
	dberr_t		err;
	ulint		srv_n_log_files_found = srv_n_log_files;
	mtr_t		mtr;
	purge_pq_t*	purge_queue;
	char		logfilename[10000];
	char*		logfile0	= NULL;
	size_t		dirnamelen;
	unsigned	i = 0;

	/* Reset the start state. */
	srv_start_state = SRV_START_STATE_NONE;
	// SRV_FORCE_NO_LOG_REDO: do not roll forward (apply) the redo log
	if (srv_force_recovery == SRV_FORCE_NO_LOG_REDO) {
		srv_read_only_mode = true;
	}
	// high_level_read_only: read-only mode, or innodb_force_recovery above SRV_FORCE_NO_TRX_UNDO
	high_level_read_only = srv_read_only_mode
		|| srv_force_recovery > SRV_FORCE_NO_TRX_UNDO;
	// In read-only mode the only writes go to intrinsic tables, so turn off the doublewrite mechanism.
	if (srv_read_only_mode) {
		ib::info() << "Started in read only mode";

		/* There is no write except to intrinsic table and so turn-off
		doublewrite mechanism completely. */
		srv_use_doublewrite_buf = FALSE;
	}


#ifdef _WIN32
	srv_use_native_aio = TRUE;

#elif defined(LINUX_NATIVE_AIO)

	if (srv_use_native_aio) {
		ib::info() << "Using Linux native AIO";
	}
#else
	/* Currently native AIO is supported only on windows and linux
	and that also when the support is compiled in. In all other
	cases, we ignore the setting of innodb_use_native_aio. */
	srv_use_native_aio = FALSE;
#endif /* _WIN32 */

	/* Register performance schema stages before any real work has been
	started which may need to be instrumented. */
	mysql_stage_register("innodb", srv_stages, UT_ARR_SIZE(srv_stages));
	/* Process innodb_flush_method. In production it is usually set to
	O_DIRECT or O_DIRECT_NO_FSYNC. */
	if (srv_file_flush_method_str == NULL) {
		/* These are the default options */
#ifndef _WIN32
		srv_unix_file_flush_method = SRV_UNIX_FSYNC;
	} else if (0 == ut_strcmp(srv_file_flush_method_str, "fsync")) {
		srv_unix_file_flush_method = SRV_UNIX_FSYNC;

	} else if (0 == ut_strcmp(srv_file_flush_method_str, "O_DSYNC")) {
		srv_unix_file_flush_method = SRV_UNIX_O_DSYNC;

	} else if (0 == ut_strcmp(srv_file_flush_method_str, "O_DIRECT")) {
		srv_unix_file_flush_method = SRV_UNIX_O_DIRECT;

	} else if (0 == ut_strcmp(srv_file_flush_method_str, "O_DIRECT_NO_FSYNC")) {
		srv_unix_file_flush_method = SRV_UNIX_O_DIRECT_NO_FSYNC;

	} else if (0 == ut_strcmp(srv_file_flush_method_str, "littlesync")) {
		srv_unix_file_flush_method = SRV_UNIX_LITTLESYNC;

	} else if (0 == ut_strcmp(srv_file_flush_method_str, "nosync")) {
		srv_unix_file_flush_method = SRV_UNIX_NOSYNC;
#else
		srv_win_file_flush_method = SRV_WIN_IO_UNBUFFERED;
	} else if (0 == ut_strcmp(srv_file_flush_method_str, "normal")) {
		srv_win_file_flush_method = SRV_WIN_IO_NORMAL;
		srv_use_native_aio = FALSE;

	} else if (0 == ut_strcmp(srv_file_flush_method_str, "unbuffered")) {
		srv_win_file_flush_method = SRV_WIN_IO_UNBUFFERED;
		srv_use_native_aio = FALSE;

	} else if (0 == ut_strcmp(srv_file_flush_method_str,
				  "async_unbuffered")) {
		srv_win_file_flush_method = SRV_WIN_IO_UNBUFFERED;
#endif /* _WIN32 */
	} else {
		ib::error() << "Unrecognized value "
			<< srv_file_flush_method_str
			<< " for innodb_flush_method";
		return(srv_init_abort(DB_ERROR));
	}

	/* Note that the call srv_boot() also changes the values of
	some variables to the units used by InnoDB internally */

	/* Set the maximum number of threads which can wait for a semaphore
	inside InnoDB: this is the 'sync wait array' size, as well as the
	maximum number of threads that can wait in the 'srv_conc array' for
	their time to enter InnoDB. */

	srv_max_n_threads = 1   /* io_ibuf_thread */
			    + 1 /* io_log_thread */
			    + 1 /* lock_wait_timeout_thread */
			    + 1 /* srv_error_monitor_thread */
			    + 1 /* srv_monitor_thread */
			    + 1 /* srv_master_thread */
			    + 1 /* srv_purge_coordinator_thread */
			    + 1 /* buf_dump_thread */
			    + 1 /* dict_stats_thread */
			    + 1 /* fts_optimize_thread */
			    + 1 /* recv_writer_thread */
			    + 1 /* trx_rollback_or_clean_all_recovered */
			    + 128 /* added as margin, for use of
				  InnoDB Memcached etc. */
			    + max_connections
			    + srv_n_read_io_threads
			    + srv_n_write_io_threads
			    + srv_n_purge_threads
			    + srv_n_page_cleaners
			    /* FTS Parallel Sort */
			    + fts_sort_pll_degree * FTS_NUM_AUX_INDEX
			      * max_connections;
	/* Adjust innodb_buffer_pool_instances */
	if (srv_buf_pool_size >= BUF_POOL_SIZE_THRESHOLD) {

		if (srv_buf_pool_instances == srv_buf_pool_instances_default) {
#if defined(_WIN32) && !defined(_WIN64)
			/* Do not allocate too large of a buffer pool on
			Windows 32-bit systems, which can have trouble
			allocating larger single contiguous memory blocks. */
			srv_buf_pool_instances = ut_min(
				static_cast<ulong>(MAX_BUFFER_POOLS),
				static_cast<ulong>(srv_buf_pool_size
						   / (128 * 1024 * 1024)));
#else /* defined(_WIN32) && !defined(_WIN64) */
			/* Default to 8 instances when size > 1GB. */
			srv_buf_pool_instances = 8;
#endif /* defined(_WIN32) && !defined(_WIN64) */
		}
	} else {
		/* If buffer pool is less than 1 GiB, assume fewer
		threads. Also use only one buffer pool instance. */
		if (srv_buf_pool_instances != srv_buf_pool_instances_default
		    && srv_buf_pool_instances != 1) {
			/* We can't distinguish whether the user has explicitly
			started mysqld with --innodb-buffer-pool-instances=0,
			(srv_buf_pool_instances_default is 0) or has not
			specified that option at all. Thus we have the
			limitation that if the user started with =0, we
			will not emit a warning here, but we should actually
			do so. */
			ib::info()
				<< "Adjusting innodb_buffer_pool_instances"
				" from " << srv_buf_pool_instances << " to 1"
				" since innodb_buffer_pool_size is less than "
				<< BUF_POOL_SIZE_THRESHOLD / (1024 * 1024)
				<< " MiB";
		}

		srv_buf_pool_instances = 1;
	}
	// Adjust srv_buf_pool_chunk_unit.
	if (srv_buf_pool_chunk_unit * srv_buf_pool_instances
	    > srv_buf_pool_size) {
		/* Size unit of buffer pool is larger than srv_buf_pool_size.
		adjust srv_buf_pool_chunk_unit for srv_buf_pool_size. */
		srv_buf_pool_chunk_unit
			= static_cast<ulong>(srv_buf_pool_size)
			  / srv_buf_pool_instances;
		if (srv_buf_pool_size % srv_buf_pool_instances != 0) {
			++srv_buf_pool_chunk_unit;
		}
	}
	// Align srv_buf_pool_size based on srv_buf_pool_chunk_unit
	srv_buf_pool_size = buf_pool_size_align(srv_buf_pool_size);
	// Cap innodb_page_cleaners at srv_buf_pool_instances
	if (srv_n_page_cleaners > srv_buf_pool_instances) {
		/* limit of page_cleaner parallelizability
		is number of buffer pool instances. */
		srv_n_page_cleaners = srv_buf_pool_instances;
	}
	/* Boot the InnoDB server: initialize the related parameters and
	components. */
	srv_boot();

	ib::info() << (ut_crc32_sse2_enabled ? "Using" : "Not using")
		<< " CPU crc32 instructions";
	// InnoDB monitor output file and temporary files
	if (!srv_read_only_mode) {

		mutex_create(LATCH_ID_SRV_MONITOR_FILE,
			     &srv_monitor_file_mutex);

		if (srv_innodb_status) {

			srv_monitor_file_name = static_cast<char*>(
				ut_malloc_nokey(
					strlen(fil_path_to_mysql_datadir)
					+ 20 + sizeof "/innodb_status."));

			sprintf(srv_monitor_file_name,
				"%s/innodb_status." ULINTPF,
				fil_path_to_mysql_datadir,
				os_proc_get_number());

			srv_monitor_file = fopen(srv_monitor_file_name, "w+");

			if (!srv_monitor_file) {
				ib::error() << "Unable to create "
					<< srv_monitor_file_name << ": "
					<< strerror(errno);
				return(srv_init_abort(DB_ERROR));
			}
		} else {

			srv_monitor_file_name = NULL;
			srv_monitor_file = os_file_create_tmpfile(NULL);

			if (!srv_monitor_file) {
				return(srv_init_abort(DB_ERROR));
			}
		}

		mutex_create(LATCH_ID_SRV_DICT_TMPFILE,
			     &srv_dict_tmpfile_mutex);

		srv_dict_tmpfile = os_file_create_tmpfile(NULL);

		if (!srv_dict_tmpfile) {
			return(srv_init_abort(DB_ERROR));
		}

		mutex_create(LATCH_ID_SRV_MISC_TMPFILE,
			     &srv_misc_tmpfile_mutex);

		srv_misc_tmpfile = os_file_create_tmpfile(NULL);

		if (!srv_misc_tmpfile) {
			return(srv_init_abort(DB_ERROR));
		}
	}
	/**
	file_io_threads
	*/
	// innodb_read_io_threads & innodb_write_io_threads
	srv_n_file_io_threads = srv_n_read_io_threads;

	srv_n_file_io_threads += srv_n_write_io_threads;
	// If not in read-only mode, add the log and ibuf IO threads
	if (!srv_read_only_mode) {
		/* Add the log and ibuf IO threads. */
		srv_n_file_io_threads += 2;
	} else {
		ib::info() << "Disabling background log and ibuf IO write"
			<< " threads.";
	}

	ut_a(srv_n_file_io_threads <= SRV_MAX_N_IO_THREADS);
	// Initialize the asynchronous IO subsystem.
	if (!os_aio_init(srv_n_read_io_threads,
			 srv_n_write_io_threads,
			 SRV_MAX_N_PENDING_SYNC_IOS)) {

		ib::error() << "Cannot initialize AIO sub-system";

		return(srv_init_abort(DB_ERROR));
	}
	// Initialize the in-memory tablespace cache (fil_system)
	fil_init(srv_file_per_table ? 50000 : 5000, srv_max_n_open_files);

	double	size;
	char	unit;
	// Express innodb_buffer_pool_size and the chunk size in G or M for the log message
	if (srv_buf_pool_size >= 1024 * 1024 * 1024) {
		size = ((double) srv_buf_pool_size) / (1024 * 1024 * 1024);
		unit = 'G';
	} else {
		size = ((double) srv_buf_pool_size) / (1024 * 1024);
		unit = 'M';
	}

	double	chunk_size;
	char	chunk_unit;

	if (srv_buf_pool_chunk_unit >= 1024 * 1024 * 1024) {
		chunk_size = srv_buf_pool_chunk_unit / 1024.0 / 1024 / 1024;
		chunk_unit = 'G';
	} else {
		chunk_size = srv_buf_pool_chunk_unit / 1024.0 / 1024;
		chunk_unit = 'M';
	}

	ib::info() << "Initializing buffer pool, total size = "
		<< size << unit << ", instances = " << srv_buf_pool_instances
		<< ", chunk size = " << chunk_size << chunk_unit;
	// Create the InnoDB buffer pool; this fails if there is not enough memory
	err = buf_pool_init(srv_buf_pool_size, srv_buf_pool_instances);

	if (err != DB_SUCCESS) {
		ib::error() << "Cannot allocate memory for the buffer pool";

		return(srv_init_abort(DB_ERROR));
	}

	ib::info() << "Completed initialization of buffer pool";

	// Initialize the fsp system and the redo log system
	fsp_init();
	log_init();
	// Create the recovery system and initialize it for a recovery operation
	recv_sys_create();
	recv_sys_init(buf_pool_get_curr_size());
	// Create the lock system at database startup
	lock_sys_create(srv_lock_table_size);
	// Set SRV_START_STATE_LOCK_SYS in the startup state (the lock-timeout thread itself is created later)
	srv_start_state_set(SRV_START_STATE_LOCK_SYS);

	/* Create i/o-handler threads. */
	for (ulint t = 0; t < srv_n_file_io_threads; ++t) {

		n[t] = t;

		os_thread_create(io_handler_thread, n + t, thread_ids + t);
	}

	/* Even in read-only mode there could be flush job generated by
	intrinsic table operations. Initialize the page cleaner. */
	buf_flush_page_cleaner_init();
	// Create the buf_flush_page_cleaner_coordinator thread
	os_thread_create(buf_flush_page_cleaner_coordinator,
			 NULL, NULL);
	// Create the remaining srv_n_page_cleaners - 1 buf_flush_page_cleaner_worker threads
	for (i = 1; i < srv_n_page_cleaners; ++i) {
		os_thread_create(buf_flush_page_cleaner_worker,
				 NULL, NULL);
	}

	/* Make sure page cleaner is active: poll every 10 ms
	until buf_page_cleaner_is_active is set. */
	while (!buf_page_cleaner_is_active) {
		os_thread_sleep(10000);
	}
	// Set SRV_START_STATE_IO (IO threads started) in the startup state
	srv_start_state_set(SRV_START_STATE_IO);

	// Normalize the path separators in srv_data_home
	os_normalize_path(srv_data_home);

	/* Check if the data files (ibdata1, ibdata2, ...) exist, which
	determines whether a new database has to be created. */
	err = srv_sys_space.check_file_spec(
		&create_new_db, MIN_EXPECTED_TABLESPACE_SIZE);

	if (err != DB_SUCCESS) {
		return(srv_init_abort(DB_ERROR));
	}
	// For an existing database, incomplete transactions will have to be rolled back
	srv_startup_is_before_trx_rollback_phase = !create_new_db;

	/* Before creating a new system tablespace, check that no undo
	tablespaces or redo log files already exist. */
	if (create_new_db) {
		err = srv_check_undo_redo_logs_exists();
		if (err != DB_SUCCESS) {
			return(srv_init_abort(DB_ERROR));
		}
		recv_sys_debug_free();
	}

	/* Open or create the data files. */
	ulint	sum_of_new_sizes;
	// Open or create the ibdata files; for an existing database, also read flushed_lsn from ibdata1
	err = srv_sys_space.open_or_create(
		false, create_new_db, &sum_of_new_sizes, &flushed_lsn);

	switch (err) {
	case DB_SUCCESS:
		break;
	case DB_CANNOT_OPEN_FILE:
		ib::error()
			<< "Could not open or create the system tablespace. If"
			" you tried to add new data files to the system"
			" tablespace, and it failed here, you should now"
			" edit innodb_data_file_path in my.cnf back to what"
			" it was, and remove the new ibdata files InnoDB"
			" created in this failed attempt. InnoDB only wrote"
			" those files full of zeros, but did not yet use"
			" them in any way. But be careful: do not remove"
			" old data files which contain your precious data!";
		/* fall through */
	default:
		/* Other errors might come from Datafile::validate_first_page() */
		return(srv_init_abort(err));
	}

	dirnamelen = strlen(srv_log_group_home_dir);
	ut_a(dirnamelen < (sizeof logfilename) - 10 - sizeof "ib_logfile");
	memcpy(logfilename, srv_log_group_home_dir, dirnamelen);

	/* Add a path separator if needed. */
	if (dirnamelen && logfilename[dirnamelen - 1] != OS_PATH_SEPARATOR) {
		logfilename[dirnamelen++] = OS_PATH_SEPARATOR;
	}

	srv_log_file_size_requested = srv_log_file_size;
	
	if (create_new_db) {
		// New database: synchronously flush dirty blocks from the tail
		// of the flush list of every buffer pool instance.
		buf_flush_sync_all_buf_pools();
		// Get the current LSN
		flushed_lsn = log_get_lsn();
		// Create the redo log files
		err = create_log_files(
			logfilename, dirnamelen, flushed_lsn, logfile0);

		if (err != DB_SUCCESS) {
			return(srv_init_abort(err));
		}
	} else {
		// Existing database: check and open the redo log files
		for (i = 0; i < SRV_N_LOG_FILES_MAX; i++) {
			os_offset_t	size;
			os_file_stat_t	stat_info;

			sprintf(logfilename + dirnamelen,
				"ib_logfile%u", i);
			// Get the status of the log file
			err = os_file_get_status(
				logfilename, &stat_info, false,
				srv_read_only_mode);

			if (err == DB_NOT_FOUND) {
				if (i == 0) {
					if (flushed_lsn
					    < static_cast<lsn_t>(1000)) {
						ib::error()
							<< "Cannot create"
							" log files because"
							" data files are"
							" corrupt or the"
							" database was not"
							" shut down cleanly"
							" after creating"
							" the data files.";
						return(srv_init_abort(
							DB_ERROR));
					}

					err = create_log_files(
						logfilename, dirnamelen,
						flushed_lsn, logfile0);

					if (err != DB_SUCCESS) {
						return(srv_init_abort(err));
					}

					create_log_files_rename(
						logfilename, dirnamelen,
						flushed_lsn, logfile0);

					/* Suppress the message about
					crash recovery. */
					flushed_lsn = log_get_lsn();
					goto files_checked;
				} else if (i < 2) {
					/* must have at least 2 log files */
					ib::error() << "Only one log file"
						" found.";
					return(srv_init_abort(err));
				}

				/* opened all files */
				break;
			}
			// Check the access mode of the log file
			if (!srv_file_check_mode(logfilename)) {
				return(srv_init_abort(DB_ERROR));
			}
			// Open the redo log file
			err = open_log_file(&files[i], logfilename, &size);

			if (err != DB_SUCCESS) {
				return(srv_init_abort(err));
			}

			ut_a(size != (os_offset_t) -1);
			
			if (size & ((1 << UNIV_PAGE_SIZE_SHIFT) - 1)) {

				ib::error() << "Log file " << logfilename
					<< " size " << size << " is not a"
					" multiple of innodb_page_size";
				return(srv_init_abort(DB_ERROR));
			}

			size >>= UNIV_PAGE_SIZE_SHIFT;

			if (i == 0) {
				srv_log_file_size = size;
			} else if (size != srv_log_file_size) {

				ib::error() << "Log file " << logfilename
					<< " is of different size "
					<< (size << UNIV_PAGE_SIZE_SHIFT)
					<< " bytes than other log files "
					<< (srv_log_file_size
					    << UNIV_PAGE_SIZE_SHIFT)
					<< " bytes!";
				return(srv_init_abort(DB_ERROR));
			}
		}
		// Number of log files found
		srv_n_log_files_found = i;

		/* Create the in-memory file space objects. */
		sprintf(logfilename + dirnamelen, "ib_logfile%u", 0);

		/* Disable the doublewrite buffer for log files. */
		fil_space_t*	log_space = fil_space_create(
			"innodb_redo_log",
			SRV_LOG_SPACE_FIRST_ID,
			fsp_flags_set_page_size(0, univ_page_size),
			FIL_TYPE_LOG);

		ut_a(fil_validate());
		ut_a(log_space);

		/* srv_log_file_size is measured in pages; if page size is 16KB,
		then we have a limit of 64TB on 32 bit systems */
		ut_a(srv_log_file_size <= ULINT_MAX);
		// Add each log file to the log file space
		for (unsigned j = 0; j < i; j++) {
			sprintf(logfilename + dirnamelen, "ib_logfile%u", j);

			if (!fil_node_create(logfilename,
					     (ulint) srv_log_file_size,
					     log_space, false, false)) {
				return(srv_init_abort(DB_ERROR));
			}
		}
		// Initialize the redo log group
		if (!log_group_init(0, i, srv_log_file_size * UNIV_PAGE_SIZE,
				    SRV_LOG_SPACE_FIRST_ID)) {
			return(srv_init_abort(DB_ERROR));
		}
	}

files_checked:
	/* Open all log files and data files in the system
	tablespace: we keep them open until database
	shutdown */
	fil_open_log_and_system_tablespace_files();
	// Open the undo tablespaces; once all undo files have been found
	// and opened, add them all to the file system (fil_system)
	err = srv_undo_tablespaces_init(
		create_new_db,
		srv_undo_tablespaces,
		&srv_undo_tablespaces_open);

	/* If the force recovery is set very high then we carry on regardless
	of all errors. Basically this is fingers crossed mode.
	What follows deals with data recovery. */

	if (err != DB_SUCCESS
	    && srv_force_recovery < SRV_FORCE_NO_UNDO_LOG_SCAN) {

		return(srv_init_abort(err));
	}

	/* Initialize objects used by dict stats gathering thread, which
	can also be used by recovery if it tries to drop some table */
	if (!srv_read_only_mode) {
		dict_stats_thread_init();
	}
	// Initialize the file_format_max variable.
	trx_sys_file_format_init();
	// Create the trx_sys instance and initialize its purge_queue and mutex
	trx_sys_create();

	if (create_new_db) {

		ut_a(!srv_read_only_mode);

		mtr_start(&mtr);

		bool ret = fsp_header_init(0, sum_of_new_sizes, &mtr);

		mtr_commit(&mtr);

		if (!ret) {
			return(srv_init_abort(DB_ERROR));
		}

		/* To maintain backward compatibility we create only
		the first rollback segment before the double write buffer.
		All the remaining rollback segments will be created later,
		after the double write buffer has been created. */
		trx_sys_create_sys_pages();

		purge_queue = trx_sys_init_at_db_start();

		DBUG_EXECUTE_IF("check_no_undo",
				ut_ad(purge_queue->empty());
				);

		/* The purge system needs to create the purge view and
		therefore requires that the trx_sys is inited. */

		trx_purge_sys_create(srv_n_purge_threads, purge_queue);

		err = dict_create();

		if (err != DB_SUCCESS) {
			return(srv_init_abort(err));
		}

		buf_flush_sync_all_buf_pools();

		flushed_lsn = log_get_lsn();

		fil_write_flushed_lsn(flushed_lsn);

		create_log_files_rename(
			logfilename, dirnamelen, flushed_lsn, logfile0);

	} else {

		/* Check if we support the max format that is stamped
		on the system tablespace.
		Note:  We are NOT allowed to make any modifications to
		the TRX_SYS_PAGE_NO page before recovery  because this
		page also contains the max_trx_id etc. important system
		variables that are required for recovery.  We need to
		ensure that we return the system to a state where normal
		recovery is guaranteed to work. We do this by
		invalidating the buffer cache, this will force the
		reread of the page and restoration to its last known
		consistent state, this is REQUIRED for the recovery
		process to work. */
		// Check that the max file format stamped on the system tablespace is supported.
		err = trx_sys_file_format_max_check(
			srv_max_file_format_at_startup);

		if (err != DB_SUCCESS) {
			return(srv_init_abort(err));
		}

		/* Invalidate the buffer pool to ensure that we reread
		the page that we read above, during recovery.
		Note that this is not as heavy weight as it seems. At
		this point there will be only ONE page in the buf_LRU
		and there must be no page in the buf_flush list. */
		buf_pool_invalidate();

		/* Scan and locate truncate log files, parse the located files,
		and add the tables to truncate to a central vector for the
		truncate fix-up action after recovery. */
		err = TruncateLogParser::scan_and_parse(srv_log_group_home_dir);
		if (err != DB_SUCCESS) {

			return(srv_init_abort(DB_ERROR));
		}

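		/* We always try to do a recovery, even if the database had
		been shut down normally: this is the normal startup path. */
		/**
		Start recovery from the latest checkpoint (flushed_lsn read from ibdata1 is passed in):
		1. Initialize the red-black tree used to speed up inserts into the flush list during recovery.
		2. Find the latest checkpoint in the log groups.
		3. Read the redo log page holding the latest checkpoint into log_sys->checkpoint_buf.
		4. Get checkpoint_lsn and checkpoint_no.
		5. Read the redo log starting at checkpoint_lsn into the hash table.
		6. Check the tablespaces needed for crash recovery and process the pages in the doublewrite buffer:
		the integrity of the data page corresponding to each doublewrite page is checked, and a corrupt
		page is restored from its doublewrite copy. The background thread recv_writer_thread is also
		started to flush dirty pages from the buffer pool.
		7. Copy log segments from the latest log group to the other groups; currently there is only one log group.
		*/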
		err = recv_recovery_from_checkpoint_start(flushed_lsn);
		// Discard the doublewrite buffer pages collected for recovery
		recv_sys->dblwr.pages.clear();
		// Initialize the data dictionary system and the change buffer
		if (err == DB_SUCCESS) {
			/* Initialize the change buffer. */
			err = dict_boot();
		}

		if (err != DB_SUCCESS) {

			/* A tablespace was not found during recovery. The
			user must force recovery. */

			if (err == DB_TABLESPACE_NOT_FOUND) {

				srv_fatal_error();

				ut_error;
			}

			return(srv_init_abort(DB_ERROR));
		}
		// Create and initialize the transaction system.
		purge_queue = trx_sys_init_at_db_start();

		if (srv_force_recovery < SRV_FORCE_NO_LOG_REDO) {
			/* Apply the hashed log records to the
			respective file pages, for the last batch of
			recv_group_scan_log_recs(). */
			// Apply the redo log records to complete crash recovery.
			recv_apply_hashed_log_recs(TRUE);
			DBUG_PRINT("ib_log", ("apply completed"));

			if (recv_needed_recovery) {
				// Prints e.g. "Last MySQL binlog file position 0 894036112, file name mysql-bin.002128"
				trx_sys_print_mysql_binlog_offset();
			}
		}

		if (recv_sys->found_corrupt_log) {
			ib::warn()
				<< "The log file may have been corrupt and it"
				" is possible that the log scan or parsing"
				" did not proceed far enough in recovery."
				" Please run CHECK TABLE on your InnoDB tables"
				" to check that they are ok!"
				" It may be safest to recover your"
				" InnoDB database from a backup!";
		}

		/* The purge system needs to create the purge view and
		therefore requires that the trx_sys is inited. */
		// Create trx_purge_sys
		trx_purge_sys_create(srv_n_purge_threads, purge_queue);

		/* recv_recovery_from_checkpoint_finish needs trx lists which
		are initialized in trx_sys_init_at_db_start(). */
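		/*
			Finish recovery:
			1. Make sure the recv_writer thread has finished.
			2. Wait for flushing to complete, i.e. for the dirty pages to be flushed.
			3. Wait for the recv_writer thread to terminate.
			4. Free the flush red-black tree.
			5. Roll back all transactions on data dictionary tables so that the
			dictionary tables are not locked. The dictionary latch should
			guarantee that only one dictionary transaction is active at a time.
		*/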
		recv_recovery_from_checkpoint_finish();

		/* Fix-up truncate of tables in the system tablespace
		if server crashed while truncate was active. The non-
		system tables are done after tablespace discovery. Do
		this now because this procedure assumes that no pages
		have changed since redo recovery.  Tablespace discovery
		can do updates to pages in the system tablespace.*/
		// Fix up interrupted truncates of tables in the system tablespace
		err = truncate_t::fixup_tables_in_system_tablespace();

		if (srv_force_recovery < SRV_FORCE_NO_IBUF_MERGE) {
			/* Open or Create SYS_TABLESPACES and SYS_DATAFILES
			so that tablespace names and other metadata can be
			found. */
			srv_sys_tablespaces_open = true;
			// Create or check the SYS_TABLESPACES and SYS_DATAFILES system tables
			err = dict_create_or_check_sys_tablespace();
			if (err != DB_SUCCESS) {
				return(srv_init_abort(err));
			}

			/* The following call is necessary for the insert
			buffer to work with multiple tablespaces. We must
			know the mapping between space id's and .ibd file
			names.

			In a crash recovery, we check that the info in data
			dictionary is consistent with what we already know
			about space id's from the calls to fil_ibd_load().

			In a normal startup, we create the space objects for
			every table in the InnoDB data dictionary that has
			an .ibd file.

			We also determine the maximum tablespace id used.

			The 'validate' flag indicates that when a tablespace
			is opened, we also read the header page and validate
			the contents to the data dictionary. This is time
			consuming, especially for databases with lots of ibd
			files.  So only do it after a crash and not forcing
			recovery.  Open rw transactions at this point is not
			a good reason to validate. */
			bool validate = recv_needed_recovery
				&& srv_force_recovery == 0;

			dict_check_tablespaces_and_store_max_id(validate);
		}

		/* Rotate the encryption key for recovery. It's because
		server could crash in middle of key rotation. Some tablespace
		didn't complete key rotation. Here, we will resume the
		rotation. */
		if (!srv_read_only_mode
		    && srv_force_recovery < SRV_FORCE_NO_LOG_REDO) {
			fil_encryption_rotate();
		}


		/* Fix-up truncate of table if server crashed while truncate
		was active. */
		err = truncate_t::fixup_tables_in_non_system_tablespace();

		if (err != DB_SUCCESS) {
			return(srv_init_abort(err));
		}

		if (!srv_force_recovery
		    && !recv_sys->found_corrupt_log
		    && (srv_log_file_size_requested != srv_log_file_size
			|| srv_n_log_files_found != srv_n_log_files)) {

			/* Prepare to replace the redo log files. */

			if (srv_read_only_mode) {
				ib::error() << "Cannot resize log files"
					" in read-only mode.";
				return(srv_init_abort(DB_READ_ONLY));
			}

			/* Prepare to delete the old redo log files */
			flushed_lsn = srv_prepare_to_delete_redo_log_files(i);

			/* Prohibit redo log writes from any other
			threads until creating a log checkpoint at the
			end of create_log_files(). */
			ut_d(recv_no_log_write = true);
			ut_ad(!buf_pool_check_no_pending_io());

			RECOVERY_CRASH(3);

			/* Stamp the LSN to the data files. */
			fil_write_flushed_lsn(flushed_lsn);

			RECOVERY_CRASH(4);

			/* Close and free the redo log files, so that
			we can replace them. */
			fil_close_log_files(true);

			RECOVERY_CRASH(5);

			/* Free the old log file space. */
			log_group_close_all();

			ib::warn() << "Starting to delete and rewrite log"
				" files.";

			srv_log_file_size = srv_log_file_size_requested;

			err = create_log_files(
				logfilename, dirnamelen, flushed_lsn,
				logfile0);

			if (err != DB_SUCCESS) {
				return(srv_init_abort(err));
			}

			create_log_files_rename(
				logfilename, dirnamelen, flushed_lsn,
				logfile0);
		}
		// Roll back uncommitted (incomplete) transactions; this runs in a background thread.
		recv_recovery_rollback_active();

		/* It is possible that file_format tag has never
		been set. In this case we initialize it to minimum
		value.  Important to note that we can do it ONLY after
		we have finished the recovery process so that the
		image of TRX_SYS_PAGE_NO is not stale. */
		trx_sys_file_format_tag_init();
	}

	if (!create_new_db) {
		/* Check and reset any no-redo rseg slot on disk used by
		pre-5.7.2 redo rseg with no data to purge. */
		trx_rseg_reset_pending();
	}

	if (!create_new_db && sum_of_new_sizes > 0) {
		/* New data file(s) were added */
		mtr_start(&mtr);

		fsp_header_inc_size(0, sum_of_new_sizes, &mtr);

		mtr_commit(&mtr);

		/* Immediately write the log record about increased tablespace
		size to disk, so that it is durable even if mysqld would crash
		quickly */

		log_buffer_flush_to_disk();
	}

	/* Open temp-tablespace and keep it open until shutdown. */
	err = srv_open_tmp_tablespace(create_new_db, &srv_tmp_space);

	if (err != DB_SUCCESS) {
		return(srv_init_abort(err));
	}

	/* Create the doublewrite buffer to a new tablespace */
	if (buf_dblwr == NULL && !buf_dblwr_create()) {
		return(srv_init_abort(DB_ERROR));
	}

	/* Here the double write buffer has already been created and so
	any new rollback segments will be allocated after the double
	write buffer. The default segment should already exist.
	We create the new segments only if it's a new database or
	the database was shutdown cleanly. */

	/* Note: When creating the extra rollback segments during an upgrade
	we violate the latching order, even if the change buffer is empty.
	We make an exception in sync0sync.cc and check srv_is_being_started
	for that violation. It cannot create a deadlock because we are still
	running in single threaded mode essentially. Only the IO threads
	should be running at this stage. */

	/* Deprecate innodb_undo_logs.  But still use it if it is set to
	non-default and innodb_rollback_segments is default. */
	ut_a(srv_rollback_segments > 0);
	ut_a(srv_rollback_segments <= TRX_SYS_N_RSEGS);
	ut_a(srv_undo_logs > 0);
	ut_a(srv_undo_logs <= TRX_SYS_N_RSEGS);
	if (srv_undo_logs < TRX_SYS_N_RSEGS) {
		ib::warn() << deprecated_undo_logs;
		if (srv_rollback_segments == TRX_SYS_N_RSEGS) {
			srv_rollback_segments = srv_undo_logs;
		}
	}

	/* The number of rsegs that exist in InnoDB is given by status
	variable srv_available_undo_logs. The number of rsegs to use can
	be set using the dynamic global variable srv_rollback_segments. */
	// Create the rollback segments
	srv_available_undo_logs = trx_sys_create_rsegs(
		srv_undo_tablespaces, srv_rollback_segments, srv_tmp_undo_logs);

	if (srv_available_undo_logs == ULINT_UNDEFINED) {
		/* Can only happen if server is read only. */
		ut_a(srv_read_only_mode);
		srv_rollback_segments = ULONG_UNDEFINED;
	} else if (srv_available_undo_logs < srv_rollback_segments
		   && !srv_force_recovery && !recv_needed_recovery) {
		ib::error() << "System or UNDO tablespace is running of out"
			    << " of space";
		/* Should due to out of file space. */
		return(srv_init_abort(DB_ERROR));
	}

	srv_startup_is_before_trx_rollback_phase = false;

	if (!srv_read_only_mode) {
		/* Create the thread which watches the timeouts
		for lock waits (lock_wait_timeout_thread). */
		os_thread_create(
			lock_wait_timeout_thread,
			NULL, thread_ids + 2 + SRV_MAX_N_IO_THREADS);

		/* Create the thread which warns of long semaphore waits
		(srv_error_monitor_thread). */
		os_thread_create(
			srv_error_monitor_thread,
			NULL, thread_ids + 3 + SRV_MAX_N_IO_THREADS);

		/* Create the thread which prints InnoDB monitor info
		(srv_monitor_thread). */
		os_thread_create(
			srv_monitor_thread,
			NULL, thread_ids + 4 + SRV_MAX_N_IO_THREADS);

		srv_start_state_set(SRV_START_STATE_MONITOR);
	}

	/* Create the SYS_FOREIGN and SYS_FOREIGN_COLS system tables */
	err = dict_create_or_check_foreign_constraint_tables();
	if (err != DB_SUCCESS) {
		return(srv_init_abort(err));
	}

	/* Create the SYS_TABLESPACES system table */
	err = dict_create_or_check_sys_tablespace();
	if (err != DB_SUCCESS) {
		return(srv_init_abort(err));
	}
	srv_sys_tablespaces_open = true;

	/* Create the SYS_VIRTUAL system table */
	err = dict_create_or_check_sys_virtual();
	if (err != DB_SUCCESS) {
		return(srv_init_abort(err));
	}

	srv_is_being_started = false;

	ut_a(trx_purge_state() == PURGE_STATE_INIT);

	/* Create the master thread which does purge and other utility
	operations. */

	if (!srv_read_only_mode) {

		os_thread_create(
			srv_master_thread,
			NULL, thread_ids + (1 + SRV_MAX_N_IO_THREADS));

		srv_start_state_set(SRV_START_STATE_MASTER);
	}
	// Create the purge coordinator thread and the purge worker threads
	if (!srv_read_only_mode
	    && srv_force_recovery < SRV_FORCE_NO_BACKGROUND) {

		os_thread_create(
			srv_purge_coordinator_thread,
			NULL, thread_ids + 5 + SRV_MAX_N_IO_THREADS);

		ut_a(UT_ARR_SIZE(thread_ids)
		     > 5 + srv_n_purge_threads + SRV_MAX_N_IO_THREADS);

		/* We've already created the purge coordinator thread above. */
		for (i = 1; i < srv_n_purge_threads; ++i) {
			os_thread_create(
				srv_worker_thread, NULL,
				thread_ids + 5 + i + SRV_MAX_N_IO_THREADS);
		}
		// Wait for the purge threads to start
		srv_start_wait_for_purge_to_start();

		srv_start_state_set(SRV_START_STATE_PURGE);
	} else {
		purge_sys->state = PURGE_STATE_DISABLED;
	}

	/* Wake up the main loop of the page cleaner. */
	os_event_set(buf_flush_event);

	sum_of_data_file_sizes = srv_sys_space.get_sum_of_sizes();
	ut_a(sum_of_new_sizes != ULINT_UNDEFINED);

	tablespace_size_in_header = fsp_header_get_tablespace_size();

	if (!srv_read_only_mode
	    && !srv_sys_space.can_auto_extend_last_file()
	    && sum_of_data_file_sizes != tablespace_size_in_header) {

		ib::error() << "Tablespace size stored in header is "
			<< tablespace_size_in_header << " pages, but the sum"
			" of data file sizes is " << sum_of_data_file_sizes
			<< " pages";

		if (srv_force_recovery == 0
		    && sum_of_data_file_sizes < tablespace_size_in_header) {
			/* This is a fatal error, the tail of a tablespace is
			missing */

			ib::error()
				<< "Cannot start InnoDB."
				" The tail of the system tablespace is"
				" missing. Have you edited"
				" innodb_data_file_path in my.cnf in an"
				" inappropriate way, removing"
				" ibdata files from there?"
				" You can set innodb_force_recovery=1"
				" in my.cnf to force"
				" a startup if you are trying"
				" to recover a badly corrupt database.";

			return(srv_init_abort(DB_ERROR));
		}
	}

	if (!srv_read_only_mode
	    && srv_sys_space.can_auto_extend_last_file()
	    && sum_of_data_file_sizes < tablespace_size_in_header) {

		ib::error() << "Tablespace size stored in header is "
			<< tablespace_size_in_header << " pages, but the sum"
			" of data file sizes is only "
			<< sum_of_data_file_sizes << " pages";

		if (srv_force_recovery == 0) {

			ib::error()
				<< "Cannot start InnoDB. The tail of"
				" the system tablespace is"
				" missing. Have you edited"
				" innodb_data_file_path in my.cnf in an"
				" InnoDB: inappropriate way, removing"
				" ibdata files from there?"
				" You can set innodb_force_recovery=1"
				" in my.cnf to force"
				" InnoDB: a startup if you are trying to"
				" recover a badly corrupt database.";

			return(srv_init_abort(DB_ERROR));
		}
	}

	if (srv_print_verbose_log) {
		ib::info() << INNODB_VERSION_STR
			<< " started; log sequence number "
			<< srv_start_lsn;
	}

	if (srv_force_recovery > 0) {
		ib::info() << "!!! innodb_force_recovery is set to "
			<< srv_force_recovery << " !!!";
	}

	if (srv_force_recovery == 0) {
		/* In the insert buffer we may have even bigger tablespace
		id's, because we may have dropped those tablespaces, but
		insert buffer merge has not had time to clean the records from
		the ibuf tree. */

		ibuf_update_max_tablespace_id();
	}

	if (!srv_read_only_mode) {
		if (create_new_db) {
			srv_buffer_pool_load_at_startup = FALSE;
		}

		/* Create the buffer pool dump/load thread */
		os_thread_create(buf_dump_thread, NULL, NULL);

		/* Create the dict stats gathering thread */
		os_thread_create(dict_stats_thread, NULL, NULL);

		/* Create the thread that will optimize the FTS sub-system. */
		fts_optimize_init();

		srv_start_state_set(SRV_START_STATE_STAT);
	}

	/* Create the buffer pool resize thread */
	os_thread_create(buf_resize_thread, NULL, NULL);

	srv_was_started = TRUE;
	return(DB_SUCCESS);
}

  

posted @ 2022-04-13 10:25  卷毛狒狒