MySQL中slave端Retrieved

143次阅读

共计 9078 个字符，预计需要花费 23 分钟才能阅读完成。

丸趣 TV 小编给大家分享一下 MySQL 中 slave 端 Retrieved_Gtid_Set 读取改进的示例分析，相信大部分人都还不怎么了解，因此分享这篇文章给大家参考一下，希望大家阅读完这篇文章后大有收获，下面让我们一起去了解一下吧！

一、问题由来

今天朋友问我这样一个问题 @K.I.S.S，在官方文档中有这样一段描述：

When using GTIDs, the slave tells the master which transactions it has already received, executed, or both. To compute this set, it reads 
the global value of gtid_executed and the value of the Retrieved_gtid_set column from SHOW SLAVE STATUS. The GTID of the last transmitted 
transaction is included in Retrieved_gtid_set only when the full transaction is received. The slave computes the following set:
UNION(@@global.gtid_executed, Retrieved_gtid_set)
Prior to MySQL 5.7.5, the GTID of the last transmitted transaction was included in Retrieved_gtid_set even if the transaction was only 
partially transmitted, and the last received GTID was subtracted from this set. (Bug #17943188) Thus, the slave computed the following set: UNION(@@global.gtid_executed, Retrieved_gtid_set - last_received_GTID)

问为什么要减去 last_received_GTID。

二、查看 Bug #17943188 的 commit

对于这个问题我先查看了一下这个 bug 到底修复了什么问题如下：

 BUG#17943188 SHOW SLAVE STATUS/RETRIEVED_GTID_SET MAY HAVE PARTIAL TRX OR MISS COMPLETE TRX
 
 Problem:
 =======
 
 The SHOW SLAVE STATUS command contains the column RETRIEVED_GTID_SET.
 This is supposed to contain the set of GTIDs that exist in the relay
 log. However, the field is updated when the slave receiver thread
 (I/O thread) receives a Gtid_log_event, which happens at the beginning
 of the transaction.
 
 If the I/O thread gets disconnected in the middle of a transaction,
 RETRIEVED_GTID_SET can contain a GTID for a transaction that is only
 partially received in the relay log. This transaction will
 subsequently be rolled back, so it is wrong to pretend that the
 transaction is there.
 
 Typical fail-over algorithms use RETRIEVED_GTID_SET to determine which
 slave has received the most transactions to promote the slave to a
 master. This is true for e.g. the mysqlfailover utility.
 
 When RETRIEVED_GTID_SET can contain partially transmitted transactions,
 the fail-over utility can choose the wrong slave to promote. This can
 lead to data corruption later.
 
 This means that even if semi-sync is enabled, transactions that have
 been acknowledged by one slave can be lost.
 
 Fix:
 ===
 
 It was implemented a transaction boundaries parser that will give
 information about transaction boundaries of an event stream based on
 the event types and their queries (when they are Query_log_event).
 
 As events are queued by the I/O thread, it feeds the Master_info
 transaction boundary parser. The slave I/O recovery also uses the
 transaction parser to determine if a given GTID can be added to the
 Retrieved_Gtid_Set or not.
 
 When the event parser is in GTID state because a Gtid_log_event was
 queued, the event s GTID isn t added to the retrieved list yet.
 It is stored in an auxiliary GTID variable.
 
 After flushing an event into the relay log, the IO thread verifies if the transaction parser is not inside a transaction anymore (meaning
 that the last event of the transaction has been flushed).
 
 If transaction parser is outside a transaction, the I/O thread
 verifies if a GTID was stored in the start of the transaction, adding
 it to the retrieved list, ensuring that all the transaction has arrived and was flushed to the relay log.
 
 Also, before this patch, after the I/O thread flushed a single received
 event into the relaylog, it was possible to rotate the relaylog if the
 current relaylog file size exceeded max_binlog_size/max_relaylog_size.
 After this patch, when GTIDs are enabled we only allow this rotation by
 size if the transaction parser is not in the middle of a transaction.
 
 Note: The current patch removed the changes for BUG#17280176, as it also dealt with similar problem in a different way.

大概就是说对于某些 I / O 线程并没有完整传输的 Gtid 事物记录到了 RETRIEVED_GTID_SET 中这会导致比较严重问题，因为某些监控工具根据这个来判断是否切换之类的，因此我们加入了一个事物边界分析器来判断事物是否完整传输，如果完整传输才记录到 RETRIEVED_GTID_SET 中这是 5.7.5 过后加入的。总的说来为了进行两个问题的修复：

1、最后一个 gtid 事物是否完整。

2、跨越多个 relay log 的 binlog 得到正确的 gtid 集合。

三、修改了什么

其实修改得非常多，不一一列举，有兴趣可以自己看看 commit 9dab9dad975d09b8f37f33bf3c522d36fdf1d0f9，这里列举几个我看了的地方。

1、在 MYSQL_BIN_LOG::init_gtid_sets 加入了如下逻辑

/*
 If we use GTIDs and have partial transactions on the relay log,
 must check if it ends on next relay log files.
 We also need to feed the boundary parser with the rest of the
 relay log to put it in the correct state before receiving new
 events from the master in the case of GTID auto positioning be
 disabled.
 */ if (is_relay_log)
 { /*
 Suppose the following relaylog:
 rl-bin.000001 | rl-bin.000002 | rl-bin.000003 | rl-bin-000004
 ---------------+---------------+---------------+---------------
 PREV_GTIDS | PREV_GTIDS | PREV_GTIDS | PREV_GTIDS
 (empty) | (UUID:1) | (UUID:1) | (UUID:1)
 ---------------+---------------+---------------+---------------
 GTID(UUID:1) | QUERY(INSERT) | QUERY(INSERT) | XID
 ---------------+---------------+---------------+---------------
 QUERY(CREATE |
 TABLE t1 ...) |
 ---------------+
 GTID(UUID:2) |
 ---------------+
 QUERY(BEGIN) |
 ---------------+
 As it is impossible to determine the current Retrieved_Gtid_Set by only
 looking to the PREVIOUS_GTIDS on the last relay log file, and scanning
 events on it, we tried to find a relay log file that contains at least
 one GTID event during the backwards search.
 In the example, we will find a GTID only in rl-bin.000001, as the
 UUID:2 transaction was spanned across 4 relay log files.
 The transaction spanning can be caused by  FLUSH RELAY LOGS  commands
 on slave while it is queuing the transaction.
 So, in order to correctly add UUID:2 into Retrieved_Gtid_Set, we need
 to parse the relay log starting on the file we found the last GTID
 queued to know if the transaction was fully retrieved or not.
 */ /*
 Adjust the reverse iterator to point to the relaylog file we
 need to start parsing, as it was incremented after generating
 the relay log file name.
 */ rit--;// 回退一个文件   因为前面为 rit++ 做操作  exp:1 2 3 {4} 5 6 7 ---  1 2 3 4 {5} 6 7 /* Reset the transaction parser before feeding it with events */ trx_parser- reset();
 gtid_partial_trx- clear();
 DBUG_PRINT(info , ( Iterating forwards through relay logs,    updating the Retrieved_Gtid_Set and updating    IO thread trx parser before start.)); for (it= find(filename_list.begin(), filename_list.end(), *rit);
 it != filename_list.end(); it++)// 从匹配的位置继续向后  { const char *filename= it- c_str();
 DBUG_PRINT(info , ( filename= %s , filename)); if (read_gtids_and_update_trx_parser_from_relaylog(filename, all_gtids, true, trx_parser,
 gtid_partial_trx))
 {
 error= 1; goto end;
 }
 }
 }
 }

其实这一块也说明了解决的什么问题，我们发现一个事物的 binlog event 在 relay log 中是可以跨文件的。而在 bin log 中是不能跨文件的。仅仅判断最后一个 gtid priv event 是不正确的。因此需要这样修改。

2、其次加入了边界分析器 Transaction_boundary_parser 类

这个类是完全新加入的，这里是其中的一些状态：

num enum_event_boundary_type { EVENT_BOUNDARY_TYPE_ERROR= -1, /* Gtid_log_event */ EVENT_BOUNDARY_TYPE_GTID= 0, /* Query_log_event(BEGIN), Query_log_event(XA START) */ EVENT_BOUNDARY_TYPE_BEGIN_TRX= 1, /* Xid, Query_log_event(COMMIT), Query_log_event(ROLLBACK), XA_Prepare_log_event */ EVENT_BOUNDARY_TYPE_END_TRX= 2, /* Query_log_event(XA ROLLBACK) */ EVENT_BOUNDARY_TYPE_END_XA_TRX= 3, /* User_var, Intvar and Rand */ EVENT_BOUNDARY_TYPE_PRE_STATEMENT= 4, /*
 All other Query_log_events and all other DML events
 (Rows, Load_data, etc.)
 */ EVENT_BOUNDARY_TYPE_STATEMENT= 5, /*
 All non DDL/DML events: Format_desc, Rotate, Incident,
 Previous_gtids, Stop, etc.
 */ EVENT_BOUNDARY_TYPE_IGNORE= 6 }; /*
 Internal states for parsing a stream of events.
 DDL has the format:
 DDL-1: [GTID]
 DDL-2: [User] [Intvar] [Rand]
 DDL-3: Query
 DML has the format:
 DML-1: [GTID]
 DML-2: Query(BEGIN)
 DML-3: Statements
 DML-4: (Query(COMMIT) | Query([XA] ROLLBACK) | Xid | Xa_prepare)
 */ enum enum_event_parser_state { /* NONE is set after DDL-3 or DML-4 */ EVENT_PARSER_NONE, /* GTID is set after DDL-1 or DML-1 */ EVENT_PARSER_GTID, /* DDL is set after DDL-2 */ EVENT_PARSER_DDL, /* DML is set after DML-2 */ EVENT_PARSER_DML, /* ERROR is set whenever the above pattern is not followed */ EVENT_PARSER_ERROR
 };

3、read_gtids_and_update_trx_parser_from_relaylog 函数新增

这个函数是完全新加入的就是为了完成所说的功能，在 read_gtids_and_update_trx_parser_from_relaylog 中我看到对文件所有 event 进行了读取，并且用 switch 进行了不同 event 类型的处理，但是具体没有细看。但是最后看到对于对于是否加入 retrieve gtid 的判断如下：

 /*
 If we reached the end of a transaction after storing it s GTID
 in gtid_partial_trx variable, it is time to add this GTID to the
 retrieved_gtids set because the transaction is complete and there is no
 need for asking this transaction again.
 */ if (trx_parser- is_not_inside_transaction())
 { if (!gtid_partial_trx- is_empty())
 { DBUG_PRINT( info , ( Adding Gtid to Retrieved_Gtid_Set as the    transaction was completed at    relaylog file  %s : Gtid(%d, %lld). ,
 filename, gtid_partial_trx- sidno,
 gtid_partial_trx- gno));
 retrieved_gtids- _add_gtid(gtid_partial_trx- sidno,
 gtid_partial_trx- gno);
 gtid_partial_trx- clear();
 }
 }

四、初始化的时候 retrieve gtid 到底如何计算

实际上就在现在看来应该就是读取 relay_log 的最后一个 gtid 事物 (gtid event 或者 gtid priv event) 同时需要判断此 gtid 事物是否完整。对于官方文档给出的 UNION(@@global.gtid_executed, Retrieved_gtid_set – last_received_GTID)，个人觉得这里的 last_received_GTID 应该是经过判断的，如果完整则不减，如果不完整则减去。

以上是“MySQL 中 slave 端 Retrieved_Gtid_Set 读取改进的示例分析”这篇文章的所有内容，感谢各位的阅读！相信大家都有了一定的了解，希望分享的内容对大家有所帮助，如果还想学习更多知识，欢迎关注丸趣 TV 行业资讯频道！

正文完