mysql中order by和limit混用出现的问题

mysql中order by和limit混用出现的问题

硅谷探秘者

2021-10-14发表数据库 0 0 1358

在Mysql中我们常常用order by来进行排序，使用limit来进行分页，当需要先排序后分页时我们往往使用类似的写法select * from table order by column limt M,N。但是这种写法却隐藏着较深的使用陷阱。在排序字段有数据重复的情况下，会很容易出现排序结果与预期不一致的问题。

一、案例

mysql版本：

mysql> select version();
+------------+
| version()  |
+------------+
| 5.7.35-log |
+------------+
1 row in set (0.00 sec)

表结构:

CREATE TABLE `user` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(255) DEFAULT NULL,
  `create_time` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=90 DEFAULT CHARSET=utf8;

数据

INSERT INTO `test`.`user` (`id`, `name`, `create_time`) VALUES (1, '1', '2017-04-01 00:01:00');
INSERT INTO `test`.`user` (`id`, `name`, `create_time`) VALUES (2, '1', '2017-04-01 00:00:00');
INSERT INTO `test`.`user` (`id`, `name`, `create_time`) VALUES (3, '1', '2017-04-01 00:00:00');
INSERT INTO `test`.`user` (`id`, `name`, `create_time`) VALUES (4, '2', '2017-04-01 00:00:00');
INSERT INTO `test`.`user` (`id`, `name`, `create_time`) VALUES (5, '3', '2017-04-01 00:00:00');
INSERT INTO `test`.`user` (`id`, `name`, `create_time`) VALUES (6, '6', '2017-04-01 00:00:00');
INSERT INTO `test`.`user` (`id`, `name`, `create_time`) VALUES (7, '7', '2017-04-01 00:00:00');
INSERT INTO `test`.`user` (`id`, `name`, `create_time`) VALUES (8, '8', '2017-04-01 00:00:00');

数据特点，大部分数据create_time都相同。
现在想根据创建时间升序查询user表，并且分页查询，每页2条，那很容易写出sql为：

select * from user order by create_time limit M,2;

而实际查询过程中会发现分页会出现重复数据。
第一页数据：

mysql> select * from user order by create_time limit 2;
+----+------+---------------------+
| id | name | create_time         |
+----+------+---------------------+
|  8 | 8    | 2017-04-01 00:00:00 |
|  2 | 1    | 2017-04-01 00:00:00 |
+----+------+---------------------+
2 rows in set (0.00 sec)

第四页数据：

mysql> select * from user order by create_time limit 6,2;
+----+------+---------------------+
| id | name | create_time         |
+----+------+---------------------+
|  8 | 8    | 2017-04-01 00:00:00 |
|  1 | 1    | 2017-04-01 00:01:00 |
+----+------+---------------------+
2 rows in set (0.00 sec)

发现第一页和第四页出现了重复的数据。

二、分析

官方文档：https://dev.mysql.com/doc/refman/5.7/en/limit-optimization.html

从官方文档得知，这其实是mysql对limit做的优化。

官方文档里提到，如果你将Limit row_count与order by混用，mysql会找到排序的row_count行后立马返回，而不是排序整个查询结果再返回。如果是通过索引排序，会非常快；如果是文件排序，所有匹配查询的行（不带Limit的）都会被选中，被选中的大多数或者全部会被排序，直到limit要求的row_count被找到了。如果limit要求的row_count行一旦被找到，Mysql就不会排序结果集中剩余的行了。

查看执行计划：

mysql> explain select * from user order by create_time limit 6,2;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra          |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------+
|  1 | SIMPLE      | user  | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    8 |   100.00 | Using filesort |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------+
1 row in set, 1 warning (0.01 sec)

从执行计划中可以看出，该sql用的是文件排序，排序字段没有添加任何索引。
为什么文件排序会出现这种问题呢？主要还是取决与filesort的排序算法。filesort采用的是快速排序和堆排序，而这两种排序都是不稳定的排序。所以排序值相同的时候无法保证排序后的顺序，所以也无法保证排序后结果集的顺序。

三、解决方案

官方也给出了解决方案，可以在ORDER BY子句中包含额外的列以使顺序具有确定性。例如，如果id值是唯一的，您可以通过如下排序：

mysql> select * from user order by create_time,id limit 2;
+----+------+---------------------+
| id | name | create_time         |
+----+------+---------------------+
|  2 | 1    | 2017-04-01 00:00:00 |
|  3 | 1    | 2017-04-01 00:00:00 |
+----+------+---------------------+
2 rows in set (0.00 sec)

另外也可以通过给排序字段加索引来避免文件排序。

fixed

没有一个冬天不可逾越，没有一个春天不会来临。最慢的步伐不是跬步，而是徘徊，最快的脚步不是冲刺，而是坚持。