Python如何識別 MySQL 中的冗餘索引

2022-10-20 14:02:59

前言

最近在搞標準化巡檢平臺，通過 MySQL 的後設資料分析一些潛在的問題。冗餘索引也是一個非常重要的巡檢目，表中索引過多，會導致表空間佔用較大，索引的數量與表的寫入速度與索引數成線性關係(微秒級)，如果發現有冗餘索引，建議立即稽核刪除。

PS：之前見過一個客戶的資料庫上面竟然建立 300 多個索引！？當時的想法是 “他們在玩排列組合呢” 表寫入非常慢，嚴重影響效能和表維護的複雜度。

指令碼介紹

表結構

下方是演示的表結構：

CREATE TABLE `index_test03` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `name` varchar(20) NOT NULL,
  `create_time` varchar(20) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `uqi_name` (`name`),
  KEY `idx_name` (`name`),
  KEY `idx_name_createtime`(name, create_time)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

MySQL 後設資料

MySQL 可以通過 information_schema.STATISTICS 表查詢索引資訊：

SELECT * from information_schema.STATISTICS  where TABLE_SCHEMA = 'test02' and TABLE_NAME = 'index_test03';

TABLE_CATALOG	TABLE_SCHEMA	TABLE_NAME	NON_UNIQUE	INDEX_SCHEMA	INDEX_NAME	SEQ_IN_INDEX	COLUMN_NAME	COLLATION	SUB_PART	PACKED	INDEX_TYPE
def	test02	index_test03	0	test02	PRIMARY	1	id	A	NULL	NULL	BTREE
def	test02	index_test03	0	test02	uqi_name	1	name	A	NULL	NULL	BTREE
def	test02	index_test03	1	test02	idx_name	1	name	A	NULL	NULL	BTREE
def	test02	index_test03	1	test02	idx_name_createtime	1	name	A	NULL	NULL	BTREE
def	test02	index_test03	1	test02	idx_name_createtime	2	create_time	A	NULL	NULL	BTREE

指令碼通過獲得 STATISTICS 表中的索引資訊來分析表中是否存在冗餘索引，分析粒度為表級別。

DEMO 演示

需要使用 pandas 模組。

import pandas as pd

df_table_level = pd.read_excel('/Users/cooh/Desktop/STATISTICS.xlsx')

table_indexes = df_table_level['INDEX_NAME'].drop_duplicates().tolist()

_indexes = list()
for index_name in table_indexes:
    index_info = {'index_cols': df_table_level[df_table_level['INDEX_NAME'] == index_name]['COLUMN_NAME'].tolist(),
                  'non_unique': df_table_level[df_table_level['INDEX_NAME'] == index_name]['NON_UNIQUE'].tolist()[0],
                  'index_name': index_name
                  }
    _indexes.append(index_info)

content = ''
election_dict = {i['index_name']: 0 for i in _indexes}

while len(_indexes) > 0:
    choice_index_1 = _indexes.pop(0)

    for choice_index_2 in _indexes:
        # 對比兩個索引欄位的個數，使用欄位小的進行迭代
        min_len = min([len(choice_index_1['index_cols']), len(choice_index_2['index_cols'])])

        # 獲得相似欄位的個資料
        similarity_col = 0
        for i in range(min_len):
            # print(i)
            if choice_index_1['index_cols'][i] == choice_index_2['index_cols'][i]:
                similarity_col += 1

        # 然後進行邏輯判斷
        if similarity_col == 0:
            # print('毫無冗餘')
            pass
        else:
            # 兩個索引的欄位包含內容都相同，說明兩個索引完全相同，接下來就需要從中選擇一個刪除
            if len(choice_index_1['index_cols']) == similarity_col and len(
                    choice_index_2['index_cols']) == similarity_col:
                # 等於 0 表示有唯一約束
                if choice_index_1['non_unique'] == 1:
                    content += '索引 {0} 與索引 {1} 重複, '.format(choice_index_2['index_name'], choice_index_1['index_name'])
                    election_dict[choice_index_1['index_name']] += 1
                elif choice_index_2['non_unique'] == 1:
                    content += '索引 {0} 與索引 {1} 重複, '.format(choice_index_1['index_name'], choice_index_2['index_name'])
                    election_dict[choice_index_2['index_name']] += 1
                else:
                    content += '索引 {0} 與索引 {1} 重複, '.format(choice_index_2['index_name'], choice_index_1['index_name'])
                    election_dict[choice_index_1['index_name']] += 1

            elif len(choice_index_1['index_cols']) == similarity_col and choice_index_1['non_unique'] != 0:
                content += '索引 {0} 與索引 {1} 重複, '.format(choice_index_2['index_name'], choice_index_1['index_name'])
                election_dict[choice_index_1['index_name']] += 1

            elif len(choice_index_2['index_cols']) == similarity_col and choice_index_1['non_unique'] != 0:
                content += '索引 {0} 與索引 {1} 重複, '.format(choice_index_1['index_name'], choice_index_2['index_name'])
                election_dict[choice_index_2['index_name']] += 1

redundancy_indexes = list()
for _k_name, _vote in election_dict.items():
    if _vote > 0:
        redundancy_indexes.append(_k_name)

content += '建議刪除索引：{0}'.format(', '.join(redundancy_indexes))

print(content)

輸出結果：

索引 uqi_name 與索引 idx_name 重複, 索引 idx_name_createtime 與索引 idx_name 重複, 建議刪除索引：idx_name

SQL 查詢冗餘索引

MySQL 5.7 是可以直接通過 sys 後設資料庫中的檢視來查冗餘索引的，但是雲上 RDS 使用者看不到 sys 庫。所以才被迫寫這個指令碼，因為範例太多了，一個一個看不現實。如果你是自建的 MySQL，就不用費那麼大勁了，直接使用下面 SQL 來統計。

select * from sys.schema_redundant_indexes;

後記

刪除索引屬於高危操作，刪除前需要多次 check 後再刪除。上面是一個 demo 可以包裝成函數，使用 pandas 以表為粒度傳入資料，就可以嵌入到程式中。有問題歡迎評論溝通。

到此這篇關於Python 識別 MySQL 中的冗餘索引的文章就介紹到這了,更多相關MySQL冗餘索引內容請搜尋it145.com以前的文章或繼續瀏覽下面的相關文章希望大家以後多多支援it145.com！

Python如何識別 MySQL 中的冗餘索引

目錄

前言

指令碼介紹

表結構

MySQL 後設資料

DEMO 演示

SQL 查詢冗餘索引

後記

熱門文章