An In-Depth Analysis of AOF and RDB Persistence Strategies in Redis

2022-11-28 22:02:32

Preface

The following is a summary based on Redis 6.2.6.

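1. Why Redis needs persistence

Redis is an in-memory database: it keeps its data in memory. Traditional relational databases such as MySQL and Oracle store their data on disk, so an in-memory database reads and writes much faster, because memory access is far faster than disk access. The drawback is that everything held in memory is lost on a power failure or crash. Redis also sometimes has to be restarted, and restoring its earlier state requires that the state was persisted before the restart.

To address this, Redis can persist in-memory data to disk and rebuild the database from the persisted files. It supports two forms of persistence: RDB snapshots (snapshotting) and AOF (append-only file). Since Redis 4.0 it also supports hybrid persistence, which combines RDB and AOF.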

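2. Redis persistence methods

2.1 AOF persistence (Append Only File)

AOF appends to the file sequentially, and sequential writes are the fastest and most disk-friendly access pattern. The AOF file stores commands in the Redis protocol format. Redis restores data by replaying the AOF file, that is, by executing the commands it contains.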
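To make "commands in the Redis protocol format" concrete, here is a minimal sketch of how one command could be encoded as a RESP array of bulk strings. resp_encode is a hypothetical helper written for this article, not a function from the Redis source, and it assumes arguments are NUL-terminated C strings (real Redis values are binary-safe).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Encode one command as a RESP array of bulk strings.
 * Returns the number of bytes written (excluding the trailing NUL),
 * or -1 if the output buffer is too small.
 * Hypothetical helper for illustration only. */
int resp_encode(int argc, const char **argv, char *out, size_t cap) {
    assert(argc > 0 && argv != NULL && out != NULL);
    size_t used = 0;

    /* Array header: *<number of arguments>\r\n */
    int n = snprintf(out, cap, "*%d\r\n", argc);
    if (n < 0 || (size_t)n >= cap) return -1;
    used += (size_t)n;

    for (int i = 0; i < argc; i++) {
        /* Each argument: $<byte length>\r\n<bytes>\r\n */
        n = snprintf(out + used, cap - used, "$%zu\r\n%s\r\n",
                     strlen(argv[i]), argv[i]);
        if (n < 0 || (size_t)n >= cap - used) return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

Encoding the three arguments SET, key, v1 with this sketch yields the bytes `*3\r\n$3\r\nSET\r\n$3\r\nkey\r\n$2\r\nv1\r\n`, which is the shape of the entries appended to the AOF.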

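2.1.1 The fsync system call

fsync is a system call. write only copies data into the kernel buffer (the page cache), and the kernel flushes it to disk on its own schedule; calling fsync forces the data in the kernel buffer onto the disk. To flush actively, call fsync once after each write.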
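The write-then-fsync sequence can be sketched with plain POSIX calls. append_and_sync is a hypothetical name chosen for this example; it is a simplified illustration and not how Redis structures its own AOF writes.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Append len bytes to path and force them to stable storage.
 * Returns 0 on success, -1 on any error.
 * Hypothetical helper for illustration only. */
int append_and_sync(const char *path, const char *buf, size_t len) {
    assert(path != NULL && buf != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd == -1) return -1;

    size_t off = 0;
    while (off < len) {
        /* write() only copies the data into the kernel's page cache. */
        ssize_t n = write(fd, buf + off, len - off);
        if (n <= 0) { close(fd); return -1; }
        off += (size_t)n;
    }

    /* fsync() blocks until the cached data has reached the disk. */
    if (fsync(fd) == -1) { close(fd); return -1; }
    return close(fd);
}
```

Calling fsync after every write is what makes the always policy described next both the safest and the slowest.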

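2.1.2 AOF persistence policies

always: runs in the main thread; every write operation (insert, delete, update) calls fsync to reach the disk. Safest for the data, but slowest.
everysec: fsync runs in a background thread (bio_aof_fsync); up to about 1 to 2 seconds of data can be lost.
no: the operating system decides when to flush, which Redis cannot control.

Drawback:

Every command that modifies the database (insert, delete, update) is recorded in the AOF file, so the data is redundant. As the server keeps running, the AOF file grows very large and recovery slows down. For example, set key v1, set key v2, del key, set key v3 are all recorded, yet the final state is simply key == v3 and the other commands are redundant. In other words, only the last state is needed.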

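2.1.3 aof_rewrite

To deal with oversized AOF files, Redis provides aof_rewrite. How it works: Redis forks a process, and the child generates protocol-format commands from the data currently in memory, so only the latest state is written to the AOF file. This removes the redundant history of each key and speeds up recovery.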
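The idea of regenerating commands from the current state, rather than trimming the old log, can be shown with a toy store. Everything here (toy_db, toy_apply, toy_rewrite, the fixed-size arrays, the plain-text command format) is an assumption made for illustration and bears no relation to the real Redis data structures.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_KEYS 16
#define MAX_LEN  32

/* A toy in-memory store: a handful of string keys with string values. */
typedef struct {
    char key[MAX_KEYS][MAX_LEN];
    char val[MAX_KEYS][MAX_LEN];
    int  used[MAX_KEYS];
} toy_db;

static int toy_find(const toy_db *db, const char *key) {
    for (int i = 0; i < MAX_KEYS; i++)
        if (db->used[i] && strcmp(db->key[i], key) == 0) return i;
    return -1;
}

/* Replay one logged command; op is "set" or "del". */
void toy_apply(toy_db *db, const char *op, const char *key, const char *val) {
    assert(db != NULL && op != NULL && key != NULL);
    int i = toy_find(db, key);
    if (strcmp(op, "del") == 0) {
        if (i >= 0) db->used[i] = 0;
        return;
    }
    if (i < 0) {                        /* new key: take the first free slot */
        for (i = 0; i < MAX_KEYS && db->used[i]; i++) {}
        if (i == MAX_KEYS) return;      /* toy store is full; ignore */
        snprintf(db->key[i], MAX_LEN, "%s", key);
        db->used[i] = 1;
    }
    snprintf(db->val[i], MAX_LEN, "%s", val ? val : "");
}

/* "Rewrite": emit one set command per live key, derived from the current
 * state instead of the old log. Returns the number of commands emitted. */
int toy_rewrite(const toy_db *db, char *out, size_t cap) {
    assert(db != NULL && out != NULL && cap > 0);
    int count = 0;
    size_t used = 0;
    out[0] = '\0';
    for (int i = 0; i < MAX_KEYS; i++) {
        if (!db->used[i]) continue;
        int n = snprintf(out + used, cap - used, "set %s %s\n",
                         db->key[i], db->val[i]);
        if (n < 0 || (size_t)n >= cap - used) break;
        used += (size_t)n;
        count++;
    }
    return count;
}
```

Replaying set key v1, set key v2, del key, set key v3 into this store and then calling toy_rewrite produces the single command set key v3, which is the effect aof_rewrite has on a redundant log.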

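While the AOF is being rewritten, the Redis main process keeps serving client requests and also records write requests in a rewrite buffer. When the child finishes its AOF persistence, it signals the main process, which then appends the contents of the rewrite buffer to the new AOF file.

Although the AOF file is smaller after a rewrite, AOF still restores data by replaying commands, which costs CPU and is comparatively slow.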

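2.2 RDB snapshots (Redis's default persistence method)

RDB writes a snapshot of the current in-memory dataset to an RDB file on disk (a binary dump of all key-value pairs in the database). Recovery reads the snapshot file straight into memory. Persistence is again performed by a forked child process. Redis has no dedicated command for loading an RDB file; the server loads it automatically at startup if it detects one (when AOF is enabled, the AOF file is loaded instead).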

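Trigger methods (automatic and manual)

(1) Automatic triggering

The SNAPSHOTTING section of redis.conf configures the automatic trigger conditions.

save: sets the conditions that trigger RDB persistence. save m n means that bgsave (background persistence) is triggered automatically when at least n changes have been made within m seconds.

save "": disables snapshots;

save 900 1: save if at least 1 key changed within 900 seconds;

save 300 10: save if at least 10 keys changed within 300 seconds;

save 60 10000: save if at least 10000 keys changed within 60 seconds.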
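The save m n rule can be expressed as a small predicate. The sketch below is modeled on the periodic check Redis performs, but it is simplified: save_point and should_bgsave are hypothetical names, and the retry delay Redis applies after a failed background save is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* One "save <seconds> <changes>" line. Hypothetical type for illustration. */
typedef struct {
    time_t    seconds;
    long long changes;
} save_point;

/* Returns 1 if any save point is satisfied: at least `changes` modifications
 * have accumulated (dirty) and more than `seconds` have passed since the
 * last successful save. Returns 0 otherwise. */
int should_bgsave(const save_point *sp, size_t n,
                  long long dirty, time_t now, time_t lastsave) {
    assert(sp != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (dirty >= sp[i].changes && now - lastsave > sp[i].seconds)
            return 1;
    }
    return 0;
}
```

With the three save points listed above, 5 changes after 400 seconds trigger nothing, while 10 changes after 301 seconds satisfy save 300 10. An empty list of save points never triggers, which corresponds to save "".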

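If you only use Redis as a cache and do not need persistence, disable snapshotting with save "". Note that in Redis 6.2, merely commenting out every save line leaves the built-in default save points in effect.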

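stop-writes-on-bgsave-error: defaults to yes. If RDB snapshotting is enabled and the most recent snapshot failed, Redis rejects write operations to alert the user that persistence is failing; otherwise the updated data could be lost.

rdbcompression: whether to compress the RDB snapshot file; enabled by default. Compression saves disk space when the dataset is very large but costs extra CPU. It can be turned off to save CPU, though leaving it on is recommended.

rdbchecksum: file checksum, enabled by default. Since version 5 of the RDB format, a checksum is placed at the end of the file to verify its integrity. Enabling it costs roughly 10% in performance when saving and loading the file; it can be disabled when maximum performance matters.

dbfilename: the RDB file name, dump.rdb by default.

rdb-del-sync-files: full synchronization between master and replica is done by transferring an RDB file. On instances with persistence disabled, this option controls whether the RDB files used for replication are removed once synchronization completes. The default is no.

dir: the directory that holds the RDB and AOF persistence files; defaults to the current directory.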

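(2) Manual triggering

Two commands trigger RDB persistence manually:

1) save: blocks the Redis main process. While save runs, Redis cannot handle any other command and stays unavailable until the RDB process finishes, which can stall the business. It is rarely used.

2) bgsave: the main process forks a child that performs the RDB persistence. Blocking happens only during the fork, and the fork takes longer as memory usage grows, for example when big keys are present.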

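2.3 Mixing RDB and AOF

Hybrid persistence borrows the idea of aof_rewrite: during a rewrite, the snapshot is first written in RDB format, and then the contents of the rewrite buffer are appended to the end of the same file in AOF command format. This is the RDB-AOF hybrid.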
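Because a hybrid file starts with RDB data and continues with AOF commands, a loader has to tell the two layouts apart. RDB data begins with the five-byte magic string REDIS, so a check along the following lines is enough to detect the preamble. starts_with_rdb_preamble is a hypothetical name for this sketch; it only inspects the magic and does not parse the RDB body.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the buffer begins with the RDB magic "REDIS", meaning the
 * file starts with an RDB preamble; returns 0 for a plain command log.
 * Hypothetical helper for illustration only. */
int starts_with_rdb_preamble(const unsigned char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    return len >= 5 && memcmp(buf, "REDIS", 5) == 0;
}
```

A plain AOF begins with a RESP array header such as `*2\r\n`, so the first bytes are sufficient to choose between loading a snapshot first and replaying commands from the start.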

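2.4 Comparison of the persistence methods

AOF. Pros: reliable, with little data lost. Cons: large AOF files and slow recovery.
RDB. Pros: small RDB files and fast recovery. Cons: it cannot persist in real time or at second-level granularity, so everything written after the last snapshot is lost. Every bgsave has to fork a process. The main process and the child share one copy of memory; while the main process keeps handling client commands, copy-on-write applies: only the pages it modifies are copied, and the page table is updated to point at the copies. Those copies make memory usage grow, and how much depends on the proportion of memory the main process modifies. Note that the child only reads the data and never modifies it in memory.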

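3. What a big key is, and how big keys affect persistence

3.1 What is a big key

In Redis's key-value model, a big key is a key whose value takes up a large amount of space. For example, when the value is a hash or zset that holds a very large number of elements, the corresponding key is a big key.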

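3.2 How fork copy-on-write works

The Redis main process calls fork() to create a child process. When fork() returns, the child's state is identical to the main process's, including the mm_struct and the page tables. At that point the page tables of both are marked as private copy-on-write (read-only). When either process tries to write to a page, a write-protection fault is triggered; the kernel maps a new page for that process to read and write, and points its page table at the new page.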
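The snapshot semantics that fork gives the child can be observed with a short POSIX program. In this sketch, value_seen_by_child is a hypothetical helper: the parent changes a variable after the fork, and the child reports the value it still sees. It demonstrates the isolation that copy-on-write provides, not the page copying itself.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork, modify a value in the parent, and return the value the child
 * observed afterwards. Returns -1 on error.
 * Hypothetical helper for illustration only. */
int value_seen_by_child(int initial, int parent_update) {
    int go[2], report[2];
    if (pipe(go) == -1) return -1;
    if (pipe(report) == -1) { close(go[0]); close(go[1]); return -1; }

    volatile int data = initial;    /* stands in for the dataset */
    pid_t pid = fork();
    if (pid == -1) {
        close(go[0]); close(go[1]); close(report[0]); close(report[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: wait until the parent has written, then report our view. */
        char c;
        close(go[1]); close(report[0]);
        if (read(go[0], &c, 1) != 1) _exit(1);
        int seen = data;
        ssize_t w = write(report[1], &seen, sizeof(seen));
        _exit(w == (ssize_t)sizeof(seen) ? 0 : 1);
    }

    /* Parent: this write lands on a private copy of the page. */
    close(go[0]); close(report[1]);
    data = parent_update;
    assert(data == parent_update);  /* the parent's own view has changed */

    int seen = -1;
    int ok = write(go[1], "x", 1) == 1 &&
             read(report[0], &seen, sizeof(seen)) == (ssize_t)sizeof(seen);
    close(go[1]); close(report[0]);
    waitpid(pid, NULL, 0);
    return ok ? seen : -1;
}
```

Calling value_seen_by_child(1, 2) returns 1: the child keeps seeing the value from the moment of the fork, which is why a bgsave child can write a consistent snapshot while the main process keeps accepting writes.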

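3.3 Interview question: how do big keys affect persistence?

Answer per persistence method. In short: heavy fsync pressure and a long fork.

For AOF: the always, everysec, and no policies determine where the fsync pressure lands, and aof_rewrite needs a fork.

For RDB: both RDB and the RDB-AOF hybrid need a fork.

fork runs in the main process, so a slow fork delays the main process's responses.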

4. Persistence source code analysis

4.1 RDB persistence

4.1.1 Creating the RDB file

Redis creates RDB files with the rdbSave function; SAVE and BGSAVE call rdbSave in different ways.

// src/rdb.c
/* Save the DB on disk. Return C_ERR on error, C_OK on success. */
int rdbSave(char *filename, rdbSaveInfo *rsi) {
    char tmpfile[256];
    char cwd[MAXPATHLEN]; /* Current working dir path for error messages. */
    FILE *fp = NULL;
    rio rdb;
    int error = 0;
    snprintf(tmpfile,256,"temp-%d.rdb", (int) getpid());
    fp = fopen(tmpfile,"w");
    if (!fp) {
        char *cwdp = getcwd(cwd,MAXPATHLEN);
        serverLog(LL_WARNING,
            "Failed opening the RDB file %s (in server root dir %s) "
            "for saving: %s",
            filename,
            cwdp ? cwdp : "unknown",
            strerror(errno));
        return C_ERR;
    }
    rioInitWithFile(&rdb,fp);
    startSaving(RDBFLAGS_NONE);
    if (server.rdb_save_incremental_fsync)
        rioSetAutoSync(&rdb,REDIS_AUTOSYNC_BYTES);
    if (rdbSaveRio(&rdb,&error,RDBFLAGS_NONE,rsi) == C_ERR) {
        errno = error;
        goto werr;
    }
    /* Make sure data will not remain on the OS's output buffers */
    if (fflush(fp)) goto werr;
    if (fsync(fileno(fp))) goto werr;
    if (fclose(fp)) { fp = NULL; goto werr; }
    fp = NULL;
    /* Use RENAME to make sure the DB file is changed atomically only
     * if the generate DB file is ok. */
    if (rename(tmpfile,filename) == -1) {
        char *cwdp = getcwd(cwd,MAXPATHLEN);
        serverLog(LL_WARNING,
            "Error moving temp DB file %s on the final "
            "destination %s (in server root dir %s): %s",
            tmpfile,
            filename,
            cwdp ? cwdp : "unknown",
            strerror(errno));
        unlink(tmpfile);
        stopSaving(0);
        return C_ERR;
    }
    serverLog(LL_NOTICE,"DB saved on disk");
    server.dirty = 0;
    server.lastsave = time(NULL);
    server.lastbgsave_status = C_OK;
    stopSaving(1);
    return C_OK;
werr:
    serverLog(LL_WARNING,"Write error saving DB on disk: %s", strerror(errno));
    if (fp) fclose(fp);
    unlink(tmpfile);
    stopSaving(0);
    return C_ERR;
}
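
The core of rdbSave above is a write-to-temp-file, flush, fsync, then rename sequence. Stripped of the Redis-specific parts, the pattern could look like the sketch below. atomic_save and its temp-file naming are assumptions made for illustration; the snapshot content is reduced to a plain string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write data to a temp file, flush and fsync it, then rename it over the
 * target, so a reader sees either the complete old file or the complete
 * new one. Returns 0 on success, -1 on error.
 * Hypothetical helper for illustration only. */
int atomic_save(const char *target, const char *data) {
    assert(target != NULL && data != NULL);
    char tmpfile[256];
    size_t len = strlen(data);
    FILE *fp = NULL;

    int n = snprintf(tmpfile, sizeof(tmpfile), "%s.tmp-%d",
                     target, (int)getpid());
    if (n < 0 || (size_t)n >= sizeof(tmpfile)) return -1;

    fp = fopen(tmpfile, "w");
    if (!fp) return -1;
    if (fwrite(data, 1, len, fp) != len) goto werr;

    /* Push stdio buffers to the kernel, then the kernel cache to disk. */
    if (fflush(fp)) goto werr;
    if (fsync(fileno(fp))) goto werr;
    if (fclose(fp)) { fp = NULL; goto werr; }
    fp = NULL;

    /* rename() replaces the target atomically on POSIX filesystems. */
    if (rename(tmpfile, target) == -1) goto werr;
    return 0;

werr:
    if (fp) fclose(fp);
    unlink(tmpfile);
    return -1;
}
```

If the process dies halfway through, only the temp file is left incomplete and the previous target file is untouched, which is the property rdbSave relies on.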

The SAVE command runs in the Redis main thread, so a long save hurts Redis performance.

void saveCommand(client *c) {
    // A child process is already performing RDB persistence
    if (server.child_type == CHILD_TYPE_RDB) {
        addReplyError(c,"Background save already in progress");
        return;
    }
    rdbSaveInfo rsi, *rsiptr;
    rsiptr = rdbPopulateSaveInfo(&rsi);
    // Persist to disk
    if (rdbSave(server.rdb_filename,rsiptr) == C_OK) {
        addReply(c,shared.ok);
    } else {
        addReplyErrorObject(c,shared.err);
    }
}

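The BGSAVE command runs rdbSaveBackground, where rdbSave is called in the child process. While BGSAVE is running, a SAVE command sent by a client is rejected. SAVE and BGSAVE may not run at the same time, mainly to keep the main process and the child from calling rdbSave simultaneously and racing; for the same reason, two BGSAVEs cannot run at the same time either.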

/* BGSAVE [SCHEDULE] */
void bgsaveCommand(client *c) {
    int schedule = 0;
    /* The SCHEDULE option changes the behavior of BGSAVE when an AOF rewrite
     * is in progress. Instead of returning an error a BGSAVE gets scheduled. */
    if (c->argc > 1) {
        if (c->argc == 2 && !strcasecmp(c->argv[1]->ptr,"schedule")) {
            schedule = 1;
        } else {
            addReplyErrorObject(c,shared.syntaxerr);
            return;
        }
    }
    rdbSaveInfo rsi, *rsiptr;
    rsiptr = rdbPopulateSaveInfo(&rsi);
    if (server.child_type == CHILD_TYPE_RDB) {
        addReplyError(c,"Background save already in progress");
    } else if (hasActiveChildProcess()) {
        if (schedule) {
            server.rdb_bgsave_scheduled = 1;
            addReplyStatus(c,"Background saving scheduled");
        } else {
            addReplyError(c,
            "Another child process is active (AOF?): can't BGSAVE right now. "
            "Use BGSAVE SCHEDULE in order to schedule a BGSAVE whenever "
            "possible.");
        }
    } else if (rdbSaveBackground(server.rdb_filename,rsiptr) == C_OK) {
        addReplyStatus(c,"Background saving started");
    } else {
        addReplyErrorObject(c,shared.err);
    }
}

int rdbSaveBackground(char *filename, rdbSaveInfo *rsi) {
    pid_t childpid;
    if (hasActiveChildProcess()) return C_ERR;
    server.dirty_before_bgsave = server.dirty;
    server.lastbgsave_try = time(NULL);
    // Fork the child process
    if ((childpid = redisFork(CHILD_TYPE_RDB)) == 0) {
        int retval;
        /* Child */
        redisSetProcTitle("redis-rdb-bgsave");
        redisSetCpuAffinity(server.bgsave_cpulist);
        retval = rdbSave(filename,rsi);
        if (retval == C_OK) {
            sendChildCowInfo(CHILD_INFO_TYPE_RDB_COW_SIZE, "RDB");
        }
        exitFromChild((retval == C_OK) ? 0 : 1);
    } else {
        /* Parent */
        if (childpid == -1) {
            server.lastbgsave_status = C_ERR;
            serverLog(LL_WARNING,"Can't save in background: fork: %s",
                strerror(errno));
            return C_ERR;
        }
        serverLog(LL_NOTICE,"Background saving started by pid %ld",(long) childpid);
        server.rdb_save_time_start = time(NULL);
        server.rdb_child_type = RDB_CHILD_TYPE_DISK;
        return C_OK;
    }
    return C_OK; /* unreached */
}

4.1.2 Loading the RDB file

Redis loads RDB files with the rdbLoad function. The server stays blocked for the whole load until it completes.

int rdbLoad(char *filename, rdbSaveInfo *rsi, int rdbflags) {
    FILE *fp;
    rio rdb;
    int retval;
    if ((fp = fopen(filename,"r")) == NULL) return C_ERR;
    startLoadingFile(fp, filename,rdbflags);
    rioInitWithFile(&rdb,fp);
    retval = rdbLoadRio(&rdb,rdbflags,rsi);
    fclose(fp);
    stopLoading(retval==C_OK);
    return retval;
}

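4.2 AOF persistence

4.2.1 How AOF persistence is implemented

AOF command append: after the server executes a write command, it appends that command in protocol format to the end of the aof_buf buffer.
AOF file write and sync: the Redis server is single-threaded and runs mainly inside an event loop. Events are either file events or time events: file events receive client command requests and send replies, and time events run scheduled tasks. Before each iteration of the event loop ends, flushAppendOnlyFile is called; depending on the persistence policy configured in redis.conf, it decides when the command data in aof_buf is written to the AOF file and synced to disk.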
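The two steps above, buffering each command in memory and flushing once per event-loop iteration, can be sketched as follows. cmd_buffer, cmd_buffer_append, and cmd_buffer_flush are hypothetical names; Redis itself uses its sds string type for aof_buf and handles short writes far more carefully, as the source in the next section shows.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* A growable byte buffer standing in for aof_buf. Hypothetical type. */
typedef struct {
    char  *data;
    size_t len, cap;
} cmd_buffer;

/* Step 1, after each write command: append its encoded form in memory. */
int cmd_buffer_append(cmd_buffer *b, const char *bytes, size_t n) {
    assert(b != NULL && bytes != NULL);
    if (b->len + n > b->cap) {
        size_t ncap = b->cap ? b->cap * 2 : 64;
        while (ncap < b->len + n) ncap *= 2;
        char *p = realloc(b->data, ncap);
        if (!p) return -1;
        b->data = p;
        b->cap = ncap;
    }
    memcpy(b->data + b->len, bytes, n);
    b->len += n;
    return 0;
}

/* Step 2, once per event-loop iteration: write the whole buffer with a
 * single write() and reset it. Returns bytes written, 0 if the buffer was
 * empty, or -1 on error or short write (this sketch does not recover). */
ssize_t cmd_buffer_flush(cmd_buffer *b, int fd) {
    assert(b != NULL);
    if (b->len == 0) return 0;
    ssize_t n = write(fd, b->data, b->len);
    if (n != (ssize_t)b->len) return -1;
    b->len = 0;
    return n;
}
```

Batching the commands of one iteration into a single write keeps the number of system calls low; whether and when that write is followed by an fsync is the job of the policy.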

4.2.2 Source code analysis

// src/server.h
/* Append only defines */
#define AOF_FSYNC_NO 0
#define AOF_FSYNC_ALWAYS 1
#define AOF_FSYNC_EVERYSEC 2
// src/aof.c
void flushAppendOnlyFile(int force) {
    ssize_t nwritten;
    int sync_in_progress = 0;
    mstime_t latency;
    // The aof_buf buffer is currently empty
    if (sdslen(server.aof_buf) == 0) {
        /* Check if we need to do fsync even the aof buffer is empty,
         * because previously in AOF_FSYNC_EVERYSEC mode, fsync is
         * called only when aof buffer is not empty, so if users
         * stop write commands before fsync called in one second,
         * the data in page cache cannot be flushed in time. */
        if (server.aof_fsync == AOF_FSYNC_EVERYSEC &&
            server.aof_fsync_offset != server.aof_current_size &&
            server.unixtime > server.aof_last_fsync &&
            !(sync_in_progress = aofFsyncInProgress())) {
            goto try_fsync;
        } else {
            return;
        }
    }
    if (server.aof_fsync == AOF_FSYNC_EVERYSEC)
        sync_in_progress = aofFsyncInProgress();
    if (server.aof_fsync == AOF_FSYNC_EVERYSEC && !force) {
        /* With this append fsync policy we do background fsyncing.
         * If the fsync is still in progress we can try to delay
         * the write for a couple of seconds. */
        if (sync_in_progress) {
            if (server.aof_flush_postponed_start == 0) {
                /* No previous write postponing, remember that we are
                 * postponing the flush and return. */
                server.aof_flush_postponed_start = server.unixtime;
                return;
            } else if (server.unixtime - server.aof_flush_postponed_start < 2) {
                /* We were already waiting for fsync to finish, but for less
                 * than two seconds this is still ok. Postpone again. */
                return;
            }
            /* Otherwise fall trough, and go write since we can't wait
             * over two seconds. */
            server.aof_delayed_fsync++;
            serverLog(LL_NOTICE,"Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.");
        }
    }
    /* We want to perform a single write. This should be guaranteed atomic
     * at least if the filesystem we are writing is a real physical one.
     * While this will save us against the server being killed I don't think
     * there is much to do about the whole server stopping for power problems
     * or alike */
    if (server.aof_flush_sleep && sdslen(server.aof_buf)) {
        usleep(server.aof_flush_sleep);
    }
    latencyStartMonitor(latency);
    nwritten = aofWrite(server.aof_fd,server.aof_buf,sdslen(server.aof_buf));
    latencyEndMonitor(latency);
    /* We want to capture different events for delayed writes:
     * when the delay happens with a pending fsync, or with a saving child
     * active, and when the above two conditions are missing.
     * We also use an additional event name to save all samples which is
     * useful for graphing / monitoring purposes. */
    if (sync_in_progress) {
        latencyAddSampleIfNeeded("aof-write-pending-fsync",latency);
    } else if (hasActiveChildProcess()) {
        latencyAddSampleIfNeeded("aof-write-active-child",latency);
    } else {
        latencyAddSampleIfNeeded("aof-write-alone",latency);
    }
    latencyAddSampleIfNeeded("aof-write",latency);
    /* We performed the write so reset the postponed flush sentinel to zero. */
    server.aof_flush_postponed_start = 0;
    if (nwritten != (ssize_t)sdslen(server.aof_buf)) {
        static time_t last_write_error_log = 0;
        int can_log = 0;
        /* Limit logging rate to 1 line per AOF_WRITE_LOG_ERROR_RATE seconds. */
        if ((server.unixtime - last_write_error_log) > AOF_WRITE_LOG_ERROR_RATE) {
            can_log = 1;
            last_write_error_log = server.unixtime;
        }
        /* Log the AOF write error and record the error code. */
        if (nwritten == -1) {
            if (can_log) {
                serverLog(LL_WARNING,"Error writing to the AOF file: %s",
                    strerror(errno));
                server.aof_last_write_errno = errno;
            }
        } else {
            if (can_log) {
                serverLog(LL_WARNING,"Short write while writing to "
                                       "the AOF file: (nwritten=%lld, "
                                       "expected=%lld)",
                                       (long long)nwritten,
                                       (long long)sdslen(server.aof_buf));
            }
            if (ftruncate(server.aof_fd, server.aof_current_size) == -1) {
                if (can_log) {
                    serverLog(LL_WARNING, "Could not remove short write "
                             "from the append-only file.  Redis may refuse "
                             "to load the AOF the next time it starts.  "
                             "ftruncate: %s", strerror(errno));
                }
            } else {
                /* If the ftruncate() succeeded we can set nwritten to
                 * -1 since there is no longer partial data into the AOF. */
                nwritten = -1;
            }
            server.aof_last_write_errno = ENOSPC;
        }
        /* Handle the AOF write error. */
        if (server.aof_fsync == AOF_FSYNC_ALWAYS) {
            /* We can't recover when the fsync policy is ALWAYS since the reply
             * for the client is already in the output buffers (both writes and
             * reads), and the changes to the db can't be rolled back. Since we
             * have a contract with the user that on acknowledged or observed
             * writes are is synced on disk, we must exit. */
            serverLog(LL_WARNING,"Can't recover from AOF write error when the AOF fsync policy is 'always'. Exiting...");
            exit(1);
        } else {
            /* Recover from failed write leaving data into the buffer. However
             * set an error to stop accepting writes as long as the error
             * condition is not cleared. */
            server.aof_last_write_status = C_ERR;
            /* Trim the sds buffer if there was a partial write, and there
             * was no way to undo it with ftruncate(2). */
            if (nwritten > 0) {
                server.aof_current_size += nwritten;
                sdsrange(server.aof_buf,nwritten,-1);
            }
            return; /* We'll try again on the next call... */
        }
    } else {
        /* Successful write(2). If AOF was in error state, restore the
         * OK state and log the event. */
        if (server.aof_last_write_status == C_ERR) {
            serverLog(LL_WARNING,
                "AOF write error looks solved, Redis can write again.");
            server.aof_last_write_status = C_OK;
        }
    }
    server.aof_current_size += nwritten;
    /* Re-use AOF buffer when it is small enough. The maximum comes from the
     * arena size of 4k minus some overhead (but is otherwise arbitrary). */
    if ((sdslen(server.aof_buf)+sdsavail(server.aof_buf)) < 4000) {
        sdsclear(server.aof_buf);
    } else {
        sdsfree(server.aof_buf);
        server.aof_buf = sdsempty();
    }
try_fsync:
    /* Don't fsync if no-appendfsync-on-rewrite is set to yes and there are
     * children doing I/O in the background. */
    if (server.aof_no_fsync_on_rewrite && hasActiveChildProcess())
        return;
    /* Perform the fsync if needed. */
    if (server.aof_fsync == AOF_FSYNC_ALWAYS) {
        /* redis_fsync is defined as fdatasync() for Linux in order to avoid
         * flushing metadata. */
        latencyStartMonitor(latency);
        /* Let's try to get this data on the disk. To guarantee data safe when
         * the AOF fsync policy is 'always', we should exit if failed to fsync
         * AOF (see comment next to the exit(1) after write error above). */
        if (redis_fsync(server.aof_fd) == -1) {
            serverLog(LL_WARNING,"Can't persist AOF for fsync error when the "
              "AOF fsync policy is 'always': %s. Exiting...", strerror(errno));
            exit(1);
        }
        latencyEndMonitor(latency);
        latencyAddSampleIfNeeded("aof-fsync-always",latency);
        server.aof_fsync_offset = server.aof_current_size;
        server.aof_last_fsync = server.unixtime;
    } else if ((server.aof_fsync == AOF_FSYNC_EVERYSEC &&
                server.unixtime > server.aof_last_fsync)) {
        if (!sync_in_progress) {
            aof_background_fsync(server.aof_fd);
            server.aof_fsync_offset = server.aof_current_size;
        }
        server.aof_last_fsync = server.unixtime;
    }
}
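
The fsync decision at the end of flushAppendOnlyFile (the try_fsync label) can be condensed into a pure function. The sketch below is a simplification written for this article: choose_fsync_action, fsync_action, and the POLICY_* constants are hypothetical names, with the constants mirroring the AOF_FSYNC_* values shown above.

```c
#include <assert.h>

/* Same numeric values as AOF_FSYNC_NO / ALWAYS / EVERYSEC. */
enum { POLICY_NO = 0, POLICY_ALWAYS = 1, POLICY_EVERYSEC = 2 };

typedef enum {
    ACT_NONE,             /* leave flushing to the operating system */
    ACT_SYNC_NOW,         /* fsync in the main thread */
    ACT_SYNC_BACKGROUND   /* hand the fsync to the background thread */
} fsync_action;

/* Decide what to do after the buffer has been written.
 * skip_for_child models "no-appendfsync-on-rewrite is yes and a child
 * process is active". Hypothetical helper for illustration only. */
fsync_action choose_fsync_action(int policy, long now, long last_fsync,
                                 int sync_in_progress, int skip_for_child) {
    assert(policy == POLICY_NO || policy == POLICY_ALWAYS ||
           policy == POLICY_EVERYSEC);
    if (skip_for_child) return ACT_NONE;
    if (policy == POLICY_ALWAYS) return ACT_SYNC_NOW;
    if (policy == POLICY_EVERYSEC && now > last_fsync && !sync_in_progress)
        return ACT_SYNC_BACKGROUND;
    return ACT_NONE;
}
```

With everysec, a background fsync is scheduled only when the clock has moved past the second of the last fsync and no fsync is still running, which is where the "up to about 1 to 2 seconds" loss window of that policy comes from.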
