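Recently, a project of mine needed to keep a large amount of string data in memory for a fixed period of time, so compressing the source strings before storing them was the natural idea. That prompted a survey of some well-regarded compression algorithms.
String compression usually comes with three requirements: a high compression ratio, high compression speed, and high decompression speed. A high ratio and a high compression speed pull against each other, however, and you cannot have both; good algorithms generally settle on a trade-off between ratio and performance. Judged on ratio, compression speed, and decompression speed together, zstd and lz4 both perform well, so those two were chosen for the survey.
zstd is a fast compression algorithm open-sourced by Facebook that offers high compression ratios (see https://github.com/facebook/zstd). I wanted to find out how it actually performs at compression and decompression.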
ZSTD_compress belongs to zstd's Simple API, where the compression level is the only setting.
The prototype of ZSTD_compress is:
size_t ZSTD_compress(void* dst, size_t dstCapacity, const void* src, size_t srcSize, int compressionLevel);
The prototype of ZSTD_decompress is:
size_t ZSTD_decompress(void* dst, size_t dstCapacity, const void* src, size_t compressedSize);
Let's start with a zstd compression and decompression example.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zstd.h>
#include <iostream>
using namespace std;

int main()
{
    // Sample text, kept exactly as in the original test (typos included)
    char peppa_pig_buf[2048] =
        "Narrator: It is raining today. So, Peppa and George cannot play outside."
        "Peppa: Daddy, it's stopped raining. Can we go out to play?"
        "Daddy: Alright, run along you two."
        "Narrator: Peppa loves jumping in muddy puddles."
        "Peppa: I love muddy puddles."
        "Mummy: Peppa. If you jumping in muddy puddles, you must wear your boots."
        "Peppa: Sorry, Mummy."
        "Narrator: George likes to jump in muddy puddles, too."
        "Peppa: George. If you jump in muddy puddles, you must wear your boots."
        "Narrator: Peppa likes to look after her little brother, George."
        "Peppa: George, let's find some more pud dles."
        "Narrator: Peppa and George are having a lot of fun. Peppa has found a lttle puddle. George hasfound a big puddle."
        "Peppa: Look, George. There's a really big puddle."
        "Narrator: George wants to jump into the big puddle first."
        "Peppa: Stop, George. | must check if it's safe for you. Good. It is safe for you. Sorry, George. It'sonly mud."
        "Narrator: Peppa and George love jumping in muddy puddles."
        "Peppa: Come on, George. Let's go and show Daddy."
        "Daddy: Goodness me."
        "Peppa: Daddy. Daddy. Guess what we' ve been doing."
        "Daddy: Let me think... Have you been wa tching television?"
        "Peppa: No. No. Daddy."
        "Daddy: Have you just had a bath?"
        "Peppa: No. No."
        "Daddy: | know. You've been jumping in muddy puddles."
        "Peppa: Yes. Yes. Daddy. We've been jumping in muddy puddles."
        "Daddy: Ho. Ho. And look at the mess you're in."
        "Peppa: Oooh...."
        "Daddy: Oh, well, it's only mud. Let's clean up quickly before Mummy sees the mess."
        "Peppa: Daddy, when we've cleaned up, will you and Mummy Come and play, too?"
        "Daddy: Yes, we can all play in the garden."
        "Narrator: Peppa and George are wearing their boots. Mummy and Daddy are wearing their boots."
        "Peppa loves jumping up and down in muddy puddles. Everyone loves jumping up and down inmuddy puddles."
        "Mummy: Oh, Daddy pig, look at the mess you're in. ."
        "Peppa: It's only mud.";

    // compress
    size_t peppa_pig_text_size = strlen(peppa_pig_buf);
    // Worst-case compressed size for an input of this length
    size_t com_space_size = ZSTD_compressBound(peppa_pig_text_size);
    char *com_ptr = (char *)malloc(com_space_size);
    if (NULL == com_ptr) {
        cout << "compress malloc failed" << endl;
        return -1;
    }

    // The original call passed ZSTD_fast here. That is a strategy enum whose
    // value is 1, so the call effectively ran at compression level 1.
    size_t com_size = ZSTD_compress(com_ptr, com_space_size, peppa_pig_buf, peppa_pig_text_size, 1);
    if (ZSTD_isError(com_size)) {
        cout << "compress failed:" << ZSTD_getErrorName(com_size) << endl;
        free(com_ptr);
        return -1;
    }
    cout << "peppa pig text size:" << peppa_pig_text_size << endl;
    cout << "compress text size:" << com_size << endl;
    cout << "compress ratio:" << (float)peppa_pig_text_size / (float)com_size << endl << endl;

    // decompress
    // The frame header records the original size when it was known at compression time
    unsigned long long decom_buf_size = ZSTD_getFrameContentSize(com_ptr, com_size);
    if (decom_buf_size == ZSTD_CONTENTSIZE_ERROR || decom_buf_size == ZSTD_CONTENTSIZE_UNKNOWN) {
        cout << "cannot read decompressed size from frame" << endl;
        free(com_ptr);
        return -1;
    }
    char *decom_ptr = (char *)malloc((size_t)decom_buf_size);
    if (NULL == decom_ptr) {
        cout << "decompress malloc failed" << endl;
        free(com_ptr);
        return -1;
    }

    size_t decom_size = ZSTD_decompress(decom_ptr, (size_t)decom_buf_size, com_ptr, com_size);
    if (ZSTD_isError(decom_size)) {
        cout << "decompress failed:" << ZSTD_getErrorName(decom_size) << endl;
    } else {
        cout << "decompress text size:" << decom_size << endl;
        // Compare length and bytes; the decompressed buffer is not NUL-terminated
        if (decom_size != peppa_pig_text_size || memcmp(peppa_pig_buf, decom_ptr, peppa_pig_text_size)) {
            cout << "decompress text is not equal peppa pig text" << endl;
        }
    }

    free(com_ptr);
    free(decom_ptr);
    return 0;
}
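Running it shows that the Peppa Pig text is 1827 bytes long before compression and 759 bytes after, a compression ratio of about 2.4, and that the decompressed length equals the original length.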
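As mentioned above, the compression level passed to ZSTD_compress can be adjusted. zstd's default level is ZSTD_CLEVEL_DEFAULT = 3 and the maximum is ZSTD_MAX_CLEVEL = 22. Passing 0 selects the default level, and negative levels (down to ZSTD_minCLevel()) give up ratio for more speed. zstd also defines strategies such as ZSTD_fast, ZSTD_greedy, ZSTD_lazy, ZSTD_lazy2, and ZSTD_btlazy2; these are values for the advanced parameter ZSTD_c_strategy, not compression levels. The higher the compression level, the higher the compression ratio, but the lower the compression speed.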
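With the basic compression and decompression calls covered, let's look at how fast zstd compresses and decompresses.
The test compresses the same text with ZSTD_compress in a loop for 10 seconds and then reports the average number of compressions per second. The benchmark code is as follows: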
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <zstd.h>
#include <iostream>
using namespace std;

int main()
{
    int cnt = 0;
    size_t com_size;
    char *com_ptr = NULL;
    // Sample text, kept exactly as in the original test (typos included)
    char peppa_pig_buf[2048] =
        "Narrator: It is raining today. So, Peppa and George cannot play outside."
        "Peppa: Daddy, it's stopped raining. Can we go out to play?"
        "Daddy: Alright, run along you two."
        "Narrator: Peppa loves jumping in muddy puddles."
        "Peppa: I love muddy puddles."
        "Mummy: Peppa. If you jumping in muddy puddles, you must wear your boots."
        "Peppa: Sorry, Mummy."
        "Narrator: George likes to jump in muddy puddles, too."
        "Peppa: George. If you jump in muddy puddles, you must wear your boots."
        "Narrator: Peppa likes to look after her little brother, George."
        "Peppa: George, let's find some more pud dles."
        "Narrator: Peppa and George are having a lot of fun. Peppa has found a lttle puddle. George hasfound a big puddle."
        "Peppa: Look, George. There's a really big puddle."
        "Narrator: George wants to jump into the big puddle first."
        "Peppa: Stop, George. | must check if it's safe for you. Good. It is safe for you. Sorry, George. It'sonly mud."
        "Narrator: Peppa and George love jumping in muddy puddles."
        "Peppa: Come on, George. Let's go and show Daddy."
        "Daddy: Goodness me."
        "Peppa: Daddy. Daddy. Guess what we' ve been doing."
        "Daddy: Let me think... Have you been wa tching television?"
        "Peppa: No. No. Daddy."
        "Daddy: Have you just had a bath?"
        "Peppa: No. No."
        "Daddy: | know. You've been jumping in muddy puddles."
        "Peppa: Yes. Yes. Daddy. We've been jumping in muddy puddles."
        "Daddy: Ho. Ho. And look at the mess you're in."
        "Peppa: Oooh...."
        "Daddy: Oh, well, it's only mud. Let's clean up quickly before Mummy sees the mess."
        "Peppa: Daddy, when we've cleaned up, will you and Mummy Come and play, too?"
        "Daddy: Yes, we can all play in the garden."
        "Narrator: Peppa and George are wearing their boots. Mummy and Daddy are wearing their boots."
        "Peppa loves jumping up and down in muddy puddles. Everyone loves jumping up and down inmuddy puddles."
        "Mummy: Oh, Daddy pig, look at the mess you're in. ."
        "Peppa: It's only mud.";

    struct timeval st, et;
    size_t peppa_pig_text_size = strlen(peppa_pig_buf);
    size_t com_space_size = ZSTD_compressBound(peppa_pig_text_size);

    gettimeofday(&st, NULL);
    while (1) {
        // The buffer is allocated and freed on every iteration, as in the
        // original test, so malloc/free cost is part of the measurement.
        com_ptr = (char *)malloc(com_space_size);
        if (NULL == com_ptr) {
            cout << "compress malloc failed" << endl;
            return -1;
        }
        // Level 1; the original passed the strategy enum ZSTD_fast, whose value is 1
        com_size = ZSTD_compress(com_ptr, com_space_size, peppa_pig_buf, peppa_pig_text_size, 1);
        free(com_ptr);
        if (ZSTD_isError(com_size)) {
            cout << "compress failed:" << ZSTD_getErrorName(com_size) << endl;
            return -1;
        }
        cnt++;

        gettimeofday(&et, NULL);
        // Use microseconds too; comparing tv_sec alone can be off by almost a second
        double elapsed = (double)(et.tv_sec - st.tv_sec) + (double)(et.tv_usec - st.tv_usec) / 1e6;
        if (elapsed >= 10.0) {
            break;
        }
    }

    cout << "compress per second:" << cnt / 10 << " times" << endl;
    return 0;
}
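The run shows zstd compressing this text roughly 60,000 to 70,000 times per second, which is honestly not that impressive. Note that compression performance also depends on the length and content of the text being compressed.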
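Next, let's look at zstd's decompression performance. The method is similar to the one above: compress the text once, then decompress the same compressed data in a loop for 10 seconds and report the average number of decompressions per second. The benchmark code is as follows: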
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <zstd.h>
#include <iostream>
using namespace std;

int main()
{
    int cnt = 0;
    struct timeval st, et;
    // Sample text, kept exactly as in the original test (typos included)
    char peppa_pig_buf[2048] =
        "Narrator: It is raining today. So, Peppa and George cannot play outside."
        "Peppa: Daddy, it's stopped raining. Can we go out to play?"
        "Daddy: Alright, run along you two."
        "Narrator: Peppa loves jumping in muddy puddles."
        "Peppa: I love muddy puddles."
        "Mummy: Peppa. If you jumping in muddy puddles, you must wear your boots."
        "Peppa: Sorry, Mummy."
        "Narrator: George likes to jump in muddy puddles, too."
        "Peppa: George. If you jump in muddy puddles, you must wear your boots."
        "Narrator: Peppa likes to look after her little brother, George."
        "Peppa: George, let's find some more pud dles."
        "Narrator: Peppa and George are having a lot of fun. Peppa has found a lttle puddle. George hasfound a big puddle."
        "Peppa: Look, George. There's a really big puddle."
        "Narrator: George wants to jump into the big puddle first."
        "Peppa: Stop, George. | must check if it's safe for you. Good. It is safe for you. Sorry, George. It'sonly mud."
        "Narrator: Peppa and George love jumping in muddy puddles."
        "Peppa: Come on, George. Let's go and show Daddy."
        "Daddy: Goodness me."
        "Peppa: Daddy. Daddy. Guess what we' ve been doing."
        "Daddy: Let me think... Have you been wa tching television?"
        "Peppa: No. No. Daddy."
        "Daddy: Have you just had a bath?"
        "Peppa: No. No."
        "Daddy: | know. You've been jumping in muddy puddles."
        "Peppa: Yes. Yes. Daddy. We've been jumping in muddy puddles."
        "Daddy: Ho. Ho. And look at the mess you're in."
        "Peppa: Oooh...."
        "Daddy: Oh, well, it's only mud. Let's clean up quickly before Mummy sees the mess."
        "Peppa: Daddy, when we've cleaned up, will you and Mummy Come and play, too?"
        "Daddy: Yes, we can all play in the garden."
        "Narrator: Peppa and George are wearing their boots. Mummy and Daddy are wearing their boots."
        "Peppa loves jumping up and down in muddy puddles. Everyone loves jumping up and down inmuddy puddles."
        "Mummy: Oh, Daddy pig, look at the mess you're in. ."
        "Peppa: It's only mud.";

    // Compress once up front; only decompression is timed
    size_t peppa_pig_text_size = strlen(peppa_pig_buf);
    size_t com_space_size = ZSTD_compressBound(peppa_pig_text_size);
    char *com_ptr = (char *)malloc(com_space_size);
    if (NULL == com_ptr) {
        cout << "compress malloc failed" << endl;
        return -1;
    }
    size_t com_size = ZSTD_compress(com_ptr, com_space_size, peppa_pig_buf, peppa_pig_text_size, 1);
    if (ZSTD_isError(com_size)) {
        cout << "compress failed:" << ZSTD_getErrorName(com_size) << endl;
        free(com_ptr);
        return -1;
    }

    gettimeofday(&st, NULL);
    unsigned long long decom_buf_size = ZSTD_getFrameContentSize(com_ptr, com_size);
    if (decom_buf_size == ZSTD_CONTENTSIZE_ERROR || decom_buf_size == ZSTD_CONTENTSIZE_UNKNOWN) {
        cout << "cannot read decompressed size from frame" << endl;
        free(com_ptr);
        return -1;
    }
    while (1) {
        // Allocated and freed per iteration, as in the original test
        char *decom_ptr = (char *)malloc((size_t)decom_buf_size);
        if (NULL == decom_ptr) {
            cout << "decompress malloc failed" << endl;
            break;
        }
        size_t decom_size = ZSTD_decompress(decom_ptr, (size_t)decom_buf_size, com_ptr, com_size);
        free(decom_ptr);
        // An error code is a very large size_t, so this check also catches failures
        if (decom_size != peppa_pig_text_size) {
            cout << "decompress error" << endl;
            break;
        }
        cnt++;

        gettimeofday(&et, NULL);
        double elapsed = (double)(et.tv_sec - st.tv_sec) + (double)(et.tv_usec - st.tv_usec) / 1e6;
        if (elapsed >= 10.0) {
            break;
        }
    }

    cout << "decompress per second:" << cnt / 10 << " times" << endl;
    free(com_ptr);
    return 0;
}
The run shows zstd decompressing roughly 120,000 times per second, so decompression is faster than compression.
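zstd ships a compression and decompression tool named PZSTD. PZSTD (parallel zstd) is a command-line tool that splits the input into chunks and compresses them in parallel on multiple threads.
Newer versions of zstd (v1.4.0 and later) also expose an API for compressing with multiple threads, which is the advanced API usage covered in this section. Let's look at how to use zstd's multithreaded compression.
Multithreaded compression relies on two key APIs: one for setting parameters and one for compressing.
The prototype of the parameter-setting API is:
size_t ZSTD_CCtx_setParameter(ZSTD_CCtx* cctx, ZSTD_cParameter param, int value);
The prototype of the compression API is:
size_t ZSTD_compress2(ZSTD_CCtx* cctx, void* dst, size_t dstCapacity, const void* src, size_t srcSize);
Below is a demo of parallel compression with zstd. It uses ZSTD_CCtx_setParameter to set the number of worker threads to 3, that is, it sets the parameter ZSTD_c_nbWorkers to 3, and then compresses the text with ZSTD_compress2. To make the use of multiple threads observable, the demo first reads a very large file as the input so that zstd runs for a reasonably long time.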
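#include <stdio.h>
#include <stdlib.h>
#include <zstd.h>
#include <iostream>
using namespace std;

int main()
{
    // "xxxxxx" is the path of the large input file
    FILE *fp = fopen("xxxxxx", "rb");
    if (NULL == fp) {
        cout << "file open failed" << endl;
        return -1;
    }

    fseek(fp, 0, SEEK_END);
    long file_size = ftell(fp);
    fseek(fp, 0, SEEK_SET);
    if (file_size < 0) {
        cout << "ftell failed" << endl;
        fclose(fp);
        return -1;
    }
    size_t file_len = (size_t)file_size;
    cout << "file length:" << file_len << endl;

    // malloc space for file content
    char *file_text_ptr = (char *)malloc(file_len);
    if (NULL == file_text_ptr) {
        cout << "malloc failed" << endl;
        fclose(fp);
        return -1;
    }

    // malloc space for compress space
    size_t com_space_size = ZSTD_compressBound(file_len);
    char *com_ptr = (char *)malloc(com_space_size);
    if (NULL == com_ptr) {
        cout << "malloc failed" << endl;
        free(file_text_ptr);
        fclose(fp);
        return -1;
    }

    // read text from source file
    size_t read_len = fread(file_text_ptr, 1, file_len, fp);
    fclose(fp);
    if (read_len != file_len) {
        cout << "file read failed" << endl;
        free(com_ptr);
        free(file_text_ptr);
        return -1;
    }

    ZSTD_CCtx *cctx = ZSTD_createCCtx();
    if (NULL == cctx) {
        cout << "create cctx failed" << endl;
        free(com_ptr);
        free(file_text_ptr);
        return -1;
    }

    // set multi-thread parameter; this fails if the library was built without ZSTD_MULTITHREAD
    size_t ret = ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, 3);
    if (ZSTD_isError(ret)) {
        cout << "set nbWorkers failed:" << ZSTD_getErrorName(ret) << endl;
    }
    // The original passed ZSTD_btlazy2 here. That is a strategy enum whose value
    // is 6, so the call effectively selected compression level 6.
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 6);

    size_t com_size = ZSTD_compress2(cctx, com_ptr, com_space_size, file_text_ptr, file_len);
    if (ZSTD_isError(com_size)) {
        cout << "compress failed:" << ZSTD_getErrorName(com_size) << endl;
    } else {
        cout << "compress size:" << com_size << endl;
    }

    ZSTD_freeCCtx(cctx);
    free(com_ptr);
    free(file_text_ptr);
    return 0;
}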
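Running the demo shows that zstd really does start 3 threads and compresses the text in parallel. The more threads are configured, the shorter the compression takes; the details are not shown here, and readers can experiment for themselves.
Note that at the time of writing zstd built a single-threaded library by default. To call the multithreaded API, the build macro ZSTD_MULTITHREAD has to be specified when running make; without it, setting ZSTD_c_nbWorkers to a non-zero value returns an error.
zstd also supports a thread pool. The prototype of the thread pool function is:
POOL_ctx* ZSTD_createThreadPool(size_t numThreads)
A thread pool avoids the unnecessary overhead of repeatedly creating and destroying threads when many compressions run back to back, so that the computing power is spent mainly on compressing the text.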
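This article covered the basic usage of zstd compression and decompression, took a first measurement of compression and decompression performance, and finally explored how to use zstd's multithreaded compression.
Judging from the compression test, zstd's compression ratio is already quite good: the compressed text takes up less than half the space of the original. The ratio does, of course, also depend on the content being compressed.
Judging from the performance runs, zstd's compression and decompression speed is only passable. In my view zstd leans toward compression ratio rather than raw speed, which suits scenarios that need a high compression ratio and do not have strict performance requirements.
For large texts that have to be compressed many times in a row, multithreaded parallel compression combined with a thread pool can improve compression speed considerably.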