A One-Article Guide to Plotting with pandas plot

2022-03-04 13:01:29

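I. Introduction

The plot method of pandas.DataFrame draws one line per column of the data and, by default, places a legend labeled with the column names at a suitable position. This saves time compared with calling matplotlib directly, and data held in a DataFrame is more regular, which makes vectorized operations and computation convenient.

The DataFrame.plot() function: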

DataFrame.plot(x=None, y=None, kind='line', ax=None, subplots=False, 
                sharex=None, sharey=False, layout=None, figsize=None, 
                use_index=True, title=None, grid=None, legend=True, 
                style=None, logx=False, logy=False, loglog=False, 
                xticks=None, yticks=None, xlim=None, ylim=None, rot=None, 
                fontsize=None, colormap=None, position=0.5, table=False, yerr=None, 
                xerr=None, stacked=True/False, sort_columns=False, 
                secondary_y=False, mark_right=True, **kwds)

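1.1 Parameters

x and y: label or position of the column(s) to use for the x and y axes; default None
kind: the type of plot; default line (a line plot)
- line: line plot
- bar/barh: bar plot, vertical/horizontal
- pie: pie plot
- hist: histogram (frequency distribution of numeric values)
- box: box plot
- kde: density plot (kernel density estimate), often used alongside a histogram
- area: area plot
- scatter: scatter plot
- hexbin: hexagonal bin plot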
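ax: the matplotlib Axes object to draw on; default None, in which case a new one is created
subplots: whether to draw each column in its own subplot; default False
sharex: share x-axis ticks and labels when subplots=True. Defaults to True if ax is None, and to False if an ax is passed in
sharey: share y-axis ticks and labels; default False
layout: row/column layout of the subplots, (rows, columns)
figsize: figure size, (width, height), in inches
use_index: use the index as the x-axis; default True
title: title of the plot
grid: whether to show grid lines; default None
legend: whether to show a legend on the axes; default True
style: matplotlib line style for each column, as a list or dict
logx: use a logarithmic scale on the x-axis; default False
logy: use a logarithmic scale on the y-axis; default False
loglog: use a logarithmic scale on both the x and y axes; default False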
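xticks: x-axis tick values, as a sequence (for example a list)
yticks: y-axis tick values, as a sequence
xlim: x-axis range, as a two-element list or tuple
ylim: y-axis range, as a two-element list or tuple
rot: rotation of the tick labels in degrees; default None
fontsize: int, default None. Font size of the tick labels
colormap: colormap from which the plot colors are drawn, given as a colormap name or a matplotlib colormap object
colorbar: whether to draw a colorbar; only relevant for scatter and hexbin plots
position: alignment of the bars in a bar plot, from 0 to 1; default 0.5 (centered)
table: add a table below the plot; default False. If True, the table is drawn from the data in the DataFrame
yerr: error bars along the y-axis
xerr: error bars along the x-axis
stacked: whether to stack the series; defaults to False for line and bar plots and to True for area plots
sort_columns: sort the column names to determine plot order; default False (deprecated in pandas 1.5 and removed in 2.0)
secondary_y: plot on a second y-axis (an auxiliary y-axis on the right); default False
mark_right: when a secondary_y axis is used, automatically mark the column labels in the legend with "(right)"; default True
x_compat: controls how x-axis ticks are chosen; default False. Setting it to True turns off pandas' automatic tick adjustment for time series, which can improve the display of date ticks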
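
As a rough illustration of how several of the parameters above combine, the following sketch draws the same DataFrame once on a single Axes and once with one subplot per column. The random data, the seed, and the title text are assumptions chosen only for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((100, 3)).cumsum(axis=0),
    columns=["A", "B", "C"],
)

# kind, title, grid, figsize and legend are parameters from the list above
ax = df.plot(kind="line", title="Random walks", grid=True, figsize=(8, 4), legend=True)

# subplots=True returns an array with one Axes per column instead of a single Axes
axes = df.plot(subplots=True, sharex=True, figsize=(8, 6))
```

The first call returns a single matplotlib Axes, while the second returns an array of Axes, so code that customizes the result afterwards needs to handle both shapes.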

1.2 Other common options

color: color
s: marker size for scatter plots, an int
Setting the x- and y-axis labels
- ax.set_ylabel('yyy')
- ax.set_xlabel('xxx')
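
The options above are applied to the Axes object that the plot call returns. A minimal sketch, assuming a small made-up table of heights and weights (the column names and values are hypothetical):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Made-up illustrative measurements
df = pd.DataFrame({"height": [150, 160, 170, 180], "weight": [50, 58, 66, 77]})

# color and s are forwarded to the scatter plot; the call returns a matplotlib Axes
ax = df.plot.scatter(x="height", y="weight", color="tab:blue", s=40)

# Rename the axes on the returned Axes object
ax.set_xlabel("Height (cm)")
ax.set_ylabel("Weight (kg)")
```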

II. Examples

2.1 Line plot (line)

1. Basic usage

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
ts = pd.Series(np.random.randn(1000), index=pd.date_range("1/1/2000", periods=1000))
ts = ts.cumsum()
ts.plot()

2. Plotting multiple columns

df = pd.DataFrame(np.random.randn(1000, 4), index=pd.date_range("1/1/2000", periods=1000), columns=list("ABCD"))
df = df.cumsum()
df.plot()

3. Use the x and y parameters to plot one column against another

df3 = pd.DataFrame(np.random.randn(1000, 2), columns=["B", "C"]).cumsum()
df3["A"] = pd.Series(list(range(1000)))
df3.plot(x="A", y="B")

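4. The secondary_y parameter: adding a second y-axis and placing the legends

ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index, columns=list('ABCD'))
df = df.cumsum()
print(df)
# Figure 1: column A uses the left y-axis and column B the right y-axis; both share one x-axis
df.A.plot()  # plot column A (a row can be plotted the same way)
df.B.plot(secondary_y=True)  # put column B on the second (right) y-axis
# Figure 2
ax = df.plot(secondary_y=['A', 'B'])  # columns A and B use the right y-axis
# ax (an Axes) can be thought of as a subplot: picture a blackboard divided into panels, where each panel is one Axes
ax.set_ylabel('CD scale')   # label of the primary y-axis
ax.right_ax.set_ylabel('AB scale')  # label of the second y-axis
ax.legend(loc='upper left')  # position of the legend
ax.right_ax.legend(loc='upper right')   # position of the second legend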

5. The x_compat parameter: better display of time ticks on the x-axis

ts = pd.Series(np.random.randn(1000), index=pd.date_range("1/1/2000", periods=1000))
ts = ts.cumsum()
ts.plot(x_compat=True)
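
When several plots need the same setting, pandas also exposes it through the pandas.plotting.plot_params.use context manager, so it does not have to be repeated on every call. A minimal sketch, assuming the same kind of random time series as above (the seed is arbitrary):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
ts = pd.Series(
    rng.standard_normal(1000).cumsum(),
    index=pd.date_range("1/1/2000", periods=1000),
)

# Every plot created inside the block behaves as if x_compat=True had been passed
with pd.plotting.plot_params.use("x_compat", True):
    ax = ts.plot()

# Outside the block the option is back to its previous value
restored = pd.plotting.plot_params["x_compat"]
```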

6. The color parameter: setting the colors of multiple series

df = pd.DataFrame(np.random.randn(1000, 4), index=pd.date_range('1/1/2000', periods=1000),
                  columns=list('ABCD')).cumsum()
df.A.plot(color='red')
df.B.plot(color='blue')
df.C.plot(color='yellow')

2.2 Bar plot (bar)

DataFrame.plot.bar() or DataFrame.plot(kind='bar')

1. Basic usage

df2 = pd.DataFrame(np.random.rand(10, 4), columns=["a", "b", "c", "d"])
df2.plot.bar()

2. Pass stacked=True to produce a stacked bar plot

df2.plot.bar(stacked=True)

3. Use barh to produce a horizontal bar plot

df2.plot.barh()

4. Use the rot parameter to set the rotation of the tick labels

df2.plot.bar(rot=0)  # 0 keeps the labels horizontal
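
To make the two call styles and the effect of stacked=True concrete, the sketch below inspects the rectangles that end up on each Axes. The tiny DataFrame is made up for illustration, and reading ax.patches is just one convenient way to look at the drawn bars.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Made-up illustrative data: 3 rows, 2 columns
df = pd.DataFrame({"a": [1, 2, 3], "b": [3, 2, 1]}, index=["x", "y", "z"])

ax1 = df.plot.bar(rot=0)          # accessor style
ax2 = df.plot(kind="bar", rot=0)  # kind= style, same result
ax3 = df.plot.bar(stacked=True, rot=0)

# One rectangle per cell of the DataFrame
n_accessor = len(ax1.patches)
n_kind = len(ax2.patches)

# With stacking, column b sits on top of column a, so every stack ends at a + b
stack_tops = sorted({p.get_y() + p.get_height() for p in ax3.patches})
```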

2.3 Histogram (hist)

1. Basic usage

df3 = pd.DataFrame(
    {
        "a": np.random.randn(1000) + 1,
        "b": np.random.randn(1000),
        "c": np.random.randn(1000) - 1,
    },
    columns=["a", "b", "c"],
)
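# alpha sets the transparency
df3.plot.hist(alpha=0.5)
# render the minus sign on the axes correctly
plt.rcParams['axes.unicode_minus'] = False

2. Histograms can be stacked with stacked=True. The bins parameter changes the number of bins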

df3.plot.hist(alpha=0.5,stacked=True, bins=20)
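
Note that bins is a count of bins rather than a bin width. The sketch below checks this by counting the bars drawn for each column; the seeded random data is an assumption made so the example is reproducible.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df3 = pd.DataFrame(
    {
        "a": rng.standard_normal(1000) + 1,
        "b": rng.standard_normal(1000),
        "c": rng.standard_normal(1000) - 1,
    }
)

ax10 = df3.plot.hist(alpha=0.5)           # bins defaults to 10
ax20 = df3.plot.hist(alpha=0.5, bins=20)  # twice as many, narrower bins

# Each column contributes one bar per bin
bars_per_column_10 = len(ax10.patches) // len(df3.columns)
bars_per_column_20 = len(ax20.patches) // len(df3.columns)
```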

3. Use the by parameter to pass a grouping key and draw grouped histograms

data = pd.Series(np.random.randn(1000))
data.hist(by=np.random.randint(0, 4, 1000), figsize=(6, 4))

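2.4 Box plot (box)

A box plot visualizes the distribution of the values in each column.

1. Basic usage

Example: a box plot representing five trials of 10 observations of a uniform random variable on [0, 1).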

df = pd.DataFrame(np.random.rand(10, 5), columns=["A", "B", "C", "D", "E"])
df.plot.box();

2. Box plots can be colored through the color parameter

color is a dict whose keys are boxes, whiskers, medians, and caps

color = {
    "boxes": "DarkGreen",
    "whiskers": "DarkOrange",
    "medians": "DarkBlue",
    "caps": "Gray",
}
df.plot.box(color=color, sym="r+")

3. Pass vert=False to draw the boxes horizontally; the default, True, draws them vertically

df.plot.box(vert=False)

4. The boxplot() method draws a box plot with grid lines

df = pd.DataFrame(np.random.rand(10, 5))
bp = df.boxplot()

5. Use the by parameter to pass a grouping key and draw grouped box plots

df = pd.DataFrame(np.random.rand(10, 2), columns=["Col1", "Col2"])
df["X"] = pd.Series(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])
bp = df.boxplot(by="X")

6. Grouping can use more than one column

df = pd.DataFrame(np.random.rand(10, 3), columns=["Col1", "Col2", "Col3"])
df["X"] = pd.Series(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])
df["Y"] = pd.Series(["A", "B", "A", "B", "A", "B", "A", "B", "A", "B"])
bp = df.boxplot(column=["Col1", "Col2"], by=["X", "Y"])

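2.5 Area plot (area)

By default, area plots are stacked. To produce a stacked area plot, each column must contain either all positive or all negative values.

1. Basic usage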

df = pd.DataFrame(np.random.rand(10, 4), columns=["a", "b", "c", "d"])
df.plot.area()
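
Because stacking is only the default, it can be switched off. A minimal sketch of an unstacked area plot, assuming seeded random data; with stacked=False the filled regions overlap and share the same baseline instead of being piled on top of each other.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, 4)), columns=["a", "b", "c", "d"])

# stacked=False draws every column from the same baseline
ax = df.plot.area(stacked=False)

# One filled region is drawn per column
n_regions = len(ax.collections)
```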

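2.6 Scatter plot (scatter)

A scatter plot requires numeric columns for the x and y axes. These can be specified with the x and y keywords.

1. Basic usage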

df = pd.DataFrame(np.random.rand(50, 4), columns=["a", "b", "c", "d"])
df["species"] = pd.Categorical(
    ["setosa"] * 20 + ["versicolor"] * 20 + ["virginica"] * 10
)
df.plot.scatter(x="a", y="b")

2. Use the ax and label parameters to plot several groups on the same axes

ax = df.plot.scatter(x="a", y="b", color="DarkBlue", label="Group 1")
df.plot.scatter(x="c", y="d", color="DarkGreen", label="Group 2", ax=ax)

3. The c parameter accepts a column name that supplies the color of each point; the s parameter sets the marker size

df.plot.scatter(x="a", y="b", c="c", s=50)

4. If a categorical column is passed to c, a discrete colorbar is produced

df.plot.scatter(x="a", y="b", c="species", cmap="viridis", s=50)

5. The values of a DataFrame column can be used as the marker sizes

df.plot.scatter(x="a", y="b", s=df["c"] * 200)

2.7 Hexagonal bin plot (hexbin)

If the data is too dense to plot each point individually, a hexbin plot can be a useful alternative to a scatter plot.

df = pd.DataFrame(np.random.randn(1000, 2), columns=["a", "b"])
df["b"] = df["b"] + np.arange(1000)
df.plot.hexbin(x="a", y="b", gridsize=25)
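
By default each hexagon is colored by the number of points that fall into it. hexbin also accepts C and reduce_C_function to aggregate a third column inside each hexagon instead. The sketch below assumes an extra column named z filled with random values, which is not part of the example above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((1000, 2)), columns=["a", "b"])
df["b"] = df["b"] + np.arange(1000)
df["z"] = rng.uniform(0, 3, 1000)  # hypothetical third variable to aggregate

# Color each hexagon by the maximum z of the points inside it
ax = df.plot.hexbin(x="a", y="b", C="z", reduce_C_function=np.max, gridsize=25)
```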

2.8 Pie plot (pie)

If your data contains any NaN, they are automatically filled with 0. If the data contains any negative values, a ValueError is raised.

1. Basic usage

series = pd.Series(3 * np.random.rand(4), index=["a", "b", "c", "d"], name="series")
series.plot.pie(figsize=(6, 6))
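
The handling of NaN and negative values described at the top of this section can be checked with a short sketch. The two Series below are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

# A NaN is treated as 0, so the pie is still drawn
with_nan = pd.Series([3.0, np.nan, 1.0], index=["a", "b", "c"], name="share")
ax = with_nan.plot.pie(figsize=(6, 6))

# A negative value is rejected with a ValueError
with_negative = pd.Series([1.0, -2.0, 3.0], index=["a", "b", "c"], name="bad")
try:
    with_negative.plot.pie()
    raised = False
except ValueError:
    raised = True
```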

2. If subplots=True is specified, a pie chart is drawn for each column as a subplot. By default a legend is drawn in each pie chart; pass legend=False to hide it.

df = pd.DataFrame(
    3 * np.random.rand(4, 2), index=["a", "b", "c", "d"], columns=["x", "y"]
)
df.plot.pie(subplots=True, figsize=(8, 4))

3. autopct shows each wedge's share of the total as a percentage

series.plot.pie(
    labels=["AA", "BB", "CC", "DD"],
    colors=["r", "g", "b", "c"],
    autopct="%.2f",
    fontsize=20,
    figsize=(6, 6),
)

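III. Other Formatting

3.1 Displaying a Chinese title

# Chinese text is not supported by default: change the rc parameters to select a font with CJK glyphs
plt.rcParams['font.sans-serif'] = ['SimHei']
df = pd.DataFrame(np.random.rand(5, 3), columns=["a", "b", "c"])
df.plot.bar(title='中文標題測試',rot=0)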
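
SimHei is not installed on every system, and matplotlib falls back to a font without CJK glyphs when it is missing. One possible approach, sketched below, is to check which candidate fonts matplotlib can actually find before changing the rc parameters. The candidate names other than SimHei are assumptions; substitute whatever CJK fonts are installed on your machine.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib import font_manager

# Hypothetical candidate list; only SimHei comes from the article
candidates = ["SimHei", "Microsoft YaHei", "Noto Sans CJK SC"]

# Names of all fonts that matplotlib has discovered on this system
installed = {font.name for font in font_manager.fontManager.ttflist}
available = [name for name in candidates if name in installed]

if available:
    # Put the usable CJK fonts ahead of the existing sans-serif fallbacks
    plt.rcParams["font.sans-serif"] = available + list(plt.rcParams["font.sans-serif"])
```

If available ends up empty, none of the candidates is installed and Chinese text will still render as empty boxes until a suitable font is added.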

3.2 Displaying the minus sign on the axes

df3 = pd.DataFrame(
    {
        "a": np.random.randn(1000) + 1,
        "b": np.random.randn(1000),
        "c": np.random.randn(1000) - 1,
    },
    columns=["a", "b", "c"],
)
df3.plot.hist(alpha=0.5)
# render the minus sign on the axes correctly
plt.rcParams['axes.unicode_minus']=False

3.3 Plotting with error bars (yerr)

Example 1: plot the group means with the standard deviations computed from the raw data

ix3 = pd.MultiIndex.from_arrays([['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'], ['foo', 'foo', 'bar', 'bar', 'foo', 'foo', 'bar', 'bar']], names=['letter', 'word'])
df3 = pd.DataFrame({'data1': [3, 2, 4, 3, 2, 4, 3, 2], 'data2': [6, 5, 7, 5, 4, 5, 6, 5]}, index=ix3) 
# group by both index levels
gp3 = df3.groupby(level=('letter', 'word'))
means = gp3.mean() 
errors = gp3.std() 
means.plot.bar(yerr=errors,rot=0)

Example 2: plot the min/max range with asymmetric error bars

mins = gp3.min()
maxs = gp3.max()
errors = [[means[c] - mins[c], maxs[c] - means[c]] for c in df3.columns]
means.plot.bar(yerr=errors,capsize=4, rot=0)

3.4 Using layout to split the plot into multiple subplots

df = pd.DataFrame(np.random.randn(1000, 4), index=pd.date_range("1/1/2000", periods=1000), columns=list("ABCD"))
df = df.cumsum()
df.plot(subplots=True, layout=(2, 3), figsize=(6, 6), sharex=False)
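
With subplots=True and a layout, the call returns a two-dimensional array of Axes in the requested shape, and any cells left over are hidden. A minimal sketch, assuming seeded random data with four columns; it also shows that one layout dimension can be given as -1 so that pandas works it out from the number of columns.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((50, 4)).cumsum(axis=0),
    index=pd.date_range("1/1/2000", periods=50),
    columns=list("ABCD"),
)

# 4 columns placed on a 2 x 3 grid: two cells stay empty and are hidden
axes = df.plot(subplots=True, layout=(2, 3), figsize=(6, 6), sharex=False)
visible = [ax for ax in axes.ravel() if ax.get_visible()]

# -1 lets pandas derive the number of columns of the grid
auto_axes = df.plot(subplots=True, layout=(2, -1), figsize=(6, 6), sharex=False)
```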

3.5 Using table to draw a table below the plot

Pass table=True to draw a table beneath the plot.

fig, ax = plt.subplots(1, 1, figsize=(7, 6.5))
df = pd.DataFrame(np.random.rand(5, 3), columns=["a", "b", "c"])
ax.xaxis.tick_top()  # show the x-axis ticks at the top
df.plot(table=True, ax=ax)

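3.6 Using colormap to set the plot colors

When plotting many columns, one potential problem is that some series are hard to tell apart because the default colors repeat. To address this, DataFrame plotting accepts a colormap parameter, which takes either a Matplotlib colormap or a string naming a colormap registered with Matplotlib. Matplotlib's documentation includes a visualization of its built-in colormaps.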

df = pd.DataFrame(np.random.randn(1000, 10), index=pd.date_range("1/1/2000", periods=1000))
df = df.cumsum()
df.plot(colormap="cubehelix")
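
The colormap can also be passed as a Matplotlib colormap object rather than a name. The sketch below, which assumes seeded random data, does this and then checks that the ten series really receive ten different colors.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.colors import to_hex

rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((200, 10)).cumsum(axis=0),
    index=pd.date_range("1/1/2000", periods=200),
)

# Pass the colormap object itself instead of the string "cubehelix"
cmap = plt.get_cmap("cubehelix")
ax = df.plot(colormap=cmap)

# Colors are sampled evenly along the colormap, one per column
line_colors = [to_hex(line.get_color()) for line in ax.get_lines()]
```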

Reference article: https://www.jb51.net/article/188648.htm
