Reading a CSV File in Python for K-means Analysis

2022-03-30 13:01:03

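1. Environment and data

Python 3.7, PyCharm Community Edition 2021.1.1, Windows 10.

Libraries used: matplotlib, numpy, sklearn, pandas, etc.

Data: a CSV file containing time, longitude and latitude, elevation, and other fields.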

2. Time-series analysis (2D)

Read the time column and the elevation column and cluster them:

The code is as follows:

import datetime

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
 
if __name__ == "__main__":
    # Placeholder path: replace it with the path to your own CSV file
    data = pd.read_csv(r"H:CSDN_Test_DataUseYourTestData.csv")
    x, y = data['Time (sec)'], data['Height (m HAE)']
    n = len(x)
    x = np.array(x).reshape(n, 1)  # reshape into a single column
    y = np.array(y).reshape(n, 1)  # reshape into a single column
    data = np.hstack((x, y))  # stack horizontally into two columns
    k = 8  # number of clusters (we tried 8, 16, 32, 64 and 128 for comparison)
    cluster = KMeans(n_clusters=k)  # build the clusterer
    start = datetime.datetime.now()  # start timing the fit
    C = cluster.fit_predict(data)  # cluster label for every row
    Trainingtime = datetime.datetime.now() - start
    print("Total training time: %s(s)" % Trainingtime.total_seconds())
    plt.figure()
    # Colour each point by its cluster label
    plt.scatter(data[:, 0], data[:, 1], marker='o', s=2, c=C)
    plt.show()
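
The comment on k mentions trying several cluster counts. One way to make that comparison concrete is to record the K-means inertia (the within-cluster sum of squared distances) for each k. The following is only a minimal sketch: the helper name inertia_by_k and the synthetic points are assumptions for illustration and do not come from the original script or its data.

```python
import numpy as np
from sklearn.cluster import KMeans


def inertia_by_k(points, k_values, seed=0):
    """Fit KMeans once per k and return {k: inertia}. Hypothetical helper."""
    results = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed)
        model.fit(points)
        results[k] = model.inertia_
    return results


# Synthetic stand-in for the (time, height) columns; not the article's data.
rng = np.random.default_rng(0)
demo_points = np.column_stack((
    np.arange(500, dtype=float),        # hypothetical time values
    rng.normal(100.0, 5.0, size=500),   # hypothetical height values
))

scores = inertia_by_k(demo_points, [2, 4, 8, 16])
for k, inertia in scores.items():
    print(k, round(inertia, 1))
```

Inertia generally falls as k grows, so it is the shape of the curve (where the improvement levels off) that is informative, not the smallest value.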

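Results:

2.1 Results for 2,000 rows

2.2 Results for 6,950 rows

2.3 Results for a 300 MB file (about 1.05 million rows)

CPU usage immediately jumped above 90%. The run took roughly 1-2 minutes, which is still reasonably fast.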
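
The script imports MiniBatchKMeans but never uses it. For a table of this size it may be worth trying, since it updates the centroids from small random batches instead of the full data set, usually trading a little accuracy for speed. The sketch below runs on synthetic data, and the batch_size value is an arbitrary illustration, not a tuned setting.

```python
import time

import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Synthetic stand-in for a large (time, height) table; not the article's data.
rng = np.random.default_rng(0)
n_rows = 100_000
points = np.column_stack((
    np.arange(n_rows, dtype=float),
    rng.normal(100.0, 5.0, size=n_rows),
))

start = time.perf_counter()
# batch_size is an illustrative choice, not a tuned value.
model = MiniBatchKMeans(n_clusters=8, batch_size=10_000, n_init=3, random_state=0)
labels = model.fit_predict(points)
elapsed = time.perf_counter() - start

print("rows:", len(labels), "clusters used:", len(np.unique(labels)))
print("elapsed: %.2f s" % elapsed)
```

The labels array has the same role as C in the script above, so it can be passed to plt.scatter through the c argument in the same way.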

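The marker size (the s argument of plt.scatter) was a little large. Reducing it to 0.1 did not help much: with this many points the clusters are still hard to tell apart.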
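
When shrinking the markers is not enough, another option is to plot only a random subset of the points while still clustering the full data set. The helper below is a hypothetical sketch (the name subsample_for_plot and the default of 20,000 points are assumptions), shown on synthetic arrays.

```python
import numpy as np


def subsample_for_plot(points, labels, max_points=20_000, seed=0):
    """Return a random subset of rows and their labels for plotting."""
    n = len(points)
    if n <= max_points:
        return points, labels
    rng = np.random.default_rng(seed)
    # Sample row indices without replacement and keep the original order.
    idx = np.sort(rng.choice(n, size=max_points, replace=False))
    return points[idx], labels[idx]


# Synthetic stand-ins; in the script above these would be `data` and `C`.
points = np.column_stack((np.arange(1_000_000.0), np.zeros(1_000_000)))
labels = np.arange(1_000_000) % 8

small_points, small_labels = subsample_for_plot(points, labels)
print(small_points.shape, small_labels.shape)  # (20000, 2) (20000,)
```

The reduced arrays can then be drawn with plt.scatter(small_points[:, 0], small_points[:, 1], s=2, c=small_labels).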

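3. Clustering points in 3D space by longitude, latitude, and elevation

Modify the code so that the corresponding columns are read as the X, Y, and Z coordinates, as follows: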

import datetime

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older Matplotlib
 
if __name__ == "__main__":
    # Placeholder path: replace it with the path to your own CSV file
    data = pd.read_csv(r"H:CSDN_Test_DataUseYourTestData.csv")
    x, y, z = data['Longitude (deg)'], data['Latitude (deg)'], data['Height (m HAE)']
    n = len(x)
    x = np.array(x).reshape(n, 1)  # reshape into a single column
    y = np.array(y).reshape(n, 1)  # reshape into a single column
    z = np.array(z).reshape(n, 1)  # reshape into a single column
    data = np.hstack((x, y, z))  # stack horizontally into three columns
    k = 8  # number of clusters (we tried 8, 16, 32, 64 and 128 for comparison)
    cluster = KMeans(n_clusters=k)  # build the clusterer
    start = datetime.datetime.now()  # start timing the fit
    C = cluster.fit_predict(data)  # cluster label for every row
    Trainingtime = datetime.datetime.now() - start
    print("Total training time: %s(s)" % Trainingtime.total_seconds())
 
    fig = plt.figure()
    ax = fig.add_subplot(projection='3d')  # Axes3D(fig) no longer attaches itself in recent Matplotlib
 
    points = ax.scatter(data[:, 0], data[:, 1], data[:, 2], s=1, c=C)
    # Draw a legend with one entry per cluster label
    ax.legend(*points.legend_elements(), title='Cluster', loc='best')
    # Label the axes
    ax.set_zlabel('Z Label', fontdict={'size': 15, 'color': 'red'})
    ax.set_ylabel('Y Label', fontdict={'size': 15, 'color': 'red'})
    ax.set_xlabel('X Label', fontdict={'size': 15, 'color': 'red'})
    plt.show()
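
One point to keep in mind with these columns: longitude and latitude are in degrees while height is in metres, and K-means uses Euclidean distance, so the column with the largest numeric spread tends to dominate the clusters. A possible remedy, sketched here on synthetic values (the coordinate ranges are made up), is to standardise each column with StandardScaler before clustering. Whether that is appropriate depends on what the clusters are meant to capture.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for (longitude, latitude, height); not the article's data.
rng = np.random.default_rng(0)
xyz = np.column_stack((
    rng.uniform(120.0, 120.1, size=1000),  # hypothetical longitude (deg)
    rng.uniform(30.0, 30.1, size=1000),    # hypothetical latitude (deg)
    rng.uniform(0.0, 500.0, size=1000),    # hypothetical height (m)
))

# Rescale every column to zero mean and unit variance.
scaled = StandardScaler().fit_transform(xyz)
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(scaled)

print("column std before:", xyz.std(axis=0).round(3))
print("column std after: ", scaled.std(axis=0).round(3))
print("points per cluster:", np.bincount(labels))
```

The plot can still use the original, unscaled coordinates for the axes, with the labels from the scaled fit supplying the colours.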