<em>Mac</em>Book项目 2009年学校开始实施<em>Mac</em>Book项目,所有师生配备一本<em>Mac</em>Book,并同步更新了校园无线网络。学校每周进行电脑技术更新,每月发送技术支持资料,极大改变了教学及学习方式。因此2011
2021-06-01 09:32:01
pandas官方檔案:https://pandas.pydata.org/pandas-docs/stable/?v=20190307135750
pandas基於Numpy,可以看成是處理文字或者表格資料。
pandas中有兩個主要的資料結構,其中Series資料結構類似於Numpy中的一維陣列,DataFrame類似於多維表格資料結構。
pandas是python資料分析的核心模組。它主要提供了五大功能:
Series是一種類似於一維陣列的物件,由一組資料和一組與之相關的資料標籤(索引)組成。
Series比較像列表(陣列)和字典的結合體
import numpy as np import pandas as pd df = pd.Series(0, index=['a', 'b', 'c', 'd']) print(df) # a 0 # b 0 # c 0 # d 0 # dtype: int64 print(df.values) # 值 # [0 0 0 0] print(df.index) # 索引 # Index(['a', 'b', 'c', 'd'], dtype='object')
import numpy as np import pandas as pd df = pd.Series(np.array([1, 2, 3, 4, np.nan]), index=['a', 'b', 'c', 'd', 'e']) # 1、從ndarray建立Series print(df) # a 1.0 # b 2.0 # c 3.0 # d 4.0 # e NaN # dtype: float64 df = pd.Series({'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': np.nan}) # 2、也可以從字典建立Series dates = pd.date_range('20190101', periods=6, freq='M') print(type(dates)) # print(dates) # DatetimeIndex(['2019-01-31', '2019-02-28', '2019-03-31', '2019-04-30', # '2019-05-31', '2019-06-30'], # dtype='datetime64[ns]', freq='M') df=pd.Series(0,index=dates) # 3、時間序列索引 print(df) # 2019-01-31 0 # 2019-02-28 0 # 2019-03-31 0 # 2019-04-30 0 # 2019-05-31 0 # 2019-06-30 0 # Freq: M, dtype: int64
產生時間物件陣列:date_range引數詳解:
print(df ** 2) # 3、與標量運算 # a 1.0 # b 4.0 # c 9.0 # d 16.0 # e NaN # dtype: float64 print(df + df) # 4、兩個Series運算 # a 2.0 # b 4.0 # c 6.0 # d 8.0 # e NaN # dtype: float64 print(df[0] ) # 5、數位索引; 1.0 print(df[[0, 1, 2]]) # 行索引 # a 1.0 # b 2.0 # c 3.0 # dtype: float64 print(df['a'] ) # 6、鍵索引(行標籤) ;1.0 print(df[['b','c']]) print('a' in df) # 7、in運算;True print(df[0:2] ) # 8、切片 # a 1.0 # b 2.0 # dtype: float64 print(np.sin(df)) # 9、通用函數 # a 0.841471 # b 0.909297 # c 0.141120 # d -0.756802 # e NaN # dtype: float64 print(df[df > 1] ) # 10、布林值過濾 # b 2.0 # c 3.0 # d 4.0 # dtype: float64
df = pd.Series([1, 2, 3, 4, np.nan], index=['a', 'b', 'c', 'd', 'e']) print(df) # a 1.0 # b 2.0 # c 3.0 # d 4.0 # e NaN # dtype: float64 print(df.dropna() ) # 1、過濾掉值為NaN的行 # a 1.0 # b 2.0 # c 3.0 # d 4.0 # dtype: float64 print(df.fillna(5) ) # 2、用指定值填充缺失資料 # a 1.0 # b 2.0 # c 3.0 # d 4.0 # e 5.0 # dtype: float64 print(df.isnull() ) # 3、返回布林陣列,缺失值對應為True # a False # b False # c False # d False # e True # dtype: bool print(df.notnull() ) # 4、返回布林陣列,缺失值對應為False # a True # b True # c True # d True # e False # dtype: bool
DataFrame是一個表格型的資料結構,含有一組有序的列。
DataFrame可以被看做是由Series組成的字典,並且共用一個索引。
import numpy as np import pandas as pd df1 = pd.DataFrame(np.zeros((3, 4))) # 建立一個三行四列的DataFrame print(df1) # 0 1 2 3 # 0 0.0 0.0 0.0 0.0 # 1 0.0 0.0 0.0 0.0 # 2 0.0 0.0 0.0 0.0 dates = pd.date_range('20190101', periods=6, freq='M') np.random.seed(1) arr = 10 * np.random.randn(6, 4) print(arr) # [[ 16.24345364 -6.11756414 -5.28171752 -10.72968622] # [ 8.65407629 -23.01538697 17.44811764 -7.61206901] # [ 3.19039096 -2.49370375 14.62107937 -20.60140709] # [ -3.22417204 -3.84054355 11.33769442 -10.99891267] # [ -1.72428208 -8.77858418 0.42213747 5.82815214] # [-11.00619177 11.4472371 9.01590721 5.02494339]] df = pd.DataFrame(arr, index=dates, columns=['c1', 'c2', 'c3', 'c4']) # 自定義index和column print(df) # c1 c2 c3 c4 # 2019-01-31 16.243454 -6.117564 -5.281718 -10.729686 # 2019-02-28 8.654076 -23.015387 17.448118 -7.612069 # 2019-03-31 3.190391 -2.493704 14.621079 -20.601407 # 2019-04-30 -3.224172 -3.840544 11.337694 -10.998913 # 2019-05-31 -1.724282 -8.778584 0.422137 5.828152 # 2019-06-30 -11.006192 11.447237 9.015907 5.024943
print(df.dtypes) # 1、檢視資料型別 # 0 float64 # 1 float64 # 2 float64 # 3 float64 # dtype: object print(df.index) # 2、檢視行索引 # DatetimeIndex(['2019-01-31', '2019-02-28', '2019-03-31', '2019-04-30', # '2019-05-31', '2019-06-30'], # dtype='datetime64[ns]', freq='M') print(df.columns) # 3、檢視各列的標籤 # Index(['c1', 'c2', 'c3', 'c4'], dtype='object') print(df.values) # 4、檢視資料框內的資料,也即不含行標籤和列頭的資料 # [[ 16.24345364 -6.11756414 -5.28171752 -10.72968622] # [ 8.65407629 -23.01538697 17.44811764 -7.61206901] # [ 3.19039096 -2.49370375 14.62107937 -20.60140709] # [ -3.22417204 -3.84054355 11.33769442 -10.99891267] # [ -1.72428208 -8.77858418 0.42213747 5.82815214] # [-11.00619177 11.4472371 9.01590721 5.02494339]] print(df.describe()) # 5、檢視資料每一列的極值,均值,中位數,只可用於數值型資料 # c1 c2 c3 c4 # count 6.000000 6.000000 6.000000 6.000000 # mean 2.022213 -5.466424 7.927203 -6.514830 # std 9.580084 11.107772 8.707171 10.227641 # min -11.006192 -23.015387 -5.281718 -20.601407 # 25% -2.849200 -8.113329 2.570580 -10.931606 # 50% 0.733054 -4.979054 10.176801 -9.170878 # 75% 7.288155 -2.830414 13.800233 1.865690 # max 16.243454 11.447237 17.448118 5.828152 print(df.T) # 6、transpose轉置,也可用T來操作 # 2019-01-31 2019-02-28 2019-03-31 2019-04-30 2019-05-31 2019-06-30 # c1 16.243454 8.654076 3.190391 -3.224172 -1.724282 -11.006192 # c2 -6.117564 -23.015387 -2.493704 -3.840544 -8.778584 11.447237 # c3 -5.281718 17.448118 14.621079 11.337694 0.422137 9.015907 # c4 -10.729686 -7.612069 -20.601407 -10.998913 5.828152 5.024943 print(df.sort_index(axis=0)) # 7、排序,axis=0 可按行標籤排序輸出; 按行標籤][2019-01-01, 2019-01-02...]從大到小排序 # c1 c2 c3 c4 # 2019-01-31 16.243454 -6.117564 -5.281718 -10.729686 # 2019-02-28 8.654076 -23.015387 17.448118 -7.612069 # 2019-03-31 3.190391 -2.493704 14.621079 -20.601407 # 2019-04-30 -3.224172 -3.840544 11.337694 -10.998913 # 2019-05-31 -1.724282 -8.778584 0.422137 5.828152 # 2019-06-30 -11.006192 11.447237 9.015907 5.024943 print(df.sort_index(axis=1)) # 7、排序,axis=1 可按列頭標籤排序輸出;按列標籤[c1, c2, c3, c4從大到小排序 # c1 c2 c3 c4 # 2019-01-31 16.243454 -6.117564 -5.281718 -10.729686 # 2019-02-28 8.654076 -23.015387 17.448118 -7.612069 # 2019-03-31 3.190391 -2.493704 14.621079 -20.601407 # 2019-04-30 -3.224172 -3.840544 11.337694 -10.998913 # 2019-05-31 -1.724282 -8.778584 0.422137 5.828152 # 2019-06-30 -11.006192 11.447237 9.015907 5.024943 print(df.sort_values(by='c2')) # 8、按資料值來排序 ;按c2列的值從大到小排序 # c1 c2 c3 c4 # 2019-02-28 8.654076 -23.015387 17.448118 -7.612069 # 2019-05-31 -1.724282 -8.778584 0.422137 5.828152 # 2019-01-31 16.243454 -6.117564 -5.281718 -10.729686 # 2019-04-30 -3.224172 -3.840544 11.337694 -10.998913 # 2019-03-31 3.190391 -2.493704 14.621079 -20.601407 # 2019-06-30 -11.006192 11.447237 9.015907 5.024943
print(df['c2']) # 1、 通過columns標籤取值 # 2019-01-31 -6.117564 # 2019-02-28 -23.015387 # 2019-03-31 -2.493704 # 2019-04-30 -3.840544 # 2019-05-31 -8.778584 # 2019-06-30 11.447237 # Freq: M, Name: c2, dtype: float64 print(df[['c2', 'c3']]) # c2 c3 # 2019-01-31 -6.117564 -5.281718 # 2019-02-28 -23.015387 17.448118 # 2019-03-31 -2.493704 14.621079 # 2019-04-30 -3.840544 11.337694 # 2019-05-31 -8.778584 0.422137 # 2019-06-30 11.447237 9.015907 print(df[0:3]) # 2、 通過columns索引取值 # c1 c2 c3 c4 # 2019-01-31 16.243454 -6.117564 -5.281718 -10.729686 # 2019-02-28 8.654076 -23.015387 17.448118 -7.612069 # 2019-03-31 3.190391 -2.493704 14.621079 -20.601407 print(df.loc['20200228':'20200430']) # 3、loc 通過行標籤取值: # c1 c2 c3 c3 # 2020-02-29 8.654076 -23.015387 17.448118 -7.612069 # 2020-03-31 3.190391 -2.493704 14.621079 -20.601407 # 2020-04-30 -3.224172 -3.840544 11.337694 -10.998913 print(df.iloc[1:3]) # 4、iloc 通過行索引選擇資料,取第二行到三行。 # c1 c2 c3 c3 # 2020-02-29 8.654076 -23.015387 17.448118 -7.612069 # 2020-03-31 3.190391 -2.493704 14.621079 -20.601407 print(df.iloc[2, 1]) # 第三行第二列值:-2.493703754774101 print(df.iloc[1:4, 1:4]) # 第 2-4行與第2-4列: # c2 c3 c4 # 2019-02-28 -23.015387 17.448118 -7.612069 # 2019-03-31 -2.493704 14.621079 -20.601407 # 2019-04-30 -3.840544 11.337694 -10.998913 print(df['c3'] > 10) # 5、 使用邏輯判斷取值 # 2020-01-31 False # 2020-02-29 True # 2020-03-31 True # 2020-04-30 True # 2020-05-31 False # 2020-06-30 False # Freq: M, Name: c3, dtype: bool print(df[df['c3'] > 10]) # 5、 使用邏輯判斷取值 # c1 c2 c3 c4 # 2020-02-29 8.654076 -23.015387 17.448118 -7.612069 # 2020-03-31 3.190391 -2.493704 14.621079 -20.601407 # 2020-04-30 -3.224172 -3.840544 11.337694 -10.998913 print(df[(df['c1'] > 0) & (df['c2'] > -8)]) # c1 c2 c3 c4 # 2019-01-31 16.243454 -6.117564 -5.281718 -10.729686 # 2019-03-31 3.190391 -2.493704 14.621079 -20.601407
df.iloc[1:3]=5 # 將2-3行的值設為5 print(df) # c1 c2 c3 c4 # 2020-01-31 16.243454 -6.117564 -5.281718 -10.729686 # 2020-02-29 5.000000 5.000000 5.000000 5.000000 # 2020-03-31 5.000000 5.000000 5.000000 5.000000 # 2020-04-30 -3.224172 -3.840544 11.337694 -10.998913 # 2020-05-31 -1.724282 -8.778584 0.422137 5.828152 df.iloc[0:3, 0:2] = 0 # 將1-3行1-2列的值設為0 print(df) # c1 c2 c3 c4 # 2019-01-31 0.000000 0.000000 -5.281718 -10.729686 # 2019-02-28 0.000000 0.000000 17.448118 -7.612069 # 2019-03-31 0.000000 0.000000 14.621079 -20.601407 # 2019-04-30 -3.224172 -3.840544 11.337694 -10.998913 # 2019-05-31 -1.724282 -8.778584 0.422137 5.828152 # 2019-06-30 -11.006192 11.447237 9.015907 5.024943 # 針對行做處理 df[df['c3'] > 10] = 100 # 將C3列的大於10的行數值設為0 print(df) # c1 c2 c3 c4 # 2019-01-31 0.000000 0.000000 -5.281718 -10.729686 # 2019-02-28 100.000000 100.000000 100.000000 100.000000 # 2019-03-31 100.000000 100.000000 100.000000 100.000000 # 2019-04-30 100.000000 100.000000 100.000000 100.000000 # 2019-05-31 -1.724282 -8.778584 0.422137 5.828152 # 2019-06-30 -11.006192 11.447237 9.015907 5.024943 # 針對行做處理 df = df.astype(np.int32) df[df['c3'].isin([100])] = 1000 # 將C3列的等於100的行數值設為1000 print(df) # c1 c2 c3 c4 # 2019-01-31 0 0 -5 -10 # 2019-02-28 1000 1000 1000 1000 # 2019-03-31 1000 1000 1000 1000 # 2019-04-30 1000 1000 1000 1000 # 2019-05-31 -1 -8 0 5 # 2019-06-30 -11 11 9 5
print(df.isnull()) # c1 c2 c3 c4 # 0 False True False False # 1 False False False False # 2 False False True False # 3 False False False False # 4 False False False False # 5 False False False True # 6 True True True True print(df.isnull().sum()) # 1、通過在isnull()方法後使用sum()方法即可獲得該資料集某個特徵含有多少個缺失值 # c1 1 # c2 2 # c3 2 # c4 2 # dtype: int64 print(df.dropna(axis=0)) # 2、axis=0刪除有NaN值的行 # c1 c2 c3 c4 # 1 4.9 3.0 1.4 0.2 # 3 7.0 3.2 4.7 1.4 # 4 6.4 3.2 4.5 1.5 print(df.dropna(axis=1)) # 3、axis=1刪除有NaN值的列 # Empty DataFrame # Columns: [] # Index: [0, 1, 2, 3, 4, 5, 6] print(df.dropna(how='all')) # 4、刪除全為NaN值得行或列 # c1 c2 c3 c4 # 0 5.1 NaN 1.4 0.2 # 1 4.9 3.0 1.4 0.2 # 2 4.7 3.2 NaN 0.2 # 3 7.0 3.2 4.7 1.4 # 4 6.4 3.2 4.5 1.5 # 5 6.9 3.1 4.9 NaN print(df.dropna(thresh=4)) #5、 保留至少有4個非NaN資料的行,刪除行不為4個值的, # c1 c2 c3 c4 # 1 4.9 3.0 1.4 0.2 # 3 7.0 3.2 4.7 1.4 # 4 6.4 3.2 4.5 1.5 print(df.dropna(subset=['c2'])) # 6、刪除c2中有NaN值的行 # c1 c2 c3 c4 # 1 4.9 3.0 1.4 0.2 # 2 4.7 3.2 NaN 0.2 # 3 7.0 3.2 4.7 1.4 # 4 6.4 3.2 4.5 1.5 # 5 6.9 3.1 4.9 NaN print(df.fillna(value=10)) # 7、用指定值填充nan值 # c1 c2 c3 c4 # 0 5.1 10.0 1.4 0.2 # 1 4.9 3.0 1.4 0.2 # 2 4.7 3.2 10.0 0.2 # 3 7.0 3.2 4.7 1.4 # 4 6.4 3.2 4.5 1.5 # 5 6.9 3.1 4.9 10.0 # 6 10.0 10.0 10.0 10.0
print(df.isnull()) # c1 c2 c3 c4 # 0 False True False False # 1 False False False False # 2 False False True False # 3 False False False False # 4 False False False False # 5 False False False True # 6 True True True True print(df.isnull().sum()) # 1、通過在isnull()方法後使用sum()方法即可獲得該資料集某個特徵含有多少個缺失值 # c1 1 # c2 2 # c3 2 # c4 2 # dtype: int64 print(df.dropna(axis=0)) # 2、axis=0刪除有NaN值的行 # c1 c2 c3 c4 # 1 4.9 3.0 1.4 0.2 # 3 7.0 3.2 4.7 1.4 # 4 6.4 3.2 4.5 1.5 print(df.dropna(axis=1)) # 3、axis=1刪除有NaN值的列 # Empty DataFrame # Columns: [] # Index: [0, 1, 2, 3, 4, 5, 6] print(df.dropna(how='all')) # 4、刪除全為NaN值得行或列 # c1 c2 c3 c4 # 0 5.1 NaN 1.4 0.2 # 1 4.9 3.0 1.4 0.2 # 2 4.7 3.2 NaN 0.2 # 3 7.0 3.2 4.7 1.4 # 4 6.4 3.2 4.5 1.5 # 5 6.9 3.1 4.9 NaN print(df.dropna(thresh=4)) #5、 保留至少有4個非NaN資料的行,刪除行不為4個值的, # c1 c2 c3 c4 # 1 4.9 3.0 1.4 0.2 # 3 7.0 3.2 4.7 1.4 # 4 6.4 3.2 4.5 1.5 print(df.dropna(subset=['c2'])) # 6、刪除c2中有NaN值的行 # c1 c2 c3 c4 # 1 4.9 3.0 1.4 0.2 # 2 4.7 3.2 NaN 0.2 # 3 7.0 3.2 4.7 1.4 # 4 6.4 3.2 4.5 1.5 # 5 6.9 3.1 4.9 NaN print(df.fillna(value=10)) # 7、用指定值填充nan值 # c1 c2 c3 c4 # 0 5.1 10.0 1.4 0.2 # 1 4.9 3.0 1.4 0.2 # 2 4.7 3.2 10.0 0.2 # 3 7.0 3.2 4.7 1.4 # 4 6.4 3.2 4.5 1.5 # 5 6.9 3.1 4.9 10.0 # 6 10.0 10.0 10.0 10.0
import pandas as pd from io import StringIO test_data = ''' 5.1,,1.4,0.2 4.9,3.0,1.4,0.2 4.7,3.2,,0.2 7.0,3.2,4.7,1.4 6.4,3.2,4.5,1.5 6.9,3.1,4.9, ,,, ''' test_data = StringIO(test_data) df = pd.read_csv(test_data, header=None) df.columns = ['c1', 'c2', 'c3', 'c4'] print(df) # c1 c2 c3 c4 # 0 5.1 NaN 1.4 0.2 # 1 4.9 3.0 1.4 0.2 # 2 4.7 3.2 NaN 0.2 # 3 7.0 3.2 4.7 1.4 # 4 6.4 3.2 4.5 1.5 # 5 6.9 3.1 4.9 NaN # 6 NaN NaN NaN NaN
pandas的讀寫Excel需要依賴xlrd模組,所以我們需要去安裝一下, 命令:pip install xlrd
使用df = pd.read_excel(filename)讀取檔案,使用df.to_excel(filename)儲存檔案。
df = pd.read_excel(filename)
讀取檔案匯入資料函數主要引數:
df.to_excel(filename)
寫入檔案函數的主要引數:
import pandas as pd import numpy as np df = pd.read_excel("http://pbpython.com/extras/excel-comp-data.xlsx") print(df.head()) print(len(df.index)) # 行數 (不包含表頭,且一下均如此) print(df.index.values) # 行索引 print(len(df.columns)) # 列數 print(df.columns.values) # 列索引 data = df.loc[0].values # 表示第0行資料 data = df.loc[[1, 2]].values # 讀取多行資料(這裡是第1行和第2行) data = df.iloc[:, 1].values # 讀第1列資料 data = df.iloc[:, [1, 2]].values # 讀取多列資料(這裡是第1列和第2列) data = df.iloc[1, 2] # 讀取指定單元格資料(這裡是第1行第一列資料) data = df.iloc[[1, 2], [1, 2]].values # 讀取多行多列資料(第1,2行1,2列的資料) # 任務:輸出滿足成績大於等於90的資料 temp = [] for i in range(len(df.index.values)): if df.iloc[i, 3] >= 90: temp.append(df.iloc[i].values) df2 = pd.DataFrame(data=temp, columns=df.columns.values) writer = pd.ExcelWriter('out_test.xlsx')# 不寫index會輸出索引 df2.to_excel(writer, 'Sheet', index=False) writer.save()
import pandas as pd strtext = '[{"ttery":"min","issue":"20130801-3391","code":"8,4,5,2,9","code1":"297734529","code2":null,"time":1013395466000}, {"ttery":"min","issue":"20130801-3390","code":"7,8,2,1,2","code1":"298058212","code2":null,"time":1013395406000}, {"ttery":"min","issue":"20130801-3389","code":"5,9,1,2,9","code1":"298329129","code2":null,"time":1013395346000}, {"ttery":"min","issue":"20130801-3388","code":"3,8,7,3,3","code1":"298588733","code2":null,"time":1013395286000}, {"ttery":"min","issue":"20130801-3387","code":"0,8,5,2,7","code1":"298818527","code2":null,"time":1013395226000}]' df = pd.read_json(strtext, orient='records') print(df) # ttery issue code code1 code2 time # 0 min 20130801-3391 8,4,5,2,9 297734529 NaN 1013395466000 # 1 min 20130801-3390 7,8,2,1,2 298058212 NaN 1013395406000 # 2 min 20130801-3389 5,9,1,2,9 298329129 NaN 1013395346000 # 3 min 20130801-3388 3,8,7,3,3 298588733 NaN 1013395286000 # 4 min 20130801-3387 0,8,5,2,7 298818527 NaN 1013395226000 df = pd.read_json(strtext, orient='records') df.to_excel('pandas處理json.xlsx', index=False, columns=["ttery", "issue", "code", "code1", "code2", "time"])
orient引數的五種形式
orient是表明預期的json字串格式。orient的設定有以下五個值:
1.'split' : dict like {index -> [index], columns -> [columns], data -> [values]}
這種就是有索引,有列欄位,和資料矩陣構成的json格式。key名稱只能是index,columns和data。
s = '{"index":[1,2,3],"columns":["a","b"],"data":[[1,3],[2,8],[3,9]]}' df = pd.read_json(s, orient='split') print(df) # a b # 1 1 3 # 2 2 8 # 3 3 9
2.'records' : list like [{column -> value}, ... , {column -> value}]
這種就是成員為字典的列表。如我今天要處理的json資料範例所見。構成是列欄位為鍵,值為鍵值,每一個字典成員就構成了dataframe的一行資料。
strtext = '[{"ttery":"min","issue":"20130801-3391","code":"8,4,5,2,9","code1":"297734529","code2":null,"time":1013395466000}, {"ttery":"min","issue":"20130801-3390","code":"7,8,2,1,2","code1":"298058212","code2":null,"time":1013395406000}]' df = pd.read_json(strtext, orient='records') print(df) # ttery issue code code1 code2 time # # 0 min 20130801-3391 8,4,5,2,9 297734529 NaN 1013395466000 # # 1 min 20130801-3390 7,8,2,1,2 298058212 NaN 1013395406000
3.'index' : dict like {index -> {column -> value}}
以索引為key,以列欄位構成的字典為鍵值。如:
s = '{"0":{"a":1,"b":2},"1":{"a":9,"b":11}}' df = pd.read_json(s, orient='index') print(df) # a b # 0 1 2 # 1 9 11
4.'columns' : dict like {column -> {index -> value}}
這種處理的就是以列為鍵,對應一個值字典的物件。這個字典物件以索引為鍵,以值為鍵值構成的json字串。如下圖所示:
s = '{"a":{"0":1,"1":9},"b":{"0":2,"1":11}}' df = pd.read_json(s, orient='columns') print(df) # a b # 0 1 2 # 1 9 11
5.'values' : just the values array。
values這種我們就很常見了。就是一個巢狀的列表。裡面的成員也是列表,2層的。
s = '[["a",1],["b",2]]' df = pd.read_json(s, orient='values') print(df) # 0 1 # 0 a 1 # 1 b 2
import numpy as np import pandas as pd import pymysql def conn(sql): # 連線到mysql資料庫 conn = pymysql.connect( host="localhost", port=3306, user="root", passwd="123", db="db1", ) try: data = pd.read_sql(sql, con=conn) return data except Exception as e: print("SQL is not correct!") finally: conn.close() sql = "select * from test1 limit 0, 10" # sql語句 data = conn(sql) print(data.columns.tolist()) # 檢視欄位 print(data) # 檢視資料
到此這篇關於Python中的pandas表格模組、檔案模組和資料庫模組的文章就介紹到這了。希望對大家的學習有所幫助,也希望大家多多支援it145.com。
相關文章
<em>Mac</em>Book项目 2009年学校开始实施<em>Mac</em>Book项目,所有师生配备一本<em>Mac</em>Book,并同步更新了校园无线网络。学校每周进行电脑技术更新,每月发送技术支持资料,极大改变了教学及学习方式。因此2011
2021-06-01 09:32:01
综合看Anker超能充系列的性价比很高,并且与不仅和iPhone12/苹果<em>Mac</em>Book很配,而且适合多设备充电需求的日常使用或差旅场景,不管是安卓还是Switch同样也能用得上它,希望这次分享能给准备购入充电器的小伙伴们有所
2021-06-01 09:31:42
除了L4WUDU与吴亦凡已经多次共事,成为了明面上的厂牌成员,吴亦凡还曾带领20XXCLUB全队参加2020年的一场音乐节,这也是20XXCLUB首次全员合照,王嗣尧Turbo、陈彦希Regi、<em>Mac</em> Ova Seas、林渝植等人全部出场。然而让
2021-06-01 09:31:34
目前应用IPFS的机构:1 谷歌<em>浏览器</em>支持IPFS分布式协议 2 万维网 (历史档案博物馆)数据库 3 火狐<em>浏览器</em>支持 IPFS分布式协议 4 EOS 等数字货币数据存储 5 美国国会图书馆,历史资料永久保存在 IPFS 6 加
2021-06-01 09:31:24
开拓者的车机是兼容苹果和<em>安卓</em>,虽然我不怎么用,但确实兼顾了我家人的很多需求:副驾的门板还配有解锁开关,有的时候老婆开车,下车的时候偶尔会忘记解锁,我在副驾驶可以自己开门:第二排设计很好,不仅配置了一个很大的
2021-06-01 09:30:48
不仅是<em>安卓</em>手机,苹果手机的降价力度也是前所未有了,iPhone12也“跳水价”了,发布价是6799元,如今已经跌至5308元,降价幅度超过1400元,最新定价确认了。iPhone12是苹果首款5G手机,同时也是全球首款5nm芯片的智能机,它
2021-06-01 09:30:45