Using grouping and aggregation in pandas

Published: 2020-08-04 12:14:15 | Source: 億速云 | Views: 168 | Author: 小豬 | Column: Development Technology

This article walks through grouping and aggregation in pandas with short examples. Readers interested in the topic should find it a useful reference.

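1 Preface

Once you reach grouped iteration, you have covered most of the basic pandas series. 知識追尋者 (the author) has used pandas to process real data and found it pleasant to work with.

知識追尋者 (Inheriting the spirit of open source, spreading technology knowledge)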

2 Grouping

2.1 Preparing the data

# -*- coding: utf-8 -*-

import pandas as pd
import numpy as np

frame = pd.DataFrame({
 'user' : ['zszxz','craler','rose','zszxz','rose'],
 'hobby' : ['reading','running','hiking','reading','hiking'],
 'price' : np.random.randn(5),
 'number' : np.random.randn(5)
})
print(frame)

Output

     user    hobby     price    number
0   zszxz  reading  0.275752 -0.075841
1  craler  running -1.410682  0.259869
2    rose   hiking -0.353269 -0.392659
3   zszxz  reading  1.484604  0.659274
4    rose   hiking -1.348315  2.492047
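Because the data comes from np.random.randn, every run prints different numbers. A minimal sketch of making the sample data repeatable, assuming a fixed seed is acceptable; the seed value 0 and the names rng and seeded_frame are illustrative and not part of the original article:

```python
import numpy as np
import pandas as pd

# Assumption: the seed 0 is arbitrary and only serves to make runs repeatable.
rng = np.random.default_rng(0)

seeded_frame = pd.DataFrame({
    'user': ['zszxz', 'craler', 'rose', 'zszxz', 'rose'],
    'hobby': ['reading', 'running', 'hiking', 'reading', 'hiking'],
    'price': rng.standard_normal(5),
    'number': rng.standard_normal(5),
})

# A generator created with the same seed reproduces the same first draw.
rng_again = np.random.default_rng(0)
price_again = rng_again.standard_normal(5)
print(np.allclose(seeded_frame['price'], price_again))
```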

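2.2 Group mean

Take the price column of the DataFrame, group it by the hobby column, and then compute the mean of each group.

# groupby returns a lazy SeriesGroupBy object; nothing is computed yet
group = frame['price'].groupby(frame['hobby'])
# compute the mean of each group
print(group.mean())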

Output

hobby
hiking    -0.850792
reading    0.880178
running   -1.410682
Name: price, dtype: float64

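Tip: think of this as grouping by hobby and then querying the price. The queried column must be numeric, otherwise computing the mean raises an exception.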
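One way to avoid that exception is to check the column's dtype before aggregating. The sketch below assumes a small hand-written frame with fixed prices; grouped_mean is a hypothetical helper for illustration, not a pandas function:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

demo = pd.DataFrame({
    'user': ['zszxz', 'craler', 'rose', 'zszxz', 'rose'],
    'hobby': ['reading', 'running', 'hiking', 'reading', 'hiking'],
    'price': [1.0, 2.0, 3.0, 4.0, 5.0],
})


def grouped_mean(df, value_col, key_col):
    """Hypothetical helper: group value_col by key_col and return the group means."""
    # Refuse non-numeric columns up front with a clear message.
    if not is_numeric_dtype(df[value_col]):
        raise TypeError(f"column {value_col!r} is not numeric")
    return df[value_col].groupby(df[key_col]).mean()


result = grouped_mean(demo, 'price', 'hobby')
print(result)
```

Calling grouped_mean(demo, 'user', 'hobby') raises the TypeError from the guard, because user holds strings.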

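To group by several columns, pass them to groupby as a list and then call the mean function. The result is indexed by the grouping columns, with the mean as the value.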

group = frame['price'].groupby([frame['hobby'],frame['user']])
print(group.mean())

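Output (the values differ from the earlier printout because the random data is regenerated on each run)

hobby    user  
hiking   rose      0.063972
reading  zszxz     0.393164
running  craler   -1.395186
Name: price, dtype: float64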

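To group the whole DataFrame, there is no need to extract a column first.

group = frame.groupby(frame['hobby'])
# numeric_only=True skips the non-numeric user column (required since pandas 2.0)
print(group.mean(numeric_only=True))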

Output

            price    number
hobby                      
hiking  -0.116659 -0.316222
reading -0.651365  0.856299
running -0.282676 -0.585124

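Tip: the mean is only computed for numeric columns. Older pandas versions silently dropped non-numeric columns; since pandas 2.0 you need to pass numeric_only=True (or select the numeric columns first), otherwise the call raises a TypeError.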
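The two ways of restricting the mean to numeric columns can be sketched as follows, assuming a small frame with fixed values in place of the random data:

```python
import pandas as pd

demo = pd.DataFrame({
    'user': ['zszxz', 'craler', 'rose', 'zszxz', 'rose'],
    'hobby': ['reading', 'running', 'hiking', 'reading', 'hiking'],
    'price': [1.0, 2.0, 3.0, 4.0, 5.0],
    'number': [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Option 1: let pandas skip the non-numeric user column.
means = demo.groupby('hobby').mean(numeric_only=True)

# Option 2: select the numeric columns explicitly before aggregating.
explicit = demo.groupby('hobby')[['price', 'number']].mean()

print(means)
print(means.equals(explicit))
```

Selecting the columns explicitly makes it obvious which values are being averaged.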

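2.3 Group size

Counting rows per group is one of the most widely used operations in statistical analysis. The example below groups the DataFrame by hobby and calls size() to count the rows in each group.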

group = frame.groupby(frame['hobby'])
print(group.size())

Output

hobby
hiking     2
reading    2
running    1
dtype: int64
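size() counts every row in a group, while count() only counts non-missing values. A short sketch of the difference, assuming a frame with some missing prices (the NaN values are made up for illustration):

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    'hobby': ['reading', 'running', 'hiking', 'reading', 'hiking'],
    'price': [1.0, np.nan, 3.0, 4.0, np.nan],
})

grouped = demo.groupby('hobby')

# size() returns the number of rows per group, missing values included.
sizes = grouped.size()

# count() returns the number of non-missing price values per group.
counts = grouped['price'].count()

print(sizes)
print(counts)
```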

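2.4 Iterating over groups

When grouping by a single column (hobby in this example), you can iterate over the grouped object as (key, value) pairs, where the key is the group name and the value is that group's data.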

group = frame['price'].groupby(frame['hobby'])
for key, data in group:
    print(key)
    print(data)

Output

hiking
2   -0.669410
4   -0.246816
Name: price, dtype: float64
reading
0    1.362191
3   -0.052538
Name: price, dtype: float64
running
1    0.8963
Name: price, dtype: float64

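When grouping by several columns, the key is a tuple with one element per grouping column, so unpack it into as many variables as there are columns. The variable names are arbitrary as long as they are distinct.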

group = frame['price'].groupby([frame['hobby'],frame['user']])
for (key1, key2), data in group:
    print(key1, key2)
    print(data)

Output

hiking rose
2   -0.019423
4   -2.642912
Name: price, dtype: float64
reading zszxz
0    0.405016
3    0.422182
Name: price, dtype: float64
running craler
1   -0.724752
Name: price, dtype: float64
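When only one group is needed, get_group fetches it without a loop. A minimal sketch, assuming fixed prices in place of the random data:

```python
import pandas as pd

demo = pd.DataFrame({
    'user': ['zszxz', 'craler', 'rose', 'zszxz', 'rose'],
    'hobby': ['reading', 'running', 'hiking', 'reading', 'hiking'],
    'price': [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Single grouping column: the key is a scalar.
by_hobby = demo['price'].groupby(demo['hobby'])
hiking_prices = by_hobby.get_group('hiking')

# Two grouping columns: the key is a tuple with one element per column.
by_both = demo['price'].groupby([demo['hobby'], demo['user']])
rose_hiking = by_both.get_group(('hiking', 'rose'))

# Collect the keys produced by iteration.
keys = [key for key, _ in by_both]

print(hiking_prices)
print(keys)
```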

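2.5 Converting grouped data to a dictionary

The grouped data can be converted to a dictionary whose keys are the group names and whose values are the sub-DataFrames.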

dic = dict(list(frame.groupby(frame['hobby'])))
print(dic)

Output

{'hiking':    user   hobby     price    number
2  rose  hiking  0.351633  0.523272
4  rose  hiking  0.800039  0.331646,
 'reading':     user    hobby     price    number
0  zszxz  reading -0.074857 -0.928798
3  zszxz  reading  0.666925  0.606706,
 'running':      user    hobby     price    number
1  craler  running -2.525633  0.895776}

Look up a group by its key

print(dic['hiking'])

Output

   user   hobby     price    number
2  rose  hiking  0.382225 -0.242055
4  rose  hiking  1.055785 -0.328943
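The same dictionary can be written as a dict comprehension. If only the row labels of each group are needed, the groups attribute provides them directly. A sketch assuming fixed prices in place of the random data:

```python
import pandas as pd

demo = pd.DataFrame({
    'user': ['zszxz', 'craler', 'rose', 'zszxz', 'rose'],
    'hobby': ['reading', 'running', 'hiking', 'reading', 'hiking'],
    'price': [1.0, 2.0, 3.0, 4.0, 5.0],
})

grouped = demo.groupby('hobby')

# Map each group name to its sub-DataFrame.
frames_by_hobby = {key: sub for key, sub in grouped}

# Map each group name to the index labels of its rows.
labels_by_hobby = {key: list(idx) for key, idx in grouped.groups.items()}

print(frames_by_hobby['hiking'])
print(labels_by_hobby)
```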

2.6 Selecting a column from the groups

Group frame by hobby and compute the mean of price. The result is a Series.

mean = frame.groupby('hobby')['price'].mean()
print(type(mean))
print(mean)

Output

<class 'pandas.core.series.Series'>
hobby
hiking     0.973211
reading   -1.393790
running   -0.286236
Name: price, dtype: float64

Tip: frame.groupby('hobby')['price'] is equivalent to frame['price'].groupby(frame['hobby'])
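The equivalence can be checked with pandas' own testing helper. A small sketch, assuming fixed prices in place of the random data:

```python
import pandas as pd

demo = pd.DataFrame({
    'hobby': ['reading', 'running', 'hiking', 'reading', 'hiking'],
    'price': [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Group the frame first, then select the column.
select_after = demo.groupby('hobby')['price'].mean()

# Select the column first, then group it by another column.
select_before = demo['price'].groupby(demo['hobby']).mean()

# Raises AssertionError if values, index, or name differ.
pd.testing.assert_series_equal(select_after, select_before)
print(select_after)
```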

To get a DataFrame back instead, select the column with a list (double brackets).

mean = frame.groupby('hobby')[['price']].mean()
print(type(mean))
print(mean)

Output

<class 'pandas.core.frame.DataFrame'>
            price
hobby
hiking   0.973211
reading -1.393790
running -0.286236

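2.7 Using a Series as the grouping key

A Series can also be passed as the grouping key for a DataFrame.

# ser is aligned with frame on the index, so only rows 0-2 receive a group key;
# rows 3 and 4 get no key and are left out of the result
ser = pd.Series(['hiking','reading','running'])
data = frame.groupby(ser).mean(numeric_only=True)
print(data)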

Output

            price    number
hiking   1.233396  0.313839
reading -0.298887  0.982853
running -0.797734 -1.230811

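Tip: the grouping key is essentially an array of labels. Besides a Series, you can also group by a dictionary, a list, an array, or a function.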
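A brief sketch of three of these key types, assuming a small frame with a string index; the label names and the mapping are made up for illustration. An array supplies one label per row by position, while a dictionary or a function is applied to the index labels:

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({'price': [1.0, 2.0, 3.0, 4.0, 5.0]},
                    index=['a', 'b', 'c', 'd', 'e'])

# Array: one group label per row, matched by position.
labels = np.array(['x', 'y', 'x', 'y', 'x'])
by_array = demo.groupby(labels).sum()

# Dictionary: maps each index label to a group name.
mapping = {'a': 'first', 'b': 'first', 'c': 'second', 'd': 'second', 'e': 'second'}
by_dict = demo.groupby(mapping).sum()

# Function: called once per index label, its return value is the group name.
by_func = demo.groupby(lambda label: 'early' if label < 'c' else 'late').sum()

print(by_array)
print(by_dict)
print(by_func)
```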

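2.8 Grouping by index level

Pass the name of an index level to group by that level of a hierarchical index.

# build a two-level column index and name the levels
columns = pd.MultiIndex.from_arrays([['Python', 'Java', 'Python', 'Java', 'Python'],
          ['a', 'b', 'a', 'b', 'c']], names=['language', 'alpha'])
frame = pd.DataFrame(np.random.randint(1, 10, (5, 5)), columns=columns)
print(frame)

# group the columns by the language level; groupby(axis=1) is deprecated in
# recent pandas versions, so transpose, group the rows, and transpose back
print(frame.T.groupby(level='language').sum().T)
# group the columns by the alpha level
print(frame.T.groupby(level='alpha').sum().T)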

Output of frame

language Python Java Python Java Python
alpha         a    b      a    b      c
0             9    9      7    4      5
1             3    4      7    6      6
2             6    6      3    9      1
3             1    1      8    5      2
4             6    5      9    5      4

Grouped by language

language  Java  Python
0           13      21
1           10      16
2           15      10
3            6      11
4           10      19

Grouped by alpha

alpha   a   b  c
0      16  13  5
1      10  10  6
2       9  15  1
3       9   6  2
4      15  10  4
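The same level argument works when the hierarchical index is on the rows, in which case no transpose is needed. A minimal sketch with made-up scores; the score column name is illustrative:

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [['Python', 'Java', 'Python', 'Java'], ['a', 'b', 'a', 'b']],
    names=['language', 'alpha'],
)
rows = pd.DataFrame({'score': [1, 2, 3, 4]}, index=index)

# Sum the rows that share the same value in the language level.
by_language = rows.groupby(level='language').sum()

# Sum the rows that share the same value in the alpha level.
by_alpha = rows.groupby(level='alpha').sum()

print(by_language)
print(by_alpha)
```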

