Pandas 0.24 Released, pandas Will Drop Python 2

Published: 2020-08-13 15:32:24  Source: ITPUB blog  Author: 挖地兔 subscription account  Category: Programming languages

Many Python users probably already know that the Python core team will stop supporting Python 2.7 on January 1, 2020.

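We have also watched the major Python data-analysis tools announce, one after another, their plans to drop Python 2.7. IPython was among the first to drop Python 2, followed by Matplotlib and, most recently, NumPy. Other popular libraries, such as scikit-learn and SciPy, will also drop Python 2 support.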

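On January 25, 2019, pandas released version 0.24.0. Under pandas' plan for dropping Python 2.7, the 0.24.x series is the last to support Python 2.7, and every feature release after it supports Python 3 only.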

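Clearly, from 2019 on, many Python packages are moving to Python 3 only. So whether you are a beginner choosing which Python version to learn, a veteran still weighing whether to stay on Python 2, or the maintainer of a package that currently supports only Python 2.7 but has a sizable user base, it is time to commit to Python 3.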

The Tushare SDK has long supported both Python 2 and Python 3, so it has no version problem; new tools released in the future will target Python 3.

Changes in Pandas 0.24

Higher minimum dependency versions

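Pandas 0.24 adjusted the minimum versions of its dependencies, raising several of them; the minimum version required for each dependency is shown in the following table.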

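In practice you usually don't need to manage these versions by hand: whether you install pandas fresh or upgrade an existing installation, the installer upgrades the required dependencies (such as NumPy) when they are too old.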

However, if you depend on specific versions of these packages, upgrade with care.

Notable new features

Version 0.24 includes a number of enhancements; here I'll cover only a few of the newly added interfaces.

1. A new way to create arrays

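Version 0.24 adds a new top-level function, pd.array(), for creating one-dimensional arrays. It can create any extension array. Extension arrays, a concept introduced in 0.23, are used to implement data types and arrays that extend NumPy's type system. Interested readers can find more details on the pandas website.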

In [1]: pd.array([1, 2, np.nan], dtype='Int64')
Out[1]: 
<IntegerArray>
[1, 2, NaN]
Length: 3, dtype: Int64

In [2]: pd.array(['a', 'b', 'c'], dtype='category')
Out[2]: 
[a, b, c]
Categories (3, object): [a, b, c]
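To see what the extension dtype buys you, here is a minimal sketch that puts pd.array next to a plain NumPy array built from the same values. It assumes pandas 0.24 or later; note that since pandas 1.0 the missing integer displays as <NA> rather than the NaN shown above.

```python
import numpy as np
import pandas as pd

# Nullable integer extension array: the missing entry stays missing
# instead of forcing the whole array to float.
ints = pd.array([1, 2, np.nan], dtype="Int64")

# Categorical extension array built from plain strings.
cats = pd.array(["a", "b", "c"], dtype="category")

# Plain NumPy for comparison: NaN is a float, so the result is float64.
plain = np.array([1, 2, np.nan])

print(ints.dtype, ints.isna().tolist())   # Int64 [False, False, True]
print(cats.dtype, list(cats.categories))  # category ['a', 'b', 'c']
print(plain.dtype)                        # float64
```

The integers survive as integers only in the pandas array; NumPy has no integer representation for a missing value, so it upcasts everything to float.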

2. New ways to extract the array from a Series or Index

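In older versions of pandas, we used .values to extract the data array of a Series or DataFrame. Starting with 0.24, pandas provides two new alternatives: the .array property (on Series and Index) and the .to_numpy() method.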

In [3]: idx = pd.period_range('2000', periods=4)

In [4]: idx.array
Out[4]: 
<PeriodArray>
['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04']
Length: 4, dtype: period[D]

In [5]: pd.Series(idx).array
Out[5]: 
<PeriodArray>
['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04']
Length: 4, dtype: period[D]

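With the old .values it was unclear what you would get back, and an ndarray cannot faithfully represent pandas' own data types: for a PeriodIndex, for example, .values builds a new ndarray of Period objects on every call. In the new version, .array returns pandas' own array type, as shown above, and if you specifically want a NumPy ndarray, use .to_numpy():

In [6]: idx.to_numpy()
Out[6]: 
array([Period('2000-01-01', 'D'), Period('2000-01-02', 'D'),
       Period('2000-01-03', 'D'), Period('2000-01-04', 'D')], dtype=object)

In [7]: pd.Series(idx).to_numpy()
Out[7]: 
array([Period('2000-01-01', 'D'), Period('2000-01-02', 'D'),
       Period('2000-01-03', 'D'), Period('2000-01-04', 'D')], dtype=object)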

The new version of pandas still keeps .values, but the pandas developers strongly recommend using .array or .to_numpy() instead.
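A minimal sketch of the difference between the two accessors, reusing the period data from the session above and adding a categorical Series for contrast. The variable names are illustrative only, and the code assumes pandas 0.24 or later.

```python
import pandas as pd

idx = pd.period_range("2000", periods=4)  # daily PeriodIndex, as in the session above
s = pd.Series(idx)

backing = s.array        # pandas' own extension array (a PeriodArray), no conversion
as_numpy = s.to_numpy()  # always a NumPy ndarray; here of Period objects (dtype=object)

# A categorical Series shows the same split between the two accessors.
cat = pd.Series(["a", "b", "a"], dtype="category")
cat_backing = cat.array      # a pandas Categorical
cat_numpy = cat.to_numpy()   # a plain ndarray of the values

print(type(backing).__name__, backing.dtype)           # PeriodArray period[D]
print(type(as_numpy).__name__, as_numpy.dtype)         # ndarray object
print(type(cat_backing).__name__, cat_numpy.tolist())  # Categorical ['a', 'b', 'a']
```

The rule of thumb: reach for .array when you want to keep pandas' dtype information, and for .to_numpy() when the next step is NumPy code that expects an ndarray.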

3. read_html() improvements

In earlier versions, pandas' read_html could quickly read an ordinary HTML table into a DataFrame. However, if the table merged cells with the colspan or rowspan attributes, pandas read it incorrectly.

For example, consider a table whose first body cell spans columns A and B. Its HTML code is:

In [8]: result = pd.read_html("""
   ....:   <table>
   ....:     <thead>
   ....:       <tr>
   ....:         <th>A</th><th>B</th><th>C</th>
   ....:       </tr>
   ....:     </thead>
   ....:     <tbody>
   ....:       <tr>
   ....:         <td colspan="2">1</td><td>2</td>
   ....:       </tr>
   ....:     </tbody>
   ....:   </table>""")
   ....:

With the old version, read_html returned:

In [9]: result
Out [9]:
[   A  B   C
 0  1  2 NaN]

With pandas 0.24, the result is:

In [10]: result
Out[10]: 
[   A  B  C
 0  1  1  2

 [1 rows x 3 columns]]

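As you can see, the old version misread the table: it ignored colspan, so the value 2 landed in column B and column C was left empty. Version 0.24 fixes this by repeating the spanned value 1 in both A and B.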
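In practice you would simply call pd.read_html on the table; it needs lxml, or BeautifulSoup4 plus html5lib, installed, and from pandas 2.1 a literal HTML string should be wrapped in io.StringIO. To make the colspan rule itself concrete without those parsers, here is a minimal sketch using only the standard library's html.parser. SpanTableParser is a hypothetical helper, not pandas code, and it handles colspan only (rowspan is left out for brevity).

```python
from html.parser import HTMLParser

import pandas as pd


class SpanTableParser(HTMLParser):
    """Hypothetical helper (not pandas code): collects the rows of a simple
    table, repeating each cell's text once per column it spans (colspan).
    rowspan is deliberately not handled in this sketch."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows, each a list of cell strings
        self._cur_row = None   # row being filled, or None outside <tr>
        self._cur_span = 1     # colspan of the open cell
        self._cur_text = None  # text of the open cell, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cur_row = []
        elif tag in ("td", "th"):
            # attrs is a list of (name, value) pairs; a missing colspan means 1.
            self._cur_span = int(dict(attrs).get("colspan") or 1)
            self._cur_text = ""

    def handle_data(self, data):
        if self._cur_text is not None:
            self._cur_text += data

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cur_row is not None and self._cur_text is not None:
            # Fill every spanned column with the same value.
            self._cur_row.extend([self._cur_text.strip()] * self._cur_span)
            self._cur_text = None
        elif tag == "tr" and self._cur_row is not None:
            self.rows.append(self._cur_row)
            self._cur_row = None


html = """
<table>
  <thead><tr><th>A</th><th>B</th><th>C</th></tr></thead>
  <tbody><tr><td colspan="2">1</td><td>2</td></tr></tbody>
</table>"""

parser = SpanTableParser()
parser.feed(html)
parser.close()

header, *body = parser.rows
df = pd.DataFrame(body, columns=header).astype(int)
print(df.loc[0].tolist())  # [1, 1, 2], matching the 0.24 result above
```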

Backward-incompatible changes

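Besides the new interfaces, some existing behavior has changed as well. I'll cover only the most important changes here; heavy pandas users should take note of them.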

1. Arithmetic on Period objects
In earlier versions, subtracting one Period from another returned an integer, for example:

In [12]: june = pd.Period('June 2018')

In [13]: april = pd.Period('April 2018')

In [14]: june - april
Out [14]: 2

In the new version, the result is a DateOffset object:

In [16]: june = pd.Period('June 2018')

In [17]: april = pd.Period('April 2018')

In [18]: june - april
Out[18]: <2 * MonthEnds>
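If existing code relies on the old integer result, the offset's .n attribute holds that number. A minimal sketch, assuming pandas 0.24 or later; the PeriodIndex part shows that subtracting a Period from a PeriodIndex likewise yields one offset per element.

```python
import pandas as pd

june = pd.Period("June 2018")    # monthly Period; frequency inferred as 'M'
april = pd.Period("April 2018")

delta = june - april   # since 0.24: a MonthEnd offset, shown as <2 * MonthEnds>
months = delta.n       # the integer multiple, i.e. what older versions returned

# Adding an integer to a Period still shifts it by that many periods.
shifted = april + months

# Subtracting a Period from a PeriodIndex gives one offset per element.
pi = pd.period_range("June 2018", freq="M", periods=3)
steps = [offset.n for offset in pi - pi[0]]

print(delta, months, shifted, steps)
```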

2. Changes to DataFrame broadcasting

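The main change is that DataFrame comparison operations (==, != and so on) now broadcast the same way as arithmetic operations (+, - and so on):
1) Operating against a 2-D ndarray with exactly one row or one column now broadcasts the same way an ndarray would.
2) A list or tuple whose length matches the number of columns is now applied row by row (one element per column) instead of raising ValueError, while one whose length matches the number of rows now raises ValueError instead of operating column by column.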

Here is an example:

In [87]: arr = np.arange(6).reshape(3, 2)

In [88]: df = pd.DataFrame(arr)

In [89]: df
Out[89]: 
   0  1
0  0  1
1  2  3
2  4  5

[3 rows x 2 columns]

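Previously, comparison and arithmetic disagreed: in each pair below, one operation broadcasts while the other raises ValueError.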

In [5]: df == arr[[0], :]
    ...: # comparison previously broadcast where arithmetic would raise
Out[5]:
       0      1
0   True   True
1  False  False
2  False  False
In [6]: df + arr[[0], :]
...
ValueError: Unable to coerce to DataFrame, shape must be (3, 2): given (1, 2)

In [7]: df == (1, 2)
    ...: # length matches number of columns;
    ...: # comparison previously raised where arithmetic would broadcast
...
ValueError: Invalid broadcasting comparison [(1, 2)] with block values
In [8]: df + (1, 2)
Out[8]:
   0  1
0  1  3
1  3  5
2  5  7

In [9]: df == (1, 2, 3)
    ...:  # length matches number of rows
    ...:  # comparison previously broadcast where arithmetic would raise
Out[9]:
       0      1
0  False   True
1   True  False
2  False  False
In [10]: df + (1, 2, 3)
...
ValueError: Unable to coerce to Series, length must be 2: given 3

In the new version, the behavior is:

# Comparison operations and arithmetic operations both broadcast.
In [90]: df == arr[[0], :]
Out[90]: 
       0      1
0   True   True
1  False  False
2  False  False

[3 rows x 2 columns]

In [91]: df + arr[[0], :]
Out[91]: 
   0  1
0  0  2
1  2  4
2  4  6

[3 rows x 2 columns]

# Comparison operations and arithmetic operations both broadcast.
In [92]: df == (1, 2)
Out[92]: 
       0      1
0  False  False
1  False  False
2  False  False

[3 rows x 2 columns]

In [93]: df + (1, 2)
Out[93]: 
   0  1
0  1  3
1  3  5
2  5  7

[3 rows x 2 columns]
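The rules above can be checked with a short script. This is a minimal sketch assuming pandas 0.24 or later; raises_value_error is a hypothetical helper written for the demo, and the last two checks cover the case the session above leaves out, a tuple whose length matches the number of rows.

```python
import numpy as np
import pandas as pd

arr = np.arange(6).reshape(3, 2)
df = pd.DataFrame(arr)

# 1) A 2-D array with a single row, shape (1, 2), is stretched down the rows
#    exactly as NumPy would do it, for comparison and arithmetic alike.
row = arr[[0], :]
eq_row = df == row
add_row = df + row

# 2) A tuple whose length equals the number of columns is matched
#    element-by-column and applied to every row.
eq_tuple = df == (1, 2)
add_tuple = df + (1, 2)


def raises_value_error(op):
    """Hypothetical demo helper: True if calling op() raises ValueError."""
    try:
        op()
    except ValueError:
        return True
    return False


# 3) A tuple whose length equals the number of rows (3) does not match the
#    number of columns (2), so both kinds of operation now raise.
eq_rows_len = raises_value_error(lambda: df == (1, 2, 3))
add_rows_len = raises_value_error(lambda: df + (1, 2, 3))

print(np.array_equal(add_row.to_numpy(), arr + row))  # True: same as NumPy broadcasting
print(eq_rows_len, add_rows_len)                       # True True
```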

Summary

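Beyond the changes above, there are many other improvements and adjustments. Overall, 0.24.0 brings a good number of improvements and begins pandas' move to fully embrace Python 3. I hope pandas keeps getting better, and that everyone who uses pandas for data analysis can dig real value out of their data and realize their own value along the way.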
