What Are the Parameters of sklearn's Random Forest?

Published: 2022-03-22 10:11:28 Source: 億速云 (Yisu Cloud) Views: 268 Author: iii Category: Big Data

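This article covers the parameters of sklearn's random forest, a topic that confuses many people in day-to-day work. The editor went through a range of references and put together a simple, practical summary that should help clear up those doubts. Let's work through it together.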

Random forest

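A random forest is a meta-estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and averages their predictions to improve predictive accuracy and control over-fitting. The sub-sample size is always the same as the original input sample size, but the samples are drawn with replacement if bootstrap=True (the default).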
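
To make "drawn with replacement" concrete, here is a minimal NumPy sketch of a single bootstrap draw. It only illustrates the idea and is not sklearn's internal code; the names rng, indices and n_unique are chosen here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1000

# One bootstrap sample: n_samples row indices drawn WITH replacement,
# so the sub-sample has the same size as the original data set.
indices = rng.integers(0, n_samples, size=n_samples)

# Because of the replacement, some rows repeat and others never appear.
n_unique = np.unique(indices).size
print(len(indices), n_unique)
```

On average about 63% of the rows show up at least once in such a draw. The rows that are left out are the "out-of-bag" samples that the oob_score option described further down relies on.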
First, look at the parameters of this class:

class sklearn.ensemble.RandomForestClassifier(n_estimators=10, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, bootstrap=True, oob_score=False, n_jobs=1, random_state=None, verbose=0, warm_start=False, class_weight=None)

Code example:

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=4,
                            n_informative=2, n_redundant=0,
                            random_state=0, shuffle=False)
clf = RandomForestClassifier(max_depth=2, random_state=0)
clf.fit(X, y)
# Output of clf.fit(X, y) in the original run:
# RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
#             max_depth=2, max_features='auto', max_leaf_nodes=None,
#             min_impurity_decrease=0.0, min_impurity_split=None,
#             min_samples_leaf=1, min_samples_split=2,
#             min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
#             oob_score=False, random_state=0, verbose=0, warm_start=False)
print(clf.feature_importances_)
# [ 0.17287856  0.80608704  0.01884792  0.00218648]
print(clf.predict([[0, 0, 0, 0]]))
# [1]

The parameters have the following meanings:
Parameters:

n_estimators : integer, optional (default=10)
The number of (decision) trees in the forest.
criterion : string, optional (default="gini")
The function used to measure the quality of a split. Supported criteria are "gini" for the Gini impurity and "entropy" for the information gain.
Note: this parameter is tree-specific.
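max_features : int, float, string or None, optional (default="auto")
The number of features to consider when looking for the best split:
&If int, consider max_features features at each split.
&If float, max_features is a fraction, and int(max_features * n_features) features are considered at each split.
&If "auto", max_features=sqrt(n_features), i.e. the square root of n_features (the same as "sqrt").
&If "log2", max_features=log2(n_features).
&If None, max_features=n_features.
Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if that means effectively inspecting more than max_features features.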
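
The rules above can be sketched as a small function. This is a hypothetical helper written for this article, not part of sklearn; in particular, rounding down and returning at least 1 are assumptions made here, and the exact rounding inside the library may differ between versions.

```python
import math


def resolve_max_features(max_features, n_features):
    """Hypothetical helper: how many features are examined at each split."""
    if max_features is None:
        return n_features
    if isinstance(max_features, str):
        if max_features in ("auto", "sqrt"):
            return max(1, int(math.sqrt(n_features)))
        if max_features == "log2":
            return max(1, int(math.log2(n_features)))
        raise ValueError(f"unknown max_features: {max_features!r}")
    if isinstance(max_features, int):
        return max_features
    # A float is read as a fraction of n_features (rounded down here).
    return max(1, int(max_features * n_features))


for setting in (3, 0.5, "auto", "log2", None):
    print(setting, resolve_max_features(setting, n_features=16))
```

Note that the string "auto" belongs to the version documented in this article; newer sklearn releases may no longer accept it, in which case "sqrt" expresses the same choice.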
max_depth : integer or None, optional (default=None)
The maximum depth of a (decision) tree. If None, nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.
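min_samples_split : int, float, optional (default=2)
The minimum number of samples required to split an internal node:
~If int, min_samples_split is used directly as the minimum number.
~If float, min_samples_split is a fraction, and ceil(min_samples_split * n_samples) is the minimum number of samples for each split.
Changed in version 0.18: added float values for fractions.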
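min_samples_leaf : int, float, optional (default=1)
The minimum number of samples required at a leaf node:
~If int, min_samples_leaf is used directly as the minimum number.
~If float, min_samples_leaf is a fraction, and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each leaf node.
Changed in version 0.18: added float values for fractions.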
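
The float form of min_samples_split and min_samples_leaf can be checked on a fitted forest. The sketch below reuses the data set from the example above and sets bootstrap=False so that every tree sees all rows and the leaf counts are easy to interpret; that choice, the 0.05 fraction and the variable names are assumptions made for this illustration.

```python
import math

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

fraction = 0.05
# A float is converted to a sample count with ceil(fraction * n_samples).
expected_min_leaf = math.ceil(fraction * X.shape[0])

clf = RandomForestClassifier(n_estimators=10, min_samples_leaf=fraction,
                             bootstrap=False, random_state=0)
clf.fit(X, y)

# In a fitted tree, leaves are the nodes whose children_left entry is -1.
smallest_leaf = min(
    est.tree_.n_node_samples[est.tree_.children_left == -1].min()
    for est in clf.estimators_
)
print(expected_min_leaf, smallest_leaf)
```

No leaf in any tree should hold fewer samples than the computed minimum.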
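min_weight_fraction_leaf : float, optional (default=0.)
The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.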
max_leaf_nodes : int or None, optional (default=None)
Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined by their relative reduction in impurity. If None, the number of leaf nodes is unlimited.
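min_impurity_split : float
Threshold for early stopping in tree growth. A node will split if its impurity is above the threshold; otherwise it remains a leaf. Deprecated since version 0.19 in favor of min_impurity_decrease.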
min_impurity_decrease : float, optional (default=0.)
A node will be split if the split induces a decrease of the impurity greater than or equal to this value.
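bootstrap : boolean, optional (default=True)
Whether samples are drawn with replacement when building the trees.
oob_score : bool (default=False)
Whether to use out-of-bag samples to estimate the generalization accuracy.
n_jobs : integer, optional (default=1)
The number of jobs to run in parallel for both fitting and prediction. If -1, the number of jobs is set to the number of cores.
random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
verbose : int, optional (default=0)
Controls the verbosity of the tree-building process.
warm_start : bool, optional (default=False)
When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, just fit a whole new forest.
class_weight : dict, list of dicts, "balanced" or None, optional (default=None)
Weights associated with classes, in the form {class_label: weight}. If not given, all classes are assumed to have weight one.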
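
As an illustration of warm_start, the sketch below grows a forest in two steps. It assumes the same toy data set as the earlier example; the variable first_trees exists only for this demonstration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

clf = RandomForestClassifier(n_estimators=10, warm_start=True, random_state=0)
clf.fit(X, y)
first_trees = list(clf.estimators_)  # remember the trees from the first fit

# Raise n_estimators and fit again: only the 10 additional trees are trained.
clf.set_params(n_estimators=20)
clf.fit(X, y)

print(len(first_trees), len(clf.estimators_))
```

With warm_start=False, the second call to fit would instead discard the first ten trees and train twenty new ones.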

Attributes:

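estimators_ : list of DecisionTreeClassifier
The collection of fitted sub-estimators.
classes_ : array of shape = [n_classes] or a list of such arrays
The class labels (single-output problem), or a list of arrays of class labels (multi-output problem).
n_classes_ : int or list
The number of classes (single-output problem), or a list containing the number of classes for each output (multi-output problem).
n_features_ : int
The number of features when fit is performed.
n_outputs_ : int
The number of outputs when fit is performed.
feature_importances_ : array of shape = [n_features]
The feature importances (the higher the value, the more important the feature).
oob_score_ : float
Score of the training data set obtained using an out-of-bag estimate.
oob_decision_function_ : array of shape = [n_samples, n_classes]
Decision function computed with the out-of-bag estimate on the training set. If n_estimators is small, it is possible that a data point was never left out during the bootstrap. In that case, oob_decision_function_ may contain NaN.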
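
The out-of-bag attributes only exist when the forest is fitted with oob_score=True. A minimal sketch, assuming the toy data set from the earlier example and 100 trees so that every row is very likely to be out of bag for at least one tree:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

clf = RandomForestClassifier(n_estimators=100, bootstrap=True,
                             oob_score=True, random_state=0)
clf.fit(X, y)

print(clf.oob_score_)                      # accuracy estimated on out-of-bag rows
print(clf.oob_decision_function_.shape)    # one row per training sample
print(np.isnan(clf.oob_decision_function_).any())
```

With only a handful of trees, some rows may be in every bootstrap sample, which is when the NaN entries mentioned above can appear.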

Notes:

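The default values of the parameters that control tree size (for example max_depth, min_samples_leaf, and so on) lead to fully grown, unpruned trees, which can be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be limited by setting these parameters.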
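
One way to see the effect is to count the nodes in each forest. The sketch below compares a forest with default size settings against a constrained one; total_nodes is a hypothetical helper defined here, and the particular limits (max_depth=3, min_samples_leaf=5) are arbitrary values chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)


def total_nodes(forest):
    """Hypothetical helper: total number of tree nodes in a fitted forest."""
    return sum(est.tree_.node_count for est in forest.estimators_)


full = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
small = RandomForestClassifier(n_estimators=10, max_depth=3,
                               min_samples_leaf=5, random_state=0).fit(X, y)

print(total_nodes(full), total_nodes(small))
```

The node count is only a rough proxy for memory use, but it shows how quickly unconstrained trees grow compared with constrained ones.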

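The features are always randomly permuted at each split. Therefore, even with the same training data, max_features=n_features and bootstrap=False, the best split found may vary if the improvement of the criterion is identical for several of the splits enumerated during the search for the best split. To obtain deterministic behaviour during fitting, random_state has to be fixed.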
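
A short check of that behaviour, assuming the toy data set from the earlier example: two forests built with the same random_state should agree exactly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

# Same data and same seed for both forests.
forest_a = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
forest_b = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

same_importances = np.allclose(forest_a.feature_importances_,
                               forest_b.feature_importances_)
same_predictions = np.array_equal(forest_a.predict(X), forest_b.predict(X))
print(same_importances, same_predictions)
```

Leaving random_state=None instead gives each fit its own random draws, so the two forests would generally differ.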

Methods:

apply(X) Apply trees in the forest to X, return leaf indices.
decision_path(X) Return the decision path in the forest.
fit(X, y[, sample_weight]) Build a forest of trees from the training set (X, y).
get_params([deep]) Get parameters for this estimator.
predict(X) Predict class for X.
predict_log_proba(X) Predict class log-probabilities for X.
predict_proba(X) Predict class probabilities for X.
score(X, y[, sample_weight]) Returns the mean accuracy on the given test data and labels.
set_params(**params) Set the parameters of this estimator.
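
A few of these methods in action, as a minimal sketch on the toy data set from the earlier example. The comments describe the shapes to expect for five input rows, ten trees and two classes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

clf = RandomForestClassifier(n_estimators=10, max_depth=2, random_state=0)
clf.fit(X, y)

leaves = clf.apply(X[:5])            # leaf index per sample and per tree
proba = clf.predict_proba(X[:5])     # class probabilities, one row per sample
labels = clf.predict(X[:5])          # predicted class per sample
indicator, n_nodes_ptr = clf.decision_path(X[:5])  # sparse node indicator
accuracy = clf.score(X, y)           # mean accuracy on the given data

print(leaves.shape, proba.shape, labels.shape, indicator.shape, accuracy)
print(clf.get_params()["max_depth"])
```

The columns of indicator cover the nodes of all trees laid side by side; n_nodes_ptr marks where each tree's block of columns starts and ends.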

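That concludes this look at the parameters of sklearn's random forest, and hopefully it has cleared up your questions. Theory sticks best when paired with practice, so go and try it out. For more on related topics, keep following the 億速云 (Yisu Cloud) site, where the editor will continue to publish practical articles.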
