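When implementing clustering algorithms in C++, hard clustering and soft clustering are two common approaches. They differ in how they decide which cluster a data point belongs to.
Hard clustering partitions the data points into a fixed number of clusters: each point belongs to exactly one cluster, and the cluster boundaries are crisp. In C++, several algorithms can be used for hard clustering, such as K-means.
K-means is an iterative optimization algorithm that partitions n observations into k clusters (k ≤ n). Each observation is assigned to the cluster whose mean (the centroid) is nearest, and the algorithm seeks to minimize the sum of squared Euclidean distances between the observations and the centroid of their cluster.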
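Written as a formula, the objective described above looks roughly like this. The symbols are introduced here for illustration and do not come from the original text: $S_i$ denotes the set of observations in cluster $i$ and $\mu_i$ denotes its mean.

```latex
\operatorname*{arg\,min}_{S_1,\dots,S_k} \; \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2,
\qquad
\mu_i = \frac{1}{\lvert S_i \rvert} \sum_{x \in S_i} x
```

Each K-means iteration alternates between the two parts: reassigning points to the nearest $\mu_i$ and recomputing every $\mu_i$ as the mean of its cluster. Neither step can increase the sum, but the result may be a local rather than a global minimum.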
The following is a simple C++ implementation of K-means:
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <iostream>
#include <limits>
#include <numeric>
#include <random>
#include <vector>
struct Point {
    double x, y;
};
// Euclidean distance between two points
double distance(const Point& a, const Point& b) {
    return std::sqrt((a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y));
}
std::vector<Point> kMeans(const std::vector<Point>& points, int k, int maxIterations = 100) {
    // k must satisfy 1 <= k <= n
    if (k <= 0 || static_cast<std::size_t>(k) > points.size()) {
        return {};
    }
    // Initialize the centroids with k randomly chosen input points (different indices)
    std::default_random_engine generator;  // default seed, so runs are reproducible
    std::vector<std::size_t> indices(points.size());
    std::iota(indices.begin(), indices.end(), std::size_t{0});
    std::shuffle(indices.begin(), indices.end(), generator);
    std::vector<Point> centroids(k);
    for (int c = 0; c < k; ++c) {
        centroids[c] = points[indices[c]];
    }
    std::vector<int> assignments(points.size(), -1);
    for (int i = 0; i < maxIterations; ++i) {
        // Assign points to the nearest centroid
        bool changed = false;
        std::vector<int> counts(k, 0);
        for (std::size_t j = 0; j < points.size(); ++j) {
            double minDist = std::numeric_limits<double>::max();
            int closestCentroid = -1;
            for (int c = 0; c < k; ++c) {
                double dist = distance(points[j], centroids[c]);
                if (dist < minDist) {
                    minDist = dist;
                    closestCentroid = c;
                }
            }
            if (assignments[j] != closestCentroid) {
                assignments[j] = closestCentroid;
                changed = true;
            }
            counts[closestCentroid]++;
        }
        // Stop early once no point changes cluster
        if (!changed) {
            break;
        }
        // Update each centroid to the mean of its assigned points
        for (int c = 0; c < k; ++c) {
            if (counts[c] > 0) {
                centroids[c] = {0, 0};
                for (std::size_t j = 0; j < points.size(); ++j) {
                    if (assignments[j] == c) {
                        centroids[c].x += points[j].x;
                        centroids[c].y += points[j].y;
                    }
                }
                centroids[c].x /= counts[c];
                centroids[c].y /= counts[c];
            }
        }
    }
    return centroids;
}
int main() {
    std::vector<Point> points = {{1, 2}, {1, 4}, {1, 0}, {10, 2}, {10, 4}, {10, 0}};
    int k = 2;
    std::vector<Point> centroids = kMeans(points, k);
    for (const auto& centroid : centroids) {
        std::cout << "Centroid: (" << centroid.x << ", " << centroid.y << ")\n";
    }
    return 0;
}
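Unlike hard clustering, soft clustering allows a data point to belong to several clusters at once: instead of a single label, each point receives a probability or membership degree for every cluster. This gives more flexibility when handling data, because a point can belong partially to a cluster.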
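To make the idea of partial membership concrete, here is a minimal sketch that computes membership degrees for a single point, assuming the centroids are already known. It uses the membership rule of fuzzy C-means, a soft clustering algorithm that this article does not otherwise cover. The helper names `squaredDistance` and `softMemberships` and the fuzzifier parameter `m` are assumptions made for this illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Point {
    double x, y;
};

// Squared Euclidean distance between two points.
double squaredDistance(const Point& a, const Point& b) {
    const double dx = a.x - b.x;
    const double dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// Hypothetical helper: membership degree of p in every cluster, following the
// fuzzy C-means rule u_c = d_c^(-2/(m-1)) / sum_j d_j^(-2/(m-1)).
// m > 1 is the fuzzifier (an assumed parameter); the degrees sum to 1.
std::vector<double> softMemberships(const Point& p,
                                    const std::vector<Point>& centroids,
                                    double m = 2.0) {
    std::vector<double> u(centroids.size(), 0.0);
    // A point that coincides with a centroid belongs fully to that cluster.
    for (std::size_t c = 0; c < centroids.size(); ++c) {
        if (squaredDistance(p, centroids[c]) == 0.0) {
            u[c] = 1.0;
            return u;
        }
    }
    // Working with squared distances turns the exponent 2/(m-1) into 1/(m-1).
    const double exponent = 1.0 / (m - 1.0);
    double total = 0.0;
    for (std::size_t c = 0; c < centroids.size(); ++c) {
        u[c] = std::pow(1.0 / squaredDistance(p, centroids[c]), exponent);
        total += u[c];
    }
    // Normalize so that the memberships add up to 1.
    for (double& value : u) {
        value /= total;
    }
    return u;
}
```

For the point (4, 2) and the centroids (1, 2) and (10, 2) with m = 2, the squared distances are 9 and 36, so the memberships come out as 0.8 and 0.2. A hard clustering would simply assign the point to the first cluster.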
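K-means++ is often mentioned in this context, but it is not itself a soft clustering algorithm. It is an extension of K-means that improves the choice of the initial centroids, and the resulting assignment is still hard: each point ends up in exactly one cluster. Typical soft clustering algorithms include fuzzy C-means and Gaussian mixture models. K-means++ picks the first centroid at random and then chooses each further centroid with probability proportional to its squared distance from the nearest centroid already chosen. Points far from the existing centroids are therefore favored, which reduces the instability caused by purely random initialization and usually improves clustering quality.
The following is a simple C++ implementation of K-means++: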
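#include <algorithm>
#include <cmath>
#include <cstddef>
#include <iostream>
#include <limits>
#include <random>
#include <vector>
struct Point {
    double x, y;
};
// Euclidean distance between two points
double distance(const Point& a, const Point& b) {
    return std::sqrt((a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y));
}
std::vector<Point> kMeansPlusPlus(const std::vector<Point>& points, int k, int maxIterations = 100) {
    if (k <= 0 || points.empty()) {
        return {};
    }
    std::vector<Point> centroids;
    centroids.reserve(k);
    std::default_random_engine generator;  // default seed, so runs are reproducible
    // Choose the first centroid uniformly at random
    std::uniform_int_distribution<std::size_t> uniformIndex(0, points.size() - 1);
    centroids.push_back(points[uniformIndex(generator)]);
    // minSqDist[j]: squared distance from point j to its nearest chosen centroid
    std::vector<double> minSqDist(points.size(), std::numeric_limits<double>::max());
    for (int i = 1; i < k; ++i) {
        // Only the most recently added centroid can shorten a point's distance
        double sumSqDist = 0;
        for (std::size_t j = 0; j < points.size(); ++j) {
            double dist = distance(points[j], centroids[i - 1]);
            minSqDist[j] = std::min(minSqDist[j], dist * dist);
            sumSqDist += minSqDist[j];
        }
        // Select the next centroid with probability proportional to the squared distance
        std::size_t next;
        if (sumSqDist > 0) {
            std::discrete_distribution<std::size_t> weighted(minSqDist.begin(), minSqDist.end());
            next = weighted(generator);
        } else {
            // Every point coincides with a chosen centroid: fall back to a uniform pick
            next = uniformIndex(generator);
        }
        centroids.push_back(points[next]);
    }
    // Run the usual K-means iterations, starting from the K-means++ seeds
    std::vector<int> assignments(points.size(), -1);
    for (int i = 0; i < maxIterations; ++i) {
        // Assign points to the nearest centroid
        bool changed = false;
        std::vector<int> counts(k, 0);
        for (std::size_t j = 0; j < points.size(); ++j) {
            double minDist = std::numeric_limits<double>::max();
            int closestCentroid = -1;
            for (int c = 0; c < k; ++c) {
                double dist = distance(points[j], centroids[c]);
                if (dist < minDist) {
                    minDist = dist;
                    closestCentroid = c;
                }
            }
            if (assignments[j] != closestCentroid) {
                assignments[j] = closestCentroid;
                changed = true;
            }
            counts[closestCentroid]++;
        }
        // Stop early once no point changes cluster
        if (!changed) {
            break;
        }
        // Update each centroid to the mean of its assigned points
        for (int c = 0; c < k; ++c) {
            if (counts[c] > 0) {
                centroids[c] = {0, 0};
                for (std::size_t j = 0; j < points.size(); ++j) {
                    if (assignments[j] == c) {
                        centroids[c].x += points[j].x;
                        centroids[c].y += points[j].y;
                    }
                }
                centroids[c].x /= counts[c];
                centroids[c].y /= counts[c];
            }
        }
    }
    return centroids;
}
int main() {
    std::vector<Point> points = {{1, 2}, {1, 4}, {1, 0}, {10, 2}, {10, 4}, {10, 0}};
    int k = 2;
    std::vector<Point> centroids = kMeansPlusPlus(points, k);
    for (const auto& centroid : centroids) {
        std::cout << "Centroid: (" << centroid.x << ", " << centroid.y << ")\n";
    }
    return 0;
}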
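Both examples define a Point struct to represent a point in two-dimensional space, together with a distance function that computes the distance between two points. The kMeans function implements the basic K-means hard clustering algorithm, while the kMeansPlusPlus function applies the K-means++ initialization. Note that K-means++ changes only how the starting centroids are chosen, so its result is still a hard clustering. In kMeansPlusPlus, the initial centroids are improved by favoring points that lie far from the centroids already chosen.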