Efficient query to locate centroid of clusters in postgis?
Date : March 29 2020, 07:55 AM
To take advantage of the spatial index, you could use ST_DWithin. What is your search space? Can the centroid be anywhere in space?
|
Get nearest centroid using Thrust library? (K-Means)
Tag : cpp , By : user181706
Date : March 29 2020, 07:55 AM
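I have already computed the distances and stored them in a Thrust vector. For instance, with 2 centroids and 5 datapoints, I computed the distances from the first centroid to all 5 datapoints, then from the second centroid, and stored everything in a single 1D array of distances.
Here is one possible approach. Start from the distances, the datapoint indices, and a matching centroid index vector:
DistancesValues = {10, 15, 20, 12, 10, 5, 17, 22, 8, 7}
DatapointsIndex = {1, 2, 3, 4, 5, 1, 2, 3, 4, 5}
CentroidIndex = {1, 1, 1, 1, 1, 2, 2, 2, 2, 2}
Sort by DatapointsIndex, carrying the distances and centroid indices along as zipped values, so that the entries for each datapoint become adjacent:
DatapointsIndex = {1, 1, 2, 2, 3, 3, 4, 4, 5, 5}
Then a reduce_by_key with a minimum operator keeps, for each datapoint, the smallest distance together with its centroid index. A complete example: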
$ cat t428.cu
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/reduce.h>
#include <thrust/count.h>
#include <thrust/copy.h>
#include <thrust/functional.h>
#include <thrust/tuple.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/iterator/discard_iterator.h>
#include <stdio.h>
#define NUM_POINTS 5
#define NUM_CENTROID 2
#define DSIZE (NUM_POINTS*NUM_CENTROID)
int main(){
  // Distances are laid out centroid by centroid: 5 values for centroid 1, then 5 for centroid 2.
  int DistancesValues[DSIZE] = {10, 15, 20, 12, 10, 5, 17, 22, 8, 7};
  int DatapointsIndex[DSIZE] = {1, 2, 3, 4, 5, 1, 2, 3, 4, 5};
  int CentroidIndex[DSIZE] = {1, 1, 1, 1, 1, 2, 2, 2, 2, 2};
  // Copy the inputs to the device.
  thrust::device_vector<int> DV(DistancesValues, DistancesValues + DSIZE);
  thrust::device_vector<int> DI(DatapointsIndex, DatapointsIndex + DSIZE);
  thrust::device_vector<int> CI(CentroidIndex, CentroidIndex + DSIZE);
  // Ra receives the minimum distance per datapoint, Rb the centroid that distance belongs to.
  thrust::device_vector<int> Ra(NUM_POINTS);
  thrust::device_vector<int> Rb(NUM_POINTS);
  // Group the entries of each datapoint together; distance and centroid index travel as zipped values.
  thrust::sort_by_key(DI.begin(), DI.end(), thrust::make_zip_iterator(thrust::make_tuple(DV.begin(), CI.begin())));
  // Within each datapoint group, keep the smallest (distance, centroid) tuple.
  // Tuples compare lexicographically, so the distance decides first.
  thrust::reduce_by_key(DI.begin(), DI.end(), thrust::make_zip_iterator(thrust::make_tuple(DV.begin(), CI.begin())), thrust::make_discard_iterator(), thrust::make_zip_iterator(thrust::make_tuple(Ra.begin(), Rb.begin())), thrust::equal_to<int>(), thrust::minimum<thrust::tuple<int, int> >());
  // thrust::count returns a difference_type, so cast it before printing with %d.
  printf("CountOfCentroid 1 = %d\n", (int)thrust::count(Rb.begin(), Rb.end(), 1));
  printf("CountOfCentroid 2 = %d\n", (int)thrust::count(Rb.begin(), Rb.end(), 2));
  return 0;
}
$ nvcc -arch=sm_20 -o t428 t428.cu
$ ./t428
CountOfCentroid 1 = 2
CountOfCentroid 2 = 3
$
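As a host-side cross-check of the counts above, the same "group by datapoint, keep the minimum" logic can be sketched in plain Python. This is only an illustration built on the sample arrays from the example, not part of the Thrust answer; the function name nearest_centroids is made up for this sketch.

```python
from collections import Counter

def nearest_centroids(distances, point_idx, centroid_idx):
    """Return {datapoint: (min_distance, centroid)} for flat, parallel lists."""
    best = {}
    for d, p, c in zip(distances, point_idx, centroid_idx):
        # Tuples compare lexicographically, as in the tuple minimum used above:
        # distance first, centroid index as the tie-breaker.
        if p not in best or (d, c) < best[p]:
            best[p] = (d, c)
    return best

distances_values = [10, 15, 20, 12, 10, 5, 17, 22, 8, 7]
datapoints_index = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
centroid_index = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]

best = nearest_centroids(distances_values, datapoints_index, centroid_index)
counts = Counter(c for _, c in best.values())
print(counts[1], counts[2])  # 2 3
```

No sorting is needed here because a dictionary does the grouping; on the GPU, sort_by_key plays that role so that reduce_by_key sees each datapoint's entries as one contiguous run.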
|
Sklearn: find mean centroid location for clusters?
Date : March 29 2020, 07:55 AM
The docs of sklearn.decomposition.NMF explain how to get the coordinates of the centroid of each cluster: they are the rows of the fitted model's components_ attribute.
In [995]: np.set_printoptions(precision=2)
In [996]: nmf.components_
Out[996]:
array([[ 0.54, 0.91, 0. , 0. , 0. , 0. , 0. , 0.89, 0. , 0.89, 0.37, 0.54, 0. , 0.54],
       [ 0. , 0.01, 0.71, 0. , 0. , 0. , 0.71, 0.72, 0.71, 0.01, 0.02, 0. , 0.71, 0. ],
       [ 0. , 0.01, 0.61, 0.61, 0.61, 0.61, 0. , 0. , 0. , 0.62, 0.02, 0. , 0. , 0. ]])
|
How to calculate the distance between a document and each centroid (k-means)?
Date : March 29 2020, 07:55 AM
You can use the predict method to get the closest cluster for each sample in a matrix X:
from sklearn.cluster import KMeans
model = KMeans(n_clusters=K)
model.fit(X_train)
label = model.predict(X_test)
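Note that predict only reports which cluster is nearest. If the distances themselves are needed, scikit-learn's KMeans also offers a transform method that maps samples to a cluster-distance space. The underlying computation is sketched below in plain Python; the centroid and sample values are made up for illustration, and with a fitted model the centroids would come from model.cluster_centers_.

```python
import math

def distances_to_centroids(samples, centroids):
    """Euclidean distance from every sample to every centroid (one row per sample)."""
    return [[math.dist(s, c) for c in centroids] for s in samples]

# Hypothetical values; with a fitted KMeans these would be model.cluster_centers_.
centroids = [(0.0, 0.0), (3.0, 4.0)]
samples = [(0.0, 0.0), (3.0, 0.0), (6.0, 8.0)]

dist = distances_to_centroids(samples, centroids)
# Index of the closest centroid per sample, i.e. the kind of label predict returns.
labels = [row.index(min(row)) for row in dist]

print(dist)    # [[0.0, 5.0], [3.0, 4.0], [10.0, 5.0]]
print(labels)  # [0, 0, 1]
```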
|
Can we rank K-Means clusters or assign weights to certain clusters?
Date : March 29 2020, 07:55 AM
One "cheat" trick would be to use the feature Rating twice or three times, so that it automatically gets more weight:
data = np.asarray([
    np.asarray(dataset['Rating']),
    np.asarray(dataset['Rating']),  # repeated so that Rating counts twice
    np.asarray(dataset['Maturity']),
    np.asarray(dataset['Score']),
    np.asarray(dataset['Bin']),
    np.asarray(dataset['Price1']),
    np.asarray(dataset['Price2']),
    np.asarray(dataset['Price3']),
]).T
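Repeating a column k times multiplies its contribution to the squared Euclidean distance by k, which is the same as keeping one copy scaled by sqrt(k). A small plain-Python check of that equivalence follows; the two rows (rating, maturity) and the helper sq_dist are made up for this sketch.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical rows: (rating, maturity).
row_a = (4.0, 1.0)
row_b = (1.0, 3.0)
k = 3  # how many times the rating column is repeated

# Variant 1: repeat the rating column k times.
dup_a = (row_a[0],) * k + row_a[1:]
dup_b = (row_b[0],) * k + row_b[1:]

# Variant 2: keep a single rating column, scaled by sqrt(k).
scl_a = (row_a[0] * math.sqrt(k),) + row_a[1:]
scl_b = (row_b[0] * math.sqrt(k),) + row_b[1:]

d_dup = sq_dist(dup_a, dup_b)
d_scl = sq_dist(scl_a, scl_b)
print(d_dup, d_scl)
print(math.isclose(d_dup, d_scl))
```

Scaling has the advantage that the weight can be any positive number rather than a whole number of copies.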
|