Just figured it out thanks to the OpenCV forums: instead of collecting descriptors into another list (I used descriptors above), add each descriptor you find directly to your bag with bow.add(dsc):
import cv2

dictionarySize = 5
BOW = cv2.BOWKMeansTrainer(dictionarySize)
sift = cv2.SIFT_create()

for p in training_paths:
    image = cv2.imread(p)
    # cv2.CV_LOAD_IMAGE_GRAYSCALE is an imread flag, not a cvtColor code;
    # the conversion code you want here is COLOR_BGR2GRAY
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    kp, dsc = sift.detectAndCompute(gray, None)
    BOW.add(dsc)  # add descriptors straight to the trainer

dictionary = BOW.cluster()
Python: clustering similar words based on word2vec
Does that help? No, not really. For reference, common word2vec models trained on the English Wikipedia cover around 3 billion words. You can use KNN (or something similar). Gensim has the most_similar function to get the closest words. Using dimensionality reduction (like PCA or t-SNE) you can get yourself a nice cluster. (Not sure if gensim has a t-SNE module, but sklearn does, so you can use that.) By the way, you're referring to some image, but it's not available.
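To make that pipeline concrete, here is a minimal sketch using sklearn only. The toy word vectors are made up stand-ins for real word2vec output (in practice you would pull them from a trained gensim model, e.g. model.wv["king"]); NearestNeighbors plays the role of gensim's most_similar, and PCA plus KMeans does the reduction-then-cluster step (swap in sklearn.manifold.TSNE for t-SNE):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

# Toy stand-in for word2vec output: word -> 8-d vector. The first
# three words and the last three form two well-separated groups.
rng = np.random.default_rng(0)
words = ["king", "queen", "prince", "apple", "banana", "cherry"]
vectors = np.vstack([
    rng.normal(0.0, 0.1, 8) + (0.0 if i < 3 else 5.0)
    for i in range(len(words))
])

# KNN: nearest neighbors of "king" (the role gensim's most_similar plays).
nn = NearestNeighbors(n_neighbors=2).fit(vectors)
_, idx = nn.kneighbors(vectors[words.index("king")].reshape(1, -1))
neighbors = [words[i] for i in idx[0] if words[i] != "king"]

# Dimensionality reduction followed by clustering.
coords = PCA(n_components=2).fit_transform(vectors)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(coords)
print(neighbors, labels)
```

With real word2vec vectors you would replace the toy array with rows from the trained model and keep the rest of the pipeline unchanged.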
Clustering inside clustering, i.e. nested clustering of a data table (multiclass clustering)
It helps sometimes. You will need to carefully balance the thresholds for textual similarity against those for numerical similarity. There won't be an easy solution, and unless you have really huge data, a manual approach may be best. Textual similarity of short strings is highly unreliable.
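A minimal sketch of that balancing act, using only the standard library's difflib for the textual side. The field names, thresholds, and the numeric scale are all illustrative assumptions that would have to be tuned against your own data:

```python
from difflib import SequenceMatcher

def text_sim(a: str, b: str) -> float:
    # Ratio in [0, 1]; short strings make this unreliable,
    # so keep the threshold strict.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def num_sim(x: float, y: float, scale: float = 100.0) -> float:
    # Map the absolute difference into [0, 1]; `scale` is an assumed
    # spread of the numeric column and must be tuned per dataset.
    return max(0.0, 1.0 - abs(x - y) / scale)

def records_match(r1, r2, text_thresh=0.85, num_thresh=0.9):
    # Both thresholds must pass; tuning them against each other
    # is the balancing act described above.
    return (text_sim(r1["name"], r2["name"]) >= text_thresh
            and num_sim(r1["price"], r2["price"]) >= num_thresh)

a = {"name": "Acme Widget", "price": 19.99}
b = {"name": "ACME widget", "price": 21.50}
print(records_match(a, b))  # prints True
```

Loosening one threshold while tightening the other changes which near-duplicates slip through, which is why a manual pass over the borderline pairs is often worth it on small data.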