The output of the clustering algorithm must at least contain the cluster assignments, i.e. which cluster each point belongs to. Given those assignments, the k-means cluster centers are simply the mean of the points assigned to each cluster.
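A minimal NumPy sketch of that idea, assuming the points are rows of an array X and the assignments are an integer array labels of the same length; centers_from_labels is a hypothetical helper name, not a library function.

```python
import numpy as np

def centers_from_labels(X, labels):
    """Recover k-means-style centroids as the mean of the points in each cluster."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)  # sorted cluster ids that actually occur
    centers = np.vstack([X[labels == k].mean(axis=0) for k in ids])
    return ids, centers

X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 14.0]])
labels = np.array([0, 0, 1, 1])
ids, centers = centers_from_labels(X, labels)
# cluster 0 -> (1, 0), cluster 1 -> (11, 12)
```

If the algorithm was genuinely k-means and converged, these means should coincide with the centers it reported, since that is its update step.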
Boxplot overlaid on dot plot + means, means in wrong position
The help page for ?position_dodge only says that dodging elements with different widths is tricky; I usually tweak this manually. Trying a few values, it looks like you want the points to use a dodge width that is 3/4 of the boxplot width, but I don't know why, or whether that holds for other geoms. I would try changing the width in the stat_summary call to 0.15.
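To see why the means can land in the wrong place, here is a small Python sketch of the dodge arithmetic. It assumes ggplot2's position_dodge centres group i of n at x + width * ((i - 0.5) / n - 0.5); dodge_offsets is a hypothetical helper, not part of ggplot2.

```python
def dodge_offsets(n_groups, width):
    """x-offsets of each group's centre under a position_dodge-style rule.

    Assumed rule: offset_i = width * ((i - 0.5) / n - 0.5) for i = 1..n.
    """
    return [width * ((i - 0.5) / n_groups - 0.5) for i in range(1, n_groups + 1)]

# Two groups (e.g. two fill levels) sharing the same x position.
boxes = dodge_offsets(2, 0.75)   # [-0.1875, 0.1875]
means = dodge_offsets(2, 0.5)    # [-0.125, 0.125]: offsets shrink with width

# Offsets scale linearly with width, so the summary points sit on the box
# centres only when both layers end up with the same effective dodge width.
print(boxes, means)
```

Under this assumption, a mismatch between layers comes from their effective widths differing. Recent ggplot2 versions default geom_boxplot to position_dodge2, which also adds padding between boxes; that may be why a ratio such as 3/4 has to be found by trial rather than by setting equal widths.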
Rolling means and applying means at the beginning of a series of data
I am clustering text data using TfidfVectorizer. The code works fine: it passes the entire TfidfVectorizer output as input to k-means clustering and generates scatter plots. Instead, I would like to pass only the top n terms by TF-IDF score as input to k-means. Is there a way to achieve that?
Use max_features in TfidfVectorizer to consider only the top n features. Note that max_features ranks terms by term frequency across the corpus, not by TF-IDF score; to select terms by TF-IDF score you would have to rank the columns of the TF-IDF matrix yourself.