Modified 2018-06-22 by Andrea Censi
Clustering is the process of grouping objects such that similar objects belong to the same group. For colors, this could mean that similar colors are grouped together: e.g., bright red, ruby and pink belong to the group of red colors, whereas azure blue, Copenhagen blue and dark blue would be grouped with the blue colors.
Various algorithms can solve such a task. They differ in how they define a cluster (e.g., its members lie within a certain distance) and in how efficiently they can find these clusters.
In the following, some algorithms for determining clusters are presented.
Let’s assume we have $d$ data points $x_{1,\dots,d}$ and $k$ cluster centers $m_{1,\dots,k}$. The k-Means algorithm tries to place the centers such that, ideally, they capture all the clusters present in the data. A data point $x_i$ belongs to cluster $j$ if the cluster center $m_j$ is the nearest to $x_i$ among all the cluster centers $m_{1,\dots,k}$.
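Written as formulas, and assuming the Euclidean distance (the text above only says "nearest"), the assignment rule and the quantity that k-Means tries to minimize read as follows. The symbols $c(i)$ (cluster index of $x_i$), $C_j$ (set of members of cluster $j$) and $J$ (objective) are introduced here for illustration:

$$
c(i) = \arg\min_{j \in \{1,\dots,k\}} \lVert x_i - m_j \rVert^2,
\qquad
J(m_1,\dots,m_k) = \sum_{i=1}^{d} \lVert x_i - m_{c(i)} \rVert^2 .
$$

For fixed assignments, $J$ is minimized by moving each center to the mean of its members, $m_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i$, which is why the usual k-Means iteration alternates between assigning points and recomputing means.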
So the algorithm can be described as follows:
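A minimal sketch in Python with NumPy of the common Lloyd-style iteration (a simpler variant than the Hartigan-Wong algorithm referenced below): pick $k$ distinct data points as initial centers, assign every point to its nearest center, move each center to the mean of its members, and repeat until the assignments stop changing. The function name `kmeans`, its parameters `n_iter` and `seed`, and the random initialization are assumptions made for this sketch.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd-style k-Means on the rows of X (one data point per row).

    Hypothetical helper for illustration; returns (centers, labels).
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialization (an assumption of this sketch): k distinct random data points.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for it in range(n_iter):
        # Assignment step: squared Euclidean distance from every point to every center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if it > 0 and np.array_equal(new_labels, labels):
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster becomes empty
                centers[j] = members.mean(axis=0)
    return centers, labels
```

For well-separated data, e.g. `kmeans(np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]), k=2)`, the first three and the last three points end up in different clusters. In general the result depends on the initial centers, so the algorithm is often restarted from several initializations and the run with the lowest within-cluster sum of squares is kept.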
(Hartigan, J. A.; Wong, M. A. (1979). “Algorithm AS 136: A K-Means Clustering Algorithm”. Journal of the Royal Statistical Society. Series C (Applied Statistics). 28 (1): 100–108.)
We have seen above several drawbacks of k-Means clustering, which motivates the search for a better method.
Simply speaking, Gaussian mixture models (GMMs) are a more general version of k-Means: assuming we have $k$ components, we fit a Gaussian distribution to each cluster. The underlying assumption is that every data point is generated by a mixture of Gaussian distributions whose parameters (means, variances and mixing weights) are unknown and are estimated by the algorithm.
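To make this concrete, here is a minimal sketch, assuming one-dimensional data, that fits the mixture density $p(x) = \sum_{j=1}^{k} w_j \, \mathcal{N}(x \mid \mu_j, \sigma_j^2)$ with the expectation-maximization (EM) algorithm, a standard way of estimating GMM parameters. The E-step computes soft responsibilities (how probable it is that component $j$ generated $x_i$); the M-step re-estimates the weights, means and variances from them. The function name `gmm_em_1d`, the quantile-based initialization and the small variance floor are choices made for this sketch, not part of the original text.

```python
import numpy as np


def gmm_em_1d(x, k, n_iter=200, tol=1e-8):
    """Fit a k-component 1-D Gaussian mixture with EM.

    Hypothetical helper for illustration; returns (weights, means, variances).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Initialization (a choice of this sketch): means at evenly spaced quantiles,
    # every variance equal to the overall data variance, equal mixing weights.
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)
    var = np.full(k, x.var())
    w = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: log of w_j * N(x_i | mu_j, var_j) for every point i and component j.
        log_p = (np.log(w) - 0.5 * np.log(2 * np.pi * var)
                 - (x[:, None] - mu) ** 2 / (2 * var))
        # Log-sum-exp over the components gives log p(x_i) without underflow.
        m = log_p.max(axis=1, keepdims=True)
        log_px = m + np.log(np.exp(log_p - m).sum(axis=1, keepdims=True))
        r = np.exp(log_p - log_px)  # responsibilities; each row sums to 1
        ll = log_px.sum()           # log-likelihood of the current parameters
        # M-step: re-estimate weights, means and variances from the responsibilities.
        nk = r.sum(axis=0)
        w = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6  # small floor
        if ll - prev_ll < tol:
            break  # log-likelihood no longer increases noticeably
        prev_ll = ll
    return w, mu, var
```

Replacing the soft responsibilities by hard 0/1 assignments, and fixing all weights and variances to equal values, turns these updates into k-Means-like assignment and mean steps, which is one way to see GMMs as a generalization of k-Means.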