# Clustering¶

## Multiview Spectral Clustering¶

class mvlearn.cluster.MultiviewSpectralClustering(n_clusters=2, random_state=None, info_view=None, max_iter=10, n_init=10, affinity='rbf', gamma=None, n_neighbors=10)[source]

An implementation of multi-view spectral clustering using the basic co-training framework as described in [1]. Additionally, this can be effective when the dataset naturally contains features that are of 2 different data types, such as continuous features and categorical features [4], and then the original features are separated into two views in this way.

This algorithm can handle 2 or more views of data.

Parameters: n_clusters : int The number of clusters random_state : int, optional, default=None Determines random number generation for k-means. info_view : int, optional, default=None The most informative view. Must be between 0 and n_views-1 If given, then the final clustering will be performed on the designated view alone. Otherwise, the algorithm will concatenate across all views and cluster on the result. max_iter : int, optional, default=10 The maximum number of iterations to run the clustering algorithm. n_init : int, optional, default=10 The number of random initializations to use for k-means clustering. affinity : string, optional, default='rbf' The affinity metric used to construct the affinity matrix. Options include 'rbf' (radial basis function), 'nearest_neighbors', and 'poly' (polynomial) gamma : float, optional, default=None Kernel coefficient for rbf and polynomial kernels. If None then gamma is computed as 1 / (2 * median(pair_wise_distances(X))^2) for each data view X. n_neighbors : int, optional, default=10 Only used if nearest neighbors is selected for affinity. The number of neighbors to use for the nearest neighbors kernel.

Attributes

 labels_ (array-like, shape (n_samples)) Cluster labels for each sample in the fitted data. embedding_ (array-like, shape (n_samples, n_clusters)) The final spectral representation of the data to be used as input for the KMeans clustering step.

Notes

Multi-view spectral clustering adapts the spectral clustering algorithm to applications where more than one view of data is available. This algorithm relies on the basic assumptions of the co-training, which are: (a) Sufficiency: each view is sufficient for classification on its own, (b) Compatibility: the target functions in both views predict the same labels for co-occurring features with high probability, and (c) Conditional independence: the views are conditionally independent given the class labels. In contrast to multi-view k-means clustering, multi-view spectral clustering performs well on arbitrary shaped clusters, and can therefore be readily used in applications where clusters are not expected to be convex. However multi-view spectral clustering tends to be computationally expensive unless the similarity graph for the data is sparse.

Multi-view spectral clustering works by using the spectral embedding from one view to constrain the similarity graph in the other view. By iteratively applying this procedure, the clustering of the two views tend to each other. Here we outline the algorithm for the Multi-view Spectral clustering algorithm for 2 views.

Multi-view Spectral Clustering Algorithm (for 2 views)

Input: Similarity matrix for both views: $$\mathbf{K}_1, \mathbf{K}_2$$

Output: Assignments to k clusters

1. Initialize: $$\mathbf{L}_v = \mathbf{D}_v^{-1/2} \mathbf{K}_v\mathbf{D}_v^{-1/2}$$ for $$v = 1, 2$$

$$\mathbf{U}_v^0$$ is an $$n \times k$$ matrix with the top k eigenvectors of $$\mathbf{L}_v$$ for $$v = 1, 2$$

2. For $$i = 1$$ to iter:

1. $$\mathbf{S}_1 = sym(\mathbf{U}_2^{i-1} {\mathbf{U}_2^{i-1}}^T\mathbf{K}_1)$$
2. $$\mathbf{S}_2 = sym(\mathbf{U}_1^{i-1} {\mathbf{U}_1^{i-1}}^T\mathbf{K}_2)$$
3. Use $$\mathbf{S}_1$$ and $$\mathbf{S}_2$$ as the new graph similarities and compute the Laplacians. Solve for the largest k eigenvectors to obtain $$\mathbf{U}_1^i$$ and $$\mathbf{U}_2^i$$.
3. Row-normalize $$\mathbf{U}_1^i$$ and $$\mathbf{U}_2^i$$.

4. Form matrix $$\mathbf{V} = \mathbf{U}_v^i$$, where $$v$$ is believed to be the most informative view a priori. If there is no prior knowledge on the view informativeness, matrix $$\mathbf{V}$$ can also be set to the column-wise concatenation of the two $$\mathbf{U}_v^i$$ s.

5. Assign example j to cluster c if the j-th row of $$\mathbf{V}$$ is assigned to cluster c by the k-means algorithm.

References

 [1] Abhishek Kumar and Hal Daumé. A Co-training Approach for Multiview Spectral Clustering. In International Conference on Machine Learning, 2011

Examples

>>> from mvlearn.datasets import load_UCImultifeature
>>> from mvlearn.cluster import MultiviewSpectralClustering
>>> from sklearn.metrics import normalized_mutual_info_score as nmi_score
>>> # Get 5-class data
>>> data, labels = load_UCImultifeature(select_labeled = list(range(5)))
>>> mv_data = data[:2]  # first 2 views only
>>> mv_spectral = MultiviewSpectralClustering(n_clusters=5,
...     random_state=10, n_init=100)
>>> mv_clusters = mv_spectral.fit_predict(mv_data)
>>> nmi = nmi_score(labels, mv_clusters)
>>> print('{0:.3f}'.format(nmi))
0.872

fit(Xs, y=None)[source]

Performs clustering on the multiple views of data.

Parameters: Xs : list of array-likes or numpy.ndarray Xs length: n_views Xs[i] shape: (n_samples, n_features_i) This list must be of size n_views, corresponding to the number of views of data. Each view can have a different number of features, but they must have the same number of samples. y : Ignored Not used, present for API consistency by convention. self : returns an instance of self.
fit_predict(Xs, y=None)

A method for fitting then predicting cluster assignments.

Parameters: Xs : list of array-likes or numpy.ndarray Xs length: n_views Xs[i] shape: (n_samples, n_features_i) A list of different views to fit the model on. y : array-like, shape (n_samples,) Labels for each sample. Only used by supervised algorithms. labels : array-like, shape (n_samples,) The predicted cluster labels for each sample.
predict(Xs)

A method to predict cluster labels of multiview data. Parameters ---------- Xs : list of array-likes or numpy.ndarray

• Xs length: n_views
• Xs[i] shape: (n_samples, n_features_i)

A list of different views to cluster.

Returns: labels : array-like, shape (n_samples,) Returns the predicted cluster labels for each sample.

## Co-Regularized Multiview Spectral Clustering¶

class mvlearn.cluster.MultiviewCoRegSpectralClustering(n_clusters=2, v_lambda=2, random_state=None, info_view=None, max_iter=10, n_init=10, affinity='rbf', gamma=None, n_neighbors=10)[source]

An implementation of co-regularized multi-view spectral clustering based on an unsupervied version of the co-training framework. This algorithm uses the pairwise co-regularization scheme as described in [2]. This algorithm can handle 2 or more views of data.

Parameters: n_clusters : int The number of clusters v_lambda : float, optional, default=2 The regularization parameter. This parameter trades-off the spectral clustering objectives with the degree of agreement between each pair of views in the new representation. Must be a positive value. random_state : int, optional, default=None Determines random number generation for k-means. info_view : int, optional, default=None The most informative view. Must be between 0 and n_views-1 If given, then the final clustering will be performed on the designated view alone. Otherwise, the algorithm will concatenate across all views and cluster on the result. max_iter : int, optional, default=10 The maximum number of iterations to run the clustering algorithm. n_init : int, optional, default=10 The number of random initializations to use for k-means clustering. affinity : string, optional, default='rbf' The affinity metric used to construct the affinity matrix. Options include 'rbf' (radial basis function), 'nearest_neighbors', and 'poly' (polynomial) gamma : float, optional, default=None Kernel coefficient for rbf and polynomial kernels. If None then gamma is computed as 1 / (2 * median(pair_wise_distances(X))^2) for each data view X. n_neighbors : int, optional, default=10 Only used if nearest neighbors is selected for affinity. The number of neighbors to use for the nearest neighbors kernel.

Attributes

 labels_ (array-like, shape (n_samples,)) Cluster labels for each point. embedding_ (array-like, shape (n_samples, n_clusters)) The final spectral representation of the data to be used as input for the KMeans clustering step. objective_ (array-like, shape (n_views, n_iterations)) The value of the spectral clustering objective for each view at the end of each iteration.

Notes

In standard spectral clustering, the eigenvector matrix U for a given view is the new data representation to be used for the subsequent k-means clustering stage. In this algorithm, the objective function has been altered to encourage the pairwise similarities of examples under the new representation to be similar across all views.

The modified spectral clustering objective for the case of two views is shown and derived in [#4Clu]. In the clustering objective, the hyperparameter lambda trades-off the spectral clustering objectives and the disagreement term.

For a fixed lambda and n, the objective function is bounded from above and non-decreasing. As such, the algorithm is guaranteed to converge.

References

 [2] Kumar A, Rai P, Daumé H (2011) Co-regularized multi-view spectral clustering. Adv Neural Inform Process Syst 24:1413–1421

Examples

>>> from mvlearn.datasets import load_UCImultifeature
>>> from mvlearn.cluster import MultiviewCoRegSpectralClustering
>>> from sklearn.metrics import normalized_mutual_info_score as nmi_score
>>> # Get 5-class data
>>> data, labels = load_UCImultifeature(select_labeled = list(range(5)))
>>> mv_data = data[:2]  # first 2 views only
>>> mv_spectral = MultiviewCoRegSpectralClustering(n_clusters=5,
...     random_state=10, n_init=100)
>>> mv_clusters = mv_spectral.fit_predict(mv_data)
>>> nmi = nmi_score(labels, mv_clusters, average_method='arithmetic')
>>> print('{0:.3f}'.format(nmi))
0.663

fit(Xs)[source]

Performs clustering on the multiple views of data.

Parameters: Xs : list of array-likes or numpy.ndarray Xs length: n_views Xs[i] shape: (n_samples, n_features_i) This list must be of size n_views, corresponding to the number of views of data. Each view can have a different number of features, but they must have the same number of samples. self : returns an instance of self.
fit_predict(Xs, y=None)[source]

Performs clustering on the multiple views of data and returns the cluster labels.

Parameters: Xs : list of array-likes or numpy.ndarray Xs length: n_views Xs[i] shape: (n_samples, n_features_i) This list must be of size n_views, corresponding to the number of views of data. Each view can have a different number of features, but they must have the same number of samples. y : ignored Included for API compliance. labels : array-like, shape (n_samples,) The predicted cluster labels for each sample.
predict(Xs)

A method to predict cluster labels of multiview data. Parameters ---------- Xs : list of array-likes or numpy.ndarray

• Xs length: n_views
• Xs[i] shape: (n_samples, n_features_i)

A list of different views to cluster.

Returns: labels : array-like, shape (n_samples,) Returns the predicted cluster labels for each sample.

## Multiview K Means¶

class mvlearn.cluster.MultiviewKMeans(n_clusters=2, random_state=None, init='k-means++', patience=5, max_iter=300, n_init=5, tol=0.0001, n_jobs=None)[source]

This class implements multi-view k-means using the co-EM framework as described in [3]. This algorithm is most suitable for cases in which the different views of data are conditionally independent. Additionally, this can be effective when the dataset naturally contains features that are of 2 different data types, such as continuous features and categorical features [4], and then the original features are separated into two views in this way.

This algorithm currently handles two views of data.

Parameters: n_clusters : int, optional, default=2 The number of clusters random_state : int, optional, default=None Determines random number generation for initializing centroids. Can seed the random number generator with an int. init : {'k-means++', 'random'} or list of array-likes, default='k-means++' Method of initializing centroids. 'k-means++': selects initial cluster centers for k-means clustering via a method that speeds up convergence. 'random': choose n_cluster samples from the data for the initial centroids. If a list of array-likes is passed, the list should have a length of equal to the number of views. Each of the array-likes should have the shape (n_clusters, n_features_i) for the ith view, where n_features_i is the number of features in the ith view of the input data. patience : int, optional, default=5 The number of EM iterations with no decrease in the objective function after which the algorithm will terminate. max_iter : int, optional, default=300 The maximum number of EM iterations to run before termination. n_init : int, optional, default=5 Number of times the k-means algorithm will run on different centroid seeds. The final result will be the best output of n_init runs with respect to total inertia across all views. tol : float, default=1e-4 Relative tolerance with regards to inertia to declare convergence. n_jobs : int, default=None The number of jobs to use for computation. This works by computing each of the n_init runs in parallel. None means 1. -1 means using all processors.

Attributes

 labels_ (array-like, shape (n_samples)) Cluster labels for each sample in the fitted data. centroids_ (list of array-likes) - centroids_ length: n_views - centroids_[i] shape: (n_clusters, n_features_i) The cluster centroids for each of the two views. centroids_[0] corresponds to the centroids of view 1 and centroids_[1] corresponds to the centroids of view 2.

Notes

Multi-view k-means clustering adapts the traditional k-means clustering algorithm to handle two views of data. This algorithm requires that a conditional independence assumption between views holds true. In cases where both views are informative and conditionally independent, multi-view k-means clustering can outperform its single-view analog run on a concatenated version of the two views of data. This is quite useful for applications where you wish to cluster data from two different modalities or data with features that naturally fall into two different partitions. Multi-view k-means works by iteratively performing the maximization and expectation steps of traditional EM in one view, and then using the computed hidden variables as the input for the maximization step in the other view. This algorithm, referred to as Co-EM, is described below.

Co-EM Algorithm

Input: Unlabeled data D with 2 views

1. Initialize $$\Theta_0^{(2)}$$, T, $$t = 0$$.

2. E step for view 2: compute expectation for hidden variables given

3. Loop until stopping criterion is true:

1. For v = 1 ... 2:

1. $$t = t + 1$$
2. M step view v: Find model parameters $$\Theta_t^{(v)}$$

that maximize the likelihood for the data given the expected values for hidden variables of view $$\overline{v}$$ of iteration $$t$$ - 1

1. E step view $$v$$: compute expectation for hidden

variables given the model parameters $$\Theta_t^{(v)}$$

4. return combined $$\hat{\Theta} = \Theta_{t-1}^{(1)} \cup \Theta_t^{(2)}$$

The final assignment of examples to partitions is performed by assigning each example to the cluster with the largest averaged posterior probability over both views.

References

 [3] (1, 2) Bickel S, Scheffer T (2004) Multi-view clustering. Proceedings of the 4th IEEE International Conference on Data Mining, pp. 19–26
 [4] (1, 2, 3) Chao, Guoqing, Shiliang Sun, and Jinbo Bi. "A survey on multi-view clustering." arXiv preprint arXiv:1712.06246 (2017).

Examples

>>> from mvlearn.datasets import load_UCImultifeature
>>> from mvlearn.cluster import MultiviewKMeans
>>> from sklearn.metrics import normalized_mutual_info_score as nmi_score
>>> # Get 5-class data
>>> data, labels = load_UCImultifeature(select_labeled = list(range(5)))
>>> mv_data = data[:2]  # first 2 views only
>>> mv_kmeans = MultiviewKMeans(n_clusters=5, random_state=10)
>>> mv_clusters = mv_kmeans.fit_predict(mv_data)
>>> nmi = nmi_score(labels, mv_clusters)
>>> print('{0:.3f}'.format(nmi))
0.770


""

fit(Xs, y=None)[source]

Fit the cluster centroids to the data.

Parameters: Xs : list of array-likes or numpy.ndarray Xs length: n_views Xs[i] shape: (n_samples, n_features_i) This list must be of size 2, corresponding to the two views of the data. The two views can each have a different number of features, but they must have the same number of samples. y : Ignored Not used, present for API consistency by convention. self : returns an instance of self.
predict(Xs)[source]

Predict the cluster labels for the data.

Parameters: Xs : list of array-likes or numpy.ndarray Xs length: n_views Xs[i] shape: (n_samples, n_features_i) This list must be of size 2, corresponding to the two views of the data. The two views can each have a different number of features, but they must have the same number of samples. labels : array-like, shape (n_samples,) The predicted cluster labels for each sample.
fit_predict(Xs, y=None)

A method for fitting then predicting cluster assignments.

Parameters: Xs : list of array-likes or numpy.ndarray Xs length: n_views Xs[i] shape: (n_samples, n_features_i) A list of different views to fit the model on. y : array-like, shape (n_samples,) Labels for each sample. Only used by supervised algorithms. labels : array-like, shape (n_samples,) The predicted cluster labels for each sample.

## Multiview Spherical K Means¶

class mvlearn.cluster.MultiviewSphericalKMeans(n_clusters=2, random_state=None, init='k-means++', patience=5, max_iter=None, n_init=5, tol=0.0001, n_jobs=None)[source]

An implementation of multi-view spherical K-Means using the co-EM framework as described in [3]. This algorithm is most suitable for cases in which the different views of data are conditionally independent. Additionally, this can be effective when the dataset naturally contains features that are of 2 different data types, such as continuous features and categorical features [4], and then the original features are separated into two views in this way.

This algorithm currently handles two views of data.

Parameters: n_clusters : int, optional, default=2 The number of clusters random_state : int, optional, default=None Determines random number generation for initializing centroids. Can seed the random number generator with an int. init : {'k-means++', 'random'} or list of array-likes, default='k-means++' Method of initializing centroids. 'k-means++': selects initial cluster centers for k-means clustering via a method that speeds up convergence. 'random': choose n_cluster samples from the data for the initial centroids. If a list of array-likes is passed, the list should have a length of equal to the number of views. Each of the array-likes should have the shape (n_clusters, n_features_i) for the ith view, where n_features_i is the number of features in the ith view of the input data. patience : int, optional, default=5 The number of EM iterations with no decrease in the objective function after which the algorithm will terminate. max_iter : int, optional, default=None The maximum number of EM iterations to run before termination. n_init : int, optional, default=5 Number of times the k-means algorithm will run on different centroid seeds. The final result will be the best output of n_init runs with respect to total inertia across all views. tol : float, default=1e-4 Relative tolerance with regards to inertia to declare convergence. n_jobs : int, default=None The number of jobs to use for computation. This works by computing each of the n_init runs in parallel. None means 1. -1 means using all processors.

Attributes

 labels_ (array-like, shape (n_samples)) Cluster labels for each sample in the fitted data. centroids_ (list of array-likes) - centroids_ length: n_views - centroids_[i] shape: (n_clusters, n_features_i) The cluster centroids for each of the two views. centroids_[0] corresponds to the centroids of view 1 and centroids_[1] corresponds to the centroids of view 2.

Notes

Multi-view spherical k-means clustering adapts the traditional spherical kmeans clustering algorithm to handle two views of data. This algorithm is similar to the mult-view k-means algorithm, except it uses cosine distance instead of euclidean distance for the purposes of computing the optimization objective and making assignments. This algorithm requires that a conditional independence assumption between views holds true. In cases where both views are informative and conditionally independent, multi-view spherical k-means clustering can outperform its single-view analog run on a concatenated version of the two views of data. This is quite useful for applications where you wish to cluster data from two different modalities or data with features that naturally fall into two different partitions. Multi-view spherical k-means works by iteratively performing the maximization and expectation steps of traditional EM in one view, and then using the computed hidden variables as the input for the maximization step in the other view. This algorithm is described in the section for multi-view k-means clustering.

Examples

>>> from mvlearn.datasets import load_UCImultifeature
>>> from mvlearn.cluster import MultiviewSphericalKMeans
>>> from sklearn.metrics import normalized_mutual_info_score as nmi_score
>>> # Get 5-class data
>>> data, labels = load_UCImultifeature(select_labeled = list(range(5)))
>>> mv_data = data[:2]  # first 2 views only
>>> mv_kmeans = MultiviewSphericalKMeans(n_clusters=5, random_state=5)
>>> mv_clusters = mv_kmeans.fit_predict(mv_data)
>>> # Compute nmi between true class labels and multi-view cluster labels
>>> nmi = nmi_score(labels, mv_clusters)
>>> print('{0:.3f}'.format(nmi))
0.823

fit(Xs, y=None)[source]

Fit the cluster centroids to the data.

Parameters: Xs : list of array-likes or numpy.ndarray Xs length: n_views Xs[i] shape: (n_samples, n_features_i) This list must be of size 2, corresponding to the two views of the data. The two views can each have a different number of features, but they must have the same number of samples. y : Ignored Not used, present for API consistency by convention. self : returns an instance of self.
fit_predict(Xs, y=None)

A method for fitting then predicting cluster assignments.

Parameters: Xs : list of array-likes or numpy.ndarray Xs length: n_views Xs[i] shape: (n_samples, n_features_i) A list of different views to fit the model on. y : array-like, shape (n_samples,) Labels for each sample. Only used by supervised algorithms. labels : array-like, shape (n_samples,) The predicted cluster labels for each sample.
predict(Xs)

Predict the cluster labels for the data.

Parameters: Xs : list of array-likes or numpy.ndarray Xs length: n_views Xs[i] shape: (n_samples, n_features_i) This list must be of size 2, corresponding to the two views of the data. The two views can each have a different number of features, but they must have the same number of samples. labels : array-like, shape (n_samples,) The predicted cluster labels for each sample.