Multiview Datasets¶
UCI multiple feature dataset¶

mvlearn.datasets.load_UCImultifeature(select_labeled='all', views='all', shuffle=False, random_state=None)[source]¶
Load the UCI multiple features dataset [1], taken from the UCI Machine Learning Repository at https://archive.ics.uci.edu/ml/datasets/Multiple+Features. This dataset consists of 6 views of handwritten digit images, with classes 0-9. The 6 views are the following:
 76 Fourier coefficients of the character shapes
 216 profile correlations
 64 Karhunen-Loève coefficients
 240 pixel averages of the images from 2x3 windows
 47 Zernike moments
 6 morphological features
Each class contains 200 labeled examples.
Parameters: select_labeled : optional, array-like, shape (n_classes,) default (all)
A list of the examples that the user wants by label. If not specified, all examples in the dataset are returned. Repeated labels are ignored.
views : optional, array-like, shape (n_views,) default (all)
A list of the data views that the user would like in the indicated order. If not specified, all data views will be returned. Repeated views are ignored.
shuffle : bool, default=False
If True, returns each array with its rows and corresponding labels shuffled randomly according to random_state.
random_state : int, default=None
Determines the order data is shuffled if shuffle=True. Used so that data loaded is reproducible but shuffled.
Returns: data : list of np.ndarray, each of size (200*num_classes, n_features)
List of length 6 with each element being the data for one of the views.
labels : np.ndarray
Array of labels for the digit samples.
References
[1] M. van Breukelen, R.P.W. Duin, D.M.J. Tax, and J.E. den Hartog, "Handwritten digit recognition by combined classifiers", Kybernetika, vol. 34, no. 4, 1998, 381-386.
Examples
>>> from mvlearn.datasets import load_UCImultifeature
>>> # Load 6-view dataset with all 10 classes
>>> mv_data, labels = load_UCImultifeature()
>>> print(len(mv_data))
6
>>> print([mv_data[i].shape for i in range(6)])
[(2000, 76), (2000, 216), (2000, 64), (2000, 240), (2000, 47), (2000, 6)]
>>> print(labels.shape)
(2000,)
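The select_labeled and shuffle semantics described above can be sketched with plain NumPy (an illustrative stand-in, not mvlearn's actual implementation; the random matrix here replaces a real UCI view):

```python
import numpy as np

# Stand-in for one 2000-row view: 10 classes, 200 samples per class
rng = np.random.RandomState(0)
view = rng.normal(size=(2000, 76))
labels = np.repeat(np.arange(10), 200)

# select_labeled=[0, 3]: keep only rows whose label is in the requested set
mask = np.isin(labels, [0, 3])
sub_view, sub_labels = view[mask], labels[mask]
print(sub_view.shape)  # (400, 76): 200 examples per selected class

# shuffle=True with random_state: permute rows and labels together,
# so each row stays aligned with its label
perm = np.random.RandomState(42).permutation(len(sub_labels))
sub_view, sub_labels = sub_view[perm], sub_labels[perm]
```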
Data Simulator¶

class mvlearn.datasets.GaussianMixture(n_samples, centers, covariances, class_probs=None, random_state=None, shuffle=False, shuffle_random_state=None, seed=1)[source]¶
Creates an object with a fixed latent variable sampled from a (potentially) multivariate Gaussian distribution according to the specified parameters and class probability priors.
Parameters: n_samples : int
The number of points in each view, divided across Gaussians per class_probs.
centers : 1D array-like or list of 1D array-likes
The mean(s) of the Gaussian(s) from which the latent points are sampled. If a list of 1D array-likes, each is the mean of a distinct Gaussian, sampled from with probability given by class_probs. Otherwise, it is the mean of a single Gaussian from which all points are sampled.
covariances : 2D array-like or list of 2D array-likes
The covariance matrix (or matrices) of the Gaussian(s), matched to the specified centers.
class_probs : array-like, default=None
A list of probabilities specifying the probability of a latent point being sampled from each of the Gaussians. Must sum to 1. If None, it is taken to be uniform over the Gaussians.
random_state : int, default=None
If set, can be used to reproduce the data generated.
shuffle : bool, default=False
If True, data is shuffled so the labels are not ordered.
shuffle_random_state : int, default=None
If given, sets the random state for shuffling the samples. Ignored if shuffle=False.
Attributes
latent_ : np.ndarray of shape (n_samples, n_dims)
Latent distribution data. latent_[i] is randomly sampled from a Gaussian distribution with mean centers[i] and covariance covariances[i].
y_ : np.ndarray of shape (n_samples,)
Integer labels denoting which Gaussian distribution each sample came from.
Xs_ : list of array-like, of shape (2, n_samples, n_dims)
List of views of data created by transforming the latent.
centers : ndarray of shape (n_classes, n_dims)
The mean(s) of the Gaussian(s) from which the latent points are sampled.
covariances : ndarray of shape (n_classes, n_dims, n_dims)
The covariance matrices of the Gaussian(s).
class_probs_ : array-like of shape (n_classes,)
A list corresponding to the fraction of samples from each class, whose entries sum to 1.
Notes
For each class \(i\) with prior probability \(p_i\), center \(\mu_i\), covariance matrix \(\Sigma_i\), and \(n\) total samples, the latent data is sampled such that:
\[(X_1, y_1), \dots, (X_{np_i}, y_{np_i}) \overset{i.i.d.}{\sim} \mathcal{N}(\mu_i, \Sigma_i)\]
Examples
>>> from mvlearn.datasets import GaussianMixture
>>> import numpy as np
>>> n_samples = 10
>>> centers = [[0,1], [0,1]]
>>> covariances = [np.eye(2), np.eye(2)]
>>> GM = GaussianMixture(n_samples, centers, covariances,
...                      shuffle=True, shuffle_random_state=42)
>>> GM = GM.sample_views(transform='poly', n_noise=2)
>>> Xs, y = GM.get_Xy()
>>> print(y)
[1. 0. 1. 0. 1. 0. 1. 0. 0. 1.]
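The class-conditional sampling described in the Notes can be sketched directly in NumPy (a minimal illustration of the scheme, not GaussianMixture's internal code):

```python
import numpy as np

# Draw each sample's class from class_probs, then sample its latent point
# from that class's Gaussian, as in the Notes above.
rng = np.random.RandomState(0)
centers = [np.array([0.0, 1.0]), np.array([3.0, -1.0])]
covariances = [np.eye(2), 2 * np.eye(2)]
class_probs = [0.3, 0.7]

y = rng.choice(len(class_probs), size=1000, p=class_probs)
latent = np.array([rng.multivariate_normal(centers[c], covariances[c])
                   for c in y])
print(latent.shape)  # (1000, 2)
print(y.mean())      # fraction of class-1 samples; close to class_probs[1]
```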

sample_views(transform='linear', n_noise=1)[source]¶
Transforms one latent view by the specified transformation and adds noise.
Parameters: transform : function or one of {'linear', 'sin', 'poly'}, default='linear'
Transformation to perform on the latent variable. If a function, applies it to the latent. Otherwise uses an implemented function.
n_noise : int, default=1
Number of noise dimensions to add to the transformed latent.
Returns: self : returns an instance of self
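The shape bookkeeping of sample_views can be sketched as follows (illustrative only; the exact 'sin' and 'poly' transforms used by mvlearn may differ):

```python
import numpy as np

# One latent view of 10 points in 2 dimensions
rng = np.random.RandomState(0)
latent = rng.normal(size=(10, 2))

# Apply a transform to the latent, then append n_noise noise columns
transformed = np.sin(latent)      # stand-in for transform='sin'
noise = rng.normal(size=(10, 2))  # n_noise=2 extra noise dimensions
view = np.hstack([transformed, noise])
print(view.shape)  # (10, 4): original dims plus the noise dims
```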

get_Xy(latents=False)[source]¶
Returns the sampled views or latent variables.
Parameters: latents : bool, default=False
If True, returns the latent variables rather than the transformed views.
Returns: (Xs, y) : the transformed views and their labels. If latents=True,
returns the latent variables instead of Xs.
