Multiview Datasets

UCI multiple feature dataset

mvlearn.datasets.load_UCImultifeature(select_labeled='all', views='all', shuffle=False, random_state=None)[source]

Load the UCI multiple features dataset [1], taken from the UCI Machine Learning Repository at https://archive.ics.uci.edu/ml/datasets/Multiple+Features. This data set consists of 6 views of handwritten digit images, with classes 0-9. The 6 views are the following:

  1. 76 Fourier coefficients of the character shapes
  2. 216 profile correlations
  3. 64 Karhunen-Loève coefficients
  4. 240 pixel averages of the images from 2x3 windows
  5. 47 Zernike moments
  6. 6 morphological features

Each class contains 200 labeled examples.

Parameters:

select_labeled : optional, array-like, shape (n_classes,), default='all'

A list of the class labels whose examples should be returned. If not specified, all examples in the dataset are returned. Repeated labels are ignored.

views : optional, array-like, shape (n_views,), default='all'

A list of the data views that the user would like in the indicated order. If not specified, all data views will be returned. Repeated views are ignored.

shuffle : bool, default=False

If True, returns each array with its rows and corresponding labels shuffled randomly according to random_state.

random_state : int, default=None

Determines the order in which the data is shuffled when shuffle=True, so that the loaded data is shuffled but reproducible.

Returns:

data : list of np.ndarray, each of size (200*num_classes, n_features)

List with one entry per requested view (length 6 by default), each element containing the data for that view.

labels : np.ndarray

Array of digit labels corresponding to the rows of each data view.

References

[1] M. van Breukelen, R.P.W. Duin, D.M.J. Tax, and J.E. den Hartog, "Handwritten digit recognition by combined classifiers," Kybernetika, vol. 34, no. 4, pp. 381-386, 1998.

Examples

>>> from mvlearn.datasets import load_UCImultifeature
>>> # Load 6-view dataset with all 10 classes
>>> mv_data, labels = load_UCImultifeature()
>>> print(len(mv_data))
6
>>> print([mv_data[i].shape for i in range(6)])
[(2000, 76), (2000, 216), (2000, 64), (2000, 240), (2000, 47), (2000, 6)]
>>> print(labels.shape)
(2000,)
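
A further sketch, continuing the example above and assuming select_labeled and views behave as documented (digit classes 0-2 from the first two views; with 200 examples per class this yields 600 rows per view):

>>> # Hypothetical subset: classes 0-2 from views 0 and 1, shuffled reproducibly
>>> mv_data, labels = load_UCImultifeature(select_labeled=[0, 1, 2],
...                                        views=[0, 1],
...                                        shuffle=True, random_state=42)
>>> print(len(mv_data))
2
>>> print([view.shape for view in mv_data])
[(600, 76), (600, 216)]
>>> print(labels.shape)
(600,)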

Data Simulator

class mvlearn.datasets.GaussianMixture(n_samples, centers, covariances, class_probs=None, random_state=None, shuffle=False, shuffle_random_state=None, seed=1)[source]

Creates an object with a fixed latent variable sampled from a (potentially) multivariate Gaussian distribution according to the specified parameters and class probability priors.

Parameters:

n_samples : int

The number of points in each view, divided across Gaussians per class_probs.

centers : 1D array-like or list of 1D array-likes

The mean(s) of the Gaussian(s) from which the latent points are sampled. If a list of 1D array-likes, each element is the mean of a distinct Gaussian, selected with the probability given by the corresponding entry of class_probs. Otherwise, it is the mean of a single Gaussian from which all latent points are sampled.

covariances : 2D array-like or list of 2D array-likes

The covariance matrix (or matrices) of the Gaussian(s), matched to the specified centers.

class_probs : array-like, default=None

A list of probabilities specifying the probability of a latent point being sampled from each of the Gaussians. Must sum to 1. If None, a uniform distribution over the Gaussians is used.

random_state : int, default=None

If set, can be used to reproduce the data generated.

shuffle : bool, default=False

If True, data is shuffled so the labels are not ordered.

shuffle_random_state : int, default=None

If given, then sets the random state for shuffling the samples. Ignored if shuffle=False.

Attributes

latent_ (np.ndarray, of shape (n_samples, n_dims)) Latent distribution data. Each sample latent_[i] is randomly drawn from the Gaussian distribution whose mean and covariance correspond to its class label y_[i].
y_ (np.ndarray, of shape (n_samples)) Integer labels denoting which Gaussian distribution each sample came from.
Xs_ (list of np.ndarray, of length 2, each of shape (n_samples, n_dims)) List of views of data created by transforming the latent.
centers (ndarray of shape (n_classes, n_dims)) The mean(s) of the Gaussian(s) from which the latent points are sampled.
covariances (ndarray of shape (n_classes, n_dims, n_dims)) The covariance matrices of the Gaussians.
class_probs_ (array-like of shape (n_classes,)) A list giving the fraction of samples drawn from each class; its entries sum to 1.

Notes

For each class \(i\) with prior probability \(p_i\), center and covariance matrix \(\mu_i\) and \(\Sigma_i\), and \(n\) total samples, the latent data is sampled such that:

\[(X_1, y_1), \dots, (X_{np_i}, y_{np_i}) \overset{i.i.d.}{\sim} \mathcal{N}(\mu_i, \Sigma_i)\]
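
A minimal NumPy sketch of this sampling scheme, illustrating the formula above rather than the library's implementation (names like counts are hypothetical):

>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> n, class_probs = 10, [0.5, 0.5]
>>> centers = [np.array([0, 1]), np.array([0, -1])]
>>> covariances = [np.eye(2), np.eye(2)]
>>> # For each class i, draw n * p_i i.i.d. samples from N(mu_i, Sigma_i)
>>> counts = [int(n * p) for p in class_probs]
>>> latent = np.vstack([rng.multivariate_normal(centers[i], covariances[i],
...                                             size=counts[i])
...                     for i in range(len(class_probs))])
>>> y = np.repeat(np.arange(len(class_probs)), counts)
>>> latent.shape, y.shape
((10, 2), (10,))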

Examples

>>> from mvlearn.datasets import GaussianMixture
>>> import numpy as np
>>> n_samples = 10
>>> centers = [[0,1], [0,-1]]
>>> covariances = [np.eye(2), np.eye(2)]
>>> GM = GaussianMixture(n_samples, centers, covariances,
...                      shuffle=True, shuffle_random_state=42)
>>> GM = GM.sample_views(transform='poly', n_noise=2)
>>> Xs, y = GM.get_Xy()
>>> print(y)
[1. 0. 1. 0. 1. 0. 1. 0. 0. 1.]
sample_views(transform='linear', n_noise=1)[source]

Transforms one latent view by specified transformation and adds noise.

Parameters:

transform : function or one of {'linear', 'sin', 'poly'}, default='linear'

Transformation to perform on the latent variable. If a function, it is applied to the latent. Otherwise, the named built-in transformation is used.

n_noise : int, default=1

Number of noise dimensions to add to the transformed latent.

Returns:

self : returns an instance of self
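
A sketch of passing a custom callable as the transform, assuming any function of the latent array is accepted as described above (the lambda and n_noise=2 are illustrative choices):

>>> import numpy as np
>>> from mvlearn.datasets import GaussianMixture
>>> GM = GaussianMixture(10, [[0, 1], [0, -1]], [np.eye(2), np.eye(2)],
...                      random_state=42)
>>> # Hypothetical elementwise transform applied to the latent variable
>>> GM = GM.sample_views(transform=lambda x: np.exp(-x ** 2), n_noise=2)
>>> Xs, y = GM.get_Xy()
>>> print(len(Xs))
2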

get_Xy(latents=False)[source]

Returns the sampled views or latent variables.

Parameters:

latents : bool, default=False

If True, returns the latent variables rather than the transformed views.

Returns:

(Xs, y) : tuple

The transformed views and their labels. If latents=True, the latent variables are returned in place of Xs.
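
A sketch of retrieving the latent variables instead of the views, continuing the example above and assuming latents=True behaves as documented (10 two-dimensional latent samples):

>>> latent, y = GM.get_Xy(latents=True)
>>> print(latent.shape)
(10, 2)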