View Embedding¶
Generalized Canonical Correlation Analysis¶

class mvlearn.embed.GCCA(n_components=None, fraction_var=None, sv_tolerance=None, n_elbows=2, tall=False, max_rank=False, n_jobs=None)[source]¶
An implementation of Generalized Canonical Correlation Analysis [1], suitable for cases where the number of features exceeds the number of samples, achieved by first applying single-view dimensionality reduction. Computes individual projections into a common subspace such that the correlations between pairwise projections are maximized. An important note: this is applicable to any number of views, not just two.
Parameters: n_components : int (positive), optional, default=None
If self.sv_tolerance=None, selects the number of SVD components to keep for each view. If None, another selection method is used.
fraction_var : float, default=None
If self.sv_tolerance=None and self.n_components=None, selects the number of SVD components to keep for each view by capturing enough of the variance. If None, another selection method is used.
sv_tolerance : float, optional, default=None
Selects the number of SVD components to keep for each view by thresholding singular values. If None, another selection method is used.
n_elbows : int, optional, default: 2
If self.fraction_var=None, self.sv_tolerance=None, and self.n_components=None, then compute the optimal embedding dimension using utils.select_dimension(). Otherwise, ignored.
tall : boolean, default=False
Set to True if n_samples > n_features; speeds up the SVD.
max_rank : boolean, default=False
If True, sets the rank of the common latent space as the maximum rank of the individual spaces. If False, uses the minimum individual rank.
n_jobs : int (positive), default=None
The number of jobs to run in parallel when computing the SVDs for each view in fit and partial_fit. None means 1 job; -1 means using all processors.
Attributes
projection_mats_ : list of arrays
A projection matrix for each view, from the given space to the latent space.
ranks_ : list of ints
Number of left singular vectors kept for each view during the first SVD.
Notes
Consider two views \(X_1\) and \(X_2\). Canonical Correlation Analysis seeks to find vectors \(a_1\) and \(a_2\) to maximize the correlation between \(X_1 a_1\) and \(X_2 a_2\), expanded below.
\[\left(\frac{a_1^TC_{12}a_2} {\sqrt{a_1^TC_{11}a_1a_2^TC_{22}a_2}} \right)\]where \(C_{11}\), \(C_{22}\), and \(C_{12}\) are respectively the view 1, view 2, and between-view covariance matrix estimates. GCCA maximizes the sum of these correlations across all pairwise views and computes a set of linearly independent components. This specific algorithm first applies principal component analysis (PCA) independently to each view and then aligns the most informative projections to find correlated and informative subspaces. Parameters that control the embedding dimension apply to the PCA step. The dimension of each aligned subspace is the maximum or minimum of the individual dimensions, per the max_rank parameter. Using the maximum will capture the most information from all views but also noise from some views. Using the minimum will better remove noise dimensions but at the cost of information from some views.
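The two-step procedure described above (per-view SVD reduction, then a joint SVD to align the kept singular vectors) can be sketched in NumPy. This is a simplified illustration only, not mvlearn's exact implementation; `gcca_sketch` and `rank` are names invented here:

```python
import numpy as np

def gcca_sketch(Xs, rank):
    """Toy GCCA: per-view SVD reduction, then a joint SVD to align views."""
    Uts = []
    for X in Xs:
        # Center and scale each view before decomposing it.
        Xc = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
        # Keep the top `rank` left singular vectors of each view.
        U, _, _ = np.linalg.svd(Xc, full_matrices=False)
        Uts.append(U[:, :rank])
    # Concatenate the kept singular vectors and take a second SVD.
    UU, _, _ = np.linalg.svd(np.hstack(Uts), full_matrices=False)
    G = UU[:, :rank]  # common latent representation
    # Project each view's subspace onto the common space.
    return [U @ (U.T @ G) for U in Uts]

rng = np.random.default_rng(0)
Xs = [rng.normal(size=(50, 10)), rng.normal(size=(50, 20))]
latents = gcca_sketch(Xs, rank=3)
print([Z.shape for Z in latents])  # [(50, 3), (50, 3)]
```

Each view ends up with an embedding of the same dimension, mirroring the common-subspace behavior described above.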
References
[1] B. Afshin-Pour, G.A. Hossein-Zadeh, S.C. Strother, H. Soltanian-Zadeh. "Enhancing reproducibility of fMRI statistical maps using generalized canonical correlation analysis in NPAIRS framework." NeuroImage, 60 (2012), pp. 1970-1981.
Examples
>>> from mvlearn.datasets import load_UCImultifeature
>>> from mvlearn.embed import GCCA
>>> # Load full dataset, labels not needed
>>> Xs, _ = load_UCImultifeature()
>>> gcca = GCCA(fraction_var=0.9)
>>> # Transform the first 5 views
>>> Xs_latents = gcca.fit_transform(Xs[:5])
>>> print([X.shape[1] for X in Xs_latents])
[9, 9, 9, 9, 9]

fit(Xs, y=None)[source]¶
Calculates a projection from each view to a latent space such that the sum of pairwise latent space correlations is maximized. Each view 'X' is normalized and the left singular vectors of 'X^T X' are calculated using SVD. The number of singular vectors kept is determined by either the percent variance explained, a given rank threshold, or a given number of components. The singular vectors kept are concatenated, the SVD of that matrix is taken, and the result is used to calculate projections for each view.
Parameters: Xs : list of array-likes or numpy.ndarray
 Xs length: n_views
 Xs[i] shape: (n_samples, n_features_i)
The data to fit to. Each view will receive its own embedding.
y : ignored
Included for API compliance.
Returns: self : returns an instance of self.

partial_fit(Xs, reset=False, multiview_step=True)[source]¶
Performs like fit, but will not overwrite previously fitted single views and instead uses them along with the new data. Useful if the data needs to be processed in batches.
Parameters: Xs : list of array-likes or numpy.ndarray
 Xs length: n_views
 Xs[i] shape: (n_samples, n_features_i)
The data to fit to. Each view will receive its own embedding.
reset : boolean (default = False)
If True, overwrites all prior computations.
multiview_step : boolean, (default = True)
If True, performs the joint SVD step on the results from individual views. Must be set to True in the final call.
Returns: self : returns an instance of self.

transform(Xs, view_idx=None)[source]¶
Embeds data matrices using the fitted projection matrices. May be used for out-of-sample embeddings.
Parameters: Xs : list of array-likes or numpy.ndarray
 Xs length: n_views
 Xs[i] shape: (n_samples, n_features_i)
A list of data matrices from each view to transform based on the prior fit function. If view_idx is defined, then Xs is a 2D data matrix corresponding to a single view.
view_idx : int, default=None
For transformation of a single view. If not None, then Xs is 2D and view_idx specifies the index of the view from which Xs comes.
Returns: Xs_transformed : list of array-likes or array-like
Same shape as Xs

fit_transform(Xs, y=None)¶
Fit an embedder to the data and transform the data.
Parameters: Xs : list of array-likes or numpy.ndarray
 Xs length: n_views
 Xs[i] shape: (n_samples, n_features_i)
y : array, shape (n_samples,), optional
Targets to be used if fitting the algorithm is supervised.
Returns: X_transformed : list of array-likes
 X_transformed length: n_views
 X_transformed[i] shape: (n_samples, n_components_i)

Kernel Canonical Correlation Analysis¶

class mvlearn.embed.KCCA(n_components=2, ktype='linear', constant=0.1, sigma=1.0, degree=2.0, reg=0.1, decomp='full', method='kettenring-like', mrank=2, precision=1e-06)[source]¶
Kernel canonical correlation analysis (KCCA) is a method that generalizes the classical linear canonical correlation analysis (CCA) to the nonlinear setting. It allows us to depict the nonlinear relations between two sets of variables and enables applications of classical multivariate data analysis originally constrained to linear relations (CCA).
If the linear kernel is used, this is equivalent to CCA.
Parameters: n_components : int, default = 2
Number of canonical dimensions to keep.
ktype : string, default = 'linear'
Type of kernel. If 'linear', KCCA is equivalent to CCA. Value can be 'linear', 'gaussian' or 'poly'.
constant : float, default = 1.0
Balances the impact of lower-degree terms in the polynomial kernel.
sigma : float, default = 1.0
Standard deviation of the Gaussian kernel.
degree : float, default = 2.0
Degree of the polynomial kernel.
reg : float, default = 0.1
Regularization parameter.
decomp : string, default = 'full'
Decomposition type. Incomplete Cholesky Decomposition (ICD) can reduce computation times and storage. Value can be 'full' or 'icd'.
method : string, default = 'kettenring-like'
Decomposition method. Value can only be 'kettenring-like'.
mrank : int, default = 2
The rank of the ICD-approximated kernel matrix.
precision : float, default = 1e-06
Precision of computing the ICD kernel matrix.
Attributes
weights_ : list of array-likes
Canonical weights for each view.
Notes
This class implements kernel canonical correlation analysis as described in [2] and [3].
Traditional CCA aims to find useful projections of the high dimensional variable sets onto the compact linear representations, the canonical components (components_).
Each resulting canonical variate is computed from the weighted sum of every original variable indicated by the canonical weights (weights_).
The canonical correlation quantifies the linear correspondence between the two views of data based on Pearson’s correlation between their canonical components.
Canonical correlation can be seen as a metric of successful joint information reduction between two views and, therefore, routinely serves as a performance measure for CCA.
CCA may not extract useful descriptors of the data because of its linearity. KCCA offers an alternative solution by first projecting the data onto a higher dimensional feature space,
\[\phi: \mathbf{x} = (x_1,...,x_m) \mapsto \phi(\mathbf{x}) = (\phi(x_1),...,\phi(x_N)), (m < N)\]before performing CCA in the new feature space.
Kernels are methods of implicitly mapping data into a higher dimensional feature space, a technique known as the kernel trick. A kernel is a function K such that for all \(\mathbf{x}, \mathbf{z} \in X\),
\[K(\mathbf{x}, \mathbf{z}) = \langle\phi(\mathbf{x}) \cdot \phi(\mathbf{z})\rangle,\]where \(\phi\) is a mapping from X to feature space F.
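The three ktype options above correspond to kernel functions along these lines. This is a sketch for illustration; `make_kernel` is a name invented here, and mvlearn's internals may differ in centering and scaling details:

```python
import numpy as np

def make_kernel(X, Y, ktype="linear", constant=0.1, sigma=1.0, degree=2.0):
    """Gram matrix K[i, j] = K(x_i, y_j) for the three ktype options."""
    if ktype == "linear":
        return X @ Y.T
    if ktype == "poly":
        # Polynomial kernel (x.y + c)^d; `constant` balances lower-degree terms.
        return (X @ Y.T + constant) ** degree
    if ktype == "gaussian":
        # Gaussian (RBF) kernel with standard deviation `sigma`.
        sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2 * sigma ** 2))
    raise ValueError(ktype)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
K = make_kernel(X, X, ktype="gaussian")
print(K.shape, np.allclose(np.diag(K), 1.0))  # (5, 5) True
```

Note that the Gaussian kernel of a point with itself is always 1, which the diagonal check above confirms.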
The directions \(\mathbf{w_x}\) and \(\mathbf{w_y}\) (of length N) can be rewritten as the projection of the data onto the direction \(\alpha\) and \(\beta\) (of length m):
\[\mathbf{w_x} = X'\alpha\]\[\mathbf{w_y} = Y'\beta\]Letting \(K_x = XX'\) and \(K_y = YY'\) be the kernel matrices and adding a regularization term (\(\kappa\)) to prevent overfitting, we are effectively solving for:
\[\rho = \underset{\alpha,\beta}{\text{max}} \frac{\alpha'K_xK_y\beta} {\sqrt{(\alpha'K_x^2\alpha+\kappa\alpha'K_x\alpha) \cdot (\beta'K_y^2\beta + \kappa\beta'K_y\beta)}}\]Kernel matrices grow quadratically with the size of the data: they not only have to store \(n^2\) elements, but also face the complexity of matrix eigenvalue problems. In a Cholesky decomposition a positive definite matrix A is decomposed to a lower triangular matrix \(L\): \(A = LL'\).
The Incomplete Cholesky Decomposition (ICD) looks for a low rank approximation of \(L\) to reduce the cost of operations on the matrix, such that \(A \approx \tilde{L}\tilde{L}'\). The algorithm skips a column if its diagonal element is small. The diagonal elements to the right of the column being updated are also updated. To select a column to update, it finds the largest diagonal element and pivots that element to the current diagonal by exchanging the corresponding rows and columns. The algorithm ends when all diagonal elements are below a specified accuracy.
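The pivoting procedure just described can be sketched in NumPy, using the mrank and precision parameters from above. This is a simplified illustration (`icd` is a name invented here), not mvlearn's exact code:

```python
import numpy as np

def icd(A, mrank, precision=1e-6):
    """Pivoted incomplete Cholesky: A ~= L @ L.T with at most mrank columns."""
    n = A.shape[0]
    d = np.diag(A).astype(float).copy()   # residual diagonal elements
    L = np.zeros((n, mrank))
    for k in range(mrank):
        j = int(np.argmax(d))             # pivot on the largest diagonal
        if d[j] < precision:              # stop when diagonals fall below tolerance
            return L[:, :k]
        # Column update: subtract contributions of previously kept columns.
        L[:, k] = (A[:, j] - L @ L[j, :]) / np.sqrt(d[j])
        d -= L[:, k] ** 2                 # update the residual diagonal
    return L

rng = np.random.default_rng(0)
B = rng.normal(size=(6, 2))
A = B @ B.T                               # PSD matrix of rank 2
L = icd(A, mrank=2)
print(np.allclose(A, L @ L.T))  # True
```

For a positive semi-definite matrix of rank at most mrank, the approximation is exact, which is what the final check verifies.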
ICD with rank \(m\) yields storage requirements of \(O(mn)\) instead of \(O(n^2)\), and the complexity becomes \(O(nm^2)\) instead of \(O(n^3)\) [4]. Unlike the full decomposition, ICD cannot be performed out of sample, i.e. you must fit and transform on the same data.
References
[2] D. R. Hardoon, S. Szedmak and J. Shawe-Taylor, "Canonical Correlation Analysis: An Overview with Application to Learning Methods", Neural Computation, Volume 16 (12), Pages 2639-2664, 2004.
[3] J. R. Kettenring, "Canonical analysis of several sets of variables," Biometrika, vol. 58, no. 3, pp. 433-451, 1971.
[4] M. I. Jordan, "Regularizing KCCA, Cholesky Decomposition", Lecture 9 Notes: CS281B/Stat241B, University of California, Berkeley.
Examples
>>> import numpy as np
>>> from mvlearn.embed.kcca import KCCA
>>> np.random.seed(1)
>>> # Define two latent variables
>>> N = 100
>>> latvar1 = np.random.randn(N,)
>>> latvar2 = np.random.randn(N,)
>>> # Define independent components for each dataset
>>> indep1 = np.random.randn(N, 3)
>>> indep2 = np.random.randn(N, 4)
>>> x = 0.25*indep1 + 0.75*np.vstack((latvar1, latvar2, latvar1)).T
>>> y = 0.25*indep2 + 0.75*np.vstack((latvar1, latvar2,
...                                   latvar1, latvar2)).T
>>> Xs = [x, y]
>>> Xs_train = [Xs[0][:80], Xs[1][:80]]
>>> Xs_test = [Xs[0][80:], Xs[1][80:]]
>>> kcca = KCCA(ktype="linear", n_components=3, reg=0.01)
>>> kcca = kcca.fit(Xs_train)
>>> linear_transform = kcca.transform(Xs_test)
>>> stats = kcca.get_stats()
>>> # Print the correlations of the first 3 transformed variates
>>> # from the testing data
>>> print(stats['r'])
[0.85363047 0.91171037 0.06029391]

fit(Xs, y=None)[source]¶
Creates the KCCA mapping by determining canonical weights from Xs.
Parameters: Xs : list of array-likes or numpy.ndarray
 Xs length: n_views
 Xs[i] shape: (n_samples, n_features_i)
The data for KCCA to fit to. Each view will receive its own embedding.
y : ignored
Included for API compliance.
Returns: self : returns an instance of self

transform(Xs)[source]¶
Uses the KCCA weights to transform Xs into canonical components and calculates the correlations.
Parameters: Xs : list of array-likes or numpy.ndarray
 Xs length: 2
 Xs[i] shape: (n_samples, n_features_i)
The data to transform based on the prior fit. Each view will receive its own embedding.
Returns: components_ : returns Xs_transformed, a list of numpy.ndarray
 Xs length: 2
 Xs[i] shape: (n_samples, n_samples)

get_stats()[source]¶
Compute relevant statistics for the KCCA model after fitting and transforming.
Implementations of the statistics generally follow the code in the Matlab implementation of the function canoncorr.
Note: most statistics are only available if the linear kernel is used and the decomposition method is full, i.e. self.ktype == 'linear' and self.decomp == 'full'.
Returns: stats : dict
Dict containing the statistics, with the following keys:
 'r' : numpy.ndarray of shape (n_components,)
 Canonical correlations of each component.
 'Wilks' : numpy.ndarray of shape (n_components,)
 Wilks' Lambda likelihood ratio statistic. Only available if self.ktype == 'linear'.
 'df1' : numpy.ndarray of shape (n_components,)
 Degrees of freedom for the chi-squared statistic, and the numerator degrees of freedom for the F statistic. Only available if self.ktype == 'linear'.
 'df2' : numpy.ndarray of shape (n_components,)
 Denominator degrees of freedom for the F statistic. Only available if self.ktype == 'linear'.
 'F' : numpy.ndarray of shape (n_components,)
 Rao's approximate F statistic for H_0(k). Only available if self.ktype == 'linear'.
 'pF' : numpy.ndarray of shape (n_components,)
 Right-tail significance level for stats['F']. Only available if self.ktype == 'linear'.
 'chisq' : numpy.ndarray of shape (n_components,)
 Bartlett's approximate chi-squared statistic for H_0(k) with Lawley's modification. Only available if self.ktype == 'linear'.
 'pChisq' : numpy.ndarray of shape (n_components,)
 Right-tail significance level for stats['chisq']. Only available if self.ktype == 'linear'.

fit_transform(Xs, y=None)¶
Fit an embedder to the data and transform the data.
Parameters: Xs : list of array-likes or numpy.ndarray
 Xs length: n_views
 Xs[i] shape: (n_samples, n_features_i)
y : array, shape (n_samples,), optional
Targets to be used if fitting the algorithm is supervised.
Returns: X_transformed : list of array-likes
 X_transformed length: n_views
 X_transformed[i] shape: (n_samples, n_components_i)

Deep Canonical Correlation Analysis¶

class mvlearn.embed.DCCA(input_size1=None, input_size2=None, n_components=2, layer_sizes1=None, layer_sizes2=None, use_all_singular_values=False, device=device(type='cpu'), epoch_num=200, batch_size=800, learning_rate=0.001, reg_par=1e-05, tolerance=0.001, print_train_log_info=False)[source]¶
An implementation of Deep Canonical Correlation Analysis [5] with PyTorch. It computes projections into a common subspace in order to maximize the correlation between pairwise projections into the subspace from two views of data. To obtain these projections, two fully connected deep networks are trained to initially transform the two views of data. Then, the transformed data is projected using linear CCA. This can be thought of as training a kernel for each view that initially acts on the data before projection. The networks are trained to maximize the ability of the linear CCA to maximize the correlation between the final dimensions.
Parameters: input_size1 : int (positive)
The dimensionality of the input vectors in view 1.
input_size2 : int (positive)
The dimensionality of the input vectors in view 2.
n_components : int (positive), default=2
The output dimensionality of the correlated projections. The deep network will transform the data to this size. Must satisfy: n_components <= max(layer_sizes1[-1], layer_sizes2[-1]).
layer_sizes1 : list of ints, default=None
The sizes of the layers of the deep network applied to view 1 before CCA. For example, if the input dimensionality is 256, and there is one hidden layer with 1024 units and the output dimensionality is 100 before applying CCA, layer_sizes1=[1024, 100]. If None, set to [1000, self.n_components_].
layer_sizes2 : list of ints, default=None
The sizes of the layers of the deep network applied to view 2 before CCA. Does not need to have the same hidden layer architecture as layer_sizes1, but the final dimensionality must be the same. If None, set to [1000, self.n_components_].
use_all_singular_values : boolean (default=False)
Whether or not to use all the singular values in the CCA computation to calculate the loss. If False, only the top n_components singular values are used.
device : string, default='cpu'
The torch device for processing. Can be used with a GPU if available.
epoch_num : int (positive), default=200
The maximum number of epochs to train the deep networks.
batch_size : int (positive), default=800
Batch size for training the deep networks.
learning_rate : float (positive), default=1e-3
Learning rate for training the deep networks.
reg_par : float (positive), default=1e-5
Weight decay parameter used in the RMSprop optimizer.
tolerance : float (positive), default=1e-2
Threshold difference between successive iteration losses to define convergence and stop training.
print_train_log_info : boolean, default=False
If True, the training loss at each epoch will be printed to the console when DCCA.fit() is called.
Attributes
input_size1_ : int (positive)
The dimensionality of the input vectors in view 1.
input_size2_ : int (positive)
The dimensionality of the input vectors in view 2.
n_components_ : int (positive)
The output dimensionality of the correlated projections. The deep network will transform the data to this size. If not specified, will be set to 2.
layer_sizes1_ : list of ints
The sizes of the layers of the deep network applied to view 1 before CCA. For example, if the input dimensionality is 256, and there is one hidden layer with 1024 units and the output dimensionality is 100 before applying CCA, layer_sizes1=[1024, 100].
layer_sizes2_ : list of ints
The sizes of the layers of the deep network applied to view 2 before CCA. Does not need to have the same hidden layer architecture as layer_sizes1, but the final dimensionality must be the same.
device_ : string
The torch device for processing.
batch_size_ : int (positive)
Batch size for training the deep networks.
learning_rate_ : float (positive)
Learning rate for training the deep networks.
reg_par_ : float (positive)
Weight decay parameter used in the RMSprop optimizer.
deep_model_ : DeepPairedNetworks object
2-view Deep CCA object used to transform 2 views of data together.
linear_cca_ : linear_cca object
Linear CCA object used to project final transformations from the output of deep_model to the n_components.
model_ : torch.nn.DataParallel object
Wrapper around deep_model to allow parallelisation.
loss_ : cca_loss object
Loss function for deep_model. Defined as the negative correlation between outputs of transformed views.
optimizer_ : torch.optim.RMSprop object
Optimizer used to train the networks.
Warns: In order to run DCCA, pytorch and certain other optional dependencies must be installed. See the installation page for details.
Notes
Deep Canonical Correlation Analysis is a method of finding highly correlated subspaces for 2 views of data using nonlinear transformations learned by deep networks. It can be thought of as using deep networks to learn the best potentially nonlinear kernels for a variant of kernel CCA.
The networks used for each view in DCCA consist of fully connected linear layers with a sigmoid activation function.
The DCCA problem is formulated in [5]. Consider two views \(X_1\) and \(X_2\). DCCA seeks to find the parameters for each view, \(\Theta_1\) and \(\Theta_2\), such that they maximize
\[\text{corr}\left(f_1\left(X_1;\Theta_1\right), f_2\left(X_2;\Theta_2\right)\right)\]These parameters are estimated in the deep network by following gradient descent on the input data. Take \(H_1, H_2 \in R^{o \times m}\) to be the outputs of the deep network in each column for the input data of size \(m\). Take the centered matrices \(\bar{H}_1 = H_1 - \frac{1}{m}H_1 \mathbf{1}\) and \(\bar{H}_2 = H_2 - \frac{1}{m}H_2 \mathbf{1}\). Then, define
\[\begin{split}\begin{align*} \hat{\Sigma}_{12} &= \frac{1}{m-1}\bar{H}_1\bar{H}_2^T \\ \hat{\Sigma}_{11} &= \frac{1}{m-1}\bar{H}_1\bar{H}_1^T + r_1I \\ \hat{\Sigma}_{22} &= \frac{1}{m-1}\bar{H}_2\bar{H}_2^T + r_2I \end{align*}\end{split}\]where \(r_1\) and \(r_2\) are regularization constants \(>0\) so the matrices are guaranteed to be positive definite.
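These covariance estimates and the resulting total-correlation objective can be sketched in NumPy. This illustrates only the math, not mvlearn's PyTorch implementation; `dcca_correlation` and the regularization values are invented here, and the sketch sums all singular values (the use_all_singular_values=True case):

```python
import numpy as np

def dcca_correlation(H1, H2, r1=1e-4, r2=1e-4):
    """Total correlation objective from the regularized covariance estimates.

    H1, H2 are (o x m) network outputs; training minimizes the negative
    of this value.
    """
    o, m = H1.shape
    H1b = H1 - H1.mean(axis=1, keepdims=True)   # center each output dimension
    H2b = H2 - H2.mean(axis=1, keepdims=True)
    S12 = H1b @ H2b.T / (m - 1)
    S11 = H1b @ H1b.T / (m - 1) + r1 * np.eye(o)
    S22 = H2b @ H2b.T / (m - 1) + r2 * np.eye(o)
    def inv_sqrt(S):
        # Inverse matrix square root via eigendecomposition.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T
    T = inv_sqrt(S11) @ S12 @ inv_sqrt(S22)
    # Sum of the singular values of T.
    return np.linalg.svd(T, compute_uv=False).sum()

rng = np.random.default_rng(0)
H = rng.normal(size=(4, 100))
corr = dcca_correlation(H, H)   # identical views: near-maximal correlation (~4)
```

Feeding the same matrix as both views drives every canonical correlation close to 1, so the objective approaches the output dimension \(o\), reduced only slightly by the regularization.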
The correlation objective function is the sum of the top \(k\) singular values of the matrix \(T\), where
\[T = \hat{\Sigma}_{11}^{-1/2}\hat{\Sigma}_{12}\hat{\Sigma}_{22}^{-1/2}\]which is the trace norm of \(T\). Thus, the loss is
\[L(X_1, X_2) = -\text{corr}\left(H_1, H_2\right) = -\text{tr}(T^TT)^{1/2}.\]References
[5] Andrew, G., Arora, R., Bilmes, J., & Livescu, K. (2013, February). Deep canonical correlation analysis. In International Conference on Machine Learning (pp. 1247-1255).
Examples
>>> from mvlearn.embed import DCCA
>>> import numpy as np
>>> # Exponential data as example of finding good correlation
>>> view1 = np.random.normal(loc=2, size=(1000, 75))
>>> view2 = np.exp(view1)
>>> view1_test = np.random.normal(loc=2, size=(200, 75))
>>> view2_test = np.exp(view1_test)
>>> input_size1, input_size2 = 75, 75
>>> n_components = 2
>>> layer_sizes1 = [1024, 4]
>>> layer_sizes2 = [1024, 4]
>>> dcca = DCCA(input_size1, input_size2, n_components, layer_sizes1,
...             layer_sizes2)
>>> dcca = dcca.fit([view1, view2])
>>> outputs = dcca.transform([view1_test, view2_test])
>>> print(outputs[0].shape)
(200, 2)

fit(Xs, y=None)[source]¶
Fits the deep networks for each view such that the output of the linear CCA has maximum correlation.
Parameters: Xs : list of array-likes or numpy.ndarray
 Xs length: n_views
 Xs[i] shape: (n_samples, n_features_i)
The data to fit to. Each view will receive its own embedding.
y : ignored
Included for API compliance.
Returns: self : returns an instance of self.

transform(Xs, return_loss=False)[source]¶
Embeds data matrices using the trained deep networks and fitted CCA projection matrices. May be used for out-of-sample embeddings.
Parameters: Xs : list of array-likes or numpy.ndarray
 Xs length: n_views
 Xs[i] shape: (n_samples, n_features_i)
A list of data matrices from each view to transform based on the prior fit function.
Returns: Xs_transformed : list of array-likes or array-like
Transformed samples. Same structure as Xs, but potentially different n_features_i.
loss : float
Average loss over the data, defined as the negative correlation of the transformed views. Only returned if return_loss=True.

fit_transform(Xs, y=None)¶
Fit an embedder to the data and transform the data.
Parameters: Xs : list of array-likes or numpy.ndarray
 Xs length: n_views
 Xs[i] shape: (n_samples, n_features_i)
y : array, shape (n_samples,), optional
Targets to be used if fitting the algorithm is supervised.
Returns: X_transformed : list of array-likes
 X_transformed length: n_views
 X_transformed[i] shape: (n_samples, n_components_i)

Omnibus Embedding¶

class mvlearn.embed.Omnibus(n_components=2, distance_metric='euclidean', normalize='l1', algorithm='randomized', n_iter=5)[source]¶
Omnibus computes the pairwise distances for each view. Each of these matrices is an n x n dissimilarity matrix, where n is the number of rows in each view. Omnibus embedding [6] is then performed over the dissimilarity matrices, and the computed embeddings are returned.
Parameters: n_components : strictly positive int (default = 2)
Desired dimensionality of output embeddings. See graspy docs for additional details.
distance_metric : string (default = 'euclidean')
Distance metric used to compute pairwise distances. Metrics must be found in sklearn.neighbors.DistanceMetric.
normalize : string or None (default = 'l1')
Normalize function to use on views before computing pairwise distances. Must be 'l2', 'l1', 'max' or None. If None, the distance matrices will not be normalized.
algorithm : string (default = 'randomized')
SVD solver to use. Must be 'full', 'randomized', or 'truncated'. See graspy docs for details.
n_iter : positive int (default = 5)
Number of iterations for randomized SVD solver. See graspy docs for details.
Attributes
embeddings_ : list of arrays (default = None)
List of Omnibus embeddings. One embedding matrix is provided per view. If fit() has not been called, embeddings_ is set to None.
Notes
From an implementation perspective, omnibus embedding is performed using the GrasPy package's implementation graspy.embed.OmnibusEmbed for dissimilarity matrices.
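A minimal sketch of this pipeline in NumPy, assuming the standard omnibus construction (per-view dissimilarity blocks on the diagonal, pairwise averages off the diagonal) and a dense spectral embedding in place of graspy's solvers; `omnibus_sketch` is a name invented here, not graspy's or mvlearn's API:

```python
import numpy as np

def omnibus_sketch(Xs, n_components=2):
    """Pairwise-distance omnibus embedding, roughly as described above."""
    n = Xs[0].shape[0]
    # n x n Euclidean dissimilarity matrix for each view.
    Ds = []
    for X in Xs:
        sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        Ds.append(np.sqrt(sq))
    # Omnibus matrix: view blocks on the diagonal, averages off-diagonal.
    m = len(Ds)
    M = np.block([[Ds[i] if i == j else (Ds[i] + Ds[j]) / 2
                   for j in range(m)] for i in range(m)])
    # Spectral embedding: scale top eigenvectors by sqrt(|eigenvalue|).
    w, V = np.linalg.eigh(M)
    top = np.argsort(-np.abs(w))[:n_components]
    Z = V[:, top] * np.sqrt(np.abs(w[top]))
    # One (n, n_components) embedding per view.
    return [Z[i * n:(i + 1) * n] for i in range(m)]

rng = np.random.default_rng(0)
embeddings = omnibus_sketch([rng.random((30, 5)), rng.random((30, 8))])
print([E.shape for E in embeddings])  # [(30, 2), (30, 2)]
```

Because all views are embedded from one joint matrix, the per-view embeddings land in a shared space and can be compared directly.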
References
[6] https://graspy.neurodata.io/tutorials/embedding/omnibus
Examples
>>> from mvlearn.embed import omnibus
>>> import numpy as np
>>> # Create 2 random data views with feature sizes 50 and 100
>>> view1 = np.random.rand(1000, 50)
>>> view2 = np.random.rand(1000, 100)
>>> embedder = omnibus.Omnibus(n_components=3)
>>> embeddings = embedder.fit_transform([view1, view2])
>>> view1_hat, view2_hat = embeddings
>>> print(view1_hat.shape, view2_hat.shape)
(1000, 3) (1000, 3)

fit(Xs, y=None)[source]¶
Fit the model with Xs and apply the embedding on Xs. The embeddings are saved as a class attribute.
Parameters: Xs : list of array-likes or numpy.ndarray
 Xs length: n_views
 Xs[i] shape: (n_samples, n_features_i)
The data to embed based on the prior fit function. Each X in Xs will receive its own embedding.
y : ignored
Included for API compliance.

fit_transform(Xs, y=None)[source]¶
Fit the model with Xs and apply the embedding on Xs using the fit() function. The resulting embeddings are returned.
Parameters: Xs : list of array-likes or numpy.ndarray
 Xs length: n_views
 Xs[i] shape: (n_samples, n_features_i)
The data to embed based on the prior fit function. Each X in Xs will receive its own embedding.
y : ignored
Included for API compliance.
Returns: embeddings : list of arrays
list of (n_samples, n_components) matrices for each X in Xs.

Multiview Multidimensional Scaling¶

class mvlearn.embed.MVMDS(n_components=2, num_iter=15, dissimilarity='euclidean')[source]¶
An implementation of Classical Multiview Multidimensional Scaling for jointly reducing the dimensions of multiple views of data [7]. A Euclidean distance matrix is created for each view, double centered, and the k largest common eigenvectors between the matrices are found based on the stepwise estimation of common principal components. Using these common principal components, the views are jointly reduced and a single view of k dimensions is returned.
MVMDS is often a better alternative to PCA for multiview data. See the tutorials in the documentation.
Parameters: n_components : int (positive), default=2
Represents the number of components that the user would like to be returned from the algorithm. This value must be greater than 0 and less than the number of samples within each view.
Represents the number of components that the user would like to be returned from the algorithm. This value must be greater than 0 and less than the number of samples within each view.
num_iter: int (positive), default=15
Number of iterations stepwise estimation goes through.
dissimilarity : {'euclidean', 'precomputed'}, default='euclidean'
Dissimilarity measure to use:
'euclidean': Pairwise Euclidean distances between points in the dataset.
'precomputed': Xs is treated as precomputed dissimilarity matrices.
Attributes
components_ : numpy.ndarray, shape (n_samples, n_components)
Joint transformed MVMDS components of the input views.
Notes
Classical Multiview Multidimensional Scaling can be broken down into two steps. The first step involves calculating the Euclidean Distance matrices, \(Z_i\), for each of the \(k\) views and doublecentering these matrices through the following calculations:
\[\Sigma_{i}=-\frac{1}{2}J_iZ_iJ_i\]\[\text{where }J_i=I_i-{\frac {1}{n}}\mathbb{1}\mathbb{1}^T\]The second step involves finding the common principal components of the \(\Sigma\) matrices. These can be thought of as multiview generalizations of the principal components found in principal component analysis (PCA) given several covariance matrices. The central hypothesis of the common principal component model states that given k normal populations (views), their \(p\) x \(p\) covariance matrices \(\Sigma_{i}\), for \(i = 1,2,...,k\), are simultaneously diagonalizable as:
\[\Sigma_{i} = QD_i^2Q^T\]where \(Q\) is the common \(p\) x \(p\) orthogonal matrix and \(D_i^2\) are positive \(p\) x \(p\) diagonal matrices. The \(Q\) matrix contains all the common principal components. The common principal component, \(q_j\), is found by solving the minimization problem:
\[\text{Minimize} \sum_{i=1}^{k}n_i\log(q_j^TS_iq_j)\]\[\text{Subject to } q_j^Tq_j = 1\]where \(n_i\) represent the degrees of freedom and \(S_i\) represent sample covariance matrices.
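The double-centering computation in step one can be sketched in NumPy. This is a simplified illustration (`double_center` is a name invented here); following the classical MDS convention, squared Euclidean distances are centered, which is an assumption about the exact convention mvlearn uses:

```python
import numpy as np

def double_center(X):
    """Step one of MVMDS for a single view, per the formulas above."""
    n = X.shape[0]
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # Z_i (squared dists)
    J = np.eye(n) - np.ones((n, n)) / n                  # centering matrix J_i
    return -0.5 * J @ sq @ J                             # Sigma_i

rng = np.random.default_rng(0)
S = double_center(rng.normal(size=(10, 4)))
# Double centering makes every row and column sum to (numerically) zero.
print(np.allclose(S.sum(axis=0), 0), np.allclose(S.sum(axis=1), 0))  # True True
```

The zero row and column sums are the defining property of a double-centered matrix, which is what makes its eigenvectors usable as MDS coordinates.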
This class does not support MVMDS.transform() due to the iterative nature of the algorithm and the fact that the transformation is done during iterative fitting. Use MVMDS.fit_transform() to do both fitting and transforming at once.
References
[7] Trendafilov, Nickolay T. "Stepwise Estimation of Common Principal Components." Computational Statistics & Data Analysis, vol. 54, no. 12, 2010, pp. 3446-3457, doi:10.1016/j.csda.2010.03.010.
[8] Samir Kanaan-Izquierdo, Andrey Ziyatdinov, Maria Araceli Burgueño, Alexandre Perera-Lluna, Multiview: a software package for multiview pattern recognition methods, Bioinformatics, Volume 35, Issue 16, 15 August 2019, Pages 2877-2879.
Examples
>>> from mvlearn.embed import MVMDS
>>> from mvlearn.datasets import load_UCImultifeature
>>> Xs, _ = load_UCImultifeature()
>>> print(len(Xs))  # number of views
6
>>> print(Xs[0].shape)  # shape of the first view
(2000, 76)
>>> mvmds = MVMDS(n_components=5)
>>> Xs_reduced = mvmds.fit_transform(Xs)
>>> print(Xs_reduced.shape)
(2000, 5)

fit(Xs, y=None)[source]¶
Calculates dimensionally reduced components by inputting the Euclidean distances of each view, double centering them, and using the _commonpcs function to find common components between views. Works similarly to traditional, single-view Multidimensional Scaling.
Parameters: Xs : list of array-likes or numpy.ndarray
 Xs length: n_views
 Xs[i] shape: (n_samples, n_features_i)
y : ignored
Included for API compliance.

fit_transform(Xs, y=None)[source]¶
Embeds data matrices using fitted projection matrices.
Parameters: Xs : list of array-likes or numpy.ndarray
 Xs length: n_views
 Xs[i] shape: (n_samples, n_features_i)
The data to embed based on the fit function.
y : ignored
Included for API compliance.
Returns: X_transformed: numpy.ndarray, shape(n_samples, n_components)
Joint transformed MVMDS components of the input views.

Split Autoencoder¶

class mvlearn.embed.SplitAE(hidden_size=64, num_hidden_layers=2, embed_size=20, training_epochs=10, batch_size=16, learning_rate=0.001, print_info=False, print_graph=True)[source]¶
Implements an autoencoder that creates an embedding of a view View1 and from that embedding reconstructs View1 and another view View2, as described in [9].
Parameters: hidden_size : int (default=64)
number of nodes in the hidden layers
num_hidden_layers : int (default=2)
number of hidden layers in each encoder or decoder net
embed_size : int (default=20)
size of the bottleneck vector in the autoencoder
training_epochs : int (default=10)
how many times the network trains on the full dataset
batch_size : int (default=16):
batch size while training the network
learning_rate : float (default=0.001)
learning rate of the Adam optimizer
print_info : bool (default=False)
whether or not to print errors as the network trains.
print_graph : bool (default=True)
whether or not to graph training loss
Attributes
view1_encoder_ (torch.nn.Module) the View1 embedding network as a PyTorch module
view1_decoder_ (torch.nn.Module) the View1 decoding network as a PyTorch module
view2_decoder_ (torch.nn.Module) the View2 decoding network as a PyTorch module
Warns: In order to run SplitAE, pytorch and certain other optional dependencies must be installed. See the installation page for details.
Notes
In the accompanying figure (not reproduced here), \(\textbf{x}\) is View1 and \(\textbf{y}\) is View2.
Each encoder / decoder network is a fully connected neural net with parameter count equal to:
\[\left(\text{input_size} + \text{embed_size}\right) \cdot \text{hidden_size} + \sum_{1}^{\text{num_hidden_layers}-1}\text{hidden_size}^2\]
where \(\text{input_size}\) is the number of features in View1 or View2.
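As a quick sanity check of this count, the formula can be evaluated directly. The sketch below counts weight entries only (bias terms are ignored, as in the formula), and the sizes are hypothetical:

```python
# Weight count for one encoder/decoder net; example sizes are hypothetical.
input_size, hidden_size, num_hidden_layers, embed_size = 256, 64, 2, 20

# (input_size + embed_size) * hidden_size covers the first and last layers;
# each of the remaining (num_hidden_layers - 1) layers is hidden_size x hidden_size.
n_weights = (input_size + embed_size) * hidden_size \
    + (num_hidden_layers - 1) * hidden_size ** 2
print(n_weights)  # 21760
```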
The loss that is reduced via gradient descent is:
\[J = \left(p(f(\textbf{x})) - \textbf{x}\right)^2 + \left(q(f(\textbf{x})) - \textbf{y}\right)^2\]
where \(f\) is the encoder, \(p\) and \(q\) are the decoders, \(\textbf{x}\) is View1, and \(\textbf{y}\) is View2.
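The loss can be illustrated with linear stand-ins for the networks. Here W_f, W_p, and W_q are hypothetical linear maps playing the roles of \(f\), \(p\), and \(q\); the real SplitAE networks are nonlinear multilayer perceptrons:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 8))   # View1: 100 samples, 8 features
y = rng.normal(size=(100, 5))   # View2: 100 samples, 5 features

# Hypothetical linear stand-ins for the encoder f and decoders p, q
W_f = rng.normal(size=(8, 3))   # encoder: View1 -> 3-dim embedding
W_p = rng.normal(size=(3, 8))   # decoder p: embedding -> View1 reconstruction
W_q = rng.normal(size=(3, 5))   # decoder q: embedding -> View2 prediction

z = x @ W_f                     # f(x): the shared embedding
# J = (p(f(x)) - x)^2 + (q(f(x)) - y)^2, summed over samples and features
J = np.sum((z @ W_p - x) ** 2) + np.sum((z @ W_q - y) ** 2)
print(J >= 0)  # True: the loss is a sum of squares
```

Gradient descent on the network weights reduces J, trading off reconstruction of View1 against prediction of View2 from the shared embedding.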
References
[9] Weiran Wang, Raman Arora, Karen Livescu, and Jeff Bilmes. "On Deep Multi-View Representation Learning." ICML, 2015.
For more extensive examples, see the tutorials for SplitAE in this documentation.
fit
(Xs, validation_Xs=None, y=None)[source]¶ Given two views, create and train the autoencoder.
Parameters: Xs : list of array-likes or numpy.ndarray
 Xs[0] is View1 and Xs[1] is View2
 Xs length: n_views; only 2 views are currently supported for SplitAE
 Xs[i] shape: (n_samples, n_features_i)
validation_Xs : list of array-likes or numpy.ndarray
Optional validation data in the same shape as Xs. If print_info=True, then validation error, calculated with this data, will be printed as the network trains.
y : ignored
Included for API compliance.

transform
(Xs)[source]¶ Transform the given view with the trained autoencoder. Provide a single view within a list.
Parameters: Xs : a list of exactly one array-like, or an np.ndarray
Represents the View1 of some data. The array must have the same number of columns (features) as the View1 presented in the fit(...) step.
 Xs length: 1
 Xs[0] shape: (n_samples, n_features_0)
Returns: embedding : np.ndarray of shape (n_samples, embedding_size)
the embedding of the View1 data
view1_reconstructions : np.ndarray of shape (n_samples, n_features_0)
the reconstructed View1
view2_prediction : np.ndarray of shape (n_samples, n_features_1)
the predicted View2

DCCA Utilities¶

class
mvlearn.embed.
linear_cca
[source]¶ Implementation of linear CCA to act on the output of the deep networks in DCCA.
Consider two views \(X_1\) and \(X_2\). Canonical Correlation Analysis seeks to find vectors \(a_1\) and \(a_2\) to maximize the correlation between \(X_1 a_1\) and \(X_2 a_2\).
Attributes
w_ (list (length=2)) w_[i] : ndarray. List of the two weight matrices for projecting each view.
m_ (list (length=2)) m_[i] : ndarray. List of the means of the data in each view.
fit
(H1, H2, n_components)[source]¶ Fit the linear CCA model to the outputs of the deep network transformations on the two views of data.
Parameters: H1: ndarray, shape (n_samples, n_features)
View 1 data after deep network.
H2: ndarray, shape (n_samples, n_features)
View 2 data after deep network.
n_components : int (positive)
The output dimensionality of the CCA transformation.
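What fit computes can be sketched with plain NumPy: whiten each view's covariance, then take an SVD of the whitened cross-covariance. This is a conceptual sketch of classical linear CCA under a small ridge regularization, not the library's code, and all names here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
shared = rng.normal(size=(n, 2))                     # latent signal present in both views
H1 = np.hstack([shared, rng.normal(size=(n, 3))])    # view 1: shared + noise features
H2 = np.hstack([shared @ rng.normal(size=(2, 2)), rng.normal(size=(n, 2))])

def linear_cca_sketch(H1, H2, n_components):
    H1 = H1 - H1.mean(axis=0)   # center each view
    H2 = H2 - H2.mean(axis=0)
    n = H1.shape[0]
    r = 1e-6                    # small ridge for numerical stability
    S11 = H1.T @ H1 / (n - 1) + r * np.eye(H1.shape[1])
    S22 = H2.T @ H2 / (n - 1) + r * np.eye(H2.shape[1])
    S12 = H1.T @ H2 / (n - 1)

    def inv_sqrt(S):
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    # SVD of the whitened cross-covariance T = S11^{-1/2} S12 S22^{-1/2}
    K1, K2 = inv_sqrt(S11), inv_sqrt(S22)
    U, s, Vt = np.linalg.svd(K1 @ S12 @ K2)
    A1 = K1 @ U[:, :n_components]       # projection weights for view 1
    A2 = K2 @ Vt.T[:, :n_components]    # projection weights for view 2
    return A1, A2, s[:n_components]     # s holds the canonical correlations

A1, A2, corrs = linear_cca_sketch(H1, H2, n_components=2)
print(corrs)  # should be close to 1 for the two shared dimensions
```

Projecting each view with its weights (H1 @ A1, H2 @ A2) yields maximally correlated coordinates, which is the role linear_cca plays on top of the DCCA network outputs.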


class
mvlearn.embed.
cca_loss
(n_components, use_all_singular_values, device)[source]¶ An implementation of the loss function of linear CCA as introduced in the original paper for DCCA [5]. Details of how this loss is computed can be found in the paper or in the documentation for DCCA.
Parameters: n_components : int (positive)
The output dimensionality of the CCA transformation.
use_all_singular_values : boolean
Whether or not to use all the singular values in the loss calculation. If False, only use the top n_components singular values.
device : torch.device object
The torch device being used in DCCA.
Attributes
n_components_ (int (positive)) The output dimensionality of the CCA transformation.
use_all_singular_values_ (boolean) Whether or not to use all the singular values in the loss calculation. If False, only use the top n_components singular values.
device_ (torch.device object) The torch device being used in DCCA.
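The quantity this loss measures can be sketched in NumPy: the canonical correlations are the singular values of the whitened cross-covariance \(T = \Sigma_{11}^{-1/2}\Sigma_{12}\Sigma_{22}^{-1/2}\), and the loss is their negated sum. This is a conceptual sketch, not the library's torch implementation, and the regularization constant is a hypothetical choice:

```python
import numpy as np

rng = np.random.default_rng(1)
H1 = rng.normal(size=(200, 4))  # view 1 network outputs
H2 = rng.normal(size=(200, 3))  # view 2 network outputs
n_components = 2

H1 = H1 - H1.mean(axis=0)
H2 = H2 - H2.mean(axis=0)
n = H1.shape[0]
r = 1e-4  # ridge added to the covariances, as DCCA-style losses typically do
S11 = H1.T @ H1 / (n - 1) + r * np.eye(H1.shape[1])
S22 = H2.T @ H2 / (n - 1) + r * np.eye(H2.shape[1])
S12 = H1.T @ H2 / (n - 1)

def inv_sqrt(S):
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

# Singular values of T are the canonical correlations of the two outputs
T = inv_sqrt(S11) @ S12 @ inv_sqrt(S22)
s = np.linalg.svd(T, compute_uv=False)
loss = -np.sum(s[:n_components])  # use_all_singular_values=False: top-k only
print(loss <= 0.0)  # True: minimizing the loss maximizes total correlation
```

With use_all_singular_values=True, the sum would run over all of s instead of the top n_components values.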

class
mvlearn.embed.
MlpNet
(layer_sizes, input_size)[source]¶ Multilayer perceptron implementation for a fully connected network. Used by DCCA for the transformation of a single view before linear CCA. Extends torch.nn.Module.
Parameters: layer_sizes : list of ints
The sizes of the layers of the deep network applied to a view before CCA. For example, if the input dimensionality is 256, there is one hidden layer with 1024 units, and the output dimensionality is 100 before applying CCA, then layer_sizes=[1024, 100].
input_size : int (positive)
The dimensionality of the input vectors to the deep network.
Attributes
layers_ (torch.nn.ModuleList object) The layers in the network. 
forward
(x)[source]¶ Feed input forward through layers.
Parameters: x : torch.tensor
Input tensor to transform by the network.
Returns: x : torch.tensor
The output after being fed forward through network.

bfloat16
() → T¶ Casts all floating point parameters and buffers to bfloat16 datatype.
Returns: Module: self

parameters
(recurse: bool = True) → Iterator[torch.nn.parameter.Parameter]¶ Returns an iterator over module parameters.
This is typically passed to an optimizer.
Args:
recurse (bool): if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.
Yields:
Parameter: module parameter
Example:
>>> for param in model.parameters():
>>>     print(type(param), param.size())
<class 'torch.Tensor'> (20L,)
<class 'torch.Tensor'> (20L, 1L, 5L, 5L)

requires_grad_
(requires_grad: bool = True) → T¶ Change if autograd should record operations on parameters in this module.
This method sets the parameters' requires_grad attributes in-place.
This method is helpful for freezing part of the module for fine-tuning or training parts of a model individually (e.g., GAN training).
Args:
requires_grad (bool): whether autograd should record operations on parameters in this module. Default: True.
Returns:
Module: self


class
mvlearn.embed.
DeepPairedNetworks
(layer_sizes1, layer_sizes2, input_size1, input_size2, n_components, use_all_singular_values, device=device(type='cpu'))[source]¶ A pair of deep networks for operating on the two views of data. Consists of two MlpNet objects for transforming the two views of data in DCCA. Extends torch.nn.Module.
Parameters: layer_sizes1 : list of ints
The sizes of the layers of the deep network applied to view 1 before CCA. For example, if the input dimensionality is 256, and there is one hidden layer with 1024 units and the output dimensionality is 100 before applying CCA, layer_sizes1=[1024, 100].
layer_sizes2 : list of ints
The sizes of the layers of the deep network applied to view 2 before CCA. Does not need to have the same hidden layer architecture as layer_sizes1, but the final dimensionality must be the same.
input_size1 : int (positive)
The dimensionality of the input vectors in view 1.
input_size2 : int (positive)
The dimensionality of the input vectors in view 2.
n_components : int (positive), default=2
The output dimensionality of the correlated projections. The deep network will transform the data to this size. If not specified, will be set to 2.
use_all_singular_values : boolean (default=False)
Whether or not to use all the singular values in the CCA computation to calculate the loss. If False, only the top n_components singular values are used.
device : string, default='cpu'
The torch device for processing.
Attributes
model1_ (MlpNet object) Deep network for view 1 transformation.
model2_ (MlpNet object) Deep network for view 2 transformation.
loss_ (cca_loss object) Loss function for the 2-view DCCA.
forward
(x1, x2)[source]¶ Feed two views of data forward through the respective network.
Parameters: x1 : torch.tensor, shape=(batch_size, n_features)
View 1 data to transform.
x2 : torch.tensor, shape=(batch_size, n_features)
View 2 data to transform.
Returns: outputs : list, length=2
 outputs[i] : torch.tensor
List of the outputs from each view transformation.

bfloat16
() → T¶ Casts all floating point parameters and buffers to bfloat16 datatype.
Returns: Module: self

parameters
(recurse: bool = True) → Iterator[torch.nn.parameter.Parameter]¶ Returns an iterator over module parameters.
This is typically passed to an optimizer.
Args:
recurse (bool): if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.
Yields:
Parameter: module parameter
Example:
>>> for param in model.parameters():
>>>     print(type(param), param.size())
<class 'torch.Tensor'> (20L,)
<class 'torch.Tensor'> (20L, 1L, 5L, 5L)

requires_grad_
(requires_grad: bool = True) → T¶ Change if autograd should record operations on parameters in this module.
This method sets the parameters' requires_grad attributes in-place.
This method is helpful for freezing part of the module for fine-tuning or training parts of a model individually (e.g., GAN training).
Args:
requires_grad (bool): whether autograd should record operations on parameters in this module. Default: True.
Returns:
Module: self

Dimension Selection¶

mvlearn.embed.
select_dimension
(X, n_components=None, n_elbows=2, threshold=None, return_likelihoods=False)[source]¶ Generates a profile likelihood from an array based on the Zhu and Ghodsi method [11]. Elbows correspond to the optimal embedding dimension.
Parameters: X : 1d or 2d arraylike
Input array generate profile likelihoods for. If 1darray, it should be sorted in decreasing order. If 2darray, shape should be (n_samples, n_features).
n_components : int, optional, default: None
Number of components to embed. If None, n_components = floor(log2(min(n_samples, n_features))). Ignored if X is a 1d array.
n_elbows : int, optional, default: 2
Number of likelihood elbows to return. Must be > 1.
threshold : float, int, optional, default: None
If given, only consider the singular values that are > threshold. Must be >= 0.
return_likelihoods : bool, optional, default: False
If True, returns all of the likelihoods associated with each elbow.
Returns: elbows : list
Elbows indicate subsequent optimal embedding dimensions. Number of elbows may be less than n_elbows if there are not enough singular values.
sing_vals : list
The singular values associated with each elbow.
likelihoods : list of array-like
Arrays of likelihoods corresponding to each elbow. Only returned if return_likelihoods is True.
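The Zhu and Ghodsi profile-likelihood idea can be sketched as follows: every split of the sorted values into a "signal" group and a "noise" group is scored under a two-mean, common-variance Gaussian model, and the highest-scoring split is the elbow. This is a simplified single-elbow sketch, not the library's select_dimension:

```python
import numpy as np

def profile_likelihood_elbow(d):
    """Return the number of leading values before the elbow of a
    decreasingly sorted 1d array d (Zhu & Ghodsi, single elbow)."""
    d = np.asarray(d, dtype=float)
    p = len(d)
    lik = np.full(p, -np.inf)
    for q in range(1, p):  # candidate split: d[:q] vs d[q:]
        mu1, mu2 = d[:q].mean(), d[q:].mean()
        # pooled ("common") variance across the two groups
        var = (np.sum((d[:q] - mu1) ** 2) + np.sum((d[q:] - mu2) ** 2)) / p
        var = max(var, 1e-12)
        resid = np.concatenate([d[:q] - mu1, d[q:] - mu2])
        lik[q] = -0.5 * p * np.log(2 * np.pi * var) - np.sum(resid ** 2) / (2 * var)
    return int(np.argmax(lik))

# Three large singular values followed by a noise floor: elbow at 3
d = np.array([10.0, 9.5, 9.0, 1.2, 1.1, 1.0, 0.9, 0.8])
print(profile_likelihood_elbow(d))  # 3
```

select_dimension repeats this scoring on the remaining values to find subsequent elbows, up to n_elbows.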
References
[10] Code from the https://github.com/neurodata/graspy package, reproduced and shared with permission.
[11] Zhu, M. and Ghodsi, A. (2006). Automatic dimensionality selection from the scree plot via the use of profile likelihood. Computational Statistics & Data Analysis, 51(2), pp. 918–930.