Since we only require the top 16 singular vectors of a dataset with size n_samples = 400 and n_features = 4096, the computation time is less than 1s.

Note: with the optional parameter svd_solver='randomized', we also need to give PCA the size of the lower-dimensional space n_components as a mandatory input parameter.

PCA also has the disadvantage that the components it extracts have exclusively dense expressions, i.e. they have non-zero coefficients when expressed as linear combinations of the original variables; the sparse variants discussed below address this.
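As a check on the timing claim above, here is a minimal sketch using scikit-learn (fetch_olivetti_faces downloads the dataset on first use; exact runtimes naturally vary with hardware)::

    from time import time

    from sklearn.datasets import fetch_olivetti_faces
    from sklearn.decomposition import PCA

    # Olivetti faces: 400 samples of 64x64 = 4096 pixels each.
    faces = fetch_olivetti_faces()
    X = faces.data  # shape (400, 4096)

    # Randomized SVD computes only the requested components, which is
    # what makes it fast when n_components << n_features.
    pca = PCA(n_components=16, svd_solver="randomized", random_state=0)

    t0 = time()
    pca.fit(X)
    print(f"fit done in {time() - t0:.3f}s")  # typically well under 1s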


When truncated SVD is applied to term-document matrices (as returned by CountVectorizer or TfidfVectorizer), this transformation is known as latent semantic analysis (LSA), because it transforms such matrices to a “semantic” space of low dimensionality.

In particular, LSA is known to combat the effects of synonymy (several words sharing one meaning) and polysemy (one word carrying several meanings), which cause term-document matrices to be overly sparse and to exhibit poor similarity under measures such as cosine similarity.
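As an illustration, a sketch of the LSA pipeline on a toy corpus (the documents here are made up for the example)::

    from sklearn.decomposition import TruncatedSVD
    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = [
        "the cat sat on the mat",
        "a feline rested on a rug",
        "stock prices fell sharply today",
        "the market dropped at the opening bell",
    ]

    # Document-term matrix from a vectorizer; sparse and high-dimensional.
    X = TfidfVectorizer().fit_transform(docs)

    # TruncatedSVD accepts the sparse matrix directly; the output lives in
    # the low-dimensional "semantic" space where similarities are computed.
    lsa = TruncatedSVD(n_components=2, random_state=0)
    X_lsa = lsa.fit_transform(X)  # dense array of shape (4, 2)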

Furthermore, we know that the intrinsic dimensionality of the data is much lower than 4096, since all pictures of human faces look somewhat alike.

The samples lie on a manifold of much lower dimension (say, around 200).

For instance, the following shows 16 sample portraits (centered around 0.0) from the Olivetti dataset.

On the right-hand side are the first 16 singular vectors reshaped as portraits.
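A sketch of how such portraits can be rendered, assuming the pca estimator fitted in the earlier sketch and matplotlib::

    import matplotlib.pyplot as plt

    # Each component lives in pixel space, so it can be reshaped back
    # to the 64x64 grid and displayed like a face.
    fig, axes = plt.subplots(4, 4, figsize=(6, 6))
    for component, ax in zip(pca.components_, axes.ravel()):
        ax.imshow(component.reshape(64, 64), cmap="gray")
        ax.axis("off")
    plt.show()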

There exist sparsity-inducing norms that take into account adjacency and different kinds of structure; see [Jen09] for a review of such methods.

For more details on how to use Sparse PCA, see the Examples section below.
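Pending those examples, a minimal usage sketch (the data here is random and purely illustrative; alpha controls how strongly sparsity is enforced)::

    import numpy as np
    from sklearn.decomposition import SparsePCA

    rng = np.random.RandomState(0)
    X = rng.randn(100, 30)  # hypothetical data: 100 samples, 30 features

    # Larger alpha -> more exactly-zero loadings in the components.
    spca = SparsePCA(n_components=5, alpha=1.0, random_state=0)
    X_reduced = spca.fit_transform(X)  # shape (100, 5)

    # Unlike plain PCA, many component coefficients are exactly zero.
    print("fraction of zero loadings:", np.mean(spca.components_ == 0))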

By default, MiniBatchDictionaryLearning divides the data into mini-batches and optimizes in an online manner by cycling over the mini-batches for the specified number of iterations.
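A minimal sketch of that online behaviour, assuming a recent scikit-learn in which max_iter is the iteration parameter (the data and parameter values are illustrative only)::

    import numpy as np
    from sklearn.decomposition import MiniBatchDictionaryLearning

    rng = np.random.RandomState(0)
    X = rng.randn(500, 64)  # hypothetical data

    # The estimator cycles over mini-batches of `batch_size` samples,
    # updating the dictionary after each batch (online optimization).
    dico = MiniBatchDictionaryLearning(
        n_components=20,
        batch_size=16,
        max_iter=10,
        random_state=0,
    )
    code = dico.fit_transform(X)  # sparse codes, shape (500, 20)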