Abstract:
Dimensionality reduction is a widely studied family of techniques used to visualize data, cluster
samples, and extract insights from high-dimensional distributions. Classical approaches
such as PCA, Isomap, and Laplacian eigenmaps rely on well-defined spectral optimization problems, while
more modern approaches such as tSNE and UMAP minimize, via gradient descent, objectives that measure
disparities between the high- and low-dimensional representations of the data.
In this work, we observe that all of these approaches can be interpreted as minimizing
the difference between two kernel functions: one for the high-dimensional space and one
for the low-dimensional space. In particular, once we abstract the kernel functions, we can
develop a common framework for any dimensionality reduction problem. Namely, one needs
only to specify the high-dimensional distance kernel, the low-dimensional distance kernel, and
the method used for minimization.
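Schematically, and with notation introduced here only for illustration (the symbols $K_H$, $K_L$, and $D$ are not fixed by the abstract itself), the framework amounts to choosing an embedding $y_1, \ldots, y_n$ of the inputs $x_1, \ldots, x_n$ that solves
\[
\min_{y_1, \ldots, y_n} \; D\Big( \big[ K_H(x_i, x_j) \big]_{i,j}, \; \big[ K_L(y_i, y_j) \big]_{i,j} \Big),
\]
where $K_H$ and $K_L$ are the high- and low-dimensional kernels and $D$ is the chosen discrepancy measure; for instance, $D$ is a KL divergence in tSNE and a fuzzy-set cross-entropy in UMAP.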
With this in mind, we present the general framework and then show how
PCA, tSNE, and UMAP each fit into it. For each method, we discuss the
insights that this perspective yields. Lastly, we highlight promising directions
for future work.