Dimensionality Reduction – Visualize Embeddings
As humans, when we think about data that exists as vectors, we tend to represent the vectors graphically in a coordinate system. This is often useful because we have a spatial imagination and can get a much better intuition for the data than if we look at the numerical values. We can quickly grasp how far apart individual points are and, if necessary, recognize patterns of related points. In the last step, we saw how face recognition models represent faces as vectors. Unfortunately, these vectors have 512 dimensions, and we cannot simply draw such a high-dimensional vector in a coordinate system. To get an intuition for the data, we first need to reduce the number of dimensions.
Unfortunately, dimensionality reduction is not a trivial task. Each dimension contains individual information, and we cannot determine in advance that some dimensions contain more important information than others. Think of an object that is illuminated by the sun and casts a shadow. The objects in our world are three-dimensional, but the shadows they cast are only two-dimensional. Several different objects can cast the same shadow. Nevertheless, when we look at the shadow, we can often guess what kind of object it is. When we look at two shadows, we can often get an idea of which of the two associated objects is larger, or how they relate to each other in space.
If we look at the whole thing a little more mathematically, there are countless ways to reduce the number of dimensions. Let's consider rectangles (only those whose sides are parallel to the coordinate axes): They are two-dimensional and have a length in the X-direction and a length in the Y-direction. Depending on what we are interested in, there are different ways to reduce such rectangles to one dimension. We can use the perimeter or the area if we are interested in the size. Likewise, the ratio of the two lengths is an option if, for example, the similarity to a square is more interesting for our use case than the absolute size.
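The rectangle example can be sketched in a few lines of Python. The `Rectangle` class and the three helper functions are names made up for this illustration. The two sample rectangles are chosen so that one reduction (the area) cannot tell them apart, while the other two can, which mirrors the point that different objects can cast the same shadow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rectangle:
    """Axis-parallel rectangle (illustrative type, not from the demo)."""
    width: float   # length in the X-direction
    height: float  # length in the Y-direction


def perimeter(r: Rectangle) -> float:
    return 2 * (r.width + r.height)


def area(r: Rectangle) -> float:
    return r.width * r.height


def aspect_ratio(r: Rectangle) -> float:
    # 1.0 means the rectangle is a square
    return r.width / r.height


tall = Rectangle(width=2.0, height=8.0)
square = Rectangle(width=4.0, height=4.0)

# Each function maps two numbers to one, and each loses different information.
for name, reduce_to_1d in [("perimeter", perimeter), ("area", area), ("ratio", aspect_ratio)]:
    print(name, reduce_to_1d(tall), reduce_to_1d(square))
```

Which reduction is "right" depends on the question: the area treats both rectangles as identical, while the ratio separates them immediately.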
Similarly, there are several methods to visualize points in a 512-dimensional space in two or three dimensions. Each of these methods has its own strengths and weaknesses, and there is no single best option. In the following, we will briefly touch on some of the methods and give you the opportunity to try them out and compare the results.
PCA, or Principal Component Analysis, is a widely used method in data analysis that aims to reduce the dimensionality of a data set while preserving as much of its variance as possible. It does this by projecting the points into a low-dimensional space. The projection is constructed so that the points are spread out as widely as possible in the resulting space. In this way, as much information as possible from the higher dimensions is retained. On average, points that are close together in 512-dimensional space are also close together in the projection. However, this does not hold for every pair of points; there are outliers. PCA assumes that the relationships between the dimensions are linear. It is the only deterministic method we discuss here: if the inputs remain the same between two runs, the outputs will also remain the same. Its output space can also be used as input for other algorithms, such as clustering.
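To make the projection idea concrete, here is a minimal sketch of PCA using NumPy's singular value decomposition. It is not the code behind the demo; the function name `pca_project` is made up for this illustration, and the random vectors are only stand-ins for real 512-dimensional face embeddings.

```python
import numpy as np


def pca_project(embeddings, n_components=2):
    """Project the rows of `embeddings` onto their first principal components."""
    X = np.asarray(embeddings, dtype=float)
    centered = X - X.mean(axis=0)  # PCA works on mean-centered data
    # Rows of `vt` are the principal directions, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:n_components].T
    # Share of the total variance that each kept component captures.
    variance = singular_values ** 2
    explained = variance[:n_components] / variance.sum()
    return projected, explained


rng = np.random.default_rng(0)
embeddings = rng.normal(size=(20, 512))  # stand-in data, NOT real face embeddings
points_2d, explained = pca_project(embeddings)

print(points_2d.shape)   # one 2D point per input vector
print(explained.sum())   # fraction of the variance that survives the projection
```

The explained-variance share is a useful sanity check: the lower it is, the more information the two-dimensional picture leaves out.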
MDS, or Multidimensional Scaling, is a classic dimensionality reduction technique that constructs a low-dimensional representation of the data that preserves the pairwise distances between data points in the high-dimensional space as well as possible. MDS is a simple and interpretable method that can be useful for identifying clusters and patterns in high-dimensional data. Its distance-preserving nature makes it an interesting choice for visualizing our face embeddings, because our human intuition of similarity in the reduced space closely matches the similarity in the original space. However, MDS can suffer from the "curse of dimensionality": the number of dimensions required to preserve the distances between all data points grows with the size of the data set. As we add more points, it becomes increasingly difficult to find a low-dimensional representation that preserves the exact distances between all of them. Thus, we reach the limits of the usefulness of MDS for dimensionality reduction of our 512-dimensional face embeddings quite early. We can use a measure called stress to assess how closely the distances in the reduced space match the original ones. While 0 would be a perfect fit, the quality starts to decrease considerably at values above 0.2. This doesn't mean that a visualization with a stress value of, say, 10 can't be useful - but you should be aware of misplaced points.
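The text does not say which exact definition of stress is used, so the following sketch assumes one common normalized form, Kruskal's stress-1: the square root of the summed squared differences between reduced and original distances, divided by the summed squared reduced distances. The function names are made up for this illustration. The example uses four corners of a square that lie in a plane inside 3D space, so one layout can be perfect and another visibly distorted.

```python
import numpy as np


def pairwise_distances(points):
    """Euclidean distances for all unordered pairs of rows in `points`."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    i, j = np.triu_indices(len(pts), k=1)  # each pair once, no self-pairs
    return dist[i, j]


def stress(original, reduced):
    """Assumed definition: sqrt(sum((d_red - d_orig)^2) / sum(d_red^2))."""
    d_orig = pairwise_distances(original)
    d_red = pairwise_distances(reduced)
    return float(np.sqrt(((d_red - d_orig) ** 2).sum() / (d_red ** 2).sum()))


# Corners of a unit square that happens to lie flat in 3D (z = 0).
original = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]], dtype=float)

perfect_layout = original[:, :2]  # dropping z keeps every distance intact
collapsed_layout = np.array([[0, 0], [1, 0], [0, 0], [1, 0]], dtype=float)  # y squashed to 0

print(stress(original, perfect_layout))    # no distortion
print(stress(original, collapsed_layout))  # well above the 0.2 rule of thumb
```

In the collapsed layout, two pairs of corners land on top of each other, which is exactly the kind of misplaced point a high stress value warns about.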
t-SNE, or t-Distributed Stochastic Neighbor Embedding, and UMAP, or Uniform Manifold Approximation and Projection, are more recent methods for visualizing high-dimensional data. Both construct a low-dimensional representation that aims to preserve the local structure of the high-dimensional space and, to some degree, its global structure, and both can work with high-dimensional datasets with nonlinear relationships between features. While t-SNE becomes quite computationally intensive as the number of points increases, the newer UMAP handles large datasets better. Both have a hyperparameter, called perplexity in t-SNE and number of neighbors in UMAP, which requires some tuning to get the most helpful visualizations. A high value results in a more global view of the data, while a low value focuses more on local relationships.
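Why perplexity can be read as a "number of neighbors" becomes clearer with a small calculation. In t-SNE, each point gets a Gaussian-weighted probability distribution over all other points, and the perplexity of that distribution is 2 raised to its Shannon entropy. The sketch below computes this for a single point; the function name and the toy distances are assumptions made for illustration, and this is not a full t-SNE implementation.

```python
import numpy as np


def perplexity(squared_distances, sigma):
    """Perplexity of the Gaussian neighbor distribution around one point."""
    d2 = np.asarray(squared_distances, dtype=float)
    # Subtracting the minimum does not change the normalized probabilities,
    # it only avoids numerical underflow.
    weights = np.exp(-(d2 - d2.min()) / (2.0 * sigma ** 2))
    p = weights / weights.sum()
    p = p[p > 0]  # treat 0 * log(0) as 0
    entropy = -(p * np.log2(p)).sum()
    return float(2.0 ** entropy)


# Toy example: squared distances from one point to three near and three far points.
d2 = np.array([1.0, 1.0, 1.0, 100.0, 100.0, 100.0])

narrow = perplexity(d2, sigma=1.0)    # only the three near points matter
wide = perplexity(d2, sigma=100.0)    # all six points count almost equally

print(narrow, wide)
```

With a narrow Gaussian the perplexity is close to 3, with a wide one it is close to 6, so the value behaves like an effective neighbor count. t-SNE works in the opposite direction: the user picks the perplexity, and the algorithm searches for a width per point that produces it. UMAP's number of neighbors is a direct count instead.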
Use the Live Demo to give these algorithms a try. If you're not satisfied with the result of any algorithm other than PCA, try recalculating: they are stochastic, so each run can produce a different layout. You can also try changing the perplexity for t-SNE and UMAP. Don't skip t-SNE if you've added a lot of points; while it takes a second to compute, it can visualize clusters really well in our demo.
Now we have an idea of how modern AI methods represent faces and how we can picture that representation. The final step is to answer our initial question: "Do these two pictures show the same person?"