Comparing Faces
We now know that face recognition allows us to represent faces as vectors. The photos are likely to be of the same person if these vectors are similar. So we no longer have a task that seems to be intractable and that is best solved by artificial intelligence. The remaining question, how similar two vectors are, can be considered in a more classical mathematical way. In the previous step, we represented the vectors graphically. Intuitively, we could easily see which points belong to the same person because they are close to each other. This intuitive proximity is called Euclidean distance.
However, there are other distance measures that we can use to judge the similarity of two vectors. For the face recognition model we are using, ArcFace, the Euclidean distance also works (as we saw in the previous step), but we can get slightly better results by using the cosine similarity instead. Cosine similarity is the dot product of two normalized vectors. A vector is normalized if it has a length of 1 regardless of its direction. Our face embeddings are already normalized. Intuitively, cosine similarity can be thought of as a measure of how much two vectors point in the same direction. If two vectors are equal we get 1, if they are orthogonal we get 0, if they are exactly opposite we get -1.
So which value means what? In our real world we can move in three independent directions - it has three dimensions. Let's remember that our embeddings have 512 dimensions. This means that these vectors can also be orthogonal to each other in 512 independent ways. You may have noticed above that two exactly equal faces give a value of 1, since their embedding vectors are also equal. Accordingly, we get a number close to 1 if the AI considers the faces to be similar. If we look at two faces of different people, the result is likely to be close to 0. In order to get close to -1, the faces or the embeddings would have to be exactly opposite in as many of the 512 dimensions as possible. Even if we do not have a simple intuitive understanding of what this means, it becomes clear that this is unlikely.
Now we have a single number and we know what it means. But we still don't have a yes or no answer to the question of whether two images are of the same person. We could just use an arbitrary threshold, say 0.5. But in practice, it turns out that we get much better results if we experimentally determine the optimal threshold. For all the steps we perform here in the tutorial - i.e., our face recognition pipeline - and the widely used LFW test dataset, the optimal threshold is 0.42. If we consider all pairs of faces above 0.42 to be the same and the others to be different, we make the right decision for most pairs.
It is exciting to realize that we can change much of the character of our decisions if we do not consider the number merely as an optimal fixed limit. Instead, we can choose other values if we want to avoid certain mistakes more than others. When it comes to unlocking our smartphone, we may prefer to accept that facial recognition unlocking may not work every time, rather than having a stranger unlock our phone. In this case, we can use a value much higher than 0.42, for example 0.6. In other cases it is desirable to rather get too many hits. If we want to find as many photos of a certain person as possible on a smartphone or computer, we may prefer to find a few photos that actually show someone else than to miss real hits.
Please try the following demo to see all of the steps we learned about at work. It finally enables us to decide if two photos show the faces of the same person.
Get started with an image that contains one or more human faces.
- or -
Try this yourself
If you're writing your own application, you can use the exact same face recognition mechanism that is used in the demo above from your own code. The following example uses C# as programming language and leverages the FaceAiSharp library.
- Install a recent version of .NET.
-
Create a new console app project by running this command in your favorite shell in an empty folder:
dotnet new console
-
Install two packages providing the relevant code and models:
dotnet add package Microsoft.ML.OnnxRuntime dotnet add package FaceAiSharp.Bundle
-
Replace the content of the
Program.cs
file with the following code:
using FaceAiSharp; using SixLabors.ImageSharp; using SixLabors.ImageSharp.PixelFormats; using var hc = new HttpClient(); var groupPhoto = await hc.GetByteArrayAsync( "https://raw.githubusercontent.com/georg-jung/FaceAiSharp/master/examples/obama_family.jpg"); var img = Image.Load<Rgb24>(groupPhoto); var det = FaceAiSharpBundleFactory.CreateFaceDetectorWithLandmarks(); var rec = FaceAiSharpBundleFactory.CreateFaceEmbeddingsGenerator(); var faces = det.DetectFaces(img); var first = faces.First(); var second = faces.Skip(1).First(); // AlignFaceUsingLandmarks is an in-place operation so we need to create a clone of img first var secondImg = img.Clone(); rec.AlignFaceUsingLandmarks(img, first.Landmarks!); rec.AlignFaceUsingLandmarks(secondImg, second.Landmarks!); var embedding1 = rec.GenerateEmbedding(img); var embedding2 = rec.GenerateEmbedding(secondImg); var dot = FaceAiSharp.Extensions.GeometryExtensions.Dot(embedding1, embedding2); Console.WriteLine($"Dot product: {dot}"); if (dot >= 0.42) { Console.WriteLine("Assessment: Both pictures show the same person."); } else if (dot > 0.28 && dot < 0.42) { Console.WriteLine("Assessment: Hard to tell if the pictures show the same person."); } else if (dot <= 0.28) { Console.WriteLine("Assessment: These are two different people."); }
-
Run the program you just created:
You should see output similar to:dotnet run
Dot product: 0,21705234 Assessment: These are two different people.
Congratulations, you have completed this tutorial! You now know how face detection and recognition work together and how to use them in your own applications.