Face Recognition
In Divide and Conquer we defined the goal of face recognition as gaining an understanding of the similarity between images of faces. How can we imagine this? How can software gain such an understanding? As in other AI topics, a common approach is to reduce the input data to what is essential for the use case. To answer concrete questions like "Do these faces belong to the same person?", we are not actually interested in the faces themselves, but in the abstract identity information contained in them. As humans, we have a sense of this that is difficult to formalize. A typical approach from a more mathematical point of view is to encode the information as a high-dimensional vector. Such vectors have well-defined distances from one another. They can also be very similar in some dimensions but completely different in others. These properties match our intuition about faces: two people can look alike. They can also look very different overall but share individual traits, e.g., both have large eyes.
Thus, if we have a function that maps faces to vectors, we can answer many questions about the identity of the persons depicted using those vectors alone. Now we have formalized the problem and have a clearer task. Vectors have other advantages, too. Because they reduce the information to the essentials, they are more compact than images. To compare multiple faces, we don't have to keep the actual photos and start the process from scratch each time. Instead, we can determine the identity encoded in an image once and work with it more efficiently from then on.
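To make the idea of vector distance concrete, here is a minimal sketch in C# that compares vectors using cosine similarity, one common way to measure how close two embeddings are. The three-entry vectors are made-up toy values, not real face embeddings:

```csharp
using System;

// Cosine similarity: 1 means the vectors point in the same direction,
// 0 means they are unrelated, -1 means they point in opposite directions.
static float CosineSimilarity(float[] a, float[] b)
{
    float dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
}

// Toy stand-ins for embeddings; real face embeddings have far more entries.
var faceA = new float[] { 0.12f, -0.53f, 0.80f };
var faceB = new float[] { 0.10f, -0.50f, 0.83f }; // close to faceA
var faceC = new float[] { -0.70f, 0.60f, -0.05f }; // very different

Console.WriteLine($"A vs B: {CosineSimilarity(faceA, faceB):F3}"); // high, near 1
Console.WriteLine($"A vs C: {CosineSimilarity(faceA, faceC):F3}"); // low
```

A real system works the same way, only the vectors come from a model instead of being typed in by hand.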
Such vectors are typically called embeddings. This is not specific to face recognition. In the field of Natural Language Processing, for example, text input is converted into embeddings that encode the meaning of words and sentences. Based on those embeddings, sentences can then be generated again, for example in another language.
If you tried the demo above, you know what such an embedding looks like. It is a vector with 512 entries, i.e., a point in a 512-dimensional space. At first glance, this is a rather unintuitive way to represent a person's identity. In the next step, we'll take a closer look at what we can do with embeddings and how they can help us represent the similarity of people more intuitively.
Try this yourself
If you're writing your own application, you can use the exact same face recognition mechanism as the demo above from your own code. The following example uses C# as the programming language and leverages the FaceAiSharp library.
- Install a recent version of .NET.
- Create a new console app project by running this command in your favorite shell in an empty folder:

  ```bash
  dotnet new console
  ```

- Install two packages providing the relevant code and models:

  ```bash
  dotnet add package Microsoft.ML.OnnxRuntime
  dotnet add package FaceAiSharp.Bundle
  ```

- Replace the content of the `Program.cs` file with the following code:
  ```csharp
  using FaceAiSharp;
  using SixLabors.ImageSharp;
  using SixLabors.ImageSharp.PixelFormats;

  // Download an example photo containing multiple faces.
  using var hc = new HttpClient();
  var groupPhoto = await hc.GetByteArrayAsync(
      "https://raw.githubusercontent.com/georg-jung/FaceAiSharp/master/examples/obama_family.jpg");
  var img = Image.Load<Rgb24>(groupPhoto);

  var det = FaceAiSharpBundleFactory.CreateFaceDetectorWithLandmarks();
  var rec = FaceAiSharpBundleFactory.CreateFaceEmbeddingsGenerator();

  // Detect the first face, align it using its landmarks, and embed it.
  var face = det.DetectFaces(img).First();
  rec.AlignFaceUsingLandmarks(img, face.Landmarks!);
  var embedding = rec.GenerateEmbedding(img);
  Console.WriteLine($"Generated an embedding for one face:\n{string.Join("\t", embedding)}");
  ```
- Run the program you just created:

  ```bash
  dotnet run
  ```

  You should see output similar to:

  ```
  Generated an embedding for one face:
  -0,029764613 0,13904491 -0,017976629 0,010621089 0,015310788 -0,077746354 0,026660273 -0,03839961 -0,06403088 0,023788495 -0,1461524 -0,010910285 -0,001902924 -0,07354331 -0,06590692 0,09289711 0,0023482481 0,035271123 <... more numbers between -1 and 1 ...>
  ```
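With the steps above you can generate an embedding for each of two faces; what remains is deciding whether they belong to the same person. The following sketch assumes, as is typical for ArcFace-style models, that the embeddings are unit-length, so their dot product equals their cosine similarity. The 0.42 threshold and the toy three-entry vectors are illustrative assumptions, not values from the FaceAiSharp documentation; a real application should calibrate the threshold on its own data:

```csharp
using System;

// Dot product of two vectors; for unit-length embeddings this
// equals their cosine similarity.
static float Dot(float[] a, float[] b)
{
    float sum = 0;
    for (int i = 0; i < a.Length; i++) sum += a[i] * b[i];
    return sum;
}

// Hypothetical decision rule: the 0.42 threshold is an illustrative guess.
static bool IsSamePerson(float[] emb1, float[] emb2, float threshold = 0.42f)
    => Dot(emb1, emb2) >= threshold;

// Toy unit-length vectors standing in for real 512-dimensional embeddings:
var personA1 = new float[] { 0.6f, 0.8f, 0f };
var personA2 = new float[] { 0.58f, 0.81f, 0.05f };
var personB  = new float[] { -0.8f, 0.6f, 0f };

Console.WriteLine(IsSamePerson(personA1, personA2)); // prints: True
Console.WriteLine(IsSamePerson(personA1, personB));  // prints: False
```

In practice you would replace the toy vectors with the `embedding` arrays produced by `rec.GenerateEmbedding` for two different face images.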