Deep Learning is a technology that never fails to excite both industry professionals and people outside the field. The idea of something learning all by itself is fascinating enough to leave folks wondering if there’s a limit to its capabilities.
Unfortunately, as with all things, a closer look shows that Deep Learning can’t be the solution to every problem.
So, if you’re a streaming platform, a TV channel, or any other media company trying to automate celebrity face recognition in video content — I’m here to tell you that there are better ways to do that.
First things first: why would you even want to automate celebrity face recognition?
Companies that produce and distribute video content can extract a ton of value from it.
For instance, celebrity face metadata can help video companies categorize and manage all their content in a more convenient manner. The software that can automate celebrity face recognition will make it easy to find every video in which a certain celebrity appears.
It becomes just a single keyword search away.
There are benefits for the audience as well: celebrity face recognition enables a better viewing experience.
With that data, a streaming service or any other video platform can filter its content by a particular celebrity or cast member, making it easier for viewers to discover new content with familiar faces.
Another significant feature that celebrity face recognition enables is on-screen cast data. A viewer hits pause and sees who is on the screen at that moment: the actors’ names, the names of the characters they play, and any other useful information.
Now that that’s settled, let’s talk about how you can automate celebrity face recognition.
When the problem is recognizing faces, Deep Learning seems to fit the bill pretty well.
After all, neural networks have matched or even beaten humans at recognizing faces on some benchmarks, and the gap between machines and humans becomes even more apparent at scale.
A neural network does this by describing each detected face as a vector and using those vectors as the basis for recognizing the same face in other scenes of the footage.
Let’s take a look at the process of face recognition with the help of the neural network.
As I mentioned before, it all starts with a neural network that analyzes the video content and represents each face it catches as a descriptor vector. The descriptor encodes the unique features of the face, which the network can use to recognize it down the line.
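To make the descriptor idea concrete, here is a minimal sketch of the matching step. It assumes the network has already turned each face into a list of numbers; the three-value descriptors, the `gallery` of known faces, the `identify` helper, and the 0.8 threshold are all made up for illustration (real descriptors are much longer, and the right threshold depends on the model).

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two descriptors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def identify(descriptor, gallery, threshold=0.8):
    """Return the best-matching known name, or None if nothing is close enough."""
    best_name, best_score = None, threshold
    for name, known in gallery.items():
        score = cosine_similarity(descriptor, known)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name

# Hypothetical descriptors, invented for this example.
gallery = {
    "Chris Hemsworth": [0.9, 0.1, 0.3],
    "Extra #3": [0.1, 0.8, 0.5],
}

print(identify([0.88, 0.12, 0.31], gallery))  # close to a known face
print(identify([-0.5, 0.2, -0.7], gallery))   # unlike anything in the gallery
```

Notice that nothing in this step knows whether a matched face matters to the story. It only answers "have I seen this face before?".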
And… That’s it.
But we need more.
The process I have just described works great at finding out who’s who in an image or a video.
But if we are talking about practical video content analysis, in most cases we are not analyzing simple videos. For many companies, that content includes movies, sports footage, TV series, news broadcasts, talk shows, and so on.
That’s where Deep Learning stops being the best-fitting solution.
The thing is that all of those types of content feature a lot of faces — too many faces, to be honest. There are extras, audience members, and other participants that are not important.
What Deep Learning cannot judge on its own is the importance of any particular face. Deciding between the main and secondary characters in a movie or a TV show is beyond its abilities.
But I think we can all agree that this information is crucial. “Movies with Chris Hemsworth” is a better-defined category than “movies with an extra #3”.
But how can you make sure that the technology understands the context of the analyzed video content to recognize main and secondary characters?
So, more sophisticated video content analysis demands celebrity face recognition that goes beyond Deep Learning’s narrow approach of “look at the thing, remember the thing, recognize the thing”.
After all, we are striving to automate the entire process of video analysis.
So, we need the software to make the decision: identify all the faces in the footage and distinguish between main and secondary characters.
To achieve that, we supplement the recognition step with some tech magic so that celebrity face recognition can take the context into account.
That magic includes mathematical algorithms and Machine Learning and does the following:
- Makes an ID of all the characters appearing in the video;
- Analyzes the plot to distinguish main, secondary, and other characters;
- Tags the frame that represents each main, secondary, and other character;
- Finds each and every scene where the characters appear;
- Chooses the most representative scene for a particular character;
- Optionally generates the movie poster with a chosen celebrity, catering to different audiences.
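Two of those steps can be sketched in a few lines. Treat this as an illustration only: instead of real plot analysis, it uses total screen time as a crude stand-in for a character’s importance, and it picks the scene where the character is visible the longest as the “most representative” one. The `appearances` records, the character names, and the share thresholds are hypothetical.

```python
from collections import defaultdict

def classify_characters(appearances, total_seconds,
                        main_share=0.3, secondary_share=0.05):
    """Label each character by the share of the runtime they are on screen."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    screen_time = defaultdict(float)
    for character, _scene, seconds in appearances:
        screen_time[character] += seconds
    roles = {}
    for character, seconds in screen_time.items():
        share = seconds / total_seconds
        if share >= main_share:
            roles[character] = "main"
        elif share >= secondary_share:
            roles[character] = "secondary"
        else:
            roles[character] = "other"
    return roles

def most_representative_scene(appearances, character):
    """Pick the scene in which the character is visible the longest."""
    per_scene = defaultdict(float)
    for who, scene, seconds in appearances:
        if who == character:
            per_scene[scene] += seconds
    if not per_scene:
        return None
    return max(per_scene, key=per_scene.get)

# Hypothetical records: (character, scene, seconds on screen).
appearances = [
    ("hero", "scene_1", 120.0),
    ("hero", "scene_3", 300.0),
    ("sidekick", "scene_3", 60.0),
    ("extra_3", "scene_2", 5.0),
]

roles = classify_characters(appearances, total_seconds=1000.0)
print(roles)
print(most_representative_scene(appearances, "hero"))
```

Screen time alone will misjudge plenty of cases, such as a villain who appears briefly but drives the story, which is exactly why the real decision layer needs more than a single heuristic.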
And that’s it. Here, Deep Learning can only handle the narrow portion of face detection and recognition.
The decision portion — like identifying characters relevant to the story, choosing representative scenes and whatnot — can only be performed by more complex tech.
Celebrity face recognition technology offers serious benefits to media companies dealing with lots of video content.
It can facilitate content library management by enabling categorization by cast member, and even offer a better viewing experience to the audience, letting them easily explore content starring their favorite celebrity.
And, of course, it helps them remember the name of the actor they have totally not forgotten.
I don’t think we need to spell out the demand for automating such a process: manually labeling every piece of content in an extensive library would take an unfathomable amount of time.
So, we turn to technology to do the heavy lifting for us. And Deep Learning is the first thing that comes to mind for many people.
But the thing is that DL lacks the ability to understand what it recognizes. So, it can’t automate the process of celebrity face recognition completely, since it can’t tell the difference between primary and secondary characters. Labeling every face would be inefficient.
And that’s why you need to have a more complex cocktail of technology to tackle celebrity face recognition. Let the neural networks do their neural networking, and bring in more sophisticated algorithms to dive deeper into the context of analyzed footage, and deliver better results.
That’s how you get “movies with Chris Hemsworth” over “movies with an extra #3”.