YouTube, the third most popular website in the world after Google and Facebook, keeps refining even the smallest details of its service.

Over the past few months, both the number of users and the number of improvements to the platform have grown rapidly.

For viewers and businesses alike, it’s evident that YouTube has been working on quite a lot of useful things that make the whole video-watching experience better, more engaging, and more versatile for everyone.

To simplify the viewer’s experience even further, YouTube has begun improving video thumbnails with the help of deep neural nets. If you’re wondering why that is necessary, or why now, let us look at a few facts that YouTube itself has acknowledged.

First and foremost, the video thumbnail has become an integral part of the video-watching experience. In fact, that experience begins with the thumbnail, before we even play the video.

Psychologically, the video thumbnail plays an important role in viewers’ decision making, because it is the first thing they come across whenever they search the website.

It may sound like a minor detail, but as a viewer, the thumbnail is what makes me decide whether to play or skip a particular video. YouTube therefore has plenty of ground to cover here if it wants to attract even more visitors to its platform, and a good reason to invest in better thumbnails.

Better thumbnails lead to more clicks and views for video creators — says YouTube

Now that we’ve covered the “what,” let’s turn to the “why.” According to YouTube, deep neural networks (DNNs) have proven very successful in computer vision tasks such as image and video classification.

Following that trend, YouTube has upgraded its automatic “thumbnailer” to better serve content creators. Here’s how the process works:


As the image shows, every video uploaded to the website is sampled at one frame per second (1 FPS). Each sampled frame is then assigned a quality score by the quality model. Building that quality model, however, turns out to be the most difficult part of the process.

Finally, the frames with the highest scores are rendered as thumbnails in different sizes and aspect ratios. But on what basis are the scores calculated, and what defines the quality model?
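The three steps described above (sample, score, pick and render) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not YouTube’s implementation: the names `sample_indices`, `pick_thumbnails` and `center_crop_box` are hypothetical, a frame is represented as a plain list of pixel intensities, and the scoring function is passed in by the caller as a stand-in for the real quality model.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for a decoded frame: a flat sequence of pixel intensities.
Frame = Sequence[float]


def sample_indices(total_frames: int, video_fps: float,
                   sample_fps: float = 1.0) -> List[int]:
    """Indices of the frames kept when sampling at `sample_fps` frames per second."""
    if total_frames <= 0 or video_fps <= 0 or sample_fps <= 0:
        return []
    # Number of source frames between two samples; never less than one frame.
    step = max(1.0, video_fps / sample_fps)
    indices: List[int] = []
    k = 0
    while int(k * step) < total_frames:
        indices.append(int(k * step))
        k += 1
    return indices


def pick_thumbnails(frames: Sequence[Frame], video_fps: float,
                    score_frame: Callable[[Frame], float],
                    top_k: int = 3) -> List[int]:
    """Score every sampled frame and return the indices of the `top_k` best ones."""
    scored = [(score_frame(frames[i]), i)
              for i in sample_indices(len(frames), video_fps)]
    # Highest score first; ties go to the earlier frame.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored[:top_k]]


def center_crop_box(width: int, height: int,
                    aspect_w: int, aspect_h: int) -> Tuple[int, int, int, int]:
    """Largest centered crop (left, top, right, bottom) with the given aspect ratio."""
    if width * aspect_h > height * aspect_w:
        # Frame is wider than the target: keep full height, trim the sides.
        crop_w, crop_h = height * aspect_w // aspect_h, height
    else:
        # Frame is taller than (or equal to) the target: keep full width.
        crop_w, crop_h = width, width * aspect_h // aspect_w
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)


if __name__ == "__main__":
    # Toy data: 90 one-pixel "frames" at 30 FPS. Mean brightness is only a
    # placeholder for the learned quality model.
    frames = [[float(i)] for i in range(90)]
    best = pick_thumbnails(frames, 30.0, lambda f: sum(f) / len(f), top_k=2)
    print(best)
    print(center_crop_box(1920, 1080, 4, 3))
```

Passing the scorer in as a function keeps the sampling and selection logic independent of whatever model produces the scores, which is the part the rest of this article is about.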

Demystifying The Quality Model…

Here, YouTube simply divides frames into two classes: low-quality and high-quality examples. Judging the quality of a frame is highly subjective, because every person perceives it differently. This is where the algorithmic approach kicks in.

YouTube acknowledges that automatically selecting video thumbnails is a challenging task for its engineering team, because it requires a large set of well-annotated training examples to feed into the neural network.

However, YouTube has simplified the process by letting creators upload well-designed custom thumbnails. These creator-supplied thumbnails can be treated as high-quality examples, while randomly selected video frames serve as low-quality examples. The image below shows the difference.
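To make the labeling idea concrete, here is a minimal sketch of how such a training set could be assembled and used to learn a score. Everything in it is an assumption made for illustration: each image is reduced to two made-up features (say, sharpness and how centered the subject is), frames sampled from videos stand in for the low-quality examples, and a tiny logistic regression is swapped in for the deep neural network that YouTube actually uses. The function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Features = Sequence[float]
Example = Tuple[Features, int]          # (features, label): 1 = high, 0 = low
Model = Tuple[List[float], float]       # (weights, bias)


def build_training_set(custom_thumbnails: Sequence[Features],
                       sampled_frames: Sequence[Features],
                       seed: int = 0) -> List[Example]:
    """Label custom thumbnails as positives and sampled frames as negatives."""
    examples: List[Example] = [(f, 1) for f in custom_thumbnails]
    examples += [(f, 0) for f in sampled_frames]
    random.Random(seed).shuffle(examples)  # fixed seed keeps the run repeatable
    return examples


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def quality_score(model: Model, features: Features) -> float:
    """Estimated probability that the image is a high-quality thumbnail."""
    weights, bias = model
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train_quality_model(examples: Sequence[Example],
                        epochs: int = 200, lr: float = 0.5) -> Model:
    """Fit a logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            error = quality_score((weights, bias), features) - label
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


if __name__ == "__main__":
    # Made-up feature vectors, purely for demonstration.
    custom = [[0.9, 0.8], [0.8, 0.9], [0.95, 0.7]]
    frames = [[0.2, 0.3], [0.1, 0.4], [0.3, 0.2]]
    model = train_quality_model(build_training_set(custom, frames))
    print(quality_score(model, [0.9, 0.9]))  # a sharp, well-centered image
    print(quality_score(model, [0.1, 0.1]))  # a blurry, off-center frame
```

The appeal of this setup is that no human has to rate frames one by one: the labels come for free from what creators have already uploaded.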


Old Algorithm vs New Algorithm

Compared with YouTube’s earlier mechanism for picking thumbnails, the DNN-powered mechanism produces thumbnails of much better quality. If you don’t believe us, see the difference YouTube points out:


So the next time you upload a video to YouTube, watch for the difference it makes by offering better thumbnails to your viewers.