Illustrative Demonstration Highlighting the Salience of Television Personas

An upcoming issue of ViewFinder, a magazine dedicated to the moving image and education, will focus on Artificial Intelligence (AI) and its relationship with audiovisual media. Among the possibilities it explores is the use of computer vision to widen the evidence base of on-screen representation.

A blog series has demonstrated a method for measuring character prominence in broadcast TV using computer vision. The approach uses a face detector, a type of machine learning model, to identify faces on screen. Each detected face is marked with a rectangular 'bounding box', and a sequence of detected faces is called a 'face track'. Each face track carries additional data, such as timestamps and face sizes.
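As a rough illustration of what a face track might hold, the sketch below stores one bounding box per detection and derives a duration and an average face size from them. The class names (`Detection`, `FaceTrack`), their fields, and the sample values are assumptions made for this example; the blog series does not specify its data format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Detection:
    """One detected face in one frame (hypothetical structure)."""
    timestamp: float  # seconds from the start of the clip
    x: float          # left edge of the bounding box, in pixels
    y: float          # top edge of the bounding box, in pixels
    width: float      # bounding-box width, in pixels
    height: float     # bounding-box height, in pixels

    @property
    def area(self) -> float:
        """Bounding-box area in square pixels, used here as the face size."""
        return self.width * self.height


@dataclass
class FaceTrack:
    """A sequence of detections assumed to belong to the same face."""
    track_id: int
    detections: List[Detection] = field(default_factory=list)

    @property
    def duration(self) -> float:
        """Seconds between the first and last detection in the track."""
        if not self.detections:
            return 0.0
        times = [d.timestamp for d in self.detections]
        return max(times) - min(times)

    @property
    def mean_area(self) -> float:
        """Average bounding-box area across the track."""
        if not self.detections:
            return 0.0
        return sum(d.area for d in self.detections) / len(self.detections)


# Made-up values, for illustration only.
track = FaceTrack(
    track_id=1,
    detections=[
        Detection(timestamp=0.0, x=100, y=50, width=80, height=100),
        Detection(timestamp=0.5, x=104, y=52, width=80, height=100),
        Detection(timestamp=1.0, x=110, y=55, width=100, height=120),
    ],
)
print(f"duration: {track.duration:.1f}s, mean face area: {track.mean_area:.0f}px^2")
```

Keeping every detection, rather than only a summary, leaves room to compute other measures later, such as how face size changes within a shot.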

The method was tested on a short video clip from an episode of the American sitcom Black-ish, Season 1 (ABC), directed by Elliot Hegarty and produced by ABC Studios, to discuss the feasibility of generating character prominence metrics when there is a greater variety of camera angles and face sizes. In the clip, the grandma, played by Jenifer Lewis, was found to be the most prominent, followed by the grandchildren next to her.

The demonstration also used a 30-second clip from the TV show Mock the Week (BBC2) to show the relative prominence of individuals. The measure is based on screen time: the longer a person is on screen, the higher their relative prominence. It can be extended in many directions to widen the evidence base around on-screen representation, for instance by adding information about who is speaking.
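The screen-time measure can be sketched in a few lines. This is a minimal example under stated assumptions, not the method used in the demonstration: it assumes each face track has already been labelled with a person (a face detector alone does not say who a face belongs to), and the function name `relative_prominence`, the tuple layout, and the sample tracks are all hypothetical.

```python
from collections import defaultdict


def relative_prominence(tracks):
    """Return each person's share of the total face-track screen time.

    `tracks` is an iterable of (person, start_seconds, end_seconds) tuples.
    """
    screen_time = defaultdict(float)
    for person, start, end in tracks:
        if end < start:
            raise ValueError(f"track for {person!r} ends before it starts")
        screen_time[person] += end - start

    total = sum(screen_time.values())
    if total == 0:
        return {person: 0.0 for person in screen_time}
    return {person: seconds / total for person, seconds in screen_time.items()}


# Made-up tracks for a 30-second clip; names and times are illustrative only.
example_tracks = [
    ("host", 0.0, 12.0),
    ("panellist_a", 12.0, 18.0),
    ("host", 18.0, 24.0),
    ("panellist_b", 24.0, 30.0),
]
shares = relative_prominence(example_tracks)
for person, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{person}: {share:.0%}")
```

Note that the shares here are normalised by the total tracked face time rather than by the length of the clip, so they always sum to one even when several faces are on screen at once. Dividing by clip length instead would give each person's absolute share of the running time.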

The potential applications of this computer vision technology are vast. For diversity leads and monitors, it can provide detailed analytics on representation in existing content to inform diversity benchmarks and measure progress over time. Automated detection helps track inclusivity in casting, screen time, and character portrayal beyond subjective assessments.

Content producers and editors can use computer vision to audit and pre-visualize storyboards or rough cuts, allowing them to preview whether the cast and scenes meet diversity criteria, and adjust creative decisions accordingly before final production. Objective data on on-screen diversity patterns helps commissioners set informed targets and requirements for funding and commissioning new projects, ensuring compliance with diversity standards efficiently.

Researchers, too, stand to benefit from large-scale computer vision datasets. These enable quantitative studies of representation trends across genres, time periods, and cultural contexts, supporting evidence-based social science research and critical analysis. Technologies such as FastVLM enable fast and efficient multimodal visual query processing suitable for real-time or large-scale analysis, balancing accuracy and speed, which is critical for practical deployment.

Increasingly, screen industry bodies are formally addressing diversity: a new BAFTA diversity steering group was established in 2020, and several broadcasters have renewed their inclusion and diversity commitments. Learning on Screen's BoB archive made the broadcast content used in the demonstration available, with permission from the Educational Recording Agency.

The BFI's Research and Analysis department has published several research reports, including reports on the impact of overseas mergers and acquisitions on the UK video games industry, on post-Brexit migration and accessing foreign talent in the Creative Industries, on the UK's international trade in creative goods and services, and on the migrant and skills needs of creative businesses in the UK.

While this method offers a promising step forward, it's important to remember that representation is just one part of inclusion, and inclusion is of course far more than a numbers game. Representation is a crucial first step, but it's equally important to ensure that the representation is authentic and respectful.

