Mise-en-scène is a theatre and film term that refers to the staging of the action. This involves everything from the set design to where the actors appear on the stage or in the frame. Both theatre directors and film directors go to enormous lengths to ensure that everything about their mise-en-scène is perfect at all times.

Some directors are so devoted to this kind of detail that their names have become synonymous with it. Akira Kurosawa, Stanley Kubrick, Wes Anderson, and Hou Hsiao-Hsien are among the most famous examples, and their perfectionism has, at least in some cases, earned them a reputation for being hard to work with.

Kubrick, for example, famously had the same bit of background wall painted and repainted numerous times for his film Eyes Wide Shut (1999) until he was happy with it, even though the wall appeared in shot for only a few seconds.

The reason directors go to such lengths is that how the action is staged and presented on screen is vitally important if they plan to get the precise audience reaction they desire.

For a stark example, consider the grave scene in Edward D. Wood Jr.’s Plan 9 from Outer Space, in which actor Tor Johnson, because of his enormous size, is unable to climb out of the grave. The scene was meant to scare audiences, but Wood’s poor attention to detail means that it is viewed as humorous instead.

Naturally, Plan 9 from Outer Space tanked at the box office, in part due to its terrible mise-en-scène.

This single example highlights the importance of getting it right, and why so many top directors spend so much effort perfecting their mise-en-scène.

The Struggle of AI to Understand Mise-en-scène

Current AI systems have little understanding of how different objects relate to each other in human terms.

Facial recognition software, for example, is trained only to recognize faces, and very few AI systems have learned much about how the objects in a scene relate to one another.

This is an enormous limitation for all kinds of AI systems, including those used by AI-assisted moviemaking companies. We humans derive an enormous amount of insight from objects and how they are related to one another, so understanding the how and why is crucial.

At present, AI systems cannot draw on these relationships and are therefore missing out on vast pools of data that could improve the value and accuracy of the insights they provide.

This means one simple thing: AI-assisted filmmaking companies are not providing their customers with insights as accurate, or tools as powerful, as they could be.

All this is about to change.

An AI that Understands Object Relationships

Before we continue, it is worth noting that the above point was made more for dramatic effect than accuracy. You only need to look at the statistics to see that many of the leading AI-assisted movie companies are already achieving accuracy levels of 70% or more, with some as high as 86%.

The point I was making above is that as they unlock more technological developments, they can and will be able to do much, much better.

A recent MIT News article covered one technological advancement that is set to be a game-changer for AI understanding of object relationships.

The article detailed the news that a group of MIT researchers has successfully “developed a model that understands the underlying relationships between objects in a scene.”

The researchers began by training the system to understand individual objects, then how a group of objects is interrelated and how, together, they describe the overall scene.

Using text descriptions of each object, and constantly rearranging them, they were gradually able to teach the AI to understand the relationships different objects have to one another.

This allowed the system to understand each object and the subtleties of how it relates to other items. With this information, the AI can build a wider picture and, eventually, gain a good understanding of the overall scene and how individual objects and people are affecting it.
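To make the idea of breaking a scene description into individual object relationships more concrete, here is a minimal Python sketch. It is not the MIT researchers' model; the Relation class, the split_description and rearrangements helpers, and the example predicates are all hypothetical names chosen purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Relation:
    """One pairwise relationship, e.g. 'a knife' 'on' 'a table' (hypothetical structure)."""
    subject: str
    predicate: str
    obj: str

    def describe(self) -> str:
        return f"{self.subject} {self.predicate} {self.obj}"


def split_description(description: str, predicates: tuple[str, ...]) -> list[Relation]:
    """Split a text description joined by ' and ' into individual relations.

    Assumes each clause contains exactly one known predicate; clauses
    without a known predicate are skipped.
    """
    relations = []
    for clause in description.split(" and "):
        for predicate in predicates:
            marker = f" {predicate} "
            if marker in clause:
                subject, obj = clause.split(marker, 1)
                relations.append(Relation(subject.strip(), predicate, obj.strip()))
                break
    return relations


def rearrangements(objects: list[str], predicates: tuple[str, ...]) -> list[Relation]:
    """Enumerate every ordered pair of objects under every predicate."""
    return [
        Relation(a, predicate, b)
        for a, b in permutations(objects, 2)
        for predicate in predicates
    ]


scene = split_description(
    "a knife on a table and a lamp behind a chair", ("on", "behind")
)
for relation in scene:
    print(relation.describe())
```

The rearrangements helper stands in, very loosely, for the "constantly rearranging" step: it lists the combinations a training set might cover, without saying anything about how a real model would learn from them.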

While this is a painstaking and very time-consuming process, AI systems are tireless and fast learners. Added to this, they can do much of the learning for themselves, which means that, in no time at all, AI engineers and data scientists will be able to train AIs to understand most kinds of mise-en-scènes that we see onscreen.

How AIs that Understand Mise-en-scène will Change Things

An AI system that understands not just dialogue, genre information, characterization, and actor audience appeal, but much more of the ‘why’ behind films, is going to be revolutionary.

Such a system would boost the accuracy of the results that current AI film tools are able to provide. Genre recipe tools, for example, would become more accurate because they could analyze the mise-en-scène and judge in much more detail how much horror a scene displays from the relationships between the objects in it.

So, for example, lots of knives, torture implements, and blood in a scene would likely lead the system to identify the suspense or danger elements normally associated with high-intensity horror or thriller films.
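As a rough illustration of how detected objects might feed a genre score, the following Python sketch averages per-object weights. The weights, the HORROR_WEIGHTS table, and the horror_score function are invented for this example and do not come from any real film-analysis tool.

```python
# Hypothetical per-object weights; a real system would learn these from data.
HORROR_WEIGHTS = {
    "knife": 0.6,
    "blood": 0.9,
    "torture implement": 1.0,
    "table": 0.0,
}


def horror_score(detected_objects, weights=HORROR_WEIGHTS):
    """Return the mean horror weight of the objects detected in a scene.

    Objects missing from the weight table count as 0.0 (an assumption),
    and an empty scene scores 0.0.
    """
    if not detected_objects:
        return 0.0
    total = sum(weights.get(name, 0.0) for name in detected_objects)
    return total / len(detected_objects)


print(horror_score(["knife", "blood", "table"]))
```

A scoring rule this simple ignores how the objects relate to each other, which is exactly the gap that relationship-aware models are meant to close.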

Added to this would be a whole new array of powerful tools that will be able to help filmmakers to understand the effect that their staging choices will have on audiences.

One example could be an AI staging tool that gives directors feedback on the effectiveness of specific objects in a scene, including their placement, and suggests items to add or remove to increase the desired effect on audiences.

Such a tool could also help directors plan out a scene in terms of where actors need to stand and where they can move to maximize the desired effect. In a crude example, the system might suggest that two characters who are secretly in love with each other stand on either side of an object such as a table to increase the sense of tension, or vice versa.
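The table example can be sketched as a simple geometric check on a top-down floor plan. This is only an illustration under stated assumptions: positions are 2D coordinates in arbitrary units, and the is_between function and its max_offset parameter are hypothetical.

```python
import math


def is_between(actor_a, actor_b, obstacle, max_offset=0.5):
    """Return True if the obstacle sits roughly on the line between two actors.

    Each position is an (x, y) tuple on a top-down floor plan.
    max_offset is how far the obstacle may sit off the direct line
    between the actors and still count (an assumed threshold).
    """
    ax, ay = actor_a
    bx, by = actor_b
    ox, oy = obstacle
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return False  # both actors on the same spot, nothing can separate them
    # Project the obstacle onto the actor-to-actor line; t in (0, 1) means
    # the projection falls strictly between the two actors.
    t = ((ox - ax) * dx + (oy - ay) * dy) / length_sq
    if not 0 < t < 1:
        return False
    closest_x, closest_y = ax + t * dx, ay + t * dy
    return math.hypot(ox - closest_x, oy - closest_y) <= max_offset


# A table slightly off the midpoint between two characters.
print(is_between((0, 0), (4, 0), (2, 0.2)))
```

A staging tool could run a check like this across a blocking plan and flag scenes where the intended barrier is, or is not, actually separating the characters.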

Once AI systems are trained to a human-level understanding of object relationships, the sky is the limit as to how many exciting new tools can be developed to assist in the filmmaking process.

Now that this team of MIT AI researchers has opened the door, it is only a matter of time before powerful new AI tools, and the accurate insights they provide, are in the hands of the directors who will benefit from them.